From Data Analytics to Artificial Intelligence
As data generation has increased rapidly, statistical knowledge and machine learning have become critical assets in any career, business, or academic program.
In this 4-day workshop, we will break down popular and complex methods in artificial intelligence into simple concepts using a variety of hands-on exercises. We will first lay the foundation by covering the basics of data analytics and summary statistics.
Using a selected dataset, we will build a classifier using logistic regression, and then work our way up to building models using random forests and neural networks. At each step, we will walk through the R code used and teach you how to interpret the generated output.
On the last day, successful data scientists will speak to you about their careers and the academic paths that led them to where they are now. All of the speakers have worked in both health research and industry. The theme of the talks will be transferable skills in data science: the speakers will share the core skills they found essential in both fields.
Schedule:
Day 1 - Saturday, April 28, 2018
Data Acquisition
- Types of big data
- Data sources: We will go over a list of data repositories that provide free access to data
- Workshop dataset overview
Programming in R
- A quick R programming overview
Data Analytics
- Data Distributions
- Summary Statistics
- Correlations
- P-values
Day 2 - Sunday, April 29, 2018
Data Cleaning
- Feature engineering and data restructuring
- Dealing with missing data (imputations)
- Dealing with highly correlated features
- Dimension reduction and univariate analysis
Regression
- Linear Regression
- Logistic Regression
Machine Learning and Prediction I
- Training and validation
- Random Forest
Day 3 - Saturday, May 5, 2018
Assessing Model Performance
- Calculating AUC, accuracy and other performance metrics
- How to use plots to visualize performance
Machine Learning and Prediction II
- Deep Learning and Neural Networks
- Cross Validation
- Dealing with imbalanced data
Day 4 - Sunday, May 6, 2018
Careers and Networking
- Presentations by students
- Guest data scientists and Networking
- Wrap up and survey
NEW: Microsoft Student Partners will be speaking to you about Microsoft Cognitive Services and its AI applications.
Pre-Workshop Prep:
Please bring your laptop to this workshop. You will need to install the following two tools on the laptop prior to the workshop day:
R
- Go to https://cran.r-project.org/ and download R version 3.4.3 for your operating system (Windows: https://cran.r-project.org/bin/windows/base/R-3.4.3-win.exe; Mac: https://cran.r-project.org/bin/macosx/R-3.4.3.pkg)
- Follow the instructions in the video demo below
RStudio
- Go to https://www.rstudio.com/products/rstudio/download/ and download the RStudio Desktop installer for your operating system (Windows: https://download1.rstudio.org/RStudio-1.1.383.exe; Mac: https://download1.rstudio.org/RStudio-1.1.383.dmg)
- Follow the instructions in the video demo below to install RStudio. You must install R before you install RStudio.
The following videos provide step-by-step instructions on how to install the above tools:
- Installation demo for Mac laptops: https://goo.gl/Xp6U1f
- Installation demo for Windows laptops: https://goo.gl/A12b2Y
- Installation demo for Linux laptops: https://goo.gl/UKbKiu
Pre-Readings
- 50 Years of Data Science by David Donoho
- Module 0, Introduction to Coding in R
- Both of the above documents can be found here: https://bit.ly/2JsmrtC
Datasets and Slides
- Dataset and slides will be posted here: https://bit.ly/2JsmrtC
Notebooks
Slack: https://bit.ly/2GHDier
Dataset Repos: https://bit.ly/2vHjItv
Shiny App: https://thecodinghive.shinyapps.io/cancer/
If you have any questions or if you have any problems with the installations, please feel free to email us at contact@thecodinghive.com and we will be more than happy to help you troubleshoot.
Date & Location
Date: April 28, April 29, May 5, and May 6, 2018
Time: 9am-12pm
Level: Advanced
Location: McKinsey Experience Studio
Guest Speakers:
Dr. Leila Pishdad, Senior Data Scientist, Royal Bank of Canada
Dr. Pishdad received her Ph.D. in Electrical Engineering from McGill University. Her research was on using statistical signal processing, Bayesian inference and machine learning for indoor localization and positioning. She worked as a research scientist at Elekta LTD in Montreal for two years, where her main focus was on the prediction and estimation of tumor location during radiation therapy sessions. She then joined a startup as a research and development engineer, and as of last year she has been working as a senior data scientist at Royal Bank of Canada. Her main areas of expertise are Bayesian inference and machine learning.
Hossein Sarshar, Data Scientist & AI Architect, Microsoft
Hossein is a Data Scientist and AI Architect at Microsoft Canada, where he helps teams build their data science solutions on Microsoft Azure. Prior to his three-year data science journey, he had been a full-stack software engineer for almost a decade. Hossein holds an MSc in Computer Science with a focus on Machine Learning.
Dr. Geoffrey Hunter, AI Lead, TribalScale
Geoffrey Hunter leads the Artificial Intelligence and Machine Learning practice at TribalScale, an innovation firm that creates digital products for web, mobile, and emerging platforms. At TribalScale, Geoffrey leverages the latest machine-intelligence-enabled technologies to solve business problems and steer future AI initiatives for organizations.
Geoffrey has a combined seven years of data science and consulting experience for clients spanning the public sector, healthcare, finance, and media. Before joining TribalScale, Geoffrey led Deloitte’s design and implementation of end-to-end data science solutions. Geoffrey served as a subject matter expert and thought leader in data science, cognitive technologies, and robotic process automation at Deloitte and at Widgets and Digits: Data Science Consultants. Prior to consulting, he was a cancer researcher at the Ontario Institute for Cancer Research where he used machine learning to improve the prognosis of cancer patients.
Geoffrey holds a PhD and MS in Mathematical Physiology from the University of Utah and a B.Math in Applied Math from the University of Waterloo.
Erik Drysdale, Data Scientist, Ontario Institute for Cancer Research
Erik started his career as an Economist with the Bank of Canada after obtaining his Bachelor's and Master's in Economics. He worked in Ottawa and Vancouver and was focused on analyzing and forecasting the Canadian housing market. While still enjoying his work, Erik's academic interests began to expand towards machine learning and topics in Biostatistics. He went back to university and obtained an MSc in Statistics from Queen's University to be able to transition to biomedical research. He currently works as a Bioinformatician/Data-Scientist in a genomics lab focused on cancer research in Toronto.
Erik's research interests are focused on the intersection of statistics and machine learning as well as survival modelling. He has been coding in R, Python, and Matlab for more than five years and is passionate about reproducible research and the tidyverse ecosystem.
Workshop Instructors:
Fouad Yousif, Data Scientist
Fouad completed his bachelor's degree in science at McMaster University. During his final year, he took a course in computational biology that sparked his interest in learning programming and using it to solve biological questions. To continue the development of his programming skills after graduation, he enrolled in Seneca College and completed a certificate in Bioinformatics.
Since then, he has obtained a Master's in Biostatistics from the University of Toronto, worked as a data scientist dealing with big sequencing data in cancer research, and been part of many high-impact cancer research publications and computational developments that aim to improve cancer prognosis and enhance patients' quality of life.
Fouad is also a very active member in the teaching community. He has taught a variety of bioinformatics workshops in Toronto, New York, Montreal and Brazil that helped scientists learn programming and data analysis skills. In addition, he has extensive teaching experience through teaching statistics courses at Seneca College and tutoring high school students in the subjects of mathematics and statistics.
Fouad believes that one course CAN change a career, as it did for him. He is very excited to share his knowledge and passion with all the students joining The Coding Hive.
Cindy Yao, Data Scientist
Cindy lived in Shanghai and Vancouver prior to settling in Toronto for university and work. She obtained an Honours Bachelor’s Degree in Human Biology at the University of Toronto. During her final year of university, she became fascinated by the concept of combining computational work with understanding genomics data.
She went on to pursue her Master's degree at the Department of Medical Biophysics at the University of Toronto. She is now working as a data scientist in a research institute, developing biomarkers that improve cancer prognosis. She has over 6 years of experience working with R and big data visualization. She is excited to be joining The Coding Hive and to share some of her experiences and insights with the students.
Veronica Craine, Data Scientist
Veronica has always been fascinated by the statistical and mathematical tools and techniques that can be used to solve real-life problems. After she obtained an M.A. in Applied Statistics from York University, a summer semester studying in Finland among interdisciplinary professionals sparked her interest in biology. Afterwards, she completed an M.Sc. in Biostatistics from the University of Victoria, with research focusing on applications to cancer patients.
She continued her research at OICR and was trained in bioinformatics techniques. Since 2013, Veronica has contributed to a multitude of genomic megaprojects, to developing in-house R software packages, and to creating biomarkers using machine learning techniques.
Veronica enjoys teaching. Over the years she has helped run mathematics workshops for elementary school students, tutored high school students in math and statistics, and served as a teaching assistant and guest lecturer at university. She continues to provide statistical support as well as mentorship to new hires and co-ops at OICR.
Veronica is excited to inspire and be inspired at The Coding Hive.
Katie Houlahan, PhD Student, University of Toronto, Ontario Institute for Cancer Research
Katie started her training as a chemist, earning an Honours Bachelor’s Degree in Chemical Biology at McMaster University. Early in her degree she was introduced to the world of computational biology, first through an internship at the Ontario Institute for Cancer Research, where she worked on an international collaboration to evaluate machine learning methods for detecting cancer-causing mutations. Excited by the application of machine learning in healthcare, she next worked on methods to predict cancer drug sensitivity alongside experts at the University of California, Santa Cruz and the Oregon Health & Science University.
Now, Katie is a PhD student in the Department of Medical Biophysics at the University of Toronto, studying the genetic underpinnings of prostate cancer. She is both a Vanier Scholar and a Vector Institute Postgraduate Affiliate and is excited to share her journey and experience.