Deep Dive: Statistical Concepts in AI
In this 2-day workshop, we will break down the six foundational statistical principles used in almost all AI applications. We will introduce the theory behind each one, walk through how the formulas are derived, and finally implement them in Python. No background in Python is required. This workshop is designed for TKS students only.
Agenda:
From Variance to Regression
Learn how to statistically describe individual variables and assess the relationships between multiple variables. Understanding how variance and covariance are calculated will help you see how weights are assigned to your features when you run a regression model. Learning the basics of regression will also help you grasp more complex machine learning models, such as neural networks.
Summary statistics
Variance, covariance, and correlation calculations
Regression
Python Application: Correlation visualization using heatmaps
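As a preview of the correlation application, here is a minimal sketch using NumPy on made-up data; in the workshop the resulting matrix would typically be drawn as a heatmap (for example with seaborn's `heatmap`, indicated only in a comment here):

```python
import numpy as np

# Toy dataset: three features with a known relationship (illustrative values only).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2 * x + rng.normal(scale=0.5, size=200)   # strongly correlated with x
z = rng.normal(size=200)                       # independent noise

data = np.vstack([x, y, z])
corr = np.corrcoef(data)  # 3x3 Pearson correlation matrix

print(np.round(corr, 2))
# To visualize: seaborn.heatmap(corr, annot=True)
```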
Probability and Hypothesis Testing
Probability is a fundamental concept in machine learning and data science. In this section, we will go over the different types of probability distributions and demonstrate when each is used. We will also break down how A/B tests are executed and properly interpreted, and you will learn what p-values are and how they are used to assess the significance of a statistical test.
Probability distributions
Hypothesis testing
P-values
A/B testing
t-test, chi-squared test, and Shapiro-Wilk test
Python Application: A/B test implementation
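To give a flavour of the A/B testing application, here is a small standard-library-only sketch of a two-proportion z-test (one common way to analyze an A/B test on conversion rates; the visitor and conversion numbers are invented for illustration):

```python
import math

def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test for an A/B experiment.

    conv_*: number of conversions, n_*: number of visitors.
    Returns (z statistic, two-sided p-value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis (no difference).
    p = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: variant B converts 5.2% vs. 4.0% for A.
z, p = ab_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")  # p below 0.05: a significant difference
```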
Dealing With Missing Data
Almost any real-world dataset you work with will contain gaps. In this section, we will cover the statistical techniques used to deal with missing data, including outlier detection and data imputation.
Outlier detection
Missing data imputation
Python Application: Missing Data Imputation
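The simplest imputation strategy covered here, mean imputation, can be sketched in a few lines of NumPy (the ages below are invented example values):

```python
import numpy as np

# A feature column with missing values encoded as NaN (toy data).
ages = np.array([25.0, 32.0, np.nan, 47.0, np.nan, 38.0])

# Mean imputation: replace each NaN with the mean of the observed values.
mean_age = np.nanmean(ages)          # ignores NaNs: (25+32+47+38)/4 = 35.5
imputed = np.where(np.isnan(ages), mean_age, ages)

print(imputed)  # the two NaNs are replaced by 35.5
```

Mean imputation is only one option; the workshop also covers when it distorts the data and what alternatives exist.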
Dimensionality Reduction
Data might contain thousands of features, many of them redundant, and running predictions on the full dataset is computationally inefficient. In statistics, machine learning, and information theory, dimensionality reduction is the process of reducing the number of features in your dataset by deriving a smaller set of principal features. Principal component analysis (PCA) is a popular and powerful dimensionality reduction method in data science. In this section, we will break down how PCA works and implement the analysis in Python.
PCA
Python Application: PCA Implementation
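The core of PCA can be sketched from scratch with NumPy, following the standard recipe (center, covariance, eigendecomposition, projection); the data below is synthetic, with one deliberately redundant feature:

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigendecomposition of the covariance matrix (teaching sketch)."""
    # 1. Center each feature at zero mean.
    X_centered = X - X.mean(axis=0)
    # 2. Covariance matrix of the features.
    cov = np.cov(X_centered, rowvar=False)
    # 3. Eigenvectors sorted by descending eigenvalue (explained variance).
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    components = eigvecs[:, order[:n_components]]
    # 4. Project the data onto the top principal components.
    return X_centered @ components, eigvals[order]

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
X[:, 4] = X[:, 0] + 0.01 * rng.normal(size=100)  # redundant copy of feature 0

scores, variances = pca(X, n_components=2)
print(scores.shape)  # 100 rows reduced from 5 features to 2
```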
Model Tuning and Performance Assessment
Fitting a prediction model is often straightforward; the challenge is assessing its performance and identifying the set of parameters that maximizes it. In this section, we will cover methods used to tune and assess the quality of your fitted models.
Cost functions
Gradient descent
Numeric model assessment
Classification model assessment
AUC
Python Application: AUC calculation and visualization
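One intuitive way to compute AUC, which we can sketch without any libraries, uses its probabilistic interpretation: the chance that a randomly chosen positive example is scored higher than a randomly chosen negative one (a brute-force version for teaching; production code would use something like scikit-learn's `roc_auc_score`):

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Brute-force over all (positive, negative) pairs; ties count as half.
    O(n^2), fine for small teaching examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))  # 0.75: 3 of the 4 positive/negative pairs are ranked correctly
```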
Bayesian Statistics
Bayes’ theorem is considered one of the most important theorems in mathematical statistics and probability theory. Bayesian inference is widely used by data scientists across many fields and applications. In this section, we will walk through the derivation of Bayes’ theorem and apply it to build a spelling corrector.
Bayes’ theorem
Python Application: Spelling corrector
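A toy version of the Bayesian spelling corrector can be sketched as follows. The word counts here are invented stand-ins for a real corpus, and the error model P(typo | word) is simplified to "one edit away":

```python
from collections import Counter
import string

# Toy corpus word counts standing in for P(word); a real corrector
# would count words from a large text corpus.
WORD_COUNTS = Counter({"spelling": 50, "the": 300, "of": 200,
                       "corrector": 10, "statistics": 40})

def edits1(word):
    """All strings one edit away: deletes, transposes, replaces, inserts."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    """Pick the known candidate maximizing the posterior via Bayes' theorem.

    argmax P(word | typo) is proportional to P(typo | word) * P(word);
    with the crude error model above, we just maximize the prior P(word)
    among candidates one edit away.
    """
    candidates = (({word} if word in WORD_COUNTS else set())
                  or (edits1(word) & WORD_COUNTS.keys())
                  or {word})
    return max(candidates, key=lambda w: WORD_COUNTS[w])

print(correct("speling"))  # corrected to "spelling"
```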
Date & Location
Date: Sunday, February 3, 2019
Time: 10am-5pm
Level: TKS Students Only
Location: DMZ Sandbox