Machine Learning Training
Machine Learning Internals
- Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it. Many researchers also think it is the best way to make progress towards human-level AI. This training session provides a deep dive into machine learning, data mining, and statistical pattern recognition. Topics include: (i) supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks); (ii) unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning); (iii) best practices in machine learning (bias/variance theory; the innovation process in machine learning and AI). The course also draws on numerous case studies and applications, so you will learn how to apply learning algorithms to building smart robots (perception, control), text understanding (web search, anti-spam), computer vision, medical informatics, audio, database mining, and other areas.
- Measuring and Tuning performance of ML algorithms
- You'll not only learn the theoretical underpinnings of learning but also gain the practical know-how needed to quickly and powerfully apply these techniques to new problems.
- Most effective machine learning techniques
- You will learn how to prototype and then productionize models
- Best practices in innovation as it pertains to machine learning and AI
- Experience in Programming
- An understanding of introductory statistics would be helpful.
- Familiarity with probability theory, calculus, linear algebra, and statistics is required
- Model selection
- Supervised learning
- Discovering graph structure
- Types of machine learning
- Machine learning: what and why?
- Parametric vs non-parametric models
- No free lunch theorem
- Linear regression
- Some basic concepts in machine learning
- Discovering clusters
- Classification
- Regression
- Matrix completion
- Logistic regression
- Parametric models for classification and regression
- The curse of dimensionality
- Overfitting
- Unsupervised learning
- Discovering latent factors
- A simple non-parametric classifier: K-nearest neighbors
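To make the K-nearest neighbors item above concrete, here is a minimal NumPy sketch of a KNN classifier; the function names, the toy dataset, and the choice of k are illustrative assumptions, not part of the course material.

```python
# Minimal K-nearest-neighbors classifier sketch (NumPy only).
# The toy data and the value of k below are illustrative assumptions.
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Predict a class label for each query point by majority vote
    among its k nearest training points (Euclidean distance)."""
    preds = []
    for x in X_query:
        dists = np.linalg.norm(X_train - x, axis=1)      # distance to every training point
        nearest = np.argsort(dists)[:k]                   # indices of the k closest points
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])           # majority vote
    return np.array(preds)

# Toy usage: two well-separated 2-D clusters
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([[0.05, 0.1], [0.95, 1.0]]), k=3))  # -> [0 1]
```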
- Predictive Data Analytics Tools
- How Does Machine Learning Work?
- The Road Ahead
- What Can Go Wrong with Machine Learning?
- The Predictive Data Analytics Project Lifecycle: CRISP-DM
- What Is Machine Learning?
- What Is Predictive Data Analytics?
- Different Types of Data
- Different Types of Features
- Designing the Analytics Base Table
- Designing and Implementing Features
- Assessing Feasibility
- Converting Business Problems into Analytics Solutions
- Case Study: Motor Insurance Fraud
- Implementing Features
- Handling Time
- Outliers
- Handling Missing Values
- Handling Outliers
- Missing Values
- Irregular Cardinality
- Handling Data Quality Issues
- The Data Quality Report
- The Normal Distribution
- Identifying Data Quality Issues
- Getting to Know the Data
- Advanced Data Exploration
- Measuring Covariance and Correlation
- Visualizing Relationships Between Features
- Binning
- Data Preparation
- Normalization
- Shannon’s Entropy Model
- Handling Continuous Descriptive Features
- Decision Trees
- Predicting Continuous Targets
- Extensions and Variations
- Fundamentals
- Information Gain
- Big Idea
- Standard Approach: The ID3 Algorithm
- Tree Pruning
- Alternative Feature Selection and Impurity Metrics
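As a small illustration of the Shannon entropy and information-gain topics above (the quantities an ID3-style tree learner uses to pick splits), here is a hedged NumPy sketch; the weather-style feature and target values are made-up example data.

```python
# Illustrative computation of Shannon entropy and information gain for a
# categorical split, as used by ID3-style decision tree induction.
import numpy as np

def entropy(labels):
    """Shannon entropy H(t) = -sum_i p_i log2 p_i of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    """Entropy of the target minus the weighted entropy that remains
    after splitting on each level of a categorical descriptive feature."""
    remainder = 0.0
    for level in np.unique(feature):
        mask = feature == level
        remainder += mask.mean() * entropy(labels[mask])
    return entropy(labels) - remainder

feature = np.array(["sunny", "sunny", "rain", "rain", "rain"])
target  = np.array(["no",    "no",    "yes",  "yes",  "no"])
print(round(information_gain(feature, target), 3))   # ~0.42 for this toy data
```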
- Standard Approach: The Nearest Neighbor Algorithm
- Predicting Continuous Targets
- Fundamentals
- Other Measures of Similarity
- Extensions and Variations
- Data Normalization
- Feature Space
- Big Idea
- Measuring Similarity Using Distance Metrics
- Feature Selection
- Handling Noisy Data
- Efficient Memory Search
- Big Idea
- Smoothing
- Extensions and Variations
- Bayes’ Theorem
- Bayesian Networks
- Continuous Features: Probability Density Functions
- Continuous Features: Binning
- Bayesian Prediction
- Conditional Independence and Factorization
- Fundamentals
- Standard Approach: The Naive Bayes Model
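The following sketch illustrates the naive Bayes topics above: class priors, per-feature conditional probabilities with Laplace-style smoothing, and prediction by maximising the posterior. The toy spam-flavoured dataset, the function names, and the smoothing constant are assumptions made for illustration.

```python
# Minimal categorical naive Bayes sketch: P(t|d) is proportional to
# P(t) * prod_k P(d_k|t), with Laplace smoothing. Toy data is an assumption.
import numpy as np

def fit_naive_bayes(X, y, alpha=1.0):
    """Return classes, log-priors, and per-feature conditional log-prob tables."""
    classes = np.unique(y)
    log_prior = {c: np.log(np.mean(y == c)) for c in classes}
    cond = {}  # (feature index, class) -> {level: log P(level | class)}
    for j in range(X.shape[1]):
        levels = np.unique(X[:, j])
        for c in classes:
            col = X[y == c, j]
            cond[(j, c)] = {v: np.log((np.sum(col == v) + alpha) /
                                      (len(col) + alpha * len(levels)))
                            for v in levels}
    return classes, log_prior, cond

def predict(x, classes, log_prior, cond):
    """Pick the class with the largest posterior score for one example x."""
    scores = {c: log_prior[c] + sum(cond[(j, c)][v] for j, v in enumerate(x))
              for c in classes}
    return max(scores, key=scores.get)

X = np.array([["free", "yes"], ["free", "no"], ["work", "no"], ["work", "no"]])
y = np.array(["spam", "spam", "ham", "ham"])
model = fit_naive_bayes(X, y)
print(predict(["free", "yes"], *model))   # -> "spam"
```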
- Setting the Learning Rate Using Weight Decay
- Error Surfaces
- Multinomial Logistic Regression
- Modeling Non-linear Relationships
- Handling Categorical Descriptive Features
- Interpreting Multivariable Linear Regression Models
- Simple Linear Regression
- Big Idea
- Handling Categorical Target Features: Logistic Regression
- Extensions and Variations
- Fundamentals
- Choosing Learning Rates and Initial Weights
- Standard Approach: Multivariable Linear Regression with Gradient Descent
- Gradient Descent
- Multivariable Linear Regression
- Measuring Error
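Below is a minimal sketch of multivariable linear regression fitted with batch gradient descent on the sum-of-squared-errors surface, tying together several of the items above. The learning rate, iteration count, and synthetic data are illustrative choices only.

```python
# Multivariable linear regression via batch gradient descent (NumPy only).
# Hyperparameters and synthetic data are example values, not prescriptions.
import numpy as np

def fit_linear_regression(X, y, learning_rate=0.1, n_iters=2000):
    """Minimise L(w) = (1/2n) * ||Xw - y||^2 with a bias column prepended."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])    # add intercept feature
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        grad = Xb.T @ (Xb @ w - y) / len(y)          # gradient on the error surface
        w -= learning_rate * grad                    # step downhill
    return w

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0, 0.05, size=100)
print(fit_linear_regression(X, y))   # approximately [3.0, 2.0, -1.0]
```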
- Performance Measures: Prediction Scores
- Designing Evaluation Experiments
- Evaluating Models after Deployment
- Performance Measures: Multinomial Targets
- Extensions and Variations
- Fundamentals
- Performance Measures: Continuous Targets
- Performance Measures: Categorical Targets
- Big Idea
- Standard Approach: Misclassification Rate on a Hold-out Test Set
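A short sketch of the evaluation idea listed above, misclassification rate on a hold-out test set: split the data, fit on the training part, and measure the error rate on the unseen part. The 70/30 split and the random seed are arbitrary example values.

```python
# Hold-out evaluation sketch: train/test split plus misclassification rate.
import numpy as np

def train_test_split(X, y, test_fraction=0.3, seed=0):
    """Randomly partition the data into training and hold-out test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(len(y) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return X[train], y[train], X[test], y[test]

def misclassification_rate(y_true, y_pred):
    """Fraction of hold-out examples the model labels incorrectly."""
    return np.mean(y_true != y_pred)
```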
- MATLAB
- H2O
- Spark ML/MLlib
- Octave
- Regularization effects of big data
- Bayesian inference when σ² is unknown *
- Model specification
- Numerically stable computation *
- Computing the posterior
- Geometric interpretation
- Convexity
- Connection with PCA *
- Maximum likelihood estimation (least squares)
- Bayesian linear regression
- Derivation of the MLE
- Computing the posterior predictive
- EB for linear regression (evidence procedure)
- Ridge regression
- Basic idea
- Robust linear regression *
- Introduction
- Residual analysis (outlier detection) *
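As a concrete instance of the ridge regression topic above, here is the closed-form penalised least-squares estimate w = (XᵀX + λI)⁻¹Xᵀy; leaving the bias term unpenalised and the value of the penalty strength are illustrative conventions, not requirements.

```python
# Ridge regression sketch: closed-form penalised least squares.
# The penalty strength lam is an example value.
import numpy as np

def ridge_regression(X, y, lam=1.0):
    """Solve (Xb' Xb + lam * I) w = Xb' y, with an unpenalised intercept."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    penalty = lam * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0                      # leave the bias term unregularised
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)
```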
- Generative vs discriminative classifier
- Multi-class logistic regression
- Online learning and regret minimization
- Iteratively reweighted least squares (IRLS)
- Quasi-Newton (variable metric) methods
- Newton’s method
- Bayesian logistic regression
- A Bayesian view
- Laplace approximation
- l2 regularization
- Gaussian approximation for logistic regression
- Approximating the posterior predictive
- Derivation of the BIC
- Steepest descent
- Introduction
- MLE
- Model specification
- Online learning and stochastic optimization
- Dealing with missing data
- Fisher’s linear discriminant analysis (FLDA) *
- Model fitting
- Stochastic optimization and risk minimization
- Pros and cons of each approach
- The LMS algorithm
- Logistic regression
- The perceptron algorithm
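The sketch below fits binary logistic regression by steepest descent on the negative log-likelihood, the simplest of the optimisation schemes listed above (IRLS and Newton's method converge faster in practice). The function names, step size, and iteration count are example choices.

```python
# Binary logistic regression fitted by steepest descent on the NLL.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, lr=0.5, n_iters=5000):
    """y must be coded 0/1. Returns weights including a bias term."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        mu = sigmoid(Xb @ w)                 # predicted P(y=1 | x)
        grad = Xb.T @ (mu - y) / len(y)      # gradient of the negative log-likelihood
        w -= lr * grad
    return w

def predict_proba(X, w):
    """Class-1 probabilities for new inputs under the fitted weights."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return sigmoid(Xb @ w)
```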
- Introduction
- Maximum entropy derivation of the exponential family *
- Ordinal probit regression *
- Generalized linear mixed models *
- Examples
- Semi-parametric GLMMs for medical data
- The pointwise approach
- Computational issues
- Application to domain adaptation
- ML and MAP estimation
- Learning to rank *
- ML/MAP estimation using gradient-based optimization
- Multinomial probit models *
- Probit regression
- Bayesian inference
- Generalized linear models (GLMs)
- Other kinds of prior
- Basics
- The exponential family
- The pairwise approach
- Log partition function
- Loss functions for ranking
- Bayes for the exponential family *
- Definition
- Hierarchical Bayes for multi-task learning
- Application to personalized email spam filtering
- The listwise approach
- Latent variable interpretation
- MLE for the exponential family
- Multi-task learning
- Chain rule
- Markov and hidden Markov models
- Introduction
- Graph terminology
- Naive Bayes classifiers
- d-separation and the Bayes Ball algorithm (global Markov properties)
- Learning with missing and/or latent variables
- Conditional independence
- Other Markov properties of DGMs
- Inference
- Genetic linkage analysis *
- Directed Gaussian graphical models *
- Influence (decision) diagrams *
- Learning
- Graphical models
- Directed graphical models
- Markov blanket and full conditionals
- Plate notation
- Learning from complete data
- Conditional independence properties of DGMs
- Other estimation principles *
- Principal components analysis (PCA)
- Probabilistic PCA
- Using EM
- The FastICA algorithm
- FA is a low rank parameterization of an MVN
- Fitting FA models with missing data
- Unidentifiability
- Choosing the number of latent dimensions
- Partial least squares
- Singular value decomposition (SVD)
- EM for factor analysis models
- Mixtures of factor analysers
- EM algorithm for PCA
- Model selection for FA/PPCA
- Supervised PCA (latent factor regression)
- Canonical correlation analysis
- PCA for categorical data
- Maximum likelihood estimation
- PCA for paired and multi-view data
- Inference of the latent factors
- Classical PCA: statement of the theorem
- Model selection for PCA
- Independent Component Analysis (ICA)
- Factor analysis
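To illustrate the classical PCA material above, here is a small sketch that computes principal components via the SVD of the centred data matrix; the number of retained components is an example choice.

```python
# Classical PCA via the singular value decomposition of the centred data.
import numpy as np

def pca(X, n_components=2):
    """Return latent factor scores, the principal directions, and the
    variance explained along each retained direction."""
    X_centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]           # principal directions (rows)
    scores = X_centered @ components.T       # projections onto those directions
    explained_var = (S**2) / (len(X) - 1)    # variance along each direction
    return scores, components, explained_var[:n_components]
```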
- Smoothing kernels
- Kernels for comparing documents
- The kernel trick
- SVMs for classification
- Kernelized ridge regression
- SVMs for regression
- Linear kernels
- Kernel machines
- Introduction
- Comparison of discriminative kernel methods
- Kernelized nearest neighbor classification
- Using kernels inside GLMs
- A probabilistic interpretation of SVMs
- Kernel functions
- RBF kernels
- Mercer (positive definite) kernels
- Kernel density estimation (KDE)
- Kernel PCA
- String kernels
- L1VMs, RVMs, and other sparse vector machines
- Kernels for building generative models
- Choosing C
- Kernelized K-medoids clustering
- Pyramid match kernels
- Kernel regression
- Kernels derived from probabilistic generative models
- Locally weighted regression
- Summary of key points
- Support vector machines (SVMs)
- From KDE to KNN
- Matern kernels
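The following sketch combines two of the kernel topics above: an RBF (Gaussian) kernel and kernelised ridge regression in its dual form, alpha = (K + λI)⁻¹y. The bandwidth and regularisation values are illustrative assumptions.

```python
# RBF kernel plus kernelised ridge regression (dual form), NumPy only.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """K[i, j] = exp(-gamma * ||a_i - b_j||^2)."""
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :]
                - 2 * A @ B.T)
    return np.exp(-gamma * sq_dists)

def kernel_ridge_fit(X, y, lam=0.1, gamma=1.0):
    """Solve (K + lam * I) alpha = y for the dual coefficients."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def kernel_ridge_predict(X_train, alpha, X_new, gamma=1.0):
    """Predict with f(x) = sum_i alpha_i * k(x, x_i)."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha
```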
- Agglomerative clustering
- The Dirichlet process
- Clustering datapoints and features
- Graph Laplacian
- Evaluating the output of clustering methods *
- Dirichlet process mixture models
- Multi-view clustering
- Spectral clustering
- Applying Dirichlet processes to mixture modeling
- Biclustering
- Fitting a DP mixture model
- Choosing the number of clusters
- Measuring (dis)similarity
- Affinity propagation
- Introduction
- From finite to infinite mixture models
- Bayesian hierarchical clustering
- Normalized graph Laplacian
- Hierarchical clustering
- Divisive clustering
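As a minimal illustration of agglomerative (bottom-up) clustering with single linkage from the list above, here is a deliberately naive NumPy sketch that merges the two closest clusters until the requested number remains; the stopping count is an example parameter.

```python
# Agglomerative clustering with single linkage, written for clarity rather
# than efficiency (it rescans all cluster pairs on every merge).
import numpy as np

def agglomerative_clustering(X, n_clusters=2):
    clusters = [[i] for i in range(len(X))]          # start: one point per cluster
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    while len(clusters) > n_clusters:
        best = (None, None, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: distance between the two closest members
                d = min(D[i, j] for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        clusters[a] += clusters[b]                    # merge the closest pair
        del clusters[b]
    labels = np.empty(len(X), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels
```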
- Learning image features using 2d convolutional DBNs
- Deep generative models
- Information retrieval using deep auto-encoders (semantic hashing)
- Deep directed networks
- Data visualization and feature discovery using deep auto-encoders
- Deep Boltzmann machines
- Applications of deep networks
- Stacked denoising auto-encoders
- Learning audio features using 1d convolutional DBNs
- Greedy layer-wise learning of DBNs
- Deep neural networks
- Deep belief networks
- Deep multi-layer perceptrons
- Deep auto-encoders
- Introduction
- Handwritten digit classification using DBNs
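To ground the auto-encoder items above, here is a tiny single-hidden-layer auto-encoder trained by plain gradient descent; real deep auto-encoders stack several such layers, often with greedy layer-wise pre-training, and the layer sizes and learning rate here are illustrative assumptions.

```python
# Tiny auto-encoder sketch: tanh encoder, linear decoder, squared-error
# reconstruction loss, trained by full-batch gradient descent.
import numpy as np

def train_autoencoder(X, n_hidden=2, lr=0.1, n_iters=5000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, n_hidden));  b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, d));  b2 = np.zeros(d)
    for _ in range(n_iters):
        H = np.tanh(X @ W1 + b1)             # encoder: compress to n_hidden units
        X_hat = H @ W2 + b2                  # decoder: linear reconstruction
        err = (X_hat - X) / n                # gradient of 0.5 * mean squared error
        dW2 = H.T @ err;   db2 = err.sum(axis=0)
        dH = err @ W2.T * (1 - H**2)         # backpropagate through tanh
        dW1 = X.T @ dH;    db1 = dH.sum(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return W1, b1, W2, b2
```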