Deep Learning & Machine Learning
Introduction
- Deep learning has emerged as one of the most promising techniques in machine learning, especially for analyzing image data. With every industry dedicating resources to it, staying competitive means applying these models to tasks such as image tagging, object recognition, speech recognition, and text analysis. In this training session you will build deep learning models using neural networks and explore what they are, what they do, and how they work. To remove the barrier of designing, training, and tuning networks from scratch, and to achieve high performance with less labeled data, you will also build classifiers tailored to your specific task on top of pre-trained models, which we call deep features. Along the way you will develop a clear understanding of the motivation for deep learning and design intelligent systems that learn from complex and/or large-scale datasets.
- Combine different types of layers and activation functions to obtain better performance
- Describe how these models can be applied in computer vision, text analytics and speech recognition
- Describe how a neural network model is represented and how it encodes non-linear features
- Use pre-trained models, such as deep features, for new classification tasks
- Learn how to prototype ideas and then productionize them
- Explore a dataset of products, reviews and images
- This is an advanced-level session and assumes good familiarity with machine learning.
- Machine Learning Internals
- This instructor-led course provides lecture topics and the practical application of deep learning and its underlying technologies. Most concepts are presented pictorially, and a detailed case study strings together the technologies, patterns, and design.
- Parameter Hyperspace
- Minimizing Cross Entropy
- Normalized Inputs And Initial Weights
- Transition Into Practical Aspects Of Learning
- Measuring Performance
- Stochastic Gradient Descent
- Training your Logistic Classifier
- Transition: Overfitting -> Dataset Size
- Momentum And Learning Rate Decay
- Solving Problems
- Supervised Classification
- Lather Rinse Repeat
- Optimizing A Logistic Classifier
- Cross Entropy
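
To make the classifier topics above concrete (training a logistic classifier, cross entropy, stochastic gradient descent), here is a minimal NumPy sketch; the synthetic data, learning rate, and batch size are illustrative assumptions rather than course material.

```python
import numpy as np

# Minimal sketch: a logistic classifier trained with mini-batch
# stochastic gradient descent on a cross-entropy loss.

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # labels are integer class ids; average negative log-likelihood
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))             # synthetic inputs
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # synthetic binary labels

W = np.zeros((20, 2))
b = np.zeros(2)
lr, batch = 0.1, 32

for step in range(500):
    idx = rng.integers(0, len(X), size=batch)        # sample a mini-batch
    xb, yb = X[idx], y[idx]
    probs = softmax(xb @ W + b)
    onehot = np.eye(2)[yb]
    grad_logits = (probs - onehot) / batch           # dL/dlogits for softmax + cross entropy
    W -= lr * xb.T @ grad_logits                     # SGD update
    b -= lr * grad_logits.sum(axis=0)

print("final loss:", cross_entropy(softmax(X @ W + b), y))
```
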
- What is Deep Learning
- "2-layer" neural network
- Dropout
- Network Of ReLUs
- Intro to Deep Neural Networks
- No Neurons
- Backprop
- Regularization Intro
- Linear Models Are Limited
- The Chain Rule
- Dropout Pt-2
- Regularization
- Training A Deep Learning Network
- Statistical Invariance
- Intro To CNNs
- Inception Module
- Convolutional Networks
- Explore The Design Space
- 1x1 Convolutions
- Convolutions Continued
- Play Legos
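
A minimal NumPy sketch of the convolutional building blocks listed above: a valid 2-D convolution, a ReLU non-linearity, and 2x2 max pooling. The toy image and the edge-detection kernel are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of a convolutional feature extractor: valid 2-D
# convolution, ReLU, then 2x2 max pooling.

def conv2d(image, kernel):
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i+kh, j:j+kw] * kernel).sum()
    return out

def relu(x):
    return np.maximum(x, 0)

def max_pool(x, size=2):
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]              # drop any ragged edge
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.random.default_rng(1).normal(size=(28, 28))   # toy "image"
edge_kernel = np.array([[1., 0., -1.],
                        [1., 0., -1.],
                        [1., 0., -1.]])                   # vertical-edge filter

feature_map = max_pool(relu(conv2d(image, edge_kernel)))
print(feature_map.shape)   # (13, 13)
```
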
- Word2Vec
- Beam Search
- Train A Text Embedding Model
- Embeddings
- Word2Vec Details
- t-SNE
- LSTM
- RNNs
- Semantic Ambiguity
- Captioning And Translation
- Memory Cell
- Regularization
- Vanishing / Exploding Gradients
- Unsupervised Learning
- Analogies
- Sequences Of Varying Length
- LSTM Cell
- Backprop Through Time
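
A minimal NumPy sketch of a single LSTM cell step, tying together the gates, memory cell, and hidden state named above; the hidden and input sizes are arbitrary illustrative choices.

```python
import numpy as np

# Minimal sketch of one LSTM cell step: input, forget and output gates
# plus the candidate update.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    # params holds one weight matrix and bias per gate; each matrix
    # maps the concatenated [h_prev, x] to the hidden dimension.
    z = np.concatenate([h_prev, x])
    i = sigmoid(params["Wi"] @ z + params["bi"])   # input gate
    f = sigmoid(params["Wf"] @ z + params["bf"])   # forget gate
    o = sigmoid(params["Wo"] @ z + params["bo"])   # output gate
    g = np.tanh(params["Wg"] @ z + params["bg"])   # candidate cell state
    c = f * c_prev + i * g                         # new memory cell
    h = o * np.tanh(c)                             # new hidden state
    return h, c

rng = np.random.default_rng(2)
hidden, inputs = 8, 4
params = {name: rng.normal(scale=0.1, size=(hidden, hidden + inputs))
          for name in ("Wi", "Wf", "Wo", "Wg")}
params.update({b: np.zeros(hidden) for b in ("bi", "bf", "bo", "bg")})

h = c = np.zeros(hidden)
for x in rng.normal(size=(5, inputs)):             # a length-5 input sequence
    h, c = lstm_step(x, h, c, params)
print(h.shape)   # (8,)
```
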
- Introduction to CUDA and OpenCL
- Fundamentals of GPU Algorithms (Applications of Sort and Scan)
- Dynamic Parallelism
- Optimizing GPU Programs
- Parallel Computation Patterns
- The GPU Hardware and Parallel Communication Patterns
- The GPU Programming Model
- Parallel Optimization Patterns
- Fundamentals of GPU Algorithms (Reduce, Scan, Histograms)
- Deep Learning Use of GPUs
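
The GPU module itself uses CUDA and OpenCL; as a language-neutral illustration of one of the parallel patterns above, here is a NumPy sketch of the Hillis-Steele inclusive scan, where each doubling step corresponds to one round of parallel work on a GPU.

```python
import numpy as np

# Sketch of the Hillis-Steele inclusive scan: each of the log2(n) steps
# could be executed by n threads in parallel on a GPU; here each step is
# just a vectorised shift-and-add over the whole array.

def inclusive_scan(values):
    out = np.array(values, dtype=float)
    stride = 1
    while stride < len(out):
        shifted = np.concatenate([np.zeros(stride), out[:-stride]])
        out = out + shifted          # every "thread" adds its left neighbour
        stride *= 2
    return out

data = np.arange(1, 9)              # 1..8
print(inclusive_scan(data))         # [ 1.  3.  6. 10. 15. 21. 28. 36.]
```
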
- Word representations
- Compositional Vector Grammars: Parsing
- Matrix-Vector RNNs: Relation classification
- Unsupervised word vector learning
- Learning word-level classifiers: POS and NER
- Backpropagation Training
- Recursive Neural Tensor Networks: Sentiment Analysis
- Recursive Neural Networks for Parsing
- Recursive Autoencoders: Paraphrase Detection
- Optimization and Backpropagation Through Structure
- Sharing statistical strength
- Assorted Speech and NLP applications
- TensorFlow, MATLAB, Octave
- Hidden Units
- Architecture Design
- Back-Propagation and Other Differentiation Algorithms
- Gradient-Based Learning
- Learning XOR
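
The "Learning XOR" topic above refers to the classic demonstration that a linear model cannot fit XOR while a single hidden layer of ReLUs can; a minimal NumPy sketch of that argument, using the standard exact hand-set weights, follows.

```python
import numpy as np

# Minimal sketch: a linear model cannot fit XOR, while one hidden ReLU
# layer with hand-set weights computes it exactly.

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Best linear least-squares fit: it predicts 0.5 everywhere,
# because XOR is not linearly separable.
A = np.hstack([X, np.ones((4, 1))])
w_lin, *_ = np.linalg.lstsq(A, y, rcond=None)
print("linear model:", A @ w_lin)

# Two-layer network with a ReLU hidden layer solving XOR exactly.
W, c = np.array([[1., 1.], [1., 1.]]), np.array([0., -1.])
w, b = np.array([1., -2.]), 0.0
hidden = np.maximum(X @ W + c, 0)          # ReLU hidden units
print("2-layer network:", hidden @ w + b)  # [0. 1. 1. 0.]
```
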
- Bagging and Other Ensemble Methods
- Dataset Augmentation
- Tangent Distance, Tangent Prop, and Manifold Tangent Classifier
- Parameter Tying and Parameter Sharing
- Semi-Supervised Learning
- Early Stopping
- Sparse Representations
- Multi-Task Learning
- Regularization and Under-Constrained Problems
- Dropout
- Norm Penalties as Constrained Optimization
- Noise Robustness
- Parameter Norm Penalties
- Adversarial Training
- Random or Unsupervised Features
- Convolutional Networks
- Structured Outputs
- Efficient Convolution Algorithms
- Challenges in Neural Network Optimization
- Variants of the Basic Convolution Function
- The Convolution Operation
- Pooling
- Parameter Initialization Strategies
- Motivation
- How Learning Differs from Pure Optimization
- Basic Algorithms
- Optimization Strategies and Meta-Algorithms
- The Neuroscientific Basis for Convolutional Networks
- Convolution and Pooling as an Infinitely Strong Prior
- Data Types
- Approximate Second-Order Methods
- Algorithms with Adaptive Learning Rates
- Unfolding Computational Graphs
- The Challenge of Long-Term Dependencies
- Bidirectional RNNs
- Echo State Networks
- Deep Recurrent Networks
- Recursive Neural Networks
- Leaky Units and Other Strategies for Multiple Time Scales
- Explicit Memory
- The Long Short-Term Memory and Other Gated RNNs
- Encoder-Decoder Sequence-to-Sequence Architectures
- Recurrent Neural Networks
- Optimization for Long-Term Dependencies
- Selecting Hyperparameters
- Debugging Strategies
- Example: Facial Recognition
- Performance Metrics
- Determining Whether to Gather More Data
- Default Baseline Models
- Other Applications
- Computer Vision
- Natural Language Processing
- Large Scale Deep Learning
- Speech Recognition
- Linear Factor Models
- Probabilistic PCA and Factor Analysis
- Independent Component Analysis (ICA)
- Manifold Interpretation of PCA
- Slow Feature Analysis
- Sparse Coding
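
As a small worked example for the linear factor model topics above, here is a NumPy sketch of plain PCA computed via the SVD; the synthetic three-dimensional data lying near a one-dimensional subspace is an illustrative assumption.

```python
import numpy as np

# Minimal sketch of PCA via the SVD: project centred data onto the
# directions of largest variance, then reconstruct from the top component.

rng = np.random.default_rng(3)
latent = rng.normal(size=(200, 1))
X = latent @ np.array([[2.0, 1.0, 0.5]]) + 0.05 * rng.normal(size=(200, 3))

X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

k = 1
scores = X_centered @ Vt[:k].T          # coordinates on the top principal direction
X_hat = scores @ Vt[:k] + X.mean(axis=0)
print("reconstruction error:", np.abs(X - X_hat).mean())
```
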
- Representational Power, Layer Size and Depth
- Contractive Autoencoders
- Stochastic Encoders and Decoders
- Predictive Sparse Decomposition
- Undercomplete Autoencoders
- Regularized Autoencoders
- Learning Manifolds with Autoencoders
- Applications of Autoencoders
- Denoising Autoencoders
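
A minimal NumPy sketch tying together the autoencoder topics above: an undercomplete linear autoencoder with tied weights, which becomes a denoising autoencoder when its input is corrupted. The data, noise level, and learning rate are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of an undercomplete autoencoder: a 1-hidden-unit linear
# encoder/decoder with tied weights, trained to reconstruct its input.
# Corrupting the input (noise > 0) makes it a denoising autoencoder.

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 1)) @ np.array([[1.0, 2.0, -1.0]])   # data near a 1-D subspace

W = rng.normal(scale=0.1, size=(3, 1))   # tied encoder/decoder weight
lr, noise = 0.01, 0.1

for step in range(2000):
    X_in = X + noise * rng.normal(size=X.shape)   # corrupted input
    H = X_in @ W                                  # encode to 1 dimension
    X_hat = H @ W.T                               # decode back to 3 dimensions
    err = X_hat - X                               # reconstruct the *clean* target
    grad = X_in.T @ (err @ W) + err.T @ H         # gradient of the squared error w.r.t. W (up to a constant)
    W -= lr * grad / len(X)

print("reconstruction error:", np.mean((X - (X @ W) @ W.T) ** 2))
```
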
- Distributed Representation
- Transfer Learning and Domain Adaptation
- Greedy Layer-Wise Unsupervised Pretraining
- Providing Clues to Discover Underlying Causes
- Semi-Supervised Disentangling of Causal Factors
- Exponential Gains from Depth
- The Deep Learning Approach to Structured Probabilistic Models
- Advantages of Structured Modeling
- Inference and Approximate Inference
- Using Graphs to Describe Model Structure
- Sampling from Graphical Models
- Learning about Dependencies
- The Challenge of Unstructured Modeling
- The Challenge of Mixing between Separated Modes
- Gibbs Sampling
- Sampling and Monte Carlo Methods
- Markov Chain Monte Carlo Methods
- Importance Sampling
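
As a small worked example for the sampling topics above, here is a NumPy sketch of importance sampling: an expectation under a target density is estimated from samples drawn from an easier proposal. The particular densities and test function are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of importance sampling: estimate E_p[f(x)] for a target
# density p using samples drawn from a proposal density q.

rng = np.random.default_rng(5)

def p(x):   # target: standard normal density
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

def q(x):   # proposal: normal with mean 1, standard deviation 2
    return np.exp(-0.5 * ((x - 1) / 2)**2) / (2 * np.sqrt(2 * np.pi))

f = lambda x: x**2                    # E_p[x^2] = 1 for the standard normal

samples = rng.normal(loc=1.0, scale=2.0, size=100_000)
weights = p(samples) / q(samples)     # importance weights
estimate = np.mean(weights * f(samples))
print(estimate)                       # close to 1.0
```
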
- Deep Boltzmann Machines
- Back-Propagation through Random Operations
- Restricted Boltzmann Machines
- Generative Stochastic Networks
- Boltzmann Machines for Structured or Sequential Outputs
- Boltzmann Machines
- Other Boltzmann Machines
- Other Generation Schemes
- Directed Generative Nets
- Boltzmann Machines for Real-Valued Data
- Evaluating Generative Models
- Drawing Samples from Autoencoders
- Deep Belief Networks
- Convolutional Boltzmann Machines
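
A minimal NumPy sketch of block Gibbs sampling in a binary restricted Boltzmann machine, illustrating why the bipartite structure makes sampling convenient; the weights are random and purely illustrative.

```python
import numpy as np

# Minimal sketch of block Gibbs sampling in a binary RBM: hidden units are
# conditionally independent given the visibles and vice versa, so each
# half of the chain is resampled in a single step.
# The weights here are random, purely for illustration.

rng = np.random.default_rng(6)
n_visible, n_hidden = 6, 3
W = rng.normal(scale=0.5, size=(n_visible, n_hidden))
b = np.zeros(n_visible)          # visible biases
c = np.zeros(n_hidden)           # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

v = rng.integers(0, 2, size=n_visible).astype(float)
for _ in range(1000):                                   # Gibbs chain
    h = (rng.random(n_hidden) < sigmoid(c + v @ W)).astype(float)
    v = (rng.random(n_visible) < sigmoid(b + W @ h)).astype(float)

print("a sample from the (approximate) model distribution:", v)
```
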
- Integrating Learning and Planning
- Policy Gradient Methods
- Model-Free Control
- Exploration and Exploitation
- Markov Decision Processes
- Case Study: RL in Classic Games
- Introduction to Reinforcement Learning
- Planning by Dynamic Programming
- Model-Free Prediction
- Value Function Approximation
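
To ground the reinforcement learning topics above, here is a minimal NumPy sketch of value iteration (planning by dynamic programming) on a toy Markov decision process; the transition probabilities and rewards are made up purely for illustration.

```python
import numpy as np

# Minimal sketch of value iteration on a toy 3-state, 2-action MDP.
# P[a, s, s'] is the transition probability, R[s, a] the expected reward;
# both are made-up numbers for illustration only.

P = np.array([
    [[0.9, 0.1, 0.0],     # action 0
     [0.0, 0.9, 0.1],
     [0.0, 0.0, 1.0]],
    [[0.1, 0.9, 0.0],     # action 1
     [0.1, 0.0, 0.9],
     [0.0, 0.1, 0.9]],
])
R = np.array([[0.0, 0.0],
              [0.0, 0.0],
              [1.0, 1.0]])           # only state 2 is rewarding
gamma = 0.9

V = np.zeros(3)
for _ in range(200):                                  # Bellman optimality backups
    Q = R + gamma * np.einsum("ast,t->sa", P, V)      # Q[s, a]
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)
print("optimal values:", V, "greedy policy:", policy)
```
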