Hadoop Ecosystem (Tools)
  • HBase Operations
    • Coprocessors
    • Scan Operations
    • Column Value & Key Pair
    • Column Families
    • Index & Query
    • Counters
    • CRUD Operations
    • Result Scanner
    • Batch and Caching
    • MapReduce and HBase
    • Filters
    • Creating Table – Shell and Programming
    • Importing into HBase
  • Deep Dive in Hive
    • Understanding Hive: Architecture, Physical Model, Data Model, Data Types
    • HiveQL: DDL, DML, and Other Operations
    • Working with large data sets and querying them extensively
    • User-Defined Functions, Optimizing Queries, Tips and Tricks for Performance Tuning
    • Tables in Hive, Partitioning, Indexes, Bucketing, Subqueries, Joining Tables, Loading Data and Appending Data to an Existing Table
  • Deep Dive in Pig
    • Advanced Pig Latin, Evaluation and Filter Functions, Pig and the Ecosystem
    • Grunt, Script Mode, Data Model
    • Real-time use cases
  • HBase DB Design
    • Handling Index
    • Designing Keys
    • Transaction
    • Integration for search
    • Schema Design
    • Flume
MapReduce Design Patterns
  • Join Patterns
  • Metapatterns
  • Summarization Patterns
  • The Effects of YARN
  • Data Organization Patterns
  • Filtering Patterns
  • Input and Output Patterns
  • Final Thoughts
Hadoop-2
  • Apache Tez
    • Apache Tez: A New Chapter in Hadoop Data Processing
    • Data Processing API in Apache Tez
    • Writing a Tez Input/Processor/Output
    • Runtime API in Apache Tez
    • Apache Tez: Dynamic Graph Reconfiguration
  • Apache YARN
    • Agility
    • Global ResourceManager
    • Per-node slave NodeManager
    • Scalability
    • Support for workloads other than MapReduce
    • Compatibility with MapReduce
    • Per-application Container running on a NodeManager
    • Improved cluster utilization
    • Per-application ApplicationMaster
  • HDFS-2
    • High Availability for HDFS
    • HDFS-append support
    • HDFS Federation
    • HDFS Snapshots
Analytics
  • Clustering
  • Measuring the similarity of items
  • Exploring distance measures
  • Clustering basics
  • Clustering algorithms in Mahout
    • Fuzzy k-means clustering
    • Model-based clustering
    • K-means clustering
    • Beyond k-means: an overview of clustering techniques
    • Topic modeling using latent Dirichlet allocation (LDA)
  • Taking clustering to production
    • Batch and online clustering
    • Tuning clustering performance
    • Quick-start tutorial for running clustering on Hadoop
  • Evaluating and improving clustering quality
    • Inspecting clustering output
    • Analyzing clustering output
    • Improving clustering quality
  • Representing data
    • Improving quality of vectors using normalization
    • Representing text documents as vectors
    • Visualizing vectors
    • Generating vectors from documents
  • Classification
    • Workflow in a typical classification project
    • The fundamentals of classification systems
    • Introduction to classification
    • How classification works
    • Mahout for classification
    • Classification example
  • Training a classifier
    • Classifying the 20 newsgroups data set with SGD
    • Preprocessing raw data into classifiable data
    • Converting classifiable data into vectors
    • Mahout classifier
    • Choosing an algorithm to train the classifier
    • Classifying the 20 newsgroups data with naive Bayes
    • Evaluating and tuning a classifier
    • The classifier evaluation API
    • Process for deployment in huge systems
    • Thrift-based classification server
    • Building a training pipeline for large systems
    • When classifiers go bad
    • Classifier evaluation in Mahout
    • Determining scale and speed requirements
    • Deploying a classifier
Recommendations
  • Introducing recommenders
    • Evaluating the GroupLens data set
    • Defining recommendation
    • Evaluating precision and recall
    • Evaluating a recommender
  • Real-world applications of clustering
    • Finding similar users on Twitter
    • Analyzing the Stack Overflow data set
    • Suggesting tags for artists on Last.fm
  • Representing recommender data
    • Coping without preference values
    • In-memory DataModels
    • Representing preference data
    • Making recommendations
    • Exploring similarity metrics
    • Slope-one recommender
    • New and experimental recommenders
    • Comparison to other recommenders
    • Understanding user-based recommendation
    • Item-based recommendation
    • Exploring the user-based recommender
  • Distributing recommendation computations
    • Designing a distributed item-based algorithm
    • Implementing a distributed algorithm with MapReduce
    • Analyzing the Wikipedia data set
    • Pseudo-distributing a recommender
    • Taking recommenders to production
    • Analyzing example data from a dating site
    • Finding an effective recommender
    • Recommending to anonymous users
    • Injecting domain-specific information