Big Data Analytics Training in Pune
Hadoop Ecosystem (Tools)
- HBase Operations
- Co-Processor
- Scan Operations
- Column Value & Key Pair
- Column Families
- Index & Query
- Counters
- CRUD Operations
- Result Scanner
- Batch and Caching
- MapReduce and HBase
- Filters
- Creating Table – Shell and Programming
- Importing into HBase
- Deep Dive in Hive
- Understanding Hive: Architecture, Physical Model, Data Model, Data Types
- HiveQL: DDL, DML, and other operations
- Working with large data sets and querying extensively
- User-defined functions, optimizing queries, tips and tricks for performance tuning
- Tables in Hive, partitioning, indexes, bucketing, subqueries, joining tables, loading data and appending data to an existing table
- Deep Dive in Pig
- Advanced Pig Latin, Evaluation and Filter functions, Pig and Ecosystem
- Grunt, Script Mode, Data Model
- Real-time use cases
- HBase DB Design
- Handling Index
- Designing Keys
- Transaction
- Integration for search
- Schema Design
- Flume
- Join Patterns
- Metapatterns
- Summarization Patterns
- The Effects of YARN
- Data Organization Patterns
- Filtering Patterns
- Input and Output Patterns
- Final Thoughts
- Apache Tez
- Apache Tez: A New Chapter in Hadoop Data Processing
- Data Processing API in Apache Tez
- Writing a Tez Input/Processor/Output
- Runtime API in Apache Tez
- Apache Tez: Dynamic Graph Reconfiguration
- Apache YARN
- Agility
- Global ResourceManager
- Per-node slave NodeManager
- Scalability
- Support for workloads other than MapReduce
- Compatibility with MapReduce
- Per-application Container running on a NodeManager
- Improved cluster utilization
- Per-application ApplicationMaster
- HDFS-2
- High Availability for HDFS
- HDFS append support
- HDFS Federation
- HDFS Snapshots
- Clustering
- Measuring the similarity of items
- Exploring distance measures
- Clustering basics
- Clustering algorithms in Mahout
- Fuzzy k-means clustering
- Model-based clustering
- K-means clustering
- Beyond k-means: an overview of clustering techniques
- Topic modeling using latent Dirichlet allocation (LDA)
- Taking clustering to production
- Batch and online clustering
- Tuning clustering performance
- Quick-start tutorial for running clustering on Hadoop
- Evaluating and improving clustering quality
- Inspecting clustering output
- Analyzing clustering output
- Improving clustering quality
- Representing data
- Improving quality of vectors using normalization
- Representing text documents as vectors
- Visualizing vectors
- Generating vectors from documents
- Classification
- Workflow in a typical classification project
- The fundamentals of classification systems
- Introduction to classification
- How classification works
- Mahout for classification
- Classification example
- Training a classifier
- Classifying the 20 newsgroups data set with SGD
- Preprocessing raw data into classifiable data
- Converting classifiable data into vectors
- Mahout classifier
- Choosing an algorithm to train the classifier
- Classifying the 20 newsgroups data with naive Bayes
- Evaluating and tuning a classifier
- The classifier evaluation API
- Deployment process for large systems
- Thrift-based classification server
- Building a training pipeline for large systems
- When classifiers go bad
- Classifier evaluation in Mahout
- Determining scale and speed requirements
- Deploying a classifier
- Introducing recommenders
- Evaluating the GroupLens data set
- Defining recommendation
- Evaluating precision and recall
- Evaluating a recommender
- Real-world applications of clustering
- Finding similar users on Twitter
- Analyzing the Stack Overflow data set
- Suggesting tags for artists on Last.fm
- Representing recommender data
- Coping without preference values
- In-memory DataModels
- Representing preference data
- Making recommendations
- Exploring similarity metrics
- Slope-one recommender
- New and experimental recommenders
- Comparison to other recommenders
- Understanding user-based recommendation
- Item-based recommendation
- Exploring the user-based recommender
- Distributing recommendation computations
- Designing a distributed item-based algorithm
- Implementing a distributed algorithm with MapReduce
- Analyzing the Wikipedia data set
- Pseudo-distributing a recommender
- Taking recommenders to production
- Analyzing example data from a dating site
- Finding an effective recommender
- Recommending to anonymous users
- Injecting domain-specific information