HBase Training
Introduction
- Dynamo and Bigtable
- Key/Value Stores
- Schemaless
- Common Advantages
- What am I giving up?
- Bigtable and HBase (C + P)
- Data Model
- Tables, Column Families, Rows and Columns
- Physical Architecture
- Role of ZooKeeper
- HMaster and HRegionServer
- Root Table and Meta Table
- MemStore
- WAL
- HFile
- Read and Write Path
- How Data Is Stored in HFile
- Key Format
- HFile Format
- Limitations of Binary Trees
- Limitations of B+ Trees
- Log-Structured Merge Tree as the Backbone of Storage
- Compaction
- Exploring Off-Heap Storage
- mmap for Bloom Filters and Block Indexes
- HBase Operations
- Access Patterns
- Gets
- Caching
- Put
- Batching
- Scanning
- Filters
- Key Design
- Concepts
- Tall-Narrow Versus Flat-Wide Tables
- Partial Key Scans
- Pagination
- Time Series Data
- Time-Ordered Relations
- Advanced Schemas
- Secondary Indexes
- Controlling MapReduce Execution with InputFormat
- Implementing InputFormat for Compute-Intensive Applications
- Implementing InputFormat to Control the Number of Maps
- Implementing InputFormat for Multiple HBase Tables
- Reading Data Your Way with Custom RecordReaders
- Implementing a Queue-Based RecordReader
- Implementing RecordReader for XML Data
- Organizing Output Data with Custom Output Formats
- Implementing OutputFormat for Splitting a MapReduce Job’s Output into Multiple Directories
- Writing Data Your Way with Custom RecordWriters
- Implementing a RecordWriter to Produce Output tar Files
- Optimizing Your MapReduce Execution with a Combiner
- Controlling Reducer Execution with Partitioners
- Implementing a Custom Partitioner for One-to-Many Joins
- Using Non-Java Code with Hadoop
- Pipes
- Hadoop Streaming
- Using JNI
Real-Time Case Study
- Using HBase for Implementing Real-Time Applications
- Using HBase as a Picture Management System
- Understanding HBase Limitations and Opportunities
- Understanding indexing with HBase
- Optimizing HBase for indexing
- Where is the data stored?
- Combining HBase and HDFS
- Lucene Refresher
- Limitations of Lucene
- Designing Lucene to Use a Sharded Index with HBase
- Where is the data stored?
- Combining Lucene, HBase and HDFS
- Hey, did we just design Solr?