
Introduction
  • Dynamo and Bigtable
  • Key/Value Stores
  • Schemaless
  • Common Advantages
  • What am I giving up?
  • Bigtable and HBase (C + P)
  • Data Model
  • Tables, Column Families, Rows, and Columns
HBase Storage Architecture
  • Physical Architecture
  • Role of ZooKeeper
  • HMaster and HRegionServer
  • Root Table and Meta Table
  • MemStore
  • WAL
  • HFile
  • Read and Write Path
  • How Data Is Stored in an HFile
  • Key Format
  • HFile Format
Log-Structured Merge Trees
  • Limitations of Binary Trees
  • Limitations of B+ Trees
  • Log-Structured Merge Tree as the Backbone of Storage
  • Compaction
Future Directions
  • Exploring Off-Heap Storage
  • MMap for Bloom Filters and Block Indexes
HBase Operations
  • Access Patterns
  • Gets
  • Caching
  • Put
  • Batching
  • Scanning
  • Filters
Designing HBase Tables and Schemas
  • Key Design
  • Concepts
    • Tall-Narrow Versus Flat-Wide Tables
    • Partial Key Scans
    • Pagination
    • Time Series Data
    • Time-Ordered Relations
  • Advanced Schemas
  • Secondary Indexes
Advanced Map Reduce
  • Controlling MapReduce Execution with InputFormat
    • Implementing InputFormat for Compute-Intensive Applications
    • Implementing InputFormat to Control the Number of Maps
    • Implementing InputFormat for Multiple HBase Tables
  • Reading Data Your Way with Custom RecordReaders
    • Implementing a Queue-Based RecordReader
    • Implementing RecordReader for XML Data
  • Organizing Output Data with Custom Output Formats
    • Implementing OutputFormat for Splitting MapReduce Job’s Output into Multiple Directories
  • Writing Data Your Way with Custom RecordWriters
    • Implementing a RecordWriter to Produce Output tar Files
  • Optimizing Your MapReduce Execution with a Combiner
  • Controlling Reducer Execution with Partitioners
    • Implementing a Custom Partitioner for One-to-Many Joins
  • Using Non-Java Code with Hadoop
    • Pipes
    • Hadoop Streaming
    • Using JNI
  • Real-Time Case-Study
    • Using HBase for Implementing Real-Time Applications
    • Using HBase as a Picture Management System
    • Understanding HBase Limitations and Opportunities
    • Understanding Indexing with HBase
    • Optimizing HBase for Indexing
    • Where is the data stored?
    • Combining HBase and HDFS
    • Using HBase as a Lucene Back End
    • Lucene Refresher
    • Limitations of Lucene
    • Designing Lucene to Use Sharded Index with HBase
    • Where is the data stored?
    • Combining Lucene, HBase and HDFS
    • Hey, did we just design Solr?