Tutorial 6
HBase
A distributed, column-oriented data store built on top of HDFS. It is part of the Hadoop ecosystem and provides random, real-time read/write access to data stored in the Hadoop file system.
| HDFS (Write Once Read Many) | HBase |
|---|---|
| Not good for record lookup, only file lookup | Fast record lookup |
| Not good for incremental addition of small batches | Support for record-level insertion |
| Not good for updates | Support for updates |
HBase Architecture
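1. An HBase table is made of rows. Each cell is addressed by row key and column family:column (plus a version).

2. When an update happens, the cell is not overwritten in place. A new version of the cell is written, with a newer timestamp as its version number.

3. When a region grows too large, it splits in two. Region == partition: a region holds a contiguous range of row keys. Inside a region, each column family is kept in its own store.

4. Each store's data is later persisted as HFiles, which are stored on HDFS data nodes.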
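Point 3 above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the `Region` class, the `put` helper and the `MAX_ROWS` threshold are hypothetical names for illustration, and real HBase decides to split based on store file size, not row count.

```python
class Region:
    """Toy region: a contiguous range of row keys [start, end)."""

    def __init__(self, start, end, rows=None):
        self.start = start            # inclusive start key ("" = open start)
        self.end = end                # exclusive end key (None = open end)
        self.rows = dict(rows or {})  # row key -> value

    def split(self):
        """Split at the middle row key into two daughter regions."""
        keys = sorted(self.rows)
        mid = keys[len(keys) // 2]
        left = Region(self.start, mid, {k: self.rows[k] for k in keys if k < mid})
        right = Region(mid, self.end, {k: self.rows[k] for k in keys if k >= mid})
        return left, right


MAX_ROWS = 4  # hypothetical threshold; real HBase splits on store file size


def put(regions, row_key, value):
    """Write to the region covering row_key, then split it if it is too big."""
    for i, region in enumerate(regions):
        if region.start <= row_key and (region.end is None or row_key < region.end):
            region.rows[row_key] = value
            if len(region.rows) > MAX_ROWS:
                regions[i:i + 1] = region.split()
            return


# A new table starts with a single region covering every row key.
regions = [Region("", None)]
for key in ["a", "b", "c", "d", "e"]:
    put(regions, key, "value-" + key)

for region in regions:
    print(repr(region.start), repr(region.end), sorted(region.rows))
```

After the fifth row, the single region exceeds the toy threshold and is replaced by two regions that meet at the middle row key.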

HBase Storage Mechanism
- Table is a collection of rows
- Row is a collection of column families
- Column family is a collection of columns
- Column is a collection of key-value pairs
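The nesting above can be pictured as a map of maps. The sketch below is a toy in-memory model, not the HBase API: the `put` and `get` helpers and the sample names (`row1`, `personal`, `name`) are assumptions made for illustration. Each column keeps several timestamped versions, and a read returns the newest one.

```python
def put(table, row, family, qualifier, value, ts):
    """Store one version of a cell under row -> family -> qualifier -> timestamp."""
    cell = table.setdefault(row, {}).setdefault(family, {}).setdefault(qualifier, {})
    cell[ts] = value


def get(table, row, family, qualifier):
    """Return the newest version of a cell, or None if the cell does not exist."""
    versions = table.get(row, {}).get(family, {}).get(qualifier, {})
    if not versions:
        return None
    return versions[max(versions)]


table = {}
put(table, "row1", "personal", "name", "Alice", ts=1)
put(table, "row1", "personal", "name", "Alicia", ts=2)  # update = new version

latest = get(table, "row1", "personal", "name")
missing = get(table, "row1", "personal", "city")
print(latest, missing)
```

Both versions of the `name` cell stay in the map; only the lookup decides which one is returned.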
HBase Components
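- Region – a subset of a table's rows (a contiguous row-key range), like a partition. Note that 128 MB is the default MemStore flush size; the default maximum region size is much larger (10 GB in recent HBase versions)
- Region Server – manages reads and writes for the regions it hosts
- HMaster – coordinates the region servers, assigns regions, and handles admin functions (such as creating tables)
- ZooKeeper – tracks which region servers are alive and available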
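To see how these components fit together on a read or write, the sketch below shows how a row key could be mapped to the region (and region server) responsible for it. It is a simplified assumption: the region start keys and server names are made up, and a real client obtains this mapping from the `hbase:meta` table rather than from a hard-coded list.

```python
import bisect

# Hypothetical layout: sorted region start keys and the server hosting each region.
REGION_START_KEYS = ["", "g", "n", "t"]   # "" is the open start of the first region
REGION_SERVERS = ["rs1", "rs2", "rs1", "rs3"]


def locate(row_key):
    """Return (region start key, region server) for the region covering row_key."""
    # The covering region is the last one whose start key is <= row_key.
    idx = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    return REGION_START_KEYS[idx], REGION_SERVERS[idx]


print(locate("apple"))
print(locate("g"))
print(locate("zebra"))
```

Because regions are contiguous, sorted key ranges, a binary search over the start keys is enough to find the owner of any row.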
Write Steps
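- put 'table_name','row_key','column_family:column','value'
- The data is appended to the write-ahead log (WAL) and then written into the MemStore (in memory)
- When the MemStore is full, its data is flushed to a new HFile on disk (in HDFS), and the NameNode records the newly created file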
- See https://blog.csdn.net/whiteblacksheep/article/details/99867552
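The write steps above can be sketched as a toy store. This is a minimal illustration under assumptions, not HBase code: the `Store` class and its `flush_threshold` (counted in cells here, while HBase uses a size in bytes) are hypothetical.

```python
class Store:
    """Toy write path: WAL append, MemStore buffer, flush to immutable HFiles."""

    def __init__(self, flush_threshold=3):
        self.wal = []        # write-ahead log, kept for recovery
        self.memstore = {}   # in-memory buffer: (row key, column) -> value
        self.hfiles = []     # flushed files, each a sorted immutable tuple
        self.flush_threshold = flush_threshold  # hypothetical: number of cells

    def put(self, row_key, column, value):
        self.wal.append((row_key, column, value))   # 1. log the edit first
        self.memstore[(row_key, column)] = value    # 2. buffer it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                            # 3. MemStore is full

    def flush(self):
        """Write the MemStore contents out as one sorted, immutable 'HFile'."""
        self.hfiles.append(tuple(sorted(self.memstore.items())))
        self.memstore = {}


store = Store(flush_threshold=3)
store.put("r2", "cf:name", "Bob")
store.put("r1", "cf:name", "Alice")
store.put("r3", "cf:name", "Carol")   # third cell triggers a flush
store.put("r4", "cf:name", "Dave")    # stays in the MemStore for now

print(len(store.hfiles), len(store.memstore), len(store.wal))
```

Rows arrive in arbitrary order but are sorted at flush time, which is what lets each flushed file be searched quickly by row key.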
