Starting off as a muggle who is naïve to the world of Math and Data Science.

Day 7

Tutorial 6

HBase

A distributed, column-oriented data store built on top of HDFS. It is part of the Hadoop ecosystem and provides random, real-time read/write access to data stored in the Hadoop Distributed File System.

HDFS (Write Once Read Many)                        | HBase
Not good for record lookup, only file lookup       | Fast record lookup
Not good for incremental addition of small batches | Support for record-level insertion
Not good for updates                               | Support for updates

HBase Architecture

1. An HBase table is organised into rows, column families, columns (written as column_family:column) and cells.

source. https://www.edureka.co/blog/hbase-architecture/

2. When an update happens, the old value is not overwritten in place; the cell gets a new version, identified by a timestamp.

source. https://community.cloudera.com/t5/Community-Articles/Hbase-security-model-part1/ta-p/248482

3. When a table grows too large, it is split into regions. A region is a partition of the table: a contiguous range of row keys. Within a region, each column family is kept in its own store.

source. https://data-flair.training/blogs/hbase-architecture/

4. The data of each region is persisted as HFiles, which are stored on HDFS data nodes.

source. http://bigdatariding.blogspot.com/2013/12/hbase-architecture.html
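Point 2 above (every update creates a new version of the cell) is easier to see with a small sketch. This is only a toy model in plain Python, not the HBase client API: the class name `VersionedCell`, its `max_versions` parameter and the sample values are all made up for illustration, and the timestamps are plain integers.

```python
# Toy model only: this is NOT the HBase client API.
class VersionedCell:
    """Keeps the newest `max_versions` values of one cell, newest first."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.versions = []  # list of (timestamp, value), newest first

    def put(self, value, timestamp):
        self.versions.append((timestamp, value))
        # Sort by timestamp, newest first, then drop versions beyond the limit.
        self.versions.sort(key=lambda tv: tv[0], reverse=True)
        del self.versions[self.max_versions:]

    def get(self, timestamp=None):
        """Return the newest value, or the newest value at or before `timestamp`."""
        for ts, value in self.versions:
            if timestamp is None or ts <= timestamp:
                return value
        return None


cell = VersionedCell(max_versions=2)
cell.put("Kuala Lumpur", timestamp=1)
cell.put("Penang", timestamp=2)
cell.put("Johor", timestamp=3)   # the version with timestamp 1 is dropped

latest = cell.get()              # newest version
older = cell.get(timestamp=2)    # value as of timestamp 2
```

Reading without a timestamp returns the newest version, which is why an "update" in this model is really just another write with a later timestamp.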

HBase Storage Mechanism

  • Table is a collection of rows
  • Row is a collection of column families
  • Column family is a collection of columns
  • Column is a collection of key value pairs.
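The nesting above can be pictured as dictionaries inside dictionaries. The sketch below is a toy model in plain Python, not how HBase stores data internally; the `employee` table, its column families and the helper `get_cell` are hypothetical names used only for illustration.

```python
# Toy model only: nested dicts standing in for an HBase table.
# table -> row key -> column family -> column qualifier -> value
# The table, families and values below are hypothetical.
employee = {
    "row1": {
        "personal": {"name": "Ali", "city": "Penang"},
        "professional": {"role": "analyst"},
    },
    "row2": {
        "personal": {"name": "Mei"},  # rows need not have the same columns
        "professional": {"role": "engineer", "team": "data"},
    },
}


def get_cell(table, row_key, column):
    """Look up one cell using a 'family:qualifier' column name."""
    family, qualifier = column.split(":", 1)
    return table.get(row_key, {}).get(family, {}).get(qualifier)


name = get_cell(employee, "row1", "personal:name")
missing = get_cell(employee, "row2", "personal:city")  # no such column in row2
```

A missing column simply has no entry, so rows in the same table can carry different columns within a family.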

HBase Components

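  • Region – a subset of a table's rows, like a partition. (128 MB is the default MemStore flush size; a region itself can grow much larger before it splits.)
  • Region Server – manages reads and writes for the regions it hosts
  • HMaster – coordinates the region servers, assigns regions, and handles admin functions such as creating and deleting tables
  • ZooKeeper – monitors the region servers (which ones are alive and available)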
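To make the idea of a region as a partition of rows more concrete, here is a toy sketch of how a table could be split into regions as rows are added. It is a simplification under stated assumptions: real HBase decides when to split based on the size of the stored data, not a row count, and `MAX_ROWS_PER_REGION`, `split_region` and `insert_row` are hypothetical names for this example only.

```python
# Toy model only: real HBase splits on stored data size, not row count.
MAX_ROWS_PER_REGION = 4  # hypothetical threshold for this sketch


def split_region(row_keys):
    """Split one sorted list of row keys into two halves at the middle key."""
    mid = len(row_keys) // 2
    return row_keys[:mid], row_keys[mid:]


def insert_row(regions, row_key):
    """Add a row key to the region covering it, then split that region if needed."""
    # Regions are kept in row-key order; pick the last region whose first key <= row_key.
    target = 0
    for i, region in enumerate(regions):
        if region and region[0] <= row_key:
            target = i
    region = regions[target]
    region.append(row_key)
    region.sort()
    if len(region) > MAX_ROWS_PER_REGION:
        left, right = split_region(region)
        regions[target:target + 1] = [left, right]


regions = [[]]  # a new table starts with a single empty region
for key in ["row1", "row5", "row3", "row2", "row4", "row6"]:
    insert_row(regions, key)
```

Each region ends up holding a contiguous, sorted range of row keys, which is what lets a lookup go straight to the one region that can contain a given key.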

Write Steps

  1. put 'table_name', 'row_key', 'column_family:column', 'value'
  2. The data is logged to the write-ahead log and written into the MemStore (in memory).
  3. When the MemStore is full, its data is flushed to a new HFile on disk (in HDFS); the NameNode records that a new file was created.
  4. Reference: https://blog.csdn.net/whiteblacksheep/article/details/99867552
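The write steps above can be sketched as a toy model in plain Python. This is not HBase code: `ToyStore` and `MEMSTORE_LIMIT` are hypothetical names, the limit counts cells instead of bytes, and the write-ahead log and HDFS are left out entirely. It only shows the MemStore filling up and being flushed into immutable, sorted files.

```python
# Toy model only: a dict as the MemStore and a list of sorted snapshots as HFiles.
MEMSTORE_LIMIT = 3  # hypothetical: flush after 3 cells (HBase uses a size in bytes)


class ToyStore:
    def __init__(self):
        self.memstore = {}   # (row_key, column) -> value, held in memory
        self.hfiles = []     # each flush appends one immutable, sorted "file"

    def put(self, row_key, column, value):
        self.memstore[(row_key, column)] = value
        if len(self.memstore) >= MEMSTORE_LIMIT:
            self.flush()

    def flush(self):
        # HFiles are sorted by key, so sort the in-memory data before writing it out.
        self.hfiles.append(tuple(sorted(self.memstore.items())))
        self.memstore = {}

    def get(self, row_key, column):
        key = (row_key, column)
        if key in self.memstore:             # newest data lives in the MemStore
            return self.memstore[key]
        for hfile in reversed(self.hfiles):  # then check files, newest first
            for k, v in hfile:
                if k == key:
                    return v
        return None


store = ToyStore()
store.put("row1", "personal:name", "Ali")
store.put("row1", "personal:city", "Penang")
store.put("row2", "personal:name", "Mei")      # third cell triggers a flush
store.put("row1", "personal:city", "Johor")    # update lands in the fresh MemStore
```

Note that the flushed file is never modified: the update to `personal:city` sits in the MemStore, and a read finds it there before it ever looks at the older file.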
