Starting off as a muggle, naïve to the world of Math and Data Science.

Day 10 (1)

Hive

The Hive engine translates HQL into MapReduce jobs. Table data is stored in HDFS as flat files. Data is not verified during insertion. For example, you can copy a flat file straight into HDFS and then update the metadata of the Hive table (for a partitioned table, with the “msck repair table” command).
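To make "data is not verified during insertion" concrete, here is a toy Python sketch of the idea (often called schema-on-read). It is not Hive code: `load`, `read`, and the two-column schema are hypothetical names made up for illustration. The only assumption borrowed from Hive is that a field which does not fit its declared type comes back as a null at query time instead of being rejected at load time.

```python
# Toy model of schema-on-read. NOT Hive code; every name here is hypothetical.
from typing import Callable, List, Optional, Tuple

Schema = List[Tuple[str, Callable[[str], object]]]


def load(storage: List[str], raw_lines: List[str]) -> None:
    """'Insert' data by copying lines verbatim; nothing is validated."""
    storage.extend(raw_lines)


def read(storage: List[str], schema: Schema, sep: str = ",") -> List[dict]:
    """Apply the schema only when reading; bad or missing fields become None."""
    rows = []
    for line in storage:
        fields = line.split(sep)
        row = {}
        for i, (name, cast) in enumerate(schema):
            try:
                row[name] = cast(fields[i])
            except (IndexError, ValueError):
                row[name] = None  # assumption: mimic a NULL for unparseable data
        rows.append(row)
    return rows


storage: List[str] = []
# The second and third lines are malformed, yet loading them succeeds.
load(storage, ["1,alice", "oops,bob", "3"])

schema: Schema = [("id", int), ("name", str)]
rows = read(storage, schema)
```

The bad lines are only noticed in `read`, which is why a file that was copied in without complaint can still produce surprising nulls later.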

Hive has its own warehouse folder; tables whose data is stored outside that folder are known as external tables. Hive is not a database in itself: it stores only the table structure (schema), while the data stays in files on HDFS.

Hive Directory/Physical Layout
hive/warehouse/<<table name>>/<<partition (still folder)>>/<<bucket aka filename>>
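The layout above can be sketched as a small Python helper that builds the path to one bucket file. This is only an illustration under stated assumptions: `bucket_path` and the `sales` table are hypothetical, partition folders are assumed to follow the usual `key=value` naming, and bucket files are assumed to use the typical `000000_0` style of name. The `hive/warehouse` base is taken from the layout line as written.

```python
from pathlib import PurePosixPath
from typing import Dict

# Base folder taken from the layout above; a real cluster may use another root.
WAREHOUSE = PurePosixPath("hive/warehouse")


def bucket_path(table: str, partitions: Dict[str, str], bucket: int) -> PurePosixPath:
    """Build <warehouse>/<table>/<partition folders>/<bucket file> (hypothetical helper)."""
    path = WAREHOUSE / table
    # Each partition column adds one more folder level, assumed to be named key=value.
    for key, value in partitions.items():
        path = path / f"{key}={value}"
    # Assumption: bucket files are named like 000000_0, 000001_0, ...
    return path / f"{bucket:06d}_0"


example = bucket_path("sales", {"year": "2020", "month": "01"}, 3)
print(example)  # hive/warehouse/sales/year=2020/month=01/000003_0
```

Note that partitions are still just folders, and only the last path element (the bucket) is an actual file.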

Hive Components

  • Shell – allows interactive queries (web UI such as Hue, or server side on HDInsight)
  • Driver – session handler; fetches and executes
  • Compiler – parses, plans, and optimizes the query
  • Execution engine – runs the DAG of stages (MR, HDFS, metadata); the HiveQL processing engine
  • Metastore – holds the schema and the data location in HDFS
