Hive
The Hive engine translates HQL (HiveQL) queries into MapReduce jobs. Table data is stored in HDFS as flat files. Data is not verified during insertion: for example, you can copy a flat file straight into HDFS and then update the metadata of the Hive table (for partitioned tables, with the “msck repair table” command).
Hive has its own warehouse folder in HDFS; tables whose data is stored outside that folder are known as external tables. Hive is not a database itself: it only stores the table structure (schema), while the data stays in HDFS files.
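Because data is not checked on insert, the schema is only applied when the data is read. The sketch below is a toy illustration of that idea in plain Python, not Hive code; the column names, types, and sample lines are hypothetical. It mimics the way a value that does not fit the declared column type comes back as NULL instead of causing the load to fail.

```python
# Toy illustration of schema-on-read; this is not Hive code.
# The schema and the sample lines are hypothetical.

SCHEMA = [("id", int), ("name", str), ("price", float)]


def read_row(line, schema=SCHEMA, delimiter=","):
    """Apply the schema to one raw line at read time.

    A field that is missing or cannot be converted becomes None,
    standing in for the NULL that Hive returns in that situation.
    """
    fields = line.rstrip("\n").split(delimiter)
    row = {}
    for i, (name, cast) in enumerate(schema):
        raw = fields[i] if i < len(fields) else None
        try:
            row[name] = cast(raw) if raw is not None else None
        except ValueError:
            row[name] = None
    return row


# The "insert" step is only a file copy, so nothing is validated here.
raw_file = ["1,apple,0.5", "two,pear,abc", "3,plum"]

# Type problems only show up now, when the rows are read.
rows = [read_row(line) for line in raw_file]
for row in rows:
    print(row)
```

The second and third lines are accepted at "insert" time even though they do not match the schema; the mismatch only becomes visible as None values when the rows are read.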
Hive Directory/Physical Layout
hive/warehouse/<<table name>>/<<partition (still folder)>>/<<bucket aka filename>>
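The layout above can be sketched as a small path builder. This is a minimal sketch under assumptions: the warehouse root /user/hive/warehouse is the usual default (set by hive.metastore.warehouse.dir) and may differ on a given cluster, the table is in the default database, and the table, partition, and file names are made up. Partition folders follow the key=value naming that Hive uses.

```python
import posixpath

# Assumption: the usual default warehouse root; the real value comes
# from hive.metastore.warehouse.dir and may differ per cluster.
WAREHOUSE = "/user/hive/warehouse"


def hive_file_path(table, partitions=(), bucket_file=None, warehouse=WAREHOUSE):
    """Build warehouse/<table>/<partition folders>/<bucket file>.

    partitions is an ordered sequence of (key, value) pairs; each pair
    becomes one nested folder named key=value.
    """
    parts = [warehouse, table]
    parts += [f"{key}={value}" for key, value in partitions]
    if bucket_file is not None:
        parts.append(bucket_file)
    return posixpath.join(*parts)


# Hypothetical table "sales", partitioned by year and month.
example = hive_file_path("sales", [("year", 2020), ("month", 1)], "000000_0")
print(example)
```

Each partition column adds one folder level, so a query that filters on a partition column only needs to read the matching folders.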
Hive Components
- Shell – allows interactive queries (also through a web UI such as Hue, or server side on HDInsight)
- Driver – session handling; executes queries and fetches results
- Compiler – parse, plan, optimize
- Execution engine – runs the DAG of stages (MapReduce, HDFS, metadata operations); the HiveQL processing engine
- Metastore – table schema and data location in HDFS
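How the components listed above hand a query to each other can be sketched with a toy model. Every class and method name here is hypothetical and does not correspond to Hive's internal API; the "compiler" only understands a single SELECT <column> FROM <table> form, and the "execution engine" just records the stages instead of running MapReduce jobs.

```python
# Toy model of the component flow. All names are hypothetical and do
# not correspond to Hive's internal API.


class Metastore:
    """Holds only schema and data location, never the data itself."""

    def __init__(self):
        self.tables = {}

    def register(self, name, columns, location):
        self.tables[name] = {"columns": columns, "location": location}

    def lookup(self, name):
        return self.tables[name]


class Compiler:
    """Parses the query and produces an ordered plan of stages."""

    def __init__(self, metastore):
        self.metastore = metastore

    def compile(self, query):
        # Only "SELECT <column> FROM <table>" is handled in this sketch.
        tokens = query.strip().rstrip(";").split()
        column, table = tokens[1], tokens[3]
        meta = self.metastore.lookup(table)
        if column != "*" and column not in meta["columns"]:
            raise ValueError(f"unknown column: {column}")
        return [("read", meta["location"]), ("project", column)]


class ExecutionEngine:
    """Stands in for running the stages; it only records them."""

    def run(self, plan):
        return [f"{op}:{arg}" for op, arg in plan]


class Driver:
    """Receives the query, asks for a plan, executes it, returns results."""

    def __init__(self, compiler, engine):
        self.compiler = compiler
        self.engine = engine

    def execute(self, query):
        plan = self.compiler.compile(query)
        return self.engine.run(plan)


metastore = Metastore()
metastore.register("sales", ["id", "price"], "/user/hive/warehouse/sales")
driver = Driver(Compiler(metastore), ExecutionEngine())
result = driver.execute("SELECT price FROM sales;")
print(result)
```

The point of the sketch is the division of labour: the compiler needs the metastore to resolve a table name to columns and a location, and only the execution engine touches the data.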
