Starting off as a muggle who is naïve to the world of Math and Data Science.

Row store vs Column store

| Row | Column |
| --- | --- |
| “select *” runs faster. | “select *” runs slower because the columns have to be combined back into rows. |
| Seeks are slower without an index: traverse to the block for row 3, then read every column until R3C5 (5 steps). | Seeks are faster: traverse to the block for column 5, then read until R3C5 (3 steps). |
| Aggregation is slow: the whole row is read into memory before aggregating. | Aggregation is faster: only the columns that are needed are read. |
| Writes are faster: just append. | Writes are harder: normally a write buffer is used before committing to disk. |
| Empty values are still saved into the data block (1 block = 1 row). | Empty values are not saved to the data block, because each cell in a column carries its row key. |
| Data is hard to compress. | Data is easier to compress, because values in a single column are more alike. |
| An RDBMS has a row size limit of 8 KB, due to the page size. | NoSQL can store bigger rows, but for big files/data it is advisable to store the file in Hadoop and keep only the file path here. |
| An RDBMS has a limit on the number of columns. | NoSQL has no column limit, but you will run into OOME (out-of-memory errors). |
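
To make a few of the rows above concrete, here is a rough sketch in plain Python. It is a toy model and not how any real database stores data: the sample table, the `row_store` / `column_store` names and the helper functions are all made up for illustration. It only shows why “select *” needs stitching in a column layout, why aggregation touches less data there, and why a single column compresses well.

```python
from itertools import groupby

# Hypothetical sample table: (id, country, amount). Invented for this sketch.
rows = [
    (1, "MY", 10.0),
    (2, "MY", 25.5),
    (3, "SG", 7.0),
    (4, "SG", 12.5),
]

# Row layout: each record is kept together, so a write is a single append.
row_store = list(rows)

# Column layout: each column is kept together, so similar values sit side by side.
column_store = {
    "id": [r[0] for r in rows],
    "country": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def select_all_row_store(store):
    # "select *" is a plain scan: the records are already assembled.
    return list(store)


def select_all_column_store(store):
    # "select *" has to combine the columns back into records.
    return list(zip(store["id"], store["country"], store["amount"]))


def sum_amount_row_store(store):
    # Every whole row is visited even though only one field is needed.
    return sum(row[2] for row in store)


def sum_amount_column_store(store):
    # Only the one column that is needed is read.
    return sum(store["amount"])


def run_length_encode(values):
    # Simple run-length encoding: consecutive equal values collapse to (value, count).
    return [(value, len(list(group))) for value, group in groupby(values)]


if __name__ == "__main__":
    print(select_all_column_store(column_store) == select_all_row_store(row_store))
    print(sum_amount_row_store(row_store), sum_amount_column_store(column_store))
    print(run_length_encode(column_store["country"]))
```

Run-length encoding is only one possible compression scheme, picked here because it is short; the point is just that repeated values in one column collapse nicely, while the mixed values in a row do not.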
