Starting off as a muggle that naïve to the Math's and Data Science world.

Day 24

Data Warehouse

Online Analytical Processing (OLAP).

Store structured data, data source from Online Transactional Processing (OLTP), live database.

Storage repository; not storing live data, only store pass/historical data.

Mostly used for model training, analysis and pattern discovery.

Example, student data stored in database but graduated student data stored in data warehouse.

Data Lake

Store both structured data and unstructured data (logs, images, audio, video, binary)

Staging location before processed into data warehouse.

Hadoop is mostly used as data lake whereby csv is zipped and stored in it later used by Hive for querying.

Data Mart

Mini data warehouse.

Custom built for specific group of user.

Split from data warehouse. For example, regional base data warehouse.


Hadoop

Not a standalone framework but come with a set of tools (Hive, Pig) .

Leave a comment