Research breakdowns, practical implementation notes, and opinionated takes from real-world data and AI work.
-
Day 10 (1)
The Hive engine translates HQL into MapReduce jobs. Table data is stored in HDFS as flat files. Data is not verified during…
-
Day 9 (2)
Data structure: the way data is stored in memory. For example, data can be stored in matrix form, 5 rows x 3 columns. factor(data,levels,labels) →…
-
Day 8
Spark: a cluster computing framework that processes data in memory, designed to overcome MapReduce's performance issues. The cluster manager allocates resources based on the job (request).…
-
Day 7
Tutorial 6. HBase: a distributed column-oriented data store built on top of HDFS. It is part of the Hadoop ecosystem that provides…
-
Row store vs Column store
Row store: “select *” runs faster. Column store: “select *” runs slower because the data must be combined. Seek is slower if not indexed. Traverse to…
-
Day 6
MapReduce: refer to
– https://informationit27.medium.com/hadoop-mapreduce-in-action-b7c723b604ba
– https://www.slideshare.net/mudassarmulla/tutorial-hadoop-hdfsmapreduce
– https://cwiki.apache.org/confluence/display/HADOOP2/JobTracker
– https://www.youtube.com/watch?v=ULtOZqlZnCw
Tools built on top of MapReduce. Shortcomings of MapReduce.
-
Quiz
Keep the original value if x > 0; otherwise replace it with 0. 1. Using max 2. Using a logical condition 3. Using the absolute value. Any other…
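The three approaches named in the quiz can be sketched as follows. The original exercise appears to target R; this is a Python sketch only. The function names are hypothetical, and reading "using absolute" as (x + |x|) / 2 is an assumption rather than the post's stated answer.

```python
def keep_positive_max(x):
    # Approach 1: max picks the larger of x and 0.
    return max(x, 0)


def keep_positive_logical(x):
    # Approach 2: (x > 0) is True/False, which multiplies as 1/0.
    return x * (x > 0)


def keep_positive_abs(x):
    # Approach 3 (assumed form): x + |x| is 2x for x > 0 and 0 otherwise.
    return (x + abs(x)) / 2


if __name__ == "__main__":
    for value in (-3, 0, 5):
        print(
            value,
            keep_positive_max(value),
            keep_positive_logical(value),
            keep_positive_abs(value),
        )
```

Note that the third variant returns a float because of the division, while the first two preserve the input type.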
-
Day 5 (2)
Condition: ==, <, <=, >, >=, != → comparison operators that return a boolean. if/else Result: switch → if the argument is not an integer/index, the result must be defined. Result: a value outside the conditions returns NULL. Result:…
-
Day 5 (1)
Tutorial 4. Tutorial 5. Discuss and evaluate suitable techniques/methods used in the literature for performing big data analytics on the following: a)…
-
Day 4
5 Daemons. A “daemon”, which sounds like “demon”, is a background service that is not initiated by the user. HDFS. MapReduce. Hadoop is distributed…
-
Day 3
What is the benefit of distributed computing? With parallelism, the original task might complete in 11 hours, but if run in parallel on 4 machines,…
