Module – BDAT – Kim 2 ML

Big Data Analytics & Technologies

Kim Ng

June 4, 2023

Day 12 (1)

Tutorial 11 Question 1: Explain the cycle of big data management. Example Answer: Big data management involves a series of processes aimed at handling large volumes of data effectively. The cycle of big data management typically involves the following stages: Capture: This is the process of collecting data from various sources, including social media platforms, IoT devices,…

Kim Ng

May 10, 2023

Master, Module – BDAT

Tutorial

Day 11

Tutorial 10.5

Kim Ng

May 9, 2023

Master, Module – BDAT

Hive, Tutorial

Day 10 (1)

The Hive engine translates HQL into Map Reduce jobs. Table data is stored in HDFS as flat files. Data is not verified during insertion; for example, you can copy a flat file into HDFS and then update the metadata in the Hive table (“msck repair table” command). Hive has its own specific folder, and tables whose data is stored outside the Hive folder are known as external tables.…

Kim Ng

May 6, 2023

Master, Module – BDAT

Tutorial

Day 9

Spark (Cont’d) Execution steps Tutorial 9 Tutorial 10

Kim Ng

April 22, 2023

Master, Module – BDAT

Spark, Tutorial

Day 8

Spark is a cluster computing framework that processes data in memory, built to overcome the performance issues of Map Reduce. The cluster manager allocates resources based on each job (request). Spark creates Resilient Distributed Datasets (RDD), that is, partitions, once it receives the data. Once the RDD is ready, Spark uses a graph of transformations (Directed Acyclic Graph, DAG). Processing consists of 2 phases: Transformation and Action **…

Kim Ng

April 21, 2023

Master, Module – BDAT

Spark, Tutorial
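
The Day 8 excerpt names two phases, Transformation and Action. A rough way to picture the split is the plain-Python sketch below. It does not use Spark: the class `LazyDataset` and its methods are hypothetical stand-ins for an RDD, written on the assumption that transformations only record steps and that an action triggers the actual work.

```python
class LazyDataset:
    """Hypothetical stand-in for an RDD; not part of any Spark API."""

    def __init__(self, data, steps=None):
        self._data = list(data)
        # Recorded chain of transformations (a very simple linear "DAG").
        self._steps = list(steps or [])

    # Transformations: return a new dataset, do no work yet.
    def map(self, fn):
        return LazyDataset(self._data, self._steps + [("map", fn)])

    def filter(self, pred):
        return LazyDataset(self._data, self._steps + [("filter", pred)])

    # Actions: run every recorded step and return a result.
    def collect(self):
        items = self._data
        for kind, fn in self._steps:
            if kind == "map":
                items = [fn(x) for x in items]
            else:
                items = [x for x in items if fn(x)]
        return items

    def count(self):
        return len(self.collect())


calls = []  # records when the mapped function really runs


def square(x):
    calls.append(x)
    return x * x


pipeline = LazyDataset(range(1, 6)).map(square).filter(lambda v: v % 2 == 1)
calls_before_action = list(calls)  # still empty: only transformations so far
result = pipeline.collect()        # the action runs the recorded steps
```

In this sketch `square` is not called until `collect()` runs, which is the point of separating the two phases: the whole chain is known before any data is processed.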

Day 7

Tutorial 6 HBase: A distributed column-oriented data store built on top of HDFS. It is part of the Hadoop ecosystem and provides random real-time read/write access to data in the Hadoop File System. HDFS (Write Once Read Many) vs HBase: HDFS is not good for record lookup, only file lookup, while HBase gives fast record lookup. HDFS is not good for incremental addition of…

Kim Ng

April 20, 2023

Master, Module – BDAT

HBase, Tutorial

Day 6

Map Reduce. Refer to: – https://informationit27.medium.com/hadoop-mapreduce-in-action-b7c723b604ba – https://www.slideshare.net/mudassarmulla/tutorial-hadoop-hdfsmapreduce – https://cwiki.apache.org/confluence/display/HADOOP2/JobTracker – https://www.youtube.com/watch?v=ULtOZqlZnCw Tools built on top of Map Reduce. Shortcomings of Map Reduce.

Kim Ng

April 20, 2023

Master, Module – BDAT

Hadoop, HDFS, MapReduce

Day 5 (1)

Tutorial 4 Tutorial 5 Discuss and evaluate suitable techniques/methods used in the literature when performing big data analytics on the following: a) Market Basket Analysis. b) Customer Churn Prediction Analysis. Please support your discussion based on a research paper. Example Answer: Big data analytics on Market Basket Analysis could help to: · Provide combo offers based on products…

Kim Ng

March 31, 2023

Master, Module – BDAT

Hadoop, HDFS, Tutorial

Day 4

5 Daemons. A “daemon”, which sounds like “demon”, is a background service that is not initiated by the user. HDFS Map Reduce Hadoop is distributed storage and processing. This applies only to the data nodes (storage & processing), not to the name node; the name node (master) must run on high-availability hardware (expensive); the secondary name node comes in to make name…

Kim Ng

March 28, 2023

Master, Module – BDAT

Hadoop, HDFS

Day 3

What is the benefit of a distributed system? Using the parallel concept, a task that might take 11 hours on one machine would take only about 3 hours when run in parallel on 4 machines. Challenges Hadoop Core Principle Hadoop Components Why Hadoop? (feature) Hadoop Definition Hadoop is an open-source software framework (LICENSE) for distributed storage and distributed parallel processing (HOW) of…

Kim Ng

March 27, 2023

Master, Module – BDAT

Hadoop, Tutorial
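
The Day 3 figures (11 hours on one machine, about 3 hours on 4 machines) match the ideal case of dividing the work evenly. The small sketch below redoes that arithmetic. It assumes the task splits perfectly and that there is no coordination or data-transfer overhead, which the excerpt does not state; the function name `ideal_parallel_hours` is made up for illustration.

```python
import math


def ideal_parallel_hours(serial_hours, machines):
    """Best-case running time when work is split evenly with zero overhead."""
    return serial_hours / machines


hours = ideal_parallel_hours(11, 4)  # 11 hours of work shared by 4 machines
rounded = math.ceil(hours)           # round up to whole hours, as in the notes
```

Here `hours` comes out to 2.75, which rounds up to the 3 hours quoted in the notes. Real clusters usually land somewhat above this ideal because of scheduling and data movement.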

Day 2 (2)

Concepts and terminology What is a dataset? A collection or group of related data. A dataset works like this: when a new student joins, he/she shares the same common attributes/properties as the others. What is an algorithm? An algorithm is a set of rules/steps/instructions for problem solving that can later be implemented as a program. Algo vs program – You can execute…

Kim Ng

March 25, 2023

Master, Module – BDAT

Tutorial

Day 1

What is data? Data is a series of measurements, observations, or raw facts that does not convey any meaning on its own. Data is not equal to information. Information is generated when data is processed. Due to the exponential growth of data and several eras of technological advancement, huge electronic generation happens, leading to huge deposit…

Kim Ng

March 24, 2023

Master, Module – BDAT

Category: Module – BDAT