-
Day 9
Spark (cont'd): Execution steps, Tutorial 9, Tutorial 10
-
Day 8
Spark: a cluster computing framework that processes data in memory, built to overcome the performance issues of MapReduce. The cluster manager allocates resources based on each job (request). Once Spark receives the data, it creates Resilient Distributed Datasets (RDDs), which are partitioned. Once an RDD is ready, Spark uses a graph of transformations (Directed Acyclic Graph, DAG). Processing consists of 2 phases: Transformation and Action…
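The two phases can be pictured with a toy model in plain Python. This is only a sketch and not the Spark API: the class `ToyRDD` and its methods are hypothetical names made up for illustration. It assumes that transformations (`map`, `filter`) only record a step in the lineage, and that an action (`collect`, `count`) is what actually runs the recorded steps on each partition.

```python
class ToyRDD:
    """Hypothetical toy model of an RDD; not the real Spark API."""

    def __init__(self, partitions, lineage=()):
        self.partitions = partitions  # list of lists, one list per partition
        self.lineage = lineage        # recorded transformations, not yet executed

    @classmethod
    def from_list(cls, data, num_partitions):
        # Partition the data as soon as it is received (simple striding).
        return cls([data[i::num_partitions] for i in range(num_partitions)])

    # Transformations: only extend the lineage, nothing is computed here.
    def map(self, fn):
        return ToyRDD(self.partitions, self.lineage + (("map", fn),))

    def filter(self, predicate):
        return ToyRDD(self.partitions, self.lineage + (("filter", predicate),))

    # Actions: run the recorded lineage partition by partition.
    def collect(self):
        result = []
        for part in self.partitions:
            items = part
            for kind, fn in self.lineage:
                if kind == "map":
                    items = [fn(x) for x in items]
                else:
                    items = [x for x in items if fn(x)]
            result.extend(items)
        return result

    def count(self):
        return len(self.collect())


calls = []  # records when the mapped function really runs


def double(x):
    calls.append(x)
    return x * 2


rdd = ToyRDD.from_list([1, 2, 3, 4, 5, 6], num_partitions=2)
pipeline = rdd.map(double).filter(lambda x: x > 4)  # transformations only
calls_before_action = list(calls)                   # still empty here
collected = pipeline.collect()                      # the action triggers the work
```

In this sketch `calls_before_action` stays empty because no action has been called yet; the doubling only happens inside `collect()`.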
-
Day 7
Tutorial 6. HBase: a distributed, column-oriented data store built on top of HDFS. It is a part of the Hadoop ecosystem that provides random, real-time read/write access to data in the Hadoop File System.
HDFS (Write Once Read Many) vs HBase:
HDFS: not good for record lookup, only file lookup. HBase: fast record lookup.
HDFS: not good for incremental addition of…
-
Row store vs Column store
“select *”: a row store runs it faster; a column store runs it slower because it needs to combine data from every column.
Seek: a row store is slower if the data is not indexed: traverse to the block for row 3, then read every column until R3C5 (5 steps). A column store is faster: traverse to the block for column 5, then read until R3C5 (3 steps).
Aggregation: slow in a row store, because the whole row of data is read out to memory…
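The difference can be sketched with two in-memory layouts in Python. This is an illustration only: the sample table and the function names are made up, and real stores work on disk blocks rather than Python lists.

```python
# The same hypothetical 3-row table in two layouts.
row_store = [
    {"id": 1, "name": "A", "amount": 10},
    {"id": 2, "name": "B", "amount": 20},
    {"id": 3, "name": "C", "amount": 30},
]

column_store = {
    "id": [1, 2, 3],
    "name": ["A", "B", "C"],
    "amount": [10, 20, 30],
}


def total_from_rows(rows, field):
    # Aggregation on a row store touches every whole row to pick one field.
    return sum(row[field] for row in rows)


def total_from_columns(columns, field):
    # Aggregation on a column store reads only the one column it needs.
    return sum(columns[field])


def select_all_from_columns(columns):
    # "select *" on a column store has to stitch the columns back into rows.
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


row_total = total_from_rows(row_store, "amount")
column_total = total_from_columns(column_store, "amount")
rebuilt_rows = select_all_from_columns(column_store)
```

Both totals agree; what changes is how much data each layout has to touch to get there, and how much recombination “select *” needs.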
-
Day 6
MapReduce. Refer to:
– https://informationit27.medium.com/hadoop-mapreduce-in-action-b7c723b604ba
– https://www.slideshare.net/mudassarmulla/tutorial-hadoop-hdfsmapreduce
– https://cwiki.apache.org/confluence/display/HADOOP2/JobTracker
– https://www.youtube.com/watch?v=ULtOZqlZnCw
Tools built on top of MapReduce. Shortcomings of MapReduce.
-
Quiz
Keep the original value if x > 0, else replace it with 0. 1. Using max 2. Using logical 3. Using absolute. Any other suggestions?
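One possible reading of the three approaches, sketched in Python rather than the R used in class. The function names are made up for illustration, and the notes do not say which answers were expected.

```python
def keep_positive_with_max(x):
    # 1. Using max: the larger of x and 0.
    return max(x, 0)


def keep_positive_with_logical(x):
    # 2. Using logical: (x > 0) acts as 1 when true and 0 when false.
    return x * (x > 0)


def keep_positive_with_absolute(x):
    # 3. Using absolute: x + |x| is 2x for positive x and 0 otherwise.
    return (x + abs(x)) / 2


values = [-3, 0, 2.5, 7]
by_max = [keep_positive_with_max(v) for v in values]
by_logical = [keep_positive_with_logical(v) for v in values]
by_absolute = [keep_positive_with_absolute(v) for v in values]
```

All three give the same values for this sample; the absolute-value version always returns a float because of the division.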
-
Day 5 (2)
Condition: ==, <, <=, >, >=, != → boolean operators. if/else. switch → if the argument is not an integer/index, the result must be defined; a value outside the conditions returns NULL. for. while. repeat.
-
Day 5 (1)
Tutorial 4, Tutorial 5. Discuss and evaluate suitable techniques/methods used in the literature when performing big data analytics on the following: a) Market Basket Analysis. b) Customer Churn Prediction Analysis. Please support your discussion based on a research paper. Example Answer: Big data analytics on Market Basket Analysis could help to · Provide combo offers based on products…
-
Day 4
5 Daemons. A “daemon”, which sounds like “demon”, is a background service that is not initiated by the user. HDFS, MapReduce. Hadoop is distributed storage and processing. That applies only to the data nodes (storage & processing), not to the name node; the name node (master) must run on high-availability hardware (expensive); the secondary name node comes in to make name…
-
Day 3
What is the benefit of distributed computing? Using the parallel concept, a task that might originally complete in 11 hours would take only about 3 hours if run in parallel on 4 machines. Challenges. Hadoop Core Principle. Hadoop Components. Why Hadoop? (features). Hadoop Definition: Hadoop is an open-source software framework (LICENSE) for distributed storage and distributed parallel processing (HOW) of…
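The 11-hour versus 3-hour figure in the parallel example can be checked with a quick calculation. This is a sketch that assumes the work splits perfectly evenly and ignores any coordination overhead; the function name is made up for illustration.

```python
import math


def ideal_parallel_hours(serial_hours, machines):
    # Best case: the work divides evenly and the machines never wait on each other.
    return serial_hours / machines


exact_hours = ideal_parallel_hours(11, 4)  # 2.75
rounded_hours = math.ceil(exact_hours)     # rounds up to the 3 hours quoted
```

In practice the real time would be somewhat higher than this ideal figure, since splitting the task and collecting the results also take time.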
-
Day 2 (2)
Concepts and terminology. What is a dataset? A collection or group of related data. A dataset is like a class: when a new student joins, he/she shares the same common attributes/properties as the others. What is an algorithm? An algorithm is a set of rules/steps/instructions for solving a problem, which can later be implemented as a program. Algo vs program – You can execute…
-
Day 2 (1)
UI in R Studio. Code. Code reusability: introduces the “package”, which comes from CRAN, a public directory. also similar to
Operations:
+, -, *, /, %% → basic maths operators
first *, / then +, - → operator precedence
=, ->, <-, assign → variable declaration
rm(<<variable name>>), remove(<<variable name>>) → remove a variable
class → tells the data type (numeric, integer, character, logical, date)
is.<<datatype>>(<<variable name>>) → checking/testing/validating the data type
as.<<datatype>>(<<variable…
-
Day 1
What is data? Data is a series of measurements, observations, or raw facts that does not convey any meaning on its own. Data is not equal to information. Information is generated when data is processed. Due to the exponential growth of data and several eras of technological advancement, huge electronic data generation has led to huge deposit…
