Tag: Tutorial
-
Day 106
Question 1 As a variable at the student level that is essential for explaining the Economics score, we use the measure of revision hours per month taken from a study. Revision hours have been centered so that the mean is 0. The results are presented below. Revision hours here is the variable with overall centering…
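Overall (grand-mean) centering, as described in the question above, can be sketched as follows. This is a minimal illustration only: the function name `grand_mean_center` and the six revision-hour values are hypothetical and are not taken from the study.

```python
from statistics import mean


def grand_mean_center(values):
    """Subtract the overall (grand) mean from every observation."""
    overall = mean(values)
    return [v - overall for v in values]


# Hypothetical revision hours per month for six students (not from the study).
revision_hours = [10.0, 14.0, 8.0, 20.0, 12.0, 16.0]
centered = grand_mean_center(revision_hours)

print(centered)        # each value is now a deviation from the overall mean
print(mean(centered))  # numerically zero, up to floating-point rounding
```

After centering, the intercept of a model refers to a student with average revision hours rather than a student with zero revision hours.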
-
Day 104
Question 1 Compare Multiple Linear Regression (MLR) and the Random Intercept Model (RIM). Example Answer MLR RIM Question 2 Provide two conditions under which a random intercept model should be used. Example Answer Question 3 As a variable at the student level that is essential for explaining the English score, we use the measure of IQ taken from…
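For the MLR versus RIM comparison in Question 1, the two models with a single predictor are commonly written as below. The notation (i for students, j for groups, the gamma coefficients, and the variance symbols) is an assumption following a common multilevel textbook convention; the excerpt itself does not show the equations.

```latex
% MLR: one residual term, observations assumed independent
Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2)

% RIM: student i in group j, with a group-specific random intercept U_{0j}
Y_{ij} = \gamma_{00} + \gamma_{10} x_{ij} + U_{0j} + R_{ij},
\qquad U_{0j} \sim N(0, \tau_0^2), \quad R_{ij} \sim N(0, \sigma^2)
```

The only structural difference is the group-level term U_{0j}, which lets each group have its own intercept and accounts for dependence among students in the same group.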
-
Day 99 (2), 101 (2), 103
Question 1 Discuss the disadvantages of “aggregation”. Example Answer When we aggregate, the detail in the individual data points (the micro data) is ignored in order to draw conclusions about the overall picture, which is the macro level. Collapsing each aspect or characteristic in this way may lead to misleading conclusions. For example, determine the performance of…
-
Day 98
Question 1 Discuss “dependence as a nuisance”. Example Answer Student motivation level depends on teacher experience. The hypothesis is that if a teacher performs well in the classroom and knows how to handle the students, student motivation may increase. So, to examine student motivation level in the classroom, I’ll pick a classroom and sample randomly…
-
Day 95
Question 1 A mixed effects model is a statistical method used in ANOVA and regression. Discuss the similarity between ANOVA and regression. Example Answer Both are statistical methods used to analyse the relationship between independent variables and a dependent variable; both account for variation in the data and explain the impact of the predictors. ANOVA focuses on mean differences…
-
Day 72
Tutorial 2 Question 1 Example Answer Exclude Q7 and Q8. Exclude Q6 because it is cross-loading: it shows almost the same magnitude on factor 1 and factor 2. Factor 1 can be named “Eating habits”. Factor 2 can be named “Food preparation”. Surrogate variable = Q1 for factor 1, Q10 for factor 2. Summated scale f1 =…
-
Day 70
Tutorial 1 Question 1 Example Answer Question 2 Example Answer Exclude Pulse. Exclude BSA. With the stepwise method, independent variables are introduced or removed step by step while training the model.
-
Day 65
Tutorial 5 Question 1 A volatility model is fitted to stock return data and produces the following results. (a) Write down the equation to forecast volatility based on the output above. (b) What can we conclude based on the results of the ARCH LM-test? (c) Are all the coefficients in the model significant? Explain your…
-
Day 63
Tutorial 4 Question 1 The table below shows the result of fitting an ARIMA model to time series data. (a) Write the ARIMA model equation based on the results above. (b) Represent the ARIMA model in ARIMA (p, d, q). (c) Is the ARIMA model adequate? Justify your answers. (d) Are all the AR…
-
Day 60
Measuring Predictive Accuracy Raw data (only for illustration)
X    Y         Forecast Value
1    0.3324    1
2    2.9232    2
3    1.4348    3
4    4.0073    4
5    3.7612    5
6    5.1456    6
7    8.1008    7
8    8.3195    8
9    8.4495    9
10   10.9755   10
11   12.1784   11
12   13.6671   12
13   14.6767   13
14   17.7715   14
15…
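A minimal sketch of how predictive accuracy could be measured on the rows visible above. The choice of measures (MAE, MSE, RMSE, MAPE) is an assumption, since the excerpt does not show which measures the post uses, and the function name `accuracy_measures` is hypothetical. Only the 14 complete rows shown in the excerpt are used.

```python
import math


def accuracy_measures(actual, forecast):
    """Return common forecast-error summaries for two equal-length series."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n                 # mean absolute error
    mse = sum(e * e for e in errors) / n                  # mean squared error
    rmse = math.sqrt(mse)                                 # root mean squared error
    # MAPE divides by the actual value, so actual values must be non-zero.
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}


# Y and Forecast Value columns from the excerpt (rows 1 to 14 only).
actual = [0.3324, 2.9232, 1.4348, 4.0073, 3.7612, 5.1456, 8.1008,
          8.3195, 8.4495, 10.9755, 12.1784, 13.6671, 14.6767, 17.7715]
forecast = list(range(1, 15))

measures = accuracy_measures(actual, forecast)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

MAPE is inflated here by the first row, where the actual value is close to zero; that sensitivity is one reason to report more than one measure.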
-
Day 59
Recap. Differencing vs smoothing: differencing removes the trend from a time series by making the data stationary around its mean; smoothing, on the other hand, focuses on removing the irregular signal to uncover the pattern (trend & seasonality). Tutorial 2 Question 1 (a) Compute a three-quarter moving average forecast for quarters 4 through 13. (b) Compute…
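The recap and part (a) can be sketched as below. This is an illustration under assumptions: the quarterly values are hypothetical (the tutorial's data is not shown in the excerpt), and the function names are made up for this example. The forecast for a quarter is taken as the mean of the three preceding quarters, so 12 observed quarters give forecasts for quarters 4 through 13.

```python
def first_difference(series):
    """Differencing: change between consecutive observations (removes a linear trend)."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def moving_average_forecast(series, window=3):
    """Smoothing: forecast each period as the mean of the previous `window` values.

    Returns {period: forecast} with 1-based period numbers.
    """
    forecasts = {}
    for t in range(window, len(series) + 1):
        forecasts[t + 1] = sum(series[t - window:t]) / window
    return forecasts


# Hypothetical data for quarters 1 to 12 (not the tutorial's data).
quarterly_sales = [10, 12, 14, 13, 15, 17, 16, 18, 20, 19, 21, 23]

print(first_difference(quarterly_sales))
for quarter, value in moving_average_forecast(quarterly_sales).items():
    print(f"Quarter {quarter}: forecast {value:.2f}")
```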
-
Day 56
Time Series 4 components that make up a time series: 1. Irregular fluctuations – Do not follow any recognizable pattern and are not predictable; normally short in duration. E.g. a rise in steel prices due to a strike in the factory. 2. Cyclical – A large sine-wave-like cycle of about 8-10 years. Exhibits 4 phases (Peak → Recession → Depression →…
-
Day 12 (1)
Tutorial 11 Question 1: Explain the cycle of big data management. Example Answer Big data management involves a series of processes aimed at handling large volumes of data effectively. The cycle of big data management typically involves the following stages: Capture: This is the process of collecting data from various sources, including social media platforms, IoT devices,…
-
Day 10 (1)
Hive The Hive engine translates HQL to MapReduce. Data in a table is stored in HDFS as flat files. Data is not verified during insertion; for example, copy a flat file into HDFS, then update the metadata in the Hive table (“msck repair table” command). Hive has its own specific folder, and tables whose data is stored outside the Hive folder are known as external tables.…
-
Day 8
Spark A cluster computing framework that processes data in memory, designed to overcome the MapReduce performance issue. The cluster manager allocates resources based on the job (request). Spark creates Resilient Distributed Datasets (RDDs), split into partitions, once it receives the data. Once the RDD is ready, it uses graph transformation (Directed Acyclic Graph, DAG). Consists of 2 phases: Transformation and Action **…
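The two phases can be illustrated with a toy, pure-Python stand-in. This is not Spark and not its API: `LazyDataset` is a hypothetical class written for this sketch, and it ignores partitioning and distribution entirely. It only shows the idea that transformations are recorded as a plan and nothing runs until an action is called.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records transformations, runs them on an action."""

    def __init__(self, data, steps=None):
        self._data = list(data)
        self._steps = steps or []  # the recorded plan (a simple linear chain)

    # Transformations: return a new dataset with a longer plan, compute nothing.
    def map(self, fn):
        return LazyDataset(self._data, self._steps + [("map", fn)])

    def filter(self, predicate):
        return LazyDataset(self._data, self._steps + [("filter", predicate)])

    # Action: execute the recorded plan and return concrete results.
    def collect(self):
        result = iter(self._data)
        for kind, fn in self._steps:
            result = map(fn, result) if kind == "map" else filter(fn, result)
        return list(result)


numbers = LazyDataset(range(1, 6))
pipeline = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 1)  # nothing runs yet
print(pipeline.collect())  # [1, 9, 25]
```

Deferring execution until the action is what gives a real engine the chance to inspect the whole plan before running it.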
-
Day 7
Tutorial 6 HBase A distributed column-oriented data store built on top of HDFS. It is part of the Hadoop ecosystem and provides random real-time read/write access to data in the Hadoop File System. HDFS (Write Once Read Many) vs HBase: HDFS is not good for record lookup, only file lookup; HBase offers fast record lookup. HDFS is not good for incremental addition of…
-
Day 5 (1)
Tutorial 4 Tutorial 5 Discuss and evaluate suitable techniques/methods being used in literature while performing the big data analytics on the following: a) Market Basket Analysis. b) Customer Churn Prediction Analysis. Please support your discussion based on a research paper. Example Answer The big data analytics on Market Basket Analysis could help to · Provide combo offers based on products…
-
Day 3
What is the benefit of distributed processing? Using the parallel concept, a task that originally takes 11 hours to complete would take only about 3 hours if run in parallel on 4 machines. Challenges Hadoop Core Principle Hadoop Components Why Hadoop? (feature) Hadoop Definition Hadoop is an open-source software framework (LICENSE) for distributed storage and distributed parallel processing (HOW) of…
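The arithmetic behind the 11-hour example can be sketched as follows, assuming ideal conditions: the work splits evenly across machines and there is no coordination or data-transfer overhead. The function name `ideal_parallel_hours` is hypothetical.

```python
import math


def ideal_parallel_hours(serial_hours, machines):
    """Best-case wall-clock time when work divides evenly with no overhead."""
    return serial_hours / machines


hours = ideal_parallel_hours(11, 4)
print(hours)             # 2.75
print(math.ceil(hours))  # 3, i.e. "about 3 hours"
```

Real clusters rarely reach this ideal, because scheduling, network transfer, and any part of the job that cannot be parallelized all add time.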
-
Day 2 (2)
Concept and terminology What is a dataset? A collection or group of related data. A dataset is like a class: when a new student joins, he/she shares the same common attributes/properties as the others. What is an algorithm? An algorithm is a set of rules/steps/instructions for problem solving that can later be implemented as a program. Algo vs program – You can execute…
