Starting off as a muggle who is naïve to the world of Math and Data Science.

Blog


  • Day 20, 35 (1)

    Start programming

  • Day 19, 33, 34

    Start programming

  • Day 18, 33

    Start programming

  • Day 17, 30

    Start programming

  • Day 16, 28 (2), 29

    What is a dataset? It holds data about a particular subject. Row = observation; column = variable. What is SAS? Statistical Analysis System. What is a SAS library? It stores data permanently; its name is a maximum of 8 characters long. What is SAS Work? It stores data temporarily. Start programming 1. Create a working folder. 2. Upload the Excel file. 3. Create a library. 4. Import Excel to…

  • Day 15, 28 (1)

    Get Started 1. Sign up. https://www.sas.com/en_my/home.html 2. If the site is not responding, look for the SAS chatroom. https://www.sas.com/en_us/contact.html 3. Look for the “Launch” icon. What is a Dataset? A collection of data that is related to a particular subject. What is an Observation? A row in a dataset. What is a Variable? A column in a dataset; its name is a maximum of 32 characters long. Must…

  • Introduction to R-programming

    Day 2 Day 5 Day 9 Day 10 Day 12

  • Big Data Analytics & Technologies

    Day 1 Day 2 Day 3 Day 4 Day 6 Day 7 Day 8 Day 9 Day 10

  • Day 12 (2)

    Perks! Add the “dplyr” package to enable the following functions: select(variable_name, column_name) → similar to SQL SELECT; can alias columns, select a range of column names, match column names by their start/end, and exclude columns. Result: filter(variable_name, condition) → single or multiple logical conditions; similar to subset(). Result: mutate(variable_name, column_name) → create, edit or delete a column (delete by setting its value to NULL). Result:…
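
    A minimal sketch of the three dplyr functions named in this post. The `students` data frame, its column names and its values are made up for illustration and are not from the original post; it also assumes the dplyr package is already installed.

```r
library(dplyr)

# Hypothetical sample data (an assumption, not from the post).
students <- data.frame(
  name  = c("Ann", "Ben", "Cal"),
  age   = c(21, 23, 22),
  score = c(90, 72, 85),
  stringsAsFactors = FALSE
)

# select(): pick columns, much like SQL SELECT.
picked   <- select(students, student = name, score)  # column alias
ranged   <- select(students, name:age)               # range of column names
s_cols   <- select(students, starts_with("s"))       # names starting with "s"
excluded <- select(students, -age)                   # exclude a column

# filter(): keep rows that match one or more logical conditions.
passed <- filter(students, score >= 80, age < 23)

# mutate(): create or edit a column; assigning NULL deletes it.
graded  <- mutate(students, bonus = score + 5)
trimmed <- mutate(graded, bonus = NULL)
```

    The first argument of each function is the data frame, and each call returns a new data frame, so `students` itself is left unchanged.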

  • Day 12 (1)

    Tutorial 11 Question 1: Explain the cycle of big data management. Example Answer: Big data management involves a series of processes aimed at handling large volumes of data effectively. The cycle of big data management typically involves the following stages: Capture: This is the process of collecting data from various sources, including social media platforms, IoT devices,…

  • Day 11

    Tutorial 10.5

  • TED : TikTok’s CEO on its future — and what makes its algorithm different

    CA: If you were advising parents here what time they should actually recommend to their teenagers, what do you think is the right setting? SC: Well, 60 minutes, we did not come up with it ourselves. So I went to the Digital Wellness Lab at the Boston Children’s Hospital, and we had this conversation with them. And 60 minutes was the…

  • Day 10 (2)

    Data Structure (cont’d) data.frame(data) → two dimensions, multiple data types. Result: Add data: rbind(variable_name, variable_name) Result: cbind(variable_name, variable_name) Result: Import data: read.csv(file_path, header=TRUE/FALSE) Access column Result: Delete column Result: Data frame functions: nrow(variable_name) Result: 4 ncol(variable_name) Result: 3 dim(variable_name) Result: names(variable_name) Result: head(variable_name, quantity) Result: tail(variable_name, quantity) Result: list(data) → multiple dimensions, multiple data types. Result:
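
    The data frame functions listed in this post can be tried out with a small base R sketch. The `students` data and its column names are hypothetical, chosen only so that nrow() and ncol() give the 4 and 3 shown in the excerpt; the post's own data is not reproduced here.

```r
# Hypothetical sample data (an assumption, not from the post).
students <- data.frame(
  name  = c("Ann", "Ben", "Cal"),
  age   = c(21, 23, 22),
  score = c(90, 72, 85),
  stringsAsFactors = FALSE
)

# rbind() adds rows; the new row must use the same column names.
new_row  <- data.frame(name = "Dee", age = 24, score = 78,
                       stringsAsFactors = FALSE)
students <- rbind(students, new_row)

# cbind() adds columns.
with_pass <- cbind(students, passed = students$score >= 80)

# Access a column with $, and delete one by assigning NULL.
scores <- students$score
with_pass$passed <- NULL

# Data frame functions.
nrow(students)     # 4
ncol(students)     # 3
dim(students)      # 4 3
names(students)    # "name" "age" "score"
head(students, 2)  # first 2 rows
tail(students, 2)  # last 2 rows
```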

  • Day 10 (1)

    The Hive engine translates HQL to MapReduce. Table data is stored in HDFS as flat files. Data is not verified during insertion; for example, copy a flat file into HDFS, then update the metadata in the Hive table (“msck repair table” command). Hive has its own specific folder, and tables whose data is stored outside the Hive folder are known as external tables.…

  • Day 9 (2)

    Data Structure: the way data is stored in memory. For example, storing data in matrix form, 5 rows x 3 columns. factor(data, levels, labels) → single dimension, kind of like the relational database dim::fact concept. Result: so what happens here is that the data is “a”, “a”, “b”, 1. But due to the levels parameter, “1” is moved to the end and all…