Tutorial 11
Question 1:
Explain the cycle of big data management.
Example Answer

Big data management involves a series of processes aimed at handling large volumes of data effectively. The cycle of big data management typically involves the following stages:
Capture: This is the process of collecting data from various sources, including social media platforms, IoT devices, and enterprise systems. The data may be structured, semi-structured, or unstructured.
Organize: After data acquisition, the next stage involves storing the data in a way that it can be easily accessed and processed. This may involve using distributed file systems, cloud-based storage solutions, or data warehouses.
Integrate: Once the data is stored, data from different sources needs to be combined and prepared for analysis. This may involve techniques such as data cleaning, data transformation, data integration, and data enrichment.
Analyze: In this stage, the data is analyzed to derive insights and generate value. This may involve various techniques such as statistical analysis, machine learning, and data visualization.
Act: Once the insights are generated, the next stage involves presenting the findings in a visual form that is easy to understand. This may involve using charts, graphs, and other visual aids. After the insights have been presented, the organization acts on them to optimize processes, improve customer experiences, or develop new revenue streams.
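The five stages can be pictured as a chain of small functions, each feeding the next. The following is only a minimal sketch: the records, the function names, and the spending threshold are all hypothetical, and a real big data pipeline would use distributed storage and processing rather than in-memory lists.

```python
def capture():
    # Capture: hypothetical raw events from mixed sources; one record is incomplete.
    return [
        {"source": "web", "user": "a", "amount": "20"},
        {"source": "iot", "user": "b", "amount": None},
        {"source": "web", "user": "a", "amount": "35"},
        {"source": "app", "user": "c", "amount": "15"},
    ]

def organize(records):
    # Organize: group records by source, standing in for partitioned storage.
    store = {}
    for record in records:
        store.setdefault(record["source"], []).append(record)
    return store

def integrate(store):
    # Integrate: drop records with missing amounts and convert text to numbers.
    cleaned = []
    for rows in store.values():
        for row in rows:
            if row["amount"] is not None:
                cleaned.append({"user": row["user"], "amount": float(row["amount"])})
    return cleaned

def analyze(cleaned):
    # Analyze: total spend per user.
    totals = {}
    for row in cleaned:
        totals[row["user"]] = totals.get(row["user"], 0.0) + row["amount"]
    return totals

def act(totals, threshold=50.0):
    # Act: turn the insight into a decision (assumed rule: target high spenders).
    return sorted(user for user, total in totals.items() if total >= threshold)

totals = analyze(integrate(organize(capture())))
targets = act(totals)
print(totals)
print(targets)
```

Each function only depends on the output of the previous stage, which mirrors how the cycle hands data from one stage to the next.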
Question 2:
Describe the processes in descriptive analytics.
Example Answer
Descriptive analytics is the process of analyzing historical data to gain insights into past events and trends. The goal of descriptive analytics is to understand what happened by summarizing and visualizing data; explaining why it happened is the task of diagnostic analytics. The following are the processes involved in descriptive analytics:
Data Collection: The first step in descriptive analytics is to collect relevant data from various sources. This can include internal sources such as databases and spreadsheets, as well as external sources such as customer feedback, social media, and surveys.
Data Cleaning: Once the data has been collected, it is essential to clean and preprocess it. This involves removing any errors, inconsistencies, and outliers from the data set. The data is then transformed into a structured format that can be easily analyzed.
Data Exploration: The next step is to explore the data using various techniques such as summary statistics, histograms, and scatter plots. This helps in understanding the distribution of the data and identifying any patterns or trends.
Data Visualization: Data visualization techniques such as charts, graphs, and dashboards are used to represent the data visually. This makes it easier to interpret the data and communicate insights to stakeholders.
Data Analysis: The data is then analyzed using various statistical techniques such as regression analysis, time series analysis, and clustering. This helps in identifying correlations, trends, and patterns in the data.
Report Generation: Finally, the insights gained from the analysis are presented in the form of reports and dashboards. These reports highlight the key findings, trends, and patterns in the data and provide recommendations for future actions.
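The cleaning, exploration, and visualization steps above can be sketched on a tiny data set. This is an illustration only: the sales figures are made up, missing values are simply dropped, and a text histogram stands in for a real chart.

```python
import statistics

# Hypothetical daily sales figures; None marks a missing reading.
raw_sales = [120, 135, None, 128, 150, 142, None, 131]

# Data cleaning: drop the missing values.
sales = [x for x in raw_sales if x is not None]

# Data exploration: summary statistics.
summary = {
    "count": len(sales),
    "mean": statistics.mean(sales),
    "median": statistics.median(sales),
    "min": min(sales),
    "max": max(sales),
}

# Data visualization: count values in buckets of width 10 for a text histogram.
buckets = {}
for x in sales:
    low = (x // 10) * 10
    buckets[low] = buckets.get(low, 0) + 1

# Report generation: print the histogram and the summary.
for low in sorted(buckets):
    print(f"{low}-{low + 9}: {'#' * buckets[low]}")
print(summary)
```

In practice a library such as pandas or a dashboard tool would replace the hand-written summary and histogram, but the sequence of steps stays the same.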
Question 3:
Among the four types of big data analytics, which one is the most valuable? Explain your answer.
Example Answer
Descriptive Analytics: It is valuable for identifying trends and patterns in past data, and it helps businesses make decisions based on past experience.
Diagnostic Analytics: It is valuable for identifying the root causes of problems, and it helps businesses prevent similar problems from occurring in the future.
Predictive Analytics: It is valuable for identifying trends and patterns that are likely to occur, and it helps businesses make decisions based on future scenarios.
Prescriptive Analytics: It is valuable for optimizing business processes, and it helps businesses make decisions based on data-driven insights.
Each type of big data analytics is valuable in its own way and for its own purpose. Hence, the most valuable type depends on the needs and goals of each business.
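To make the difference between the four types concrete, the sketch below applies each one to the same small data set. The regional sales numbers, the least-squares trend used for prediction, and the "investigate" rule used for prescription are all assumptions chosen for illustration, not a standard method.

```python
# Hypothetical monthly sales, split by region.
sales = {
    "north": [100, 110, 120, 130],
    "south": [90, 85, 80, 75],
}

# Descriptive: what happened? Total sales per month.
monthly_total = [n + s for n, s in zip(sales["north"], sales["south"])]

# Diagnostic: why did it happen? Change per region from first to last month.
change = {region: values[-1] - values[0] for region, values in sales.items()}

def forecast_next(values):
    # Predictive: fit a least-squares line and extend it one step ahead.
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = mean_y - slope * mean_x
    return intercept + slope * n

forecast = {region: forecast_next(values) for region, values in sales.items()}

# Prescriptive: what should be done? Assumed toy rule: look into any region
# whose forecast is below its latest value.
actions = {
    region: "investigate" if forecast[region] < values[-1] else "maintain"
    for region, values in sales.items()
}

print(monthly_total, change, forecast, actions)
```

Each step reuses the results of the earlier ones, which is one way to see why no single type is useful in isolation.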
Question 4:
What are the typical skills of a data scientist?
Example Answer
Skill sets of a data scientist
1. Mathematics (Statistics)
2. Programming skills
3. Machine Learning
4. Knowledge in the problem domain
My Try
Ans 1
Generate -> Collect -> Store -> Analyze
Data is generated by human recording or by machine-generated logs.
Data is collected using parallel processing.
Data is stored in distributed storage.
Data is analyzed using statistics, mathematics, and machine learning.
Ans 2
Descriptive analytics is the earliest stage of understanding the data. Normally this stage involves visualizing data to bring out its meaning, also known as information. Descriptive analytics tells what has happened using the data collected. Typical visualizations consist of reports and dashboards. Descriptive analytics is normally the first stepping stone before moving into diagnostic analytics (drill-down).
Ans 3
There are four types of big data analytics: descriptive, diagnostic, predictive, and prescriptive. I would say that prescriptive analytics is the most valuable of all. Prescriptive analytics is performed after the descriptive, diagnostic, and predictive stages, which means it builds on all the learning from those earlier stages. Prescriptive analytics prevents the recurrence of an event by responding to early signals with corrective action before the actual event happens. This reduces the chance of repeating similar mistakes and makes it possible to seize the same opportunity if it arises again.
Ans 4
A data scientist is a person with interdisciplinary knowledge who harvests insights and information from data. From my understanding, a data scientist would have the following skills.
1. Data Extraction – ability to get data from various sources and load it into a tool for processing.
2. Data Engineering – ability to clean and transform data into meaningful features.
3. Data Analytics – ability to study and research the data in order to explain it, looking for patterns and signals.
4. Data Modeling – ability to justify a model that explains the data using statistics and machine learning.
5. MLOps – continuous testing and retraining of the model to fit changes in the data over time.
6. Business skills – ability to translate insights into valuable information for the business.
