Concept and terminology

What is a dataset?
A collection or group of related data. For example, in a student dataset, every new student who joins shares the same attributes/properties as the other students.
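To make the student analogy concrete, here is a minimal Python sketch. It assumes a hypothetical `Student` record with three made-up attributes. Every row in the dataset has the same fields, and a new student simply adds one more row with those fields.

```python
from dataclasses import dataclass, fields


@dataclass
class Student:
    # Hypothetical attributes: every record in the dataset shares them
    student_id: int
    name: str
    programme: str


dataset = [
    Student(1, "Alice", "Data Science"),
    Student(2, "Bob", "Business Intelligence"),
]

# A new student joins: one more record with the same attributes as the others
dataset.append(Student(3, "Chen", "Data Science"))

columns = [f.name for f in fields(Student)]
print(columns)       # ['student_id', 'name', 'programme']
print(len(dataset))  # 3
```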
What is an algorithm?
An algorithm is a set of rules/steps/instructions for solving a problem, which can later be implemented as a program.
Algorithm vs program – you can execute a program, but you cannot execute an algorithm. A program is written in code, whereas an algorithm is more like a formula such as the regression line “y = mx + b”: it can be written as many different programs that all share the same logic.
**Good point by Kevin: normally we just show other people the program we wrote, but by right we should show the pseudocode of what we implemented.
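The "one algorithm, many programs" point can be illustrated with the least-squares fit behind “y = mx + b”. This is a sketch: the pseudocode comes first as comments, and the two Python functions that follow are different programs (one with an explicit loop, one with generator expressions) implementing the same logic and returning the same result. The function names are made up for illustration.

```python
# Algorithm (pseudocode): ordinary least-squares fit of y = m*x + b
#   x_mean, y_mean <- averages of xs and ys
#   m <- sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
#   b <- y_mean - m * x_mean


def fit_line_loop(xs, ys):
    """Program 1: the algorithm written with an explicit for-loop."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    num = 0.0
    den = 0.0
    for x, y in zip(xs, ys):
        num += (x - x_mean) * (y - y_mean)
        den += (x - x_mean) ** 2
    m = num / den
    return m, y_mean - m * x_mean


def fit_line_generators(xs, ys):
    """Program 2: the same algorithm written with generator expressions."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    m = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    return m, y_mean - m * x_mean


xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]
print(fit_line_loop(xs, ys))        # (2.0, 1.0)
print(fit_line_generators(xs, ys))  # (2.0, 1.0)
```

The code differs but the logic is identical, which is why sharing the pseudocode communicates the idea better than sharing either program.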
What is Business Intelligence?
Deriving insight for decision making. Reflecting the era it came from, it is limited to structured (relational/tabular) data and leans more towards diagnostic analysis.
What is Data Science?
Leans more towards predictive analysis.

Forecasting vs predictive – forecasting leans towards numerical prediction, e.g. extending a linear regression line; predictive is more like Facebook Prophet, which can predict based on periodic (seasonal) patterns.
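Here is a minimal sketch of the contrast, assuming a time series stored as a plain Python list. A linear-trend forecast extends a fitted straight line. A seasonal-naive forecast repeats the value seen one period earlier. The seasonal-naive method is a deliberately simple stand-in for periodic prediction. It is not how Prophet works internally, and no Prophet API is used here.

```python
def linear_trend_forecast(series, steps):
    """Fit y = m*t + b by least squares (t = 0, 1, ...) and extend the line."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    m = num / den
    b = y_mean - m * t_mean
    return [m * (n + i) + b for i in range(steps)]


def seasonal_naive_forecast(series, period, steps):
    """Repeat the value observed one period earlier (assumes period <= len(series))."""
    start = len(series) - period
    return [series[start + (i % period)] for i in range(steps)]


print(linear_trend_forecast([1, 2, 3, 4], steps=2))                  # [5.0, 6.0]
print(seasonal_naive_forecast([10, 20, 10, 20], period=2, steps=3))  # [10, 20, 10]
```

On a zig-zag series like `[10, 20, 10, 20]`, a single straight line cannot reproduce the up-and-down pattern, while the periodic model repeats it.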
Data generation
- human-generated: e.g. data I record myself
- machine-generated: e.g. IoT devices, sensors
Data Lineage
data generation → strategic decision making
- table: process
- file
- database: analysis
- distributed database
- data warehouse: analytics
- cloud
- big data: analytics
One example of a big data application is a supermarket that sells bread and stocks milk and Pampers on the bottom of the same rack, because the data shows customers tend to buy these items together. This is also known as market basket analysis or association rule mining.
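Association rule mining scores rules such as {bread} → {milk} by support and confidence. Support is how often the items appear together. Confidence is how often milk appears in baskets that contain bread. Below is a minimal Python sketch over a handful of made-up transactions. For brevity it only looks at single-item rules, whereas real algorithms such as Apriori also search larger itemsets. The function names and thresholds are assumptions for illustration.

```python
from itertools import combinations

# Hypothetical shopping baskets (one set of items per transaction)
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "milk", "pampers"},
    {"bread"},
    {"milk", "pampers"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Estimated P(rhs | lhs) = support(lhs | rhs) / support(lhs)."""
    base = support(lhs, transactions)
    return support(lhs | rhs, transactions) / base if base else 0.0


def pair_rules(transactions, min_support, min_confidence):
    """All single-item rules {a} -> {b} that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s = support({a, b}, transactions)
        if s < min_support:
            continue  # the pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            c = confidence({lhs}, {rhs}, transactions)
            if c >= min_confidence:
                rules.append((lhs, rhs, s, round(c, 2)))
    return rules


for lhs, rhs, s, c in pair_rules(TRANSACTIONS, min_support=0.5, min_confidence=0.6):
    print(f"{{{lhs}}} -> {{{rhs}}}  support={s}  confidence={c}")
```

With these baskets, {pampers} → {milk} has confidence 1.0 because every basket containing Pampers also contains milk.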
Drivers for Data Analytics
- Desire to optimize business operations – sales, pricing, profitability, efficiency
- Desire to identify business risk – customer churn, fraud, defaulter
- Predict new business opportunities – upsell, cross-sell, best new customer prospects
- Comply with laws or regulatory requirements – anti-money laundering, HIPAA, DPA
Data Analytics Stages
- Descriptive (WHAT) – reports and dashboards, telling what has already happened
- Diagnostic (WHY) – FA/interactive/drill-down/comparison, finding the root cause
- Predictive (WHEN) – forecast line, process mining (conformance check + linear regression)
- Prescriptive (HOW) – Gold/Silver/Bronze or Red/Yellow/Green flagging with corrective action
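Here is a toy walk-through of the four stages on hypothetical monthly sales, in Python. The data, region names, the 0.9 "Yellow" threshold and the corrective actions are all assumptions for illustration. The "forecast line" is a simple least-squares trend, not process mining.

```python
from collections import defaultdict

# Hypothetical monthly sales records: (month, region, amount)
SALES = [
    (1, "North", 100), (1, "South", 100),
    (2, "North", 110), (2, "South", 90),
    (3, "North", 120), (3, "South", 80),
]


def descriptive(sales):
    """WHAT happened: total sales per month."""
    totals = defaultdict(int)
    for month, _region, amount in sales:
        totals[month] += amount
    return dict(totals)


def diagnostic(sales, first, last):
    """WHY: drill down by region and compare two months."""
    amount = {(m, r): a for m, r, a in sales}
    regions = sorted({r for _m, r, _a in sales})
    return {r: amount[(last, r)] - amount[(first, r)] for r in regions}


def predictive(series):
    """Next value from a least-squares straight line through the series."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    m = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    return m * n + (y_mean - m * t_mean)


def prescriptive(actual, target, yellow_ratio=0.9):
    """HOW to act: Red/Yellow/Green flag plus a hypothetical corrective action."""
    ratio = actual / target
    if ratio >= 1.0:
        return "Green", "no action"
    if ratio >= yellow_ratio:
        return "Yellow", "monitor closely"
    return "Red", "escalate and review pricing/promotions"


print(descriptive(SALES))       # {1: 200, 2: 200, 3: 200} -> total looks flat
print(diagnostic(SALES, 1, 3))  # {'North': 20, 'South': -20} -> South is the cause
south = [a for _m, r, a in SALES if r == "South"]
forecast = predictive(south)    # 70.0
print(prescriptive(forecast, target=100))  # ('Red', 'escalate and review pricing/promotions')
```

The flat monthly total hides a problem. Only the drill-down shows that South is falling, and the forecast plus flag turn that into an action.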
Big Data Characteristics
- volume
- variety (structured, unstructured (does not follow a specific format), semi-structured)
- velocity
- later, more and more people introduced additional “V”s, such as:
- veracity (IBM) – how trustworthy the data is, i.e. the uncertainty of the data
- value
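To illustrate variety, here is a sketch of the same (made-up) purchase represented in the three forms and parsed with Python's standard library. It uses CSV for structured data, JSON for semi-structured data, and free text with a regular expression for unstructured data.

```python
import csv
import io
import json
import re

# Structured: fixed schema, every row has the same columns (e.g. a table or CSV)
structured = "customer_id,item,qty\n42,bread,2\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing keys, but fields may vary or nest (e.g. JSON)
semi_structured = '{"customer_id": 42, "items": [{"item": "bread", "qty": 2}], "coupon": "SAVE5"}'
doc = json.loads(semi_structured)

# Unstructured: no specific format (e.g. free text); structure has to be extracted
unstructured = "Customer 42 bought 2 loaves of bread this morning."
match = re.search(r"bought (\d+) loaves of (\w+)", unstructured)

print(rows[0]["item"], rows[0]["qty"])                  # bread 2
print(doc["items"][0]["item"], doc["items"][0]["qty"])  # bread 2
print(match.group(2), match.group(1))                   # bread 2
```

The regular expression only works for this exact phrasing. That fragility is the practical difficulty of unstructured data compared with the other two forms.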
