Unit 1: Data Analytics (KCA-034)

Unit-1: Introduction to Data Analytics

Data Analytics: It refers to the process of examining datasets to draw conclusions about the information
they contain. Data analytics techniques enable you to take raw data and uncover patterns to extract
valuable insights from it.

Why Data Analytics?

 Data is the fuel that can drive a business along the right path.
 Data can help businesses better understand their customers, improve their advertising campaigns,
personalize their content and improve their bottom lines.
 The advantages of data are many, but you cannot access these benefits without the proper data
analytics tools and processes.
 While raw data has a lot of potential, you need analytics to unlock the power to grow your
business.
Types of Data Analytics
Data analytics is broken down into four basic types:
1. Descriptive Analytics
2. Diagnostic Analytics
3. Predictive Analytics
4. Prescriptive Analytics
Descriptive Analytics:
 Descriptive analytics summarizes and describes datasets. It is used to describe the
characteristics of data and what has happened over a given period of time.
 Have the number of views gone up?
 Are sales stronger this month than last?
 Descriptive statistics can be useful for two purposes (a minimal sketch follows the list):
1. To provide basic information about variables in a dataset, and
2. To highlight potential relationships between variables.
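As a minimal illustration, assuming pandas is installed, descriptive statistics can be computed directly; the monthly sales figures below are made-up values for demonstration only.

import pandas as pd

# Hypothetical monthly sales figures (made-up values for illustration)
sales = pd.Series([120, 135, 150, 148, 160], name="monthly_sales")

print(sales.describe())    # basic information: count, mean, std, min, quartiles, max
print(sales.pct_change())  # month-over-month change: are sales stronger this month?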
Diagnostic Analytics
 Diagnostic analysis is a special type of analytical technique in which data is interpreted and
analyzed to find out what caused a particular event, such as a cyber breach. Diagnostic
analytics helps you understand why something happened in the past. It focuses on why
something happened, and involves more diverse data inputs and a bit of hypothesizing.
 Did the weather affect ice-cream sales? (A correlation sketch follows.)
 Did that marketing campaign impact sales?
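A hedged sketch of one diagnostic check, assuming pandas: given made-up daily temperature and ice-cream sales, a correlation coefficient hints at whether weather plausibly affected sales. Correlation alone does not prove causation, so this is only a starting hypothesis.

import pandas as pd

# Made-up daily observations for illustration only
df = pd.DataFrame({
    "temperature_c":  [18, 22, 25, 30, 33, 35],
    "icecream_sales": [200, 240, 310, 420, 500, 530],
})

# A strong positive correlation suggests (but does not prove) that warmer
# weather drove higher ice-cream sales.
print(df["temperature_c"].corr(df["icecream_sales"]))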
Predictive Analytics
Predictive analytics is the branch of advanced analytics used to make predictions about
unknown future events. It uses data, statistical algorithms and machine learning techniques to
identify the likelihood of future outcomes based on historical data. The goal is to go beyond knowing
what has happened to provide a best assessment of what will happen in the future. It serves as a
decision-making tool. A minimal sketch follows.
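A minimal predictive-analytics sketch, assuming scikit-learn is available: a linear regression fit on made-up historical data (advertising spend vs. revenue, both hypothetical) used to estimate a future value.

import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up historical data: advertising spend vs. revenue (thousands)
X = np.array([[10], [20], [30], [40], [50]])  # ad spend
y = np.array([100, 180, 260, 350, 430])       # revenue

model = LinearRegression().fit(X, y)
print(model.predict(np.array([[60]])))  # best assessment for a future spend of 60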
Prescriptive Analytics
Prescriptive analytics focuses on finding the best course of action in a scenario, given the available data.
It is related to both descriptive analytics and predictive analytics, but it emphasizes actionable insights
instead of data monitoring. Prescriptive analytics is the final step of business analytics. It suggests a
course of action; a simple decision-rule sketch follows.
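Prescriptive logic can be as simple as a rule that turns a prediction into a recommended action. The function and thresholds below are hypothetical, chosen only to illustrate the idea.

def recommend_action(predicted_demand: float, current_stock: int) -> str:
    """Turn a demand forecast into a suggested course of action (hypothetical rule)."""
    if current_stock < predicted_demand * 0.9:
        return f"Reorder {int(predicted_demand * 1.1) - current_stock} units"
    if current_stock > predicted_demand * 1.5:
        return "Run a promotion to clear excess stock"
    return "Hold current stock level"

print(recommend_action(predicted_demand=500, current_stock=300))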
Data
Data are a set of values of quantitative or qualitative variables about one or more persons or objects.
When data is processed, organized, structured or presented in a given context so as to make it useful, it
is called information.
Types of Data
Because data is so important in our lives, it becomes essential to store and process it properly,
without error.
When dealing with datasets, the category of data plays an important role in determining which
preprocessing strategy will work for a particular set and which type of statistical analysis
should be applied for the best results.
Data can be categorized as:
 Qualitative Data Type
 Quantitative Data Type
Qualitative or Categorical Data Types
Qualitative or categorical data describes the object under consideration using a finite set of discrete
classes. This type of data cannot easily be counted or measured using numbers and is
therefore divided into categories.
Example: Gender of a person (male, female, etc.).
Such data is usually extracted from audio, images, or text. Another example is a
smartphone listing that provides information about the current rating, the color of the phone, the category of
the phone, and so on. All this information can be categorized as qualitative data.
There are two subcategories of qualitative data:
1. Nominal
2. Ordinal
Nominal Data Type
These are values that do not possess a natural ordering. For example, the color of a smartphone can be
considered a nominal data type, as we cannot compare one color with another. It is not possible to state that
‘Red’ is greater than ‘Blue’.
Another example is the gender of a person, where we cannot rank male, female, or other.
Ordinal Data Type
These values have a natural ordering while maintaining their class of values. If we consider the sizes of a
clothing brand, we can easily sort them according to their name tags in the order small < medium < large.
 The grading system used when marking candidates in a test can also be considered an ordinal data type, where A+ is
definitely better than a B grade.
For the nominal data type, where there is no comparison among the categories, one-hot encoding can be
applied, which is similar to binary coding and works well when the number of categories is small. For the ordinal data type,
label encoding can be applied, which is a form of integer encoding.
One-Hot Encoding Scheme
In one-hot encoding, we create a new variable for each level of a categorical feature. Each category is
mapped to a binary variable containing either 0 or 1.
An ordinal encoding involves mapping each unique label to an integer value. This type of encoding is
really only appropriate if there is a known relationship between the categories. For example, “S” is 38,
“L” is 40, and “XL” is 42. Both schemes are sketched below.
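A minimal sketch of both schemes, assuming pandas; the colors and sizes are example values, and the size-to-integer mapping is taken from the example above.

import pandas as pd

df = pd.DataFrame({"color": ["Red", "Blue", "Red"],   # nominal
                   "size":  ["S", "L", "XL"]})        # ordinal

# One-hot encoding: one binary (0/1) column per color category
one_hot = pd.get_dummies(df["color"], prefix="color")

# Ordinal (label) encoding: integers that preserve the known order
size_map = {"S": 38, "L": 40, "XL": 42}  # mapping from the example above
df["size_encoded"] = df["size"].map(size_map)

print(one_hot)
print(df)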
Quantitative Data Type
This data type tries to quantify things, and it does so by considering numerical values that make it countable in
nature. The price of a smartphone, the discount offered, the number of ratings on a product, the frequency of a
smartphone's processor, or the RAM of that particular phone: all these fall under the category of
quantitative data types.
The key point is that a feature can take an infinite number of values. For instance, the price of a
smartphone can vary from some amount x to any value, and it can be further broken down into fractional values.
The two subcategories which describe them clearly are:
1. Discrete
2. Continuous

Discrete Data Type

Numerical values that are integers or whole numbers are placed under this category.
The number of speakers in the phone, cameras, cores in the processor, and the number of SIMs supported
are some examples of the discrete data type.
Continuous Data Type
Continuous data can be divided up as much as you want and measured to many decimal places. Fractional
numbers are considered continuous values. These can take the form of the operating frequency of the
processor, the Android version of the phone, the Wi-Fi frequency, the temperature of the cores, and so on.
Data Source
A data source, in the context of computer science and computer applications, is the location where the data
being used comes from. In a database management system, the primary data source is the database,
which can be located on a disk or a remote server. The data source for a computer program can be a file,
a datasheet, a spreadsheet, an XML file or even hard-coded data within the program.
The following are the two sources of data:

1. Internal sources

 When data is collected from the reports and records of the organization itself, it is known as an
internal source.
 For example, a company publishes its annual report on profit and loss, total sales, loans, wages,
etc.

2. External sources

 When data is collected from sources outside the organization, they are known as external
sources. For example, if a tour and travel company obtains information on Karnataka tourism
from the Karnataka Transport Corporation, it would be known as an external source of data.

Types of Data

A) Primary data

 Primary data means first-hand information collected by an investigator.


 It is collected for the first time.
 It is original and more reliable.
 For example, the population census conducted by the government of India after every ten years is
primary data.
B) Secondary data

 Secondary data refers to second-hand information.


 It is not originally collected; rather, it is obtained from already published or unpublished sources.
 For example, the address of a person taken from the telephone directory, or the phone number of
a company taken from Justdial, is secondary data.

Types of Data (Data Classification)


1. Structured Data
It is data with a defined data type, format, and structure. Examples: transaction data,
traditional RDBMS tables, CSV files, and even simple spreadsheets.

Structured data adheres to a pre-defined data model and is therefore straightforward to analyze.
Structured data conforms to a tabular format with relationships between the different rows and
columns. This makes structured data extremely powerful.

It is possible to quickly aggregate data from various locations in the database. Structured data is
considered the most ‘traditional’ form of data storage, since the earliest versions of database management
systems (DBMS) were able to store, process and access structured data. An RDBMS is an example of
structured data.

An RDBMS may store the characteristics of support calls as typical structured data, with
attributes such as timestamps, machine type, problem type, and operating system, as sketched below.
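As an illustration of structured storage, the support-call example above could be modeled with Python's built-in sqlite3 module; the table and column names are hypothetical.

import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database for demonstration
conn.execute("""CREATE TABLE support_calls (
                    time_stamp   TEXT,
                    machine_type TEXT,
                    problem_type TEXT,
                    os           TEXT)""")
conn.execute("INSERT INTO support_calls VALUES (?, ?, ?, ?)",
             ("2024-01-15 09:30", "laptop", "boot failure", "Windows 11"))

# Structured data is easy to aggregate with SQL
for row in conn.execute(
        "SELECT problem_type, COUNT(*) FROM support_calls GROUP BY problem_type"):
    print(row)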
2. Unstructured Data
Unstructured data is information that is not arranged according to a pre-set data model or
schema. Therefore it cannot be stored in a traditional relational database or RDBMS. Text and
multimedia are two common types of unstructured content. Many
business documents are unstructured, as are email messages, videos, photos, web pages, and
audio files.

An estimated 80 to 90 percent of the data generated and collected by organizations is unstructured, and its
volume is growing rapidly, many times faster than the rate of growth for structured databases.

Unstructured data stores contain a wealth of information that can be used to guide business
decisions. However, unstructured data has historically been very difficult to analyze. With the help of AI
and machine learning, it is possible to uncover beneficial and actionable business intelligence out
of this type of data.
3. Semi-structured Data
Semi-structured data is a form of structured data that contains tags or other markers to separate semantic
elements and enforce hierarchies of records and fields within the data. Therefore, it is also known as a
self-describing structure. JSON and XML are common forms of
semi-structured data (see the sketch below).
The reason this third category exists (between structured and unstructured data) is that
semi-structured data is considerably easier to analyze than unstructured data. Many Big Data
solutions and tools have the ability to ‘read’ and process either JSON or XML. This reduces the
complexity of analyzing semi-structured data compared to unstructured data.
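A short sketch of why semi-structured data is self-describing: the keys in this hypothetical JSON record name the fields, so Python's standard json module can parse it directly.

import json

# Hypothetical JSON record: the keys act as self-describing tags
record = '{"phone": {"brand": "ExampleBrand", "rating": 4.3, "colors": ["red", "blue"]}}'

data = json.loads(record)
print(data["phone"]["rating"])  # 4.3
print(data["phone"]["colors"])  # ['red', 'blue']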

Characteristics of Data

Data has several key characteristics that define its nature and usability. These characteristics include the following (a few are checked programmatically in the sketch after the list):

1. Accuracy – Data should be correct, reliable, and free from errors.


2. Completeness – Data should have all the necessary elements required for its intended use.
3. Consistency – Data should be uniform across different sources and applications.
4. Validity – Data should conform to predefined rules or formats.
5. Timeliness – Data should be up to date and available when needed.
6. Relevance – Data should be applicable and useful for the specific purpose.
7. Uniqueness – Data should be free from duplicate records.
8. Accessibility – Data should be easily retrievable and available for authorized users.
9. Security – Data should be protected against unauthorized access, alterations, or destruction.
10. Interpretability – Data should be structured in a way that allows easy understanding and
analysis.
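Several of these characteristics can be checked programmatically. A minimal pandas sketch with made-up records, deliberately containing quality problems:

import pandas as pd

# Made-up records with deliberate quality problems
df = pd.DataFrame({"id":    [1, 2, 2, 3],
                   "email": ["a@x.com", None, "b@x.com", "b@x.com"]})

print(df["id"].is_unique)                             # Uniqueness: any duplicate ids?
print(df["email"].isna().sum())                       # Completeness: count of missing values
print(df["email"].str.contains("@", na=False).all())  # Validity: simple format rule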

Introduction to Big Data Platform


Big Data

Big data refers to extremely large and complex datasets that traditional data processing tools cannot
efficiently handle. It involves the collection, storage, analysis, and management of vast amounts of
structured, semi-structured, and unstructured data.

Key Characteristics of Big Data (The 5 Vs)

1. Volume – Large amounts of data generated from various sources (social media, IoT, sensors,
transactions, etc.).
2. Velocity – The speed at which data is generated and processed in real-time or near real-time.
3. Variety – Different types of data (text, images, videos, structured databases, logs, etc.).
4. Veracity – The quality and reliability of the data.
5. Value – Extracting meaningful insights and actionable intelligence from data.

A Big Data Platform is an integrated system that combines various big data tools and technologies to
enable data ingestion, storage, processing, and analysis of large and complex datasets. These
platforms help organizations handle structured, semi-structured, and unstructured data efficiently.

Key Characteristics of a Big Data Platform

1. Scalability – Can handle massive volumes of data.


2. Distributed Processing – Uses multiple servers (nodes) to process data in parallel.
3. Real-time and Batch Processing – Supports both real-time streaming data and historical batch
data.
4. High Availability & Fault Tolerance – Ensures system reliability even if some components
fail.
5. Multi-Source Data Support – Works with different data types, including structured (databases),
semi-structured (JSON, XML), and unstructured (videos, images, logs).

Components of a Big Data Platform

1. Data Ingestion – Tools like Apache Kafka, Flume, Sqoop for collecting data.
2. Data Storage – Distributed storage systems like Hadoop HDFS, Amazon S3, and Apache
Cassandra.
3. Data Processing – Frameworks like Apache Spark, Hadoop MapReduce for transforming and
analyzing data.
4. Data Analytics – Machine learning and analytics with tools like TensorFlow, Apache Mahout,
or business intelligence (BI) tools.
5. Data Visualization – Tools like Tableau, Power BI, and Grafana for insights.

Popular Big Data Platforms

 Apache Hadoop – Open-source framework for big data storage and processing.
 Apache Spark – Fast data processing engine for big data and machine learning.
 Google BigQuery – Cloud-based data warehouse for analytics.
 Amazon EMR (Elastic MapReduce) – Cloud-based Hadoop and Spark service.
 Microsoft Azure Synapse Analytics – Scalable cloud analytics service.

Need of Data Analytics


Data analytics is used to identify patterns, trends, and insights in large and complex data sets, and then
to use this information to make informed decisions. The need for data analytics arises from the
following factors:
1. Large and complex data sets: With the growth of data-driven businesses and the increasing
amount of data generated by organizations, the need for data analytics has grown.
2. Business insights: By analyzing large and complex data sets, organizations can gain valuable
insights into customer behavior, market trends, and other important aspects of their business.
3. Competitive advantage: Organizations that are able to effectively analyze data are better able to
compete in today's fast-paced business environment, as they can quickly identify new
opportunities and respond to changing market conditions.
4. Improved decision making: Data analytics helps organizations make better decisions by
providing them with a clearer picture of what is happening in their business. This includes
identifying risks, detecting anomalies, and tracking performance metrics.
5. Customer engagement: Data analytics can be used to gain a deeper understanding of customer
behavior, preferences, and opinions. This information can be used to improve customer
engagement and customer satisfaction.
6. Cost savings: Data analytics can help organizations identify areas where they can improve
efficiency, reduce waste, and lower costs.
7. Compliance: Data analytics can help organizations comply with regulations by tracking and
analyzing data related to key performance metrics, such as safety and security.
In summary, the need for data analytics arises from the need to gain business insights, improve
decision making, and stay competitive in today's data-driven business environment.
Evolution of Analytic Scalability
The evolution of analytic scalability refers to the development of technologies and methods that allow
organizations to analyze increasing amounts of data more efficiently and effectively. This evolution has
been driven by a number of factors, including:
1. Big Data: The growth of data-driven businesses has led to the development of new technologies
and methods for managing, storing, and analyzing big data.
2. Cloud computing: Cloud computing has made it possible for organizations to store and analyze
large amounts of data at a lower cost and with greater flexibility.
3. Artificial intelligence: Artificial intelligence (AI) and machine learning algorithms have made it
possible to analyze data at scale and automate many of the tasks involved in data analysis.
4. Parallel processing: Parallel processing technology allows organizations to distribute data
processing across multiple computers, making it possible to analyze larger data sets in a shorter
amount of time (a toy sketch appears after the stages list below).
5. Real-time analytics: Real-time analytics technologies have made it possible to process and
analyze data in near real-time, which is essential for organizations that need to make data-
driven decisions quickly.
These advancements have allowed organizations to analyze data at an increasingly large scale, which
has led to more accurate and actionable insights. The continued evolution of analytic scalability will
likely result in even more sophisticated and efficient methods for analyzing data in the future.
The evolution of analytic scalability can be divided into the following stages:
1. Traditional Analytics: Early analytics relied on traditional databases and centralized
systems, limiting scalability.
2. Distributed Analytics: The rise of big data led to the development of distributed systems
such as Hadoop and NoSQL databases to handle the increasing volume of data.
3. Cloud Analytics: The adoption of cloud computing has enabled the use of cloud-based
analytics platforms for greater scalability and cost-effectiveness.
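As a toy illustration of the parallel-processing idea, Python's standard multiprocessing module can spread partial computations across cores and then combine the results, MapReduce-style; a single machine stands in for a cluster here.

from multiprocessing import Pool

def summarize(chunk):
    """Compute a partial sum over one chunk of the data."""
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]    # split the data four ways
    with Pool(processes=4) as pool:
        partial = pool.map(summarize, chunks)  # process the chunks in parallel
    print(sum(partial))                        # combine the partial results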
Analytic Process and Tools:
Data analytics involves several steps supported by various tools; an end-to-end sketch follows the lists.
Steps:
 Data Collection: Gathering relevant data from various sources.
 Data Cleaning: Removing or correcting inaccuracies in the data.
 Data Analysis: Using statistical and machine learning models to extract insights.
 Data Visualization: Presenting data in an understandable and actionable format.
Tools:
 Data Collection: Apache Kafka, Google Dataflow.
 Data Cleaning: OpenRefine, Python libraries like pandas.
 Data Analysis: Python, SAS.
 Data Visualization: Tableau, Power BI, Matplotlib in Python.
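A compact sketch of the collect, clean, analyze, visualize steps using pandas and Matplotlib; the table below is made up, and in practice the collection step would read from a file, database or API.

import pandas as pd
import matplotlib.pyplot as plt

# Collection: made-up data standing in for a real source
df = pd.DataFrame({"month": ["Jan", "Feb", "Feb", "Mar"],
                   "sales": [100, None, 120, 140]})

# Cleaning: drop the duplicate month, then fill the missing value with the mean
df = df.drop_duplicates(subset="month")
df["sales"] = df["sales"].fillna(df["sales"].mean())

# Analysis: a simple summary of the cleaned data
print(df["sales"].describe())

# Visualization: present the result as a bar chart
df.plot(x="month", y="sales", kind="bar", legend=False)
plt.ylabel("sales")
plt.show()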
Analysis vs reporting

Definition

 Reporting: The process of collecting, organizing, and presenting data in a structured format,
often in the form of dashboards, summaries, or charts. It focuses on "what happened."
 Analysis: The process of interpreting data, identifying patterns, and drawing insights to explain
"why something happened" and "what could happen next."

Purpose

 Reporting: Provides raw data or aggregated metrics without deep interpretation. It is useful for
tracking key performance indicators (KPIs) and monitoring trends.
 Analysis: Digs deeper into the data to uncover trends, correlations, and insights that can inform
decision-making and strategy.

Process

 Reporting: Involves pulling data from sources and presenting it in structured formats (e.g.,
spreadsheets, dashboards, presentations).
 Analysis: Involves statistical techniques, data modeling, and critical thinking to provide
actionable recommendations.

Tools Used

 Reporting: BI tools (Tableau, Power BI), Excel, SQL queries, automated reports.
 Analysis: Advanced analytics tools (Python, R, SAS), machine learning, predictive modeling,
statistical tests.

Example

 Reporting: A sales report shows that revenue increased by 10% last quarter.
 Analysis: An investigation reveals that the increase was due to a seasonal promotion and a shift
in customer demographics. (A code sketch of this distinction follows.)
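The distinction can be seen in code: the same made-up sales table yields a report (what happened) and a simple analysis (a hint at why).

import pandas as pd

# Made-up quarterly sales records
sales = pd.DataFrame({
    "quarter":   ["Q1", "Q1", "Q2", "Q2"],
    "promotion": [False, False, True, False],
    "revenue":   [100, 120, 160, 110],
})

# Reporting: what happened? Total revenue per quarter.
print(sales.groupby("quarter")["revenue"].sum())

# Analysis: why did it happen? Compare revenue with and without the promotion.
print(sales.groupby("promotion")["revenue"].mean())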

Outcome

 Reporting: Provides visibility into business performance.


 Analysis: Helps drive decision-making by identifying root causes and predicting future trends.

Modern Data Analytic Tools:

Modern data analytics tools help businesses and individuals process, visualize, and analyze large
datasets efficiently. Here are some of the most widely used tools, categorized by purpose:

1. Data Processing & Engineering

 Apache Spark – Distributed computing for big data processing.


 Apache Hadoop – Open-source framework for distributed storage and processing.
 Databricks – Cloud-based data engineering and AI platform.
 Google BigQuery – Serverless data warehouse with fast querying capabilities.

2. Business Intelligence (BI) & Visualization

 Tableau – Powerful data visualization and analytics platform.


 Power BI – Microsoft’s BI tool for interactive dashboards.
 Looker – Google Cloud’s BI tool with deep data exploration.
 Qlik Sense – Self-service BI and analytics platform.

3. Machine Learning & AI

 TensorFlow – Open-source ML framework by Google.


 PyTorch – Popular deep learning framework by Meta.
 H2O.ai – AutoML and AI-driven analytics.
 DataRobot – Automated machine learning platform.

4. SQL & Databases

 Snowflake – Cloud-based data warehousing solution.


 Amazon Redshift – Scalable data warehouse service by AWS.
 MySQL/PostgreSQL – Popular open-source relational databases.
 MongoDB – NoSQL database for flexible data structures.

5. Data Integration & ETL (Extract, Transform, Load)

 Apache Airflow – Workflow orchestration tool for data pipelines.


 Talend – ETL and data integration platform.
 Fivetran – Automated data movement and pipeline management.
 Informatica – Enterprise data integration tool.
6. Programming & Statistical Analysis

 Python (pandas, NumPy, SciPy) – Widely used for data science and analytics.
 R – Statistical computing and graphics.
 MATLAB – Used in technical computing and data analysis.

7. Cloud Data Platforms

 Google Cloud Dataflow – Stream and batch data processing.


 AWS Glue – Serverless data integration and ETL service.
 Microsoft Azure Synapse Analytics – Enterprise analytics service.

Applications of Data Analytics

Data analytics has a wide range of applications across industries. Here are some key areas where it is
used:

1. Business & Marketing

 Customer segmentation: Identifying target audiences for personalized marketing.


 Sales forecasting: Predicting future sales trends based on historical data.
 Churn prediction: Identifying customers likely to leave a service.
 Sentiment analysis: Understanding customer opinions from reviews and social media.

2. Finance & Banking

 Fraud detection: Identifying suspicious transactions in real time.


 Risk management: Assessing financial risks for investments and loans.
 Algorithmic trading: Using data models to automate stock trading.
 Customer credit scoring: Evaluating loan eligibility based on historical data.

3. Healthcare

 Predictive analytics: Forecasting disease outbreaks and patient deterioration.


 Medical imaging analysis: AI-powered diagnostics using MRI and CT scans.
 Personalized medicine: Tailoring treatments based on genetic data.
 Operational efficiency: Optimizing hospital resource allocation.

4. Manufacturing & Supply Chain

 Predictive maintenance: Preventing machine failures before they happen.


 Inventory optimization: Reducing waste by forecasting demand.
 Quality control: Identifying defects through image and sensor analysis.
 Supply chain analytics: Improving logistics and supplier performance.

5. Retail & E-Commerce

 Recommendation engines: Personalizing product suggestions (e.g., Amazon, Netflix).


 Price optimization: Adjusting prices based on demand and competitor pricing.
 Demand forecasting: Ensuring the right products are stocked at the right time.
 Customer behavior analysis: Understanding shopping patterns to improve user experience.

6. Sports & Entertainment

 Performance analysis: Analyzing athlete data to improve training.


 Fan engagement: Personalizing content recommendations for users.
 Game strategy optimization: Using data to develop winning strategies.
 Ticket pricing: Adjusting prices based on demand and competitor events.

7. Government & Public Policy

 Crime prediction: Identifying high-risk areas for better policing.


 Traffic management: Analyzing congestion patterns to optimize city planning.
 Public health monitoring: Detecting disease outbreaks and resource needs.
 Social program evaluation: Assessing the impact of policies and welfare programs.

8. Energy & Environment

 Smart grids: Optimizing electricity distribution based on usage patterns.


 Renewable energy forecasting: Predicting solar and wind power generation.
 Carbon footprint analysis: Tracking and reducing environmental impact.
 Disaster response: Using data to predict and mitigate the effects of natural disasters.

9. Education

 Student performance prediction: Identifying students at risk of failing.


 Personalized learning: Adapting lessons based on student data.
 Resource allocation: Optimizing school funding and staffing.
 Curriculum improvements: Using data to refine teaching methods.

Data Analytics Life Cycle

Need

The Data Analytics Life Cycle is essential for systematically processing and analyzing data to extract
valuable insights, make informed decisions, and solve business problems. It provides a structured
approach to data analysis, ensuring accuracy, efficiency, and repeatability. Here's why it's needed:

1. Structured Problem-Solving:

 It provides a clear roadmap for tackling complex data problems, ensuring that every step is
methodically approached.
 Helps in defining clear objectives and goals, reducing ambiguity and scope creep.

2. Efficient Data Handling:

 Ensures organized data collection, cleaning, and preprocessing, which enhances data quality and
reliability.
 Minimizes errors and biases that could impact analytical outcomes.
3. Improved Decision Making:

 Enables data-driven decision-making, leading to more accurate and strategic business insights.
 Facilitates predictive and prescriptive analytics, helping organizations anticipate trends and
optimize operations.

4. Cost and Time Efficiency:

 By following a systematic approach, resources are better utilized, reducing redundant efforts and
costs.
 Speeds up the data analysis process by streamlining workflows and methodologies.

5. Repeatability and Scalability:

 Provides a repeatable framework that can be reused for similar projects or scaled for larger
datasets.
 Ensures consistency across different data analytics projects.

6. Enhanced Communication and Collaboration:

 Promotes better communication among stakeholders by clearly defining each phase and expected
outcomes.
 Facilitates collaboration among data analysts, data scientists, and business teams.

7. Compliance and Security:

 Helps maintain data governance and compliance with legal regulations and industry standards.
 Ensures data security and privacy throughout the analytics process.

Key Role for Successful Analytics Projects:

Successful analytics projects typically involve a multidisciplinary team with clearly defined roles. Here
are the key roles that are crucial for success:
1. Project Sponsor/Stakeholder

 Role: Provides strategic direction, secures funding, and ensures alignment with business
objectives.
 Responsibilities:
o Define project goals and success metrics.
o Advocate for the project within the organization.
o Make high-level decisions and resolve escalated issues.

2. Project Manager

 Role: Manages timelines, resources, and communication among team members.


 Responsibilities:
o Develop and maintain the project plan.
o Coordinate team activities and stakeholder meetings.
o Monitor progress, budget, and risk management.

3. Data Analyst

 Role: Interprets and analyzes data to provide actionable insights.


 Responsibilities:
o Perform exploratory data analysis.
o Generate reports and dashboards.
o Communicate findings to stakeholders.

4. Data Scientist

 Role: Develops complex models and algorithms to extract insights from data.
 Responsibilities:
o Design and implement predictive and prescriptive models.
o Validate models for accuracy and reliability.
o Collaborate with data engineers for data processing needs.
5. Data Engineer

 Role: Manages data architecture, pipelines, and infrastructure.


 Responsibilities:
o Build and maintain data pipelines.
o Ensure data quality and integrity.
o Optimize database performance and storage.

6. Domain Expert/Subject Matter Expert (SME)

 Role: Provides industry or domain-specific knowledge to contextualize data.


 Responsibilities:
o Help interpret data in the context of business operations.
o Validate assumptions and model outcomes.
o Aid in defining relevant KPIs.

7. Data Architect

 Role: Designs the overall data architecture to ensure scalability and security.
 Responsibilities:
o Define data governance policies.
o Design data storage and retrieval systems.
o Ensure compliance with data security regulations.

8. Business Analyst

 Role: Bridges the gap between business needs and technical solutions.
 Responsibilities:
o Gather and document requirements.
o Ensure alignment between stakeholders and technical teams.
o Translate business needs into data and analytics requirements.
9. Visualization Expert/BI Developer

 Role: Designs and develops dashboards and visualizations.


 Responsibilities:
o Create intuitive and interactive dashboards.
o Ensure data visualization best practices.
o Maintain and optimize BI tools.

10. Data Governance Officer

 Role: Ensures data compliance, security, and governance.


 Responsibilities:
o Implement data governance frameworks.
o Manage data access controls and compliance.
o Ensure data privacy and ethical usage.

Phases of Data Analytics Life Cycle:

1. Discovery: The discovery phase in data analytics is the initial stage where the project team
defines the scope, objectives, and feasibility of the analytics initiative. It involves understanding
the business problem, identifying relevant data sources, and formulating hypotheses to guide the
analysis. This phase sets the foundation for the entire project, ensuring alignment between
stakeholders and the analytics team.
2. Data Preparation: The data preparation phase is a critical step in data analytics where raw
data is cleaned, transformed, and organized to make it suitable for analysis. It is often the most
time-consuming part of the analytics workflow but is essential for ensuring accurate and reliable
results.
3. Model Planning: The model planning phase in data analytics is where the team designs the
analytical approach and selects suitable models and algorithms to address the business problem.
This phase involves defining the modeling techniques, evaluating different algorithms, and
creating a blueprint for the model development process.
4. Model Building: The model building phase is where the selected algorithms are implemented,
trained, and optimized using the prepared data. This phase involves coding, training models,
tuning hyperparameters, and testing performance to ensure the best possible outcome.
5. Communicating Insights: The communicating insights phase is the final and crucial stage of
the data analytics lifecycle. In this phase, the team translates analytical findings into actionable
insights and effectively communicates them to stakeholders. The goal is to present data-driven
recommendations clearly and compellingly, enabling informed decision-making.
6. Operationalization: The operationalization phase is the stage where developed models and
insights are deployed into production environments to deliver business value. This phase ensures
that analytical solutions are integrated into business processes, monitored for performance, and
maintained for long-term use.
