FHA UNIT 1 INTRODUCTION

This document provides an introduction to data and its significance in the information age, detailing types of data, sources, and the journey of data science. It also covers key concepts in statistics, including variables, populations, samples, and types of statistics such as descriptive and inferential statistics. Additionally, it highlights the applications of computational science and the tools and skills necessary for data analysis and machine learning.

Uploaded by

ancy rodhan

UNIT 1 INTRODUCTION

Introduction:
We are frequently reminded of the fact that we are living in the information age.
Appropriately, then, this book is about information—how it is obtained, how it is analyzed, and
how it is interpreted. The information about which we are concerned we call data, and the data
are available to us in the form of numbers.

Basic Concepts:
What is data?
Data is a collection of facts, such as numbers, words, measurements, and observations.
Types of Data:
1. Structured Data: highly organized (for example, spreadsheets and databases)
2. Unstructured Data: no regular structure (emails, social media posts, online blogs,
newspapers, books, and scientific publications)
3. Big Data: may combine structured (databases), semi-structured (XML, HTML), and
unstructured (photo, video) data
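The difference between these levels of organization can be sketched in a few lines of Python. The patient records, field names, and clinical note below are invented for illustration and do not come from any real dataset.

```python
import csv
import io
import json

# Structured: fixed columns, as in a spreadsheet or database table.
csv_text = "patient_id,age,weight_kg\n1,34,70.5\n2,51,82.0\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing keys, but records may differ in shape.
json_text = '[{"patient_id": 1, "allergies": ["penicillin"]}, {"patient_id": 2}]'
records = json.loads(json_text)

# Unstructured: free text with no predefined fields.
note = "Patient 1 reports mild headache since Monday; no fever."

print(rows[0]["age"])               # every row has the same columns
print(records[1].get("allergies"))  # a key may be missing in some records
print(len(note.split()))            # free text must be parsed before analysis
```

Structured data can be queried by column name directly, whereas the semi-structured and unstructured forms need extra handling (missing keys, text parsing) before analysis.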
What are the sources of Data?
1. Routinely kept records
2. Surveys
3. Experiments
4. External sources (already existing data)

What is the difference between data, information, and knowledge?

Big Data goes beyond the simple concept of the data type or volume used. It also integrates
(i) analytical techniques (e.g., machine learning),
(ii) technologies that make it possible (e.g., parallel and cloud computing),
(iii) modern visualization solutions (e.g., interactive graphing and infographics), and
(iv) Big Data processing that applies a specialized combination of scientific approaches
(e.g., statistics, mathematics, informatics, and background knowledge in a specific area).

Storage - Data at Rest

Transfer - Data in Transit
Data in transit is digital data that is being exchanged between computing machines at the
moment of the transfer.
Protocols and technologies used for data in transit include Secure File Transfer Protocol (SFTP),
Hypertext Transfer Protocol Secure (HTTPS), Off-the-Record Messaging (OTR), peer-to-peer
communication (P2P), and, for data encryption, Secure Sockets Layer (SSL) and its successor,
Transport Layer Security (TLS).
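As a minimal sketch of how data in transit is protected in practice, Python's standard `ssl` module can build a client-side context for TLS, the successor to SSL. The choice of TLS 1.2 as the minimum version is an assumption made for this example, not a requirement stated in these notes.

```python
import ssl

# Build a client-side TLS context with the standard library defaults:
# certificate verification and hostname checking are both enabled.
context = ssl.create_default_context()

# Assumption for this sketch: refuse protocol versions older than TLS 1.2.
context.minimum_version = ssl.TLSVersion.TLSv1_2

print(context.verify_mode == ssl.CERT_REQUIRED)
print(context.check_hostname)
```

Such a context can then be passed to a client call, for example `urllib.request.urlopen(url, context=context)`, so that the exchanged data is encrypted while it travels between machines.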
Process - Data in Use
Data in use is digital data that is being actively processed by computer applications at a given
moment. That data is temporarily stored in random-access memory (RAM), processor registers, or
hardware cache.
Journey of Data Science:
• 1962: John Tukey writes The Future of Data Analysis, in which he envisions a new field for
learning insights from data
• 1977: Tukey publishes the book Exploratory Data Analysis, which is a key part of data science
today
• 1991: Guido van Rossum publishes the Python programming language online for the first time;
it goes on to become the top data science language in use at the time of writing
• 1993: The R programming language is publicly released; it goes on to become the second
most-used general-purpose data science language
• 2008: Jeff Hammerbacher and DJ Patil use the term "data scientist" in job postings after trying to
come up with a good job title for their work
• 2010: Kaggle.com launches as an online data science community and data science competition
website
• 2010s: Universities begin offering master's and bachelor's degrees in data science; data science job
postings explode to new heights year after year; big breakthroughs are made in deep learning; the
number of data science software libraries and publications burgeons
• 2015: TensorFlow (a deep learning and machine learning library) is released
• 2018: Google releases Cloud AutoML, democratizing a new automatic technique for machine
learning and data science
• 2020: Amazon SageMaker Studio is released, a cloud tool for building, training,
deploying, and analyzing machine learning models

Data Science Competitions:

• Kaggle ($10K)
• Analytics Vidhya
• HackerRank
• DrivenData (focused on social justice)
• AIcrowd
A couple of websites that list data science competitions are:
ods.ai
www.mlcontests.com
The top data science tools and skills:
• Programming languages: Python, R
• Libraries: Pandas, NumPy, Scikit-learn
• Databases: SQL, NoSQL
• Big Data tools: Hadoop, Spark
• GUI tools: Excel, GraphPad Prism
• Cloud tools:
  • Amazon Web Services (AWS) (general purpose)
  • Google Cloud Platform (GCP) (general purpose)
  • Microsoft Azure (general purpose)
  • IBM (general purpose)
  • Databricks (data science and AI platform)
  • Snowflake (data warehousing)
Statistical methods and math:
• Exploratory analysis statistics (exploratory data analysis, or EDA), like statistical plotting
and aggregate calculations such as quantiles
• Statistical tests and their principles, like p-values, chi-squared tests, t-tests, and ANOVA
• Machine learning modeling, including regression, classification, and clustering methods
• Probability and statistical distributions, like Gaussian and Poisson distributions
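The aggregate calculations mentioned for exploratory analysis can be tried with Python's standard `statistics` module. The admissions figures below are hypothetical values chosen only to illustrate the calls.

```python
import statistics

# Hypothetical sample: daily hospital admissions over two weeks.
admissions = [3, 5, 2, 8, 6, 4, 7, 5, 3, 6, 9, 4, 5, 6]

# Aggregate calculations commonly used in exploratory data analysis.
q1, q2, q3 = statistics.quantiles(admissions, n=4)  # quartile cut points
mean = statistics.mean(admissions)

print("mean:", mean)
print("quartiles:", q1, q2, q3)
print("interquartile range:", q3 - q1)
```

The middle quartile `q2` is the median, and the interquartile range `q3 - q1` gives a spread measure that is less sensitive to extreme values than the full range.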
Software development:
Python, Git & GitHub, Docker and Kubernetes
Communication and presentation:
Using a Jupyter Notebook to create a presentation allows one to actively demo Python or other
code during the presentation, unlike classic presentation software
Scope of the field:

Applications of Computational Science:

Data analysis: used to analyze large datasets to extract meaningful insights and patterns
Modeling and simulation: used to develop and analyze mathematical models of complex systems
Machine learning: used to develop and apply algorithms that automatically learn from data and
make predictions
Optimization: used to find the best or the most efficient or robust solution to a problem
Visualization: used to create visual representations of data and models

Data :
The raw material of statistics is data. For our purposes we may define data as numbers.
The two kinds of numbers that we use in statistics are numbers that result from the taking—in
the usual sense of the term—of a measurement, and those that result from the process of
counting. For example, when a nurse weighs a patient or takes a patient’s temperature, a
measurement, consisting of a number such as 150 pounds or 100 degrees Fahrenheit, is
obtained. Quite a different type of number is obtained when a hospital administrator counts the
number of patients—perhaps 20—discharged from the hospital on a given day. Each of the
three numbers is a datum, and the three taken together are data.
Statistics
Statistics is a field of study concerned with the collection, organization, summarization,
and analysis of data; and the drawing of inferences about a body of data when only a part of
the data is observed. The person who performs these statistical activities must be
prepared to interpret and to communicate the results to someone else as the situation demands.
Simply put, we may say that data are numbers, numbers contain information, and the purpose
of statistics is to investigate and evaluate the nature and meaning of this information.
Biostatistics
The tools of statistics are employed in many fields—business, education, psychology,
agriculture, and economics, to mention only a few. When the data analyzed are derived from
the biological sciences and medicine, we use the term biostatistics to distinguish this particular
application of statistical tools and concepts.
Variable -If, as we observe a characteristic, we find that it takes on different values in different
persons, places, or things, we label the characteristic a variable. Some examples of variables
include diastolic blood pressure, heart rate, the heights of adult males, the weights of preschool
children, and the ages of patients seen in a clinic.
1.Quantitative Variables -A quantitative variable is one that can be measured in the usual
sense. We can, for example, obtain measurements on the heights of adult males, the weights of
preschool children, and the ages of patients seen in a dental clinic. These are examples of
quantitative variables.
2.Qualitative Variables -Some characteristics are not capable of being measured in the sense
that height, weight, and age are measured. Many characteristics can be categorized only, as, for
example, when an ill person is given a medical diagnosis, a person is designated as belonging
to an ethnic group, or a person, place, or object is said to possess or not to possess some
characteristic of interest. In such cases measuring consists of categorizing. We refer to variables
of this kind as qualitative variables.

3.Random Variable -Whenever we determine the height, weight, or age of an individual, the
result is frequently referred to as a value of the respective variable. When the values obtained
arise as a result of chance factors, so that they cannot be exactly predicted in advance, the
variable is called a random variable. An example of a random variable is adult height. When a
child is born, we cannot predict exactly his or her height at maturity. Attained adult height is
the result of numerous genetic and environmental factors. Values resulting from measurement
procedures are often referred to as observations or measurements.
4.Discrete Random Variable -Variables may be characterized further as to whether they are
discrete or continuous. A discrete variable is characterized by gaps or interruptions in the values
that it can assume. The number of daily admissions to a general hospital is a discrete random
variable since the number of admissions each day must be represented by a whole number,
such as 0, 1, 2, or 3. The number of admissions on a given day cannot be a number such as 1.5,
2.997, or 3.333.
5.Continuous Random Variable -A continuous random variable does not possess the gaps or
interruptions characteristic of a discrete random variable. A continuous random variable can
assume any value within a specified relevant interval of values assumed by the variable.
Examples of continuous variables include the various measurements that can be made on
individuals such as height, weight, and skull circumference. No matter how close together the
observed heights of two people, for example, we can, theoretically, find another person whose
height falls somewhere in between.
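The kinds of variables described above can be imitated with a short Python sketch. The diagnoses, the range of admissions (0 to 10), and the height interval (150 to 200 cm) are assumptions made purely for illustration, and the uniform distributions are a simplification rather than a realistic model.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Qualitative variable: values are categories, so we summarize by counting.
diagnoses = ["flu", "asthma", "flu", "diabetes", "flu", "asthma"]
print(Counter(diagnoses).most_common(1))

# Discrete random variable: daily admissions take whole-number values only.
admissions = [rng.randint(0, 10) for _ in range(7)]

# Continuous random variable: heights can take any value in an interval.
heights_cm = [rng.uniform(150.0, 200.0) for _ in range(7)]

print(admissions)
print([round(h, 1) for h in heights_cm])

# Between any two distinct heights there is always another possible value.
a, b = sorted(heights_cm)[:2]
midpoint = (a + b) / 2
print(a < midpoint < b)
```

The last check mirrors the remark about heights: however close two observed values of a continuous variable are, a value in between is still possible, which is not true of the whole-number admissions counts.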
Population -A population is a collection of entities. It need not consist of people; it may consist
of animals, machines, places, or cells. For our purposes, we define a population of entities as the
largest collection of entities for which we have an interest at a particular time, and a population
of values as the largest collection of values of a random variable for which we have an interest at
a particular time. If, for example, we are interested in
the weights of all the children enrolled in a certain county elementary school system, our
population consists of all these weights. If our interest lies only in the weights of first-grade
students in the system, we have a different population: the weights of first-grade students enrolled
in the school system. Hence, populations are determined or defined by our sphere of interest.
Populations may be finite or infinite. If a population of values consists of a fixed number of
these values, the population is said to be finite. If, on the other hand, a population consists of
an endless succession of values, the population is an infinite one.
Sample -A sample may be defined simply as a part of a population. Suppose our population
consists of the weights of all the elementary school children enrolled in a certain county school
system. If we collect for analysis the weights of only a fraction of these children, we have only
a part of our population of weights, that is, we have a sample.
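The relationship between a population and a sample can be sketched in Python. The population here is simulated: the 1,000 weights, their mean of 30 kg, and their spread of 5 kg are invented numbers, not measurements from any school system.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical finite population: weights (kg) of 1,000 schoolchildren.
population = [rng.gauss(30.0, 5.0) for _ in range(1000)]

# A sample is a part of the population: here, 50 weights drawn at random
# without replacement.
sample = rng.sample(population, k=50)

print("population mean:", round(statistics.mean(population), 2))
print("sample mean:    ", round(statistics.mean(sample), 2))
```

The sample mean will usually be close to, but not exactly equal to, the population mean, which is why conclusions about a population drawn from a sample involve some uncertainty.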

Types Of Statistics:
Consider an example of a book dealing with a lot of information. The objectives of this
book are twofold: (1) to teach the student to organize and summarize data, and (2) to teach the
student how to reach decisions about a large body of data by examining only a small part of it.
The concepts and methods necessary for achieving the first objective are presented under the
heading of descriptive statistics, and the second objective is reached through the study of what
is called inferential statistics.

Definition of Descriptive statistics:

Descriptive statistics summarize and organize data to make it easier to understand. Instead of
presenting raw data, descriptive statistics help describe, show, or summarize data meaningfully.
Types of Descriptive Statistics:
Measures of Central Tendency: Describe the central point of a dataset (mean,
median, and mode).
Measures of Dispersion (Spread): Describe the variability or spread of data points
(range, variance, and standard deviation).
Measures of Shape: Describe the shape of the distribution of data (skewness and
kurtosis).
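These measures can be computed on a small dataset as follows. This is a minimal sketch: the eight values are made up, the dispersion measures use the population formulas (dividing by n), and skewness and kurtosis are computed with the simple moment-based definitions, which is only one of several conventions in use.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of dispersion (population formulas, treating `data` as the
# whole population rather than a sample)
value_range = max(data) - min(data)
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

# Measures of shape: moment-based skewness and (non-excess) kurtosis
skewness = sum((x - mean) ** 3 for x in data) / (n * std_dev ** 3)
kurtosis = sum((x - mean) ** 4 for x in data) / (n * std_dev ** 4)

print("mean, median, mode:", mean, median, mode)
print("range, variance, std dev:", value_range, variance, std_dev)
print("skewness, kurtosis:", skewness, kurtosis)
```

A positive skewness indicates a longer tail on the right, as with the value 9 here. With this definition of kurtosis, a normal distribution has a value of 3.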
