Statistical Thinking in Python II: Welcome To The Course!

This document provides an overview of the course "Statistical Thinking in Python II". The course teaches students to estimate parameters, compute confidence intervals, perform linear regressions, and test hypotheses using Python. It relies on "hacker statistics": simulating outcomes directly to compute probabilities, an approach that applies broadly from a few principles. Topics include optimal parameter estimation, checking the Normality of data, and least squares linear regression. Anscombe's quartet is used to demonstrate the importance of exploratory data analysis (EDA).

STATISTICAL THINKING IN PYTHON II

Welcome to the course!

You will be able to

Estimate parameters
Compute confidence intervals
Perform linear regressions
Test hypotheses

with real data!

We use hacker statistics

Literally simulate probability


Broadly applicable with a few principles
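
As a minimal sketch of what "simulating probability" can look like, the snippet below estimates the chance of getting four heads in four fair coin flips by repeating the experiment many times. The number of trials and the seed are arbitrary choices made for this illustration, not values from the course.

```python
import numpy as np

# Seeded generator so the sketch is reproducible; the seed is arbitrary.
rng = np.random.default_rng(42)

n_trials = 100_000  # number of simulated experiments (illustrative choice)

# Each row is one experiment of 4 fair coin flips (True means heads).
flips = rng.random((n_trials, 4)) < 0.5

# Fraction of experiments in which all 4 flips came up heads.
p_all_heads = np.mean(flips.all(axis=1))

# The estimate should land close to the exact value 0.5**4 = 0.0625.
print(p_all_heads)
```

The same pattern (simulate, count, divide) carries over to problems where no closed-form answer is handy.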

Statistical analysis of the beak of the finch

Geospiza fortis

Geospiza scandens

Source: John Gould, public domain


Let's start thinking statistically!

Optimal parameters

Histogram of Michelson's measurements

Data: Michelson, 1880



CDF of Michelson's measurements

Data: Michelson, 1880



Checking Normality of Michelson data


In [1]: import numpy as np

In [2]: import matplotlib.pyplot as plt

In [3]: mean = np.mean(michelson_speed_of_light)

In [4]: std = np.std(michelson_speed_of_light)

In [5]: samples = np.random.normal(mean, std, size=10000)
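
One way to carry the check through is to compare the empirical CDF (ECDF) of the data with the ECDF of the Normal samples. The sketch below assumes a helper named `ecdf` (hypothetical here, defined in the snippet) and uses synthetic numbers as a stand-in for `michelson_speed_of_light`, since the real measurements are not included in this document.

```python
import numpy as np

def ecdf(data):
    """Return sorted values x and cumulative fractions y of the empirical CDF."""
    x = np.sort(data)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

rng = np.random.default_rng(0)

# Synthetic stand-in for michelson_speed_of_light (NOT the real measurements).
measurements = rng.normal(299_850, 80, size=100)

# Same steps as the session above: estimate parameters, then draw samples.
mean = np.mean(measurements)
std = np.std(measurements)
samples = rng.normal(mean, std, size=10_000)

x, y = ecdf(measurements)
x_theor, y_theor = ecdf(samples)

# Largest vertical gap between the two ECDFs, evaluated at the data points.
# A small gap is consistent with Normality but does not prove it.
y_theor_at_x = np.searchsorted(x_theor, x, side="right") / len(x_theor)
max_gap = np.max(np.abs(y - y_theor_at_x))
print(max_gap)
```

Plotting `x, y` as points and `x_theor, y_theor` as a line gives the overlaid CDF comparison that the slides show as figures.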



CDF of Michelson's measurements

Data: Michelson, 1880


CDF with bad estimate of standard deviation

Data: Michelson, 1880



CDF with bad estimate of mean

Data: Michelson, 1880



Optimal parameters

Parameter values that bring the model in closest agreement with the data
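
"Closest agreement" can be made precise in more than one way. One common choice, used here as an assumption rather than as the course's stated definition, is maximum likelihood: for a Normal model, the sample mean and the standard deviation computed with `np.std` (default `ddof=0`) maximize the likelihood. The sketch below checks this on synthetic stand-in data by comparing against deliberately bad parameter values.

```python
import numpy as np

def normal_log_likelihood(data, mu, sigma):
    """Log-likelihood of data under a Normal(mu, sigma) model."""
    n = len(data)
    return (-n * np.log(sigma * np.sqrt(2 * np.pi))
            - np.sum((data - mu) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(1)

# Synthetic stand-in data (NOT Michelson's measurements).
data = rng.normal(299_850, 80, size=100)

mu_hat = np.mean(data)
sigma_hat = np.std(data)  # ddof=0, the maximum likelihood estimate

ll_opt = normal_log_likelihood(data, mu_hat, sigma_hat)
ll_bad_mean = normal_log_likelihood(data, mu_hat + 50, sigma_hat)
ll_bad_std = normal_log_likelihood(data, mu_hat, sigma_hat * 2)

# The estimates from the data score higher than either perturbed choice.
print(ll_opt, ll_bad_mean, ll_bad_std)
```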

Mass of MA largemouth bass

CDF for "optimal" parameters of a bad model

Source: Mass. Dept. of Environmental Protection



Packages to do statistical inference


scipy.stats

statsmodels

hacker stats with numpy

Knife image: D-M Commons, CC BY-SA 3.0


Let's practice!

Linear regression by least squares

2008 US swing state election results

Data retrieved from Data.gov (https://www.data.gov/)

2008 US swing state election results

slope

intercept

Data retrieved from Data.gov (https://www.data.gov/)

2008 US swing state election results

Data retrieved from Data.gov (https://www.data.gov/)

Residuals

residual

Data retrieved from Data.gov (https://www.data.gov/)

Least squares

The process of finding the parameters for which the sum of the squares of the residuals is minimal
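
To make the definition concrete, the sketch below computes the sum of squared residuals for a line and confirms that, with the intercept held at its fitted value, no nearby slope does better than the fitted one. The data are synthetic and the helper name `sum_of_squared_residuals` is introduced only for this illustration.

```python
import numpy as np

def sum_of_squared_residuals(x, y, slope, intercept):
    """Sum of squared vertical distances between the data and the line."""
    residuals = y - (slope * x + intercept)
    return np.sum(residuals ** 2)

rng = np.random.default_rng(2)

# Synthetic data scattered around the line y = 2x + 1 (illustrative only).
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, size=x.size)

# Least squares fit of a first-degree polynomial.
slope, intercept = np.polyfit(x, y, 1)
rss_best = sum_of_squared_residuals(x, y, slope, intercept)

# Scan slopes around the fitted value, keeping the fitted intercept fixed.
trial_slopes = np.linspace(slope - 1, slope + 1, 201)
rss_trials = np.array([sum_of_squared_residuals(x, y, s, intercept)
                       for s in trial_slopes])
best_trial = trial_slopes[np.argmin(rss_trials)]

print(slope, best_trial)  # the scan's minimum sits at the fitted slope
```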

Least squares with np.polyfit()


In [1]: slope, intercept = np.polyfit(total_votes,
   ...:                               dem_share, 1)

In [2]: slope
Out[2]: 4.0370717009465555e-05

In [3]: intercept
Out[3]: 40.113911968641744
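
For a straight line, the least squares slope and intercept also have a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. The sketch below checks that `np.polyfit` agrees with these formulas. The arrays `votes` and `share` are synthetic stand-ins for `total_votes` and `dem_share`, not the election data.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-ins for total_votes and dem_share (NOT the real data).
votes = rng.uniform(0, 500_000, size=200)
share = 40.0 + 4e-5 * votes + rng.normal(0, 10.0, size=200)

# Fit with np.polyfit, as in the session above.
slope, intercept = np.polyfit(votes, share, 1)

# Closed-form least squares estimates for a straight line.
dx = votes - votes.mean()
dy = share - share.mean()
slope_cf = np.sum(dx * dy) / np.sum(dx ** 2)
intercept_cf = share.mean() - slope_cf * votes.mean()

print(slope, slope_cf)
print(intercept, intercept_cf)
```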
Let's practice!

The importance of EDA: Anscombe's quartet

Anscombe's quartet

Data: Anscombe, The American Statistician, 1973



Look before you leap!

Do graphical EDA first
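
The reason graphical EDA matters here can be checked numerically: fitting a line to each of the four data sets gives nearly the same slope and intercept, even though the data sets look very different when plotted. The values below are the quartet as commonly reproduced from Anscombe (1973); treat them as an assumption and verify against the original paper before relying on them.

```python
import numpy as np

# Anscombe's quartet as commonly reproduced; verify against the 1973 paper.
x123 = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
x4 = np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float)
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])
y3 = np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73])
y4 = np.array([6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89])

quartet = [(x123, y1), (x123, y2), (x123, y3), (x4, y4)]

fits = []
for x, y in quartet:
    # Least squares line for this data set.
    slope, intercept = np.polyfit(x, y, 1)
    fits.append((slope, intercept))
    print(f"mean x = {np.mean(x):.2f}, mean y = {np.mean(y):.2f}, "
          f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

The summary numbers alone cannot distinguish the four data sets; only a plot of each one reveals the curvature and the outliers.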


Anscombe's quartet

Data: Anscombe, The American Statistician, 1973



Let's practice!
