CS3361 - Data Science Lab Manual-1
Name :_____________________
Register No :_____________________
External Examiner:
Vision of Institution
To build Jeppiaar Engineering College as an Institution of Academic Excellence in Technical
education and Management education and to become a World Class University.
Mission of Institution
M3 To equip students with values, ethics and life skills needed to enrich their lives and enable them to meaningfully contribute to the progress of society
M4 To prepare students for higher studies and lifelong learning, enrich them with the
practical and entrepreneurial skills necessary to excel as future professionals and
contribute to Nation’s economy
Vision of Department
To emerge as a globally prominent department, developing ethical computer professionals,
innovators and entrepreneurs with academic excellence through quality education and research.
Mission of Department
M3 To produce engineers with good professional skills, ethical values and life skills for the
betterment of the society.
PEO1 To address the real time complex engineering problems using innovative approach
with strong core computing skills.
PEO3 Apply ethical knowledge for professional excellence and leadership for the
betterment of the society.
PEO4 Develop life-long learning skills needed for better employment and
entrepreneurship
PSO1 An ability to understand the core concepts of computer science and engineering and to enrich problem solving skills to analyze, design and implement software and hardware based systems of varying complexity.
PSO2 To interpret real-time problems with analytical skills and to arrive at cost effective and optimal solutions using advanced tools and techniques.
COURSE OUTCOMES:
CO2: Make use of the basic Statistical and Probability measures for data science.
EX NO:
DOWNLOAD, INSTALL AND EXPLORE THE FEATURES OF NUMPY, SCIPY, JUPYTER, STATSMODELS AND PANDAS PACKAGES
AIM: To download, install and explore the features of the numpy, scipy, jupyter, statsmodels and pandas packages.
i. NUMPY PACKAGE
Description:
NumPy is the fundamental package needed for scientific computing with Python.
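A minimal sketch of the kind of exploration expected here (the array values are illustrative, not from the source):
# Install (if needed): pip install numpy
import numpy as np
print(np.__version__)              # confirm the installed version
a = np.array([1, 2, 3, 4])         # create a small NumPy array
print(a.dtype, a.shape, a.mean())  # inspect its type, shape and mean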
OUTPUT:
ii. SCIPY PACKAGE
Description
SciPy is an open-source Python library for scientific and technical computing. Built on NumPy, it provides modules for optimization, integration, interpolation, linear algebra, signal processing and statistics.
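A minimal sketch of exploring SciPy after installation (the sample data below is illustrative):
# Install (if needed): pip install scipy
import scipy
from scipy import stats
print(scipy.__version__)                          # confirm the installed version
print(stats.describe([2, 4, 4, 4, 5, 5, 7, 9]))   # basic descriptive statistics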
OUTPUT:
iii. JUPYTER PACKAGE
Description
Jupyter Notebook is an open-source web application that allows users to create and share documents containing live code, equations, visualizations and narrative text.
OUTPUT:
iv. STATSMODELS PACKAGE
Description
Statsmodels is a Python module that allows users to explore data, estimate statistical models, and
perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting
functions, and result statistics are available for different types of data and each estimator.
Researchers across fields may find that statsmodels fully meets their needs for statistical
computing and data analysis in Python.
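As a brief, hedged illustration of estimating a model with statsmodels (the data below is synthetic, not from the source):
import numpy as np
import statsmodels.api as sm
x = np.arange(10)                     # synthetic predictor
y = 2 * x + 1 + np.random.randn(10)   # synthetic response with noise
X = sm.add_constant(x)                # add an intercept term
model = sm.OLS(y, X).fit()            # ordinary least squares fit
print(model.summary())                # estimation results and statistical tests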
OUTPUT:
v. PANDAS PACKAGE
Description
Pandas is an open-source Python library that provides fast, flexible data structures (Series and DataFrame) and tools for data analysis and manipulation.
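A minimal sketch of exploring pandas (column names and values are illustrative):
import pandas as pd
print(pd.__version__)                 # confirm the installed version
df = pd.DataFrame({"name": ["A", "B", "C"], "marks": [78, 85, 91]})
print(df.head())                      # first rows of the DataFrame
print(df.describe())                  # summary statistics of numeric columns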
OUTPUT:
RESULT:
Thus the download, installation and exploration of the features of the numpy, scipy, jupyter, statsmodels and pandas packages were completed successfully.
EX NO:
WORKING WITH NUMPY ARRAYS
AIM: To work with NumPy arrays.
PROCEDURE:
PROGRAM
import numpy
numpy.__version__
OUTPUT:
'1.21.5'
import numpy as np
# Array creation and attribute prints reconstructed to match the output shown below (values are random)
x3 = np.random.randint(10, size=(3, 4, 5))   # three-dimensional array
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)
print("dtype:", x3.dtype)
print("itemsize:", x3.itemsize, "bytes")
OUTPUT:
x3 ndim: 3
x3 shape: (3, 4, 5)
x3 size: 60
dtype: int32
itemsize: 4 bytes
i. Indexing of Arrays
Getting and setting the value of individual array elements.
Array Indexing: Accessing Single Elements
If you are familiar with Python’s standard list indexing, indexing in NumPy will feel quite
familiar. In a one-dimensional array, you can access the ith value (counting from zero) by
specifying the desired index in square brackets, just as with Python lists.
To index from the end of the array, you can use negative indices.
import numpy as np
arr=np.array([5, 0, 3, 3, 7, 9])
print(arr[2])
print(arr[-1])
print(arr[5])
print(arr[-2])
OUTPUT:
3
9
9
7
Array Indexing: Multidimensional Array
import numpy as np
# two-dimensional array (assumed; the same array is reused in the slicing example below)
arr = np.array([[12, 5, 2, 4], [7, 6, 8, 8], [1, 6, 7, 7]])
print(arr[0,0])
print(arr[2,-2])
OUTPUT:
12
7
ii. Slicing of Arrays
Getting and setting smaller subarrays within a larger array, using the slice notation arr[start:stop:step].
Array Slicing: One-dimensional Array
import numpy as np
arr=np.array([0,1,2,3,4,5,6,7,8,9])
arr1=np.arange(10)
print(arr[1:5])
print(arr1[2:7])
print(arr[5:])
print(arr[:5])
print(arr1[-3:-1])
print(arr1[::2])
print(arr[0::3])
print(arr1[::-1]) # all elements, reversed
OUTPUT:
[1 2 3 4]
[2 3 4 5 6]
[5 6 7 8 9]
[0 1 2 3 4]
[7 8]
[0 2 4 6 8]
[0 3 6 9]
[9 8 7 6 5 4 3 2 1 0]
Array Slicing: Multidimensional Subarrays
import numpy as np
# two-dimensional array (assumed; the slice statements below are reconstructed to match the output shown)
arr = np.array([[12, 5, 2, 4], [7, 6, 8, 8], [1, 6, 7, 7]])
print(arr[:2, :3])      # two rows, three columns
print(arr[:3, ::2])     # all rows, every other column
print(arr[::-1, ::-1])  # rows and columns reversed together
OUTPUT:
[[12 5 2]
[ 7 6 8]]
[[12 2]
[ 7 8]
[ 1 7]]
[[ 7 7 6 1]
[ 8 8 6 7]
[ 4 2 5 12]]
iii. Reshaping of Arrays
Changing the shape of a given array. Another useful type of operation is reshaping of arrays. The
most flexible way of doing this is with the reshape() method. For example, if you want to put the
numbers 0 through 8 in a 3×3 grid, you can do the following:
import numpy as np
grid = np.arange(0, 9)
print(grid)
print(grid.reshape(3,3))
OUTPUT:
[0 1 2 3 4 5 6 7 8]
[[0 1 2]
[3 4 5]
[6 7 8]]
EX NO:
WORKING WITH PANDAS DATA FRAMES
AIM: To work with Pandas DataFrames.
PROCEDURE:
Example 1
Step 1: Import the necessary library - `import pandas as pd`.
Step 2: Create a list of tuples named `employees` with employee data.
Step 3: Create a DataFrame `df` from the list of tuples with columns specified as 'Name', 'Age', 'City', and
'Salary' using `pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Salary'])`.
Step 4: Select the 'City' column from the DataFrame using `result = df["City"]`.
Step 5: Print the result using `print(result)`.
Example 2 :
Step 1: Import the necessary library - `import pandas as pd`.
Step 2: Create a list of tuples named `employees` with employee data.
Step 3: Create a DataFrame `df` from the list of tuples with columns specified as 'Name', 'Age', 'City', and
'Salary' using `pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Salary'])`.
Step 4: Select multiple columns ('Name', 'Age', 'Salary') from the DataFrame using `result = df[["Name",
"Age", "Salary"]]`.
Step 5: Print the result using `print(result)`.
PROGRAM:
import pandas as pd
data={"calories":[420,380,390],"duration":[50,40,45]}
df=pd.DataFrame(data) #Load data into DataFrame object
print(df)
print(df.loc[0]) #Refer to the row index
print(df.loc[[0,1]]) #Use a list of indexes
OUTPUT:
calories duration
0 420 50
1 380 40
2 390 45
calories 420
duration 50
Name: 0, dtype: int64
calories duration
0 420 50
1 380 40
import pandas as pd
data={"calories":[420,380,390],"duration":[50,40,45]}
df=pd.DataFrame(data,index=["day1","day2","day3"])
print(df)
OUTPUT:
calories duration
day1 420 50
day2 380 40
day3 390 45
b) Locate Named Indexes
Use the named index in the loc attribute to return the specified row(s).
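The program below rebuilds a DataFrame from a dictionary; as a short sketch of locating a named index itself, assuming the day-indexed DataFrame created just above is still available:
print(df.loc["day2"])            # returns the row labelled "day2" as a Series
print(df.loc[["day1", "day3"]])  # a list of labels returns a DataFrame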
import pandas as pd
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],
'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
print(df)
# select two columns
print(df[['Name', 'Qualification']])
OUTPUT:
Name Age Address Qualification
0 Jai 27 Delhi Msc
1 Princi 24 Kanpur MA
2 Gaurav 22 Allahabad MCA
3 Anuj 32 Kannauj Phd
Name Qualification
0 Jai Msc
1 Princi MA
2 Gaurav MCA
3 Anuj Phd
As is evident from the output, the keys of the dictionary are converted into the columns of the DataFrame, whereas the elements of the lists become its rows.
import pandas as pd
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],
'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
# Adding index value explicitly
df = pd.DataFrame(data,index=['Rollno1','Rollno2','Rollno3','Rollno4'])
print(df)
OUTPUT:
Name Age Address Qualification
Rollno1 Jai 27 Delhi Msc
Rollno2 Princi 24 Kanpur MA
Rollno3 Gaurav 22 Allahabad MCA
Rollno4 Anuj 32 Kannauj Phd
import pandas as pd
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],
'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
df = pd.DataFrame.from_dict(data) #from_dict() function
print(df)
OUTPUT:
Name Age Address Qualification
0 Jai 27 Delhi Msc
1 Princi 24 Kanpur MA
2 Gaurav 22 Allahabad MCA
3 Anuj 32 Kannauj Phd
Column selection
In order to select a column in a Pandas DataFrame, we can access the columns by calling them by
their column names.
import pandas as pd
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],
'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
# select two columns
print(df[['Name', 'Qualification']])
OUTPUT:
Name Qualification
0 Jai Msc
1 Princi MA
2 Gaurav MCA
3 Anuj Phd
Column Addition
In order to add a column to a Pandas DataFrame, we can declare a new list as a column and add it to
an existing DataFrame.
import pandas as pd
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],
'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
# Declare a list that is to be converted into a column
address=['Delhi', 'Kanpur', 'Allahabad', 'Kannauj']
df['Address']=address
# display the DataFrame with the newly added column
print(df)
OUTPUT:
Name Age Qualification Address
0 Jai 27 Msc Delhi
1 Princi 24 MA Kanpur
2 Gaurav 22 MCA Allahabad
3 Anuj 32 Phd Kannauj
v. Indexing and selecting data in Pandas DataFrame using [ ], loc & iloc
Indexing in Pandas means selecting rows and columns of data from a Dataframe. It can be
selecting all the rows and the particular number of columns, a particular number of rows, and all
the columns, or a particular number of rows and columns each. Indexing is also known as
subset selection.
import pandas as pd
# List of Tuples
employees = [('Stuti', 28, 'Varanasi', 20000),
('Saumya', 32, 'Delhi', 25000),
('Aaditya', 25, 'Mumbai', 40000),
('Saumya', 32, 'Delhi', 35000),
('Saumya', 32, 'Delhi', 30000),
('Saumya', 32, 'Mumbai', 20000),
('Aaditya', 40, 'Dehradun', 24000),
('Seema', 32, 'Delhi', 70000)]
# Create a DataFrame object from the list of tuples
df = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Salary'])
print(df)
OUTPUT:
Name Age City Salary
0 Stuti 28 Varanasi 20000
1 Saumya 32 Delhi 25000
2 Aaditya 25 Mumbai 40000
3 Saumya 32 Delhi 35000
4 Saumya 32 Delhi 30000
5 Saumya 32 Mumbai 20000
6 Aaditya 40 Dehradun 24000
7 Seema 32 Delhi 70000
Example 1:
Select a single column.
import pandas as pd
# employees list of tuples as defined above
df = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Salary'])
# to select a column
result = df["City"]
print(result)
OUTPUT:
0 Varanasi
1 Delhi
2 Mumbai
3 Delhi
4 Delhi
5 Mumbai
6 Dehradun
7 Delhi
Example 2:
Select multiple columns.
import pandas as pd
# employees list of tuples as defined above
df = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Salary'])
# to select multiple columns
result = df[["Name", "Age", "Salary"]]
print(result)
OUTPUT:
Name Age Salary
0 Stuti 28 20000
1 Saumya 32 25000
2 Aaditya 25 40000
3 Saumya 32 35000
4 Saumya 32 30000
5 Saumya 32 20000
6 Aaditya 40 24000
7 Seema 32 70000
Selecting rows using the .loc[] method
Example 1:
Select a single row by label. The 'Name' column is set as the index so that rows can be located by name.
# import pandas
import pandas as pd
# employees list of tuples as defined above
df = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Salary'])
# Set 'Name' column as index on the Dataframe
df.set_index("Name", inplace=True)
result = df.loc["Stuti"]
# Show the result
print(result)
OUTPUT:
Age 28
City Varanasi
Salary 20000
Example 2:
Select multiple rows.
# import pandas
import pandas as pd
# employees list of tuples and the 'Name'-indexed DataFrame as defined in Example 1
# select multiple rows on the Dataframe
result = df.loc[["Stuti","Seema","Aaditya"]]
print(result)
OUTPUT:
Age City Salary
Name
Stuti 28 Varanasi 20000
Seema 32 Delhi 70000
Aaditya 25 Mumbai 40000
Aaditya 40 Dehradun 24000
Example 3:
Select multiple rows and particular columns.
Syntax: Dataframe.loc[["row1", "row2"...], ["column1", "column2", "column3"...]]
# import pandas
import pandas as pd
# employees list of tuples and the 'Name'-indexed DataFrame as defined in Example 1
# select multiple rows and
# multiple columns on the Dataframe
# (the exact selection is not reproduced in the source; this one is illustrative)
result = df.loc[["Stuti", "Seema"], ["City", "Salary"]]
print(result)
OUTPUT:
Example 4:
Select all the rows with some particular columns. We use a single colon [ : ] to select all rows
and the list of columns that we want to select as given below :
Syntax: Dataframe.loc[:, ["column1", "column2", "column3"]]
# import pandas
import pandas as pd
# employees list of tuples as defined above
df = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Salary'])
# Set 'Name' column as index on the Dataframe
df.set_index("Name", inplace=True)
# select all the rows with some particular columns
# (the exact columns are not reproduced in the source; this selection is illustrative)
result = df.loc[:, ["City", "Salary"]]
print(result)
OUTPUT:
Selecting rows using the .iloc[] method
Example 1:
Select a single row by integer position.
# import pandas
import pandas as pd
# employees list of tuples as defined above
df = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Salary'])
# to select a row by its position
result = df.iloc[2]
print(result)
OUTPUT:
Name Aaditya
Age 25
City Mumbai
Salary 40000
Example 2:
Select multiple rows.
# import pandas
import pandas as pd
# employees list of tuples as defined above
df = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Salary'])
# to select multiple rows by position
result = df.iloc[[2, 3, 5]]
print(result)
OUTPUT:
Name Age City Salary
2 Aaditya 25 Mumbai 40000
3 Saumya 32 Delhi 35000
5 Saumya 32 Mumbai 20000
Example 3:
Select multiple rows with some particular columns.
# import pandas
import pandas as pd
# employees list of tuples as defined above
df = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Salary'])
# to select multiple rows with some particular columns
result = df.iloc[[2, 3, 5], [0, 1]]
# Show the result
print(result)
OUTPUT:
Name Age
2 Aaditya 25
3 Saumya 32
5 Saumya 32
Example 4:
Select all the rows with some particular columns.
# import pandas
import pandas as pd
# employees list of tuples as defined above
df = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Salary'])
# to select all the rows with some particular columns
result = df.iloc[:, [0, 1]]
print(result)
OUTPUT:
Name Age
0 Stuti 28
1 Saumya 32
2 Aaditya 25
3 Saumya 32
4 Saumya 32
5 Saumya 32
6 Aaditya 40
7 Seema 32
EX NO: READING DATA FROM TEXT FILES, EXCEL AND THE WEB AND
EXPLORING VARIOUS COMMANDS FOR DOING DESCRIPTIVE
ANALYTICS ON THE IRIS DATA SET
AIM: To read data from text files, Excel and the web, and to explore various commands for doing
descriptive analytics on the Iris data set.
PROCEDURE:
PROGRAM:
import pandas as pd
df = pd.read_csv(r"D:\iris_csv.csv")
print(df.head())
print(df.shape)
print(df.info())
print(df.describe())
print(df.isnull().sum())
print(df.sample(10))
print(df.columns)
print(df)
#data[start:end]
print(df[10:21])
sliced_data=df[10:21]
print(sliced_data)
# data["column_name"].sum(), .mean(), .median(), .min(), .max()
sum_data = df["sepallength"].sum()
mean_data = df["sepallength"].mean()
median_data = df["sepallength"].median()
min_data = df["sepallength"].min()
max_data = df["sepallength"].max()
print("Sum:", sum_data)
print("Mean:", mean_data)
print("Median:", median_data)
print("Minimum:", min_data)
print("Maximum:", max_data)
print(df["class"].value_counts())
OUTPUT:
<class 'pandas.core.frame.DataFrame'>
3 petalwidth 150 non-null float64
4 class 150 non-null object
None
std 0.828066 0.433594 1.764420 0.763161
sepallength 0
sepalwidth 0
petallength 0
petalwidth 0
class 0
dtype: int64
93 5.0 2.3 3.3 1.0 Iris-versicolor
30 4.8 3.1 1.6 0.2 Iris-setosa
27 5.2 3.5 1.5 0.2 Iris-setosa
26 5.0 3.4 1.6 0.4 Iris-setosa
17 5.1 3.5 1.4 0.3 Iris-setosa
136 6.3 3.4 5.6 2.4 Iris-virginica
14 5.8 4.0 1.2 0.2 Iris-setosa
15 5.7 4.4 1.5 0.4 Iris-setosa
16 5.4 3.9 1.3 0.4 Iris-setosa
17 5.1 3.5 1.4 0.3 Iris-setosa
18 5.7 3.8 1.7 0.3 Iris-setosa
19 5.1 3.8 1.5 0.3 Iris-setosa
20 5.4 3.4 1.7 0.2 Iris-setosa
Sum: 876.5
Mean: 5.843333333333335
Median: 5.8
Minimum: 4.3
Maximum: 7.9
Iris-setosa 50
Iris-versicolor 50
Iris-virginica 50
RESULT: We have read data from text files, Excel and the web, and explored various commands for doing
descriptive analytics on the Iris data set.
EX NO: USE THE DIABETES DATA SET FROM UCI AND PIMA INDIANS DIABETES
DATA SET FOR PERFORMING UNIVARIATE ANALYSIS, BIVARIATE
ANALYSIS AND MULTIPLE REGRESSION ANALYSIS
AIM: To use the diabetes data set from UCI and the Pima Indians Diabetes data set for performing univariate
analysis, bivariate analysis and multiple regression analysis.
PROCEDURE:
Step 1: Import the necessary libraries - pandas, matplotlib.pyplot, statsmodels.api, and seaborn.
Step 2: Read the CSV file "D:\di.csv" into a DataFrame `df` using `pd.read_csv("D:\di.csv")`.
Step 3: Print the entire DataFrame using `print(df)`.
Step 4: Calculate the mean of selected columns using `mean =
df[["Pregnancies","Glucose","BP","Insulin","Diabetes","Age"]].mean()`.
Step 5: Print the mean values using `print(mean)`.
Step 6: Calculate the median of selected columns using `median =
df[["Pregnancies","Glucose","BP","Insulin","Diabetes","Age"]].median()`.
Step 7: Print the median values using `print(median)`.
Step 8: Calculate the mode of selected columns using `mode =
df[["Pregnancies","Glucose","BP","Insulin","Diabetes","Age"]].mode()`.
Step 9: Print the mode values using `print(mode)`.
Step 10: Calculate the variance of selected columns using `variance =
df[["Pregnancies","Glucose","BP","Insulin","Diabetes","Age"]].var()`.
Step 11: Print the variance values using `print(variance)`.
Step 12: Calculate the standard deviation of selected columns using `sd =
df[["Pregnancies","Glucose","BP","Insulin","Diabetes","Age"]].std()`.
Step 13: Print the standard deviation values using `print(sd)`.
Step 14: Perform bivariate analysis by creating a scatter plot of Age vs Glucose using `plt.scatter(df.Age,df.Glucose)`
and display it with `plt.show()`.
Step 15: Perform linear regression modeling for Age vs Glucose using Ordinary Least Squares (OLS) with `x =
sm.add_constant(df[['Age']])` and `model = sm.OLS(df['Glucose'], x).fit()`. Print the summary using
`print(model.summary())`.
PROGRAM:
a. Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation,
Skewness and Kurtosis.
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import seaborn as sns
# Reading the CSV file
df = pd.read_csv(r"D:\di.csv")
print(df)
#Mean
mean=df[["Pregnancies","Glucose","BP","Insulin","Diabetes","Age"]].mean()
print(mean)
#Median
median=df[["Pregnancies","Glucose","BP","Insulin","Diabetes","Age"]].median()
print(median)
#Mode
mode=df[["Pregnancies","Glucose","BP","Insulin","Diabetes","Age"]].mode()
print(mode)
#Variance
variance=df[["Pregnancies","Glucose","BP","Insulin","Diabetes","Age"]].var()
print(variance)
#StandardDeviation
sd=df[["Pregnancies","Glucose","BP","Insulin","Diabetes","Age"]].std()
print(sd)
b. Bivariate analysis: Linear and logistic regression modeling
# Scatter plot of Age against Glucose (Step 14 of the procedure)
plt.scatter(df.Age, df.Glucose)
plt.show()
# Linear regression of Glucose on Age using Ordinary Least Squares (Step 15 of the procedure)
x = sm.add_constant(df[['Age']])
model = sm.OLS(df['Glucose'], x).fit()
print(model.summary())
OUTPUT:
Pregnancies Glucose BP Insulin Diabetes Age
0 2 148 72 0 0.627 50
1 1 85 66 0 0.351 31
2 3 183 64 0 0.672 32
3 0 89 66 94 0.167 21
5 2 116 74 0 0.201 30
6 0 78 50 88 0.248 26
7 5 115 0 0 0.134 29
9 1 166 96 0 0.232 54
Pregnancies 2.1000
Glucose 131.4000
BP 59.8000
Insulin 89.3000
Diabetes 0.5078
Age 35.9000
dtype: float64
Pregnancies 2.00
Glucose 126.50
BP 66.00
Insulin 0.00
Diabetes 0.24
Age 31.50
dtype: float64
Pregnancies 2.766667
Glucose 1753.155556
BP 658.177778
Insulin 28878.677778
Diabetes 0.427865
Age 140.988889
dtype: float64
Pregnancies 1.663330
Glucose 41.870700
BP 25.654976
Insulin 169.937276
Diabetes 0.654114
Age 11.873874
dtype: float64
C:\Users\NIRMALKUMAR\anaconda3\lib\site-packages\scipy\stats\stats.py:1541:
UserWarning: kurtosistest only valid for n>=20 .... continuing anyway, n=10
OLS Regression Results (summary truncated)
Df Model: 1
coef    std err    t    P>|t|    [0.025    0.975]
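The aim also includes multiple regression analysis, but that program is not reproduced in the source; the following is a sketch that reuses the DataFrame and column names assumed above (Glucose regressed on several predictors at once):
c. Multiple regression analysis
# Multiple regression using OLS; predictor columns are assumed from the univariate section above
X = sm.add_constant(df[["Age", "BP", "Insulin", "Pregnancies"]])
model = sm.OLS(df["Glucose"], X).fit()
print(model.summary())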
RESULT: We have used the diabetes data set from UCI and the Pima Indians Diabetes data set and
performed univariate analysis, bivariate analysis and multiple regression analysis.
EX NO:
APPLY AND EXPLORE VARIOUS PLOTTING FUNCTIONS ON UCI DATA
SETS.
AIM: To apply and explore various plotting functions on UCI data sets.
PROCEDURE:
a. Normal Curves
Step 1: Import the required libraries - `import numpy as np`, `import pandas as pd`, and `import
matplotlib.pyplot as plt`.
Step 2: Set the style for the plot using `plt.style.use('seaborn-whitegrid')`.
Step 3: Create a series of data `x` ranging from 1 to 100 with 50 data points using `x = np.linspace(1, 100,
50)`.
Step 4: Define a function `normal_dist` that calculates the probability density of a normal distribution given
data `x`, mean, and standard deviation.
Step 5: Calculate the mean and standard deviation of the data using `mean = np.mean(x)` and `sd =
np.std(x)`.
Step 6: Plot the results by plotting `x` against `pdf` with a red color using `plt.plot(x, pdf, color='red')`.
b. Density and contour plots
Step 1: Define a function `f(x, y)` that represents a mathematical expression using NumPy.
Step 2: Create linearly spaced values for `x` and `y` using `x = np.linspace(0, 5, 50)` and `y =
np.linspace(0, 5, 40)`.
Step 3: Create a grid of points using `np.meshgrid(x, y)` and assign the result to `X` and `Y`.
Step 4: Evaluate the function `f` for each point on the grid to obtain a 2D array `Z` representing the
function values.
Step 5: Plot the contours of the function using `plt.contour(X, Y, Z, colors='black')`.
Step 6: Plot colored contours with 20 levels using `plt.contour(X, Y, Z, 20, cmap='RdGy')`.
c. Correlation and scatter plots
Step 1: Create a random number generator with a seed value using `rand = np.random.RandomState(10)`.
Step 2: Generate an array of 20 random integers between 0 and 100 using `x = rand.randint(100, size=20)`.
Step 3: Calculate the sine of each element in the array `x` and store it in `y` using `y = np.sin(x)`.
Step 4: Plot the points (`x, y`) as circles ('o') in black color using `plt.plot(x, y, 'o', color='black')`.
d. Histograms
Step 1: Set the plot style to 'seaborn-white' using `plt.style.use('seaborn-white')`.
Step 2:Create a random number generator with a seed value using `rand = np.random.RandomState(0)`.
Step 3: Generate an array of 5 random integers between 0 and 10 using `x = rand.randint(10, size=5)`.
e. Three dimensional plotting
Step 1: Create a 3D plot using `ax = plt.axes(projection='3d')`.
Step 2: Generate data for a three-dimensional line - `zline = np.linspace(0, 15, 100)`, `xline = np.sin(zline)`,
and `yline = np.cos(zline)`.
Step 3: Plot the three-dimensional line using `ax.plot3D(xline, yline, zline, 'gray')`.
Step 4: Generate data for three-dimensional scattered points - `zdata = 15 * np.random.random(100)`,
`xdata = np.sin(zdata) + 0.1 * np.random.randn(100)`, and `ydata = np.cos(zdata) + 0.1 * np.random.randn(100)`.
Step 5: Scatter plot the three-dimensional points using `ax.scatter3D(xdata, ydata, zdata, c=zdata,
cmap='Greens')`.
PROGRAM
a. Normal curves
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
x = np.linspace(1,100,50)
#Creating a Function (its body was not reproduced in the source; this is the standard
#normal probability density formula)
def normal_dist(x, mean, sd):
    prob_density = (1/(sd*np.sqrt(2*np.pi))) * np.exp(-0.5*((x-mean)/sd)**2)
    return prob_density
mean = np.mean(x)
sd = np.std(x)
pdf = normal_dist(x,mean,sd)
#Plotting the results (Step 6 of the procedure)
plt.plot(x, pdf, color='red')
plt.xlabel('Data points')
plt.ylabel('Probability Density')
plt.show()
b. Density and contour plots
#%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-white')
import numpy as np
def f(x, y):
return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
plt.contour(X, Y, Z, colors='black') #Visualizing three-dimensional data with contours
plt.contour(X, Y, Z, 20, cmap='RdGy') #Visualizing three-dimensional data with colored contours
c. Correlation and scatter plots
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
rand=np.random.RandomState(10)
x=rand.randint(100,size=20)
y = np.sin(x)
plt.plot(x, y, 'o', color='black')   # scatter of the points (Step 4 of the procedure)
plt.show()
d. Histograms
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('seaborn-white')
rand=np.random.RandomState(0)
x=rand.randint(10,size=5)
plt.hist(x)
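The procedure's part (e) describes three-dimensional plotting, but its program is not reproduced in the source; the following sketch follows Steps 1-5 of that procedure:
e. Three dimensional plotting
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
ax = plt.axes(projection='3d')
# Data for a three-dimensional line
zline = np.linspace(0, 15, 100)
xline = np.sin(zline)
yline = np.cos(zline)
ax.plot3D(xline, yline, zline, 'gray')
# Data for three-dimensional scattered points
zdata = 15 * np.random.random(100)
xdata = np.sin(zdata) + 0.1 * np.random.randn(100)
ydata = np.cos(zdata) + 0.1 * np.random.randn(100)
ax.scatter3D(xdata, ydata, zdata, c=zdata, cmap='Greens')
plt.show()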
OUTPUT:
a) Normal Curve
c) Correlation and Scatter Plots
d) Histogram
RESULT: We have applied and explored various plotting functions on UCI data sets.
EX NO:
VISUALIZING GEOGRAPHIC DATA WITH BASEMAP
AIM: To visualize geographic data with Basemap.
PROCEDURE:
Step 1: Import the required libraries - `from mpl_toolkits.basemap import Basemap` and `import matplotlib.pyplot as
plt`.
Step 2: Create a figure with a size of 12x12 inches using `fig = plt.figure(figsize=(12, 12))`.
Step 3: Initialize a Basemap object `m` using `Basemap()`.
Step 4: Draw coastlines using `m.drawcoastlines()`.
Step 5: Display the plot with the title "Coastlines" using `plt.title("Coastlines", fontsize=20)` and `plt.show()`.
Step 6: Draw country boundaries by adding `m.drawcountries()` after drawing coastlines.
Step 7: Display the plot with the title "Country boundaries" using `plt.title("Country boundaries", fontsize=20)` and
`plt.show()`.
Step 8: Draw major rivers by adding `m.drawrivers(linewidth=0.5, linestyle='solid', color='#0000ff')` after drawing
coastlines and countries.
Step 9: Display the plot with the title "Major rivers" using `plt.title("Major rivers", fontsize=20)` and `plt.show()`.
Step 10: Draw a filled map boundary by filling continents with coral color and the map boundary with aqua color
using `m.fillcontinents(color='coral', lake_color='aqua')` and `m.drawmapboundary(color='b', linewidth=2.0,
fill_color='aqua')`.
Step 11: Display the plot with the title "Filled map boundary" using `plt.title("Filled map boundary", fontsize=20)`
and `plt.show()`.
Step 12: Create a new figure with a size of 10x8 inches using `fig = plt.figure(figsize=(10, 8))`.
Step 13: Initialize an orthographic Basemap projection with a central longitude of 25 and a central latitude of 10
using `m = Basemap(projection='ortho', lon_0=25, lat_0=10)`.
Step 14: Draw coastlines, continents, country boundaries, and map boundary in an orthographic projection using
appropriate Basemap methods.
Step 15: Display the plot with the title "Orthographic Projection" using `plt.title("Orthographic Projection",
fontsize=18)`.
Basemap() Package Installation
Installation of Basemap is straightforward; if you’re using conda you can type this and the package
will be downloaded:
conda install -c anaconda basemap
Description
The Basemap toolkit is a library for plotting 2D data on maps in Python. It is similar in functionality
to the MATLAB Mapping Toolbox, the IDL mapping facilities, GrADS, and the Generic Mapping Tools.
PROGRAM:
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(10, 8))     # Steps 12-13 of the procedure
m = Basemap(projection='ortho', lon_0=25, lat_0=10)
m.drawcoastlines()
m.fillcontinents(color='tan',lake_color='lightblue')
m.drawcountries(linewidth=1, linestyle='solid', color='k' )
m.drawmapboundary(fill_color='lightblue')
plt.title("Orthographic Projection", fontsize=18)
plt.show()
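The procedure's Steps 2-11 also cover coastlines, country boundaries, rivers and a filled map boundary on the default projection; that part of the program is not reproduced in the source, so the following is a sketch of those steps, combined into a single figure for brevity:
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(12, 12))
m = Basemap()                     # default (cylindrical) projection
m.drawcoastlines()
m.drawcountries()
m.drawrivers(linewidth=0.5, linestyle='solid', color='#0000ff')
m.fillcontinents(color='coral', lake_color='aqua')
m.drawmapboundary(color='b', linewidth=2.0, fill_color='aqua')
plt.title("Filled map boundary", fontsize=20)
plt.show()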
OUTPUT:
RESULT: We have visualized geographic data with Basemap.