5th Sem Internship Report
Internship Plan
226150316051
No Topic Page no
1 27/06/2024
2 01/07/2024
3 03/07/2024
4 6/07/2024
5 10/07/2024
6 15/07/2024
7 16/07/2024
8 20/07/2024
9 25/07/2024
10 26/07/2024
11 27/07/2024
12 31/07/2024
13 04/08/2024
DAY 1
1. Introduction to Data Science
Data science is a comprehensive, interdisciplinary field that focuses on extracting insights and
knowledge from data to solve complex problems. Here’s a more detailed look at various aspects
of data science:
1. Core Concepts:
Data Collection: Gathering raw data from various sources (databases, APIs,
web scraping, sensors, etc.).
Data Cleaning and Preprocessing: Handling missing values, correcting
errors, transforming data formats, and standardizing data to ensure accuracy.
Exploratory Data Analysis (EDA): Investigating data sets to discover
patterns, trends, correlations, and anomalies using statistical measures and
visualization tools.
Feature Engineering: Creating new features or variables that improve the
predictive power of models.
Model Building: Applying machine learning algorithms to build predictive
or descriptive models using training data.
Model Evaluation: Assessing model performance using metrics like
accuracy, precision, recall, F1-score, and AUC (Area Under the ROC Curve).
Deployment and Monitoring: Integrating the model into production
systems and monitoring its performance over time.
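The evaluation metrics listed above can be computed by hand, which makes their definitions concrete. The sketch below is a dependency-free illustration for a binary task. It assumes labels are encoded as 0/1 with 1 as the positive class, and the example labels are made up. In practice, scikit-learn's `sklearn.metrics` module provides `accuracy_score`, `precision_score`, `recall_score`, `f1_score`, and `roc_auc_score`.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary task."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1          # correctly predicted positive
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        elif t == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # correctly predicted negative
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
```

With these labels, all four metrics come out to 0.75. Precision and recall diverge as soon as false positives and false negatives are no longer balanced.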
Data science is used across a wide range of industries to drive innovation and
improve decision-making. Some applications include:
Future of Data Science:
Data science will continue to evolve, driven by advances in AI, automation, and
computing power. New tools will make it easier for businesses to leverage data,
and data science will become even more critical in decision-making across
industries. The increasing role of ethics and regulation around AI and data will also
shape how data science is applied in the future.
DAY 2
1. Installation of Anaconda (Setup) and Writing and Running Python Code in Jupyter
1.1 Introduction
Anaconda is an open-source distribution of the Python and R programming
languages, widely used for scientific computing, data science, machine
learning, and data analysis. It simplifies package management and
deployment, making it a popular choice for both beginners and experienced
users in the data science community.
1.2 Steps in the Installation Process
You can choose to install Anaconda for just yourself or for all users
(requires administrator permissions).
Select the destination folder (default is usually fine).
Click "Install".
Once installation is complete, you can choose to launch Anaconda Navigator
or Jupyter Notebook to get started immediately.
Running the following command in Anaconda Prompt (or a terminal) opens the Jupyter Notebook interface in your web browser:
jupyter notebook
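A useful first cell in a new notebook checks which Python version Jupyter is running and whether the libraries used later in this report can be imported. This is a sketch. The package list is an assumption based on the tools named in this report, and scikit-learn is imported under the name `sklearn`.

```python
import importlib.util
import sys


def check_packages(names):
    """Return a dict mapping each package name to True if it can be imported."""
    # find_spec returns None (instead of raising) when a top-level package is absent.
    return {name: importlib.util.find_spec(name) is not None for name in names}


print("Python", sys.version.split()[0])
# Assumed list of libraries used in this report; Anaconda usually includes them.
for name, ok in check_packages(["numpy", "pandas", "matplotlib", "seaborn", "sklearn"]).items():
    print(f"{name:12s} {'installed' if ok else 'missing'}")
```

Press Shift+Enter to run the cell. Any package reported as missing can be installed into the active environment with `conda install` or `pip install`.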
DAY 3
1. Find the missing values in the dataset
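With pandas, the usual way to count missing values per column is `df.isnull().sum()` (or the equivalent `df.isna().sum()`). The sketch below shows the same idea using only the standard library, so it runs without pandas. The CSV content is hypothetical, and the set of strings treated as "missing" is an assumption that depends on how the real dataset encodes gaps.

```python
import csv
import io


def count_missing(rows, missing=("", "NA", "NaN", None)):
    """Count missing entries per column in a list of dict rows."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts.setdefault(column, 0)
            if value in missing:  # assumed markers for an absent value
                counts[column] += 1
    return counts


# Hypothetical CSV data standing in for the dataset used in the report.
data = """age,bmi,glucose
45,27.1,
52,,140
,31.4,NA
60,29.0,110
"""
rows = list(csv.DictReader(io.StringIO(data)))
print(count_missing(rows))  # {'age': 1, 'bmi': 1, 'glucose': 2}
```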
2. Plot a scatter plot, line plot, and heatmap using Matplotlib and Seaborn
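Scatter and line plots are drawn with Matplotlib's `plt.scatter(x, y)` and `plt.plot(x, y)`. A Seaborn heatmap is commonly given a correlation matrix, for example `sns.heatmap(df.corr(), annot=True)`. The dependency-free sketch below computes the Pearson correlation matrix that such a heatmap would display. The column names and values are hypothetical, and a constant column would cause a division by zero.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {(name_a, name_b): r} for every ordered pair of columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical columns: study hours, exam score and hours of sleep.
columns = {
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 58, 65, 70, 78],
    "sleep": [8, 7, 7, 6, 5],
}
matrix = correlation_matrix(columns)
for a in columns:
    # One row of the matrix per column, as the heatmap would lay it out.
    print(a.ljust(6), " ".join(f"{matrix[(a, b)]:6.2f}" for b in columns))
```

Values near +1 or -1 show up as the strongest colours in a heatmap. The diagonal is always 1 because each column correlates perfectly with itself.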
DAY 4
DataFrame Detailing Using Sweetviz
DAY 5-7
• The data is split into training and testing sets, and we measure performance
using accuracy for classification models and mean squared error (MSE)
for the regression model.
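The split-and-score step can be illustrated without any libraries. This is a minimal sketch and not the report's exact code. In scikit-learn, the equivalents are `sklearn.model_selection.train_test_split` (with `test_size` and `random_state`) and `sklearn.metrics.mean_squared_error`. The fixed seed here is an assumption that keeps the split reproducible.

```python
import random


def train_test_split(rows, test_size=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = max(1, round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


train, test = train_test_split(list(range(10)), test_size=0.3)
print(len(train), len(test))  # 7 3
print(mean_squared_error([3.0, 5.0, 2.0], [2.5, 5.0, 3.0]))
```

Scoring on the held-out test rows, instead of the training rows, is what makes the reported accuracy or MSE an estimate of performance on unseen data.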
A Decision Tree is a popular supervised machine learning algorithm used for both
classification and regression tasks. It models decisions and their possible
consequences in a tree-like structure, where internal nodes represent decision
points based on a feature, branches represent the outcomes of these decisions, and
leaf nodes represent the final output (a class label in classification or a numerical
value in regression).
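The tree structure described above maps directly onto a small recursive data type. Each internal node tests one feature against a threshold, and each leaf stores the output. The sketch below builds a tiny tree by hand. The features `age` and `chol` and their thresholds are hypothetical and were not learned from data. It also includes the Gini impurity that learning algorithms such as CART (used by scikit-learn's `DecisionTreeClassifier`, whose default `criterion` is `"gini"`) minimise when choosing splits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Internal nodes test one feature; leaves hold the output value."""
    feature: Optional[str] = None      # feature tested at an internal node
    threshold: Optional[float] = None  # go left if value <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: object = None               # class label (or number) at a leaf


def predict(node, sample):
    """Follow the branches from the root until a leaf is reached."""
    while node.feature is not None:
        node = node.left if sample[node.feature] <= node.threshold else node.right
    return node.value


def gini(labels):
    """Gini impurity of a list of class labels (0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


# Hand-built, hypothetical tree: features and thresholds are illustrative only.
tree = Node(
    feature="age", threshold=50,
    left=Node(value="low risk"),
    right=Node(feature="chol", threshold=240,
               left=Node(value="low risk"),
               right=Node(value="high risk")),
)
print(predict(tree, {"age": 45, "chol": 300}))  # low risk
print(predict(tree, {"age": 63, "chol": 280}))  # high risk
print(gini(["yes", "yes", "no", "no"]))         # 0.5
```

For regression, the same structure applies with numeric leaf values. Training would then choose splits that reduce variance instead of Gini impurity.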
DAY 8-11
Using Models for Regression
Linear Regression and Gradient Boosting Regression are used to predict exam
scores based on study hours.
Linear Regression Model:
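With a single feature (study hours), linear regression reduces to two closed-form formulas. The slope is the sum of (x - mean x)(y - mean y) divided by the sum of (x - mean x) squared, and the intercept is mean y - slope * mean x. The sketch below implements those formulas directly. The hours and scores are hypothetical, not the report's dataset. scikit-learn's `LinearRegression` fits the same least-squares line. `GradientBoostingRegressor` instead adds small trees one after another, each fitted to the remaining errors.

```python
def fit_linear_regression(x, y):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5, 6]
scores = [35, 45, 50, 62, 70, 78]
slope, intercept = fit_linear_regression(hours, scores)
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
print(f"predicted score for 7 hours: {slope * 7 + intercept:.1f}")
```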
DAY 12
Find Another Disease-Related Dataset and Apply Different
Classification Models
Decision Tree Classifier:
DAY 13
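Virtual Environment
Create a virtual environment in your project folder:
python -m venv .venv
Activate it:
Windows: .venv\Scripts\activate.bat
macOS/Linux: source .venv/bin/activate
Installing Streamlit
pip install streamlit
Save the following code in a file (for example, app.py):
import streamlit as st
st.write("Hello world")
Then run it from the terminal:
streamlit run app.py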
This command launches your app in the default web browser. For a more
detailed exploration, including how to run Streamlit apps in Jupyter
notebooks, refer to the official documentation.
DAY 14-15
Use another dataset, apply different models, and plot graphs
DAY 16
DAY 17-19
Predict Heart Disease from User Input Using Streamlit
This Streamlit app performs heart disease prediction using a Gradient Boosting
Classifier model trained on the heart disease dataset. It provides an interactive
tool where users can enter health metrics and get a prediction of whether they may
have heart disease, based on what the model has learned from the dataset.
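A common way to structure such an app is to keep the prediction logic in a plain function and the Streamlit widgets in a separate function. That way the logic can be tested without starting the UI. The sketch below follows that pattern but is not the report's actual code. The six-feature subset, its column order, and the input ranges are assumptions based on the widely used heart disease dataset columns (`age`, `sex`, `cp`, `trestbps`, `chol`, `thalach`). The trained model must have been fitted on exactly these columns in this order.

```python
# Hypothetical feature subset and order; must match the columns the model was trained on.
FEATURES = ["age", "sex", "cp", "trestbps", "chol", "thalach"]


def predict_from_inputs(model, inputs):
    """Arrange user inputs in training-column order and return the predicted label."""
    missing = [name for name in FEATURES if name not in inputs]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    row = [[inputs[name] for name in FEATURES]]  # one sample, 2-D as predict expects
    return int(model.predict(row)[0])


def render_app(model):
    """Streamlit UI: call render_app(trained_model) from app.py, then `streamlit run app.py`."""
    import streamlit as st  # imported here so predict_from_inputs works without Streamlit

    st.title("Heart Disease Prediction")
    inputs = {
        "age": st.number_input("Age", min_value=1, max_value=120, value=50),
        "sex": st.selectbox("Sex (0 = female, 1 = male)", [0, 1]),
        "cp": st.selectbox("Chest pain type", [0, 1, 2, 3]),
        "trestbps": st.number_input("Resting blood pressure", value=120),
        "chol": st.number_input("Serum cholesterol", value=200),
        "thalach": st.number_input("Maximum heart rate", value=150),
    }
    if st.button("Predict"):
        if predict_from_inputs(model, inputs) == 1:
            st.warning("The model predicts heart disease.")
        else:
            st.success("The model predicts no heart disease.")
```

In a real `app.py`, the trained classifier could be loaded (for example with `joblib.load`) and passed to `render_app`.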
DAY 20
Predict Medicine from User Input in Streamlit
DAY 21
Create a GitHub account, set it up, and upload files (like your code)
to a repository
1. Go to GitHub.com.
2. Click on the Sign Up button.
3. Fill in your details (email, password, username).
4. Complete the verification process.
5. Choose a plan (the free plan is enough for most purposes).
6. Confirm your email address.
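To upload your files from your local machine to GitHub, you'll need to install Git
on your system.
For Windows: Download the installer from git-scm.com and run it.
For macOS: Run the command below in Terminal. If Git is not installed, macOS offers
to install the Xcode Command Line Tools, which include Git.
git --version
For Linux: Install Git with your distribution's package manager, for example on
Debian or Ubuntu:
sudo apt install git
3. Configure Git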
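Open your terminal (or Git Bash on Windows) and run the following commands,
replacing the name and email with your own:
git config --global user.name "Your Name"
git config --global user.email "you@example.com"
After creating your repository on GitHub, you'll want to upload your local files.
1. Initialize Git: Turn your project folder into a local Git repository:
git init
2. Add the remote repository: Link your local folder to the GitHub
repository:
git remote add origin https://github.com/<username>/<repository>.git
3. Add your files: Add all the files in your folder to the Git staging area:
git add .
4. Commit your files: Record the staged files in a commit:
git commit -m "Initial commit"
5. Push your files to GitHub: Finally, push your local files to the GitHub
repository:
git branch -M main
git push -u origin main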
DAY 22
Face Detection
DAY 23-26
DAY 27-30
Email analysis process