data science
ii. Prediction:
Prediction involves using historical data to create a model that forecasts future values or trends. It
is a key aspect of regression analysis, such as predicting house prices based on features like size
and location.
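As a rough illustration of the idea, the sketch below fits a straight line to a handful of house sizes and prices by least squares and uses it to forecast the price of an unseen house. The data values and the helper name fit_line are made up for this example; a real model would use more features (such as location) and far more data.

```python
from statistics import mean

# Hypothetical training data: house size in square metres and price in thousands.
sizes = [50, 70, 90, 110]
prices = [150, 200, 260, 310]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 100 + intercept  # forecast for an unseen 100 m^2 house
print(round(slope, 2), round(intercept, 2), round(predicted_price, 2))
```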
iii. Clustering:
Clustering is an unsupervised learning technique used to group similar data points into clusters
based on their features. An example is segmenting customers into groups based on purchasing
behavior.
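A minimal sketch of clustering, assuming a single feature (monthly spend) and two groups: it runs the k-means assign-and-update loop in plain Python. The spend values, the starting centroids, and the function name kmeans_1d are illustrative assumptions, not part of the original notes.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Lloyd's k-means on 1-D data; returns (labels, centroids)."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centroids.append(sum(members) / len(members) if members else centroids[k])
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical monthly spend per customer.
monthly_spend = [10, 12, 11, 95, 100, 98]
labels, centers = kmeans_1d(monthly_spend, centroids=[10, 98])
print(labels)  # low spenders share one label, high spenders the other
```

No labels are supplied to the algorithm; the two groups emerge from the distances alone, which is what makes the technique unsupervised.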
1. Data Collection:
o Gather customer data, including demographics, transaction history, and reasons for
churn.
o Include external data such as economic indicators.
2. Data Cleaning:
o Handle missing or inconsistent values and remove duplicates.
3. Exploratory Data Analysis (EDA):
o Analyze churn rates and identify patterns.
o Use visualizations like histograms and bar plots to understand customer behavior.
4. Customer Segmentation:
o Use clustering techniques to segment customers based on value and risk.
5. Churn Prediction Model:
o Build a classification model (e.g., Logistic Regression or Random Forest) to predict
churn.
o Use this model to identify at-risk customers.
6. Retention Strategies:
o Personalize campaigns based on insights.
o Offer tailored incentives to retain high-value customers.
7. Evaluation:
o Monitor churn rates and campaign effectiveness.
8. Data Warehousing:
o Build a data warehouse to store cleaned and processed data for future use.
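Steps 2 and 3 above can be sketched with pandas as follows. The table, its column names, and the threshold of four support calls are hypothetical; the sketch covers only cleaning and a first look at churn rates, not the classification model from step 5.

```python
import pandas as pd

# Hypothetical customer records; column names and values are made up for illustration.
customers = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4, 5],
    "monthly_spend": [20.0, 55.0, 55.0, None, 80.0, 15.0],
    "support_calls": [5, 1, 1, 4, 0, 6],
    "churned": [1, 0, 0, 1, 0, 1],
})

# Step 2 (cleaning): drop exact duplicate rows, fill missing spend with the median.
clean = customers.drop_duplicates().copy()
clean["monthly_spend"] = clean["monthly_spend"].fillna(clean["monthly_spend"].median())

# Step 3 (EDA): overall churn rate, then churn rate split by support-call volume.
overall_churn_rate = clean["churned"].mean()
clean["many_calls"] = clean["support_calls"] >= 4  # assumed threshold
churn_by_calls = clean.groupby("many_calls")["churned"].mean()

print(overall_churn_rate)
print(churn_by_calls)
```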
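import numpy as np
# height and weight are assumed to be existing Python lists; the values below are placeholders
height = [1.73, 1.68, 1.71]
weight = [65.4, 59.2, 63.6]
height_array = np.array(height)
weight_array = np.array(weight)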
Example:
Big Data Analytics: Predict customer preferences using real-time transaction data.
Business Intelligence: Report monthly sales trends.
1. Descriptive Statistics:
o Summarize attributes like average age, gender distribution, and most common
conditions.
o Use histograms and boxplots for distributions.
2. Correlation Analysis:
o Analyze relationships between variables such as age and condition severity.
o Use heatmaps for visualization.
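Both steps can be sketched with the standard library alone. The ages and severity scores below are invented for illustration, and pearson is a hypothetical helper that computes the usual Pearson correlation coefficient.

```python
import statistics

# Hypothetical patient records: age in years and a condition severity score.
ages = [25, 40, 55, 70]
severity = [1, 2, 2, 4]

# Descriptive statistics: summarize a single attribute.
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
most_common_severity = statistics.mode(severity)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Correlation analysis: how strongly do age and severity move together?
r = pearson(ages, severity)
print(mean_age, median_age, most_common_severity, round(r, 2))
```

A coefficient close to 1 indicates that severity tends to rise with age in this toy sample; it says nothing about causation.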
1. Telecommunications:
o Optimize network usage using predictive models.
o Reduce churn through customer segmentation.
2. Biological Data Analysis:
o Analyze DNA sequences using clustering techniques.
o Predict disease outbreaks with machine learning.
(a) Two Benefits of Using NumPy Arrays Over Nested Python Lists:
1. Performance:
o NumPy arrays are optimized for numerical computation: operations are vectorized and run in
compiled code, so they are typically much faster than looping over lists.
Example:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = a + b # Vectorized addition
2. Memory Efficiency:
o Arrays store elements of a single type in one contiguous block, so they typically need less
memory than lists, which hold a reference to a separate Python object for each element.
Example:
python_list = [1, 2, 3, 4]
numpy_array = np.array(python_list)
print(numpy_array.nbytes) # Displays memory usage
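To make the memory claim concrete, the sketch below compares an array's element storage with a rough estimate of what the equivalent list costs. The list estimate (the list object plus each int object) is an approximation that assumes CPython, and the dtype is fixed to int64 so the array size is the same on every platform.

```python
import sys

import numpy as np

python_list = list(range(1000))
numpy_array = np.array(python_list, dtype=np.int64)

# Rough list cost: the list's own pointer table plus every int object it refers to.
list_bytes = sys.getsizeof(python_list) + sum(sys.getsizeof(x) for x in python_list)
# Array cost for the element data: 1000 elements * 8 bytes each.
array_bytes = numpy_array.nbytes

print(list_bytes, array_bytes)
```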
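time = [10.2, 12.8] # Hypothetical starting values; the original list is not shown
time.extend([24.5, 15.45]) # Appends both values to the end of the list
print(time)
time.reverse() # Reverses the list in place
print(time)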
import pandas as pd
# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Alice'], 'Age': [25, 30, 25]}
df = pd.DataFrame(data)
1. Syntax:
result = function_name(arguments)
Example:
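A short sketch of the call syntax above, assuming it refers to ordinary function calls. The function rectangle_area is a hypothetical user-defined function; max is a Python built-in.

```python
def rectangle_area(width, height):
    """Return the area of a rectangle."""
    return width * height


# result = function_name(arguments)
result = rectangle_area(3, 4)

# Built-in functions follow the same pattern and may also take keyword arguments.
longest = max(["pandas", "numpy"], key=len)

print(result, longest)
```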
Q5: What are mutable and immutable data types in Python? Provide examples.
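A: Mutable objects (such as list, dict, and set) can be changed in place after creation. Immutable
objects (such as int, float, str, and tuple) cannot; any apparent change creates a new object.
my_list = [1, 2, 3]
my_list.append(4) # Modifies the list in place
print(my_list) # Output: [1, 2, 3, 4]
my_str = "hello"
my_str[0] = "H" # Raises TypeError: strings are immutable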
B: What are the key data types in Python, and what are their uses?
int: Represents integers (e.g., 10).
float: Represents decimal numbers (e.g., 3.14).
str: Represents text (e.g., "hello").
list: Ordered and mutable collection (e.g., [1, 2, 3]).
tuple: Ordered and immutable collection (e.g., (1, 2, 3)).
dict: Key-value pairs (e.g., {"key": "value"}).
set: Unordered, unique elements (e.g., {1, 2, 3}).
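The types listed above can be checked directly with the built-in type function. The sample values are the ones from the list; the variable names are chosen only for this sketch.

```python
samples = [10, 3.14, "hello", [1, 2, 3], (1, 2, 3), {"key": "value"}, {1, 2, 3}]

# Look up the type name of each sample value.
type_names = [type(value).__name__ for value in samples]
print(type_names)

# A set keeps only unique elements, so repeated values collapse.
unique_values = set([1, 1, 2, 3, 3])
print(unique_values == {1, 2, 3})
```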
for i in range(5):
    if i == 3:
        break # Stops at 3
    print(i)
for i in range(5):
    if i == 3:
        continue # Skips 3
    print(i)
Q6: What are loops, and how are they used in Python? Provide examples of for and while
loops.
A: Loops are used for repetitive tasks.
for i in range(3):
    print(i) # Output: 0, 1, 2
count = 0
while count < 3:
    print(count) # Output: 0, 1, 2
    count += 1
1. Technical Skills:
o Python/R, SQL, and statistical analysis.
o Data visualization tools (e.g., Tableau).
2. Soft Skills:
o Problem-solving.
o Communication for presenting insights.
3. Domain Knowledge: Understanding industry-specific challenges.
1. Business Understanding.
2. Data Understanding.
3. Data Preparation.
4. Modeling.
5. Evaluation.
6. Deployment.
import pandas as pd