Classification of Machine Learning
• Supervised Learning
– Classification
– Regression
• Unsupervised Learning
– Clustering
– Association
In unsupervised learning, the model first interprets the raw (unlabeled) data to find hidden patterns in it,
then applies a suitable algorithm such as k-means clustering.
The algorithm divides the data objects into groups according
to the similarities & differences between the objects.
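The grouping step above can be sketched with a minimal k-means loop in plain Python. This is an illustrative sketch under assumptions: the 2-D points, the choice of k = 2, the naive initialization (first k points as starting centroids) and the helper name `kmeans` are hypothetical, not any library's API.

```python
import math


def kmeans(points, k, iterations=10):
    """Minimal k-means sketch (hypothetical helper): returns (centroids, labels).

    Assumes points is a list of (x, y) tuples and naively uses the
    first k points as the initial centroids.
    """
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels


# Hypothetical toy data: two visibly separated groups of points.
data = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (1.0, 1.5), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(data, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```

Points with similar coordinates end up with the same label, which is exactly the "group by similarity" behaviour described above; real libraries add smarter initialization and convergence checks.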
Types of Unsupervised Learning Algorithms
• Clustering: a method of grouping objects into clusters such that
the objects with the most similarities remain in one group and have
few or no similarities with the objects of another group.
– Cluster analysis finds the commonalities between the data objects
and categorizes them according to the presence or absence of those
commonalities.
• Association: an association rule is used to find relationships
between variables in a large database.
– It determines the sets of items that occur together in the dataset.
– Association rules make marketing strategies more effective: e.g.,
people who buy item X (say, bread) also tend to purchase item Y
(butter/jam).
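The bread/butter example can be made concrete with the two standard measures of an association rule X → Y: support (the fraction of all transactions that contain both X and Y) and confidence (the fraction of transactions containing X that also contain Y). The five baskets below are hypothetical toy data, and `count`, `support` and `confidence` are illustrative helpers, not a library API.

```python
# Hypothetical toy transactions: each set is one customer's basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "jam"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also contain the consequent."""
    both = count(antecedent | consequent, transactions)
    return both / count(antecedent, transactions)


# Rule: bread -> butter
rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
# support=0.60, confidence=0.75
```

Read as: bread and butter appear together in 60% of baskets, and 75% of the baskets that contain bread also contain butter. Algorithms such as Apriori automate this search over many candidate itemsets.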
• Self Learning
Some facts about Data
• About 40 zettabytes (1 ZB = 10^21 bytes) of data were projected
to be generated by 2020, roughly 300 times the volume of 2005.
• By 2011, the healthcare sector held 161 billion GB of data.
• About 200 M active users send 400 M tweets per day.
• Each month, users stream more than 4 B hours of video.
• Users share 30 B pieces of content every month.
• About 27% of data is reported to be inaccurate, and 1 in 3
business leaders don't trust the information on which they
base their decisions.
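The unit conversions behind these figures can be checked with a few lines of arithmetic. This sketch only rescales the numbers quoted above, assuming decimal (SI) units, i.e. 1 GB = 10^9 bytes, 1 EB = 10^18 bytes and 1 ZB = 10^21 bytes; the derived values (the implied 2005 volume and tweets per user) are back-of-the-envelope estimates, not separately sourced figures.

```python
# Decimal (SI) byte units, as assumed in the lead-in.
GB = 10**9
EB = 10**18
ZB = 10**21

# 40 ZB projected for 2020, said to be about 300x the 2005 volume.
data_2020 = 40 * ZB
implied_2005 = data_2020 / 300  # bytes
print(f"Implied 2005 volume: {implied_2005 / EB:.0f} EB")  # Implied 2005 volume: 133 EB

# Healthcare data by 2011: 161 billion GB.
healthcare_2011 = 161 * 10**9 * GB
print(f"Healthcare data: {healthcare_2011 / EB:.0f} EB = {healthcare_2011 / ZB:.3f} ZB")

# 400 M tweets per day from about 200 M active users.
tweets_per_user = 400_000_000 / 200_000_000
print(f"Tweets per active user per day: {tweets_per_user:.0f}")  # 2
```

So 161 billion GB is 161 exabytes, about 0.16 ZB, and the 300x growth figure implies roughly 133 EB of data in 2005.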
Best Python libraries for ML
• In the older days, people performed ML tasks
by manually coding all the algorithms and mathematical
or statistical formulas
– which was time-consuming, tedious & inefficient.
• Today this is much easier & more efficient thanks
to various Python libraries, frameworks & modules.
• Python is one of the most popular programming
languages for this task, and it has replaced many
other languages in the industry.
• Reason: vast collection of libraries.
Python ML Libraries
• NumPy
• SciPy
• Scikit-learn
• Theano
• TensorFlow
• Keras
• PyTorch
• Pandas
• Matplotlib
NumPy
• Popular Python library for processing large
multi-dimensional arrays & matrices, with the
help of a large collection of high-level
mathematical functions.
– It is very useful for fundamental scientific
computations in Machine Learning.
– It is particularly useful for linear algebra, Fourier
transform, & random number capabilities.
• High-end libraries like TensorFlow use NumPy
internally for manipulating tensors.
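A short sketch of the NumPy features listed above: multi-dimensional arrays, linear algebra, Fourier transforms and random numbers. The matrix, vector, signal and random seed are arbitrary illustrative choices.

```python
import numpy as np

# A 2x2 matrix and a vector stored as NumPy arrays.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

# Linear algebra: solve the system A @ x = b (3x + y = 9, x + 2y = 8).
x = np.linalg.solve(A, b)
print(x)  # [2. 3.]

# Vectorized math: operations apply to every element without Python loops.
doubled = A * 2
product = A @ x  # matrix-vector product, should reproduce b

# Fourier transform of a short signal.
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)

# Random numbers from a seeded generator, so results are reproducible.
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=5)
print(samples.shape)  # (5,)
```

Whole-array operations like `A * 2` and `A @ x` run in compiled code, which is why NumPy is much faster than equivalent Python loops.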
SciPy
• Popular library among ML enthusiasts as it
contains different modules for optimization,
linear algebra, integration & statistics.
• There is a difference between the SciPy library
& SciPy stack.
– The SciPy library is one of the core packages that make
up the SciPy stack.
– SciPy is also very useful for image manipulation.
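A minimal sketch of three of the SciPy modules mentioned above: `scipy.optimize`, `scipy.integrate` and `scipy.stats`. The function being minimized, the integrand and the sample are arbitrary examples, chosen because their answers are known in closed form.

```python
import numpy as np
from scipy import integrate, optimize, stats

# Optimization: minimize f(x) = (x - 3)^2 + 1, whose minimum is at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3) ** 2 + 1)
print(result.x)  # approximately 3.0

# Integration: the integral of sin(x) from 0 to pi equals 2.
area, abs_error = integrate.quad(np.sin, 0, np.pi)
print(area)  # approximately 2.0

# Statistics: summary of a small sample.
summary = stats.describe([1, 2, 3, 4])
print(summary.mean, summary.variance)  # mean 2.5, sample variance about 1.67
```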
Scikit-learn
• Most popular for classical ML algorithms.
• It is built on top of two basic Python libraries,
viz., NumPy & SciPy.
• Scikit-learn supports most of the supervised &
unsupervised learning algorithms.
– Scikit-learn can also be used for data-mining &
data-analysis
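The typical scikit-learn workflow (create an estimator, `fit` it on training data, then `predict`) is shown below for a supervised classifier. The tiny dataset (hours studied and hours slept, labelled pass/fail) is hypothetical, and the decision tree is just one of the many estimators the library provides.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: [hours studied, hours slept] -> 1 = pass, 0 = fail.
X_train = [[1, 4], [2, 5], [3, 6], [6, 7], [7, 8], [8, 7]]
y_train = [0, 0, 0, 1, 1, 1]

# scikit-learn estimators share the same fit/predict interface.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

predictions = model.predict([[1.5, 5], [7.5, 7]])
print(predictions)  # [0 1]
```

Swapping in another classifier (for example `sklearn.linear_model.LogisticRegression`) changes only the line that creates the model; the `fit`/`predict` calls stay the same.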
Theano
• Popular Python library that is used to define, evaluate &
optimize mathematical expressions involving
multi-dimensional arrays in an efficient manner.
• It achieves this efficiency by optimizing the utilization of CPU & GPU.
• It performs extensive unit-testing & self-verification to
detect & diagnose different types of errors.
• Theano is a very powerful library that has been used in
large-scale computationally intensive scientific projects for
a long time
– but is simple & approachable enough to be used by individuals
for their own projects.
TensorFlow
• Very popular open-source library for
high-performance numerical computation, developed
by the Google Brain team.
• TensorFlow is a framework that involves
defining & running computations involving
tensors.
• It can train & run deep neural networks (DNNs).
– Widely used in the field of DL research &
application
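A minimal sketch of "defining & running computations involving tensors" in TensorFlow 2.x, where operations execute eagerly by default. `tf.GradientTape` records the operations so that derivatives, which are needed to train a neural network, can be computed automatically. The values are arbitrary examples.

```python
import tensorflow as tf

# Tensors: a 2x2 matrix and a 2x1 column vector.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0], [1.0]])

# A computation on tensors: matrix multiplication.
c = tf.matmul(a, b)
print(c.numpy())  # column vector [[3.], [7.]]

# Automatic differentiation: d/dx of y = x^2 + 3x at x = 2 is 2*2 + 3 = 7.
x = tf.Variable(2.0)
with tf.GradientTape() as tape:
    y = x ** 2 + 3 * x
grad = tape.gradient(y, x)
print(grad.numpy())  # 7.0
```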
Keras
• It is a high-level NN API capable of running on
top of TensorFlow, CNTK, or Theano.
• It can run seamlessly on both CPU & GPU
• Keras makes it really easy for ML beginners to build
& design a NN.
– allows for easy & fast prototyping
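A sketch of how Keras lets a beginner stack layers into a small neural network in a few lines, assuming Keras is used through TensorFlow (`from tensorflow import keras`). The random training data, layer sizes and epoch count are arbitrary placeholders, so the trained model itself is not meaningful; the point is the define, compile, fit, predict workflow.

```python
import numpy as np
from tensorflow import keras

# Placeholder data: 100 samples with 2 features and one numeric target each.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)).astype("float32")
y = (X[:, 0] + 2 * X[:, 1]).reshape(-1, 1).astype("float32")

# Define the network by stacking layers.
model = keras.Sequential([
    keras.Input(shape=(2,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1),
])

# Compile with an optimizer and a loss, then train briefly.
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=16, verbose=0)

predictions = model.predict(X[:3], verbose=0)
print(predictions.shape)  # (3, 1)
```

Changing the architecture for a quick prototype means editing only the list passed to `keras.Sequential`.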
PyTorch
• Based on Torch, which is an open-source ML
library implemented in C with a wrapper in
Lua.
• It has an extensive choice of tools & libraries
that support Computer Vision, NLP & many other
ML applications.
• It allows developers to perform computations
on Tensors with GPU acceleration
– helps in creating computational graphs
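A minimal PyTorch sketch of tensor computation with optional GPU acceleration, and of the computational graph that autograd builds as operations run. The device selection falls back to the CPU when no CUDA GPU is present; the values are arbitrary.

```python
import torch

# Use a GPU if one is available, otherwise the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Tensor computation on the chosen device.
a = torch.tensor([[1.0, 2.0], [3.0, 4.0]], device=device)
b = torch.tensor([[1.0], [1.0]], device=device)
c = a @ b  # matrix product: [[3.], [7.]]

# Autograd records a computational graph while y is computed,
# then backward() walks that graph to get dy/dx.
x = torch.tensor(2.0, requires_grad=True, device=device)
y = x ** 2 + 3 * x
y.backward()
print(x.grad.item())  # 7.0
```

Because the graph is built as the Python code runs, ordinary control flow (loops, `if` statements) can be used while defining a model.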
Pandas
• Popular for data analysis
• Pandas comes in handy as it was developed
specifically for data extraction & preparation.
• It provides high-level data structures & a wide
variety of tools for data analysis.
• It provides many built-in methods for grouping,
combining & filtering data.
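A short sketch of the grouping, combining and filtering methods mentioned above, using `groupby`, `merge` and boolean indexing. The sales and region tables are made-up illustrative data.

```python
import pandas as pd

# Made-up example tables.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "C"],
    "amount": [100, 150, 200, 50, 300],
})
regions = pd.DataFrame({
    "store": ["A", "B", "C"],
    "region": ["North", "South", "North"],
})

# Grouping: total sales per store.
totals = sales.groupby("store")["amount"].sum()
print(totals)

# Combining: join the two tables on the shared "store" column.
combined = sales.merge(regions, on="store")

# Filtering: keep only North-region rows with an amount above 120.
north_big = combined[(combined["region"] == "North") & (combined["amount"] > 120)]
print(north_big)
```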
Matplotlib
• Popular for data visualization.
• Like Pandas, not directly related to ML.
• It particularly comes in handy when a programmer wants
to visualize the patterns in the data.
• It is a 2D plotting library used for creating 2D graphs &
plots.
• A module named pyplot makes plotting easy for
programmers, as it provides features to control line styles, font
properties, axis formatting, etc.
• It provides various kinds of graphs & plots for data
visualization, viz., histograms, error charts, bar charts, etc.
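A minimal pyplot sketch producing the three plot types named above (a histogram, an error chart and a bar chart), with a custom line style, font sizes and axis settings. It assumes the non-interactive `Agg` backend so it runs without a display, and it saves the figure to an in-memory buffer; the data are arbitrary.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: draw without a display
import matplotlib.pyplot as plt

# Arbitrary example data.
scores = [55, 62, 68, 70, 71, 75, 78, 80, 85, 91]
days = [1, 2, 3, 4]
means = [2.0, 3.5, 3.0, 4.5]
errors = [0.3, 0.5, 0.4, 0.6]
labels = ["Mon", "Tue", "Wed"]
counts = [3, 7, 5]

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 4))

# Histogram: distribution of the scores in 5 bins.
ax1.hist(scores, bins=5, edgecolor="black")
ax1.set_title("Histogram", fontsize=12)
ax1.set_xlabel("Score")
ax1.set_ylabel("Count")

# Error chart: dashed line with error bars.
ax2.errorbar(days, means, yerr=errors, linestyle="--", marker="o", capsize=4)
ax2.set_title("Error chart", fontsize=12)

# Bar chart with a fixed y-axis range.
ax3.bar(labels, counts)
ax3.set_title("Bar chart", fontsize=12)
ax3.set_ylim(0, 10)

fig.tight_layout()
buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # or fig.savefig("plots.png") to write a file
plt.close(fig)
```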
Popular Sources for ML Datasets
• Kaggle Datasets: https://www.kaggle.com/datasets
• UCI ML Repository: https://archive.ics.uci.edu/ml/index.php
• Datasets via AWS: https://registry.opendata.aws/
• Google's Dataset Search Engine: https://toolbox.google.com/datasetsearch
• Microsoft Datasets: https://msropendata.com/
• Awesome Public Dataset Collection:
https://github.com/awesomedata/awesome-public-datasets
• Government Datasets:
– Indian Government dataset
– US Government Dataset
– Northern Ireland Public Sector Datasets
– European Union Open Data Portal
• Computer Vision Datasets: https://www.visualdata.io/
• Scikit-learn Dataset: https://scikit-learn.org/stable/datasets/index.html
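Of the sources listed above, the scikit-learn datasets are the quickest to try, since small "toy" datasets such as Iris ship with the library itself and need no download. A minimal sketch, assuming scikit-learn is installed:

```python
from sklearn.datasets import load_iris

# The Iris toy dataset ships with scikit-learn (no download needed).
iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (150, 4): 150 flowers, 4 measurements each
print(iris.feature_names)  # names of the 4 measurement columns
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
```

`X` and `y` can be passed directly to an estimator's `fit` method.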
Assignment
• Explain, using your own reasoning, relevant examples
and experiences: “AI is the superset of ML, i.e.,
all ML is AI but not all AI is ML”
• Special instructions:
– Should be individualized
– Don’t copy-paste
– Try to write by yourselves
– The similarity score will be measured by Turnitin software