CHAPTER 1
Introduction to Machine Learning

Syllabus : Introduction to Machine Learning, Data Science. Types of learning: supervised, unsupervised, semi-supervised and reinforcement learning techniques. Models of Machine Learning: geometric models, probabilistic models, logical models, grouping and grading models, parametric and non-parametric models. Comparison of Machine Learning with traditional programming, ML vs AI vs Data Science. Important elements of Machine Learning: data formats, learnability, statistical learning approaches.

1.1    Introduction
1.1.1  What is Machine Learning?
       GQ. What is Machine Learning?
       GQ. What is the importance of Machine Learning?
1.1.2  Why Is Machine Learning Important?
1.1.3  Machine Learning Definitions
1.1.4  Machine Learning Process
       GQ. What are the various steps in a machine learning process?
1.1.5  Applications of Machine Learning
       GQ. State various applications of machine learning.
       GQ. What are the various applications of machine learning in Mechanical Engineering?
1.2    Comparison of Machine Learning with Traditional Programming
1.3    ML vs AI vs Data Science
1.4    Types of Learning
1.5    Supervised Learning
       GQ. What is supervised learning?
       GQ. Explain supervised learning with the help of an example.
       GQ. How does supervised learning work?
1.5.1  How Supervised Learning Works
1.5.2  Advantages of Supervised Learning
1.5.3  Disadvantages of Supervised Learning
1.6    Unsupervised Learning
       GQ. What is unsupervised learning?
       GQ. What are the types of unsupervised learning?
       GQ. What are the advantages and disadvantages of unsupervised learning?
1.6.1  Types of Unsupervised Learning Algorithm
1.6.2  Advantages of Unsupervised Learning
1.6.3  Disadvantages of Unsupervised Learning
1.6.4  Difference between Supervised and Unsupervised Learning
       GQ. What is the difference between supervised learning and unsupervised learning?
1.7    Reinforcement Learning
       GQ. What is Reinforcement Learning? Explain with an example.
1.7.1  Approaches to Implement Reinforcement Learning
       GQ. What are the approaches for Reinforcement Learning?
1.7.2  Challenges of Reinforcement Learning
1.7.3  Applications of Reinforcement Learning
1.7.4  Reinforcement Learning vs. Supervised Learning
       GQ. What is the difference between Reinforcement Learning and Supervised Learning?
1.8    Introduction to Semi-Supervised Learning
1.9    Models of Machine Learning
1.9.1  Geometric Models
1.9.2  Probabilistic Models
1.9.3  Logical Models
1.9.4  Grouping Models
1.9.5  Grading Models
1.9.6  Grouping and Grading Models
1.9.7  Grouping versus Grading Models
1.10   Parametric and Non-Parametric Models
1.11   Important Elements of Machine Learning
1.11.1 Data Formats
1.11.2 Learnability
1.11.3 Statistical Learning Approaches
Chapter Ends

1.1 INTRODUCTION

• The term "Machine Learning", or ML in short, was coined in 1959 by Arthur Samuel in the context of solving the game of checkers by machine. The term refers to a computer program that can learn to produce a behaviour that is not explicitly programmed by the author of the program. Rather, it is capable of showing behaviour that the author may be completely unaware of. This behaviour is learned based on three factors:
  (1) Data that is consumed by the program,
  (2) A metric that quantifies the error, or some form of distance, between the current behaviour and the ideal behaviour, and
  (3) A feedback mechanism that uses the quantified error to guide the program to produce better behaviour in subsequent events. (A minimal code sketch of this loop follows.)
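To make the three factors above concrete, here is a minimal, illustrative sketch (not from this text) of a program that improves its behaviour using data, an error metric, and a feedback mechanism. It fits a single weight w so that predictions w * x match observed outputs, using gradient descent; the data points and learning rate are invented for illustration.

    # A toy learning loop: data + error metric + feedback (gradient descent).
    # Illustrative sketch only; the data and learning rate are assumed.
    data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]   # (input x, observed output y)

    w = 0.0                        # current behaviour: predict y = w * x
    for step in range(100):
        # Error metric: mean squared distance between current and ideal behaviour
        error = sum((w * x - y) ** 2 for x, y in data) / len(data)
        # Feedback: the gradient of the error tells us how to adjust w
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= 0.05 * grad           # move w in the direction that reduces the error

    print(round(w, 2))             # prints the learned slope (close to 2.0)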
• Machine learning is a subfield of AI and is, in many cases, the basis for AI technology.
• The goal of machine learning technology is to understand the structure of data and fit that data into specific models that can then be understood and used by humans for various applications throughout life.
• Traditional computer science is driven by algorithms that are human-created and managed; machine learning is driven by algorithms that the device itself can learn and grow from. Beyond that, such systems are often built with a very specific purpose that enables them to specialize in specific areas of "knowledge" or capabilities.
• Everything from the face-recognition capabilities in your phone to the self-driving technology in cars is derived from specialized forms of machine learning technology. It has become, and continues to become, a highly relevant and well-researched part of our modern world.

1.1.1 What is Machine Learning?

GQ. What is Machine Learning?
GQ. What is the importance of Machine Learning?

• Machine learning is a form of computer science technology whereby the machine itself has a complex range of "knowledge" that allows it to take certain data inputs and use complex statistical analysis strategies to create output values that fall within a specific range of knowledge, data, or information.
• "Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves."
• Machine learning devices essentially take data and use it to look for patterns and other pieces of specified information to create predictions or recommendations.
• The goal is for computers to learn how to use data and information in order to learn automatically, rather than requiring humans to intervene or assist with the learning process.
• A machine learning process begins by feeding the machine lots of data; using this data, the machine is trained to detect hidden insights and trends. Those insights are then used to build a machine learning model by using an algorithm in order to solve a problem.

Fig. 1.1.1 : What is machine learning? (Training the machine → Building a model → Predicting the outcome)

1.1.2 Why Is Machine Learning Important?

Ever since the technical revolution, we have been generating an immeasurable amount of data. As per research, we generate around 2.5 quintillion bytes of data every single day, and it was estimated that by 2020, 1.7 MB of data would be created every second for every person on earth. With the availability of so much data, it is finally possible to build predictive models that can study and analyze complex data to find useful insights and deliver more accurate results.

Machine learning, and data mining as a component of machine learning, are crucial tools in the process to glean insights from the massive datasets held by companies and researchers today. Here is a list of reasons why machine learning is so important:

(1) Increase in data generation : Due to the excessive production of data, we need a method that can be used to structure, analyze, and draw useful insights from data. This is where machine learning comes in. It uses data to solve problems and find solutions to the most complex tasks faced by organizations.
(2) Improve decision making : By making use of various algorithms, machine learning can be used to make better business decisions.
(3) Uncover patterns and trends in data : Finding hidden patterns and extracting key insights from data is the most essential part of machine learning. By building predictive models and using statistical techniques, machine learning allows you to dig beneath the surface and explore the data at a minute level. Understanding data and extracting patterns manually would take days, whereas machine learning algorithms can perform such computations in less than a second.
(4) Solve complex problems : From detecting the genes linked to the deadly ALS disease to building self-driving cars, machine learning can be used to solve the most complex problems.

1.1.3 Machine Learning Definitions

• Algorithm : A machine learning algorithm is a set of rules and statistical techniques used to learn patterns from data and draw significant information from it. It is the logic behind a machine learning model. An example of a machine learning algorithm is the Linear Regression algorithm.
• Model : A model is the main component of machine learning. A model is trained by using a machine learning algorithm. An algorithm maps all the decisions that a model is supposed to take based on the given input, in order to get the correct output.
• Predictor variable : A feature (or features) of the data that can be used to predict the output.
• Response variable : The feature or output variable that needs to be predicted by using the predictor variable(s).
• Training data : The machine learning model is built using the training data. The training data helps the model to identify key trends and patterns essential to predict the output.
• Testing data : After the model is trained, it must be tested to evaluate how accurately it can predict an outcome. This is done with the testing data set.

1.1.4 Machine Learning Process

GQ. What are the various steps in a machine learning process?

The machine learning process involves building a predictive model that can be used to find a solution for a problem statement. To understand the machine learning process, let us assume a problem that needs to be solved by using machine learning: predicting the occurrence of rain in your local area. The steps below are followed in a machine learning process:

Step 1 : Define the objective of the problem statement
• At this step, we must understand what exactly needs to be predicted. In this case, the objective is to predict the possibility of rain by studying weather conditions.
• At this stage, it is also essential to take mental notes on what kind of data can be used to solve this problem, or the type of approach that must be followed to get to the solution.

Step 2 : Data gathering
• At this stage, we ask questions such as: What kind of data is needed to solve this problem? Is the data available? If so, how can I get the data?
• Once you know the types of data that are required, you must understand how you can derive this data. Data collection can be done manually or by web scraping.
• The data needed for weather forecasting includes measures such as humidity level, temperature, pressure, locality, whether or not you live in a hill station, etc. Such data must be collected and stored for analysis.

Step 3 : Data preparation
• The data collected is almost never in the right format. There will be a lot of inconsistencies in the data set, such as missing values, redundant variables, duplicate values, etc.
• Removing such inconsistencies is very essential, because they might lead to wrongful computations and predictions. Therefore, at this stage, the data set is scanned for any inconsistencies, which are fixed then and there.

Step 4 : Exploratory data analysis
• EDA, or Exploratory Data Analysis, is the brainstorming stage of machine learning. Data exploration involves understanding the patterns and trends in the data. At this stage, all the useful insights are drawn and correlations between the variables are understood.
• For example, in the case of predicting rainfall, we know that there is a strong possibility of rain if the temperature has fallen low. Such correlations must be understood and mapped at this stage.

Step 5 : Building a machine learning model
• All the insights and patterns derived during data exploration are used to build the machine learning model.
• This stage always begins by splitting the data set into two parts: training data and testing data. The training data is used to build and analyze the model. The logic of the model is based on the machine learning algorithm that is being implemented.
• In the case of predicting rainfall, since the output will be in the form of True (it will rain tomorrow) or False (no rain tomorrow), we can use a classification algorithm such as Logistic Regression.
• Choosing the right algorithm depends on the type of problem to be solved, the data set, and the level of complexity of the problem.

Step 6 : Model evaluation and optimization
• After building a model by using the training data set, it is finally time to put the model to a test. The testing data set is used to check the efficiency of the model and how accurately it can predict the outcome. Once the accuracy is calculated, any further improvements in the model can be implemented at this stage. Methods like parameter tuning and cross-validation can be used to improve the performance of the model.

Step 7 : Predictions
• Once the model is evaluated and improved, it is finally used to make predictions. The final output can be a categorical variable (e.g., True or False) or a continuous quantity (e.g., the predicted value of a stock). In this case, for predicting the occurrence of rainfall, the output is a categorical variable.
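The end-to-end process above can be compressed into a few lines of code. The following is a minimal sketch (not from this text) of Steps 5 to 7 using scikit-learn: it splits a small synthetic weather dataset, trains a Logistic Regression classifier, evaluates accuracy on the held-out test set, and makes a prediction. All data values and the two chosen features are invented for illustration.

    # Minimal sketch of Steps 5-7 with scikit-learn (synthetic, made-up data).
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Features: [humidity %, temperature degrees C]; label: 1 = rain, 0 = no rain
    X = [[90, 18], [85, 20], [30, 32], [40, 35], [95, 17], [35, 30],
         [88, 19], [45, 33], [92, 16], [38, 31]]
    y = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]

    # Step 5: split the data and build the model on the training part
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)
    model = LogisticRegression().fit(X_train, y_train)

    # Step 6: evaluate on the held-out test set
    print("test accuracy:", model.score(X_test, y_test))

    # Step 7: predict for a new day (high humidity, low temperature)
    print("rain tomorrow?", bool(model.predict([[89, 18]])[0]))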
1.1.5 Applications of Machine Learning

GQ. State various applications of machine learning.
GQ. What are the various applications of machine learning in Mechanical Engineering?

Fig. 1.1.2 : Applications of machine learning

(1) Image recognition
Image recognition is one of the most common applications of machine learning. It is used to identify objects, persons, places, digital images, etc. A popular use case of image recognition and face detection is the automatic friend-tagging suggestion: Facebook provides a feature of auto friend-tagging suggestions. Whenever we upload a photo with our Facebook friends, we automatically get a tagging suggestion with names, and the technology behind this is machine learning's face detection and recognition algorithm. It is based on the Facebook project named "DeepFace," which is responsible for face recognition and person identification in the picture.

(2) Speech recognition
While using Google, we get an option of "Search by voice"; this comes under speech recognition, and it is a popular application of machine learning. Speech recognition is the process of converting voice instructions into text, and it is also known as "speech to text" or "computer speech recognition." At present, machine learning algorithms are widely used in various applications of speech recognition. Google Assistant, Siri, Cortana, and Alexa use speech recognition technology to follow voice instructions.

(3) Traffic prediction
If we want to visit a new place, we take the help of Google Maps, which shows us the correct path with the shortest route and predicts the traffic conditions. It predicts traffic conditions, such as whether traffic is clear, slow-moving, or heavily congested, with the help of two things:
• The real-time location of the vehicle from the Google Maps app and sensors.
• The average time taken on past days at the same time.

(4) Product recommendations
Machine learning is widely used by various e-commerce and entertainment companies such as Amazon, Netflix, etc., for product recommendations to the user. Whenever we search for some product on Amazon, we start getting advertisements for the same product while surfing the internet on the same browser, and this is because of machine learning. Google understands the user's interest using various machine learning algorithms and suggests products as per customer interest. Similarly, when we use Netflix, we find recommendations for entertainment series, movies, etc., and this is also done with the help of machine learning.

(5) Email spam and malware filtering
Whenever we receive a new email, it is filtered automatically as important, normal, or spam. We always receive important mail in our inbox with the important symbol and spam emails in our spam box, and the technology behind this is machine learning. Machine learning algorithms such as the Multi-Layer Perceptron, Decision Tree, and Naive Bayes classifier are used for email spam filtering and malware detection.

(6) Virtual personal assistants
We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri. As the name suggests, they help us find information using our voice instructions. These assistants can help us in various ways just by our voice instructions, such as playing music, calling someone, opening an email, scheduling an appointment, etc. These virtual assistants use machine learning algorithms as an important part: they record our voice instructions, send them over to a server on the cloud, decode them using ML algorithms, and act accordingly.

(7) Online fraud detection
Machine learning is making our online transactions safe and secure by detecting fraudulent transactions. Whenever we perform an online transaction, there are various ways that a fraudulent transaction can take place, such as fake accounts, fake IDs, and stealing money in the middle of a transaction.
So, to detect this, a feed-forward neural network helps us by checking whether a transaction is genuine or fraudulent.

(8) Stock market trading
Machine learning is widely used in stock market trading. In the stock market, there is always a risk of ups and downs in shares, so for this, machine learning's Long Short-Term Memory (LSTM) neural network is used for the prediction of stock market trends.

(9) Automatic language translation
Nowadays, if we visit a new place and are not aware of the language, it is not a problem at all, as machine learning helps us here too, by converting text into languages we know. Google's GNMT (Google Neural Machine Translation) provides this feature; it is a neural machine learning system that translates text into our familiar language, and this is called automatic translation.

1.2 COMPARISON OF MACHINE LEARNING WITH TRADITIONAL PROGRAMMING

Sr. No. | Machine Learning | Traditional Programming
1. | Machine learning is not a manual process: the algorithm automatically formulates the rules from the data, without anyone having to manually formulate or code the rules. | Traditional programming is a manual process, i.e., a person (programmer) creates the program.
2. | Machine learning approach: data and desired outputs go into a computation that produces the model/rules (Fig. A). | Traditional programming: data and hand-written rules go into a computation that produces the output (Fig. B).
3. | As a subset of Artificial Intelligence (AI), machine learning is motivated by human learning behaviour. Here we show the machine examples, and it figures out how to solve the problem by itself. | In traditional programming, we write down the exact steps required to solve the problem.
4. | A machine learning algorithm takes input and output and produces some logic, which can then be used with new input to give an output. (A tiny code contrast follows.) | A traditional algorithm takes some input and some logic in the form of code and gives the output.
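The contrast in rows 2 and 4 can be made concrete with a short, illustrative sketch (not from this text): converting Celsius to Fahrenheit. The traditional program encodes the rule by hand; the machine learning version is handed example input/output pairs and recovers the rule itself. The example data is made up.

    # Traditional programming: the programmer writes the rule.
    def to_fahrenheit(celsius):
        return celsius * 9 / 5 + 32          # rule coded by hand

    # Machine learning: the rule is learned from input/output examples.
    from sklearn.linear_model import LinearRegression

    celsius = [[-10], [0], [10], [25], [40]]        # example inputs
    fahrenheit = [14, 32, 50, 77, 104]              # desired outputs

    model = LinearRegression().fit(celsius, fahrenheit)
    print(to_fahrenheit(100))            # 212.0, from the hand-coded rule
    print(model.predict([[100]])[0])     # ~212.0, from the learned rule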
1.3 ML VS AI VS DATA SCIENCE

• ML, AI, and Data Science are interconnected but have different scopes. They follow different approaches and produce different results depending on the problem.
• Here we shall see how ML, AI, and Data Science differ from each other.

Aspects | Machine Learning | Artificial Intelligence | Data Science
Job roles | Machine Learning Engineer, Data Architect, Data Scientist, Data Mining Specialist, Cloud Architect, Cyber Security Analyst | Machine Learning Engineer, Data Scientist, Business Intelligence Developer, Big Data Architect, Research Scientist | Data Engineer, Data Scientist, Data Analyst, Data Architect, Database Administrator, Machine Learning Engineer, Statistician, Business Analyst, Data and Analytics Manager, and more
Skills | Statistics, probability, data modelling, programming skills, applying ML libraries and algorithms, software design, Python | Mathematical and algorithmic skills, probability and statistics knowledge, expertise in programming, awareness of advanced signal-processing techniques | Programming skills, statistics, machine learning, multi-variable calculus and linear algebra, data visualisation and communication, software engineering, data intuition, well versed with Unix tools
Salary | ~112 k/year average base pay | ~14.3 lakhs per annum | ~105 k/year average base pay

1.4 TYPES OF LEARNING

With the constant advancements in artificial intelligence, the field has become too big to specialize in all together. There are countless problems that can be solved with countless methods. The knowledge of an experienced AI researcher specialized in one field may be mostly useless for another field. Understanding the nature of different machine learning problems is therefore very important. Even though the list of machine learning problems is very long, these problems can be grouped into three different learning approaches:
1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning

Top machine learning approaches are categorized depending on the nature of their feedback mechanism for learning. Most machine learning problems may be addressed by adopting one of these approaches.

1.5 SUPERVISED LEARNING

GQ. What is supervised learning?
GQ. Explain supervised learning with the help of an example.
GQ. How does supervised learning work?

• Learning that takes place based on a class of examples is referred to as supervised learning. It is learning based on labelled data: while learning, the system has knowledge of a set of labelled data. This is one of the most common and frequently used learning methods.
• The supervised learning method comprises a series of algorithms that build mathematical models of certain data sets containing both the inputs and the desired outputs for that particular machine. The data being input into the supervised learning method is known as training data, and it essentially consists of training examples which contain one or more inputs and typically only one desired output. This output is known as a "supervisory signal."
• In the training examples for the supervised learning method, each training example is represented by an array, also known as a vector or a feature vector, and the training data is represented by a matrix.
• The algorithm uses iterative optimization of an objective function to predict the output that will be associated with new inputs.
• Ideally, if the supervised learning algorithm is working properly, the machine will be able to correctly determine the output for inputs that were not a part of the training data.
• Supervised learning uses classification and regression techniques to develop predictive models. Classification techniques predict categorical responses.
• Regression techniques predict continuous responses, for example changes in temperature or fluctuations in power demand. Typical applications include electricity load forecasting and algorithmic trading.
• Let us begin by considering the simplest machine learning task: supervised learning for classification. Take the example of classification of documents. In this particular case, a learner learns based on the available documents and their classes. This is also referred to as labelled data.
• The program that can map the input documents to appropriate classes is called a classifier, because it assigns a class (i.e., a document type) to an object (i.e., a document). The task of supervised learning is to construct a classifier given a set of classified training examples. A typical classification is depicted in Fig. 1.5.1.
• Fig. 1.5.1 represents a hyperplane that has been generated after learning, separating two classes, class A and class B, into different parts. Each input point presents an input-output instance from the sample space. In the case of document classification, these points are documents.

Fig. 1.5.1 : Supervised learning

• Learning computes a separating line or hyperplane among the documents. An unknown document's type can then be decided by its position with respect to the separator.
• There are a number of challenges in supervised classification, such as generalization, selection of the right data for learning, and dealing with variations. Labelled examples are used for training in the case of supervised learning. The set of labelled examples provided to the learning algorithm is called the training set. Supervised learning is not just about classification; it is the overall process that, with guidelines, maps to the most appropriate decision.

1.5.1 How Supervised Learning Works

• In supervised learning, models are trained using a labelled dataset, where the model learns about each type of data. Once the training process is completed, the model is tested on test data (a subset of the training set), and then it predicts the output.
• The working of supervised learning can be easily understood by the example and diagram below (Fig. 1.5.2).

Fig. 1.5.2 : How supervised learning works (labelled shapes → model training → prediction on test data)

Suppose we have a dataset of different types of shapes, which includes squares, rectangles, triangles, and polygons. The first step is to train the model for each shape:
• If the given shape has four sides, and all the sides are equal, it will be labelled as a square.
• If the given shape has three sides, it will be labelled as a triangle.
• If the given shape has six equal sides, it will be labelled as a hexagon.

Now, after training, we test our model using the test set, and the task of the model is to identify the shape. The machine is already trained on all types of shapes, and when it finds a new shape, it classifies the shape on the basis of its number of sides and predicts the output.

Steps involved in supervised learning:
• First, determine the type of training dataset.
• Collect/gather the labelled training data.
• Split the training dataset into a training dataset, test dataset, and validation dataset.
• Determine the input features of the training dataset, which should carry enough information for the model to accurately predict the output.
• Determine the suitable algorithm for the model, such as support vector machine, decision tree, etc.
• Execute the algorithm on the training dataset. Sometimes we need validation sets as the control parameters, which are a subset of the training dataset.
• Evaluate the accuracy of the model by providing the test set. If the model predicts the correct output, the model is accurate.

Supervised learning can be further divided into two types of problems: Regression and Classification.

Regression
Regression algorithms are used if there is a relationship between the input variable and the output variable. Regression is used for the prediction of continuous variables, such as weather forecasting, market trends, etc.
Below are some popular regression algorithms which come under supervised learning:
• Linear Regression
• Regression Trees
• Non-Linear Regression
• Bayesian Linear Regression
• Polynomial Regression

Classification
Classification algorithms are used when the output variable is categorical, which means there are classes such as Yes/No, Male/Female, True/False, etc. Popular classification algorithms include:
• Random Forest
• Logistic Regression
• Decision Trees
• Support Vector Machines

1.5.2 Advantages of Supervised Learning

(1) With the help of supervised learning, the model can predict the output on the basis of prior experience.
(2) In supervised learning, we can have an exact idea about the classes of objects.
(3) A supervised learning model helps us to solve various real-world problems, such as fraud detection, spam filtering, etc.

1.5.3 Disadvantages of Supervised Learning

(1) Supervised learning models are not suitable for handling complex tasks.
(2) Supervised learning cannot predict the correct output if the test data is different from the training dataset.
(3) Training requires a lot of computation time.
(4) In supervised learning, we need enough knowledge about the classes of objects.
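As a concrete illustration of the shape-classification example above, here is a minimal sketch (not from this text) that trains a decision tree on hand-made shape features and then classifies an unseen shape. The feature encoding (number of sides, and whether all sides are equal) is an assumption made purely for illustration.

    # Classifying shapes from simple features with a decision tree (illustrative).
    from sklearn.tree import DecisionTreeClassifier

    # Features: [number of sides, all sides equal? (1/0)]  (assumed encoding)
    X = [[4, 1], [4, 0], [3, 1], [3, 0], [6, 1]]
    y = ["square", "rectangle", "triangle", "triangle", "hexagon"]

    clf = DecisionTreeClassifier().fit(X, y)

    # A new shape with four equal sides is classified as a square.
    print(clf.predict([[4, 1]])[0])      # -> 'square'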
1.6 UNSUPERVISED LEARNING

GQ. What is unsupervised learning?
GQ. What are the types of unsupervised learning?
GQ. What are the advantages and disadvantages of unsupervised learning?

• Unsupervised learning refers to learning from unlabeled data. It is based more on similarities and differences than on anything else. In this type of learning, all similar items are clustered together in a particular class, where the label of the class is not known.
• It is not possible to learn in a supervised way in the absence of properly labeled data; in these scenarios there is a need to learn in an unsupervised way. Here, learning is based more on the similarities and differences that are visible. These differences and similarities are represented mathematically in unsupervised learning.
• Given a large collection of objects, we often want to be able to understand these objects and visualize their relationships. As an example based on similarities, a kid can separate birds from other animals. It may use some property or similarity while separating, such as that birds have wings.
• The criterion in the initial stages is the most visible aspect of those objects. Linnaeus devoted much of his life to arranging living organisms into a hierarchy of classes, with the goal of arranging similar organisms together at all levels of the hierarchy. Many unsupervised learning algorithms create similar hierarchical arrangements based on similarity-based mappings.
• The task of hierarchical clustering is to arrange a set of objects into a hierarchy such that similar objects are grouped together. Non-hierarchical clustering seeks to partition the data into some number of disjoint clusters. The process of clustering is depicted in Fig. 1.6.1: a learner is fed a set of scattered points, and after learning it generates two clusters with representative centroids. The clusters show that points with similar properties and closeness are grouped together.

Fig. 1.6.1 : Unsupervised learning (scattered points → clusters)

• Unsupervised learning is a set of algorithms where the only information being uploaded is inputs. The device itself, then, is responsible for grouping together and creating ideal outputs based on the data it discovers. Often, unsupervised learning algorithms have certain goals, but they are not controlled in any manner.
• Instead, the developers believe that they have created strong enough inputs to ultimately program the machine to create stronger results than they themselves possibly could. The idea here is that the machine is programmed to run flawlessly to the point where it can be intuitive and inventive in the most effective manner possible.
• The information in the algorithms being run by unsupervised learning methods is not labelled, classified, or categorized by humans. Instead, the unsupervised algorithm rejects responding to feedback in favour of identifying commonalities in the data. It then reacts based on the presence, or absence, of such commonalities in each new piece of data that is input into the machine.
• Unsupervised learning is used to draw inferences from datasets consisting of input data without labelled responses. Clustering is the most common unsupervised learning technique. It is used for exploratory data analysis to find hidden patterns or groupings in data. Applications of clustering include gene sequence analysis, market research, and object recognition.

1.6.1 Types of Unsupervised Learning Algorithm

Unsupervised learning algorithms can be further categorized into two types of problems:

(1) Clustering : Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in one group and have few or no similarities with the objects of another group. Cluster analysis finds the commonalities between the data objects and categorizes them as per the presence and absence of those commonalities.

(2) Association : An association rule is an unsupervised learning method used for finding relationships between variables in a large database. It determines the set of items that occur together in the dataset. Association rules make marketing strategy more effective; for example, people who buy item X (say, bread) also tend to purchase item Y (butter/jam). A typical example of association rules is Market Basket Analysis.

Below is a list of some popular unsupervised learning algorithms:
• K-means clustering
• KNN (k-nearest neighbours)
• Hierarchical clustering
• Anomaly detection
• Neural networks
• Principal Component Analysis
• Independent Component Analysis
• Apriori algorithm
• Singular value decomposition

1.6.2 Advantages of Unsupervised Learning

(1) Unsupervised learning is used for more complex tasks as compared to supervised learning, because in unsupervised learning we do not have labeled input data.
(2) Unsupervised learning is preferable in that it is easier to get unlabeled data than labeled data.

1.6.3 Disadvantages of Unsupervised Learning

(1) Unsupervised learning is intrinsically more difficult than supervised learning, as it does not have corresponding outputs.
(2) The result of an unsupervised learning algorithm might be less accurate, as the input data is not labeled and the algorithm does not know the exact output in advance.
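Clustering, the most common unsupervised technique listed above, can be demonstrated in a few lines. The following minimal sketch (not from this text) groups unlabeled 2-D points into two clusters with k-means, echoing Fig. 1.6.1; the points are invented for illustration.

    # Grouping unlabeled points into clusters with k-means (illustrative).
    from sklearn.cluster import KMeans

    # Unlabeled 2-D points: two loose groups, no class labels given.
    X = [[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],      # group near (1, 1)
         [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]      # group near (5, 5)

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(kmeans.labels_)                 # cluster index assigned to each point
    print(kmeans.cluster_centers_)        # one learned centroid per cluster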
• In practical scenarios there is always a need to learn from both labeled and unlabeled data. Even while learning in an unsupervised way, there is the need to make the best use of the labeled data available. This is referred to as semi-supervised learning. Semi-supervised learning makes the best use of two paradigms of learning, that is, learning based on similarity and learning based on inputs from a teacher; it tries to get the best of both worlds.

1.6.4 Difference between Supervised and Unsupervised Learning

GQ. What is the difference between supervised learning and unsupervised learning?

• Supervised and unsupervised learning are the two techniques of machine learning, but the two techniques are used in different scenarios and with different datasets. Below, both learning methods are explained, along with their difference table.
• Supervised learning is a machine learning method in which models are trained using labeled data. In supervised learning, models need to find the mapping function that maps the input variable (X) to the output variable (Y):

    Y = f(X)

• Supervised learning needs supervision to train the model, which is similar to how a student learns things in the presence of a teacher. Supervised learning can be used for two types of problems: Classification and Regression.
• Example : Suppose we have images of different types of fruits. The task of our supervised learning model is to identify the fruits and classify them accordingly. To identify the images in supervised learning, we give the input data as well as the output for it, which means we train the model on the shape, size, colour, and taste of each fruit. Once the training is completed, we test the model by giving it a new set of fruits. The model identifies the fruit and predicts the output using a suitable algorithm.
• Unsupervised learning is another machine learning method, in which patterns are inferred from unlabeled input data. The goal of unsupervised learning is to find the structure and patterns in the input data. Unsupervised learning does not need any supervision; instead, it finds patterns in the data on its own. Unsupervised learning can be used for two types of problems: Clustering and Association.
• Example : To understand unsupervised learning, we use the example given above. Unlike supervised learning, here we do not provide any supervision to the model. We just provide the input dataset to the model and allow the model to find patterns in the data. With the help of a suitable algorithm, the model trains itself and divides the fruits into different groups according to the most similar features between them.
The main differences between supervised and unsupervised learning are given in Table 1.6.1:

Table 1.6.1
Sr. No. | Supervised Learning | Unsupervised Learning
1. | Supervised learning algorithms are trained using labeled data. | Unsupervised learning algorithms are trained using unlabeled data.
2. | A supervised learning model takes direct feedback to check whether it is predicting the correct output or not. | An unsupervised learning model does not take any feedback.
3. | A supervised learning model predicts the output. | An unsupervised learning model finds the hidden patterns in the data.
4. | In supervised learning, input data is provided to the model along with the output. | In unsupervised learning, only input data is provided to the model.
5. | The goal of supervised learning is to train the model so that it can predict the output when it is given new data. | The goal of unsupervised learning is to find the hidden patterns and useful insights from an unknown dataset.
6. | Supervised learning needs supervision to train the model. | Unsupervised learning does not need any supervision to train the model.
7. | Supervised learning can be categorized into Classification and Regression problems. | Unsupervised learning can be classified into Clustering and Association problems.
8. | Supervised learning can be used in cases where we know the inputs as well as the corresponding outputs. | Unsupervised learning can be used in cases where we have only input data and no corresponding output data.
9. | A supervised learning model produces an accurate result. | An unsupervised learning model may give a less accurate result compared to supervised learning.
10. | Supervised learning is not close to true Artificial Intelligence, as we first train the model for each datum, and only then can it predict the correct output. | Unsupervised learning is closer to true Artificial Intelligence, as it learns similarly to how a child learns from its daily routine experiences.
11. | It includes various algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, Multi-class Classification, Decision Tree, Bayesian Logic, etc. | It includes various algorithms such as Clustering, KNN, and the Apriori algorithm.
1.7 REINFORCEMENT LEARNING

GQ. What is Reinforcement Learning? Explain with an example.

• Reinforcement Learning (RL) is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and seeing the results of those actions. For each good action, the agent gets positive feedback, and for each bad action, the agent gets negative feedback or a penalty.
• In reinforcement learning, the agent learns automatically using feedback, without any labeled data, unlike supervised learning.
• Since there is no labelled data, the agent is bound to learn from its experience only.
• RL solves a specific type of problem where decision making is sequential and the goal is long-term, such as game playing, robotics, etc.
• The agent interacts with the environment and explores it by itself. The primary goal of an agent in reinforcement learning is to improve its performance by getting the maximum positive rewards.
• The agent learns through the process of hit and trial, and based on that experience, it learns to perform the task in a better way. Hence, we can say that "Reinforcement learning is a type of machine learning method where an intelligent agent (computer program) interacts with the environment and learns to act within it." How a robotic dog learns the movement of its arms is an example of reinforcement learning.
• It is a core part of Artificial Intelligence, and all AI agents work on the concept of reinforcement learning. Here we do not need to pre-program the agent, as it learns from its own experience without any human intervention.
• Example : Suppose there is an AI agent present within a maze environment, and its goal is to find the diamond. The agent interacts with the environment by performing some actions; based on those actions, the state of the agent changes, and it also receives a reward or penalty as feedback.

Fig. 1.7.1 : An agent interacting with its environment through actions, states, and rewards

• The agent continues doing these three things (take an action, change state or remain in the same state, and get feedback), and by doing these actions, it learns and explores the environment.
• The agent learns which actions lead to positive feedback or rewards and which actions lead to negative feedback or a penalty. As a positive reward, the agent gets a positive point, and as a penalty, it gets a negative point.
• For machine learning, the environment is typically represented by an "MDP", or Markov Decision Process.
• These algorithms do not necessarily assume knowledge of an exact model; instead, they are used when exact models are infeasible. In other words, they are not quite as precise or exact, but they still serve as a strong method in various applications across different technology systems.
• The key features of reinforcement learning are mentioned below:
  o In RL, the agent is not instructed about the environment and what actions need to be taken.
  o It is based on the hit-and-trial process.
  o The agent takes the next action and changes states according to the feedback of the previous action.
  o The agent may get a delayed reward.
  o The environment is stochastic, and the agent needs to explore it to get the maximum positive rewards.

1.7.1 Approaches to Implement Reinforcement Learning

GQ. What are the approaches for Reinforcement Learning?

There are mainly three ways to implement reinforcement learning in ML:

1. Value-based : The value-based approach is about finding the optimal value function, which is the maximum value at a state under any policy. Therefore, the agent expects the long-term return at any state(s) under policy π. (A short code sketch of this approach follows the list of characteristics below.)
2. Policy-based : The policy-based approach is to find the optimal policy for the maximum future rewards without using the value function. In this approach, the agent tries to apply a policy such that the action performed in each step helps to maximize the future reward. The policy-based approach has two main types of policy:
   • Deterministic : the same action is produced by the policy (π) at any state.
   • Stochastic : in this policy, probability determines the produced action.
3. Model-based : In the model-based approach, a virtual model is created for the environment, and the agent explores that environment to learn it. There is no particular solution or algorithm for this approach, because the model representation is different for each environment.

Here are important characteristics of reinforcement learning:
• There is no supervisor, only a real number or reward signal.
• Sequential decision making.
• Time plays a crucial role in reinforcement problems.
• Feedback is always delayed, not instantaneous.
• The agent's actions determine the subsequent data it receives.
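The value-based approach can be illustrated with tabular Q-learning. The following minimal sketch (not from this text) is a toy agent on a five-cell corridor whose goal (reward +1) is the rightmost cell; the environment, rewards, and hyperparameters are all invented for illustration.

    # Tabular Q-learning on a 5-cell corridor (illustrative toy example).
    import random

    N_STATES, ACTIONS = 5, [-1, +1]       # agent can move left or right
    GOAL = N_STATES - 1                   # reward +1 at the rightmost cell
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    alpha, gamma, eps = 0.5, 0.9, 0.2     # learning rate, discount, exploration

    for episode in range(200):
        s = 0
        while s != GOAL:
            # epsilon-greedy: mostly exploit learned values, sometimes explore
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s2 = min(max(s + a, 0), N_STATES - 1)   # environment transition
            r = 1.0 if s2 == GOAL else 0.0          # feedback from environment
            # value-based update: move Q toward reward + discounted future value
            best_next = max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2

    # After training, the greedy policy walks right (+1) toward the goal.
    print([max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)])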
• RL can be used in almost any application. It is an algorithm that learns from experience, an autonomous decision-maker: an optimization algorithm that learns over time to maximize its reward, where the reward can be defined by the engineer to match the objective of the problem.

1.7.2 Challenges of Reinforcement Learning

Here are the major challenges you will face while doing reinforcement learning:
(1) Feature/reward design, which can be very involved.
(2) Parameters may affect the speed of learning.
(3) Realistic environments can have partial observability.
(4) Too much reinforcement may lead to an overload of states, which can diminish the results.
(5) Realistic environments can be non-stationary.

1.7.3 Applications of Reinforcement Learning

Here are applications of reinforcement learning:
(1) Robotics for industrial automation.
(2) Business strategy planning.
(3) Machine learning and data processing.
(4) Aircraft control and robot motion control.
(5) It helps you to create training systems that provide custom instruction and materials according to the requirements of students.

1.7.4 Reinforcement Learning vs. Supervised Learning

GQ. What is the difference between Reinforcement Learning and Supervised Learning?

Table 1.7.1
Parameters | Reinforcement Learning | Supervised Learning
Decision style | Reinforcement learning helps you to take your decisions sequentially. | In this method, a decision is made on the input given at the beginning.
Works on | Works on interacting with the environment. | Works on examples or given sample data.
Dependency on decisions | In the RL method, learning decisions are dependent; therefore, you should give labels to all the dependent decisions. | In supervised learning, the decisions are independent of each other, so a label is given for every decision.
Best suited | Supports and works better in AI, where human interaction is prevalent. | It is mostly operated with interactive software systems or applications.
Example | Chess game | Object recognition
1.8 INTRODUCTION TO SEMI-SUPERVISED LEARNING

(1) Semi-supervised learning is a type of machine learning algorithm that represents the intermediate ground between supervised and unsupervised learning algorithms. It uses a combination of labeled and unlabeled datasets during the training period.
(2) Before understanding semi-supervised learning, you should know the main categories of machine learning algorithms. Machine learning consists of three main categories: Supervised Learning, Unsupervised Learning, and Reinforcement Learning.
(3) Further, the basic difference between supervised and unsupervised learning is that supervised learning datasets include an output label associated with each training tuple, while unsupervised datasets do not. Semi-supervised learning is an important category that lies between supervised and unsupervised machine learning. Although semi-supervised learning is the middle ground between the two and operates on data that contains a few labels, it mostly consists of unlabeled data. Labels are costly, but for corporate purposes a few labels may be available.
(4) The basic disadvantage of supervised learning is that it requires hand-labeling by ML specialists or data scientists, and it also has a high processing cost. Further, unsupervised learning also has a limited spectrum of applications. To overcome these drawbacks of supervised and unsupervised learning algorithms, the concept of semi-supervised learning was introduced. In this algorithm, the training data is a combination of both labeled and unlabeled data; however, the labeled data exists in a very small amount while the unlabeled data is present in huge amounts. Initially, similar data is clustered using an unsupervised learning algorithm, which then helps to label the unlabeled data. This is why labeled data is a comparatively more expensive acquisition than unlabeled data.
(5) We can imagine these algorithms with an example. Supervised learning is where a student is under the supervision of an instructor at home and at college. Further, if that student is self-analyzing the same concept without any help from the instructor, it comes under unsupervised learning. Under semi-supervised learning, the student has to revise by themselves after analyzing the same concept under the guidance of an instructor at college.

A semi-supervised algorithm assumes the following about the data:
1. Continuity assumption : The algorithm assumes that points which are closer to each other are more likely to have the same output label.
2. Cluster assumption : The data can be divided into discrete clusters, and points in the same cluster are more likely to share an output label.
3. Manifold assumption : The data lie on a manifold of much lower dimension than the input space. This assumption allows the use of distances and densities which are defined on a manifold.

Practical applications of semi-supervised learning:
1. Speech analysis : Since labelling audio files is a very intensive task, semi-supervised learning is a natural approach to solve this problem.
2. Internet content classification : Labelling each webpage is an impractical and unfeasible process, so semi-supervised learning algorithms are used.
3. Protein sequence classification : Since DNA strands are very large in size, semi-supervised learning is a must in this field.
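The following minimal sketch (not from this text) shows the semi-supervised setting in code using scikit-learn's LabelPropagation: only two of the six points carry labels (the rest are marked -1 for "unlabeled"), and the algorithm spreads labels to nearby points, as the continuity and cluster assumptions suggest. The data is invented for illustration.

    # Semi-supervised learning: a few labels, mostly unlabeled data (illustrative).
    from sklearn.semi_supervised import LabelPropagation

    X = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1],      # cluster A
         [5.0, 5.0], [5.1, 4.9], [4.9, 5.1]]      # cluster B
    y = [0, -1, -1, 1, -1, -1]                    # -1 marks an unlabeled point

    model = LabelPropagation().fit(X, y)
    print(model.transduction_)        # labels inferred for all six points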
1.9 MODELS OF MACHINE LEARNING

1.9.1 Geometric Models

• In geometric models, features can be described as points in two dimensions (x- and y-axes) or in three-dimensional space (x, y, and z). Even when features are not intrinsically geometric, they can be modelled in a geometric manner (for example, temperature as a function of time can be modelled in two axes). In geometric models, there are two ways we can impose similarity:
  o We can use geometric concepts like lines or planes to segment (classify) the instance space. These are called linear models.
  o Alternatively, we can use the geometric notion of distance to represent similarity. In this case, if two points are close together, they have similar values for their features and thus can be classed as similar. We call such models distance-based models.

1.9.2 Probabilistic Models

• In contrast to deterministic models, where the relationship between quantities is known exactly, probabilistic models assume a relationship between quantities that is reasonably accurate while random components are also taken into consideration. Thus probabilistic models are statistical models: they assign probability distributions to account for uncertainty in the data.
• Probabilistic models form the basis of other areas such as machine learning, artificial intelligence, and data analysis. Their formulation and solution rest on the two basic rules of probability theory, that is, the sum rule and the product rule.
• We mention an example: if one lives in a cold climate, one knows that traffic tends to be more difficult when snow falls and covers the roads. One can go a step further and make a hypothesis: there will be a strong correlation between snowfall and traffic difficulty.
• Probabilistic models are used in a variety of disciplines, including statistical physics, quantum mechanics, and theoretical computer science.
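The snow-and-traffic hypothesis can be expressed with the product rule and Bayes' theorem. The short sketch below (not from this text) computes P(heavy traffic | snow); every probability value is invented purely for illustration.

    # Bayes' theorem on the snow/traffic example (all numbers assumed).
    p_snow = 0.10                 # P(snow)
    p_traffic = 0.30              # P(heavy traffic)
    p_snow_given_traffic = 0.25   # P(snow | heavy traffic)

    # Product rule: P(traffic, snow) = P(snow | traffic) * P(traffic)
    # Bayes: P(traffic | snow) = P(snow | traffic) * P(traffic) / P(snow)
    p_traffic_given_snow = p_snow_given_traffic * p_traffic / p_snow
    print(p_traffic_given_snow)   # 0.75: snow makes heavy traffic far more likely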
1.9.3 Logical Models

• Logical models use a logical expression to divide the instance space into segments, and hence construct grouping models. A logical expression is an expression that returns a Boolean value, i.e., a True or False outcome. Once the data is grouped using a logical expression, the data is divided into homogeneous groupings for the problem we are trying to solve.
• There are two types of logical models: tree models and rule models.
  o Rule models consist of a collection of implications, or IF-THEN rules. The 'if-part' defines a segment, and the 'then-part' defines the behaviour of the model for this segment.
  o Tree models can be seen as a particular type of rule model where the if-parts of the rules are organised in a tree structure. Both tree models and rule models use the same approach to supervised learning.

1.9.4 Grouping Models

• Tree models repeatedly split the instance space into smaller subsets. Trees are usually of limited depth and don't contain all the available features. Subsets at the leaves of the tree partition the instance space with some finite resolution. Instances filtered into the same leaf of the tree are treated the same, regardless of any features not in the tree that might be able to distinguish them.

1.9.5 Grading Models

• Grading models don't use the notion of a segment: they form one global model over the instance space.
• Grading models are (usually) able to distinguish between arbitrary instances, no matter how similar they are. Their resolution is, in theory, infinite, particularly when working in a Cartesian instance space.
• Support vector machines and other geometric classifiers are examples of grading models. They work in a Cartesian instance space and exploit the minutest differences between instances.

1.9.6 Grouping and Grading Models

• The key difference between grouping and grading models is the way they handle the instance space.
• Grouping models break up the instance space into groups or segments, the number of which is determined at training time.
• They have a fixed resolution and cannot distinguish instances beyond that resolution.
• At the finest resolution, grouping models assign the majority class to all instances that fall into a segment.
• They determine the right segments and label all the objects in a segment the same way.

1.9.7 Grouping versus Grading Models

• Some models combine the features of both grouping and grading models.
• Linear classifiers are a prime example of a grading model: instances on a line or plane parallel to the decision boundary can't be distinguished by a linear model, and there are infinitely many such segments.

1.10 PARAMETRIC AND NON-PARAMETRIC MODELS

• Machine learning models can be parametric or non-parametric.
• Parametric models are those that require the application of some parameters before they can be used to make predictions.
• Non-parametric models do not rely on any specific parameter settings, and hence they often produce more accurate results.

We mention the differences between parametric and non-parametric methods:

Sr. No. | Parametric Methods | Non-Parametric Methods
1. | Parametric methods use a fixed number of parameters to build the model. | Non-parametric methods use a flexible number of parameters to build the model.
2. | Parametric analysis is for testing group means. | Non-parametric analysis is for testing medians.
3. | They are applicable only for variables. | They are applicable for both variables and attributes.
4. | They always make strong assumptions about the data. | They generally make fewer assumptions about the data.
5. | Parametric methods require less data than non-parametric methods. | Non-parametric methods require much more data than parametric methods.
6. | Parametric methods handle interval or ratio data. | Non-parametric methods handle ordinal or nominal data.
7. | Parametric methods assume a normal distribution. | There is no assumed distribution in non-parametric methods.
8. | The output generated by parametric methods can be easily affected by outliers. | The output generated cannot be seriously affected by outliers.
9. | Parametric methods function well in many situations, but their performance is at its peak when the spread of each group is different. | Non-parametric methods can perform well in many situations, but their performance is at its peak when the spread of each group is the same.
10. | Parametric methods have more statistical power than non-parametric methods. | Non-parametric methods have less statistical power than parametric methods.
11. | As far as computation is concerned, parametric methods are computationally faster than non-parametric methods. | As far as computation is concerned, non-parametric methods are computationally slower than parametric methods.
Examples | Logistic Regression, Naive Bayes model, etc. | KNN, Decision Tree model, etc.
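To see the parametric/non-parametric distinction in code, the sketch below (not from this text) fits the same made-up data with Linear Regression, which compresses the data into a fixed number of parameters (slope and intercept), and with k-nearest neighbours, which keeps the training data itself as its flexible "parameters".

    # Parametric (fixed parameters) vs non-parametric (flexible) models.
    from sklearn.linear_model import LinearRegression
    from sklearn.neighbors import KNeighborsRegressor

    X = [[1], [2], [3], [4], [5]]
    y = [1.1, 2.3, 2.9, 4.2, 5.1]          # made-up data, roughly y = x

    lin = LinearRegression().fit(X, y)      # parametric: learns 2 numbers
    knn = KNeighborsRegressor(n_neighbors=2).fit(X, y)   # non-parametric

    print(lin.coef_[0], lin.intercept_)     # the entire fitted parametric model
    print(lin.predict([[6]])[0], knn.predict([[6]])[0])  # predictions differ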
1.11 IMPORTANT ELEMENTS OF MACHINE LEARNING

1.11.1 Data Formats

• Each data format describes how the input data is represented in memory.
• Each machine learning application performs well for a particular data format and worse for others. Choosing the correct format is a major optimisation technique.
• There are four commonly used data formats: (1) NHWC, (2) NCHW, (3) NCDHW, (4) NDHWC.
• Each letter in the formats denotes a particular aspect or dimension of the data:
  (i) N : Batch size : the number of images passed together as a group for inference.
  (ii) C : Channel : the number of data components that make up a data point of the input data. It is 3 for opaque images and 4 for transparent images.
  (iii) W : Width : the width (measurement along the x-axis) of the input data.
  (iv) H : Height : the height (measurement along the y-axis) of the input data.
  (v) D : Depth : the depth of the input data.

(1) NHWC : denotes (Batch size, Height, Width, Channel). This implies a 4D array where the first dimension represents the batch size, and so on. This 4D array is laid out in memory in row-major order. [Commonly used data: images]

(2) NCHW : denotes (Batch size, Channel, Height, Width). This means that there is a 4D array where the first dimension represents the batch size, and so on. This 4D array is laid out in memory in row-major order. [Commonly used data: images]

(3) NCDHW : denotes (Batch size, Channel, Depth, Height, Width). This means that there is a 5D array where the first dimension represents the batch size, and so on. This 5D array is laid out in memory in row-major order. [Commonly used data: video]

(4) NDHWC : denotes (Batch size, Depth, Height, Width, Channel). This means there is a 5D array where the first dimension represents the batch size, and so on. This 5D array is laid out in memory in row-major order. [Commonly used data: video. Software: TensorFlow]
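A quick sketch of the two 4D layouts (shapes are illustrative): the same batch of images can be stored as NHWC or NCHW, and converting between them is just an axis reordering.

```python
# Sketch: NHWC vs NCHW memory layouts for a batch of RGB images.
import numpy as np

batch_nhwc = np.zeros((32, 224, 224, 3))        # N=32, H=W=224, C=3
batch_nchw = batch_nhwc.transpose(0, 3, 1, 2)   # reorder axes to (N, C, H, W)

print(batch_nhwc.shape)   # (32, 224, 224, 3)
print(batch_nchw.shape)   # (32, 3, 224, 224)
```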
1.11.2 Learnability

• Learnability is a quality of products and interfaces that allows users to become familiar with them and to make good use of all their features and capabilities.
• A very learnable product is sometimes said to be intuitive, because the user can immediately grasp how to interact with the system.
• First-time learnability refers to the degree of ease with which a user can learn a newly developed system without referring to its documentation, e.g., manuals, user guides or frequently asked questions (FAQ) lists.
• One element of first-time learnability is discoverability, i.e., the degree of ease with which the user can find all the elements and features of the new system.
• Learnability over time is the capacity of a user to gain experience in working with a given system through repeated interaction.
• Comparatively simple systems with good learnability are said to have short (steep) learning curves, which implies that most learning associated with the system happens quickly.
• More complex systems involve a longer learning curve.
• In software testing, learnability, according to ISO 9126, is the capability of a software product to enable the user to learn how to use it.
• Learnability is considered an aspect of usability and is of major concern in the design of complex software applications.
• In computational learning theory, learnability is the mathematical analysis of machine learning. It is also employed in language acquisition in arguments within linguistics.
• The skill of learnability confers a future value by making one agile (active). It is a currency that is rewarded with better employability and high growth prospects. Learning does not end with school or college.

1.11.3 Statistical Learning Approaches

• Statistical learning theory is a framework for machine learning, drawing from the fields of statistics and functional analysis.
• Statistical learning theory deals with the statistical inference problem of finding a predictive function based on data.
• Statistical learning theory has led to successful applications in fields such as computer vision, speech recognition and bioinformatics.
• Statistical learning is a set of tools for understanding data. These tools fall under two classes: supervised learning and unsupervised learning.
• Statistical learning is mathematically intensive, is based on coefficient estimation, and requires a good understanding of the data.
• Machine learning, on the other hand, identifies patterns from the dataset through iterations, which requires much less human effort.

Lexical Acquisition
• The role of statistical learning in language acquisition is well documented in lexical acquisition.
• One important contribution to an infant's ability to segment words from speech is the ability to recognize statistical regularities of the speech heard in the environment.

Statistical Algorithm
• Statistical algorithms create a statistical model of the input data, which is in most cases represented as a probabilistic tree data structure.
• Subsequences with a higher frequency are represented with shorter codes.
• A representative algorithm of this kind is linear regression, a popular algorithm in both machine learning and statistics.
• This model assumes a linear relationship between the input and the output variable.
• It is represented in the form of a linear equation which has a set of inputs and a predicted output. A tiny sketch of this idea follows.

Types of statistical analysis
(i) Descriptive statistical analysis, (ii) Inferential statistical analysis, (iii) Associational statistical analysis, (iv) Predictive analysis, (v) Prescriptive analysis, (vi) Exploratory data analysis, (vii) Causal analysis, (viii) Data collection.
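A minimal sketch (values invented) of the linear-regression idea mentioned above: fit y = wx + b to data by least squares, then use the fitted line to predict the output for a new input.

```python
# Sketch: a linear statistical model y = w*x + b fitted by least squares.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

w, b = np.polyfit(x, y, deg=1)  # least-squares slope and intercept
print(w, b)                     # slope is roughly 2, intercept roughly 0
print(w * 6.0 + b)              # predicted output for a new input x = 6
```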
Chapter Ends...

CHAPTER 2 : Feature Engineering

Concept of Feature, Preprocessing of data, Normalization and Scaling, Standardization, Managing Missing Values, Introduction to Dimensionality Reduction, Principal Component Analysis (PCA), Feature Extraction: Kernel PCA, Local Binary Pattern. Introduction to Various Feature Selection Techniques: Sequential Forward Selection, Sequential Backward Selection. Statistical Feature Engineering: count-based, Length, Mean, Median, Mode, etc. based feature vector creation. Multidimensional Scaling, Matrix Factorization Techniques.

2.1 Concept of Feature
  GQ. Define Feature Engineering. Explain the four processes in feature engineering.
2.2 Preprocessing of Data
  GQ. Define data preprocessing. Explain the steps involved in data preprocessing.
2.3 Normalization and Scaling
  GQ. Explain the concept of scaling and normalization with its types.
  2.3.1 Types of Scaling
  2.3.2 Types of Normalization
2.4 Standardization
2.5 Managing Missing Values
  GQ. Explain how the missing values are handled in data preprocessing.
2.6 Introduction to Dimensionality Reduction
2.7 Principal Component Analysis (PCA)
  GQ. Explain how PCA helps in dimensionality reduction.
2.8 Feature Extraction
  GQ. Explain how kernel PCA and Local Binary Pattern help in dimensionality reduction.
  2.8.1 Kernel PCA
  2.8.2 Local Binary Pattern (LBP)
2.9 Introduction to Various Feature Selection Techniques
  GQ. What is feature selection? Explain different feature selection algorithms.
  GQ. Explain forward and backward feature selection process.
  2.9.1 Forward Feature Selection
  2.9.2 Backward Feature Selection
2.10 Statistical Feature Engineering
  GQ. Explain different statistical measures in feature engineering with suitable examples.
  2.10.1 Measures of Central Tendency
  2.10.2 Dispersion of Data
2.11 Multidimensional Scaling
  GQ. Explain the concept of multidimensional scaling.
2.12 Matrix Factorization Technique
  GQ. Explain the concept of matrix factorization in recommender system.
Chapter Ends...

2.1 CONCEPT OF FEATURE

• Feature engineering is the pre-processing step of machine learning, which is used to transform raw data into features that can be used to create a predictive model using machine learning or statistical modelling.
• Feature engineering is the pre-processing step of machine learning which extracts features from raw data.
• It helps to represent an underlying problem to predictive models in a better way, which as a result improves the accuracy of the model for unseen data.
• The predictive model contains predictor variables and an outcome variable, and the feature engineering process selects the most useful predictor variables for the model.
• Generally, all machine learning algorithms take input data to generate an output. The input data remains in a tabular form consisting of rows (instances or observations) and columns (variables or attributes), and these attributes are often known as features.
• For example, an image is an instance in computer vision, but a line in the image could be a feature. Similarly, in NLP, a document can be an observation, and the word count could be a feature.
• So, we can say a feature is an attribute that impacts a problem or is useful for the problem.
• Feature engineering in ML contains mainly four processes: Feature Creation, Transformations, Feature Extraction, and Feature Selection. These processes are described below (a small sketch of feature creation follows the list).

(1) Feature Creation : Feature creation is finding the most useful variables to be used in a predictive model. The process is subjective, and it requires human creativity and intervention. The new features are created by mixing existing features using addition, subtraction, and ratios, and these new features have great flexibility.

(2) Transformations : The transformation step of feature engineering involves adjusting the predictor variables to improve the accuracy and performance of the model. For example, it ensures that the model is flexible enough to take input of a variety of data; it ensures that all the variables are on the same scale, making the model easier to understand. It improves the model's accuracy and ensures that all the features are within the acceptable range to avoid any computational error.

(3) Feature Extraction : Feature extraction is an automated feature engineering process that generates new variables by extracting them from the raw data. The main aim of this step is to reduce the volume of data so that it can be easily used and managed for data modelling. Feature extraction methods include cluster analysis, text analytics, edge detection algorithms, and principal components analysis (PCA).

(4) Feature Selection : While developing a machine learning model, only a few variables in the dataset are useful for building the model, and the rest of the features are either redundant or irrelevant. If we input the dataset with all these redundant and irrelevant features, it may negatively impact and reduce the overall performance and accuracy of the model. Hence it is very important to identify and select the most appropriate features from the data and remove the irrelevant or less important features, which is done with the help of feature selection in machine learning. "Feature selection is a way of selecting the subset of the most relevant features from the original feature set by removing the redundant, irrelevant, or noisy features."
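As a tiny, hypothetical illustration of feature creation (the column names and values are invented), a new ratio feature can be derived from two existing columns:

```python
# Sketch: creating a new feature as a ratio of two existing features.
import pandas as pd

df = pd.DataFrame({"income": [40000, 55000, 72000],
                   "debt":   [10000, 30000, 12000]})
df["debt_to_income"] = df["debt"] / df["income"]  # new, derived feature
print(df)
```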
2.2 PREPROCESSING OF DATA

• Data preprocessing is the process of preparing raw data to be used by a machine learning model. It is the first and most important step in developing a machine learning model.
• When developing a machine learning project, we do not always come across clean and formatted data, and before performing any operation on data, it must be cleaned and formatted. As a result, we use the data preprocessing task for this.
• Real-world data typically contains noise and missing values, and may be in an unusable format that cannot be fed directly to machine learning models. Data preprocessing is a necessary task for cleaning the data and preparing it for a machine learning model, which improves the accuracy and efficiency of the model.
• It involves the steps below:
  (1) Get the Dataset
  (2) Importing Libraries
  (3) Importing the Datasets
  (4) Handling Missing Data
  (5) Encoding the Categorical Data
  (6) Splitting the Dataset into the Training Set and Test Set
  (7) Feature Scaling

(1) Get the Dataset
• The first thing we need to create a machine learning model is a dataset, because a machine learning model is entirely dependent on data. The dataset is the collection of data for a specific problem in a proper format.
• Datasets can be of various formats for various purposes. For example, if we want to create a machine learning model for business purposes, the dataset will be different from the dataset required for a patient. As a result, each dataset is distinct from the others.
• We usually save the dataset as a CSV file before using it in our code. However, there may be times when we need to use an HTML or xlsx file.

(2) Importing Libraries
• To perform data preprocessing with Python, we must first import some predefined Python libraries. These libraries are used to carry out specific tasks.
• For data preprocessing, we will use three specific libraries: NumPy, Matplotlib and Pandas.

(3) Importing the Datasets
• We must now import the datasets that we have gathered for our machine learning project. However, before importing a dataset, we must make the current directory a working directory.
• To import the dataset, we will use the pandas library's read_csv() function, which reads a csv file and performs various operations on it. We can use this function to read a csv file both locally and via a URL.
• It is essential in machine learning to distinguish the feature matrix (independent variables) from the dataset.

(4) Handling Missing Data
• The next step in data preprocessing is to deal with missing data in the datasets. If our dataset contains some missing data, it may pose a significant challenge to our machine learning model. As a result, handling missing values in the dataset is required.
• There are primarily two approaches to dealing with missing data:
  (i) By removing the specific row : The first method is commonly used to deal with null values. In this manner, we simply delete the specific row or column that contains null values. However, this method is inefficient, and removing data may result in information loss, resulting in an inaccurate output.
  (ii) By calculating the mean : In this way, we calculate the mean of the column (or row) that contains a missing value and put it in the place of the missing value. This strategy is useful for features which have numeric data such as age, salary, year, etc. Here, we will use this approach, as sketched below.
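A minimal pandas sketch of the mean-imputation approach just described (column names and values are invented):

```python
# Sketch: replacing missing values with each numeric column's mean.
import numpy as np
import pandas as pd

df = pd.DataFrame({"age":    [25, np.nan, 40, 31],
                   "salary": [50000, 62000, np.nan, 58000]})

df = df.fillna(df.mean(numeric_only=True))  # fill NaN with the column mean
print(df)
```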
(5) Encoding the Categorical Data
• Categorical data is data which has some categories, such as Country.
• Because machine learning models are entirely based on mathematics and numbers, having a categorical variable in our dataset may cause problems when building the model. As a result, these categorical variables must be encoded into numbers.
• We can use the OneHotEncoding or Label Encoding technique.

(6) Splitting the Dataset into the Training Set and Test Set
• In machine learning data preprocessing, we divide our dataset into a training set and a test set. This is an important step in data preprocessing because it allows us to improve the performance of our machine learning model.
• Assume we trained our machine learning model on one dataset and then tested it on a completely different one. It would then be difficult for our model to understand the correlations between the models.
• Training set : A subset of the dataset used to train the machine learning model; here we already know the output.
• Test set : A subset of the dataset used to test the machine learning model; the model predicts the output for it.

(7) Feature Scaling
• Feature scaling is the final step in machine learning data preprocessing. It is a method for standardizing the independent variables of a dataset within a given range.
• In feature scaling, we place our variables in the same range and scale so that no one variable dominates the others.

2.3 NORMALIZATION AND SCALING

• Scaling and normalization are so similar that they are often applied interchangeably, but they have different effects on the data.
• In both scaling and normalization, we transform the values of numeric variables so that the transformed data points have specific helpful properties. These properties can be exploited to create better features and models.
• In scaling, we change the range of the distribution of the data, while in normalization we change the shape of the distribution of the data.
• In scaling, we transform the data so that it fits within a specific scale, like 0-100 or, usually, 0-1. Scaling is especially useful when using methods based on measures of how far apart data points are.
• Normalization is a more radical transformation. The point of normalization is to change the observations so that they can be described as a normal distribution.

2.3.1 Types of Scaling

(1) Simple Feature Scaling : This method simply divides each value by the maximum value for the feature. The resultant values are in the range between zero (0) and one (1). Simple feature scaling is the de facto scaling method used on image data, where we scale images by dividing each pixel by the maximum pixel intensity. The formula for simple feature scaling is given as:

    X_new = X_old / X_max

(2) Min-max Scaling : This scaler takes each value, subtracts the minimum, and then divides by the range (max − min). The resultant values range between zero (0) and one (1). The formula for min-max scaling is given as:

    X_new = (X_old − X_min) / (X_max − X_min)
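A short sketch of both scaling formulas above, applied to a toy column of values:

```python
# Sketch: simple feature scaling and min-max scaling on a toy feature.
import numpy as np

x = np.array([10.0, 20.0, 35.0, 50.0])

simple = x / x.max()                          # simple feature scaling -> (0, 1]
minmax = (x - x.min()) / (x.max() - x.min())  # min-max scaling -> [0, 1]

print(simple)  # [0.2   0.4   0.7   1.   ]
print(minmax)  # [0.    0.25  0.625 1.   ]
```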
2.3.2 Types of Normalization

(1) Z-Score or Standard Score
• In this technique, values are normalized based on the mean and standard deviation of the data A. The formula used is:

    v' = (v − μ_A) / σ_A

where μ_A and σ_A are the mean and standard deviation of A respectively.
• Example : If the mean salary is $54,000 and the standard deviation is $16,000, then the z-score of a salary of $73,600 is (73,600 − 54,000) / 16,000 = 1.225.

(2) Box-Cox Transformation
• A Box-Cox transformation is a transformation of a non-normal dependent variable into a normal shape. The Box-Cox transformation is named after statisticians George Box and Sir David Roxbee Cox.
• At the heart of the Box-Cox normalization is an exponent lambda (λ), which varies from −5 to 5. All values of λ are considered and the optimal value for your data is selected; the "optimal value" is the one which results in the best approximation of a normal distribution curve.

    y(λ) = log(y),            if λ = 0
    y(λ) = (y^λ − 1) / λ,     otherwise

2.4 STANDARDIZATION

• Standardization is necessary when the features of the input data set have wide ranges or are simply measured in different measurement units (such as pounds, metres, miles, etc.).
• For many machine learning models, these variations in the initial feature ranges are problematic. For models that compute distances, for instance, if one of the features has a wide range of values, the distance will be dominated by this specific feature.
• Example : Let's say we have a 2-dimensional data set with the variables Height in metres and Weight in pounds, with ranges of [1 to 2] metres and [10 to 200] pounds respectively. No matter what distance-based model you run on this data, the Weight feature will take precedence over the Height feature and contribute more to the distance calculation, simply because it contains larger values.
• So, standardization is the way to avoid this issue, by transforming features to comparable scales.
• Z-score is one of the most popular methods to standardize data, and can be computed by subtracting the mean and dividing by the standard deviation for each value of each feature:

    z = (value − mean) / standard deviation

• Once standardization is done, all the features will have a mean of zero and a standard deviation of one, and thus the same scale. A short sketch of the z-score and Box-Cox transforms follows.
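A hedged sketch (salary values invented): z-score standardization computed by hand, and a Box-Cox transform via SciPy, which searches for the optimal λ automatically.

```python
# Sketch: z-score standardization and a Box-Cox transform.
import numpy as np
from scipy import stats

salary = np.array([38000.0, 54000.0, 61000.0, 73600.0, 90000.0])

z = (salary - salary.mean()) / salary.std()  # z-score: mean 0, std 1
transformed, lam = stats.boxcox(salary)      # Box-Cox; data must be positive

print(z)
print(lam)  # the fitted lambda in [-5, 5]
```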
2.5 MANAGING MISSING VALUES

GQ. Explain how the missing values are handled in data preprocessing.

Imagine that you are asked to analyze a dataset. You find that there are many tuples having no recorded value for several attributes, such as customer income. The question arising here is how to fill in the missing values for such an attribute. There are several methods, as discussed here.

(1) Ignore the tuple : This technique is used when the class label is missing. However, unless the tuple contains numerous attributes with missing values, this approach is not particularly useful.

(2) Fill in the missing value manually : This approach is effective on a small data set with some missing values.

(3) Use a global constant to fill in the missing value : You can replace all missing attribute values with a global constant, such as a label like "Unknown" or −∞.

(4) Use a measure of central tendency for the attribute (e.g., the mean or median) to fill in the missing value : For example, suppose the customers' average income is $25,000; then you can use this value to replace the missing value for income.

(5) Use the attribute mean or median for all samples belonging to the same class as the given tuple : For example, if you are classifying customers according to their credit_score, then you can replace the missing value with the mean income value for customers in the same credit_score category as that of the given tuple. If the data distribution for a given class is skewed, then use the median value.

(6) Use the most probable value to fill in the missing value : This can be determined using regression, Bayesian classification or decision-tree induction.

2.6 INTRODUCTION TO DIMENSIONALITY REDUCTION

• The number of input variables or features of a dataset is referred to as its dimensionality.
• Dimensionality reduction refers to techniques that reduce the number of input variables in a dataset.
• More input features often make a predictive modeling task more challenging to model, a problem generally referred to as the curse of dimensionality.
• High-dimensionality statistics and dimensionality reduction techniques are often used for data visualization.
• Nevertheless, these techniques can be used in applied machine learning to simplify a classification or regression dataset in order to better fit a predictive model.
• There are a number of advantages that make dimensionality reduction important:
  (1) The model accuracy is improved when there is less data.
  (2) When dealing with fewer dimensions, a lot less computing power is required; and since the data is smaller, the algorithm can train faster.
  (3) Less data requires less storage space.
  (4) Fewer dimensions allow the use of algorithms that cannot handle larger dimensions.
  (5) Fewer features come with the benefit of removing noise and redundant variables.
• Dimensionality reduction has two main components:
  (1) Feature selection : This is the process where the universal set of features or variables is used to extract a subset that can be used to model the problem. Feature selection is done as Filter, Wrapper or Embedded methods.
  (2) Feature extraction : This is used to reduce data in a higher-dimensional space to a lower-dimensional space; for example, features in 3 dimensions can be reduced to two dimensions for simplicity.
• Some of the dimensionality reduction techniques include:
  (1) Principal Component Analysis (PCA) : This method is commonly used with continuous data. It works under the condition that the variance of the mapped data in the lower-dimensional space needs to be at its peak when the data is mapped from the higher-dimensional space. In other words, it projects the data where the variance is largest, and the features with the most variance become the principal components.
  (2) Linear Discriminant Analysis (LDA) : This projects data in such a way that the separability of the classes is maximized. Points from the same class are projected closely together, while those from different classes are spaced far apart.
  (3) Generalized Discriminant Analysis (GDA) : GDA is quite an effective approach when it comes to extracting non-linear features.
2.7 PRINCIPAL COMPONENT ANALYSIS (PCA)

GQ. Explain how PCA helps in dimensionality reduction.

• Principal Component Analysis (PCA) is an unsupervised learning algorithm that is used for dimensionality reduction in machine learning.
• It is a statistical process that converts the observations of correlated features into a set of linearly uncorrelated features with the help of an orthogonal transformation. These new transformed features are called the Principal Components.
• It is one of the popular tools used for exploratory data analysis and predictive modeling.
• Some real-world applications of PCA are image processing, movie recommendation systems, and optimizing the power allocation in various communication channels.
• It is a feature extraction technique, so it retains the important variables and drops the least important ones.

Steps in PCA

(1) Standardize the dataset : First, we need to standardize the dataset, and for that we need to calculate the mean and standard deviation of each feature. We use the z-score method to standardize the dataset.

(2) Calculate the covariance matrix for the whole dataset : The covariance for a pair of features is calculated as:

    For a population : Cov(x, y) = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / n
    For a sample     : Cov(x, y) = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1)

Since we standardized the dataset, the mean of each feature is 0 and its standard deviation is 1. For example, for a 3-dimensional data set with three variables x, y and z, the covariance matrix is a 3×3 matrix of this form:

    | Cov(x, x)  Cov(x, y)  Cov(x, z) |
    | Cov(y, x)  Cov(y, y)  Cov(y, z) |
    | Cov(z, x)  Cov(z, y)  Cov(z, z) |

(3) Calculate the eigenvalues and eigenvectors : An eigenvector is a nonzero vector that changes at most by a scalar factor when a linear transformation is applied to it. The corresponding eigenvalue is the factor by which the eigenvector is scaled. Let A be a square matrix (in our case the covariance matrix), v a vector and λ a scalar that satisfies Av = λv; then λ is called the eigenvalue associated with eigenvector v of A. Rearranging the above equation:

    Av − λv = 0  ⟹  (A − λI) v = 0

Eigenvectors can be obtained by solving the equation (A − λI) v = 0 for v with different λ values.

(4) Sort the eigenvectors from the highest eigenvalue to the lowest : The eigenvector with the highest eigenvalue is the first principal component. Higher eigenvalues correspond to greater amounts of shared variance.

(5) Select the number of principal components : Select the top N eigenvectors (based on their eigenvalues) to become the N principal components. The optimal number of principal components is both subjective and problem-dependent. Usually, we look at the cumulative amount of shared variance explained by the combination of principal components and pick the number of components which still significantly explains the shared variance.

(6) Transform the original matrix :

    Feature matrix × top-k eigenvectors = Transformed data
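A compact NumPy sketch of the six steps above, on made-up 2-D data:

```python
# Sketch: PCA from scratch, following the six steps in the text.
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])

Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # 1) standardize (z-score)
C = np.cov(Xs, rowvar=False)               # 2) covariance matrix
vals, vecs = np.linalg.eigh(C)             # 3) eigenvalues / eigenvectors
order = np.argsort(vals)[::-1]             # 4) sort high -> low eigenvalue
vals, vecs = vals[order], vecs[:, order]
k = 1                                      # 5) keep the top k components
W = vecs[:, :k]
X_pca = Xs @ W                             # 6) project: X * top-k eigenvectors
print(X_pca)
```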
Advantages of PCA

(1) Easy to compute : PCA is based on linear algebra, which is computationally easy to solve on computers.
(2) Speeds up other machine learning algorithms : Machine learning algorithms converge faster when trained on principal components instead of the original dataset.
(3) Counteracts the issues of high-dimensional data : High-dimensional data causes regression-based algorithms to overfit easily. By using PCA beforehand to lower the dimensions of the training dataset, we prevent the predictive algorithms from overfitting.

Disadvantages of PCA

(1) Low interpretability of principal components : Principal components are linear combinations of the features from the original data, but they are not as easy to interpret. For example, it is difficult to tell which are the most important features in the dataset after computing the principal components.
(2) The trade-off between information loss and dimensionality reduction : Although dimensionality reduction is useful, it comes at a cost. Information loss is a necessary part of PCA. Balancing the trade-off between dimensionality reduction and information loss is unfortunately a necessary compromise that we have to make when using PCA.

2.8 FEATURE EXTRACTION

GQ. Explain how kernel PCA and Local Binary Pattern help in dimensionality reduction.

• Feature extraction is a process of dimensionality reduction by which an initial set of raw data is reduced to more manageable groups for processing.
• A characteristic of these large data sets is a large number of variables that require a lot of computing resources to process.
• Feature extraction is the name for methods that select and/or combine variables into features, effectively reducing the amount of data that must be processed, while still accurately and completely describing the original data set.
• The feature extraction technique gives us new features which are a linear combination of the existing features. The new set of features will have different values as compared to the original feature values.
• The main aim is that fewer features will be required to capture the same information.
• We might think that choosing fewer features might lead to underfitting, but in the case of the feature extraction technique, the extra data is generally noise.

2.8.1 Kernel PCA

• Kernel PCA was developed in an effort to help with the classification of data whose decision boundaries are described by non-linear functions. The idea is to go to a higher-dimensional space in which the decision boundary becomes linear.
• Here is an easy argument to understand the process. Suppose the decision boundary is described by a third-order polynomial, y = a + bx + cx² + dx³. Plotting this function in the usual x-y plane will produce a wavy line.
• Suppose instead we go to a higher-dimensional space in which the axes are x, x², x³ and y. In this 4D space the third-order polynomial becomes a linear function, and the decision boundary becomes a linear hyperplane.
• So, the trick is to find a suitable transformation (up-scaling) of the dimensions to try and recover the linearity of the boundary. In this way the usual PCA decomposition is again suitable.
• This is all good but, as always, there is a catch. A generic non-linear combination of the original variables will have a huge number of new variables, which rapidly blows up the computational complexity of the problem.
• Moreover, we won't know the exact combination of non-linear terms we need, hence the large number of combinations that is in principle required.
• Let's try to explain this issue with another simple example. Suppose we have only two wavelengths, call them λ₁ and λ₂. Now suppose we want to take a generic combination up to the second order of these two variables. The new variable set will then contain the following: [λ₁, λ₂, λ₁², λ₂², λ₁λ₂]. So, we went from 2 variables to 5, just by seeking a quadratic combination!
• Since one in general has tens or hundreds of wavelengths, and would like to consider higher-order polynomials, you can get an idea of the large number of variables that would be required.
• Fortunately there is a solution to this problem, which is commonly referred to as the kernel trick.
• Let's call x the original set of n variables, and let's call φ(x) the non-linear combination (mapping) of these variables into an m > n dimensional space.
• Now we can compute the kernel function κ(x) = φ(x) φᵀ(x). Note that the kernel function in practice is an array, even though we are using a function (continuous) notation.
• It turns out that the kernel function plays the same role as the covariance matrix did in linear PCA.
• This means that we can calculate the eigenvalues and eigenvectors of the kernel matrix, and these are the new principal components of the m-dimensional space where we mapped our original variables.
• The kernel trick is called this way because the kernel function (matrix) enables us to get eigenvalues and eigenvectors without actually calculating φ(x) explicitly. This is the step that would blow up the number of variables, and we can circumvent it using the kernel trick.
• There are of course different choices for the kernel matrix. Common ones are the Gaussian kernel and the polynomial kernel.
• A polynomial kernel would be the right choice for decision boundaries that are polynomial in shape.
• A Gaussian kernel is a good choice whenever one wants to distinguish data points based on the distance from a common centre.
• Once we have the kernel, we follow the same procedure as for conventional PCA. Remember, the kernel plays the same role as the covariance matrix in linear PCA; therefore we can calculate its eigenvalues and eigenvectors and stack them up to the selected number of components we want to keep.
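A hedged sketch using scikit-learn's KernelPCA, which implements the kernel trick described above (the RBF kernel is the Gaussian kernel; the dataset and the gamma value are illustrative). Two concentric circles are not linearly separable in the original space, but become so after the kernel mapping:

```python
# Sketch: kernel PCA with a Gaussian (RBF) kernel on non-linear data.
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10)
X_kpca = kpca.fit_transform(X)  # circles become (nearly) linearly separable
print(X_kpca[:3])
```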
2.8.2 Local Binary Pattern (LBP)

• Local Binary Pattern (LBP) is a very efficient texture operator which labels the pixels of an image by thresholding the neighbourhood of each pixel and considers the result as a binary number.

(Figure: a 3×3 neighbourhood is thresholded against its centre pixel to produce a binary pattern and its decimal label.)

• The LBP feature vector, in its simplest form, is created in the following manner:
  (i) Divide the examined window into cells (e.g., 16×16 pixels for each cell).
  (ii) For each pixel in a cell, compare the pixel to each of its 8 neighbours (on its left-top, left-middle, left-bottom, right-top, etc.). Follow the pixels along a circle, i.e., clockwise or counterclockwise.
  (iii) In the above step, the neighbours considered can be changed by varying the radius of the circle around the pixel, R, and the quantization of the angular space, P.
  (iv) Where the centre pixel's value is greater than the neighbour's value, write "0". Otherwise, write "1". This gives an 8-digit binary number (which is usually converted to decimal for convenience).
  (v) Compute the histogram, over the cell, of the frequency of each "number" occurring (i.e., each combination of which pixels are smaller and which are greater than the centre). This histogram can be seen as a 256-dimensional feature vector.
  (vi) Optionally normalize the histogram.
  (vii) Concatenate the (normalized) histograms of all cells. This gives a feature vector for the entire window.
• The feature vector can then be processed using some machine learning algorithm to classify images. Such classifiers are often used for face recognition or texture analysis.
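A minimal sketch of steps (ii)-(iv), computing the 8-bit LBP code of a single pixel from its 3×3 neighbourhood (pixel values invented):

```python
# Sketch: the LBP code of one pixel from its 3x3 neighbourhood.
import numpy as np

patch = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]])
center = patch[1, 1]

# the 8 neighbours, read clockwise starting from the top-left
neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
              patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]

# "0" if the centre is greater than the neighbour, "1" otherwise
bits = ["0" if center > n else "1" for n in neighbours]
code = int("".join(bits), 2)
print("".join(bits), code)  # 10001111 -> decimal label 143
```

A real implementation repeats this for every pixel in a cell and histograms the resulting codes, as steps (v)-(vii) describe.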
2.9 INTRODUCTION TO VARIOUS FEATURE SELECTION TECHNIQUES

GQ. What is feature selection? Explain different feature selection algorithms.

• Feature selection is the method of reducing the input variables of your model by using only relevant data and getting rid of the noise in the data.
• It is the process of automatically choosing relevant features for your machine learning model based on the type of problem you are trying to solve.
• We do this by including or excluding important features without changing them. It helps in cutting down the noise in our data and reducing the size of our input data.

Benefits of Feature Selection

(1) Using unnecessary feature variables for prediction can deteriorate the performance of a predictive model. Thus, feature selection helps in improving the model's performance.
(2) Algorithms like linear regression and logistic regression must avoid using correlated features. Using feature selection methods thus leads to a better fit of these models.
(3) It is an excellent practice to work with a minimum set of predictive modeling features, as this significantly reduces the algorithm's complexity and computational costs.

Feature Selection Algorithms

Feature selection algorithms can be classified into three major categories: (1) Filter methods, (2) Wrapper methods, (3) Intrinsic methods.

(1) Filter methods
• Filter methods for feature selection are usually pre-processing techniques that independently consider each feature in the dataset.
• A filter method evaluates each feature on its own, which can then be used to analyze its impact on a predictive model.
• Such methods include information gain, entropy, consistency-based feature selection, correlation matrices, etc.
• For basic guidance, we can refer to the following table for choosing correlation coefficients.

| Feature \ Response | Continuous | Categorical |
|--------------------|------------|-------------|
| Continuous | Pearson's correlation | LDA |
| Categorical | ANOVA | Chi-Square |

• Pearson's correlation : It is used as a measure for quantifying the linear dependence between two continuous variables X and Y. Its value varies from −1 to +1. Pearson's correlation is given as:

    ρ(X, Y) = cov(X, Y) / (σ_X σ_Y)

• LDA : Linear discriminant analysis is used to find a linear combination of features that characterizes or separates two or more classes (or levels) of a categorical variable.
• ANOVA : ANOVA stands for Analysis of Variance. It is similar to LDA except for the fact that it is operated using one or more categorical independent features and one continuous dependent feature. It provides a statistical test of whether the means of several groups are equal or not.
• Chi-Square : It is a statistical test applied to groups of categorical features to evaluate the likelihood of correlation or association between them using their frequency distribution.
• One thing that should be kept in mind is that filter methods do not remove multicollinearity. So, we must deal with multicollinearity of features as well before training models on your data.
• Filter methods are popular feature selection methods because of their generic behaviour. However, they can prove disadvantageous, as they do not consider the nature of the predictive model and can therefore reduce its accuracy.
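A hedged sketch of a filter method in scikit-learn (the dataset is the built-in Iris set, used for illustration): chi-square scores computed between each non-negative feature and the categorical target, keeping the k best features.

```python
# Sketch: a chi-square filter method with SelectKBest.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)
selector = SelectKBest(score_func=chi2, k=2).fit(X, y)

print(selector.scores_)        # chi-square score per feature
print(selector.get_support())  # boolean mask of the 2 selected features
```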
(2) Wrapper methods
• The wrapper methods aim to create a subset of features from the given dataset that results in the best performance of a predictive model.
• In other words, they test subsets of the available variables for the model's accuracy.
• There are two kinds of wrapper methods for feature selection: greedy and non-greedy.
• The greedy search approach involves following a path that heads towards achieving the best result at the given time. This approach produces locally best results. An example of a greedy search method is the Recursive Feature Elimination (RFE) method, sketched below.
• On the other hand, the non-greedy approach involves assessing all the previous feature subsets and can lead to a path that results in the overall best performance. Genetic Algorithms (GA) and Simulated Annealing (SA) are examples of non-greedy wrapper methods.

(3) Intrinsic methods
• This method combines the qualities of both the filter and wrapper methods to create the best subset.
• In these methods, the feature selection algorithm is blended into the learning algorithm, which thus has its own built-in feature selection.
• It means that if you are using these algorithms, you don't need to worry about using a feature selection method explicitly.
• These methods are fast and easy to implement, as no external algorithm is required to filter features.
• Examples of intrinsic methods for feature selection are:
  (i) Rule-and-tree-based algorithms : The basic idea behind the mathematical structure of these algorithms is to split the dataset into different sets based on a feature variable, in a manner that results in a homogeneous spread in the resulting subsets. Thus, a feature variable that didn't lead to a split is automatically considered redundant by the model.
  (ii) Multivariate adaptive regression spline (MARS) models : The MARS algorithm creates new feature variables from the existing ones in the dataset. These features are then added to a linear model in sequence. If the algorithm does not use a few features to create the MARS features, they are considered irrelevant and automatically ignored.
  (iii) Regularization models : These models assign weights to features in a model to improve the fit quality. The lasso regularization method implements consequences that can narrow a weight down to absolute zero, indicating that you should remove the feature from the predictive model's equation.

2.9.1 Forward Feature Selection

• Forward feature selection is an iterative method in which we start with having no feature in the model.
• In each iteration, we keep adding the feature which best improves our model, until the addition of a new variable does not improve the performance of the model.

2.9.2 Backward Feature Selection

• In backward feature selection, also called backward elimination, we start with all the features and remove the least significant feature at each iteration, which improves the performance of the model.
• We repeat this until no improvement is observed on removal of features.
• Below are the main steps used to apply the backward elimination process:
  Step 1 : Firstly, we need to select a significance level to stay in the model (SL = 0.05).
  Step 2 : Fit the complete model with all possible predictors (independent variables).
  Step 3 : Choose the predictor which has the highest P-value. If P-value > SL, go to step 4; else finish, and our model is ready.
  Step 4 : Remove that predictor.
  Step 5 : Rebuild and fit the model with the remaining variables, and return to step 3.
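A sketch of the greedy wrapper method named above, Recursive Feature Elimination, wrapped around a logistic-regression model (the dataset is the built-in breast-cancer set, used for illustration):

```python
# Sketch: RFE, a greedy wrapper method, selecting 5 features.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# max_iter raised so the solver converges on unscaled data
rfe = RFE(estimator=LogisticRegression(max_iter=5000), n_features_to_select=5)
rfe.fit(X, y)

print(rfe.support_.sum())  # 5 features kept
print(rfe.ranking_[:10])   # rank 1 marks a selected feature
```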
2.10 STATISTICAL FEATURE ENGINEERING

GQ. Explain different statistical measures in feature engineering with suitable examples.

• It is essential to have an overall picture of the data if data preprocessing is to be made successful.
• Statistical descriptions of the data are useful in identifying the properties of the data and in highlighting which data values should be treated as noise or outliers.
• Following are the basic statistical descriptions of data.

2.10.1 Measures of Central Tendency

• A measure of central tendency is a number used to represent the center or middle of a set of data values.
• The mean, median, mode and midrange are commonly used measures of central tendency.

(i) Mean
• The mean, or average, of n numbers is the sum of the numbers divided by n.
• The mean is denoted by x̄ and is read as "x-bar".
• For the data set x₁, x₂, ..., xₙ, the mean is:

    x̄ = (x₁ + x₂ + ... + xₙ) / n

• Sometimes, weights wᵢ are associated with the values xᵢ. The weights reflect the importance, significance or frequency of occurrence of their respective values. The weighted arithmetic mean, or weighted average, is computed as:

    x̄ = (w₁x₁ + w₂x₂ + ... + wₙxₙ) / (w₁ + w₂ + ... + wₙ)

• The mean has one limitation: it is highly sensitive to outliers. Under such conditions, the median is a better measure of central tendency.

(ii) Median
• The median of n numbers is the middle number when the numbers are written in order.
• If n is even, the median is the mean of the two middle numbers.
• When we have a large number of observations, the median is expensive to compute. In such cases, we can approximate the median of the entire data set by interpolation using the formula:

    median ≈ L₁ + ((n/2 − Σ freqₗ) / freq_median) × width

where L₁ is the lower boundary of the median interval, n is the number of values in the entire data set, Σ freqₗ is the sum of the frequencies of all of the intervals that are lower than the median interval, freq_median is the frequency of the median interval, and width is the width of the median interval. A short sketch of these measures follows.
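A minimal sketch of the measures above on a small made-up sample, using Python's standard library:

```python
# Sketch: mean, median and mode of a toy sample (e.g., incomes in $1000s).
import statistics

incomes = [30, 36, 47, 50, 52, 52, 52, 56, 60, 63, 70, 110]

print(statistics.mean(incomes))    # 56.5 -- pulled up by the outlier 110
print(statistics.median(incomes))  # 54.0 -- robust to the outlier
print(statistics.mode(incomes))    # 52   -- the most frequent value
```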
