A Report of 08 Weeks Industrial Training
at ASPEXX Health Solution Pvt. Ltd.
Submitted in partial fulfilment of the requirements for the degree of
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE & ENGINEERING
SUBMITTED BY:
NAME: Mohan G
USN: 1VI17CS057
CONTENTS
Candidate Declaration
Abstract
Acknowledgement
Chapter 1: Introduction
AI, ML, DL
AI vs ML vs DL vs Data Science
NLP
TensorFlow
CMS
Project
Introduction
Methodology
Project Design (Block, Data Flow and Use Case Diagrams)
Implementation
Coding
Conclusion
This is to certify that this project report entitled "AI, ML, Data Science &
WordPress" by Mohan G, submitted in partial fulfilment of the requirements
for the internship (ASPEXX Health Solution Pvt. Ltd.) during the eight weeks
of internship, is a bona fide record of work carried out under my guidance and
supervision. I hereby declare that the work has been carried out under my
supervision and has not been submitted elsewhere for any other purpose.
(Signature of CEO)
Ms. Shivani Mishra
(Signature of Director)
Managing Director
Vemana Institute of Technology
CANDIDATE'S DECLARATION
I, Mohan G, hereby declare that I have undertaken 08 weeks of training at ASPEXX
Health Solution Pvt. Ltd. during the period from Feb 1, 2021 to April 1, 2021 in partial
fulfilment of the requirements for the award of the degree of B.Tech. in Computer Science &
Engineering at Vemana Institute of Technology. The work presented in this report is an
authentic record of the training undertaken.
(Signature of Director)
Managing Director
ABSTRACT
During the internship at ASPEXX Health Solution Pvt. Ltd., most of the
theoretical knowledge gained during the course of studies was put to the test.
The various efforts and processes involved in designing a component were studied
and understood during the internship. During the internship, I undertook projects in
AI.
ACKNOWLEDGEMENT
Any effort becomes successful when there is the effect of synergy: the concept that two and
two make more than four. This report also has the effect of synergy, without prejudice to
my own contribution. I am appreciative of ASPEXX Health Solutions Pvt. Ltd.
and the people who cooperated with me. It is my privilege that I had the opportunity to do
an internship at ASPEXX Health Solutions Pvt. Ltd., India. I would like to thank
all who either directly or indirectly contributed to this project.
• First, I express my deep gratitude to Bajarang Prasad Mishra and Shivani Mishra of
ASPEXX Health Solutions Pvt. Ltd., who gave me the opportunity to work in this
organization.
• I thank Shivesh (Mentor); without his effort, it would have been impossible to bring this
work to light. I would also like to express my sincere thanks to all members of ASPEXX
Health Solutions Pvt. Ltd. for their excellent support and proper guidance in completing
my internship report.
- Mohan G
CHAPTER 1
1.1 INTRODUCTION
CUREYA (registered as ASPEXX Health Solutions Pvt. Ltd. under the MCA) is a
DPIIT-recognized startup, registered under the STARTUP INDIA SCHEME.
CUREYA's collaborating stakeholders include the World Yoga Associations,
Flag Bits Technologies, and many more. CUREYA's primary objective is
'HEALTH FOR ALL': reducing medical expenditure, eliminating
information asymmetry, promoting health awareness, and achieving an inclusive and
holistic approach to healthcare treatment. The mission is to achieve the right
to "Health for All" and improve healthcare indicators through the dissemination of
health education that focuses on health promotion, prevention, and self-
medication. The objective is to eliminate information asymmetry and the language
barrier, and to achieve global standards of healthcare delivery
based on access, equity, affordability, quality, efficiency, and sustainability.
Machine learning uses algorithms to parse data, learn from that data, and make
informed decisions based on what it has learned.
Data science is a broad field that spans the collection, management, analysis,
and interpretation of large amounts of data with a wide range of applications.
It integrates all the terms above and summarizes or extracts
insights from data (exploratory data analysis) and makes predictions from large
datasets (predictive analytics).
Jupyter Notebooks are a spin-off from the IPython project, which used
to have an IPython Notebook project of its own. The name Jupyter comes from the
core programming languages it supports: Julia, Python, and R.
Jupyter ships with the IPython kernel, which allows you to write your
programs in Python, but there are currently over 100 other kernels that you
can also use.
Chapter 2
1. Supervised Learning
2. Unsupervised Learning
How it works: In this algorithm, we do not have any target or outcome variable
to predict or estimate. It is used for clustering a population into different groups,
which is widely used for segmenting customers into groups for specific
intervention. Examples of unsupervised learning: the Apriori algorithm and K-means
(see the K-means sketch after this list).
3. Reinforcement Learning:
How it works: Using this algorithm, the machine is trained to make specific
decisions. It works this way: the machine is exposed to an environment where
it trains itself continually using trial and error. The machine learns from past
experience and tries to capture the best possible knowledge to
make accurate business decisions. Example of reinforcement learning: the
Markov Decision Process.
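To make the unsupervised-learning example above concrete, here is a minimal K-means sketch using scikit-learn; the two "customer groups" are synthetic data invented purely for illustration.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two artificial "customer groups" scattered around different centers
customers = np.vstack([
    rng.normal(loc=[25, 30], scale=3, size=(50, 2)),
    rng.normal(loc=[60, 80], scale=3, size=(50, 2)),
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(customers)  # cluster index for each customer

print(kmeans.cluster_centers_)  # learned group centers
print(labels[:10])              # segment assigned to the first ten customers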
1. Image Recognition
2. Speech Recognition
3. Traffic prediction
4. Product recommendations
5. Self-driving cars
For a given problem, the collection of all possible outcomes represents the
sample space or instance space.
The basic idea for creating a taxonomy of algorithms is that we divide the
instance space in one of three ways:
Python is among the most important languages, as well as the most popular language,
for machine learning and data science. The following features of
Python make it the preferred choice of language for data science:
Python has an extensive and powerful set of packages which are ready to be
used in various domains. It includes packages like NumPy, SciPy, pandas, and scikit-
learn, which are required for machine learning and data science.
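As a small illustration (with made-up numbers) of how these packages plug together, NumPy supplies the arrays, pandas the tabular data, and scikit-learn the model:

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Invented study-time data, purely for illustration
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score":    [52, 58, 61, 70, 74],
})

model = LinearRegression()
model.fit(df[["hours_studied"]], df["exam_score"])

print(model.predict(np.array([[6]])))  # predicted score for 6 hours of study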
Easy prototyping:
Collaboration feature:
SAS: SAS has been the undisputed market leader in the commercial analytics space.
The software offers a huge array of statistical functions, has a good GUI (Enterprise
Guide & Miner) that helps people learn quickly, and provides excellent technical
support. However, it ends up being the most expensive option and is not always
enriched with the latest statistical functions.
Chapter 3
Statistics and machine learning are two very closely related fields. Statistical
methods can be used to clean and prepare data ready for modelling, and
statistical hypothesis tests and estimation statistics can aid in model
selection and in presenting the skill and predictions of final models.
The confusion between the fields is caused in part by the fact that machine learning
has adopted many of statistics' methods, but was never intended to replace
statistics, or even to have a statistical basis originally. Some say "machine
learning is statistics scaled up to big data" and "the short answer is that there is no
difference."
Statistics are used behind every medical study. Statistics help doctors
keep track of where a baby should be in his or her mental development.
Physicians also use statistics to examine the effectiveness of
treatments. Statistics are very important for observation, analysis, and
mathematical prediction models.
Feature Engineering
Engineering and selecting the correct features for a model will not only
significantly improve its predictive power, but will also offer the
flexibility to use less complex models that are faster to run and more
easily understood.
Data collection: Funnelling incoming data into a data store is the first
step of any ML workflow. The key point is that data is persisted
without undertaking any transformation at all, to allow us to have an
immutable record of the original dataset.
Data Visualization
Data visualization is the graphical representation of information and data. By using visual
elements like charts, graphs, and maps, data visualization tools provide an accessible way
to see and understand trends, outliers, and patterns in data.
In the world of Big Data, data visualization tools and technologies are essential to analyze
massive amounts of information and make data-driven decisions.
Data visualization is another form of visual art that grabs our interest and keeps our eyes
on the message. When we see a chart, we quickly see trends and outliers. If we can see
something, we internalize it quickly. It’s storytelling with a purpose. If you’ve ever stared
at a massive spreadsheet of data and couldn’t see a trend, you know how much more
effective a visualization can be.
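As a minimal sketch of the point above, a few lines of matplotlib turn a flat list of numbers into a trend you can see at a glance (the sales figures are invented):

import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [120, 135, 128, 160, 175, 210]  # made-up monthly figures

plt.plot(months, sales, marker="o")  # the upward trend is obvious on sight
plt.title("Monthly Sales")
plt.xlabel("Month")
plt.ylabel("Units Sold")
plt.show()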
Charts
Tables
Graphs
Maps
Infographics
Dashboards
Area Chart
Bar Chart
Box-and-whisker Plots
Bubble Cloud
Bullet Graph
Cartogram
Circle View
Dot Distribution Map
Gantt Chart
Heat Map
Highlight Table
Histogram
Matrix
Network
Polar Area
Radial Tree
Scatter Plot (2D or 3D)
Streamgraph
Text Tables
Timeline
Treemap
Wedge Stack Graph
Word Cloud
And any mix-and-match combination in a dashboard!
Painting: Persistent brushing is useful when we want to group the points into clusters
and then use other operations, such as the tour, to compare the groups. It is
becoming common terminology to call the persistent operation painting.
Identification: This could also be called labeling or label brushing, and is another plot
manipulation that can be linked. Bringing the cursor near a point or edge in a
scatterplot, or a bar in a barchart, causes a label to appear that identifies the plot
element. It is widely available in many interactive graphics, and is sometimes called
mouseover.
Scaling: Scaling maps the data onto the window, and changes in the mapping
function help us learn different things from the same plot. Scaling is commonly used
to zoom in on crowded regions of a scatterplot, and it can also be used to change the
aspect ratio of a plot, to reveal different features of the data.
Linking: Linking connects elements selected in one plot with elements in another plot. The
simplest kind of linking is one-to-one, where both plots show different projections of the
same data, and a point in one plot corresponds to exactly one point in the other. When
using area plots, brushing any part of an area has the same effect as brushing it all and
is equivalent to selecting all cases in the corresponding category.
Data Mapping
Data mapping is a way to organize various bits of data into a manageable and easy-to-
understand system. This system matches data fields with target fields while in storage.
Simply put, not all data follows the same organizational standards. Different sources may
refer to a phone number in as many different ways as you can think of. Data mapping
recognizes phone numbers for what they are and puts them all in the same field rather
than having them drift around under other names.
With this technique, we're able to take the organized data and put a bigger picture
together. You can find out where most of your target audience lives, learn what sorts of
things they have in common, and even figure out a few controversies that you shouldn't
touch on. Armed with this information, your business can make smarter decisions and
spend less money while spinning your products and services to your audience.
The earlier example of recognizing phone numbers has a lot to do with something
called unification and data cleaning. These processes are often powered by machine
learning, which should not be confused with artificial intelligence as a whole:
machine learning uses patterns and inference to make predictions rather than perform a
single fixed task, and is a subset of AI technology. In the earlier
example, machine learning is used to recognize a phone number and assign it to its proper
category for organizational purposes.
Machine learning goes a step beyond just recognizing phone numbers, though. The
technology can recognize errors like missing values or typos and group information from
the same source together.
That's what data cleaning and unification really mean: cleaning up all of the data
without any human input and presenting the information in its most complete and precise
form. This process saves time and is also more effective in regard to how correct the
information will be.
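A minimal sketch of the phone-number unification idea, done here with simple hand-written rules rather than a learned model (the column names and formats are assumed for illustration):

import re
import pandas as pd

raw = pd.DataFrame({"contact": [
    "(555) 123-4567",
    "555.123.4567",
    "+1 555 123 4567",
    "not a number",
]})

def normalize_phone(value):
    digits = re.sub(r"\D", "", value)             # keep digits only
    digits = digits[-10:]                         # drop a leading country code
    return digits if len(digits) == 10 else None  # reject malformed entries

# Every valid variant lands in one canonical 'phone' field
raw["phone"] = raw["contact"].apply(normalize_phone)
print(raw)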
The data can then be displayed in almost any way a person or company needs to see it.
For instance, geospatial mapping is one route machine learning can take automatically,
without human input. Geospatial data basically translates data onto a map, plotting
out physical locations and routes that your target audience takes every day. This technique
can provide a unique aid to your next advertising campaign.
Why Machine Learning Is Important to Data Mapping
Machine learning allows data mapping to be more precise. Without that technology, data
mapping would be either very rudimentary or have to be done completely manually.
Assuming we go the rudimentary route, a simple spreadsheet would be able to take
information and plug it into its best guess of a proper category. Typos wouldn't be fixed,
missing values would remain missing, and some information would just be scattered in
random places.
Trying to complete data mapping manually would be worse. For one, a person would
never be able to keep up with the flow of information, not to mention the backlog of
information already hiding and in need of sorting in the Internet of Things. Even if
someone could keep up with the flow, there would still be errors, as the sheer amount of
data would leave a human unable to notice connections the way a machine could.
The use of data is an extremely important part of modern-day marketing. Knowing the
best possible place and time to reach customers will allow you to target your audience
more efficiently. Even large industries that can afford to splash their names across all
possible media outlets use data mapping to save money and appear more loyal to their
customer base. Big or small, you can use this information and get ahead of everyone else
vying for your customers' attention. The competition is dense these days, so getting ahead
of the curve and staying ahead is an art everyone is trying to perfect. Data mapping can
help you get there as early as possible.
Population Distribution
According to demographic data such as age, gender, income, and education level, we can
analyze and classify customers in different regions or communities on the map. The data
can help us figure out their lifestyles, interests, and shopping habits.
Neural networks, in the world of finance, assist in the development of such processes as
time-series forecasting, algorithmic trading, securities classification, credit risk modeling,
and constructing proprietary indicators and price derivatives.
A neural network works similarly to the human brain’s neural network. A “neuron” in a
neural network is a mathematical function that collects and classifies information
according to a specific architecture. The network bears a strong resemblance to statistical
methods such as curve fitting and regression analysis.
A neural network contains layers of interconnected nodes. Each node is a perceptron and
is similar to a multiple linear regression. The perceptron feeds the signal produced by a
multiple linear regression into an activation function that may be nonlinear.
Hidden layers fine-tune the input weightings until the neural network’s margin of error is
minimal. It is hypothesized that hidden layers extrapolate salient features in the input data
that have predictive power regarding the outputs. This describes feature extraction, which
accomplishes a utility similar to statistical techniques such as principal component
analysis.
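The following toy forward pass mirrors that description: each node computes a weighted sum, like a multiple linear regression, and feeds it through a nonlinear activation. The weights here are random and purely illustrative.

import numpy as np

def sigmoid(z):
    # A common nonlinear activation function
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
x = np.array([0.5, -1.2, 3.0])        # one input sample with three features

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden layer: 4 nodes
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # output layer: 1 node

hidden = sigmoid(W1 @ x + b1)   # each hidden node = weighted sum + activation
output = sigmoid(W2 @ hidden + b2)
print(output)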
A neural network evaluates price data and unearths opportunities for making trade
decisions based on the data analysis. The networks can distinguish subtle nonlinear
interdependencies and patterns other methods of technical analysis cannot. According to
research, the accuracy of neural networks in making price predictions for stocks differs.
Some models predict the correct stock prices 50 to 60 percent of the time while others are
accurate in 70 percent of all instances. Some have posited that a 10 percent improvement
in efficiency is all an investor can ask for from a neural network.
There will always be data sets and task classes that are better analyzed by using previously
developed algorithms. It is not so much the algorithm that matters; it is the well-prepared
input data on the targeted indicator that ultimately determines the level of success of a
neural network.
Fuzzy Logic
Fuzzy logic is based on the observation that people make decisions based on imprecise
and non-numerical information. Fuzzy models or sets are mathematical means of
representing vagueness and imprecise information (hence the term fuzzy). These models
have the capability of recognising, representing, manipulating, interpreting, and utilising
data and information that are vague and lack certainty.[5]
Fuzzy logic has been applied to many fields, from control theory to artificial intelligence.
Fuzzification is the process of assigning the numerical input of a system to fuzzy sets with
some degree of membership. This degree of membership may be anywhere within the
interval [0,1]. If it is 0 then the value does not belong to the given fuzzy set, and if it is 1
then the value completely belongs within the fuzzy set. Any value between 0 and 1
represents the degree of uncertainty that the value belongs in the set. These fuzzy sets are
typically described by words, and so by assigning the system input to fuzzy sets, we can
reason with it in a linguistically natural manner.
For example, consider a classic illustration in which the meanings of the expressions cold,
warm, and hot are represented by functions mapping a temperature scale. A point on that
scale has three "truth values": one for each of the three functions. Imagine a vertical line
representing a particular temperature that three arrows (truth values) gauge. If the
red arrow points to zero, this temperature may be interpreted as "not hot"; i.e. this
temperature has zero membership in the fuzzy set "hot". The orange arrow (pointing at
0.2) may describe it as "slightly warm" and the blue arrow (pointing at 0.8) "fairly cold".
Therefore, this temperature has 0.2 membership in the fuzzy set "warm" and 0.8
membership in the fuzzy set "cold". The degree of membership assigned for each fuzzy
set is the result of fuzzification; the sketch below reproduces this example.
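A small sketch of that fuzzification step, with trapezoidal membership functions whose breakpoints are invented for illustration; the sample temperature reproduces the 0.8 cold / 0.2 warm / 0.0 hot reading described above.

def trapezoid(x, a, b, c, d):
    # Membership rises from a to b, is 1 between b and c, falls from c to d
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def fuzzify(temp_c):
    # Map one numeric input to a degree of membership in each fuzzy set
    return {
        "cold": trapezoid(temp_c, -40, -40, 10, 20),
        "warm": trapezoid(temp_c, 10, 20, 25, 35),
        "hot":  trapezoid(temp_c, 25, 35, 60, 60),
    }

print(fuzzify(12))  # {'cold': 0.8, 'warm': 0.2, 'hot': 0.0}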
Chapter 5
Natural language processing helps computers communicate with humans in their own
language and scales other language-related tasks. For example, NLP makes it possible for
computers to read text, hear speech, interpret it, measure sentiment and determine which
parts are important.
Today’s machines can analyze more language-based data than humans, without fatigue
and in a consistent, unbiased way. Considering the staggering amount of unstructured data
that’s generated every day, from medical records to social media, automation will be
critical to fully analyze text and speech data efficiently.
While supervised and unsupervised learning, and specifically deep learning, are now
widely used for modeling human language, there’s also a need for syntactic and semantic
understanding and domain expertise that are not necessarily present in these machine
learning approaches. NLP is important because it helps resolve ambiguity in language and
adds useful numeric structure to the data for many downstream applications, such
as speech recognition or text analytics.
Genetic Algorithms
Nature has always been a great source of inspiration to all mankind. Genetic Algorithms
(GAs) are search-based algorithms built on the concepts of natural selection and
genetics. GAs are a subset of a much larger branch of computation known
as Evolutionary Computation.
GAs were developed by John Holland and his students and colleagues at the University
of Michigan, most notably David E. Goldberg, and have since been tried on various
optimization problems with a high degree of success.
In GAs, we have a pool or a population of possible solutions to the given problem. These
solutions then undergo recombination and mutation (like in natural genetics), producing
new children, and the process is repeated over various generations. Each individual (or
candidate solution) is assigned a fitness value (based on its objective function value) and
the fitter individuals are given a higher chance to mate and yield more “fitter”
individuals. This is in line with the Darwinian Theory of “Survival of the Fittest”.
In this way we keep “evolving” better individuals or solutions over generations, till we
reach a stopping criterion.
Genetic Algorithms are sufficiently randomized in nature, but they perform much better
than random local search (in which we just try various random solutions, keeping track
of the best so far), as they exploit historical information as well.
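A compact sketch of the generational loop just described, applied to the classic toy problem of evolving a bit string of all 1s (every parameter here is an arbitrary choice made for illustration):

import random

GENES, POP, GENERATIONS = 20, 30, 40

def fitness(ind):
    # Objective function: count of 1s in the bit string
    return sum(ind)

# Random initial population of candidate solutions
population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]

for _ in range(GENERATIONS):
    # Selection: fitter individuals get a higher chance to mate
    weights = [fitness(ind) + 1 for ind in population]
    parents = random.choices(population, weights=weights, k=POP)
    children = []
    for a, b in zip(parents[::2], parents[1::2]):
        cut = random.randrange(1, GENES)   # one-point crossover
        c1 = a[:cut] + b[cut:]
        c2 = b[:cut] + a[cut:]
        for child in (c1, c2):
            if random.random() < 0.1:      # occasional mutation: flip one bit
                child[random.randrange(GENES)] ^= 1
        children.extend([c1, c2])
    population = children

print(max(fitness(ind) for ind in population))  # best fitness after evolving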
Advantages of GAs
GAs have various advantages which have made them immensely popular. These include:
They do not require any derivative information (which may not be available for many
real-world problems).
They are faster and more efficient than the traditional methods.
They have very good parallel capabilities.
They optimize both continuous and discrete functions, as well as multi-objective
problems.
They provide a list of "good" solutions, not just a single solution.
They always get an answer to the problem, which gets better over time.
They are useful when the search space is very large and a large number of
parameters are involved.
Limitations of GAs
Like any technique, GAs also suffer from a few limitations. These include:
GAs are not suited for all problems, especially problems which are simple and for
which derivative information is available.
The fitness value is calculated repeatedly, which might be computationally expensive
for some problems.
Being stochastic, they offer no guarantees on the optimality or the quality of the
solution.
If not implemented properly, the GA may not converge to the optimal solution.
Chapter 6
Natural Language Processing or NLP is a field of Artificial Intelligence that gives the
machines the ability to read, understand and derive meaning from human languages.
It is a discipline that focuses on the interaction between data science and human language,
and is scaling to lots of industries. Today NLP is booming thanks to the huge
improvements in the access to data and the increase in computational power, which are
allowing practitioners to achieve meaningful results in areas like healthcare, media, finance
and human resources, among others.
NLP can help you with lots of tasks and the fields of application just seem to increase on a
daily basis. Let’s mention some examples:
NLP enables the recognition and prediction of diseases based on electronic health
records and patient’s own speech. This capability is being explored in health
conditions that go from cardiovascular diseases to depression and even schizophrenia.
For example, Amazon Comprehend Medical is a service that uses NLP to extract
disease conditions, medications and treatment outcomes from patient notes, clinical
trial reports and other electronic health records.
Organizations can determine what customers are saying about a service or product by
identifying and extracting information from sources like social media. This sentiment
analysis can provide a lot of information about customers' choices and their decision
drivers.
To help identify fake news, the NLP Group at MIT developed a new system to
determine if a source is accurate or politically biased, detecting whether a news source
can be trusted or not.
Amazon's Alexa and Apple's Siri are examples of intelligent voice-driven
interfaces that use NLP to respond to vocal prompts and do everything from finding a
particular shop, to telling us the weather forecast, to suggesting the best route to the
office or turning on the lights at home.
Having an insight into what is happening and what people are talking about can be
very valuable to financial traders. NLP is being used to track news, reports, and
comments about possible mergers between companies; everything can then be
incorporated into a trading algorithm to generate massive profits. Remember: buy the
rumor, sell the news.
NLP is also being used in both the search and selection phases of talent recruitment,
identifying the skills of potential hires and also spotting prospects before they become
active on the job market.
Companies like Winterlight Labs are making huge improvements in the treatment of
Alzheimer’s disease by monitoring cognitive impairment through speech and they can also
support clinical trials and studies for a wide range of central nervous system disorders.
Following a similar approach, Stanford University developed Woebot, a chatbot
therapist with the aim of helping people with anxiety and other disorders.
But serious controversy surrounds the subject. A couple of years ago Microsoft
demonstrated that by analyzing large samples of search engine queries, they could identify
internet users who were suffering from pancreatic cancer even before they had received a
diagnosis of the disease. How would users react to such a diagnosis? And what would
happen if you were tested as a false positive (meaning that you could be diagnosed with
the disease even though you don't have it)? This recalls the case of Google Flu Trends,
which in 2009 was announced as being able to predict influenza but later vanished due to
its low accuracy and inability to meet its projected rates.
NLP may be the key to an effective clinical support in the future, but there are still many
challenges to face in the short term.
The main drawbacks we face these days with NLP relate to the fact that language is very
tricky. The process of understanding and manipulating language is extremely complex, and
for this reason it is common to use different techniques to handle different challenges
before binding everything together. Programming languages like Python or R are highly
used to perform these techniques, but before diving into code lines (that will be the topic of
a different article), it’s important to understand the concepts beneath them. Let’s
summarize and explain some of the most frequently used algorithms in NLP when defining
the vocabulary of terms:
Bag of Words
Is a commonly used model that allows you to count all words in a piece of text. Basically it
creates an occurrence matrix for the sentence or document, disregarding grammar and
word order. These word frequencies or occurrences are then used as features for training a
classifier.
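A minimal sketch of a bag-of-words occurrence matrix using scikit-learn's CountVectorizer (the two sample documents are invented):

from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "the dog sat on the log"]

vectorizer = CountVectorizer()
matrix = vectorizer.fit_transform(docs)  # grammar and word order are discarded

print(vectorizer.get_feature_names_out())  # vocabulary of terms
print(matrix.toarray())                    # per-document word counts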
Tokenization
Is the process of segmenting running text into sentences and words. In essence, it’s the task
of cutting a text into pieces called tokens, and at the same time throwing away certain
characters, such as punctuation. The result of tokenization is a list of individual tokens,
as in the sketch below:
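Since the report's original example sentence is not reproduced here, this hedged sketch tokenizes its own sample text with NLTK's word_tokenize:

import nltk
nltk.download("punkt", quiet=True)  # tokenizer models; newer NLTK versions may need "punkt_tab"
from nltk.tokenize import word_tokenize

text = "Mr. Smith flew to San Francisco, adopting a laissez faire attitude."
print(word_tokenize(text))
# Punctuation becomes separate tokens, while multi-word names like
# "San Francisco" are split apart: exactly the issues discussed below.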
Pretty simple, right? Well, although it may seem quite basic in this case, and also in
languages like English that separate words by a blank space (called segmented languages),
not all languages behave the same, and if you think about it, blank spaces alone are not
sufficient even for English to perform proper tokenization. Splitting on blank
spaces may break up what should be considered one token, as in the case of certain
names (e.g. San Francisco or New York) or borrowed foreign phrases (e.g. laissez faire).
Tokenization can remove punctuation too, easing the path to a proper word
segmentation but also triggering possible complications. In the case of periods that follow
an abbreviation (e.g. "Dr."), the period following that abbreviation should be considered
part of the same token and not be removed.
The tokenization process can be particularly problematic when dealing with biomedical
text domains which contain lots of hyphens, parentheses, and other punctuation marks.
Stop Words
Stop words can be safely ignored by carrying out a lookup in a pre-defined list of
keywords, freeing up database space and improving processing time.
There is no universal list of stop words. These can be pre-selected or built from scratch.
A potential approach is to begin by adopting pre-defined stop words and add words to the
list later on. Nevertheless, it seems that the general trend over time has been to move
from the use of large standard stop word lists to the use of no lists at all.
The catch is that stop-word removal can wipe out relevant information and modify the
context of a given sentence. For example, if we are performing a sentiment analysis, we
might throw our algorithm off track if we remove a stop word like "not". Under these
conditions, you might select a minimal stop word list and add additional terms depending
on your specific objective (see the sketch below).
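A tiny sketch of that advice: a deliberately minimal, custom stop-word list that keeps "not" so negations survive (the words are chosen purely for illustration):

# Small custom list; "not" is deliberately excluded so sentiment survives
stop_words = {"the", "is", "a", "of"}

tokens = ["the", "service", "is", "not", "good"]
filtered = [t for t in tokens if t not in stop_words]
print(filtered)  # ['service', 'not', 'good']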
Stemming
Refers to the process of slicing the end or the beginning of words with the intention of
removing affixes (lexical additions to the root of the word). Affixes that are attached at
the beginning of the word are called prefixes (e.g. "astro" in the word "astrobiology")
and the ones attached at the end of the word are called suffixes (e.g. "ful" in the word
"helpful"). The problem is that affixes can create or expand new forms of the same word,
called inflectional affixes.
A possible approach is to consider a list of common affixes and rules (Python and R
have different libraries containing affixes and methods) and perform stemming
based on them, but of course this approach presents limitations. Since stemmers use
algorithmic approaches, the result of the stemming process may not be an actual word, or
may even change the word (and sentence) meaning. To offset this effect, you can edit
those predefined methods by adding or removing affixes and rules, but you must consider
that you might be improving the performance in one area while producing a degradation
in another one. Always look at the whole picture and test your model's performance.
So if stemming has serious limitations, why do we use it? First of all, it can be used to
correct spelling errors from the tokens. Stemmers are simple to use and run very
fast (they perform simple operations on a string), and if speed and performance are
important in the NLP model, then stemming is certainly the way to go. Remember, we use
it with the objective of improving our performance, not as a grammar exercise.
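A short stemming sketch with NLTK's PorterStemmer, showing both the speed appeal and the caveat above that stems need not be real words:

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["helpful", "astrobiology", "running", "studies"]:
    print(word, "->", stemmer.stem(word))
# e.g. "studies" -> "studi": fast, but the stem is not an actual word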
Lemmatization
Has the objective of reducing a word to its base form and grouping together different forms
of the same word. For example, verbs in past tense are changed into present (e.g. “went” is
changed to “go”) and synonyms are unified (e.g. “best” is changed to “good”), hence
standardizing words with similar meaning to their root. Although it seems closely related
to the stemming process, lemmatization uses a different approach to reach the root forms
of words.
Lemmatization resolves words to their dictionary form (known as the lemma), for which it
requires detailed dictionaries that the algorithm can look into to link words to their
corresponding lemmas.
For example, the words “running”, “runs” and “ran” are all forms of the word “run”, so
“run” is the lemma of all the previous words.
Lemmatization also takes into consideration the context of the word in order to solve other
problems like disambiguation, which means it can discriminate between identical words
that have different meanings depending on the specific context. Think about words like
“bat” (which can correspond to the animal or to the metal/wooden club used in baseball) or
“bank” (corresponding to the financial institution or to the land alongside a body of water).
By providing a part-of-speech parameter to a word ( whether it is a noun, a verb, and so
on) it’s possible to define a role for that word in the sentence and remove disambiguation.
As you might have already pictured, lemmatization is a much more resource-intensive task
than stemming. Since it requires more knowledge about
the language structure than a stemming approach, it demands more computational
power than setting up or adapting a stemming algorithm.
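A brief lemmatization sketch with NLTK's WordNetLemmatizer, showing how the part-of-speech parameter changes the result, as described above:

import nltk
nltk.download("wordnet", quiet=True)  # dictionary data used by the lemmatizer
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("running", pos="v"))  # -> "run" (treated as a verb)
print(lemmatizer.lemmatize("ran", pos="v"))      # -> "run" (treated as a verb)
print(lemmatizer.lemmatize("running"))           # -> "running" (noun by default)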
Topic Modeling
From the universe of topic modelling techniques, Latent Dirichlet Allocation (LDA) is
probably the most commonly used. This relatively new algorithm (invented less than 20
years ago) works as an unsupervised learning method that discovers different topics
underlying a collection of documents. In unsupervised learning methods like this one,
there is no output variable to guide the learning process and data is explored by algorithms
to find patterns. To be more specific, LDA finds groups of related words by:
1. Assigning each word to a random topic, where the user defines the number of topics
they wish to uncover. You don't define the topics themselves (you define just the number
of topics) and the algorithm will map all documents to the topics in a way that words
in each document are mostly captured by those imaginary topics.
2. The algorithm goes through each word iteratively and reassigns the word to a topic,
taking into consideration the probability that the word belongs to the topic and the
probability that the document was generated by the topic. These probabilities are
calculated multiple times, until the convergence of the algorithm.
Unlike other clustering algorithms like K-means that perform hard clustering (where topics
are disjointed), LDA assigns each document to a mixture of topics, which means that each
document can be described by one or more topics (e.g. Document 1 is described by 70%
topic A, 20% topic B and 10% topic C), reflecting more realistic results.
Topic modeling is extremely useful for classifying texts, building recommender systems
(e.g. to recommend you books based on your past readings) or even detecting trends in
online publications.
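A compact LDA sketch with scikit-learn; the four toy documents are invented, and each document comes back as a mixture over the two requested topics rather than a single hard label:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the striker scored a goal in the football match",
    "the bank raised interest rates on loans",
    "the team won the league after a late goal",
    "investors moved savings as rates climbed",
]

counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)  # 2 topics
mixtures = lda.fit_transform(counts)

print(mixtures.round(2))  # per-document topic proportions (rows sum to ~1)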
Chapter 8
WordPress Demand:
WordPress has long been the most popular content management system (CMS), powering
millions and millions of websites. Although WordPress has had a particularly bad
track record in terms of security, in recent years many of the well-known security risks
have shifted from the WordPress core to the numerous plugins and themes written for
the CMS.
A demand-side viewpoint was used to motivate the analysis; the basic hypothesis is that
plugins with large installation bases have been affected by multiple vulnerabilities.
WordPress is web publishing software you can use to create a beautiful website or blog. It
may well be the easiest and most flexible blogging and website content management
system (CMS) for beginners. Since it was released in 2003, WordPress has become one of
the most popular web publishing platforms, and today it powers more than 35% of the
entire web: everything from hobby blogs to some of the most popular websites online.
It enables you to build and manage your own full-featured website using just your web
browser. WordPress is released under an open-source license, which means you can
download and use the WordPress software any way you like, for free. It also means that
hundreds of volunteers from all around the world are constantly working to improve the
WordPress software.
WordPress is easy to learn and use -- You don't need to hire a web designer every
time you want to make a small change to your website. Instead, you can easily
update and create your own content without having to learn how to code. There are
thousands of themes and plugins that enable you to change the entire look of your
website, or even add new features. Because WordPress became a popular CMS for
website development and was developed for non-tech-savvy bloggers, most of the
user-interface components are easy to use.
It has lower setup and maintenance costs -- WordPress incurs fewer setup,
customization, and maintenance costs than comparable platforms.
There are a few basics of WordPress that are necessary to know for building a site.
Posts are blog content listed in reverse chronological order. You will see posts listed on
your blog page. If you are using WordPress as a blog, then you will end up using posts for
the majority of your website's content. You can add and edit your WordPress posts from
the 'Posts' menu in your dashboard. Due to their reverse chronological order, your posts
are meant to be timely. Older posts are archived by month and year.
Categories are meant for broad grouping of your posts. Think of these as general topics or
the table of contents for your WordPress site. Categories are hierarchical, which means
you can create sub-categories as well.
Tags are meant to describe specific details of your posts. Think of these as your site's
index words. They let you micro-categorize your content. Tags are not hierarchical.
Content Management System (CMS):
A Content Management System (CMS) is software which stores all the data,
such as text, photos, music, and documents, and makes it available on your website.
WordPress is an open-source CMS which allows users to build dynamic websites and
blogs. WordPress is the most popular blogging system on the web and allows updating,
customizing, and managing the website from its back-end CMS and components.
Features:
User Management − It allows managing user information, such as changing the
role of a user, creating or deleting users, and changing passwords and user information.
The main role of the user manager is authentication.
Media Management − It is the tool for managing media files and folders, in
which you can easily upload, organize, and manage the media files on your website.
Theme System − It allows modifying the site view and functionality. It includes
images, stylesheets, template files, and custom pages.
Extend with Plugins − Several plugins are available which provide custom
functions and features according to the needs of the user.
Importers − It allows importing data in the form of posts. It imports custom files,
comments, post pages, and tags.
Advantages:
CSS files can be modified according to the design, as per the user's needs.
There are many plugins and templates available for free. Users can customize the
various plugins as per their need.
Disadvantages:
Using several plugins can make the website heavy to load and run.
PHP knowledge is required to make modifications or changes in the WordPress
website.
WordPress Basics:
WordPress was primarily a tool to create a blog, rather than more traditional websites.
That hasn't been true for a long time, though. Nowadays, thanks to changes to the core
code, as well as WordPress' massive ecosystem of plugins and themes, you can create any
type of website with WordPress.
WordPress powers a huge number of business sites and blogs, and it's also one of the most
popular ways to build an eCommerce store. On an immediate note, we use WordPress! So
the very site that you're looking at right now is powered by WordPress.
WordPress Is Extensible
Even if you aren't a developer, you can easily modify your website thanks to WordPress'
huge ecosystem of themes and plugins.
Unlike static HTML sites, WordPress themes are a set of template files written in PHP,
HTML, CSS, and JavaScript. Typically, you would need to have a decent understanding of
all these web design languages, or hire a web developer, to create a custom WordPress
theme.
On the other hand, WordPress page builder plugins have made it super easy to create
custom page layouts using a drag-and-drop interface, but they were limited to layouts
only; you still needed a theme for the rest of the site. That was until Beaver Builder, one
of the best WordPress page builder plugins, decided to solve this problem.
Beaver Themer is a site builder add-on that allows you to create custom theme layouts.
Beaver Themer allows you to create a custom theme, but you will still need a theme to
start with. We recommend using a light-weight theme that includes a full-width page
template.
Chatbots can be programmed to respond the same way each time, to respond differently to messages
containing certain keywords, and even to use machine learning to adapt their responses to fit the situation.
A growing number of hospitals, nursing homes, and even private centers now use online chatbots
for healthcare services on their sites. These bots connect with potential patients visiting the site, helping
them discover specialists, booking their appointments, and getting them access to the correct treatment.
An ML model has to be created wherein we can give any text input and, on the basis of training data, it
must analyze the symptoms. A supervised logistic regression machine learning algorithm can be
implemented to train the model with datasets containing various disease CSV files. The goal is to
compare the outputs of various models and suggest the best model that can be used for symptoms in
real-world inputs. The dataset contains a CSV file with all diseases compiled together. The logistic
regression algorithm allows us to process the data efficiently. The goal here is to model the underlying
structure or distribution of the data in order to learn more from the training set.
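As a hedged sketch of the logistic-regression approach described here (the project code in Section 4 uses a decision tree), assuming a CSV named 'Training.csv' whose columns are 0/1 symptom flags with a final 'prognosis' column naming the disease:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = pd.read_csv('Training.csv')
X = data.drop(columns=['prognosis'])  # 0/1 symptom flags (assumed layout)
y = data['prognosis']                 # disease label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))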
In any case, the use of artificial intelligence in an industry where individuals' lives could be at stake
still sparks misgivings in people. It raises questions about whether the tasks mentioned above
ought to be assigned to human staff. This healthcare chatbot system will help hospitals provide
healthcare support online 24x7; it answers deep as well as general questions. It also helps to generate
leads and automatically delivers lead information to sales. By asking questions in series, it
helps patients by identifying what exactly he or she is looking for.
2. Project Analysis:
2.1 Review of Literature:
The main purpose of the scheme is to bridge the gap between the user and health providers by
giving immediate replies to the questions asked by the user. People today are practically addicted to
the internet, but they are not concerned about their personal health. They avoid going to the hospital
for small problems, which may become major diseases in the future. Establishing question-answer
forums is becoming a simpler way to answer those queries than browsing through lists of potentially
relevant documents on the web. Many of the existing systems have limitations: there is no instant
response given to patients, who have to wait a long time for experts to acknowledge them. Some
services charge an amount for live chat or telephony communication with doctors online.
The aim of this system is to replicate a person's discussion.
2.2 Project Timeline:
The timeline provided was from Feb 1, 2021 to April 1, 2021.
2.3 Dataset Details:
The dataset contains descriptions of different types of diseases. There are different sets for
different types of diseases; each set consists of descriptions of a single disease along with the
associated doctors, hospitals, etc.
The dataset was created by recording entries covering over 133 diseases, along with
doctors and hospitals.
2.5 Methodology Used:
The Health-Care ChatBot System should be written in Python, with GUI links and a simple, accessible
network API. The system must provide a capacity for parallel operation, and the system design should
not introduce scalability issues with regard to the number of surface computers, tablets, or displays
connected at any one time. The end system should also allow for seamless recovery, without data loss,
from individual device failure. There must be a strong audit chain, with all system actions logged. As
for interfaces, it is worth noting that this system is likely to conform to what is available. With that in
mind, the most adaptable and portable technologies should be used for the implementation. The system
has criticality in so far as it is a live system: if the system goes down, customers must not notice, or
must notice that the system recovers quickly (in seconds). The system must be reliable enough to run
crash- and glitch-free more or less indefinitely, or facilitate error recovery strong enough that glitches
are never revealed to its end-users.
3. Project Design:
3.1 Block Diagram:
3.2 Data Flow Diagram:
3.3 Use Case Diagram:
3.4 Sequence Diagram:
4. Implementation:
4.1 Work Division
# Reconstructed training script; the imports and the classifier/label-encoder
# setup (not shown in the original excerpt) are assumed as below.
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier, _tree

training_dataset = pd.read_csv('Training.csv')
test_dataset = pd.read_csv('Testing.csv')

# First 132 columns are 0/1 symptom flags; the last column is the prognosis
X = training_dataset.iloc[:, 0:132].values
y = training_dataset.iloc[:, -1].values

# Collapse the rows of each disease into a single symptom profile
dimensionality_reduction = training_dataset.groupby(training_dataset['prognosis']).max()

# Encode disease names as integers and fit a decision tree (assumed setup)
labelencoder = LabelEncoder()
y = labelencoder.fit_transform(y)
classifier = DecisionTreeClassifier()
classifier.fit(X, y)

cols = training_dataset.columns[:-1]
importances = classifier.feature_importances_
indices = np.argsort(importances)[::-1]
features = cols

def print_disease(node):
    node = node[0]            # tree_.value[leaf] has shape (1, n_classes)
    val = node.nonzero()      # indices of the classes present at this leaf
    disease = labelencoder.inverse_transform(val[0])
    return disease

def tree_to_code(tree, feature_names):
    tree_ = tree.tree_
    feature_name = [
        feature_names[i] if i != _tree.TREE_UNDEFINED else "undefined!"
        for i in tree_.feature
    ]
    symptoms_present = []

    def recurse(node, depth):
        if tree_.feature[node] != _tree.TREE_UNDEFINED:
            # Internal node: ask the user about this symptom
            name = feature_name[node]
            threshold = tree_.threshold[node]
            print(name + " ?")
            ans = input().lower()
            val = 1 if ans == 'yes' else 0
            if val <= threshold:
                recurse(tree_.children_left[node], depth + 1)
            else:
                symptoms_present.append(name)
                recurse(tree_.children_right[node], depth + 1)
        else:
            # Leaf node: report the predicted disease and a confidence score
            present_disease = print_disease(tree_.value[node])
            print("You may have " + str(present_disease))
            print()
            red_cols = dimensionality_reduction.columns
            symptoms_given = red_cols[
                dimensionality_reduction.loc[present_disease].values[0].nonzero()
            ]
            print("Symptoms present: " + str(list(symptoms_present)))
            print()
            confidence_level = (1.0 * len(symptoms_present)) / len(symptoms_given)
            print("Confidence level is " + str(confidence_level))
            print()
            print('The model suggests:')
            print()

    recurse(0, 1)

tree_to_code(classifier, cols)

# Map each disease to a doctor name and link; the doctors CSV is assumed to
# have 'Name' and 'Description' columns aligned with the disease index
doc_dataset = pd.read_csv('doctors_dataset.csv', names=['Name', 'Description'])
diseases = pd.DataFrame(dimensionality_reduction.index)

doctors = pd.DataFrame()
doctors['disease'] = diseases['prognosis']
doctors['name'] = doc_dataset['Name']
doctors['link'] = doc_dataset['Description']

# Example lookup for one disease
record = doctors[doctors['disease'] == 'AIDS']
print(record['name'])
print(record['link'])

# execute_bot()  # GUI entry point, defined elsewhere in the full project
4.2 Testing:
Without a well-thought-out testing effort, the project will undoubtedly fail overall and will impact
the entire operational performance of the solution. With a poorly tested solution, the support and
maintenance cost will escalate exponentially, and the reliability of the solution will be poor.
Therefore, project managers need to realize that the testing effort is a necessity, not merely an ad
hoc task that is the last hurdle before deployment.
The project manager should pay specific attention to developing a complete testing plan and
schedule. At this stage, the project manager should have realized that this effort would have to be
accommodated within the project budget, as many of the testing resources will be designing,
testing, and validating the solution throughout the entire project life cycle, and this consumes
work-hours and resources.
The testing effort begins at the initial project phase (i.e. preparing test plans) and continues
throughout until the closure phase.
5. Result:
5.1 Snapshot of Result:
Fig 5.1.1: A snapshot of the working of the ChatBot.
5.2 Analysis of Result:
6. Advantages and Disadvantages of the Model:
6.1 Advantages:
1. Omni-capable
• The chatbot converses seamlessly across multiple digital channels and retains data and
context for a seamless experience, in the best cases even passing that information to a live
agent if needed.
2. Free to Explore
• The chatbot can reach, consume, and process vast amounts of data, both structured and
unstructured, to surface insights from any source and gather the relevant data to solve
customer issues quickly.
3. Autonomous Reasoning
• The chat bot can perform complex reasoning without human intervention. For example, a
great Service chatbot should be able to infer solutions based on relevant case histories.
4. Pre-Trained
• The chat bot is pre-trained to understand brand-specific or industry-specific knowledge
and terms. Even better, it’s pre-configured to resolve common customer requests of a
particular industry.
5. Register/Log-in
• To access this chatbot, an individual needs to register and then use the registration ID to
log in to access the features.
6. User Interface
• A user-friendly interface which is engaging and easy to access.
6.2 Disadvantages:
• Complex Interface – Chatbots are often seen as complicated, and it can take a lot of time
to understand a user's requirement. Poor processing that cannot filter results in time can
also annoy people.
• Inability to Understand – Due to fixed programs, chatbots can get stuck if an unsaved
query is presented to them. This can lead to customer dissatisfaction and result in
loss. The multiple messaging involved can also be taxing for users and deteriorate the
overall experience on the website.
• Time-Consuming – Chatbots are installed with the motive of speeding up responses and
improving customer interaction. However, due to limited data availability and the time
required for self-updating, this process can appear more time-consuming and expensive.
Therefore, instead of attending to several customers at a time, chatbots can appear
confused about how to communicate with people.
• Zero Decision-Making – Chatbots are infamous for their inability to make decisions. A
similar situation landed big companies like Microsoft in trouble when their chatbot went
on a racist rant. Therefore, it is critical to ensure proper programming of your chatbot to
prevent any such incident which could hamper your brand.
• Poor Memory – Chatbots are not able to memorize past conversations, which forces
the user to type the same thing again and again. This can be cumbersome for customers
and annoy them because of the effort required. Thus, it is important to be
careful while designing chatbots and to make sure that the program is able to comprehend
user queries and respond accordingly.
7. Conclusion & Future Scope:
7.1 Conclusion:
Thus, we can conclude that this system gives accurate results. Using a large dataset
ensures better performance. We have built a system which is useful for people to detect
a disease by typing in their symptoms.
8. References:
https://en.wikipedia.org/wiki/Chatbot
https://en.wikipedia.org/wiki/Disease
https://data-flair.training/blogs/python-chatbot-project/
https://www.youtube.com/playlist?list=PLQVvvaa0QuDdc2k5dwtDTyT9aCja0on8j
Team Members
Name: Mahesh YV
Semester: 7th
Name: Mohan G
Semester: 7th