
UNIT IV

HEALTHCARE AND DEEP LEARNING

Introduction to Deep Learning – DFF Network – CNN – RNN for Sequences – Biomedical Image
and Signal Analysis – Natural Language Processing and Data Mining for Clinical Data – Mobile
Imaging and Analytics – Clinical Decision Support System.

Drawbacks of Traditional Machine Learning


➢ Traditional ML algorithms are not very useful when working with high-dimensional data, that is, where we have a large number of inputs and outputs. For example, in the case of handwriting recognition we have a very large input space, with many different kinds of inputs associated with different styles of handwriting.
➢ The second major challenge is to tell the computer which features it should look for, features that play an important role in predicting the outcome and in achieving better accuracy. This process is referred to as feature extraction.
➢ Feeding raw data to the algorithm rarely works, and this is the reason why feature extraction is a critical part of the traditional machine learning workflow.
➢ Therefore, without feature extraction the challenge for the programmer increases, because the effectiveness of the algorithm depends very much on how insightful the programmer is.
➢ Hence, it is very difficult to apply these machine learning models or algorithms to complex problems like object recognition, handwriting recognition, NLP (Natural Language Processing), etc.

What is deep learning?

Deep learning is a subset of machine learning that is essentially a neural network with three or more layers. These neural networks attempt to simulate the behaviour of the human brain—albeit far from matching its ability—enabling systems to cluster data, "learn" from large amounts of data, and make predictions with high accuracy.

Deep learning models are capable of learning to focus on the right features by themselves,
requiring little guidance from the programmer.
Basically, deep learning mimics the way our brain functions, i.e., it learns from experience. Our brain is made up of billions of neurons that allow us to do amazing things. Even the brain of a one-year-old child can solve problems that are very difficult even for supercomputers: recognizing the faces of its parents and different objects, discriminating between different voices and even recognizing a particular person by his or her voice, and drawing inferences from the facial gestures of other people, among many more.

How does deep learning mimic the functionality of the brain?

Deep learning uses the concept of artificial neurons that function in a similar manner to the biological neurons present in our brain. Therefore, we can say that deep learning is a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain, called artificial neural networks. Now, let us take an example to understand it. Suppose we want to make a system that can recognize the faces of different people in an image. If we solve this as a typical machine learning problem, we will define facial features such as eyes, nose, ears, etc., and the system will then identify on its own which features are more important for which person.

Now, deep learning takes this one step further: it automatically finds out which features are important for classification thanks to deep neural networks, whereas in the case of machine learning we had to define these features manually.

How does Deep Learning work?

The inspiration for deep learning is the way the human brain filters information. Its main motive is to simulate human-like decision making. Neurons in the brain pass signals to perform actions. Similarly, artificial neurons connect in a neural network to perform tasks such as clustering, classification, or regression. The neural network sorts unlabeled data according to the similarities in the data. That is the idea behind a deep learning algorithm.

Neurons are grouped into three different types of layers:

a) Input layer

b) Hidden layer
c) Output layer

Input Layer

• It receives the input data from the observations. This information is broken into numbers and bits of binary data that a computer can understand. Variables need to be either standardized or normalized so that they fall within the same range.

Hidden Layer

• It performs mathematical computations on the input data. Deciding the number of hidden layers and the number of neurons in each layer is challenging. Hidden layers contain the non-linear processing units used for feature extraction and transformation. Each successive layer uses the output of the preceding layer as its input. This forms a hierarchy of concepts in which each level learns to transform the input data into a progressively more abstract and composite representation.

• The "deep" in Deep Learning refers to having more than one hidden layer.
Output Layer:

The output layer returns the output data, i.e., the network's prediction.

Weight:

The connections between neurons are called weights, which are numerical values. The weights between neurons determine the learning ability of the neural network. During the learning of an artificial neural network, the weights between the neurons change. Initial weights are set randomly.

Transfer Function

The transfer function translates the input signals to output signals. Four types of transfer functions are commonly used: unit step (threshold), sigmoid, piecewise linear, and Gaussian.

Unit step (threshold)

The output is set at one of two levels, depending on whether the total input is greater than or less than some threshold value.

Sigmoid

The sigmoid family consists of two functions, logistic and tangential (hyperbolic tangent). The values of the logistic function range from 0 to 1, and from -1 to +1 for the tangential function.

Piecewise Linear

The output is proportional to the total weighted input within a linear range and saturates at its minimum and maximum values outside that range.
Gaussian

Gaussian functions are bell-shaped curves that are continuous. The node output (high/low) is interpreted in terms of class membership (1/0), depending on how close the net input is to a chosen value of the mean.

Linear

Like linear regression, a linear activation function transforms the weighted sum of the neuron's inputs into an output using a linear function.
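As a quick reference, the transfer functions above can be sketched in a few lines of Python/NumPy; the function names, the thresholds, and the Gaussian width below are illustrative choices, not prescribed by the text:

```python
import numpy as np

def unit_step(x, threshold=0.0):
    # Output is one of two levels, depending on whether the total input exceeds the threshold.
    return np.where(x > threshold, 1.0, 0.0)

def logistic(x):
    # Logistic sigmoid: output ranges from 0 to 1.
    return 1.0 / (1.0 + np.exp(-x))

def tangential(x):
    # Hyperbolic tangent sigmoid: output ranges from -1 to +1.
    return np.tanh(x)

def piecewise_linear(x, lo=-1.0, hi=1.0):
    # Proportional to the input inside [lo, hi], saturated outside that range.
    return np.clip(x, lo, hi)

def gaussian(x, mean=0.0, sigma=1.0):
    # Bell-shaped response: highest when the net input is close to the chosen mean.
    return np.exp(-((x - mean) ** 2) / (2 * sigma ** 2))

def linear(x):
    # Identity: output equals the weighted-sum input, as in linear regression.
    return x

x = np.linspace(-3, 3, 7)
print(unit_step(x), logistic(x), gaussian(x), sep="\n")
```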
Activation Function

An activation function decides whether a neuron should be activated or not by calculating the weighted sum of its inputs and adding a bias to it.

Activation at the hidden layer (layer 1), using a linear activation:

z(1) = W(1) X + b(1)

a(1) = z(1)

Here, z(1) is the vectorized output of layer 1, W(1) is the matrix of weights assigned to the neurons of the hidden layer (i.e., w1, w2, w3 and w4), X is the vector of input features (i.e., i1 and i2), b(1) is the vector of biases assigned to the neurons of the hidden layer (i.e., b1 and b2), and a(1) is the activation, here simply a linear function of z(1).

Layer 2, i.e., the output layer (note: the input for layer 2 is the output from layer 1):

• z(2) = W(2) a(1) + b(2)

• a(2) = z(2)

Calculation at the output layer, substituting the value of z(1):

• z(2) = W(2) [W(1) X + b(1)] + b(2)
• z(2) = [W(2) W(1)] X + [W(2) b(1) + b(2)]
• Let W = W(2) W(1) and b = W(2) b(1) + b(2).
• Final output: z(2) = W X + b
• This is again a linear function: stacking layers with a linear activation collapses into a single linear layer, which is why non-linear activation functions are needed in the hidden layers.
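The collapse of stacked linear layers into a single linear mapping can be checked numerically. The following NumPy sketch uses made-up layer sizes (2 inputs, 4 hidden neurons, 1 output) that mirror the w1–w4, i1–i2, b1–b2 naming above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 2 inputs (i1, i2), 4 hidden neurons (w1..w4), 1 output.
X = rng.normal(size=(2, 5))            # 5 example input vectors, stacked as columns
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=(4, 1))
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=(1, 1))

# Two layers with a *linear* activation a(z) = z.
z1 = W1 @ X + b1
z2 = W2 @ z1 + b2

# The same mapping expressed as one linear layer: W = W2 W1, b = W2 b1 + b2.
W = W2 @ W1
b = W2 @ b1 + b2
assert np.allclose(z2, W @ X + b)      # identical outputs: the extra layer added nothing

# With a non-linearity (e.g. ReLU) in between, the collapse no longer holds.
relu = lambda z: np.maximum(z, 0.0)
z2_nonlinear = W2 @ relu(z1) + b2
print(np.allclose(z2_nonlinear, W @ X + b))   # generally False
```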

Types of Deep Learning Networks


➢ Feedforward neural network
➢ Radial basis function neural networks
➢ Multi-layer perceptron
➢ Convolution neural network
➢ Recurrent neural network
➢ Modular neural network
➢ Sequence to sequence models
Feedforward neural network
This type of neural network is the most basic neural network, in which the flow of information starts at the input layer and moves towards the output layer. These networks have only a single layer or only one hidden layer. Since the data moves in only one direction, there are no feedback connections in this network. In this network, the weighted sum of the inputs is fed into the input layer and propagated forward. These kinds of networks are used, for example, in facial recognition algorithms in computer vision.
Radial basis function neural networks
This kind of neural network generally has more than one layer, preferably two. In this kind of network, the relative distance from any point to the centre is calculated and the result is passed on to the next layer. Radial basis networks are generally used in power restoration systems to restore power in the shortest possible time and avoid blackouts.
Multi-layer perceptron
This type of network has more than three layers and is used to classify data that are not linearly separable. These networks are fully connected, with every node in one layer connected to every node in the next. They are used extensively for speech recognition and other machine learning applications.

Convolution neural network (CNN)


CNN is one of the variations of the multilayer perceptron. A CNN can contain more than one convolution layer, and because convolution layers share parameters the network can be very deep with comparatively few parameters. CNNs are very effective for image recognition and for identifying different image patterns.
Recurrent neural network
An RNN is a type of neural network in which the output of a particular neuron is fed back as an input to the same node. This method helps the network predict the output. This kind of network maintains a small state of memory, which is very useful for developing chatbots. Such networks are used in chatbot development and text-to-speech technologies.

Modular neural network


This kind of network is not a single network but a combination of multiple small neural networks. All the sub-networks together make up a big neural network, and all of them work independently to achieve a common target. These networks are very helpful in breaking a large problem into small pieces and then solving them.
Sequence to sequence models
This type of network is generally a combination of two RNNs. The network works on an encoder–decoder principle: it consists of an encoder that processes the input and a decoder that produces the output. Generally, this kind of network is used for text processing where the length of the input text is not the same as the length of the output text.

DFF Network and CNN


Deep feedforward networks
Deep feedforward networks, also often called feedforward neural networks, or multilayer
perceptrons (MLPs), are the quintessential deep learning models. The goal of a feedforward
network is to approximate some function f*.
[Only for reference: a function approximation problem asks us to select a function among a well-
defined class that closely matches a target function in a task-specific way.]
For example, for a classifier, y = f *(x) maps an input x to a category y. A feedforward network
defines a mapping y= f (x; θ) and learns the value of the parameters θ that result in the best function
approximation.
Flow of Information
These models are called feedforward because information flows through the
function being evaluated from x, through the intermediate computations used to
define f, and finally to the output y. There are no feedback connections in which
outputs of the model are fed back into itself. When feedforward neural networks
are extended to include feedback connections, they are called recurrent neural
networks.
Example: US Election
Importance of Feedforward Networks:
They form the basis for many commercial applications:
1. CNNs are a special kind of feedforward network. They are used for recognizing objects from photos.
2. Feedforward networks are a conceptual stepping stone to RNNs.
3. RNNs power many NLP applications.
Feedforward Neural Network Structures
Feedforward neural networks are called networks because they are typically represented by composing together many different functions. The model is associated with a directed acyclic graph describing how the functions are composed together. For example, we might have three functions f(1), f(2), and f(3) connected in a chain, to form f(x) = f(3)(f(2)(f(1)(x))). These chain structures are the most commonly used structures of neural networks. In this case, f(1) is called the first layer of the network, f(2) is called the second layer, and so on.
Definition of Depth
The overall length of the chain gives the depth of the model. For example, the composite function f(x) = f(3)(f(2)(f(1)(x))) has a depth of 3. It is from this terminology that the name "deep learning" arises. The final layer of a feedforward network, e.g. f(3), is called the output layer.
Training the Network
In network training we drive f(x) to match f*(x). The training data provides us with noisy, approximate examples of f*(x) evaluated at different training points. Each example x is accompanied by a label y ≈ f*(x). The training examples specify directly what the output layer must do at each point x: it must produce a value that is close to y.
Definition of Hidden Layer:
Hidden layers perform various types of mathematical computation on the input data and recognize the patterns that are part of it. The behaviour of hidden layers is not directly specified by the data. The learning algorithm must decide how to use those layers to produce a value that is close to y; the training data does not say what the individual layers should do. Since the desired output for these layers is not shown, they are called hidden layers.
A net with depth 2: one hidden layer
Width of Model
Each hidden layer is typically vector-valued. The dimensionality of the hidden-layer vector is the width of the model.
Units of a model
Each element of the vector is viewed as a neuron: instead of thinking of the layer as a single vector-to-vector function, its elements are regarded as units acting in parallel. Each unit receives inputs from many other units and computes its own activation value.
Depth versus Width
Going deeper makes a network more expressive: it can capture variations in the data better and yields expressiveness more efficiently than increasing width. The trade-off for more expressiveness is an increased tendency to overfit, so you will need more data or additional regularization. A network should be as deep as the training data allows, but a suitable depth can only be determined by experiment. Computation also increases with the number of layers.
Convolutional Neural Network:
In deep learning, a convolutional neural network (CNN/ConvNet) is a class of deep neural
networks most commonly applied to analyze visual imagery. Now when we think of a neural
network we think about matrix multiplications but that is not the case with ConvNet. It uses a
special technique called Convolution. Now in mathematics convolution is a mathematical
operation on two functions that produces a third function that expresses how the shape of
one is modified by the other.
➢ A convolutional neural network (CNN or ConvNet), is a network architecture for deep
learning which learns directly from data, eliminating the need for manual feature
extraction.
➢ CNNs are particularly useful for finding patterns in images to recognize objects, faces,
and scenes. They can also be quite effective for classifying non-image data such as audio,
time series, and signal data.
➢ Applications that call for object recognition and computer vision—such as self-driving vehicles and face-recognition applications—rely heavily on CNNs.
Important Factors:
➢ CNNs eliminate the need for manual feature extraction—the features are learned directly
by the CNN.
➢ CNNs produce highly accurate recognition results.
➢ CNNs can be retrained for new recognition tasks, enabling you to build on pre-existing
networks.
➢ Deep learning workflow: images are passed to the CNN, which automatically learns features and classifies objects.
Applications:
• Medical Imaging: CNNs can examine thousands of pathology reports to visually detect the
presence or absence of cancer cells in images.
• Audio Processing: Keyword detection can be used in any device with a microphone to
detect when a certain word or phrase is spoken - (‘Hey Siri!’). CNNs can accurately learn
and detect the keyword while ignoring all other phrases regardless of the environment.
• Stop Sign Detection: Automated driving relies on CNNs to accurately detect the presence
of a sign or other object and make decisions based on the output.
• Synthetic Data Generation: Using Generative Adversarial Networks (GANs), new images
can be produced for use in deep learning applications including face recognition and
automated driving.
How CNNs Work:
• A convolutional neural network can have tens or hundreds of layers that each learn to
detect different features of an image.
• Filters are applied to each training image at different resolutions, and the output of each
convolved image is used as the input to the next layer.
• The filters can start as very simple features, such as brightness and edges, and increase in complexity to features that uniquely define the object.
Feature Learning, Layers, and Classification:
• Three of the most common layers are: convolution, activation or ReLU, and pooling.
Convolution :
• Convolution puts the input images through a set of convolutional filters, each of which
activates certain features from the images.
• Convolution is a specialized type of linear operation used for feature extraction, where a
small array of numbers, called a kernel, is applied across the input, which is an array of
numbers, called a tensor. An element-wise product between each element of the kernel and
the input tensor is calculated at each location of the tensor and summed to obtain the output
value in the corresponding position of the output tensor, called a feature map.
• This procedure is repeated applying multiple kernels to form an arbitrary number of feature
maps, which represent different characteristics of the input tensors; different kernels can,
thus, be considered as different feature extractors
• Two key hyperparameters that define the convolution operation are size and number of
kernels. The former is typically 3 × 3, but sometimes 5 × 5 or 7 × 7. The latter is arbitrary,
and determines the depth of output feature maps
• However, there are three hyperparameters which affect the volume size of the output that
need to be set before the training of the neural network begins. These include:
• 1. The number of filters affects the depth of the output. For example, three distinct filters
would yield three different feature maps, creating a depth of three.
• 2. Stride is the distance, or number of pixels, that the kernel moves over the input matrix. While stride values of two or greater are rare, a larger stride yields a smaller output (see the sketch after this list).
• 3. Zero-padding is usually used when the filters do not fit the input image. This sets all
elements that fall outside of the input matrix to zero, producing a larger or equally sized
output. There are three types of padding:
• Valid padding: This is also known as no padding. In this case, the last convolution is
dropped if dimensions do not align.
• Same padding: This padding ensures that the output layer has the same size as the input
layer
• Full padding: This type of padding increases the size of the output by adding zeros to the
border of the input.
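To make the kernel, stride, and padding hyperparameters concrete, here is a minimal NumPy sketch of a single-channel convolution; the 6×6 toy image and the 3×3 edge kernel are invented for illustration:

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """Single-channel convolution as described above: the element-wise product of the
    kernel with each patch of the (zero-padded) input is summed into a feature map."""
    if padding > 0:
        image = np.pad(image, padding)                 # zero-padding around the border
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1        # output shrinks as stride grows
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)         # element-wise product, then sum
    return out

image = np.arange(36, dtype=float).reshape(6, 6)       # toy 6x6 "image"
edge_kernel = np.array([[1., 0., -1.],                 # a simple 3x3 vertical-edge filter
                        [1., 0., -1.],
                        [1., 0., -1.]])

print(conv2d(image, edge_kernel).shape)                # valid padding: (4, 4)
print(conv2d(image, edge_kernel, padding=1).shape)     # same padding:  (6, 6)
print(conv2d(image, edge_kernel, stride=2).shape)      # stride 2:      (2, 2)
```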
Pooling layer:
• Pooling layers, also known as downsampling layers, conduct dimensionality reduction, reducing the number of parameters in the input.
• Similar to the convolutional layer, the pooling operation sweeps a filter across the entire
input, but the difference is that this filter does not have any weights.
• Instead, the kernel applies an aggregation function to the values within the receptive field,
populating the output array. There are two main types of pooling:
• Max pooling: As the filter moves across the input, it selects the pixel with the maximum
value to send to the output array. As an aside, this approach tends to be used more often
compared to average pooling.
• Average pooling: As the filter moves across the input, it calculates the average value
within the receptive field to send to the output array.
• While a lot of information is lost in the pooling layer, it also has a number of benefits for the CNN: pooling layers help to reduce complexity, improve efficiency, and limit the risk of overfitting (a small NumPy sketch follows this list).
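The pooling operation described above can be sketched in the same style; the 4×4 feature map is a toy example, and the 2×2 window with stride 2 is simply a common choice:

```python
import numpy as np

def pool2d(feature_map, size=2, stride=2, mode="max"):
    # The pooling "filter" has no weights: it just aggregates each receptive field.
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    agg = np.max if mode == "max" else np.mean
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = agg(window)
    return out

fmap = np.array([[1., 3., 2., 4.],
                 [5., 6., 1., 2.],
                 [7., 2., 9., 0.],
                 [1., 8., 3., 4.]])
print(pool2d(fmap, mode="max"))   # [[6. 4.] [8. 9.]]   -- keeps the maximum of each window
print(pool2d(fmap, mode="avg"))   # [[3.75 2.25] [4.5 4.]] -- averages each window instead
```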
Fully-Connected Layer:
• The name of the fully-connected layer aptly describes itself. As mentioned earlier, the pixel values of the input image are not directly connected to the output layer in the partially connected layers. However, in the fully-connected layer, each node in the output layer connects directly to a node in the previous layer.
• This layer performs the task of classification based on the features extracted through the previous layers and their different filters. While convolutional and pooling layers tend to use ReLU functions, FC layers usually leverage a softmax activation function to classify inputs appropriately, producing a probability from 0 to 1 (a minimal layer-stack sketch follows this list).
• Rectified linear unit (ReLU) allows for faster and more effective training by mapping
negative values to zero and maintaining positive values. This is sometimes referred to
as activation, because only the activated features are carried forward into the next layer.
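A minimal sketch of the layer ordering described in this section (convolution → ReLU → pooling → fully connected → softmax), written here with PyTorch's nn module under the assumption of a 28×28 single-channel input and 10 output classes (both sizes are illustrative, not from the text):

```python
import torch
import torch.nn as nn

# Layer order mirrors the description above: convolution -> ReLU -> pooling -> fully connected -> softmax.
model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1),  # "same"-style padding
    nn.ReLU(),                       # maps negative values to zero, keeps positive values
    nn.MaxPool2d(kernel_size=2),     # downsamples 28x28 feature maps to 14x14
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),      # fully connected layer: one output node per class
    nn.Softmax(dim=1),               # class probabilities between 0 and 1
)

x = torch.randn(1, 1, 28, 28)        # one toy grayscale image
probs = model(x)
print(probs.shape, float(probs.sum()))   # torch.Size([1, 10]), probabilities sum to 1.0
```

In practice the softmax is often folded into the loss function (e.g., cross-entropy computed on raw logits) during training; it is kept here only to mirror the layer description above.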

RNN for Sequences


Sequence models:
• Sequence models are machine learning models that input or output sequences of data. Sequential data includes text streams, audio clips, video clips, time-series data, etc. Recurrent Neural Networks (RNNs) are a popular algorithm used in sequence models.
• Applications of Sequence Models:
Sequence models are used for speech recognition, voice recognition, time-series prediction, and natural language processing.
Why does sequence matter?
• There are a lot of real-life scenarios, like image processing, voice recognition, and language translation, in which sequence matters. For example, if I write "are you how?", will it make sense? No, because our brain is trained to process this sentence in sequence.
• That is because we have trained our brain with this sequenced information, and a change in the order would turn it into nonsense. Such tasks need a model that considers time; traditional models like SVM or logistic regression, and neural networks like FFNs, are not capable of doing these tasks. While talking about AI/ML, the primary conception of Artificial Intelligence is a machine that can engage with a human in a way similar to other humans.
• Artificial Intelligence in this sense is the ability of a machine to convincingly engage in dialogue (what we will call an AI-based advanced chatbot), and this will only be possible when computers are able to process time-dependent data in the same way as the human mind does.
Recurrent Neural Networks:
• An RNN consists of multiple ANNs chained together so as to keep track of previous outputs, unlike a normal ANN. The output of the current timestep acts as an input to the next timestep. Predictions have to be made based on past inputs, so there is a need to memorize the previous inputs. Hence, an RNN has "hidden states" which act as a memory for all the information computed so far.
• Here x̄ and ȳ represent the input and the output respectively, s represents the state, which summarizes the previous inputs, and Wx, Ws, and Wy represent the weights of the input, hidden, and output layers respectively.
RNN Folded Model

RNN unfolded model

• In an FFNN we obtain the hidden layer's output by applying the activation function; for this, we only need the input vector and the weight matrix.

• RNNs also use activation functions, only with a small change (a NumPy sketch of these updates follows below):

• The hidden state is calculated from the sum of the products of the input and state vectors with their respective weight matrices, i.e., s_t = f(Wx·x̄_t + Ws·s_(t-1)).
• The output is calculated in the same way in both FFNNs and RNNs, i.e., ȳ_t = g(Wy·s_t), where g is the output activation (e.g., softmax for classification).
• The unfolded architecture of an RNN can be altered as per the requirement: for a sentiment classification task we can have multiple inputs and a single output, while language generation models need multiple inputs and multiple outputs. RNNs can also be stacked together for some special use cases.
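A minimal NumPy sketch of the folded/unfolded RNN computation above; the dimensions, the tanh activation, and the random toy sequence are illustrative assumptions:

```python
import numpy as np

# Illustrative sizes: 3 input features, 5 hidden (state) units, 2 outputs.
rng = np.random.default_rng(1)
Wx = rng.normal(scale=0.1, size=(5, 3))   # input-to-hidden weights
Ws = rng.normal(scale=0.1, size=(5, 5))   # state-to-hidden (recurrent) weights
Wy = rng.normal(scale=0.1, size=(2, 5))   # hidden-to-output weights

def rnn_forward(inputs):
    s = np.zeros(5)                        # initial hidden state (the "memory")
    outputs = []
    for x in inputs:                       # one step per element of the sequence
        s = np.tanh(Wx @ x + Ws @ s)       # new state depends on input AND previous state
        outputs.append(Wy @ s)             # output computed from the state, as in an FFNN
    return np.array(outputs), s

sequence = rng.normal(size=(4, 3))         # a toy sequence of 4 time steps
ys, final_state = rnn_forward(sequence)
print(ys.shape)                            # (4, 2): many-to-many; keep only ys[-1] for many-to-one
```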
Types of Recurrent Neural Networks
There are four types of Recurrent Neural Networks:
➢ One to One
➢ One to Many
➢ Many to One
➢ Many to Many
One to One RNN
This type of neural network is known as the Vanilla Neural Network. It is used for general machine learning problems that have a single input and a single output.
One to Many RNN
This type of neural network has a single input and multiple outputs. An example of this is image captioning.

Many to One RNN


This RNN takes a sequence of inputs and generates a single output. Sentiment analysis is a good
example of this kind of network where a given sentence can be classified as expressing positive or
negative sentiments.
Many to Many RNN
This RNN takes a sequence of inputs and generates a sequence of outputs. Machine translation is
one of the examples.

Training Recurrent Neural Networks


• Training RNNs is considered to be difficult: in order to preserve long-range dependencies, training often runs into one of two problems:
• Exploding gradients (the gradients, and hence the weight updates, become so large that training diverges), or
• Vanishing gradients (the gradients become so small that the network stops learning and under-fits).
• Which of these two problems occurs depends on the activation function used in the hidden layer:
• with the sigmoid activation function the vanishing gradient problem is the more likely issue, while with the rectified linear unit the exploding gradient problem is more common.
• For these problems, techniques such as regularisation and gradient clipping are used, which help to tackle both vanishing and exploding gradients (a small numerical illustration follows this list).
• RNNs can be easily trained using deep learning libraries like TensorFlow, PyTorch, Theano, etc. The only important thing here is that if you want to train large RNNs, GPUs are needed, since they are deeper networks; for smaller networks you can make use of online GPU-enabled notebooks like Google Colab, Kaggle Kernels, etc.
• As an extension to RNNs, LSTMs (Long Short-Term Memory networks) and BRNNs (Bidirectional Recurrent Neural Networks) were proposed.
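Why the gradients vanish or explode can be seen from a small numerical experiment: backpropagation through time multiplies the gradient by the recurrent weight matrix once per time step (the activation derivative is ignored here for simplicity), so the gradient norm shrinks or grows roughly geometrically with the sequence length. The matrix size and the scales below are arbitrary choices:

```python
import numpy as np

def gradient_norm_after(T, scale):
    # One matrix multiplication per time step of backprop through a length-T sequence.
    rng = np.random.default_rng(0)
    Ws = scale * rng.normal(size=(8, 8)) / np.sqrt(8)   # recurrent weight matrix
    grad = np.ones(8)
    for _ in range(T):
        grad = Ws.T @ grad                              # one step of backprop through time
    return np.linalg.norm(grad)

for scale in (0.5, 1.0, 1.5):
    print(scale, [round(gradient_norm_after(T, scale), 3) for T in (1, 10, 50)])
# Small recurrent weights -> the gradient vanishes; large ones -> it explodes.

# Gradient clipping caps the gradient norm to tame the exploding case:
g = np.full(8, 100.0)
max_norm = 5.0
if np.linalg.norm(g) > max_norm:
    g = g * (max_norm / np.linalg.norm(g))
print(np.linalg.norm(g))   # 5.0
```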
Advantages of an RNN
• It can model non-linear temporal/sequential relationships.
• No need to specify lags to predict the next value, in comparison to an autoregressive process.
Disadvantages of an RNN
• Vanishing Gradient Problem
• Not suited for predicting long horizons
NATURAL LANGUAGE PROCESSING AND DATA MINING IN CLINICAL TEXT

NATURAL LANGUAGE PROCESSING:

Natural language processing (NLP) is the ability of a computer program to understand human
language as it is spoken and written -- referred to as natural language. It is a component of artificial
intelligence (AI).

WHY WE ARE USING NLP IN CLINICAL TEXT?

Electronic health records (EHR) of patients are major sources of clinical information that are critical to the improvement of health care processes. An automated approach for retrieving information from these records is highly challenging due to the complexity involved in converting clinical text that is available in free-text form to a structured format. Natural language processing (NLP) and data mining techniques are capable of processing a large volume of clinical text (textual patient reports) to automatically encode clinical information in a timely manner.

GENERAL WORKFLOW OF A NLP SYSTEM:

The input to an NLP system is the unstructured natural text that is extracted from the patient's medical record and sent to the report analyzer.

Report Analyzer:

Clinical text differs from biomedical text in its possible use of pseudo-tables, i.e., natural text formatted to appear as tables, medical abbreviations, and punctuation in addition to the natural language. The text is normally dictated and transcribed by a person or by speech recognition software and is usually available in free-text format. Some clinical texts are even available in image or graph format, which is unstructured.

As a result, NLP processing techniques are applied to convert the unstructured free-text into a
structured format.
The first and foremost task of the report analyzer is to preprocess the clinical input text by applying NLP methodologies. The major preprocessing tasks in clinical NLP include text segmentation, handling of text irregularities, domain-specific abbreviations, and missing punctuation.

Text Analyzer

The text analyzer is the most important module in clinical text processing; it extracts the clinical information from free text and makes it compatible with database storage. The syntactic and semantic interpreter component of the text analyzer generates a deeper structure, such as constituent or dependency tree structures, to capture the clinical information present in the text. Conversion rules or ML algorithms encode the clinical information from the deep tree structures. An advantage of the rule-based approach is that the predefined patterns are expert-curated and highly specific. The database handler and inference rules component generates a processed form of data from the database storage.

CORE COMPONENTS OF NLP


Due to the complex nature of the clinical text, the analysis is carried out in many phases such as
morphological analysis, lexical analysis, syntactic analysis, semantic analysis, and data encoding

MORPHOLOGICAL ANALYSIS:

It is a word-level analysis.

It contains four steps:

➢ Tokenization – extracts words from a given text.

➢ Stop word removal – removes unwanted words such as punctuation, articles, etc.

➢ Stemming – the process of reducing a word to its base form. Example: the base form of "took" is "take", i.e., the word "took" is derived from "take".

➢ N-gram language model – an n-gram is a sequence of n contiguous words (see the short Python illustration after this list).

o unigram: processes one word at a time.

o bigram: processes two words at a time, and so on. From these counts we can estimate the probability of a word sequence.
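A toy Python illustration of these four steps; the sample sentence, the tiny stop-word list, and the crude suffix-stripping stemmer are all invented for illustration (a real system would use a proper stemmer or lemmatizer, which is what maps "took" to "take"):

```python
import re
from collections import Counter

text = "Patient took 40 mg of the drug. The patient took the drug twice daily."  # toy note

# Tokenization: extract words from the given text.
tokens = re.findall(r"[A-Za-z]+", text.lower())

# Stop word removal: drop articles, prepositions, etc. (tiny illustrative stop list).
stop_words = {"the", "of", "a", "an", "and"}
content = [t for t in tokens if t not in stop_words]

# Stemming: very crude suffix stripping for illustration only; irregular forms such as
# "took" -> "take" require a lemmatizer with a lexicon rather than suffix rules.
def stem(word):
    for suffix in ("ing", "ed", "ly", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

stems = [stem(t) for t in content]

# N-grams: sequences of n contiguous words; their counts give simple sequence probabilities.
bigrams = Counter(zip(stems, stems[1:]))
print(content, stems, bigrams.most_common(2), sep="\n")
```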

Core NLP Components

Research in NLP for the clinical domain aims to make computers understand free-form clinical text for the automatic extraction of clinical information. The general aims of clinical NLP include the theoretical investigation of human language, exploring the details of language from a computer-implementation point of view, and more natural man-machine communication, aiming at practical automated systems. Due to the complex nature of clinical text, the analysis is carried out in many phases, such as morphological analysis, lexical analysis, syntactic analysis, semantic analysis, and data encoding.
LEXICAL ANALYSIS:

The words or phrases in the text are mapped to the relevant linguistic information such as syntactic
information, i.e., noun, verb, adverb, etc., and semantic information i.e., disease, procedure, body
part, etc. Lexical analysis is achieved with a special dictionary called a lexicon, which provides the necessary rules and data for carrying out the linguistic mapping. The development and maintenance of a lexicon requires extensive knowledge engineering effort. The National Library of Medicine (NLM) maintains the SPECIALIST Lexicon, with comprehensive syntactic information associated with both medical and English terms.
Semantic Analysis

Semantic analysis is used to check whether a sentence is meaningful or not. It finds the important tokens and their base words, and the part of speech of each word (this is done in lexical analysis). It needs to check whether two words that appear together in a sentence make sense, which is done by mapping the syntactic structure to objects in the domain.
It determines the words or phrases in the text that are clinically relevant, and extracts their semantic
relations. The natural language semantics consists of two major features:

➢ The representation of the meanings of a sentence, which can allow the possible
manipulations (particularly inference)

➢ Relating these representations to the part of the linguistic model that deals with the
structure (grammar or syntax).

The semantic analysis uses the semantic model of the domain, or ontology, to structure and encode the information from the clinical text. The semantic model is either frame-oriented or based on conceptual graphs. The generated structured output of the semantic analysis is subsequently used by other automated processes.
SYNTACTIC ANALYSIS:

The word “syntax” refers to the study of formal relationships between words in the text. The
grammatical knowledge and parsing techniques are the major key elements to perform syntactic
analysis. The context free grammar (CFG) is the most common grammar used for syntactic
analysis. CFG is also known by various other terms including phrase structure grammar (PSG) and
definite clause grammar (DCG). The syntactic analysis is done by using two basic parsing
techniques called top-down parsing and bottom-up parsing to assign POS tags (e.g., noun, verb,
adjective, etc.) to the sequence of tokens that form a sentence and to determine the structure of the
sentence through parsing tools.
DATA ENCODING:

The process of mining information from EHR requires coding of data that is achieved either
manually or by using NLP techniques to map free-text entries with an appropriate code. The coded
data is classified and standardized for storage and retrieval purposes in clinical research. Manual
coding is normally facilitated with search engines or pick-up list.

Data Mining in Healthcare

The use of data mining in healthcare is being adopted by organizations with a focus on optimizing
the efficiency and quality of their predictive analytics.

In the healthcare industry specifically, data mining can be used to decrease costs by increasing
efficiencies, improve patient quality of life, and perhaps most importantly, save the lives of more
patients.

DATA MINING IN CLINICAL TEXT:

Text mining in clinical domain is usually more difficult than general domains (e.g. newswire
reports and scientific literature) because of the high level of noise in both the corpus and training
data for machine learning (ML). Healthcare systems and specifically health record systems contain
both structured and unstructured information as text.

It is a subfield of biomedical NLP to determine classes of information found in clinical text that
are useful for basic biological scientists and clinicians for providing better health care.

More specifically, it is estimated that over 40% of the data in healthcare record systems contains
text, so-called clinical text, sometimes also called electronic patient record text.

Clinical text contains valuable information about symptoms, diagnoses, treatments, drug use and
adverse (drug) events for the patient that can be utilized to improve healthcare for other patients.

However, clinical text also contains sensitive information such as personal names, telephone
numbers and addresses of the patient and relatives. This information needs to be pseudonymized
before the clinical text can be utilized for secondary use.

The electronically stored details of patients' health records support text mining and data mining techniques that uncover information on health, disease, and treatment response. A significant chunk of the information in EHR and CDA documents is text, and extraction of such information by conventional data mining methods is not possible. The semi-structured and unstructured data in
the clinical text and even certain categories of test results such as echocardiograms and radiology
reports can be mined for information by utilizing both data mining and text mining techniques.

Information extraction

Information extraction (IE) is a specialized field of NLP for extracting predefined types of
information from the natural text. It is defined as the process of discovering and extracting
knowledge from the unstructured text.

IE differs from information retrieval (IR), which is meant for identifying and retrieving relevant documents. In general, IR returns documents while IE returns information or facts.

A typical IE system for the clinical domain is a combination of components such as tokenizer,
sentence boundary detector, POS tagger, morphological analyzer, shallow parser, deep parser
(optional), gazetteer, named entity recognizer, discourse module, template extractor, and template
combiner.

A careful modeling of relevant attributes with templates is required for the performance of high
level components such as discourse module, template extractor, and template combiner. The high
level components always depend on the performance of the low level modules such as POS tagger,
named entity recognizer, etc.

IE for clinical domain is meant for the extraction of information present in the clinical text. The
Linguistic String Project–Medical Language Processor (LSP–MLP), and Medical Language
Extraction and Encoding system (MedLEE) are the commonly adopted systems to extract
UMLS concepts from clinical text.
Preprocessing

The primary source of information in the clinical domain is the clinical text written in natural
language. However, the rich contents of the clinical text are not immediately accessible by the
clinical application systems that require input in a more structured form. An initial module adopted
by various clinical NLP systems to extract information is the preliminary preprocessing of the
unstructured text to make it available for further processing. The most commonly used
preprocessing techniques in clinical NLP are spell checking, word sense disambiguation, POS
tagging, and shallow and deep parsing.

Spell Checking

The rate of misspelling in clinical text is reported to be much higher than in any other type of text. In addition to traditional spell checkers, various research groups have come up with a variety of methods for spell checking in the clinical domain: UMLS-based spell-checking error correction tools and morpho-syntactic disambiguation tools.

Word Sense Disambiguation

The process of understanding the sense of the word in a specific context is termed as word sense
disambiguation. The supervised ML classifiers and the unsupervised approaches automatically
perform the word sense disambiguation for biomedical terms.

POS Tagging

An important preprocessing step adopted by most NLP systems is POS tagging, which reads the text and assigns a part-of-speech tag to each word or token of the text. POS tagging is the annotation of words in the text with their appropriate POS tags by considering the related and adjacent words in a phrase, sentence, and paragraph. POS tagging is the first step in syntactic analysis and finds its application in IR, IE, word sense disambiguation, etc. POS tags are a set of word categories based on the role that words may play in the sentence in which they appear. The most common set contains seven different tags: Article, Noun, Verb, Adjective, Preposition, Number, and Proper Noun.
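As an illustration, the NLTK library (one common choice, not prescribed by the text) provides an off-the-shelf POS tagger; note that it uses the Penn Treebank tag set rather than the seven-tag set listed above, and the resource names in the download calls may vary with the NLTK version:

```python
import nltk

# One-time downloads of the tokenizer and tagger models used by NLTK.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "The patient denies chest pain but reports mild shortness of breath."
tokens = nltk.word_tokenize(sentence)
print(nltk.pos_tag(tokens))
# e.g. [('The', 'DT'), ('patient', 'NN'), ('denies', 'VBZ'), ('chest', 'NN'), ('pain', 'NN'), ...]
```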

Shallow and Deep Parsing

Parsing is the process of determining the complete syntactic structure of a sentence or a string of symbols in a language. A parser is a tool that converts an input sentence into an abstract syntax tree, such as a constituent tree or dependency tree, whose leaves correspond to the words of the given sentence and whose internal nodes represent grammatical tags such as noun, verb, noun phrase, verb phrase, etc. Most parsers apply ML approaches such as PCFGs (probabilistic context-free grammars), as in the Stanford lexical parser [50], and even maximum entropy and neural networks.

A few parsers even use lexical statistics, considering the words and their POS tags. Such parsers are well known for overfitting problems that require additional smoothing. An alternative to the
overfitting problem is to apply shallow parsing, which splits the text into nonoverlapping word
sequences or phrases, such that syntactically related words are grouped together. The word phrase
represents the predefined grammatical tags such as noun phrase, verb phrase,
prepositional phrase, adverb phrase, subordinated clause, adjective phrase, conjunction phrase, and
list marker. The benefits of shallow parsing are the speed and robustness of processing. Parsing is
generally useful as a preprocessing step in extracting information from the natural text.

Context-Based Extraction

The fundamental step for a clinical NLP system is the recognition of medical words and phrases
because these terms represent the concepts specific to the domain of study and make it possible
to understand the relations between the identified concepts. Even highly sophisticated systems of
clinical NLP include the initial processing of recognizing medical words and phrases prior to the
extraction of information of interest. While IE from the medical and clinical text can be carried
out in many ways, this section explains the five main modules of IE.

Concept Extraction

Extracting concepts (such as drugs, symptoms, and diagnoses) from clinical narratives constitutes a basic enabling technology to unlock the knowledge within and support more advanced reasoning applications such as diagnosis explanation, disease progression modeling, and intelligent analysis of the effectiveness of treatment. The first and foremost module in clinical NLP following the initial text preprocessing phase is the identification of the boundaries of the medical terms/phrases and understanding the meaning by mapping the identified term/phrase to a unique concept identifier in an appropriate ontology. The recognition of clinical entities can be achieved by a dictionary-based method using the UMLS Metathesaurus, rule-based approaches, statistical methods, and hybrid approaches. The identification and extraction of entities present in the clinical text largely depends on the understanding of the context. For example, the recognition of diagnosis and treatment procedures in the clinical text requires the recognition and understanding of the clinical condition as well as the determination of its presence or absence. The contextual features related to clinical NLP are negation (absence of a clinical condition), historicity (the condition had occurred in the recent past and might occur in the future), and experiencer (the condition is related to the patient). While many algorithms are available for context identification and extraction, it is recommended to detect the degree of certainty in the context.

Association Extraction

Clinical text is a rich source of information on patients' conditions and their treatments, with additional information on potential medication allergies, side effects, and even adverse effects.
Information contained in clinical records is of value for both clinical practice and research;
however, text mining from clinical records, particularly from narrative-style fields (such as
discharge summaries and progress reports), has proven to be an elusive target for clinical Natural
Language Processing (clinical NLP), due in part to the lack of availability of annotated corpora
specific to the task. Yet, the extraction of concepts (such as mentions of problems, treatments, and
tests) and the association between them from clinical narratives constitutes the basic
enabling technology that will unlock the knowledge contained in them and drive more advanced
reasoning applications such as diagnosis explanation, disease progression modeling, and
intelligent analysis of the effectiveness of treatment.

Negation

“Negation” is an important context that plays a critical role in extracting information from the
clinical text. Many NLP systems incorporate a separate module for negation analysis in text
preprocessing. However, the importance of negation identification has gained much of its interest
among the NLP research community in recent years. As a result, explicit negation detection systems such as NegExpander, Negfinder, and a specific system for extracting SNOMED CT concepts, as well as negation identification algorithms such as NegEx, which uses regular expressions to identify negation, and a hybrid approach based on regular expressions and grammatical parsing, have been developed by a few dedicated research groups. While the NegExpander program identifies the negation terms and then expands to the related concepts, Negfinder is a more complex system that uses indexed concepts from UMLS and regular expressions along with a parser using an LALR (look-ahead LR) grammar to identify the negations.
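The flavour of a regular-expression negation detector can be conveyed with a short sketch. This is only a simplified illustration in the spirit of NegEx, not the actual algorithm: the trigger list, the five-token scope window, and the example sentences are all invented, and the real NegEx uses a much larger curated trigger list with pseudo-negation and termination terms:

```python
import re

# A few negation trigger phrases (illustrative only).
NEG_TRIGGERS = r"\b(no|denies|without|negative for|absence of|not)\b"

def is_negated(sentence, concept):
    """Very simplified scope rule: the concept counts as negated if a trigger
    occurs within the five tokens immediately preceding it."""
    tokens = sentence.lower().split()
    if concept.lower() not in tokens:
        return None
    idx = tokens.index(concept.lower())
    window = " ".join(tokens[max(0, idx - 5):idx])
    return re.search(NEG_TRIGGERS, window) is not None

print(is_negated("The patient denies chest pain and fever", "fever"))           # True
print(is_negated("The patient reports fever and a productive cough", "fever"))  # False
```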

Extracting Codes

Extracting codes is a popular approach that uses NLP techniques to extract the codes mapped to
controlled sources from clinical text. The most common codes dealing with diagnoses are the
International Classification of Diseases (ICD) versions 9 and 10 codes. The ICD is designed to
promote international comparability in the collection, processing, classification and presentation
of mortality statistics.
Preprocessing of texts, such as tokenisation and text segmentation, is followed by word-level processing such as:

Morphological processing: pre-processing techniques based on morphological operations for four different imaging modalities, namely MRI, CT, mammogram, and ultrasound images, have been discussed. In the pre-processing step, after removal of noise, cleaning of images is done by dilating, eroding, opening, and closing operations. Top-hat transforms extract small elements and details in the image. Even though morphology is a very old technique, it still finds application in all medical images in one way or another. This morphological processing can also be extended for use in medical image feature selection and segmentation.

•Lemmatisation: Lemmatisation (or lemmatization) in linguistics is the process of grouping


together the inflected forms of a word so they can be analysed as a single item, identified by the
word's lemma.

•Stemming: Stemming is a natural language processing technique that lowers inflection in words
to their root forms, hence aiding in the preprocessing of text, words, and documents for text
normalization.

•Compound splitting: Dealing with word compounding in statistical machine translation (SMT)
is essential to mitigate the sparse data problems that productive word generation causes. There
are several issues that need to be addressed: splitting compound words into their correct
components (i.e. disambiguating between split points), deciding whether to split a compound word
at all, and, if translating into a compounding language, merging components into a compound
word

• Abbreviation detection: detection of abbreviations is also a major subproblem and task of sentence segmentation and tokenization processes in general, i.e., disambiguating sentence endings from punctuation attached to abbreviations. Statistical (NLP) methods have been applied to detect and extract them successfully, mostly in a (semi-)supervised manner.

Generally, the same building blocks used for regular texts can also be utilised for clinical text
processing. However, clinical texts contain more noise in the form of incomplete sentences,
misspelled words and non-standard abbreviations that can make the natural language processing
cumbersome.

Applications:

1) Healthcare Associated Infections (HAIs)

Healthcare associated infections are also called hospital associated infections or nosocomial
infections. An important goal in defeating HAIs is to collect statistics by detecting and measuring the prevalence of HAIs, but also to predict and warn if a particular patient has a high risk of acquiring an HAI. HAIs can encompass, for example, pneumonia, urinary tract infection, sepsis or various wound infections, but also norovirus (winter vomiting disease). Two machine learning algorithms, Support Vector Machine (SVM) and Random Forest (RF), from the Weka toolkit were applied to the annotated Stockholm EPR Detect-HAI Corpus.
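For readers who want to reproduce the idea (though not the original study, which used Weka and the non-public Stockholm EPR corpus), a minimal scikit-learn sketch with made-up stand-in notes might look like this:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Made-up stand-in records; the real study used annotated patient records.
notes = [
    "postoperative wound shows purulent discharge, fever since day 3",
    "urinary catheter in place, urine culture positive",
    "routine follow-up, wound healing well, afebrile",
    "patient mobilised, no signs of infection",
]
labels = [1, 1, 0, 0]   # 1 = HAI suspected, 0 = no HAI

for clf in (SVC(kernel="linear"), RandomForestClassifier(n_estimators=100, random_state=0)):
    model = make_pipeline(TfidfVectorizer(), clf)   # bag-of-words features + classifier
    model.fit(notes, labels)
    print(type(clf).__name__, model.predict(["fever and purulent discharge from the wound"]))
```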

2) Detection of Adverse Drug Events (ADEs)

Adverse drug events (ADEs) are a major public health problem, around 5% of all hospital
admissions in the world are due to ADEs

All drugs are poisonous in some sense, but given in the correct amount they may cure a disease. Adverse drug events are commonly grouped into the following types:

(a) Dose-related, for example giving toxic effect.

(b) Non-dose related, for example penicillin hypersensitivity.

(c) Dose-related and time-related, related to the cumulative dose.

(d) Time-related, becomes apparent some time after the use of the drug.

(e) Withdrawal, occurs after the withdrawal of the drug.

(f) Unexpected, often caused by drug interactions.

First of all, ICD-10 diagnosis codes related to adverse drug events that are assigned to the patient
records need to be studied.

Medical classification systems:

Medical terminologies, classification systems and available controlled vocabularies are used in
healthcare to report, administer, classify and explain diseases and treatment, including medication.

Mobile Imaging and Analytics

Mobile imaging is the technique of creating visual representations of the interior of a body for
clinical analysis and medical intervention, as well as visual representation of the function of some
organs or tissues.

Introduction:

Mobile technology and smart devices, especially smartphones, allow new and easier ways of imaging at the patient's bedside, and they have the potential to be turned into diagnostic tools that can be used by professionals as well as lay people. Smartphones usually contain at least one high-resolution camera that can be used for image formation. However, careful consideration has to be given to cameras in general, and to non-scientific cameras in particular. Many camera parameters are usually reported in public advertisements, but not all of them are useful. In particular, pixel resolution can be misleading, as the number of pixels itself is not a measure of quality. Quality is usually measured as the signal-to-noise ratio (SNR).

SNR is defined as the power of the signal divided by the power of the noise.
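Expressed numerically, with SNR commonly reported in decibels:

```python
import numpy as np

def snr_db(signal_power, noise_power):
    # SNR is the ratio of signal power to noise power; in decibels this is 10*log10 of the ratio.
    return 10 * np.log10(signal_power / noise_power)

print(snr_db(1.0, 0.01))   # 20 dB: signal power is 100x the noise power
```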

Noise can be introduced in several steps of the image acquisition.

• Shot noise, which is dependent on the quality of the sensor and the discretization of
different number of photons. This noise mostly occurs when only a few photons hit the
sensor.

• Transfer noise, which is introduced by connectivity in the sensor. This is usually static
for all images and can be reduced using background subtraction with an image acquired
in complete darkness.

In the case of a camera, the signal is the amount of light captured by the sensor; image noise is reduced as more photons become available. The most important parameter for the quality of an optical system is the amount of light accumulated on each pixel. This parameter is determined by the physical size of a pixel (or chip size in relation to the number of pixels), as a larger pixel acquires more light, and by the diameter of the entry lens, which regulates the amount of light. The size of the entry lens is usually given as the f-stop k (written as 1:k or f/k), the ratio of the distance from sensor to entry lens to the diameter of the entry lens; the lower, the better. Most modern smartphones have optical parameters similar to regular consumer cameras, while being built at a far smaller scale.

First integrations of these cameras into clinical routine and research have already shown manifold applications for mobile technology in medicine. One example is the use of the smartphone camera to take pictures of test strips for automatic analysis.

Another example is the use of smartphone cameras to document necrotic skin lesions caused by the rare disease calciphylaxis in a multicenter clinical registry. Here, special care must be taken
when dealing with multiple different smartphones or lighting conditions due to different
efficiencies in capturing colors.

A color reference has to be used to calibrate the camera colors in a later step. To control
illumination, zoom, and distance, the German company FotoFinder has developed an integrated
lens system that is easily attached to and powered by an iPhone transforming it into a
dermatoscope.
Besides the integrated camera, additional image formation methods can also be used on smart devices, either by incorporating special sensors (like ultrasound or ECG) or by connecting them, wired or wirelessly, to more powerful imaging machines like micro nuclear magnetic resonance (micro-NMR) for bedside diagnostics.

Data Visualization

The task of transforming an acquired image dataset into a perceptible form is called visualization. This is rather simple for most 2D methods like digital photographs, but it becomes more involved for 3D volumes, in particular if voxels are annotated with several features or monitored over time (3D+t). In general, all data is displayed by transforming it into a colored 2D representation. Hence, we need to consider the output devices as well as the definition and value ranges of the initial data.

Visualization Basics

The human eye is capable of detecting light between 390 and 700 nm wavelengths. Images that
are recorded and displayed within this so-called visible spectrum show the data in “true color.”
But because many modalities like X-ray, ultraviolet, or infrared imaging capture wavelengths
outside the visible spectrum, a modification of the recorded data has to be performed. The resulting
image (e.g., a grayscale image for X-ray) is displayed in "false color." A special case of this is the so-called "pseudo color," which means that the color of an image has been artificed to enhance certain features. Here, a single-channel image and a so-called color map are used to convert each value of the single channel into a corresponding color.

As an example, the Doppler signal contains information on direction of movement for each
position. This movement can be either positive (towards the detector), negative (away from the
detector), or zero (no movement). To superimpose this information to morphologic image data (B
mode), a different color scheme is applied. The zero level would be encoded in black, negative
values in blue, and positive values in red. Larger absolute value of the signal results in brighter
color.

Output Devices

All data is displayed on a computer screen, where colors are mixed from three basic channels: red, green, and blue (RGB). This results in a cubic color space. Setting all three colors to the same value creates different shades of gray. Each color is usually scaled from 0 (dark) to 255 (bright). This equals a bit depth of 8, meaning that 8 bits in memory are allocated for each color channel, yielding in total 256³ ≈ 16 million possible values. Higher bit-depth color or gray values are also possible but rarely used, as they are not well supported by computer screens and file formats.

However, in some cases a higher contrast or distribution of color or gray values is needed, e.g., for
diagnostics in radiology. Therefore, computer screens in diagnostic radiology support higher bit
depth (e.g., grayscale bit depth of 10), and have a better contrast (e.g., 1400:1 compared to 1000:1
regular) and brightness (e.g., 400 cd/m2 brightness compared to 200 cd/m2 regular) than regular
computer screens.

Printers differ from screens in that the background color of a screen (no color turned on) is black,
while the background color of a printout (paper) is white. Thus, higher values in color for screens
result in brighter colors, while higher amounts of color from a printer result in darker colors.
Therefore, printers usually use cyan, magenta, yellow, and black (CMYK) color space to
compensate for the nonblack background. Black is used as a key ingredient when mixing the colors
to minimize the amount of fluid on the paper.

Mobile Visualization

Recently, visualization and display technology has been dominated by trends in mobile
computing. For example, prior to the introduction of the first retina display with the iPhone 4 in
2010, almost all computer and smartphone displays had a pixel density of about 70–100 pixels per
inch (ppi). Increases in resolution were mostly achieved through larger monitor screens.

However, the introduction of the retina display increased the pixel density above 300 ppi,
improving perceived contrast and also outperforming radiology displays in many other aspects
(e.g., iPhone 4 brightness: 500 cd/m2). Thereby, these new types of screens show great potential
for radiologists.

Additionally, modern smartphones and tablet computers provide a high amount of processing
power (e.g., a 64-bit dual core at 1.3 GHz in the iPhone 5s) that can be used for image visualization.
Almost all 2D and surface-rendering visualization techniques can be employed in real time. Real
time means that the result is delivered fast enough to make an impact on the current situation, or,
in terms of visualization of data, so that no delay between action (e.g., zooming) and result
(zoomed image) is perceived. Usually, this requires 15 to 20 frames per second (fps); the frame rate is the speed at which these images are shown.

Volume rendering

Volume rendering is a type of data visualization technique which creates a three-dimensional representation of data. CT and MRI data are frequently visualized with volume rendering in addition to other reconstructions and slices. This technique can also be applied to tomosynthesis data.

Volume rendering is computationally expensive; for example, a CT angiography dataset can contain up to 6 GB of data in 512³ voxels acquired over time that have to be held in memory during visualization. Therefore, most smart devices are not capable of performing volume rendering natively. Remote visualization has been successfully implemented to display images, which have been rendered on a server, remotely on a tablet computer or smartphone. This so-called streaming is performed by sending a video of a live view of an object from the server to the client (tablet computer or smartphone).

The video is typically encoded using, for example, H.264 compression, which is standard in mobile communication. In the other direction, the client captures touches, swipes, and other interactions of the user and sends these to the server to update the live view. Streaming of video data has the benefit of allowing the user to use a mobile device while having the computational power of a workstation. The drawback of this approach is the bandwidth needed to stream images in real time from the server to the mobile device. For example, a video with 30 frames per second (fps) and a resolution of 1920 by 1080 pixels (Full HD/1080p) requires about 1 Mbit/s of bandwidth. This is not reliably possible through most current wireless networks like 3G, which are limited to between 350 and 2000 kilobits per second (kbit/s), depending on country and reception.
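
To see why video compression such as H.264 is indispensable here, consider a back-of-the-envelope calculation of the uncompressed bandwidth of the same stream (assuming 8-bit RGB pixels; the numbers below are only this rough estimate):

# Uncompressed bandwidth of a Full HD stream at 30 fps with 8-bit RGB pixels
width, height, fps = 1920, 1080, 30
bits_per_pixel = 3 * 8                          # three color channels, 8 bits each
raw_bits_per_second = width * height * fps * bits_per_pixel
print(raw_bits_per_second / 1e6, "Mbit/s")      # roughly 1493 Mbit/s uncompressed

Compared with the roughly 1 Mbit/s quoted above for the compressed stream, this corresponds to a compression factor on the order of a thousand, which is why efficient codecs such as H.264 are essential.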

Calibration

Important for distributed visualization on a range of different devices is calibration. This means
that the same image is displayed in the exact same way on all devices, even if background
illumination differs between these devices. For this, an application has been developed that allows
users to calibrate their devices visually on their own. In this application the user is guided through
8 steps, each showing a visual pattern. In each step, the user has to adjust a slider to change the
visibility of the pattern.

One concern that is often raised when visualizing biomedical images on mobile devices is the
appropriateness for diagnostics. For example, software that displays medical images might have
to undergo investigation by the Food and Drug Administration (FDA) or other local legal
authorities to be cleared for commercial marketing. Smartphones and tablet computers do not
necessarily meet the requirements to undergo these studies. Therefore, the appropriateness and
legitimacy of the chosen device should always be taken into account when considering the use of a mobile device for diagnostics or visualization of medical images.

Image Analysis

Image analysis is the task of extracting abstract information or semantics and knowledge from the
raw pixels of image and signal data.

This is the most challenging task in biomedical imaging as it supports researchers and clinicians
in finding clues for disease or certain phenotypes (diagnostics), supports novices and experts in
performing procedures (therapy) and following up on the outcome, and allows scientists to gain knowledge from imaging data.

With the growing number of digital imaging devices, automated knowledge extraction becomes
more and more important. The new trend towards mobile and personalized health data additionally
drives the need for automation. For example, many applications for the smartphone-
based investigation of skin cancers already exist, but only a few are actually accurate. Pulse frequency can be determined accurately and contactlessly by any smartphone simply by filming the face and detecting the very slight periodic changes in skin color, which are usually not observed by humans.

Biomedical image analysis

The biomedical image analysis task can be split up into several substeps:

1. Preprocessing to remove background noise or enhance the image

2. Extraction of features to be used in later steps

3. Registration of several images

4. Segmentation (localization and delineation) of regions of interest (ROIs)

5. Classification of the image or segmented parts and measurements

Preprocessing and Filtering

Basically all images from biomedical imaging modalities, and especially those from smartphone cameras, are noisy and contain artifacts. Therefore, preprocessing is required before the data can
be used for analysis. Additional preprocessing can also help to prepare the image for certain
analysis tasks, such as edge detection. Most of the preprocessing algorithms are low in
computation time and memory requirements and hence suitable for mobile devices.

Gaussian filter
A Gaussian filter is commonly used to remove noise and recording artifacts from an image by
blurring. The filter consists of a multidimensional Gaussian distribution that is convolved with the
image. For convolution, the center value is replaced with the accumulated weighted values
according to the mask. High frequency noise in the image is thereby reduced.
Convolving the local region with the Gaussian kernel weights the center of the local region most strongly, with the remaining pixels contributing less as the distance from the center increases. The weighted values are summed up and the result is stored at the current pixel location of the image.
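
A minimal sketch of such Gaussian smoothing, assuming SciPy and NumPy are available (the synthetic image and the kernel width sigma are chosen purely for illustration):

import numpy as np
from scipy.ndimage import gaussian_filter

# Synthetic noisy image: a bright square corrupted by additive Gaussian noise
rng = np.random.default_rng(0)
image = np.zeros((64, 64))
image[24:40, 24:40] = 1.0
noisy = image + 0.2 * rng.standard_normal(image.shape)

# Convolve with a Gaussian kernel; sigma controls the amount of blurring
smoothed = gaussian_filter(noisy, sigma=1.5)
print(noisy.std(), smoothed.std())   # the standard deviation drops as high-frequency noise is reduced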


Median filter:

The median filter is also used to reduce noise. For this filter, a sliding window with a fixed size
(here 3×3 pixels) is moved across the image. The center point of the window is replaced by the median value within the window. For median computation, the nine image pixel values at the current mask position are sorted, and the center is replaced by the fifth value in the sorted list. This
removes outliers in an otherwise smooth area while maintaining the value of the majority of the
pixels.
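
A corresponding sketch of median filtering with SciPy (the toy image and the 3x3 window are illustrative assumptions):

import numpy as np
from scipy.ndimage import median_filter

image = np.array([[10, 10, 10, 10],
                  [10, 250, 10, 10],   # a single "salt" outlier
                  [10, 10, 10, 10],
                  [10, 10, 10, 10]])

# 3x3 sliding window: each pixel is replaced by the median of its neighborhood
filtered = median_filter(image, size=3)
print(filtered)                        # the outlier value 250 is removed, smooth areas are preserved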
Sobel filter

The Sobel filter is used to enhance edges in the image. For this, an asymmetric filter is convolved
with the image. A typical mask is sensitive to vertical edges, in particular to vertical edges from black to white. Usually, this mask is rotated by 90° and the signs are changed, ending up with a set of eight different masks. All eight masks are applied individually and, for instance,
the maximum is used as a replacement for the center pixel to obtain an edge map.
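
The sketch below applies a horizontal and a vertical Sobel mask with SciPy and combines their responses into an edge map; combining only two masks via the gradient magnitude is a common simplification of the eight-mask maximum described above:

import numpy as np
from scipy.ndimage import sobel

image = np.zeros((32, 32))
image[:, 16:] = 1.0                    # vertical edge from black to white

edge_x = sobel(image, axis=1)          # responds to vertical edges
edge_y = sobel(image, axis=0)          # responds to horizontal edges
edge_map = np.hypot(edge_x, edge_y)    # combine the responses into one edge map
print(edge_map.max())                  # strongest response lies along the vertical edge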
Feature Extraction

Features are simplified descriptors of an image or part of an image. Features are used to compare
two images, or find similarities or shared objects between multiple images. Image features can be
either global (describing the image as a whole) or local (describing a part of any size of the image).

A very basic global image feature is the image histogram. A histogram is a probability
distribution of the pixel/voxel values in the image. For each possible value, the number of
occurrences in the image is counted. This results in a very simplified representation: information on the intensity is maintained, but all spatial information is lost. Global features, such as the shape of the histogram, can be used, for instance, to distinguish between classes of images, e.g., hand and skull radiographs.
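
A histogram feature can be computed in a few lines; the sketch below (NumPy assumed, with an arbitrarily chosen bin count) produces a normalized histogram that can serve as a global feature vector:

import numpy as np

def histogram_feature(image, bins=32):
    # Normalized intensity histogram used as a global feature vector
    counts, _ = np.histogram(image, bins=bins, range=(0, 256))
    return counts / counts.sum()

rng = np.random.default_rng(0)
radiograph = rng.integers(0, 256, size=(128, 128))   # stand-in for a grayscale radiograph
feature = histogram_feature(radiograph)
print(feature.shape, feature.sum())                  # (32,) 1.0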

Local features describe only a part of the image at a certain spatial position. Most are created in
two separate steps. The first one is feature detection, in which points of interest (POIs) are
localized. The second step is feature description: for each of the detected points, a description of this position (possibly including some surrounding area) is created. Since images can be acquired under different conditions like scale and rotation, a certain invariance against these changes is needed
for both detector and descriptor.

Scale Invariant Feature Transform (SIFT)

Recognizing objects in images is one of the most important problems in computer vision. A
common approach is to first extract the feature descriptions of the objects to be recognized from
reference images, and store such descriptions in a database. When there is a new image, its feature
descriptions are extracted and compared to the object descriptions in the database to see if the image contains any object we are looking for. In real-life applications, the objects in the images to be processed can differ from the reference images in many ways:

➢ Scale, i.e., size of the object in the image

➢ Orientation

➢ Viewpoint
➢ Illumination

➢ Partially covered

Scale-invariant feature transform (SIFT) is an algorithm for extracting stable feature descriptions of objects, called keypoints, that are robust to changes in scale, orientation, shear, position, and illumination.
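
A minimal sketch of keypoint extraction and matching with OpenCV's SIFT implementation (assuming the opencv-python package, version 4.4 or later, where SIFT_create is available; the synthetic test images below are invented only to make the example self-contained):

import cv2
import numpy as np

# Synthetic grayscale "reference" image and a scaled, rotated "query" copy of it
rng = np.random.default_rng(0)
reference = cv2.GaussianBlur((rng.random((256, 256)) * 255).astype(np.uint8), (9, 9), 3)
rotation = cv2.getRotationMatrix2D((128, 128), 30, 0.8)
query = cv2.warpAffine(reference, rotation, (256, 256))

sift = cv2.SIFT_create()
kp_ref, desc_ref = sift.detectAndCompute(reference, None)     # keypoints and 128-D descriptors
kp_query, desc_query = sift.detectAndCompute(query, None)

# Brute-force matching with Lowe's ratio test to keep only distinctive matches
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(desc_query, desc_ref, k=2)
good = [pair[0] for pair in matches
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance]
print(len(kp_ref), len(kp_query), len(good))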
BioMedical Image Analysis
Introduction
In its broadest sense, an image is a spatial map of one or more physical
properties of a subject where the pixel intensity represents the value of a physical
property of the subject at that point. Imaging the subject is a way to record spatial
information, structure, and context information. In this context, the subject could be
almost anything: your family sitting for a family photo taken with your smartphone,
the constellations of orion’s belt viewed from a telescope, the roads of your
neighborhood imaged from a satellite, a child growing inside of its mother viewed
using an ultrasound probe. The list of possible subjects is endless, and the list of
possible imaging methods is long and ever-expanding. But the idea of imaging is
simple and straightforward: convert some scene of the world into some sort of array
of pixels that represents that scene and that can be stored on a computer.
Naturally, if we wanted to describe all of the possible subjects and modalities,
that would be an entire book of its own. But, for our purposes, we are interested
in biomedical images, which are a subset of images that pertain to some form of
biological specimen, which is generally some part of human or animal anatomy.
The imaging modality used to acquire an image of that specimen generally falls into
one of the categories of magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), ultrasound (U/S), or a wide range
of microscopy modalities such as fluorescence, brightfield, and electron microscopy.
Such modalities have various purposes: to image inside of the body without harming
the body or to image specimens that are too small to be viewed with the naked eye.
These modalities enable us to image biological structure, function, and processes.
While we often think of images as 2D arrays of pixels, this is an overly
restrictive conception, especially as it pertains to biomedical images. For example,
if you broke a bone in your leg, you might get a 3D MRI scan of the region, which
would be stored as a three-dimensional array of pixel values on a disk. If that leg
needed to be observed over time, there might be multiple MRI scans at different
time intervals, thus leading to the fourth dimension of time. A fifth dimension of
modality would be added if different types of MRI scans were used or if CT, PET, U/S,
or biological images were added. When all of these time-lapse datasets of different
modalities are registered to each other, a rich set of five-dimensional information
becomes available for every pixel representing a physical region in the real world.
Such information can lead to deeper insight into the problem and could help
physicians figure out how to heal your leg faster.
Another multidimensional example is common in the area of microscopy. To
visualize cellular dynamics and reactions to drugs (for example, for the purpose of
discovering targets for treating cancer), a group of cells could be imaged in their
3D context using confocal microscopy, which enables optical sectioning of a region
without harming the structure. This region could have multiple markers for different
regions of the cell such as the nucleus, cytoplasm, membrane, mitochondria,
endoplasmic reticulum, and so forth. If these are live cells moving over time, they
can be imaged every few seconds, minutes, hours, or days, leading to time-lapse
datasets. Such five-dimensional datasets are common and can elucidate structure-
structure relationships of intracellular or extracellular phenomena over time in
their natural 3D environment.
If we were to stop at this point in the description, we would be left in a rather
frustrating position: having the ability to image complex structures and processes,
to store them on a computer, and to visualize them but without any ability to
generate any real quantitative information. Indeed, as the number of imaging
modalities increases and the use of such modalities becomes ubiquitous coupled
with increasing data size and complexity, it is becoming impossible for all such
datasets to be
carefully viewed to find structures or functions of interest. How is a physician
supposed to find every single cancerous lesion in the CT scans of hundreds of
patients every day? How is a biologist supposed to identify the one cell acting
unusually in a field of thousands of cells moving around randomly? At the same
time, would you want such events to be missed if you are the patient?
Being able to look inside of the body without hurting the subject and being able
to view biological objects that are normally too small to see has tremendous
implications on human health. These capabilities mean that there is no longer a
need to cut open a patient in order to figure out the cause of an illness and that we
can view the mechanisms of the building block of our system, the cell. But being
able to view these phenomena is not sufficient, and generating quantitative information through image analysis has the capability of providing far more insight into
large-scale and time-lapse studies. With these concepts in mind, the need for
computationally efficient quantitative measurements becomes clear.
Biomedical image analysis is the solution to this problem of too much data. Such
analysis methods enable the extraction of quantitative measurements and
inferences from images. Hence, it is possible to detect and monitor certain
biological processes and extract information about them. As one example, more
than 50 years after the discovery of DNA, we have access to the comprehensive
sequence of the human genome. But, while the chemical structure of DNA is now
well understood, much work remains to understand its function. We need to
understand how genome-encoded components function in an integrated manner to
perform cellular and organismal functions. For example, much can be learned by
understanding the function of mitosis in generating cellular hierarchies and its
reaction to drugs: Can we arrest a cancer cell as it tries to replicate?
Such analysis has major societal significance since it is the key to understanding
biological systems and solving health problems. At the same time, it includes many
challenges since the images are varied, complex, and can contain irregular shapes.
Furthermore, the analysis techniques need to account for multidimensional datasets
I(x, y, z, λ, t), and imaging conditions (e.g., illumination)
cannot always be optimized.
In this chapter, we will provide a definition for biomedical image analysis and
explore a range of analysis approaches and demonstrate how they have been and
continue to be applied to a range of health-related applications. We will provide a
broad overview of the main medical imaging modalities (Section 3.2) and a number of general categories for analyzing images including object detection,
image segmentation, image registration, and feature extraction. Algorithms that fall
in the category of object detection are used to detect objects of interest in images
by designing a model for the object and then searching for regions of the image that
fit that model (Section 3.3). The output of this step provides probable locations for
the detected objects although it doesn't necessarily provide the segmented outline
of the objects themselves. Such an output feeds directly into segmentation
algorithms (Section 3.4), which often require some seeding from which to grow
and segment the object borders. While some segmentation algorithms do not
require seeding, accurate locations of the objects provides useful information for
removing segmented regions that may be artifacts. Whereas detection and
segmentation provide detailed information about individual objects, image
registration (Section 3.5) provides the alignment of two or more images of either
similar or different modalities. In this way, image registration enables information
from different modalities to be combined together or the time-lapse monitoring of
objects imaged using the same modality (such as monitoring tumor size over time).
Feature extraction combines object detection, image segmentation, and image
registration together by extracting meaningful quantitative measurements from the
output of those steps (Section 3.6). Taken as a whole, these approaches enable the
generation of meaningful analytic measurements that can serve as inputs to other
areas of healthcare data analytics.
FIGURE 3.1 (See color insert.): Representative images from various medical modalities: chest and abdomen CT, whole-body FDG-PET, T1-weighted brain MRI, cardiac ultrasound, brightfield (brown stain) microscopy, and fluorescence microscopy.

Biomedical Imaging Modalities


In this section, we provide a brief introduction to several biomedical imaging
modalities with emphasis on unique considerations regarding image formation and
interpretation. Understanding the appearance of images resulting from the different
modalities aids in designing effective image analysis algorithms targeted to their
various features. Representative images from the modalities discussed in this
section are shown in Figure 3.1.

Computed Tomography
Computed Tomography (CT) creates 2D axial cross-section images of the body
by collecting several 1D projections of conventional X-ray data using an X-ray
source on one side and a detector on the other side. The 1D projection data are
then reconstructed into a 2D image. Modern CT systems are capable of acquiring
a large volume of data extremely fast by increasing the axial coverage. A CT image
displays a quantitative CT number usually reported in Hounsfield units, which is a
measure of the attenuation property of the underlying material at that image
location. This makes CT inherently amenable to quantification. CT has become the
mainstay of diagnostic imaging due to the very large number of conditions that are
visible on CT images. A recent development has been the advent of so-called Dual
Energy CT systems, where CT images are acquired at two different energy levels.
This makes it possible to do a very rich characterization of material composition
using differential attenuation of materials at two different energy levels. The
simplest form of CT image reconstruction algorithms use variations of the filtered
back-projection method, but modern iterative model-based methods are able to
achieve excellent reconstruction while limiting doses to a patient. Common artifacts
associated with CT images include aliasing, streaking, and beam hardening.
Positron Emission Tomography
Positron Emission Tomography (PET) is a nuclear imaging modality that uses
radioactively labeled tracers to create activity maps inside the body based on uptake
of a compound based on metabolic function. PET measures the location of a line
on which a positron annihilation event occurs and as a result two simultaneous 511
keV photons are produced and detected collinearly using coincidence detection.
PET allows assessment of important physiological and biochemical processes in
vivo. Before meaningful and quantitatively accurate activity uptake images can be
generated, corrections for scatter and attenuation must be applied to the data. Newer
iterative reconstruction methods model attenuation, scatter, and blur and have
sophisticated methods of dealing with motion that may take place during the image
acquisition window.

Magnetic Resonance Imaging


Magnetic Resonance Imaging (MRI) is a high resolution, high contrast,
noninvasive imaging modality with extremely rich and versatile contrast
mechanisms that make it the modality of choice for looking at soft tissue contrast.
In conventional MRI, signals are formed from nuclear magnetic response
properties of water molecules that are manipulated using external static and varying
magnetic fields and radio-frequency pulses. In addition to looking at anatomy and
structure, image acquisition methods can be tailored to yield functional
information such as blood flow. Images with very different contrasts can be created
to selectively highlight and/or suppress specific tissue types. Spatially varying
gradients of magnetic fields are used to localize the received signal from known
anatomic locations and form 2D or 3D images. Received data is typically
reconstructed using Fourier methods. Some common artifacts in MRI images are
geometric distortion (warping) due to gradient nonlinearities, wraparound and
aliasing, streaking, ghosts, chemical shift, and truncation artifacts.

Ultrasound
Ultrasound is one of the most ubiquitous imaging modalities due in large part to
its low cost and completely noninvasive nature. Ultrasound imaging transmits high
frequency sound waves using specialized ultrasound transducers, and then collects
the reflected ultrasound waves from the body using specialized probes. The variable
reflectance of the sound waves by different body tissues forms the basis of an
ultrasound image. Ultrasound can also depict velocities of moving structures such
as blood using Doppler imaging. Imaging a growing fetus in the womb and
cardiovascular imaging are two of the most common ultrasound imaging
procedures. Due to very fast acquisition times, it is possible to get excellent real-
time images using ultrasound to see functioning organs such as the beating heart.
Modern ultrasound systems employ sophisticated electronics for beam forming and
beam steering, and have algorithms for pre-processing the received signals to help
mitigate noise and speckle artifacts.

Microscopy
In addition to in vivo radiological imaging, clinical diagnosis as well as research
frequently makes use of in vitro imaging of biological samples such as tissues obtained from biopsy specimens. These samples are typically examined under a microscope for evidence of pathology. Traditional brightfield microscopy imaging
systems utilize staining with markers that highlight individual cells or cellular
compartments or metabolic processes in live or fixed cells. More rich proteomics
can be captured by techniques such as fluorescence-based immunohistochemistry
and images can be acquired that show expression of desired proteins in the sample.
Images from such microscopy systems are traditionally read visually and scored
manually. However, newer digital pathology platforms are emerging and new methods of automated analysis and analytics of
microscopy data are enabling more high-content, high-throughput applications.
Using image analysis algorithms, a multitude of features can be quantified and
automatically extracted and can be used in data-analytic pipelines for clinical
decision making and biomarker discovery.

Biomedical Imaging Standards and Systems


Development of image analytics and quantification methods is founded upon
common standards associated with image formats, data representation, and
capturing of meta-data required for downstream analysis. It would be extremely
challenging to develop general-purpose solutions if the data produced by systems
across platforms and manufacturers did not conform to standard formats and data
elements. Digital Imaging and Communications in Medicine (DICOM,
dicom.nema.org) is a widely used standard that helps achieve this for the purposes
of handling, storing, printing, and transmitting medical imaging data. It defines a
file format and a network communications protocol for these data types. Every
device that deals with medical imaging data comes with a DICOM conformance
statement which clearly states the DICOM classes that it supports and how it
implements them. As an example, all the GE Healthcare devices DICOM
conformance statements can be found in
http://www3.gehealthcare.com/en/Products/Interoperability/DICOM.
While DICOM is the most commonly adopted industry wide standard for
medical imaging data, HL7 (http://www.hl7.org) is a more general standard used
for exchange, integration, sharing, and retrieval of electronic healthcare
information. It defines standards not just for data but also application interfaces
that use electronic healthcare data. The IHE (http://www.ihe.net) initiative drives
the promotion and adoption of DICOM and HL7 standard for improved clinical
care and better integration of the healthcare enterprise.
Medical imaging data is commonly stored and managed using specialized
systems known as Picture Archiving and Communications System (PACS). PACS
systems house medical images from most imaging modalities and in addition can
also contain electronic reports and radiologist annotations in encapsulated form. Commercial PACS systems not only allow the ability to search, query-retrieve, and
display and visualize imaging data, but often also contain sophisticated post-
processing and analysis tools for image data exploration, analysis, and
interpretation.
In this section, we have presented a number of the most common biomedical
imaging modalities and described their key features. In the following sections, we
will show how image analysis algorithms are applied to quantify these types of
images.
Object Detection
We begin our discussion of image analysis algorithms with the topic of object
detection. Detection is the process through which regions of potential interest,
such as anatomical structures or localized pathological areas, are identified. Often
associated with detection is the localization of the targeted structures. In the absence
of such association, the problem of detecting a region of interest has a strong overlap
with the problem of classification, in which the goal is simply to flag the presence
(or absence) of an abnormal region. In this section the word “detection” is used
specifically to designate the joint detection and localization of a structure of interest.

Clinical Decision Support System.


Clinical decision support (CDS) provides clinicians, staff, patients or other individuals
with knowledge and person-specific information, intelligently filtered or presented at
appropriate times, to enhance health and health care. CDS encompasses a variety of tools to
enhance decision-making in the clinical workflow. These tools include computerized alerts
and reminders to care providers and patients; clinical guidelines; condition-specific order sets;
focused patient data reports and summaries; documentation templates; diagnostic support, and
contextually relevant reference information, among other tools.

Introduction
Clinical Decision Support Systems (CDSS) are computer systems designed to
assist clinicians with patient-related decision making, such as diagnosis and
treatment. Ever since the seminal To Err Is Human [1] was published in 2000, CDSS
(along with Computer-Based Physician Order Entry systems) have become a
crucial component in the evaluation and improvement of patient treatment. CDSS
have been shown to improve both patient outcomes and cost of care. They have been demonstrated to minimize analytical errors by notifying the physician of potentially
harmful drug interactions, and their diagnostic procedures have been shown to
enable more accurate diagnoses. There are a wide variety of uses for CDSS in
clinical practice. Some of the main uses include:
• Assisting with patient-related decision making.
• Determining optimal treatment strategies for individual patients.

• Aiding general health policies by estimating the clinical and economic outcomes of different
treatment methods.

• Estimating treatment outcomes under circumstances where methods like randomized trials
are either impossible or infeasible.

In 2005, Garg et al. [2] conducted a review of 100 patient studies and concluded
that CDSS improved diagnosis in 64% and patient outcomes in 13% of the studies
tested. That same year, Duke University conducted a systematic review of 70
different cases and concluded that decision support systems significantly improved
clinical practice in 68% of all trials. The CDSS features attributed to the analysis’
success included:
• natural integration with clinical workflow.

• electronic nature.

• providing decision support at the time/location of care rather than before or after the patient
encounter.

• use of recommended care rather than assessments of care.

Two particular fields of healthcare where CDSS have been hugely influential
are the pharmacy and billing. Pharmacies now use batch-based order checking
systems that look for negative drug interactions and then report them to the
corresponding patient’s ordering professional. Meanwhile,
in terms of billing, CDSS have been used to examine both potential courses of
treatment and conventional Medicare conditions in order to devise treatment plans
that provide an optimal balance of patient care and financial expense.
In this chapter, we will provide a survey of different aspects of CDSS along with
various challenges associated with their usage in clinical practice. This chapter is organized as follows: Section 19.2 provides a brief historical perspective including the current generation CDSS. Various types of CDSS will be described in Section 19.3. Decision support during care provider order entry is described in
19.4 while the diagnostic decision support is given in 19.5. Description of the
human-intensive techniques that can be used to build the knowledge base is given
in Section 19.6. The primary challenges with the usage of CDSS are studied in
Section 19.7 while the legal and ethical issues concerned are discussed in Section
19.8. Section 19.9 concludes our discussion.

Historical Perspective
In this section, we provide a historical perspective on the development of CDSS.
We will first describe the most popular early CDSS that were developed several
decades ago and then we will discuss the current generation CDSS. For each of the
CDSS, we will give the high-level idea of its functioning and also mention the
primary drawbacks.

Early CDSS
Ever since the birth of the medical industry, health scientists have recognized the
importance of informed clinical decision making. Unfortunately, for a long time,
efficient methods for researching and evaluating such methods were quite rare.
Clinicians often relied on extensive research and handwritten records to establish
the necessary knowledge for a well-informed decision. Naturally, this proved to be
both error prone and very time consuming. Fortunately, the evolution of business-
related computing in the 1970s and 1980s gave clinicians an easy mechanism for
analyzing patient data and recommending potential courses of treatment and thus,
CDSS were born.
Early systems rigidly decided on a course of action, based on the user’s input
[3]. The user would input any necessary information, and the CDSS would output
a final decision, which in turn would be the user’s course of action:
• Caduceus (aka The Internist) [4]: This system was developed in the 1970s as a means of
implementing an artificial intelligence model for use in CDSS, with the central goal of the
physician using a "hypothetico-deductive" approach to medical diagnosis. One of the system's unique features was its use of a probabilistic method for ranking diagnoses. It evaluated
patient symptoms and then searched its knowledge base for the most likely disease, based
on the statistics of existing patients with the specified symptoms. Unfortunately, Caduceus’
diagnostic accuracy was not good. For instance, in 1981, a study using pre-existing clinico-
pathological conference cases was conducted and then published in The New England Journal
of Medicine. Caduceus was unable to match the diagnostic accuracy of real-life experts in this
study, due to its limited knowledge base and small number of diagnostic algorithms. Thus,
the system was unable to gain widespread acceptance with the medical community.
In the mid 1980s, Caduceus evolved into QMR (Quick Medical Reference).
QMR differed significantly from Caduceus in that, while Caduceus was used
mainly for diagnostic consultation (i.e., suggesting rigid courses of treatment
to clinicians), QMR was more flexible. It allowed clinicians to modify and
manipulate its suggested diagnoses/treatments in whichever
way they wished, while allowing them to utilize its knowledge base to establish
their own hypotheses with regard to the treatment of more complex and
difficult cases [4]. While QMR contained an extensive medical database
(approximately 570 diseases in all), it had the major disadvantage of requiring
frequent updates whenever new diseases were discovered. Furthermore,
according to a 1994 study comparing QMR with three other clinical decision
support systems, the system gave considerably fewer “correct” patient
diagnoses (by the standards of a group of physicians) than the three competing
systems [5]. Thus, by 2001, QMR was largely abandoned in favor of less
cumbersome and more accurate CDSS.

• MYCIN [6]: This was originally developed in the 1970s as a means for identifying infectious diseases and recommending antibiotics for treatment. A unique aspect of MYCIN was
its emphasis on artificial intelligence (AI). Its AI model was constructed through a rule-based
system, in which roughly 200 decision rules (and counting) were implemented into the system, forming the knowledge base. To determine possible patient diagnoses, MYCIN's internal
decision tree was consulted, and diagnostic options were reached by running through its various branches. The rule-based system was very flexible in that it allowed clinicians to either
modify existing rules or devise new ones as they saw fit, making MYCIN adaptable to changing medical trends and discoveries. Therefore, it was considered an expert system, since its
AI component allowed for results that were theoretically similar to those of a real-life expert.
Unfortunately, there were many significant problems with MYCIN. First, it
worked very slowly, with a typical analysis requiring upwards of 30 minutes.
Second, there was concern over whether physicians ran the risk of putting too
much trust in computerized results at the expense of their own judgment and
inquiry. Third, there was the issue of accountability: Who would be held liable
if the machine made an error in patient diagnosis? Perhaps the most important
problem was how ahead of its time MYCIN was. It was developed before
desktop computing and the Internet existed, so the system was based on a rather
dated model for computer interaction [7]. Nonetheless, its influence was far
reaching and is still felt to this day, with many systems either combining it
with other expert systems (Shyster-MYCIN [8]) or using it as an influence on
the development of new systems (GUIDON [9]).

• Iliad [10]: Iliad is another “expert” CDSS. It contains three modes of usage: Consultation,
Simulation, and Simulation-Test. In Consultation mode, users enter real-life patient findings
into the system. Iliad then analyzes these findings and compiles a list of possible diagnoses,
with each diagnosis ranked in terms of its likelihood of correctness. A unique feature of Iliad
is its handling of “gaps” in patient information. If the patient data appears incomplete, Iliad
will suggest methods of completion and/or compromise, so that the clinician may continue
working on a possible diagnosis. In Simulation mode, Iliad assumes the role of a complaining
patient. It offers a typical real-life complaint and then demands input, testing, etc., from the
clinician. The clinician’s questions, responses, and diagnostic decisions are evaluated by Iliad,
with feedback provided once analysis is complete. Finally, in Simulation-Test mode, Iliad runs
a similar real-life patient simulation, except that feedback is not given to the clinician. Instead,
Iliad silently evaluates his/her performance and then sends it to another user. Needless to say,
because of its highly scholastic focus, Iliad is often used for educational purposes. In fact,
studies have shown that it is very effective in training aspiring medical professionals for real-
life practice [10].
Unlike many other systems, which use knowledge-frame implementations, Iliad
uses a framed version of the Bayes model for its analysis [11]. This makes it
much easier for the system to recognize multiple diseases in a single patient
(further information on Bayes classification can be found in Section 19.3.1.2).
For diseases that are mutually dependent, a form of cluster analysis is included.
This groups the diseases into independent categories, based not only on
the disease type, but also on clinician-specified factors such as their specific point
of infection. This is so that the diseases may be efficiently analyzed and a more
effective Bayesian classifier may be devised.

The 1980s saw tremendous growth and development in the field of clinical
decision support. Greater involvement from the Association of American Medical
Colleges in clinical library practice provided the necessary funding and resources
for developing functional computerized information systems. Such systems
included everything from electronic health records to financial management
systems. Furthermore, PDAs (personal digital assistants) aided the development of
CDSS by giving them portability. Patient data and clinical decision-making
software could now be carried in the clinician’s pocket, allowing him/her to easily
reach informed decisions without cutting into their time with the patient. Although
PDAs were more akin to basic information systems than CDSS, they were major
stepping-stones in the development of CDSS that would allow clinicians to make
diagnostic and treatment decisions while remaining physically close to their
patients.

CDSS Today
Today’s CDSS have much broader and more flexible methods for making
clinical decisions, using both clinician and machine knowledge to give a series of
potential “suggestions,” with the clinician deciding on the suggestion that is most
appropriate to her specific needs [3].
• VisualDx [12]: This is a Java-based clinical decision support system that, as the name suggests, is often used as a visual aid in assisting healthcare providers with diagnosis. This is useful in instances where surface-level diseases (such as those of the skin) are present, and doctors need visual representations of these diseases to aid with diagnosis. A unique feature of VisualDx is that, rather than being organized by a specific diagnosis, it is organized by symptoms and
other visual clues. It uses a sophisticated matching process that visually matches images of
the specific patient’s abnormalities with pre-existing images within a built-in database of
more than 6,000 illnesses. It then uses the results of these comparisons to recommend courses
of treatment.
VisualDX has significant limitations. In addition to a vast image database, the
system contains a written summary of each image. Unfortunately, these
summaries are relatively brief and are, therefore, prone to overgeneralization.
For example, skin biopsies are often recommended for “sicker” patients.
However, it is unclear what is actually meant by “sicker.” This is especially
problematic when we consider that skin biopsies are rarely performed unless
standard skin therapy has proven ineffective. Nevertheless, VisualDx has
been demonstrated to be quite useful when diagnosing surface-level illness.
The system is operational to this day, with a significant update in 2010
enabling companionship with a similar product called UpToDate [3].
• DXplain [13]: This is a web-based diagnosis system developed in the late 1980s by the American Medical Association. A unique feature of this system is its simplicity: Clinicians enter
patient information using nothing but their own medical vocabulary, and the system outputs a
list of potential diagnoses from a knowledge base consisting of thousands of diseases (with up
to ten different references each), along with the potential relevance of its choices. Therefore, it
functions as a clinical decision support system for physicians with little computer experience.
DXplain has been demonstrated to be both reliable and cost efficient,
especially in academic environments [3]. For example, a 2010 study consisting
of more than 500 different diagnos- tic cases was assigned to various
Massachusetts General Medicine residents. They concluded that medical
charges, Medicare Part A charges, and service costs significantly decreased
when
using DXplain for diagnostic recommendation [14]. DXplain has also been
frequently demon- strated to give very accurate diagnoses. For example, in a
2012 study conducted by Lehigh University, the system was compared with
four other CDSS. The conclusion drawn was that it was second only to Isabel
(discussed below) in terms of accuracy [15].
• Isabel [16]: This is one of the most comprehensive CDSS available. Like DXplain, it is a web-
based system designed with physician usability in mind. Originally, it focused mainly on
pediatrics, but it was soon expanded to cover adult symptoms. Isabel contains two subsystems:
a diagnostic checklist utility and a knowledge mobilizing utility. The diagnosis checklist tool
enables physicians to enter patient demographics and clinical features into the system, which
then returns a set of recommended diagnoses. The knowledge mobilizing utility may then be
used to research additional information about the recommended diagnoses [3].
Isabel has been demonstrated to give exceptionally accurate diagnoses of most
patient cases. In the Lehigh University study, for example, it was shown to be
the most accurate of the five systems tested. Other studies, such as a 2003 study
conducted by the Imperial College School of Medicine, have also
demonstrated this system to be very accurate [17]. Unfortunately, Isabel is a
relatively new CDSS and, thus, more extensive testing must be performed in
order to give a firm assessment of its overall reliability.

Various Types of CDSS


There are two main types of clinical decision support systems:
Knowledge-Based and
Nonknowledge-Based.

Knowledge-Based CDSS
Contemporary CDSS are rooted in early expert systems. These systems
attempted to replicate the logic and reasoning of a human decision maker, reaching
firm decisions based on existing knowledge. Knowledge-based CDSS rose out of
the intuitive realization that medicine was a good field for applying such
knowledge. A computer could (theoretically) mimic the thought processes of a
real-life clinician and then give a finalized diagnosis based on the information at hand
(Figure 19.1). During the 1990s and 2000s, however, CDSS moved away from
attempting to make rigorous clinical decisions in favor of offering a variety of
possible diagnostic/treatment options and then allowing the clinician herself to
make a finalized decision [7]. There are multiple reasons for this change in focus.
These include an underlying fear of computers being inherently prone to errors, the
realization that artificial intelligence still had a long way to go before it could
successfully mimic the knowledge and reasoning skills of real-life clinicians, the
infringement computerized decision making placed on physician/patient relations,
etc. Thus, today’s CDSS present a variety of diagnos- tic/treatment options to
clinicians, allowing them to evaluate first-hand the patient’s symptoms and
personal testimonies while utilizing the systems as reference points for possible
diagnoses.
Knowledge-based CDSS are those with a built-in reference table, containing
inbred information about different diseases, treatments, etc. They use traditional AI
methods (such as conditional logic) to reach decisions on courses of treatment. There
are three main parts to a knowledge-based CDSS. They are the knowledge base, the
inference engine, and the user communication method.
FIGURE 19.1: A general knowledge-based clinical decision support system.

The knowledge base is essentially a compiled information set, with each piece of information structured in the form of IF-THEN rules. For example, IF a new order is placed for a slowly-changing blood test, AND IF the blood test was ordered within the past 48 hours, THEN we alert the physician to the possibility of duplicate test ordering. The knowledge base functions in conjunction with whichever algorithmic structure the system uses for its analysis. To put it simply, the user inputs patient information, and then the system searches through its knowledge base for matching diseases or treatment possibilities [2].
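
As a toy illustration of how such an IF-THEN rule might be encoded in software (the function name, field names, and threshold below are hypothetical and do not correspond to any particular CDSS product):

from datetime import datetime, timedelta

def duplicate_test_alert(new_order, previous_orders, window_hours=48):
    # IF the same slowly-changing blood test was already ordered within the
    # time window, THEN return an alert for the ordering physician.
    for prev in previous_orders:
        if (prev["test"] == new_order["test"]
                and new_order["time"] - prev["time"] < timedelta(hours=window_hours)):
            return "Alert: possible duplicate order of " + new_order["test"]
    return None

now = datetime(2015, 3, 1, 9, 0)
history = [{"test": "HbA1c", "time": now - timedelta(hours=20)}]
print(duplicate_test_alert({"test": "HbA1c", "time": now}, history))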
The inference engine applies a system of logic to the knowledge base, allowing
it to “become smarter” by establishing new and/or updated knowledge. It contains
the necessary formulae for combining the rules in the knowledge base with any
available patient data, allowing the system to create patient-specific rules and
conditions based on its knowledge of both the patient’s medical history and the
severity of his/her current condition. A particularly important aspect of the inference
engine is its mutual exclusion from the knowledge base. Because CDSS
development is very time consuming, reusability is key. Anybody should be
allowed to construct a new CDSS through an existing inference engine.
Unfortunately, most real-life systems are developed with a specific goal in mind
(for example, diagnosing breast cancer). Thus, it is either difficult or impossible to
use them beyond their intended purpose.
Finally, the user communication method is where the clinician herself inputs
the patient’s relevant data and then receives the corresponding results. In some
CDSS, the patient data must be
manually entered. Most of the time, however, patient data is provided through a
computer-based record. The record is input either by the clinician or an external lab or pharmacy and is, thus, already available in electronic form. It is the clinician's job
to properly manipulate the system to obtain the outcome she wishes. Diagnostic
and treatment outcomes are generally represented as either recommendations or
alerts. Occasionally, if an alert has been generated after an initial order was placed,
automated emails and wireless notifications will be sent.
The usual format for a knowledge-based CDSS is that the clinician is asked to
supply a certain amount of input, which is then processed through both the system’s
knowledge base and reasoning engine. It then outputs a series of possible diagnostic
or treatment options for her.

Input
While there is substantial variance in the manner in which clinical information
is entered into a CDSS, most systems require the user to choose keywords from
his/her organization’s word dic- tionary. The challenge clinicians typically face
with this requirement is that different CDSS have different word vocabularies. The
quality of output in a CDSS depends on how well its vocabulary
matches the clinician’s keywords. In general, however, items related to the patient’s
medical history and current symptoms are going to be the suggested input.
One potentially effective method of giving detailed input is to use an explicitly
defined time model, in which the user specifies various time intervals and the
events that occurred within them. Unfortunately, this complicates user input and
would, thus, likely prove too cumbersome for the average clinician. A simpler
solution would be to use an implicit time model, in which broad temporal
information is part of the specified user input (for example, “history of recent
exposure to strep”) [7]. While this simplified approach has the disadvantage of
temporal ambiguity (does “recent” mean “just last week” or “last year”?), it has
proven to be a viable method of measuring time in a CDSS.

Inference Engine
The inference engine is the part of the CDSS that combines user input with all
other necessary data to devise a final list of “decisions.” To avoid confusion, this
process is usually hidden from the user. There are many different methods of
analyzing user input and devising results from it. One popular method is the
utilization of production rules. These are logical IF-THEN statements that, when
combined, form concrete solutions to problems. MYCIN is an example of a popular
CDSS that uses production rules. However, the most popular method of
probabilistic estimate in an inference engine is Bayes’ Rule, which computes the
conditional probabilities [7]. In mathematical terms, suppose we wish to compute the probability of event A given event B (or Pr(A|B)). As long as we already have Pr(B|A), along with "prior probabilities" (Pr(A) and Pr(B)) at our disposal, we may use Bayes' Rule to compute Pr(A|B) as follows:

Pr(A|B) = Pr(A) · Pr(B|A) / Pr(B)    (19.1)

To give a practical example, suppose we wish to learn the likelihood of a patient having hepatitis given that she has jaundice (i.e., Pr(hepatitis|jaundice)). To compute this probability, we begin by computing a more obvious probability: Pr(jaundice|hepatitis). Intuitively, this could be solved by studying an established series of patients with hepatitis and then calculating the ratio of patients with jaundice to the total number of patients. We would then plug the resultant probability into Bayes' Rule, along with the general likelihoods of hepatitis and jaundice among the total patient population (Pr(hepatitis) and Pr(jaundice), respectively). We, thus, obtain the following:

Pr(hepatitis|jaundice) = Pr(hepatitis) · Pr(jaundice|hepatitis) / Pr(jaundice)    (19.2)

The result is an estimate of the patient’s likelihood for having hepatitis, given
the presence of jaundice.
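
A tiny numerical sketch of this computation is shown below; the prevalence and likelihood values are invented purely for illustration and are not clinical estimates:

# Bayes' Rule: Pr(hepatitis | jaundice) = Pr(hepatitis) * Pr(jaundice | hepatitis) / Pr(jaundice)
p_hepatitis = 0.01                  # assumed prior probability of hepatitis
p_jaundice = 0.02                   # assumed prior probability of jaundice
p_jaundice_given_hepatitis = 0.60   # assumed likelihood, e.g., estimated from a series of hepatitis patients

p_hepatitis_given_jaundice = p_hepatitis * p_jaundice_given_hepatitis / p_jaundice
print(p_hepatitis_given_jaundice)   # 0.3, i.e., a 30% posterior probability in this toy example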
In medicine, there is the challenge of computing the likelihood of two distinct
yet potentially related events happening simultaneously in a patient [7]. For
example, suppose we wish to compute the probability of a patient having both
pneumonia and an abnormal chest radiograph:

Pr(pneumonia + abnormal CXR) (19.3)


Intuitively, it would appear that the solution is as follows:

Pr(pneumonia + abnormal CXR) = Pr(pneumonia) · Pr(abnormal CXR) (19.4)


Unfortunately, this formula will not work, because it treats the two events as independent: since the probabilities for pneumonia and abnormal chest radiography are each typically very small, we would obtain an absurdly small probability for both occurring simultaneously, even though we know patients with pneumonia typically have abnormal chest radiographs.
Fortunately, we may modify the formula to give a more accurate
prediction by multiplying the probability that a patient has pneumonia with the
probability that she has an abnormal chest radiograph given the presence of
pneumonia:

Pr(pneumonia + abnormal CXR) = Pr(pneumonia) · Pr(abnormal CXR|pneumonia) (19.5)


This will give us a much higher, and thus more accurate, probability estimate.
In general terms, we compute the probability of conditions “A” and “B” existing
simultaneously in the following manner:

Pr(A + B) = Pr(A) · Pr(B|A) (19.6)


By slightly rearranging this equation, we obtain Bayes’ Rule:
Pr(A|B) = Pr(A) · Pr(B|A) / Pr(B)    (19.7)

A major roadblock when implementing Bayes' Rule is the possibility of a patient having multiple symptoms. Fortunately, this problem is slightly neutralized by the fact that most diseases are mutually exclusive of one another. With that said, a frame-based version of Bayes' Rule is used for taking all possible diseases into account. Iliad [11] is an example of a CDSS that successfully uses
this mechanism. It uses a cluster-based framework that categorizes potential
diagnoses by a common underlying thread (for example, chest pains). The logic
used in these clusters is based not only on the dependencies of these possible
diagnoses but also a user’s understanding of how they would be categorized. For
this very reason, Iliad uses Boolean statements [11]. Likewise, a Bayesian Network
could be established through a series of Bayes’ Rule implementations. This is
essentially a graphical framework representing the cause-and-effect relationships of
different events.

19.3.1.1 Knowledge Base


Naturally, for a CDSS to be successful, it must possess some form of medical knowledge. Furthermore, this knowledge must be implemented in whichever format the inference engine uses. Thus, a knowledge base must be created. The knowledge base contains all necessary medical information along with any rules
or conditions necessary for analysis. For example, if the engine uses Bayes’ Rule,
medical knowledge must be encoded in such a manner that it allows for
computation with this method of probabilistic estimates.
There are four forms of knowledge representation: logic, procedural, graph/network, and structured systems [18]. Logic is widely considered to be the most common form of knowledge representation. Medical knowledge is typically divided into two categories: declarative and procedural. Declarative knowledge consists of basic sentences and propositions stating hard facts, while procedural knowledge gives a more linear description of what actions or conclusions are feasible given the knowledge at hand. Graph/network representation is, as the name suggests, knowledge representation through the use of a graphical or network-based system (for example, DXplain [13]), while structured knowledge is a categorized knowledge base.
Unfortunately, there is a crucial challenge in the implementation of knowledge
bases that emphasize disease and treatment probability: many real-life
probabilities in the clinical environment are unknown. While medical literature and
consultation are certainly useful in terms of obtaining these probabilities, they often
contain disparate numbers and estimates from one another, leaving the physician
to guess the correct estimate. Furthermore, the probabilities of most diseases are
dependent not only on specific symptoms but also on external factors such as the
patient’s geographic location and other demographical information. Lastly,
knowledge bases must be regularly updated as new information becomes available.
This is an ongoing issue with no clear solution, since many CDSS begin life as
funded academic projects, for which maintenance must cease once funding has
stopped.
Output
The output of a CDSS is generally in the form of a probabilistically ranked list
of solutions. Generally, this list is in ASCII text format, but it may also be
graphical. In some cases, factors other than probability are used in the ranking
process. For example, in DXplain, diseases that are not necessarily likely but very
risky when misdiagnosed are given special rank privileges. In fact, generally
speaking, physicians are more interested in the least likely diagnoses than in the
most likely ones, since less likely diagnoses are much easier to overlook.

Nonknowledge-Based CDSS
Nonknowledge-based CDSS differ from knowledge-based ones in that, rather
than relying on a user-defined knowledge base, they implement a form of artificial
intelligence called Machine Learning. This is a process by which a system, rather
than consulting a precomposed encyclopedia, simply “learns” from past
experiences and then implements these “lessons” into its knowledge base. There are
two popular types of nonknowledge-based CDSSs: Artificial Neural Networks and
Genetic Algorithms [7].

Artificial Neural Networks


Artificial Neural Networks (ANN) simulate human thinking by evaluating and
eventually learning from existing examples/occurrences [19]. An ANN consists of
a series of nodes called “neurodes” (corresponding to the “neurons” in the human
brain) and the weighted connections (corresponding to nerve synapses in the
human brain) that unidirectionally transmit signals between them. An ANN contains
three different components: input, output, and a hidden data processor. The input
segment receives the data, while the output segment gives the finalized results. The
data processing component, meanwhile, acts as an intermediary between the two. It
processes the data and then sends the results to the output segment.
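
These three components map naturally onto a feedforward network. The NumPy sketch below passes an input vector through one hidden layer to an output layer in a single unidirectional step; the layer sizes and random weights are arbitrary placeholders, not a model used by any particular CDSS.

import numpy as np

rng = np.random.default_rng(0)

# Arbitrary layer sizes: 5 input features, 8 hidden neurodes, 3 output classes.
W_hidden = rng.normal(size=(5, 8))   # weighted connections: input -> hidden
W_output = rng.normal(size=(8, 3))   # weighted connections: hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    """One unidirectional pass: input segment -> hidden processor -> output segment."""
    hidden = sigmoid(x @ W_hidden)
    output = sigmoid(hidden @ W_output)
    return output

print(forward(rng.normal(size=5)))   # three output activations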
The structure of an ANN is very similar to that of a knowledge-based CDSS.
However, unlike knowledge-based CDSS, ANNs do not have predefined
knowledge bases. Rather, an ANN studies patterns in the patient data and then finds
correlations between the patient’s signs/symptoms and a possible diagnosis.
Another significant difference is that knowledge-based CDSSs generally cover a
much wider range of diseases than ANNs.
In order to function properly, ANNs must first be “trained.” This is done by first
inputting a large amount of clinical data into the neural network, analyzing it, and
then hypothesizing the correct output. These educated guesses are then compared
to the actual results, and the connection weights are adjusted to reduce the errors,
with larger errors driving larger adjustments. This process is run iteratively until a
substantial number of correct predictions have been made.
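
A minimal version of this predict-compare-adjust cycle is sketched below for a single-layer network on made-up binary labels. Real CDSS training would use far more data and a richer model, but the loop structure (hypothesize the output, measure the error, nudge the weights, repeat) is the same.

import numpy as np

rng = np.random.default_rng(1)

# Made-up training set: 200 patients, 6 clinical features, binary outcome.
X = rng.normal(size=(200, 6))
true_w = rng.normal(size=6)
y = (X @ true_w > 0).astype(float)

w = np.zeros(6)          # weights to be learned
learning_rate = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(500):
    predictions = sigmoid(X @ w)                 # hypothesize the output
    error = predictions - y                      # compare with the actual results
    w -= learning_rate * X.T @ error / len(y)    # adjust the weights accordingly

accuracy = ((sigmoid(X @ w) > 0.5) == y).mean()
print(f"training accuracy: {accuracy:.2f}")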
The advantage of using ANN is that it eliminates the need for manually writing
rules and seeking expert input. ANNs can also analyze and process incomplete
data by inferring what the data should be, with the quality of analysis being
consistently improved as more patient data is analyzed. Unfortunately, ANNs also
have certain disadvantages. Due to their iterative nature, the training process is very
time consuming. More importantly, the formulas/weights that result from this
process are not easily read and interpreted. Therefore, with the system being unable
to describe why it uses certain data the way it does, reliability is a major issue.
Nevertheless, ANNs have proven to be very successful in terms of predicting
such diseases as oral cancer and myocardial infarction. They have also been
successfully used for the prediction of chronic diseases such as breast cancer
recurrence [20] and have even shown promise in aiding the field of dentistry [21].
Thus, they are widely considered to be a viable method of clinical decision making.
Genetic Algorithms
The other key example of nonknowledge-based systems is the Genetic
Algorithm. Genetic Algorithms are based on Charles Darwin’s theories of natural
selection and survival of the fittest. Just as species change in order to adapt to their
environment, genetic algorithms regularly “reproduce” themselves in order to
better adapt to the task at hand. As with Darwin’s theory of “survival of the fittest,”
genetic algorithms generally begin by attempting to solve a problem through the
use of randomly generated solutions [22]. The next step is to evaluate the quality
(i.e., “fitness”) of all the available solutions through the use of a “fitness function.”
The solutions are ranked by their fitness scores, with the more fit solutions having
a greater likelihood of “breeding” new solutions through mutual exchange among
themselves. These new solutions are evaluated similarly to their parent solutions,
and the process repeats iteratively until an optimal solution is found.
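
A toy version of this loop is sketched below: a population of random candidate solutions is scored by a fitness function, the fitter candidates breed by exchanging segments (crossover) with occasional mutation, and the cycle repeats. The target string and parameters are arbitrary; a clinical application would substitute a fitness function expressing, for example, treatment quality.

import random

random.seed(0)

TARGET = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]   # arbitrary toy goal
POP_SIZE, GENERATIONS, MUTATION_RATE = 30, 100, 0.05

def fitness(candidate):
    """Score a candidate: how many positions match the target."""
    return sum(c == t for c, t in zip(candidate, TARGET))

def crossover(parent_a, parent_b):
    """Breed a new solution by exchanging segments of the parents."""
    cut = random.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]

def mutate(candidate):
    return [1 - gene if random.random() < MUTATION_RATE else gene for gene in candidate]

# Start from randomly generated solutions.
population = [[random.randint(0, 1) for _ in TARGET] for _ in range(POP_SIZE)]

for generation in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    if fitness(population[0]) == len(TARGET):
        break
    parents = population[: POP_SIZE // 2]          # the fitter half breeds
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children

print(generation, fitness(population[0]))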
Because of their more cumbersome nature, genetic algorithms have seen less
use in clinical decision support than artificial neural networks. Nonetheless, they
have been successfully applied in areas such as chemotherapy administration and
heart disease [23, 24].

Decision Support during Care Provider Order Entry


Care Provider Order Entry (CPOE) systems are decision support systems that
allow clinicians to electronically input medical orders for whichever patients they
are treating. Specifically, clinicians log in to the system, load the CPOE module,
and select the patient for whom they are placing the order. They write out the order,
and after successful review and modification, the order is placed [25]. Here is an
example of a typical care provider order entry form:

While the CPOE’s methodology depends on the clinician’s specific domain,
it is generally believed that allowing the physician to place an order and
then providing feedback if the order is believed to be incorrect is the best
way of handling care provider order entry. There are two reasons why this
is preferred. One is that waiting to warn the physician of an inappropriate
order until after it has been placed allows him/her to devise his/her own
preferred course of action, discouraging overreliance on the CDSS. The
other is that such a delay gives the physician the opportunity to correct any
errors the system has detected, whereas earlier warnings might underscore
the errors and leave more room for mistakes.

In general, CPOE responsiveness depends on creating orders at the appropriate
clinical level (i.e., the clinician’s level of expertise and the user’s specific
condition). Unfortunately, because physicians and nurses generally have different
ways of viewing these orders than the people carrying them out (pharmacists,
radiologists, etc.), there tends to be confusion between a physician’s more general
order and the corresponding technical terms used for its content by whichever
ancillary departments he/she consults. The accepted solution to this problem is for
CPOE systems to avoid asking clinicians to perform tasks that fall outside their line
of expertise. Pharmacists, for example, typically use pharmaceutical systems to fill
and dispense whatever is specified in the CPOE system. If a higher-level order is
specified by the physician, the CPOE system could evaluate the pharmacy’s own
terminology and floor stock inventory and then determine the correct item to give
the patient, giving the pharmacist more time to evaluate factors such as the order’s
clinical validity, safety, and efficiency [7].
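
As a rough illustration of that translation step, the sketch below maps a physician’s higher-level order onto a pharmacy’s own terminology and floor stock. The order, inventory entries, and matching rule are all invented for illustration.

# Invented pharmacy floor-stock inventory: generic name -> dispensable items.
floor_stock = {
    "acetaminophen": [
        {"item": "acetaminophen 325 mg tablet", "strength_mg": 325, "on_hand": 200},
        {"item": "acetaminophen 500 mg tablet", "strength_mg": 500, "on_hand": 40},
    ],
}

def resolve_order(drug, dose_mg):
    """Pick the stocked item matching the ordered dose, if any is on hand."""
    for entry in floor_stock.get(drug, []):
        if entry["strength_mg"] == dose_mg and entry["on_hand"] > 0:
            return entry["item"]
    return None   # leave unresolved for the pharmacist to review

print(resolve_order("acetaminophen", 500))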
Roles of Decision Support within CPOE—Decision support has several roles in
CPOE [25]:
1. Creating legible, complete, correct, rapidly actionable orders: A CPOE system is able to
avoid many of the traps/failings, such as illegibility and incorrectness, that often come with
handwritten reports [26]. Improved legibility both reduces errors and reduces
the amount of time clinical staff spend deciphering handwriting.
Meanwhile, a “complete” order contains all the information necessary to
successfully place an order, while a “correct” order meets the requirements for
safe and effective patient care. Needless to say, most CPOE systems are
designed to ensure that both conditions are satisfied.
2. Providing patient-specific clinical decision support: A successful CPOE system should be
able to generate decision support recommendations based on a patient’s individual conditions.
It should be able to generate a safety net for the clinician by merging patient-specific
information (age, allergies, existing medications, etc.) with the general rules for proper
practice (a minimal sketch of such a check appears after this list). It should also improve
patient care by promoting evidence-based clinical practice guidelines through factors such as
order history or computer-based advice.

3. Optimizing clinical care: As clinicians become accustomed to a CPOE system, they consider
ways of customizing it so that their work becomes easier and more effective. Not only
does this cater the system to the user’s liking, but it could also reduce the potential for
violations such as inappropriate testing. For example, at Vanderbilt University, users of a
system called WizOrder were encouraged to modify the program so that they could create
Registry Orders where billing information would be more easily transferred. The challenge,
in this case, comes from the need to improve the effectiveness of the system while maintaining
usability. Thus, it is generally left up to the user to design a system that is able to successfully
balance these two issues.
4. Providing just-in-time focused education relevant to patient care: Most CPOE systems
provide useful educational prompts and links to more detailed descriptions of their material,
with the interface designed in a manner that encourages their use. These can be used in
treatment summaries or through a corresponding web browser. Such links have the benefit of
assisting the clinician with more complex orders.
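
The patient-specific safety net described in item 2 might, in its simplest form, look like the check below, which merges a patient’s recorded allergies, medications, and age with a few general prescribing rules. The patient record, rules, and thresholds are invented for illustration and are not clinical guidance.

# Invented patient record and prescribing rules, for illustration only.
patient = {"age": 82, "allergies": {"penicillin"}, "medications": {"warfarin"}}

# Each rule returns a warning string or None.
def allergy_rule(order, patient):
    if order["drug"] in patient["allergies"]:
        return f"Patient is allergic to {order['drug']}."

def interaction_rule(order, patient):
    risky_pairs = {("aspirin", "warfarin")}          # assumed interaction list
    for med in patient["medications"]:
        if (order["drug"], med) in risky_pairs:
            return f"{order['drug']} may interact with {med}."

def geriatric_dose_rule(order, patient):
    if patient["age"] >= 75 and order.get("dose_mg", 0) > 500:
        return "Dose may be high for an elderly patient."

def review_order(order, patient):
    rules = [allergy_rule, interaction_rule, geriatric_dose_rule]
    return [warning for rule in rules if (warning := rule(order, patient))]

print(review_order({"drug": "aspirin", "dose_mg": 650}, patient))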

Benefits and Challenges—The benefits of CPOE systems are that they can improve
clinical productivity, provide solid educational support, and positively impact
how patient care is given. They also make order entry much easier for both the
clinician and the user, providing a computerized framework for placing orders.
Thus, issues such as sloppy handwriting are nonexistent, while typos may be
corrected through a built-in autocorrect feature. On the other hand, the manner in
which error checking is handled may result in orders being placed that contain
unidentified errors. This could be especially dangerous if the order happens to be
costly and critical to the patient’s survival. If there is an error in it, then whatever
money was spent on the order may be wasted; worse yet, the patient’s life may be
in danger. Computerized order entry systems also have the disadvantage of relying
on an Internet-based framework, meaning occasional bad transmissions and server
problems are inevitable.

Diagnostic Decision Support


Diagnostic Decision Support Systems are designed to “diagnose” diseases and
conditions based on the parameters given as input. In formal terms, diagnosis
can be defined as “the process of determining by examination the nature and
circumstances of a diseased condition” [27]. What this means is that clinicians study
the patient’s life history before the illness began, how the illness came to be,
and how it has affected the patient’s current lifestyle [28]. Additionally, clinicians
must ensure that the patient recognizes the seriousness of the disease and how to
properly treat it.
Diagnostic Decision Support Systems attempt to replicate the process of diagnosis in a
computerized format. The patient is asked a series of questions, and then a hypothetical diagnosis
or set of possible diagnoses is output to him/her. The most user-centered systems give
questionnaires inquiring about everything from the patient’s family history to the patient’s current
health conditions. Upon completion, the patient is given a printout summarizing the conclusions
drawn by the system and suggesting possible courses of action. Similarly, certain medical websites
sometimes offer diagnostic tools for assessing patients and recommending possible courses of
treatment. A good example is Mayo Clinic’s depression test [29]. It asks the patient to answer a
series of questions relating to symptoms, family history, etc. (Figure 19.2). It then uses the answers
to determine whether it would be a good idea to consult a professional psychiatrist for further
examination.

FIGURE 19.2: The scoring criteria for Mayo Clinic’s depression test. It explicitly states that it is not
meant to be used as a diagnostic tool.
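
A questionnaire-based tool of this kind essentially sums item scores and compares the total against thresholds. The sketch below shows that shape with invented questions, scores, and cut-offs; it is not the Mayo Clinic instrument and, like the real tool, it would not constitute a diagnosis.

# Invented questionnaire items, scores, and thresholds, for illustration only.
answers = {
    "little_interest_in_activities": 2,   # 0 = not at all ... 3 = nearly every day
    "feeling_down": 3,
    "trouble_sleeping": 1,
    "low_energy": 2,
}

total = sum(answers.values())

if total >= 10:
    advice = "Consider consulting a mental health professional for further evaluation."
elif total >= 5:
    advice = "Monitor symptoms; discuss with a clinician if they persist."
else:
    advice = "No further action suggested by this screening."

print(f"score = {total}: {advice}")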

An organization known as the Foundation for Informed Medical Decision Making (FIMDM) has
worked to expand upon the traditional diagnostic decision support process by focusing primarily on
treatment decisions that take into account the patient’s personal preferences in terms of health outcomes.
Specifically, they use video clips to depict the possible outcomes of each treatment, giving the patient
an idea of what the experiences relating to these outcomes will be like and better preparing the patient
for the clinical decision-making process. FIMDM provides tools for many diseases, ranging from breast
cancer to coronary artery disease. Offline CD-ROM-based software also exists for diagnostic decision
support. Interestingly, in some instances, such software actually provides deeper and more detailed
diagnostic information than what is available on the World Wide Web. For example, the American
Medical Association has the “Family Medical Guide.” This is a multilevel software package consisting
of seven different modules:
1. A listing of possible diseases, disorders, and conditions.
2. A map of the human body.

3. A symptom check for the purposes of self-diagnosis and/or hypothesizing.


4. A description of the ideal body.
