
Source: ResearchGate, https://www.researchgate.net/publication/378148534
Article, December 2023. 2 citations, 1,502 reads.
Author: Zaiba Khan, RNB Global University (25 publications, 24 citations).
Uploaded by the author on 12 February 2024.


ISSN: 2096-3246
Volume 55, Issue 02, December, 2023

DATA MINING AND DATA WAREHOUSE: AN IN-DEPTH REVIEW

Ms Zaiba Khan
Assistant Professor, RNB Global University, Bikaner, RJ
Abstract:
Data warehousing and data mining are crucial aspects of modern businesses. Data mining is the
process of identifying patterns in data and using these patterns to derive useful information. A data
warehouse is a database application system designed for reporting and analyzing data, and data
warehousing is a vital building block in the decision-making process. A secondary aim of this review
is to analyze the major success and failure factors in data warehousing projects, and to study the
challenges facing vendors, consultants, implementers, and academic researchers, so that future
strategies can be developed for the successful implementation of data warehousing in organizations.
It is recommended that changes in philosophy and policy match changes in the environment and in
system design and procedures. Other recommendations include the creation of a single approval plan
for electronic and print resources, and the establishment of a system for notification of electronic
serial purchases.
Keywords: Data Warehouse, Data Mining, Vendors, System Design, Environment
1. Introduction
Data warehousing and data mining are crucial aspects of modern businesses that are closely related
concepts and systems. Both are used to add value to organizations and their data in different ways.
A data warehouse is a database application system designed to report on and analyze the contents of
a series of different databases. In other words, a data warehouse is only as good as the information
put into it, and the quality of the information extracted from a data warehouse depends directly on
the quality of the data input into it.
1.1. Definition of Data Mining
Data mining is a systematic procedure utilized to convert unprocessed data into valuable knowledge.
A pivotal aspect of data mining involves creating an algorithm that can efficiently analyze extensive
quantities of information to identify patterns.
1.2. Definition of Data Warehouse
A data warehouse is a system used for reporting and data analysis and is considered a core
component of a business intelligence environment. Despite varying definitions of this system, it is
regarded as a pivotal building block in the decision-making process within an organization. Data in
a data warehouse is not frequently updated; smaller organizations therefore may not maintain a data
warehouse unless they have other analytical systems available. Data warehouses are systems
designed for query and analysis rather than transaction processing, and they usually contain large
amounts of historical data.
1.3. Importance of Data Mining and Data Warehouse
Today, in this fast-moving world, we have an information overload and access to this information in
a quick and reliable manner is of utmost importance to companies, so that decisions can be taken

720
Ms Zaiba Khan 2023 Advanced Engineering Science

without delays. With the tremendous amount of data available, it is a mammoth task to process and
understand the data. This is where data mining and data warehousing step in. Data mining is the
process of identifying patterns in the data and using these patterns to derive useful information,
whereas data warehousing is the process of moving data to a central location and making this data
available to users. Thus, it is no wonder that these two technologies are so critical to an organization,
irrespective of the size of the organization.
2. Data Mining Techniques
In the study of data mining, techniques for extracting useful information from large data sets have
been the focus of research for several decades. Among these techniques are association rule mining,
classification and prediction, clustering, sequential pattern mining, and text mining. Each is used to
extract a specific type of information from data sets in a way that can assist business decision
making. One of the leading techniques is association rule mining, which finds co-occurrence
relationships within large sets of data. Classification and prediction are two further important
techniques, used not only for mining data but also for constructing models. Even though large data
sets have increasingly become a focus of businesses, many aspects of these techniques remain
unexplored: new findings and new ways of applying them continue to emerge, and the practical
application of these techniques to the business decision making process is still not well understood
and requires further study.
2.1. Association Rule Mining
Association rule mining typically involves three steps. First, we find all itemsets that are present in
the transactions; for example, the itemset {Bread, Butter} is present in the transaction {Bread,
Butter, Milk}. Next, we generate association rules from the discovered itemsets and prune the set.
Finally, we evaluate the remaining rules and validate them, retaining only the rules that satisfy the
user's interest. If the user is particularly interested in strongly associated rules, we may generate
stronger rules than those already present. Further questions can then be explored: how rules evolve
over time, how frequently the rules are satisfied, and so on.
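The three steps described above can be sketched in Python. The transaction database, the minimum support of 0.4, and the minimum confidence of 0.6 below are all hypothetical choices for illustration, not values taken from the text:

```python
from itertools import combinations
from collections import Counter

# Hypothetical toy transaction database.
transactions = [
    {"Bread", "Butter", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Milk"},
    {"Butter", "Milk"},
    {"Bread", "Butter", "Jam"},
]
min_support = 0.4      # an itemset must appear in >= 40% of transactions
min_confidence = 0.6   # the user's threshold for "strong" rules

# Step 1: count all 1- and 2-itemsets and keep the frequent ones.
counts = Counter()
for t in transactions:
    for size in (1, 2):
        for itemset in combinations(sorted(t), size):
            counts[itemset] += 1
n = len(transactions)
frequent = {s: c / n for s, c in counts.items() if c / n >= min_support}

# Step 2: generate rules A -> B from frequent 2-itemsets and compute
# confidence (only the A -> B direction is shown for brevity).
rules = []
for itemset, support in frequent.items():
    if len(itemset) == 2:
        a, b = itemset
        rules.append((a, b, support, support / frequent[(a,)]))

# Step 3: prune, retaining only rules that satisfy the user's interest.
strong = [(a, b, s, c) for a, b, s, c in rules if c >= min_confidence]
for a, b, s, c in strong:
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

On this toy data the only strong rule is Bread -> Butter, with support 0.60 and confidence 0.75.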
2.2. Classification and Prediction
A classifier is a data mining model used to assign each record to one of a small number of classes;
for example, it could distinguish customers who are likely to buy goods from those who are not. A
classifier is built from a series of transactions or objects divided into two sets: the training set,
from which the classifier is built, and the test set, which is used to measure the classifier's
performance. The classifier performs its task after learning from a large quantity of transactions.
2.3. Clustering
Clustering is a method which only uses the input data set and tries to correlate the internal patterns.
It is an unsupervised learning technique used to segregate the data sets into groups. The main
objective of clustering is to segregate groups with similar traits and imply a relationship to achieve
the defined intent. This is achieved by linking the data sets based on analysis of the internal patterns.

This method is applied in domains such as supermarket retailing and e-book sales to implement
marketing strategies.
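A minimal k-means sketch shows the unsupervised grouping the text describes. The customer points (visits per month, average basket value) and the choice of k = 2 are hypothetical:

```python
import math
import random

random.seed(0)

# Hypothetical customer data: (visits_per_month, average_basket_value).
points = [(2, 10), (3, 12), (2, 9), (20, 110), (22, 105), (21, 115)]

def kmeans(points, k, iterations=10):
    # Initialise centroids from the data itself.
    centroids = random.sample(points, k)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(sum(c) / len(cluster)
                                     for c in zip(*cluster))
    return clusters

clusters = kmeans(points, k=2)
for c in clusters:
    print(c)
```

With no labels supplied, the algorithm separates the low-spend and high-spend customers purely from the internal structure of the data.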
2.4. Sequential Pattern Mining
Many data sets have attributes with a definite ordering, and a variety of sequential analyses can be
performed on such data. In biology, for example, bioinformatics has become a field where sequence
analysis is crucial, and sequential pattern mining plays an important part in this process: in a DNA
sequence, finding the frequency of occurrence of the subsequence TCT is a problem solved by
sequential pattern mining algorithms. Given the evolving nature of data mining and the increasing
number of problems that can be defined and solved through sequential data analysis, sequence data
mining is a field with a very promising future.
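The DNA example above can be made concrete. The sequence string below is invented for illustration; the code counts overlapping occurrences of the pattern TCT, the simplest form of the frequency question posed in the text:

```python
# Hypothetical DNA sequence; count overlapping occurrences of "TCT".
sequence = "ATCTCTGGTCTA"
pattern = "TCT"

count = sum(1 for i in range(len(sequence) - len(pattern) + 1)
            if sequence[i:i + len(pattern)] == pattern)
print(count)
```

In this sequence the pattern occurs three times (at offsets 1, 3, and 8), the overlap at offsets 1 and 3 being the reason a sliding window is used rather than a non-overlapping split.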
2.5. Text Mining
The text mining process is composed of three main steps: dealing with unstructured data, deriving
the pattern and finally the evaluation. After the data collection, the unstructured text data must be
pre-processed before mining. This is because raw unstructured data is usually redundant, noisy,
vague, and does not match the necessary format and structure. Techniques used to process text data
include stemming, stop word removal, synonym recognition, and document summarization.
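Two of these pre-processing techniques, stop word removal and stemming, can be sketched as follows. The stop word list and the crude suffix-stripping rule are simplified stand-ins; production systems use full stop word lists and stemmers such as Porter's algorithm:

```python
import re

# Minimal stop word list (illustrative only).
STOP_WORDS = {"the", "is", "a", "of", "and", "to", "in"}

def preprocess(text):
    # Tokenise: lowercase and keep alphabetic runs only.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Stop word removal.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Crude suffix-stripping "stemmer" (not a real stemming algorithm).
    stemmed = []
    for t in tokens:
        for suffix in ("ing", "ed", "s"):
            if t.endswith(suffix) and len(t) > len(suffix) + 3:
                t = t[: -len(suffix)]
                break
        stemmed.append(t)
    return stemmed

print(preprocess("The mining of unstructured texts is noisy and redundant"))
```

The raw sentence is reduced to a short list of normalised terms ready for pattern derivation.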
3. Data Warehouse Architecture
The data warehouse architecture includes data sources and extraction, data transformation and
integration, data storage and management, metadata, and querying. The data warehouse collects and
integrates data from multiple operational systems and stores it in a single coherent system. The most
common data sources are transactional and non-transactional systems, from which the warehouse
performs frequent data extractions. Extraction methods include full extraction, in which the
warehouse extracts all the data at once, and incremental extraction, which extracts only new or
modified data. Extraction is followed by data transformation, in which the raw data is converted into
a format suitable for the data warehouse. Transformation activities include data cleaning (aimed at
improving data quality), data summarization, data aggregation, data integration, and data
consolidation.
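The distinction between full and incremental extraction can be sketched as follows. The source rows, the `modified` timestamp column, and the last-run date are hypothetical; real extractors read from operational databases rather than in-memory lists:

```python
from datetime import datetime

# Hypothetical source rows, each carrying a last-modified timestamp.
source_rows = [
    {"id": 1, "amount": 100, "modified": datetime(2023, 11, 1)},
    {"id": 2, "amount": 250, "modified": datetime(2023, 12, 5)},
    {"id": 3, "amount": 75,  "modified": datetime(2023, 12, 20)},
]

def full_extraction(rows):
    """Pull every row from the source, regardless of age."""
    return list(rows)

def incremental_extraction(rows, last_run):
    """Pull only rows created or modified since the previous load."""
    return [r for r in rows if r["modified"] > last_run]

print(len(full_extraction(source_rows)))
print(len(incremental_extraction(source_rows, datetime(2023, 12, 1))))
```

Incremental extraction trades a little bookkeeping (remembering the last run time) for a much smaller data volume per load.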
3.1. Data Sources and Extraction
Given the nature of the Data Warehouse and the decision-support queries that it needs to support, it
is easy for one to assume that only processed data should be stored within its walls. However, as
much as traditional OLTP databases are built to ensure data integrity at the point of entry, the process
of loading data into the Data Warehouse is less controlled and more prone to errors. It is during this
process that data can be transformed. Sources of data for the Warehouse can include operational
databases, external data, legacy data, and flat files. The majority of data in most traditional
warehouse installations comes from the organization's operational databases. Other sources that
supply data to the warehouse on a regular, periodic schedule may be applications built outside the
organization, and data from legacy systems or from flat files can also serve as sources for the Data
Warehouse.

3.2. Data Transformation and Integration


Data transformation takes information from the staging area and formats it according to the provided
design. This is particularly useful where outside groups want to access pieces of data from different
data marts; in the past, employees would have had to physically access multiple storage areas and
duplicated storage systems to organize and output information for those who needed it. The final
part of the process is forming aggregates for improved performance and faster processing of data.
Data integration generally takes place after the data has been extracted and transformed. The
ultimate goal is to remove data redundancies and save storage space, as well as to ensure that the
warehouse meets business specifications. Individual records, images, audio clips, video clips,
invoices, and e-mails are just a few examples of the data types and media that a business might want
to access and control.
3.3. Data Storage and Management
The arriving data is stored in the central storage of the Data Warehouse, called the Data Warehouse
database, which holds the incremental contents of the source data. Because of its huge and
constantly growing size, errors or problems during data storage can become a burden to the user, so
the system should be able to send a warning when a problem occurs. Statistical analysis can be used
to detect errors in the data storage, and algorithms such as Hunt's algorithm can be used to control
the granularity of the data, so that the data storage can be managed automatically.
3.4. Metadata and Querying
The metadata consists of the operational metadata and the business metadata of the data warehouse.
The operational metadata is the system generated data that is required to manage the warehouse
operations effectively. This typically includes the data usage statistics, data extraction and
transformation logs, data cleansing and scrubbing activity data, refresh statistics and log records, and
the data quality statistics. Business metadata is the data that is used to understand the business and
usage of the data. This typically includes the source data dictionary, data transformation rules, the
business rules, the warehouse table descriptions, the package level metadata and metrics, and the
reports and queries specifications.
4. Data Mining Process
The data mining process involves six stages. Problem definition and data collection, where the
problem is understood and relevant data collected, is the first stage. Data preprocessing and
cleaning is next, where the data is sorted and cleaned. Then comes data transformation and
reduction, where the available data is transformed and reduced into a format suitable for the next
stage. Model building and evaluation follows, where models are built and then evaluated on the
selected data. Finally, deployment and interpretation is carried out: the discovered knowledge is
handed to the decision maker, and if it satisfies the original objective, the knowledge discovery
process ends. If not, one returns to the appropriate earlier stage and performs the data mining
steps again.
4.1. Problem Definition and Data Collection
Data collection is one of the first steps to be undertaken, to ensure that the data collected will
serve the analysis; integrating it properly into a database requires careful inspection. Once it is
understood what data is required and where, managers can verify that the data is flowing to the
right location for the right purpose. This section highlights the significance of exploiting data
warehousing and data mining methods to realize results.
Interest in data collection, storage, and management has grown substantially in recent years, yet a
robust method that works around the available tools and techniques remains a goal awaiting
fulfillment by data engineers and analysts. Careful observation of what data is to be collected,
where, and why helps ensure that the best method is adopted to guarantee its completeness and
understandability for analysis.
4.2. Data Preprocessing and Cleaning
The main discrepancies in data arise from human error or problems in the technology. Sometimes, as
in credit card billing, the human mistakes are typographical errors, such as typing the right
numbers into the wrong account. The issues to tackle here are: (a) missing data, (b) noisy data, and
(c) inconsistent data.
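Handling missing and noisy values can be sketched in a few lines. The billing amounts below are invented: `None` stands for a missing value and 9999 for an obvious outlier of the typographical kind described above; median imputation and median-based clipping are one simple strategy among many:

```python
import statistics

# Hypothetical monthly bill amounts; None marks a missing value and 9999
# is an obvious outlier (e.g. a typographical error in data entry).
raw = [120.0, 135.5, None, 128.0, 9999.0, 131.2]

# Missing data: impute with the median of the observed values.
observed = [x for x in raw if x is not None]
median = statistics.median(observed)
filled = [median if x is None else x for x in raw]

# Noisy data: replace values implausibly far from the median.
cleaned = [x if x < 10 * median else median for x in filled]
print(cleaned)
```

After this pass every record carries a plausible value, at the cost of some information loss, which is why cleaning choices should be documented alongside the data.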
4.3. Data Transformation and Reduction
Data transformation is the third process in the data mining operation; it transforms the data into
forms suitable for mining. Major operations performed in this process include the discrete wavelet
transform and principal component analysis. Because the generated data may not be fully usable for
extracting important information, the information must be processed and the data volume compressed.
This consists of two sub-processes: data reduction and data discretization.
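Data discretization, one of the two sub-processes named above, can be sketched with equal-width binning, which reduces a continuous attribute to a handful of interval labels. The ages and the choice of three bins are hypothetical:

```python
# Discretization by equal-width binning (hypothetical customer ages).
ages = [18, 22, 25, 31, 38, 44, 52, 67]

def equal_width_bins(values, n_bins):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    labels = []
    for v in values:
        # Clamp so the maximum value falls into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        labels.append(f"[{lo + idx * width:.0f}-{lo + (idx + 1) * width:.0f})")
    return labels

print(equal_width_bins(ages, 3))
```

Eight distinct ages collapse to three interval labels, shrinking the attribute's cardinality while keeping its broad shape.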
4.4. Model Building and Evaluation
In the supervised learning approach, we first form a hypothesis before data is input into the model.
Once a learning hypothesis has been chosen, we input the data that relates to it; this forms the
model building process. In the evaluation process, we take the evaluation data and validate it
against the initial hypothesis. Results from both the building and evaluation data are then used to
draw inferences, for instance in the form of rules. Interpreting the results of the model is
important, as it allows us to identify patterns and relationships in our data, and this forms the
basis for decision-making.
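The building-then-evaluation loop can be illustrated with the simplest possible hypothesis: "spend above a threshold implies a purchase". All figures below are hypothetical; the point is the separation between the data used to fit the hypothesis and the data used to validate it:

```python
# Hypothetical (spend, purchased) records, split into building and
# evaluation sets before any fitting takes place.
building_data = [(30, 0), (45, 0), (80, 1), (95, 1), (60, 1), (20, 0)]
evaluation_data = [(85, 1), (25, 0), (55, 1), (40, 0)]

def accuracy(threshold, data):
    """Fraction of records the rule 'spend > threshold => buys' gets right."""
    return sum((spend > threshold) == bool(label)
               for spend, label in data) / len(data)

# Model building: choose the threshold that best fits the building data.
candidates = sorted(spend for spend, _ in building_data)
best = max(candidates, key=lambda t: accuracy(t, building_data))

# Evaluation: validate the chosen hypothesis on unseen data.
print(f"threshold={best}, eval accuracy={accuracy(best, evaluation_data):.2f}")
```

If the evaluation accuracy were poor, the process would loop back to an earlier stage, exactly as the deployment discussion below describes.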
4.5. Deployment and Interpretation
Finally, the deployment process takes the resulting model from the building phase and applies it to
new data to generate predictions. A series of scripts is usually written here, typically involving
extracting data from a defined source, preparing the data, and then predicting with the model. When
the CRISP-DM process reaches this final phase, predictions have been created and the client is now
ready to validate the findings and their value. The interpretation aspect of deployment is
challenging, since even small misinterpretations can be extremely costly; the findings must
therefore be presented carefully at this phase. A validation report is given to the data mining team
confirming that the model yields actionable, satisfactory results. The main activity at this stage is
to fully understand the delivered reports, and the accompanying meetings are crucial: they enable
the data mining group to gain a detailed understanding from the recipients of the reports, so that
any resulting misinterpretations can be rectified.
5. Applications of Data Mining and Data Warehouse
In data mining, association rules are sets of rules that seek to find useful information in a
dataset. The main benefit of customer relationship management tools is that they collect data and
share it among different departments; firms that implement them well often experience high customer
retention and thus a higher value for their customer base. Data mining techniques make it possible
to build pattern-seeking algorithms that improve customer revenue recognition or reduce incentives
for customer switching, alongside the maximization of lifetime customer value and the generation of
cross-sell and up-sell opportunities where the customer would otherwise have only a single product
relationship with the firm.
5.1. Customer Relationship Management
Customer relationship management (CRM) involves all aspects of interaction that a company has
with its customers. CRM is a combination of technology, business processes and procedures used by
various departments in an organization to identify, acquire, support, and retain customers. CRM is
accomplished by collecting and managing customer data. Much of the value of a CRM solution relies
on data mining techniques, which mine customer data to improve the business relationship. The mined
data can be used effectively to increase sales to a customer, retain an existing customer, or find
potential customers.
5.2. Fraud Detection and Prevention
Data mining is an indispensable part of discovering fraudulent activities in different areas of
business. In today's fast-paced world, organizations are under constant threat from fraudsters, who
may operate not only from outside the organization but also from within it. Detecting and preventing
these unlawful activities is a major concern for any organization, and data mining technologies can
be used for both detection and prevention: modeling, reporting, neural networks, genetic algorithms,
and decision trees are among the techniques that can be applied.
5.3. Market Basket Analysis
This next application involves tracking products that are purchased together. Market basket analysis
will show that if customer A buys product 1, then he will also buy product 2 with a likelihood of
80%. This is a simplified version of market basket analysis. The use of data mining involves coming
up with rules or algorithms that can uncover relevant patterns and relationships between the stored
transactions in the data warehouse. This is very complex, and organizations are only now beginning
to tap into the potential of this application.
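The 80% figure in the example above is a rule confidence, which can be computed directly from transaction data. The baskets and product names below are hypothetical, chosen so that the confidence works out to exactly 80%:

```python
# Hypothetical baskets; compute the confidence of "product1 -> product2".
baskets = [
    {"product1", "product2"},
    {"product1", "product2"},
    {"product1", "product2"},
    {"product1", "product2", "product3"},
    {"product1", "product3"},
    {"product2"},
]

with_p1 = [b for b in baskets if "product1" in b]
with_both = [b for b in with_p1 if "product2" in b]
confidence = len(with_both) / len(with_p1)
print(f"P(product2 | product1) = {confidence:.0%}")
```

Of the five baskets containing product 1, four also contain product 2, so a customer who buys product 1 also buys product 2 with a likelihood of 80%.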
5.4. Healthcare and Medical Research
Telemedicine is increasingly becoming popular in medical research. Data mining is one of the tools
used for this purpose. In this process a group of individual profiles are used to identify the common
behavior of the population which later is compared with that of several patients suffering from a
specific critical issue or disease. The comparison helps in identifying the tacit patterns and behaviour
that would have led to the present condition of the patient. Analysis and profiling have helped in


medical research to predict susceptibility of various individuals when diagnosed with such diseases.
This helps the medical professionals to take necessary precautions and measures in advance and thus
work towards preventive health care.
5.5. Social Media Analysis
Social Media Analysis can be used to study trends in content and user actions across social
networking platforms such as Facebook, Instagram, Twitter, and LinkedIn. This is used by
companies to plan and monitor their marketing strategies. Data from social media websites is user-
generated, which means the data is raw and unstructured, and its unstructured nature makes it hard
to utilize directly. However, thanks to advanced data mining techniques, it is possible to extract
and analyze useful information from this ever-growing data.
6. Challenges and Limitations
After a data warehouse system has been successfully implemented, problems associated with the setup
begin to unfold, and continuously monitoring the system becomes one of the largest challenges.
Experience shows that implementing the best practices mentioned in this article makes it relatively
easy to identify problems. Such an undertaking involves setting up more robust monitoring processes,
laying out test cases, and finding ways of identifying problems beforehand. There are predefined
performance tolerances that the warehouse data must meet, and if they are not met, an automatic
alert must be raised.
6.1. Data Privacy and Security
Data specialists should address privacy and security concerns in the warehousing process. At the
storage phase, privacy concerns must be addressed to protect individuals' information from
unauthorized access. Data mining specialists must ensure data privacy to prevent mishandling of data
traced back to individuals. Misuse of information during the analysis phase can have dire
consequences for an organization. Privacy and security concerns remain during the distribution
phase, where authorization and authentication are required to prevent unauthorized access. Data
security involves protecting databases, storage, retrieval, and warehousing technologies from
unauthorized access and attacks. Encryption technologies, access control, authorization, firewalls,
and digital signatures are used in the process. Managerial strategies can be implemented to uphold
data privacy and security, including investigations into individuals' information and analyses of
storage, warehousing, and retrieval technologies. User accounts can be used as gate-passes for
employees to access data related to their work environments, addressing security concerns related to
privacy.
6.2. Scalability and Performance
Given that data scientists frequently deal with extensive amounts of data to test algorithms and
models, scalability is a crucial factor for efficiently collecting and analyzing large datasets within a
limited timeframe. The capacity of a distributed process to handle memory, as well as its ability to
distribute and parallelize tasks, significantly impacts the performance of these processes, making
them essential requirements for achieving scalability. When confronted with an extensive number of
clusters and classes, clustering and classifying large datasets can pose significant challenges.
6.3. Explanation and Interpretability


Explanation and interpretability refer to the degree to which the results of data mining can be easily
understood by end users. The black-box problem involves complex models that are not able to
provide explanations or justifications of the conclusions that they draw from the information in the
data. Data mining gives rise to the "curse of dimensionality" when high-dimensional data is used,
that is data with many variables. Explanation facilities can partially address this by allowing users to
identify and locate areas of interest that require further investigation. The use of metadata is
important in the interpretability of the data. Effective use of metadata provides users with important
information about the content and context of the data.
6.4. Ethical Considerations
Given the rather invasive use of personal data that can be involved in data mining and the nature of
big data and the potential that it possesses, a range of ethical considerations need to be made in the
search for and extraction of relevant information. These ethical considerations include Informed
consent, Anonymity, Confidentiality, Offensive content, Security, Legal issues.
7. Future Trends and Innovations
Still more revolutionary technologies are emerging, many of them centred on data mining processes.
At present, data mining is becoming increasingly intertwined with business intelligence techniques,
an evolution that is fundamental to shaping strategies for future businesses. The use of data mining
is growing in every sector, a fact that demands close attention to its recent developments. There
are various speculations about the future of data mining, but among the most significant are that it
will be used predominantly in various segments of internet search, and that it will be seen as an
integral part of global technologies.
7.1. Big Data and Data Mining
Big data and data mining, data warehouse and related technologies play an important role in business
data analytics. These technologies have traditionally been used in support of making data-based
decisions at managerial level by tactical and strategic decision makers. However, future trends
include targeting operational decision making using more real-time data analysis. Digital marketing,
in particular, is likely to use data mining and big data technologies to tailor marketing messages and
offers to individual customers. Customer shopping data can be used to promote timely and relevant
purchase suggestions to customers in real-time via mobile applications and social networking sites.
7.2. Artificial Intelligence and Machine Learning
As the world moves towards processing and evaluating ever larger amounts of information, this is
being done more intelligently than ever before. The growth, sophistication, and application of
artificial intelligence and machine learning are having a profound impact on our approach to
managing data, and this is likely to be the most significant innovation in the years to come.
Aspects of artificial intelligence will likely be embedded in the underlying infrastructure of data
systems, carrying out tasks that were previously done manually. For instance, an AI could monitor
thousands of systems across a city, identifying faults and diverting loads within seconds, or
automatically learn an optimal operating plan based on regional history or expected weather changes.
7.3. Deep Learning and Neural Networks


Neural network technology imitates the functioning of the human brain, using layers of units to
learn and draw conclusions. Deep learning, a subclass of neural networks, is used for analyzing
unstructured data with networks of many layers, and differs from traditional machine learning in
several key respects. Cloud computing is growing through its integration with big data, providing
the resources needed to learn from vast amounts of information, while increasingly sophisticated
algorithms and code make it possible to process large datasets rapidly. Cloud applications for big
data include moving data across centers, hybrid computing, and enhanced security. Non-traditional
data mining in healthcare, such as epigenetics and genome interpretation, continues to advance, with
computational mathematics aiding in the understanding of complex DNA and molecular biology.
8. Case Studies
We will be discussing three case studies from different sectors where the implementation of data
warehouse and data mining has reaped immense benefits for these organizations. The first case study
is from the retail industry, second from the financial sector, and the third one is from the healthcare
industry. Each case study is unique in the way in which data mining and data warehousing have been
implemented and our approach is to understand the business need, the nature of data captured and
the objectives of capturing and analyzing the data.
8.1. Case Study 1: Retail Industry
Businesses in the retail industry have plentiful data from varied sources, from the consumer's point
of purchase (POP) to the business side (such as inventory transactions). A data warehouse can gather
and safely store all of this data and form the required relationships within it. Data mining can then
exploit these relationships to extract actionable patterns and put the results to work.
The retail industry is an arena of consumerism and intense competition, where a single move can
take an organization to the top or pull it down with the tide. Marketing strategies are the launch
ramp for the retail industry, and organizations cannot afford marketers who craft vague, unclear,
weakly grounded strategies to cater to consumers. Marketing, to be precise, is the only wing that can
reach not only the target audience but every potential customer with the means to become part of the
retail industry's success.
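The relationship mining described above can be illustrated with a toy market-basket analysis; the baskets and item names below are invented, and the pair-counting step sketched here is the core of Apriori-style association mining:

```python
from itertools import combinations
from collections import Counter

# Hypothetical point-of-sale baskets (invented for illustration).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "eggs"},
]

def frequent_pairs(baskets, min_support=3):
    """Count co-occurring item pairs and keep those meeting min_support,
    the first step of Apriori-style market-basket analysis."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

print(frequent_pairs(baskets))  # -> {('bread', 'milk'): 3}
```

With `min_support=3`, only the (bread, milk) pair survives, which is exactly the kind of co-purchase signal a marketer could turn into a targeted promotion.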
8.2. Case Study 2: Financial Sector
Over 30 million transactions are recorded per month across channels such as teller systems, ATMs,
online banking, and scheme transactions, with a minimum of 500 attributes captured per transaction.
The dimension tables hold the attributes captured during a transaction: member data such as name,
customer number, designation, date of birth, and Social Security number form part of the entity
table. The transaction amount, transaction date, and other transaction-related fields form part of the
fact table. A prime ID is created to maintain the integrity of the entire system, and a periodic cycle
runs every month to update and reconcile the various attributes. Data is loaded through the staging
process into the warehouse and made available to the entire bank through appropriate reporting
tools.
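A minimal sketch of such a dimension/fact design, using an in-memory SQLite database; the table and column names are assumptions for illustration, not the bank's actual model:

```python
import sqlite3

# Toy star schema: one customer dimension, one transaction fact table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,   -- plays the role of the prime ID
        name TEXT,
        date_of_birth TEXT
    )""")
cur.execute("""
    CREATE TABLE fact_transaction (
        txn_id INTEGER PRIMARY KEY,
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        txn_date TEXT,
        amount REAL,
        channel TEXT  -- teller, ATM, online, ...
    )""")
cur.execute("INSERT INTO dim_customer VALUES (1, 'A. Customer', '1980-01-01')")
cur.executemany(
    "INSERT INTO fact_transaction VALUES (?, ?, ?, ?, ?)",
    [(101, 1, '2023-11-02', 250.0, 'ATM'),
     (102, 1, '2023-11-15', 75.5, 'online')],
)
# Monthly reconciliation-style rollup: total spend per customer.
cur.execute("""
    SELECT c.name, SUM(f.amount)
    FROM fact_transaction f JOIN dim_customer c USING (customer_key)
    GROUP BY c.customer_key""")
row = cur.fetchone()
print(row)  # -> ('A. Customer', 325.5)
```

Here `customer_key` is the surrogate key tying fact rows back to the dimension, so descriptive attributes can change in one place without touching the high-volume fact table.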
8.3. Case Study 3: Healthcare Industry
This case study is based on one of the largest American healthcare companies, which offers
pharmacy, health insurance, and retail wellness chain services. The data collected from each of these

various chains sat in silos and was never linked or used together. Once the data was integrated into a
single data warehouse, it could be used to perform analytics and generate insights that were never
feasible before: the top-selling drugs in each city, the rise in the use of generic drugs, differences in
patient compliance with drugs, and the effects of health-awareness advertising on drug sales. The
insights were also projected against secondary data such as council birth records, taken as a proxy
indicator for population growth and a corresponding increase in prescription fills. Rather than
presenting raw numbers, narrative text and visual summaries were used, with natural language
generation converting the numbers into meaningful stories that could be communicated.
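The city-level insight described above can be sketched as a simple aggregation; the sales records and drug names below are invented for illustration:

```python
from collections import defaultdict

# Hypothetical pharmacy sales records: (city, drug, units sold).
sales = [
    ("Chicago", "DrugA", 120), ("Chicago", "DrugB", 340),
    ("Boston",  "DrugA", 500), ("Boston",  "DrugC", 90),
]

def top_drug_per_city(sales):
    """Aggregate units sold per (city, drug) and pick each city's leader,
    the kind of cross-chain rollup an integrated warehouse enables."""
    totals = defaultdict(int)
    for city, drug, units in sales:
        totals[(city, drug)] += units
    best = {}
    for (city, drug), units in totals.items():
        if city not in best or units > best[city][1]:
            best[city] = (drug, units)
    return best

print(top_drug_per_city(sales))
```

In practice such a rollup would run as SQL against the warehouse, but the logic is the same: group the integrated fact data by dimension attributes, then rank within each group.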
9. Conclusion
Most emerging technology sectors, such as the web, e-commerce, and telecommunications, are
becoming more open and interconnected. It is therefore to the advantage of players in industrial and
commercial sectors to assess the impact of each of these technologies and look for business
opportunities that accompany such changes. Data warehouse and data mining technologies offer
immense opportunities to assist in the advancement of these industries, yet more exploration is
necessary to understand and institute them.
The primary goal of this literature review is to identify and explore the emergence of data
warehousing as a strategic tool for data storage. The secondary aim is to analyze the major success
and failure factors in data warehousing projects, and to study the potential challenges facing
vendors, consultants, implementers, and academic researchers, so that future strategies can be
developed for the successful implementation of data warehousing in organizations.
9.2. Implications and Recommendations
To address the shortage of necessary resources, the university library and outside researchers started
a one-year project to study the issues and make recommendations, focusing attention on the creation
of new alternative agreements and the redesign of interlibrary lending practices. Interlibrary
agreements such as the Illinois Compact must be thoroughly reviewed, and new directions must be
considered and suggested to the Illinois State Library and the Illinois Board of Higher Education.
The group recommends that changes in philosophy and policy match changes in the environment
and in system design and procedures. Other recommendations include the creation of a single
approval plan for electronic and print resources and the establishment of a system for notification of
electronic serial purchases.