Crime Prediction

This research proposal aims to use machine learning techniques to predict crimes in Punjab, Pakistan. The proposal discusses how crime rates are influenced by various socioeconomic factors and how crime prediction can help law enforcement agencies. It also reviews previous literature where machine learning methods like classification, clustering, and deep neural networks have been applied to crime data from various countries with prediction accuracies ranging from 79% to 89.5%. The goal of this research is to develop a reliable crime prediction model for Punjab by analyzing large crime datasets using data mining techniques.

Research proposal

“Crime Prediction Using Machine Learning”


Submitted by:
Ahsan Ali
Registration No. --------------

Submitted to:
Name of Research Supervisor

Faculty of Management and Social Sciences


Abasyn University Peshawar Campus
Ring Road (Charasadda Link), Peshawar
Khyber Pakhtunkhwa
I. Introduction

Crime is one of the biggest threats to humanity. Numerous crimes occur on a regular basis, and crime can grow and spread quickly and widely. Violence takes place everywhere, from small villages to big cities. Crimes of different kinds include theft, abduction, rape, assault, battery, miscarriage, kidnapping, and murder. When crime is rising, cases must be resolved much more quickly. Crime has been growing dramatically, and the police department is responsible for controlling and reducing criminal activity [1]. The police department faces serious problems with crime prediction and crime identification because of the huge amount of crime data. Innovation is needed to solve crimes more efficiently.
Crime, which affects quality of life and economic growth, is a socio-economic issue. The characteristics of criminal activity depend on the type of community and society. Earlier research in crime forecasting has found that crime rates are influenced by factors such as education, deprivation, employment, and climate [2]. Punjab is Pakistan's most populous province, and it is ethnically diverse and multicultural.
Over the last two decades, machine learning has become a key, if often hidden, cornerstone of information technology. The growing amount of data generated daily by individuals and businesses requires smart analysis, and machine learning has therefore become an essential component of technological development [4].
In Vancouver, the overall crime rate dropped 1.5% in 2017, while vehicle break-ins and robbery remained a problem [3]. A predictive model for property break-in crimes was recently introduced by the Vancouver Police Department (VPD), and the City of Vancouver saw a 27 per cent drop in residential break-ins after it was implemented [5]. Crime prediction uses data and statistical analysis to identify the crimes most likely to occur [5]. This area has been researched extensively in many parts of the world.
Criminal activities that affect quality of life and socio-economic development are prevalent in all regions of the world. As a result, governments are particularly interested in using advanced technology to address these issues. Crime analysis, a sub-branch of criminology, investigates criminal activity and attempts to classify its indicators.
Machine learning works with data and uses many methods to identify trends, which makes it very helpful for predictive analysis. Based on the knowledge gained, law enforcement authorities can use various monitoring techniques to keep areas secure. A machine learning model can learn and evaluate crime patterns from reports of prior criminal activity and can identify hotspots by time, crime type, or other factors. This technique is called classification, and it allows categorical class labels to be predicted. Classification has been used in many areas, including financial markets, business intelligence, education, and weather forecasting.
Machine learning is the science of enabling computers to make decisions without explicit human intervention. It has recently been employed in self-driving cars, language recognition, web search, and advancing knowledge of the human genome. It has also made it possible to forecast crime on the basis of related data. Classification, a supervised technique for predicting categorical class labels, has been used in areas including weather forecasting, medical treatment, finance and banking, domestic security, and market intelligence [6].
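As a rough illustration of learning crime patterns from prior reports, the Python sketch below builds a frequency-based baseline classifier. It counts crime types per (grid cell, time band) and predicts the most frequent one, falling back to the overall most frequent type for combinations it has never seen. The grid-cell identifiers, the time-band boundaries, and the sample reports are hypothetical, and this is not the model proposed in this research.

```python
from collections import Counter, defaultdict

def time_band(hour):
    """Map an hour (0-23) to a coarse time band (hypothetical boundaries)."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 24:
        return "evening"
    return "night"

def fit_hotspot_classifier(reports):
    """Count crime types per (grid cell, time band) in prior reports."""
    counts = defaultdict(Counter)
    overall = Counter()
    for cell, hour, crime_type in reports:
        counts[(cell, time_band(hour))][crime_type] += 1
        overall[crime_type] += 1
    return counts, overall

def predict_crime_type(model, cell, hour):
    """Predict the most frequent crime type for this cell and time band,
    falling back to the overall most frequent type for unseen combinations."""
    counts, overall = model
    seen = counts.get((cell, time_band(hour)))  # .get does not insert new keys
    return (seen or overall).most_common(1)[0][0]

# Hypothetical prior reports: (grid cell, hour of day, crime type)
reports = [
    ("cell_A", 22, "theft"), ("cell_A", 23, "theft"), ("cell_A", 19, "theft"),
    ("cell_A", 21, "assault"), ("cell_B", 9, "burglary"), ("cell_B", 10, "burglary"),
]
model = fit_hotspot_classifier(reports)
print(predict_crime_type(model, "cell_A", 20))  # theft
print(predict_crime_type(model, "cell_B", 8))   # burglary
print(predict_crime_type(model, "cell_C", 3))   # theft (overall fallback)
```

A baseline like this is useful mainly as a reference point: a trained classifier should beat it before it is worth deploying.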
Crime analysis based on machine learning normally includes data collection, classification, pattern recognition, prediction, and visualization. Traditional data mining techniques (association analysis, classification and prediction, cluster analysis, and outlier analysis) identify patterns in structured data, while newer methods identify patterns in both structured and unstructured data [7]. The main objective of this research is a prediction model that can predict crime reliably.

II. Problem Statement


Crime prediction and criminal identification are major problems for the police department because of the enormous amount of crime data. Technology is needed that allows cases to be solved faster. This problem motivated me to research how solving crime cases can be made easier. A review of many documents and circumstances showed that machine learning and data science can make this work easier and faster.

III. Literature Review:

Since fighting crime has always been a priority for governments around the world, many investigations have been carried out to find effective countermeasures and indicators of crime before it happens. Criminologists have sought to identify hotspots that require close attention from law enforcement agencies.
Researchers have explored the association between criminal acts and socio-economic indicators such as unemployment [8], income level, race, and level of education.
A group of researchers predicted whether certain areas of London would become crime hotspots by evaluating mobile network usage and population information [9]. Their argument is that information derived from mobile networks provides useful metrics for crime forecasting.
By combining two datasets, the 1990 US LEMAS survey and the 1995 FBI UCR crime data, and applying classification methods such as Decision Tree and Naive Bayes, an analysis of crime categories across different US states achieved 83.95% precision [10]. The article does not, however, disclose whether the crime classes in question are imbalanced. The same datasets have since been investigated with a number of machine learning algorithms, reaching 89.50% precision with a k-Nearest Neighbor algorithm; that study also used the chi-square test to improve feature selection.
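The chi-square feature selection mentioned above can be illustrated with a short Python sketch. The cited studies' exact procedure is not described here, so this is only an assumption-level example: it computes the Pearson chi-square statistic of independence between one categorical feature and the crime label, then ranks features by that score. The feature names and values are hypothetical.

```python
from collections import Counter

def chi_square_score(feature_values, labels):
    """Pearson chi-square statistic of independence between one categorical
    feature and the class label; larger values suggest a stronger association."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    feature_totals = Counter(feature_values)
    label_totals = Counter(labels)
    score = 0.0
    for f, f_total in feature_totals.items():
        for c, c_total in label_totals.items():
            expected = f_total * c_total / n  # count expected under independence
            observed = joint.get((f, c), 0)
            score += (observed - expected) ** 2 / expected
    return score

# Hypothetical data: time band tracks the label perfectly, parity does not
labels = ["theft", "theft", "assault", "assault"]
features = {
    "time_band": ["night", "night", "day", "day"],
    "parity": ["even", "odd", "even", "odd"],
}
ranked = sorted(features, key=lambda name: chi_square_score(features[name], labels),
                reverse=True)
print(ranked)  # ['time_band', 'parity']
```

Keeping only the top-ranked features reduces the input size for the classifier; in practice the score would be computed on training data only, so the test set stays unseen.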
Wang et al. [11] proposed Series Finder, a machine learning approach that tries to detect patterns of crimes committed by the same offender or group of offenders. Clustering has also been used to study trends in criminal behavior and criminal history worldwide.
Redmond and Baveja [12] investigated the problem of noisy information, examining cases in which police reports or events are anomalous and have no clear indicative metrics. Their system, based on case-based reasoning (CBR), filtered out such cases, and it predicted better with the filtered data than without filtering. Social networks have also been used as a possible source of indicators of criminal activity. Sadhana and Sangareddy [13] used Twitter data and sentiment analysis to forecast crime in real time. Such data have also been used to chart the frequency of crime incidents and to characterize broad public viewpoints.
In [14], human behavioral data derived from mobile network activity, combined with demographics and real crime data, were used to predict crime hotspots in London, UK. In [15], comparisons were conducted with WEKA, an open-source data mining tool, using 10-fold cross-validation. For that study, socio-economic, law enforcement, and crime data were compiled from the 1990 US Census, the 1990 US LEMAS Survey, and the 1995 FBI UCR. In [16], circumstantial factors such as driving conditions, weather, vehicles, and road conditions were analyzed to identify road accident trends in Ethiopia. A dataset of 18,288 events was analyzed with three classification algorithms: KNN, Naïve Bayes, and Decision Tree. The predictive precision was between 79% and 81% for all three algorithms. Analyzing large crime datasets correctly and efficiently is a major challenge in crime prediction.
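The 10-fold cross-validation used in [15] can be sketched without WEKA. The Python below is a minimal, assumption-level example, not WEKA's implementation. It assigns records to folds round-robin (in practice the data would normally be shuffled first), holds each fold out once as the test set, and reports the mean accuracy of a simple k-Nearest Neighbor classifier. The two-cluster data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def k_fold_accuracy(X, y, folds=10, k=3):
    """Mean accuracy of knn_predict over `folds` folds, each held out once as the test set."""
    if len(X) < folds:
        raise ValueError("need at least as many records as folds")
    accuracies = []
    for f in range(folds):
        test_idx = set(range(f, len(X), folds))  # round-robin fold assignment
        train_X = [X[i] for i in range(len(X)) if i not in test_idx]
        train_y = [y[i] for i in range(len(X)) if i not in test_idx]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / folds

# Hypothetical, well-separated 2-D feature vectors for two crime classes
X = [(0.1 * i, 0.0) for i in range(10)] + [(10 + 0.1 * i, 10.0) for i in range(10)]
y = ["theft"] * 10 + ["assault"] * 10
print(k_fold_accuracy(X, y))  # 1.0 on this easy, separable data
```

Real crime data are far noisier, so cross-validated accuracy in the 79% to 89.5% range reported in the literature is a more realistic expectation. `math.dist` requires Python 3.8 or later.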
Data mining is used to identify hidden patterns in large crime datasets rapidly and effectively. Improving the efficiency and reducing the errors of crime data mining techniques increases the predictability of crime. In [17], a general framework for crime data mining was established based on experience from the Coplink project at the University of Arizona. Many studies of crime forecasting are based on discovering hotspots, areas where crime rates exceed the average level. In [18], researchers presented a comparative analysis of hotspot mapping algorithms, including Kernel Density Estimation (KDE) and Risk Terrain Modeling (RTM), and proposed area-specific predictive models. In [19], Linear Discriminant Analysis (LDA) and KNN were adopted for crime hotspot prediction using histogram-based statistical techniques.
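Kernel Density Estimation, one of the hotspot mapping techniques compared in [18], can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from [18]. It assumes incident locations have already been projected to planar coordinates (for example, kilometres), uses an isotropic Gaussian kernel with a hypothetical bandwidth, and ranks hypothetical grid-cell centres by estimated density.

```python
import math

def gaussian_kde_2d(points, query, bandwidth):
    """Kernel density estimate at `query` from incident locations `points`:
    f(q) = 1/n * sum_i 1/(2*pi*h^2) * exp(-|q - p_i|^2 / (2*h^2))."""
    qx, qy = query
    norm = 1.0 / (2 * math.pi * bandwidth ** 2 * len(points))
    total = 0.0
    for px, py in points:
        sq_dist = (px - qx) ** 2 + (py - qy) ** 2
        total += math.exp(-sq_dist / (2 * bandwidth ** 2))
    return norm * total

def rank_cells(points, cells, bandwidth):
    """Return grid-cell centres sorted from highest to lowest estimated density."""
    return sorted(cells, key=lambda c: gaussian_kde_2d(points, c, bandwidth), reverse=True)

# Hypothetical incidents: a cluster near (1, 1) and one isolated event at (5, 5)
incidents = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0)]
cells = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (9.0, 9.0)]
print(rank_cells(incidents, cells, bandwidth=0.5))
# [(1.0, 1.0), (5.0, 5.0), (0.0, 0.0), (9.0, 9.0)]
```

The bandwidth controls smoothing: a small value highlights tight clusters, while a large value spreads density over wider areas, so it is usually tuned to the spatial scale at which police resources are deployed.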
Another study used the Gamma test to train an Artificial Neural Network (ANN) for predicting crime hotspots in Bangladesh. In [20], a data-driven machine learning algorithm was used to examine drug-related crime data in Taiwan and to forecast new crime hotspots based on broken windows theory, spatial analysis, and visualization techniques. In [21], researchers used a machine learning approach with OpenStreetMap (OSM) geospatial information to forecast different types of crime in the Province of Nova Scotia (NS), Canada, using a reverse geocoding strategy and a density-based clustering algorithm. A feature-level data fusion model based on a Deep Neural Network (DNN) was proposed in [22] for predicting crime in the City of Chicago; it was trained with spatial, temporal, environmental, and joint representation layers. Various crime prediction methods have been explored [23], and knowledge discovery in databases (KDD) techniques, which incorporate statistical modeling, machine learning, database management, and AI software, have been proposed as an effective tool for crime prevention.
A transfer learning framework that uses cross-domain urban datasets, including weather data, points of interest, human mobility data, and complaint data, has been proposed in [24]. In [25], a fully probabilistic algorithm was used to model the dependence of crime on demographic and environmental factors, population patterns, and spatial location in New South Wales (NSW), Australia. In a comparative study [26], WEKA was used to test the reliability and efficacy of linear regression, additive regression, and decision stump algorithms for crime prediction in Mississippi. The survey paper [27] reviews crime data mining techniques, including ANNs, decision trees, rule induction, nearest neighbor methods, and genetic algorithms.

IV. Research Objective

• To investigate whether a simple criminal database containing the geographic location and basic details of criminal activity has enough indicators to predict the type of crime.
• To analyze how accurately crime can be classified from geographic location and time.
• To explore different techniques to improve the results.

V. Significance of Research

Criminal activities take place all over the world, and law enforcement agencies have to deal with them effectively and efficiently. If enforcement agencies can anticipate the class of a crime in advance, this gives them a tactical advantage and helps them resolve cases faster. In addition, an overall study of criminal activity in a geographic area helps reveal the underlying patterns of crime in that area.

VI. RESEARCH METHODOLOGY

1. Machine Learning
Machine learning is a branch of artificial intelligence that uses data analysis to recognize patterns. A computer can learn from data and make predictions without being explicitly programmed. Machine learning can be divided into three main categories: supervised, unsupervised, and reinforcement learning. Supervised learning approaches are used in this research to predict types of crime.
2. Supervised Learning
Supervised learning is a machine learning approach that predicts an output from a set of inputs. In supervised learning, the output labels are known. Each input object consists of different features and is usually represented as a vector, and each input object in the training dataset is paired with a specific output object. A supervised learning algorithm uses the training data to build a predictive model, which is then applied to new data. Separating training and test data helps to detect and avoid overfitting of supervised learning models. The algorithm then predicts the labels of new observations. Supervised learning models can be applied to both classification and regression problems. For the crime dataset, the purpose is to predict the category of a crime incident at a certain time.
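The role of held-out test data can be shown with a small Python sketch. It is an assumption-level illustration, not this proposal's evaluation code: a reproducible shuffle-and-split helper, an accuracy function, and a deliberately overfitted "model" that only memorizes its training examples. Perfect training accuracy paired with poor test accuracy is the overfitting signal that separate test data reveals. The records and category names are hypothetical.

```python
import random

def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle records reproducibly and split them into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predict, labelled):
    """Fraction of (features, label) pairs that `predict` labels correctly."""
    return sum(predict(x) == y for x, y in labelled) / len(labelled)

# Hypothetical records: (features, crime category), one record per hour of day
records = [((h,), "night_crime" if h >= 18 or h < 6 else "day_crime") for h in range(24)]
train, test = train_test_split(records, test_fraction=0.25)

memory = dict(train)  # an overfitted "model" that only memorises its training data
memorizer = lambda x: memory.get(x, "unknown")
print(accuracy(memorizer, train))  # 1.0: perfect on data it has already seen
print(accuracy(memorizer, test))   # 0.0: fails on every unseen record
```

A fixed seed makes the split reproducible, so different models can be compared on exactly the same test records.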
3. Data Collection
The dataset used is the crime dataset for the province of Punjab available from the Pakistan Bureau of Statistics (PBS) [4]. It contains crimes reported in Punjab from 2012 to 2018, totalling 2,782,711 records, with features such as type, year, month, day, hour, location, latitude, and longitude, among others.

4. Data Preprocessing
The raw dataset described above contains null values, which are handled during preprocessing to produce the cleaned dataset shown in Fig. 1.

Fig. 1 - (a) Original dataset with null values; (b) preprocessed dataset
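A preprocessing step like the one in Fig. 1 could be sketched in Python as follows. This is hypothetical: the column names follow the features listed in Data Collection, but the exact file layout of the PBS data is an assumption and the sample rows are invented. The sketch drops rows with missing values, converts numeric columns, and maps each crime type to an integer class label.

```python
import csv
import io

# Columns assumed to be required; names follow the features listed in Data Collection
REQUIRED = ["type", "year", "month", "day", "hour", "latitude", "longitude"]

def load_clean_records(csv_text):
    """Parse CSV text, drop rows with a missing required field, convert numeric
    columns, and attach an integer class label for the crime type."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if any(not (row.get(col) or "").strip() for col in REQUIRED):
            continue  # drop rows with null or empty values
        records.append({
            "type": row["type"].strip(),
            "year": int(row["year"]), "month": int(row["month"]),
            "day": int(row["day"]), "hour": int(row["hour"]),
            "latitude": float(row["latitude"]), "longitude": float(row["longitude"]),
        })
    # Sorted so the same set of crime types always gets the same codes
    label_codes = {t: i for i, t in enumerate(sorted({r["type"] for r in records}))}
    for r in records:
        r["label"] = label_codes[r["type"]]
    return records, label_codes

# Hypothetical sample rows; two of them have missing values
sample = """type,year,month,day,hour,location,latitude,longitude
theft,2016,5,3,22,Lahore,31.52,74.35
burglary,2017,1,14,,Multan,30.20,71.47
robbery,2018,7,9,14,Faisalabad,31.42,73.08
murder,2015,2,11,3,Rawalpindi,,73.05
"""
records, label_codes = load_clean_records(sample)
print(len(records), label_codes)  # 2 {'robbery': 0, 'theft': 1}
```

Dropping incomplete rows is the simplest choice; with millions of records it is usually affordable, but imputing missing values is an alternative if whole crime types would otherwise be under-represented.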

5. Model Selection

Because the crime categories are discrete, this is a supervised classification problem. Several kinds of supervised classification models are considered:

• Gaussian Naive Bayes
• Linear Regression
• Decision Trees
• K-Nearest Neighbor
Two ensemble methods:
• AdaBoost
• Random Forest

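To show how the first model in this list works, here is a from-scratch Gaussian Naive Bayes sketch in Python. It is illustrative only, since a library implementation would normally be used in practice. It estimates a prior plus a per-feature mean and variance for each class, then picks the class with the highest log posterior, treating features as independent. The small `var_smoothing` constant, the feature choice (hour, latitude), and the data are assumptions.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Estimate each class's prior and per-feature Gaussian mean and variance."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    model = {}
    for label, rows in by_class.items():
        n_features = len(rows[0])
        means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
        # A small constant keeps variances positive (assumed smoothing scheme)
        variances = [sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + var_smoothing
                     for j in range(n_features)]
        model[label] = (len(rows) / len(X), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior under the naive independence assumption."""
    def log_posterior(label):
        prior, means, variances = model[label]
        total = math.log(prior)
        for xj, mu, var in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (xj - mu) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)

# Hypothetical features: (hour of day, latitude)
X = [(22, 31.50), (23, 31.52), (21, 31.49), (9, 30.20), (10, 30.22), (11, 30.18)]
y = ["theft"] * 3 + ["burglary"] * 3
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (20, 31.51)))  # theft
print(predict_gaussian_nb(model, (10, 30.21)))  # burglary
```

Working in log space avoids numerical underflow when many small probabilities are multiplied. Gaussian Naive Bayes suits continuous features such as hour and coordinates; categorical features such as location names would need a different likelihood.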
6. Proposed Work

Data Collection → Classification → Pattern Identification → Prediction → Visualization

Figure 2: The Proposed Method Flow Chart Overview

References
1. A. Bharati and RA.K. Sarvanaguru, "Crime prediction and analysis using machine learning," International Research Journal of Engineering and Technology (IRJET), vol. 5, no. 9, Sep. 2018.

2. H. Adel, M. Salheen, and R. Mahmoud, "Crime in relation to urban design. Case study: the
greater Cairo region," Ain Shams Eng. J., vol. 7, no. 3, pp. 925-938, 2016.

3. "Overall crime rate in Vancouver went down in 2017, VPD says," CBC News, Feb. 15, 2018. [Online]. Available: https://www.cbc.ca/news/canada/british-columbia/crime-rate-vancouver-2017-1.4537831. [Accessed: 09-Aug-2018].

4. Pakistan Bureau of Statistics, National Police Bureau, Ministry of Interior, 2019 (last updated). [Online]. Available: http://www.pbs.gov.pk/content/crimes-reported-type

5. J. Han, Data mining: concepts and techniques, Morgan Kaufmann, 2012.

6. R. Iqbal, M. A. A. Murad, A. Mustapha, P. H. Shariat Panahy, and N. Khanahmadliravi, "An experimental study of classification algorithms for crime prediction," Indian J. of Sci. and Technol., vol. 6, no. 3, pp. 4219-4225, Mar. 2013.

7. H. Chen, W. Chung, J. J. Xu, G. Wang, Y. Qin, and M. Chau, "Crime data mining: a general
framework and some examples," IEEE Computer, vol. 37, no. 4, pp. 50-56, Apr. 2004.

8. R. B. Freeman, "The economics of crime," Handbook of Labor Economics, vol. 3, pp. 3529–3571, 1999.

9. A. Bogomolov, B. Lepri, J. Staiano, N. Oliver, F. Pianesi, and A. Pentland, "Once upon a crime: towards crime prediction from demographics and mobile data," in Proc. of the 16th Intl. Conf. on Multimodal Interaction, ACM, Nov. 2014, pp. 427-434.

10. R. Iqbal, M. A. A. Murad, A. Mustapha, P. H. S. Panahy, and N. Khanahmadliravi, "An experimental study of classification algorithms for crime prediction," 2013.

11. M. A. Maloof, "Learning when data sets are imbalanced and when costs are unequal and unknown," in ICML-2003 Workshop on Learning from Imbalanced Data Sets II, vol. 12, pp. 2-1, Aug. 2003.

13. X. Wang, M. S. Gerber, and D. E. Brown, "Automatic crime prediction using events extracted from Twitter posts," in Social Computing, Behavioral-Cultural Modeling and Prediction, Springer, 2012, pp. 231–238.

14. M. Redmond and A. Baveja, "A data-driven software tool for enabling cooperative information sharing among police departments," European Journal of Operational Research, vol. 141, no. 3, pp. 660–678, 2002.

15. C. S. Sadhana, "Survey on predicting crime using Twitter sentiment and weather data," israce, 2015.

16. A. Bogomolov, B. Lepri, J. Staiano, N. Oliver, F. Pianesi, and A. Pentland, "Once upon a
crime: towards crime prediction from demographics and mobile data," Proc. of the 16th Intl.
Conf. on Multimodal Interaction, pp. 427-434, 2014.

17. H. Chen, W. Chung, J. J. Xu, G. Wang, Y. Qin, and M. Chau, "Crime data mining: a general
framework and some examples," IEEE Computer, vol. 37, no. 4, pp. 50-56, Apr. 2004.

18. M. Al Boni and M. S. Gerber, "Area-specific crime prediction models," 15th IEEE Intl.
Conf. on Mach. Learn. and Appl., Anaheim, CA, USA, Dec. 2016.

19. T. Beshah and S. Hill, "Mining road traffic accident data to improve safety: role of road-related factors on accident severity in Ethiopia," Proc. of Artificial Intell. for Develop. (AID 2010), pp. 14-19, 2010.

20. N. Mahmud, K. Ibn Zinnah, Y. Ar Rahman, and N. Ahmed, "CRIMECAST: a crime prediction and strategy direction service," IEEE 19th Intl. Conf. on Comput. and Inform. Technol., Dhaka, Bangladesh, Dec. 2016.

21. Y. L. Lin, L. C. Yu, and T. Y. Chen, "Using machine learning to assist crime prevention," IEEE 6th Intl. Congr. on Advanced Appl. Inform. (IIAI-AAI), Hamamatsu, Japan, Jul. 2017.

22. F. K. Bappee, A. S. Júnior, and S. Matwin, "Predicting crime using spatial features," Can. AI 2018: Advances in Artificial Intel., Lecture Notes in Comput. Sci., vol. 10832, pp. 367-373, Springer, Mar. 2018.

23. H. W. Kang and H. B. Kang, "Prediction of crime occurrence from multimodal data using deep learning," PLoS ONE, vol. 12, no. 4, Apr. 2017.

24. V. Grover, R. Adderley, and M. Bramer, "Review of current crime prediction techniques," Intl. Conf. on Innovative Techn. and Appl. of Artificial Intel., pp. 233-237, Springer, London, 2007.

25. R. Marchant, S. Haan, G. Clancey, and S. Cripps, "Applying machine learning to criminology: semi-parametric spatial-demographic Bayesian regression," Security Inform., vol. 7, no. 1, Dec. 2018.

26. L. McClendon and N. Meghanathan, "Using machine learning algorithms to analyze crime data," Mach. Learn. and Appl.: an Intl. J. (MLAIJ), vol. 2, no. 1, Mar. 2015.

27. S. Prabakaran and S. Mitra, "Survey of analysis of crime detection techniques using data
mining and machine learning," Nat. Conf. on Math. Techn. and its Appl. (NCMTA 2018), IOP J.
of Physics: Conf. Series, vol. 1000, 2018.
