
ISSN 2085-4579

Data Modeling for Big Data


M Misbachul Huda1, Dian Rahma Latifa Hayun1, Zhin Martun1
1Informatics Engineering, ITS, Surabaya, Indonesia
[email protected]

Accepted on 26 January 2015
Approved on 25 February 2015
Abstract—Today the rapid growth of the internet and the massive usage of data have led to increasing CPU requirements, higher velocity for recalling data, schemas for more complex data structure management, and concerns about the reliability and integrity of the available data. This kind of data is called Large-scale Data or Big Data. Big Data demands high volume, high velocity, high veracity and high variety. Big Data has to deal with two key issues: the growing size of datasets and the increasing complexity of data. To overcome these issues, today's research is devoted to the kinds of database management system that can be optimally used for big data management. There are two such kinds: relational database management systems and non-relational database management systems. This paper reviews these two kinds of database management system, including the description, advantages, structure and application of each DBMS.

Index Terms—Big Data, DBMS, Large-scale Data, Non-relational Database, Relational Database

I. INTRODUCTION

The rapid growth of the internet and the WWW has led to a vast amount of information available online. The widespread use of information technology has also led to a dramatic increase in data availability. More documents and data are produced and stored in the cloud or in some kind of database management system. These stored and produced data have more complex structures and need more storage, but also need to be processed and recalled fast. This kind of data is nowadays called Large-scale Data or Big Data [1, 31].

Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making [2]. The challenge of this definition is twofold: the first part is about cost-effective, innovative forms of information processing, and the second is enhanced insight and decision making [3]. Big data is fundamentally about applying innovative and cost-effective techniques to solve existing and future business problems whose resource requirements (for data management space, computation resources, and in-memory representation) exceed the capabilities of traditional computing environments as currently configured within the enterprise. The problem appeared in the early 2000s, when data volumes started skyrocketing and storage and CPU technologies were overwhelmed by the numerous terabytes of big data, to the point that Information Technology faced a data scalability crisis. The enterprise then went from being unable to afford or manage big data to lavishing budgets on its collection and analysis.

Against this background, scalable and distributed data management has been the vision of the database research community for more than three decades [4]. Research is devoted to the kinds of database management system that can be optimally used for big data management. There are two kinds of database management system: relational and non-relational. An example of a relational database management system is MySQL; examples of non-relational database management systems are key-value, document, and graph databases. This paper gives a general overview of the definition, description, advantages, data structure, and application of relational and non-relational database management systems.

The remainder of this paper is organized as follows: first, a review of the definition and characteristics of big data; then the approaches to big data analysis; next, the modeling of big data, its applications and a discussion of the topic; and finally the conclusion of the overview.
ULTIMA InfoSys, Vol. VI, No. 1 | Juni 2015 1



II. BIG DATA CHARACTERISTIC

A. The Definition of Big Data

Most definitions of big data focus on the size of data in storage. Size matters, but there are other important attributes of big data, namely data variety and data velocity. The three Vs of big data (volume, variety, and velocity) constitute a comprehensive definition, and they bust the myth that big data is only about data volume. In addition, each of the three Vs has its own ramifications for analytics [5]. An illustration is shown in Figure 1.

It is obvious that data volume is the primary attribute of big data. With that in mind, most people define big data in terabytes, sometimes petabytes. Big data can also be described by its velocity or speed; you may prefer to think of this as the frequency of data generation or the frequency of data delivery.

Fig 1. The 3 Vs of Big Data

B. Big Data Characteristic

Big Data is not only about large size, but also about the stream and the type of data, so it is important to define the characteristics of Big Data. The defined characteristics will be used to measure how well each database management system tackles the Big Data challenge. The characteristics are defined below.

1. Volume

According to the 2011 IDC Digital Universe Study, "In 2011, the amount of information created and replicated will surpass 1.8 zettabytes, growing by a factor of 9 in just five years" [6]. The scale of this growth surpasses the reasonable capacity of traditional relational database management systems, or even of typical hardware configurations supporting file-based data access. The rapid acceleration of data growth also pushes increasing data volumes into the network. Thus Big Data can be described by its volume, or size of data [3].

2. Velocity

Big data can also be described by its velocity or speed. There are two aspects to velocity, one representing the throughput of data and the other representing latency. Throughput represents the data moving through the pipes. Latency is the time interval between a stimulus, such as a request or a data recall, and the response [7].

3. Complexity/Variety

Nowadays, Data Warehouse technology is being rapidly introduced. Its purpose is to create meta-models that represent all the data in one standard format. The data is compiled from a variety of sources and transformed using ETL (Extract, Transform, Load) or ELT (Extract the data and Load it in the warehouse, then Transform it inside the warehouse). The basic premise was narrow variety and structured content. Big Data has significantly expanded our horizons, enabled by new data integration and analytics technologies. A number of call center analytics solutions seek to analyze call center conversations and their correlation with emails, trouble tickets, and social media blogs. The source data includes unstructured text, sound, and video in addition to structured data. A number of applications gather data from emails, documents, or blogs.

4. Veracity

Most Big Data comes from sources outside our control and therefore suffers from significant correctness or accuracy problems. Veracity represents both the credibility of the data source and the suitability of the data for the target audience [7].

For example, if a company wants to collect product information from a third party and offer it to its contact center employees to support customer queries, the data would have to be screened for source accuracy and credibility. Otherwise, the contact centers could end up recommending competitive offers that might marginalize the company's own offerings and reduce revenue opportunities. The same applies to the suitability of the data for the user or audience.

5. Reliability

Reliability in big data is about the accuracy and completeness of computer-processed data, given the uses it is intended for. In the Big Data challenge, when a great deal of data must be processed in some way, the expected output is the one closest to the intention.

6. Extensibility

Extensibility is a system design principle in which the implementation takes future growth into consideration. Because of the rapid growth of data, Big Data leads to new challenges to overcome. Therefore, to accomplish the current and future goals of Big Data, the system must consider what is going to happen in the future.

7. Interoperability

The data available in the cloud or in the Big Data environment is going to be used together, interchanged, and interpreted. So, for a system to be interoperable, it must be able to exchange data and subsequently present that data such that it can be understood by a user [10]. In the Big Data area, it is essential to take a global approach to the interoperability and discoverability of information.

8. Scalability

Big Data can be considered a tsunami of information that has been steadily growing as a result of the expanding digital world. Nowadays, every single movement or activity of people is captured and transformed into digital data. In the end, Big Data is going to keep getting bigger, and more organizations are going to be looking to find out what to do with it [9].

9. Integrity

Instrumentation of data requires a complete understanding of the data and the need to maintain consistency of processing (if the data set is broken into multiple pieces), the need to integrate multiple data sets through the processing cycles to maintain the integrity of the data, and the need for complete associated computations within the same processing cycle. The instrumentation of transactional data has been a challenge considering the discrete nature of the data, and the magnitude of the problem amplifies with the increase in the size of the data. This problem has been handled in multiple ways within the RDBMS-based ecosystem for online transaction processing (OLTP) and data warehousing, but the solutions cannot be extended to the Big Data situation. So, one of the key points is how to deal with processing Big Data.

10. Flexibility

The growth of data fuels the flourishing of data types spread across the data universe. This creates another challenge: effectively and efficiently recalling the data. In some cases SQL has insufficient expressive power; to handle a more difficult or complex data structure and schema, the user must write very complex queries. So, it is necessary to leverage new programming language functionality to implement an object-relational mapping pattern. These programming environments allow developers to benefit from the robustness of DBMS technologies without the burden of writing complex SQL [8].

11. Fault tolerance

Managing large-scale data requires attention to performance. One performance point is handling faults that occur during the execution of a computation; for example, the system has to deal with disk failures. Therefore, a fault handling scheme is needed: if a unit of work fails, the system must automatically restart the task on an alternate node, so as not to waste time by restarting the entire query [8].

III. APPROACH OF BIG DATA ANALYSIS

There are two kinds of approach for big data analysis: Map Reduce and the parallel database management system.

A. Map Reduce

The Map Reduce programming model is designed to process large volumes of data in parallel by dividing the Job into a set of independent Tasks. The Job referred to here is a full Map Reduce program, which is the execution of a Mapper and Reducer across a set of data, while a Task is an execution of a Mapper or Reducer on a slice of data. So a Map Reduce job usually splits the input data set into independent chunks, which are processed by the map tasks in a completely parallel manner [11]. The simulation of task partitioning is shown in Figure 2.

Fig 2. Task Partitioning in Map Reduce [11]
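As a toy illustration of this Job/Task decomposition, the word-count sketch below runs the map, shuffle, and reduce phases in plain Python. The chunking and function names are invented for illustration; they are not taken from Hadoop or any particular framework.

```python
from collections import defaultdict

def mapper(chunk):
    # Map task: emit (word, 1) pairs from one independent chunk of input.
    return [(word, 1) for word in chunk.split()]

def reducer(word, counts):
    # Reduce task: aggregate all values emitted for one key.
    return (word, sum(counts))

def run_job(chunks):
    # Map phase: each chunk is processed independently
    # (in parallel, in a real system).
    emitted = [pair for chunk in chunks for pair in mapper(chunk)]
    # Shuffle phase: group intermediate pairs by key.
    groups = defaultdict(list)
    for word, count in emitted:
        groups[word].append(count)
    # Reduce phase: one reduce task per key.
    return dict(reducer(w, c) for w, c in groups.items())

print(run_job(["big data big", "data velocity data"]))
# {'big': 2, 'data': 3, 'velocity': 1}
```

Because each map task touches only its own chunk, a framework can place the tasks on whichever nodes hold the data, which is exactly the data-locality point made later in this paper.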


Map Reduce was originally designed for a largely different application: unstructured text data processing. Map Reduce (or one of its publicly available incarnations, such as the open source Hadoop) can nonetheless be used to process structured data, and can do so at tremendous scale; for example, Hadoop has been used to manage Facebook's 2.5 petabyte data warehouse. Unfortunately, as pointed out by DeWitt and Stonebraker [12], Map Reduce lacks many of the features that have proven invaluable for structured data analysis workloads, and its immediate gratification paradigm precludes some of the long term benefits of first modeling and loading data before processing. These shortcomings can cause an order of magnitude slower performance than parallel databases.

Despite that, because Map Reduce is designed to perform unstructured data analysis, unlike a DBMS, Map Reduce systems do not require users to define a schema for their data [13].

B. Parallel Database Management System

The background for having the parallel database management system is the widespread adoption of the relational database [14]. A parallel DBMS can be defined as a DBMS implemented on a multiprocessor computer. This includes many alternatives, ranging from the straightforward porting of an existing DBMS, which may require only rewriting the operating system interface routines, to a sophisticated combination of parallel processing and database system functions in a new hardware/software architecture. As always, there is a traditional trade-off between portability (to several platforms) and efficiency: the sophisticated approach is better able to fully exploit the opportunities offered by a multiprocessor, at the expense of portability [15]. The parallel DBMS is shown in Figure 3.

Fig 3. Architecture of a Parallel DBMS

Ideally, a parallel DBMS (and to a lesser degree a distributed DBMS) should demonstrate two advantages: linear scale-up and linear speed-up. Linear scale-up refers to sustained performance for a linear increase in both database size and processing and storage power. Linear speed-up refers to a linear increase in performance for a constant database size and a linear increase in processing and storage power [15].

C. The Differences between Parallel Database Management System and Map Reduce

At a glance, parallel DBMSs and Map Reduce have many common elements, but there are some basic differences. Parallel DBMSs require data to fit into the relational paradigm of rows and columns. In contrast, the MR model does not require that data files adhere to a schema defined using the relational data model; that is, the MR programmer is free to structure the data in any manner, or even to have no structure at all.

All modern DBMSs use hash or B-tree indexes to accelerate access to data. If one is looking for a subset of records, using a proper index reduces the scope of the search dramatically. Most database systems also support multiple indexes per table; thus, the query optimizer can decide which index to use for each query, or whether to simply perform a brute-force sequential search. Because the Map Reduce model is so simple, Map Reduce frameworks do not provide built-in indexes. To speed up access to the data, any indexes must be implemented inside the application. This is not easily accomplished, as the framework's data fetching mechanisms must also be instrumented to use these indexes when pushing data to running Map instances. Again, this is an acceptable strategy only if the indexes do not need to be shared between multiple programmers, despite requiring every Map Reduce programmer to re-implement the same basic functionality.
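The contrast drawn above between an index lookup and a brute-force sequential scan can be sketched with an ordinary Python dictionary standing in for a hash index. The sample records and attribute names are invented for illustration.

```python
# A tiny "table" of rows; in a real DBMS these would live on disk.
records = [
    {"id": 1, "dept": "IF"},
    {"id": 2, "dept": "EE"},
    {"id": 3, "dept": "IF"},
]

def seq_scan(rows, dept):
    # Brute-force sequential search: inspects every row.
    return [r for r in rows if r["dept"] == dept]

# Hash index on "dept": built once, after which lookups skip
# all rows outside the matching bucket.
index = {}
for r in records:
    index.setdefault(r["dept"], []).append(r)

def index_lookup(dept):
    return index.get(dept, [])

# Both strategies return the same answer; the index just does less work.
assert seq_scan(records, "IF") == index_lookup("IF")
```

The asymmetry described in the text is visible here: the index is a separate structure that somebody must build and maintain, which is exactly the work a Map Reduce programmer would have to re-implement by hand.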


IV. DATA MODELING FOR BIG DATA

A database model is a theory or specification describing how a database is structured and used. Several such models have been suggested, such as hierarchical, network, relational and non-relational [20]. Nowadays, relational database models are the dominant persistent storage technology. The relational database model has been dominant since the 80s [16], with implementations like Oracle Database [17], MySQL [18], and Microsoft SQL Server [19].

A. Relational Database Model

A relational database is a collection of data items organized in formally described tables from which data can be accessed or reassembled in many different ways. A relational database is a set of tables, referred to as relations, with data categories described in columns, similar to spreadsheets. Each row contains a unique instance of data for the corresponding data categories. While creating a relational database, a domain of possible values along with constraints is applied to the data. It is the relation between the tables that makes each a 'relation' table. Relational databases require few assumptions about how data will be extracted; as a result, the same database can be viewed in many different ways. Almost all relational databases use the Structured Query Language (SQL) to access and modify the stored data. SQL was originally based upon relational calculus and relational algebra, and is subdivided into elements such as clauses, predicates, queries and statements [21].

The advantages of the Relational Database Model are as follows.

• The data in the relational database model are mostly stored in the database, not in the application.

• The database is structured in a tabular form with highly related tables.

• It is quite simple to make changes to the database schema.

But the Relational Database Model does not support high scalability; up to a certain point better hardware can be employed using a parallel distributed management system, but when the amount of data becomes huge, the database has to be partitioned across multiple servers. The other disadvantage is that the structure of the relational database model gives rise to high complexity in cases where data cannot be easily encapsulated in a table [21].

B. Non-Relational Database Model

Nowadays, relational database models are the dominant persistent storage technology, but they have many shortcomings which can hinder performance. As more and more applications are launched in environments with massive workloads, such as cloud and web services, their scalability requirements change very quickly and also grow very large. This is difficult to manage with a relational database sitting on a single in-house server.

To solve these matters, vendors can turn to non-relational database models. Non-relational databases enjoy a schema-free architecture and possess the power to manage highly unstructured data. They can be easily deployed to multi-core or multi-server clusters, serving modularization, scalability and incremental replication. Non-relational databases, being extremely scalable, offer high availability and reliability even while running on hardware that is typically prone to failure, thereby challenging relational databases, where consistency, data integrity, uptime and performance are of prime importance [20, 21, 33].

The non-relational database model is unlike the relational database model: it does not guarantee the ACID properties [32]. Non-relational databases may primarily be classified, on the basis of the way they organize data, as follows.

1. Key Value Store

A key value store allows us to store schema-less data. This data consists of a key, which is represented by a string, and the actual data, which is the value in the key-value pair. The data can be any primitive of a programming language, such as a string, an integer or an array, or it can be an object. Thus it loosens the requirement of formatted data for storage, eliminating the need for a fixed data model [21].

Fig 4. Key Value Store Structure

2. Document Store

A Document Store supports more complex data than the key-value stores. The meaning of "document" here is not a document like a Microsoft Word file or similar; it refers to any kind of pointer-less object. This kind of non-relational database supports secondary indexes and multiple types of object.
Nowadays, relational database models are the types of object.


Fig 5. Document Store Structure

Document Stores are schema-less. They provide a mechanism to query collections based on multiple attribute-value constraints [22]. A Document Store is good for storing and managing text documents, email messages, and XML documents. It is also good for storing semi-structured data [23].

3. Graph Store

The other approach to storing data is to model the database directly and entirely as a graph. Big Data has to deal with two key issues: the growing size of the datasets and the increasing complexity of data. Alternative database models such as graph databases are more and more used to address this second problem [26].

A graph model is one whose single underlying data structure is a labeled directed graph. The Graph Store consists of a single digraph [24]. A database schema in this model is a directed graph, where leaves represent data and internal nodes represent connections between the data. Directed labeled graphs are used as the formalism to specify and represent database schemas, instances, and rules [25].

The reason for using a graph database is the requirement of the application itself. In a Graph Store, the interconnectivity or topology of the data is more important than, or at least as important as, the data itself. The advantage of using a Graph Store is that it leads to more natural modeling, because the graph structure is visible to the user. A graph can keep all the information about an entity in a single node and show related information by arcs connected to it. Queries can refer directly to this graph structure, and explicit graphs and graph operations allow a user to express a query at a very high level [25]. As far as the implementation is concerned, a Graph Store may provide special storage structures for the representation of graphs and the most efficient graph algorithms available for realizing specific operations [27]. Although the data may have some structure, the structure is not as rigid, regular or complete as in a traditional DBMS. The illustration of a Graph Store is shown in Figure 6.

Fig 6. Graph Store Structure

4. Column-oriented Database (COD)

A column-oriented database stores data in column order, not in row order as in a traditional relational database [28]. Regarding the join algorithm, a COD is better than a row-oriented relational DB: with a column store architecture, a DBMS need only read the values of the columns required for processing a given query, and can avoid bringing irrelevant attributes into memory [29].

Column stores have the advantage that dictionary entries may encode multiple values at once [13]. Data stored in columns is more compressible than data stored in rows, since compression algorithms perform better on data with low information entropy [30].

For example, consider a database containing information about students with the attributes name, registration number, address, and department. Storing the data in columns allows all of the names to be stored together, and likewise all of the registration numbers. Further, if the data is sorted by one of the columns, that column will be super-compressible (for example, runs of the same value can be run-length encoded). Of course, this observation only immediately affects the compression ratio; it leads to cheaper disk space.

Fig 7. Column-oriented Database Structure
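The Graph Store idea described in this section, entity data kept in a node and related information reached over labeled arcs, can be sketched with a plain adjacency structure. The node names and arc labels below are invented for illustration.

```python
# Each node holds its entity data plus outgoing labeled arcs.
graph = {
    "ani":  {"data": {"dept": "IF"}, "arcs": {"advises": ["budi"]}},
    "budi": {"data": {"dept": "IF"}, "arcs": {"advises": []}},
}

def neighbors(node, label):
    # Follow one labeled arc type out of a node; this is the query
    # referring "directly to the graph structure" rather than joining tables.
    return graph[node]["arcs"].get(label, [])

print(neighbors("ani", "advises"))  # ['budi']
```

A query like "whom does ani advise" is a single arc traversal here, whereas a relational schema would express the same topology through foreign keys and a join.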

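The run-length encoding of a sorted column mentioned in the column-oriented discussion above can be sketched as follows; the department column is a made-up sample.

```python
from itertools import groupby

# A sorted column of the hypothetical "department" attribute
# from the student example.
department = ["EE", "EE", "IF", "IF", "IF", "SI"]

def rle_encode(column):
    # Collapse each run of equal values into a (value, run_length) pair.
    return [(value, len(list(run))) for value, run in groupby(column)]

def rle_decode(pairs):
    # Expand the pairs back into the original column.
    return [value for value, n in pairs for _ in range(n)]

encoded = rle_encode(department)
print(encoded)  # [('EE', 2), ('IF', 3), ('SI', 1)]
assert rle_decode(encoded) == department
```

Six stored values collapse to three pairs here; on a real sorted column with millions of rows and few distinct values, the ratio is far more dramatic, which is the "super-compressible" effect the text describes.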

V. COMPARISON OF RELATIONAL AND NON-RELATIONAL DATABASE MODEL
Figure 8 shows the Database Engine survey done from January 2013 to November 2013. From Figure 8, the points that can be taken are as follows: relational databases hold a 90.8% share and non-relational databases a 9.2% share.

The given data shows that relational databases are more popular than non-relational databases. Users are more familiar with the relational database, so they use it more than the non-relational database model. But as a newcomer, the non-relational database shows a good share.

Fig 8. Relational and Non-Relational Database Popularity Ranking [34]

The 9.2% share of non-relational databases can be divided into four specific parts, as shown in Figure 9. These four parts are:

1. Document Store 39.13%

2. Key-Value 22%

3. Wide Column/column-oriented 17.39%

4. Graph and other databases 21.74%

Fig 9. Non-Relational Database Popularity Chart [34]

Having compared the numbers of users of relational and non-relational databases, we now discuss the comparison of the relational and non-relational databases themselves. We used 32 parameters for the comparison, to define how good the databases are. The parameters used in the comparison are as follows.

1. Database Model

The database model represents the logical structure of a database. This model is used to store, organize, and manipulate the data. Examples of database models are relational, document, key-value and graph.

2. Integrity Model

An integrity model is a computer security rule describing a set of access control rules designed to ensure data integrity. Integrity has three purposes: the first is preventing data modification by unauthorized users; the second is preventing illegal data changes by authorized users; and the last is keeping the data consistent [35].

3. Embeddable

Embeddable is the capability of being embedded in some devices. Some databases have the ability to be embedded in specific hardware and software. This can increase the amount of available resources to expand the database.

4. Query Language

A query language is an interface for communicating with the database. This interface can be application-to-database or database-to-database, including a remote database reached through a medium like the internet.

5. Isolation

A transaction in the database cannot be seen by another transaction; isolation is the feature that provides this property. The database has the isolation function in its data model to maintain high data integrity.
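Parameter 4, the query language, is the easiest to make concrete. The sketch below uses Python's built-in sqlite3 module purely as a convenient relational engine for illustration (the paper's own relational example is MySQL); the student table and its rows are invented.

```python
import sqlite3

# An in-memory relational table with a handful of rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (reg INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ani", "IF"), (2, "Budi", "EE"), (3, "Cici", "IF")],
)

# A query built from SQL's elements: a SELECT clause, a WHERE
# predicate, and an ORDER BY clause.
rows = conn.execute(
    "SELECT name FROM student WHERE dept = ? ORDER BY name", ("IF",)
).fetchall()
print(rows)  # [('Ani',), ('Cici',)]
conn.close()
```

The same declarative interface works whether the engine is embedded, as here, or reached over the network, which is the spectrum this parameter captures.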

6. Horizontal Scalable control, and incorporate output in Map Reduce
completely hidden from the user application
Horizontal scalable is an ability for resource within the framework. In fact, most applications
growth and development. The development and can be implemented in the Map Reduce parallel
the growth of the resource are used to work up the during synchronized and shared global state is not
performance of a big data application. Horizontal required
scalable is devoted to arisen amount of resource
usage. 14. TTL for Entries
7. Replication TTL for entries is the ability for limiting time
process in data changing. If the limitation is
Replication is reduplication process and exceeded, the transaction will be cancelled. TTL
database object maintenance in databases at for entries can clear up the long transaction and
distributed database system. The replication is keep the DBMS condition to be not so busy.
used to create a secondary data or data backup,
so that the user can take the data from distributed 15. Secondary Indexes
database system by spending less cost.
The secondary index represents a set of
8. Replication Mode attribute in an entity. The secondary index can be
used for query from some attributes to work out
Replication is a part of node client definition the performance. Every entity can have 0, 1, 2,
indicates whether the node client is managed to or more secondary index according to the query
receive or to send data replication. Besides that, needed.
the replication mode is also used to indicate data
synchronization at first replication process. 16. Composite Keys
9. Sharding Composite Key is a key that consists of 2 or
more attributes that identify an entity. This key
Sharding is splitting the collection into smaller is built for describing more specific entity from
part that is called as chunks. Chunks, then spread some attributes.
to cluster servers called Shard. Every shard is
responsible for the stored subset of the data.

10. Shared Nothing Architecture

A shared nothing architecture is a distributed computation architecture in which every node is independent of the other nodes. The nodes share no memory or disk storage; every node has the memory and disk storage it needs.

11. Data types

A data type is the kind of data that can be stored in the database. In this attribute, the data type means the storage data type, or in other words the most primitive data in a database.

12. Graph Support

A graph is a set of entities connected by a number of references, and the references used may form cycles. Graph support is the ability of the database to handle such cyclic references.

13. Map and reduce

Map Reduce is a parallel programming model for processing large amounts of data in a computer cluster. Map Reduce is based on the scale-out principle: it involves a large group of computers. The main point of using Map Reduce is to move the computation to the data nodes, rather than bringing the data to the computation nodes, and thus fully utilize the advantage of data locality. The code that divides the work, it provides

17. Geospatial Indexes

A spatial data type is different from the common data types: a geospatial data type is an encoding of spatial vector data. The ability to index geospatial data gives the database more value for GIS applications.

18. Query cache

A query cache is used to speed up data fetching for previously issued queries. With this ability, every query and its result are stored for a certain duration, and every repeated query is answered from the cache as long as the underlying data has not changed.

19. Data Storage

Data in a database is stored on a storage medium, which can be memory or a file system. The data store is the storage medium of a database, and every database has a different data store.

20. Conditional Entry Update

This feature has 'where' and 'when' clauses: the ability to change the data only when a certain condition holds.

21. Unicode

Unicode is an industry standard designed to allow text and symbols from all of the world's writing systems to be shown and manipulated consistently in a computer. Unicode guarantees that the data is treated consistently on other platforms and in other applications.

ULTIMA InfoSys, Vol. VI, No. 1 | Juni 2015
ISSN 2085-4579
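The query cache and conditional entry update attributes described above can be sketched in a few lines of Python. This is a minimal illustration of the two behaviours, not the API of any particular database; the `TinyStore` class and its methods are invented for the example.

```python
class TinyStore:
    """Minimal key-value store illustrating a query cache and conditional updates."""

    def __init__(self):
        self.data = {}
        self.cache = {}  # query key -> cached result

    def query(self, key):
        # Serve repeated queries from the cache while the data is unchanged.
        if key not in self.cache:
            self.cache[key] = self.data.get(key)
        return self.cache[key]

    def put(self, key, value, where=None):
        # Conditional entry update: apply the write only if the
        # 'where' predicate holds for the current value.
        current = self.data.get(key)
        if where is not None and not where(current):
            return False
        self.data[key] = value
        self.cache.clear()  # any change invalidates cached query results
        return True


store = TinyStore()
store.put("visits", 1)
print(store.query("visits"))                           # 1 (now cached)
store.put("visits", 2, where=lambda v: v == 1)         # succeeds: condition holds
print(store.put("visits", 9, where=lambda v: v == 1))  # False: condition fails
print(store.query("visits"))                           # 2
```

Note that the sketch invalidates the whole cache on every successful write, which is roughly how a database-wide query cache behaves; a real engine would invalidate more selectively.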
22. Compression

Compression is a database ability used to reduce the data to a smaller size. The advantage is a smaller disk footprint: better compression reduces disk usage.

23. Atomicity

Atomicity means that the simplest atomic activity must either be completed at the right time or cancelled; an atomic activity cannot be executed partially.

24. Consistency

Consistency is the condition that one part of the retrieved data is consistent with the other parts. Consistency preserves data integrity.

25. Durability

Durability is the ability to preserve the data despite a system failure. This ability concerns the stored data rather than the active system that handles the data.

26. Transactions

A transaction is an activity or group of activities used by a user or application to access and modify the data in the database. The activities are not stored before the commit command is given, and they can be revoked using the rollback command. The database does not allow the data to be changed by others while the user or application is modifying it.

27. Referential Integrity

Referential integrity is a way to keep consistency between correlated data models; with this ability the correlated data is ensured to remain consistent. It is commonly found in relational databases and graph databases.

28. Revision Control

Revision control is a technique used to protect the system from backdoors. A revision control system stores the configuration in the database, so every single change is noted. All configuration changes are stored in a directory so they can be observed.

29. Locking Model

The locking model is the ability to lock the model when changing data. The lock is released when the commit or rollback command is given. A locking model is very useful for long-running transactions in a database.

30. Full Text Search

Full text search is a document search done by the computer by browsing the entire content of the documents, looking for the documents that contain the words of the user's query. The query results are ranked by word relevance, commonly sorted by frequency from high to low. The document browsing uses only basic string-matching operations, without any other algorithmic operation [36, 37].

31. Active

An active database is a database with the ability to perform computation. Active databases dispel the perception that a database is only a place to store data. An example of a database that can be called active is one with stored procedures.

32. Value Size Max

The maximum data size handled by the database.

Table 1 shows some example databases representing each database model. These databases will be compared using the attributes explained above. The attributes have been classified based on the characteristics of big data; the classification is made to show the quality of a database against the Big Data challenges.

Table 1. Case Study Application Database

No | Model               | Database Name    | Ranking
1  | Relational Database | Oracle           | 1
2  | Relational Database | Postgre          | 4
3  | Document Store      | CouchDB          | 19
4  | Document Store      | MongoDB          | 6
5  | Wide Column Store   | Hbase            | 17
6  | Wide Column Store   | Cassandra       | 10
7  | Key Value Store     | Oracle Coherence | 51
8  | Key Value Store     | Redis            | 13
9  | Graph Store         | Neo4j            | 24
10 | Graph Store         | Titan            | 74

Table 2 shows the comparison between the relational databases and the non-relational databases using the attributes explained above. From the table it can be concluded that the non-relational databases have more complete attributes than the relational databases. For example, as can be seen in Table 2, some non-relational databases have more attributes than the relational databases, such as the Horizontal Scalable attribute: all of the non-relational databases given in the table have this attribute, but one of the relational databases,


the object-relational one, lacks this attribute. Another proof is that the relational databases lack the Map and Reduce attribute, which can accelerate the computation process, unlike the non-relational databases, almost all of which have it.
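The Map and Reduce model mentioned above can be illustrated with a single-process word-count sketch in Python. In a real cluster the map phase would run on the nodes that store the data (data locality) and the shuffle would move only the emitted key-value pairs; the function names here are invented for the example.

```python
from collections import defaultdict

def map_phase(document):
    # Would run on the node that stores the document: emit (word, 1) pairs locally.
    return [(word, 1) for word in document.split()]

def reduce_phase(key, values):
    # Combines all counts emitted for one key.
    return key, sum(values)

def map_reduce(documents):
    grouped = defaultdict(list)
    for doc in documents:                  # each doc would live on its own data node
        for key, value in map_phase(doc):
            grouped[key].append(value)     # shuffle: group emitted values by key
    return dict(reduce_phase(k, v) for k, v in grouped.items())

counts = map_reduce(["big data", "big cluster", "data locality"])
print(counts)  # {'big': 2, 'data': 2, 'cluster': 1, 'locality': 1}
```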

Table 2. Comparison of some relational and non-relational databases on a few basic attributes: scalability, variety, velocity, veracity, volume. Values are listed in the column order: Oracle, Postgre, CouchDB, MongoDB, Hbase, Cassandra, Oracle Coherence, Redis, Neo4j, Titan.

Database Model: Relational | Object-Relational | Document-Stored | Document-Stored | Column-oriented | Column-oriented | Key-value | Key-value | Graph-Oriented | Graph-Oriented

Scalability
Query language: SQL | SQL, HTTP, SparQL, XQuery, XPath | API calls, REST, JavaScript, Erlang | API calls, JavaScript, REST | API calls, REST, Thrift, Java API | API calls, CQL, Thrift | API calls, CohQL, XML | API calls, Lua | API calls, Cypher, Tinkerpop, Gremlin, REST | API calls, Tinkerpop, Gremlin, REST
Horizontal Scalable: Yes | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Replication Mode: Master-Slave and Multi-Master Replication | Master-Slave Replication | Master-Master Replication | Master-Slave-Replica Replication | Master-Slave Replication | Symmetric (Master-Master) Replication | Master-Slave-Replica Replication | Master-Slave Replication | Master-Slave Replication | Master-Slave Replication
Sharding: No | Yes | No | Yes | Yes | Yes | Yes | No | No | Yes
Shared Nothing Architecture: No | Yes | No | Yes | Yes | Yes | Yes | Yes | No | Yes

Variety
Data types: Binary | Binary | JSON | BSON | Binary | Binary | Binary | Binary | Binary | Binary
Graph Support: Yes | Yes | No | Yes | No | No | Yes | No | Yes | Yes

Velocity
Map and reduce: No | No | Yes | Yes | Yes | Yes | Yes | No | No | No
Replication: Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
TTL for entries: Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | No | No
Secondary Indexes: Yes | Yes | Yes | Yes | No | Yes | Yes | No | Yes | Yes
Composite keys: Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes
Geospatial Indexes: Yes | Yes | Yes | Yes | No | No | No | No | Yes | Yes
Query Cache: Yes | Yes | No | Yes | No | Yes | Yes | Yes | Yes | Yes

Veracity
Data Storage: ASM, File System | File System, Volatile memory | File System | File System | HDFS | File System | File System, Volatile memory | Volatile memory, File System | File System | Berkeley DB, Cassandra, Hadoop
Conditional entry updates: Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No
Isolation: Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Unicode: Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Compression: Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes
Atomicity: Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Consistency: Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Durability (data storage): Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Transactions: Yes | Yes | Yes | No | No | No | Yes | Yes | Yes | Yes
Referential integrity: Yes | Yes | No | No | No | No | Yes | No | Yes | Yes
Revision control: No | No | Yes | No | Yes | Yes | Yes | No | No | No
Locking model: Lock on Model | MVCC | MVCC, Optimistic Locking | Lock on write | MVCC | No | Explicit locking | Lock Free Model | Lock on write | Distributed Locking
Full Text Search: Yes | Yes | No | Yes | No | No | Yes | No | Yes | Yes
Integrity model: ACID | ACID | MVCC, Eventual consistency | BASE | Log Replication | BASE, Log Replication | ACID | ACID | ACID | ACID

Volume
Value size max.: 4GB | 1GB | 4GB | 500000GB | 2000GB | 2GB | 64000GB | 0.5GB | 4GB | 64GB
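The Transactions attribute compared in Table 2 (changes are not stored before a commit and can be revoked with a rollback) can be demonstrated with Python's built-in sqlite3 module; SQLite is used here only as a convenient stand-in for the databases in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
conn.commit()

# A transaction that is rolled back leaves no trace in the database.
try:
    conn.execute("UPDATE accounts SET balance = balance - 200 WHERE name = 'alice'")
    (balance,) = conn.execute(
        "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()
    if balance < 0:
        raise ValueError("insufficient funds")
    conn.commit()
except ValueError:
    conn.rollback()  # revoke the uncommitted update

(balance,) = conn.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()
print(balance)  # 100: the failed transaction was not stored
```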

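Among the locking models compared in Table 2, optimistic locking can be sketched as a record that carries a version number: a write is accepted only if the version has not changed since the writer read it. The `VersionedRecord` class below is an invented illustration, not any database's actual API.

```python
class VersionedRecord:
    """Optimistic locking sketch: writers must present the version they read."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def write(self, new_value, expected_version):
        # Reject the write if another writer committed in the meantime.
        if expected_version != self.version:
            return False
        self.value = new_value
        self.version += 1
        return True


record = VersionedRecord("draft")
value, version = record.read()                    # two clients both read version 0
assert record.write("client A edit", version)     # first writer wins
assert not record.write("client B edit", version) # stale version: write rejected
print(record.value)  # 'client A edit'
```

A rejected writer would typically re-read the record and retry, which avoids holding locks during long transactions.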
VI. CONCLUSION

Big data is real today. Volume, velocity, variety, veracity, and scalability are challenges that must be resolved by the database. A variety of modeling approaches, both relational and non-relational, have been used to try to overcome the problems of big data. From the data shown in Section V it can be concluded that the relational databases have high popularity, but in the big data case studies the non-relational databases have better attributes to satisfy the criteria of big data. In this paper it has been shown that the non-relational databases have attributes more appropriate for resolving the big data problem.

References

[1] Cubrid. Database Technology for Large Scale Data. Accessed 29th December 2013. Available: http://www.cubrid.org/blog/dev-platform/database-technology-for-large-scale-data/
[2] Gartner's IT Glossary. Accessed 29th December 2013. Available: http://www.gartner.com/it-glossary/big-data/
[3] D. Loshin. Big Data Analytics: From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL and Graph. USA: Elsevier, 2013.
[4] D. Agrawal, S. Das, and A. El Abbadi. Big Data and Cloud Computing: Current State and Future Opportunities. ACM, March 2011.
[5] P. Russom. Big Data Analytics. IBM: The Data Warehousing Institute, 2011.
[6] IDC Digital Universe Study: Extracting Value from Chaos. Accessed 29th December 2013. Available: http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm
[7] A. Sathi. Big Data Analytics. USA: MC Press Online, IBM, October 2012.
[8] A. Pavlo, E. Paulson, and A. Rasin. A Comparison of Approaches to Large-Scale Data Analysis. ACM, June 2009.
[9] Sand. Simple Scalability Key Big Data. Accessed 29th December 2013. Available: http://www.sand.com/simple-scalability-key-big-data/
[10] Himss. What is Interoperability. Accessed 28th December 2013. Available: http://www.himss.org/library/interoperability-standards/what-is
[11] L. Wang, J. Tao, R. Ranjan, H. Marten, A. Streit,


J. Chen, and D. Chen. G-Hadoop: MapReduce Across Distributed Data Centers for Data Intensive Computing. ACM, New York, NY, USA, 2012, pp. 739-750.
[12] M. Stonebraker, D. Abadi, D.J. DeWitt, S. Madden, E. Paulson, A. Pavlo, and A. Rasin. MapReduce and Parallel DBMSs: Friends or Foes? 2010.
[13] S. Harizopoulos, D. Abadi, and P. Boncz. Column-Oriented Database Systems. 2009. Available: www.cs.yale.edu/homes/dna/talks/Column_Store_Tutorial_VLDB09.pdf
[14] D.J. DeWitt and J. Gray. Parallel Database Systems: The Future of High Performance Database Processing. June 1992.
[15] M.T. Ozsu, P.. Distributed and Parallel Database Systems. Available: www.cs.uoi.gr/~pitoura/courses/ddbs03/paper-to-translate.pdf
[16] A.B.M. Moniruzzaman and S.A. Hossain. NoSQL Database: New Era of Databases for Big Data Analytics - Classification, Characteristics and Comparison. International Journal of Database Theory and Application, 2013.
[17] Oracle. Oracle Databases. Accessed 29th December 2013. Available: http://www.oracle.com/us/products/database/overview/index.html
[18] MySQL. MySQL Database. Accessed 29th December 2013. Available: http://www.mysql.com
[19] Microsoft. Microsoft SQL Server Databases. Accessed 29th December 2013. Available: http://www.microsoft.com/en-us/sqlserver/default.aspx
[20] U. Bhat and S. Jadhav. Moving towards Non-Relational Databases. International Journal of Computer Applications, 2010.
[21] N. Jatana, S. Puri, M. Ahuja, I. Kathuria, and D. Gosain. A Survey and Comparison of Relational and Non-Relational Database. International Journal of Engineering Research & Technology, August 2012.
[22] R. Cattell. Scalable SQL and NoSQL Data Stores. 2011.
[23] K. Orend. "Analysis and Classification of NoSQL Databases and Evaluation of their Ability to Replace an Object-relational Persistence Layer," Master Thesis, Technical University of Munich, Munich, 2010.
[24] M. Levene and G. Loizou. A Graph-Based Data Model and its Ramifications. IEEE Transactions on Knowledge and Data Engineering (TKDE), 7(5):809-823, 1995.
[25] R. Angles and C. Gutierrez. Survey of Graph Database Models. Technical Report TR/DCC-2005-10, Computer Science Department, Universidad de Chile, 2005.
[26] S. Jouili and V. Vansteenberghe. An Empirical Comparison of Graph Databases.
[27] R.H. Güting. GraphDB: Modeling and Querying Graphs in Databases. In Proc. of the 20th Int. Conf. on Very Large Data Bases (VLDB), pages 297-308. Morgan Kaufmann, September 1994.
[28] K.C. Kim and C.S. Kim. Parallel Processing of Sensor Network Data using Column-Oriented Databases. AASRI Conference on Parallel and Distributed Computing Systems, pp. 2-8, 2013.
[29] M. Stonebraker, D.J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. O'Neil, P. O'Neil, A. Rasin, N. Tran, and S. Zdonik. C-Store: A Column-oriented DBMS. Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005.
[30] D.J. Abadi, S.R. Madden, and N. Hachem. Column-Stores vs. Row-Stores. SIGMOD'08, Vancouver, BC, Canada, June 2008.
[31] L. Wang, M. Kunze, J. Tao, and G. von Laszewski. Towards building a cloud for scientific applications. Advances in Engineering Software, 42(9), pp. 714-722, 2011.
[32] T.A.M.C. Thantriwatte and C.I. Keppetiyagama. NoSQL Query Processing System for Wireless Ad-hoc and Sensor Networks. In Advances in ICT for Emerging Regions (ICTer), 2011 International Conference on, pp. 78-82. IEEE, September 2011.
[33] J. Han, E. Haihong, G. Le, and J. Du. Survey on NoSQL Database. Pervasive Computing and Applications (ICPCA), 6th International Conference, pp. 363-366, October 2011.
[34] A. Paul. RDBMS dominate the database market, but NoSQL systems are catching up. Accessed 28th December 2013. Available: http://db-engines.com/en/blog_post/23
[35] Z. Belal and A. Essam. The Constraints of Object-Oriented Databases. Int. J. Open Problems Compt. Math., Vol. 1, No. 1, June 2008.
[36] J. Beall. The Weaknesses of Full-Text Searching. The Journal of Academic Librarianship, September 2008.
[37] R.B. Yates and B.R. Neto. Modern Information Retrieval. Addison Wesley Longman Limited, 1999.
[38] M. Young. The Technical Writer's Handbook. Mill Valley, CA: University Science, 1989.
