
WHITEPAPER

Couchbase Under the Hood
An Architectural Overview

INTRODUCTION
Essential NoSQL requirements and features
Core design principles
Memory and network-centric architecture
Workload isolation
Asynchronous approach to everything
JSON DATA MODEL AND ACCESS METHODS
JSON data model
Flexible, dynamic schema
Document access methods
Key, values, and sub-documents
Keys
Values
Sub-documents
Key Couchbase concepts
Buckets
vBuckets
Nodes
Clusters
Services
COUCHBASE SERVICES
Data service and KV engine
Managed object cache
Document expiration
Memory management
Compression
Compression modes
Compaction
Mutations
Key-value data access
Query service
Index service for N1QL
Query optimization
Query consistency
Memory-optimized indexes (MOI)
Search service
Eventing service
Analytics
Mobile
DISTRIBUTED FOUNDATION
Node-level architecture
Cluster architecture
Cluster/node configuration
Cluster manager
Client connectivity
Topology-aware client
Data transport via Database Change Protocol (DCP)
Multi-Dimensional Scaling (MDS)
Homogeneous scaling model
Independent scaling model
Data distribution
Index partitions and replicas
Partitioning other services
Rebalancing the cluster
High availability
Intra-cluster replication
Node failover
Failover choices
Server group awareness
Cross Datacenter Replication (XDCR)
Conflict resolution
Security
Encryption at rest
Encryption over-the-wire
Moving data between nodes
Moving data between datacenters
Mobile client synchronization
Conflict resolution
RESOURCES
INTRODUCTION

Couchbase is a modern distributed, multimodel NoSQL database. Couchbase's core architecture supports a flexible JSON data model at its foundation and uses familiar relational and multimodel data access services to supply data to operational and analytic applications. Couchbase advantages include fast in-memory performance, easy scalability, mobile synchronization to, from, and among Couchbase Lite, always-on 24x365 availability, advanced security, and affordable cloud deployment alternatives. Couchbase can be accessed as a fully-managed database-as-a-service called Couchbase Capella, and also offers Kubernetes-managed containerized cluster deployments with its Cloud Native Database Automation product line. Couchbase also supports local installations of its Community and Enterprise edition binary packages.

As a multimodel database, Couchbase supports multiple data access methods within a dynamic data containment structure, on top of a flexible JSON document data format. Couchbase consolidates multiple data access layers and engines into a single platform that would otherwise require single-purpose databases to work together. This "polyglot persistent" design architecture was introduced in the early 2000s so that RDBMS and NoSQL databases could coexist in supplying data to applications. Couchbase provides the performance of a key-value powered caching layer, the flexibility of a JSON document-based dynamic source of truth, and the reliability of a relational database system of record. Couchbase eliminates the need to manage data models and consistency between multiple systems, learn different languages and APIs, and manage independent technologies.

This paper describes how the internal components of the Couchbase database (Capella, Server, and Mobile) operate with one another. It assumes you have a basic understanding of Couchbase and are looking for a deeper technical understanding of how things operate beneath the surface.

Essential NoSQL requirements and features

NoSQL databases evolved beyond enterprise relational databases to address performance and flexibility deficiencies made evident as applications became more sophisticated and "Big Data" became an industry-standard buzzword. Relational databases tend to operate primarily as systems of record, maintaining transactional data in a highly consistent manner. But several architectural principles (e.g., normalization of objects, adherence to fixed schema and data typing, single-node transactional design, two-phase commit) have made them difficult to modify after deployment and to scale to larger distributed workloads while simultaneously delivering responsive and highly available applications.

Pragmatic business needs for more advanced technical requirements have pushed multimodel NoSQL databases to the forefront. The business needs for high performance, application-driven flexibility over the makeup of data, distributed processing and mobility, and the overarching need to lower operational costs and escape vendor lock-in are key drivers as to why organizations seek out cloud-native NoSQL systems. These modern requirements have driven Couchbase's development from inception:

• Ensure high performance
• Provide data model and data access flexibility
• Support distributed cluster networks and mobility
• Provide incredible value and low TCO

Couchbase has been focused on setting a high standard in each of these areas. The result is a robust and accessible database platform with exceptional performance at scale, multimodel flexibility, and SQL familiarity delivered in both self-managed and fully-managed cloud-based product lines. Because of this, Couchbase has become a modern, multipurpose NoSQL database. Learn more about how Couchbase delivers on these core database requirements in the following table and later in the text of this paper.
Fast:
• Memory-first design
• Cloud-native scale
• Elastic cluster scaling
• Asynchronous sharding & rebalancing
• HA, DR & backup
• Geo-replication via XDCR
• Low latency Cloud 2 Edge
• High-density storage

Flexible:
• JSON Document
• Dynamic Schema
• Multimodel Services
• Multidimensional Scaling
• Mobile & Edge ready

Familiar:
• SQL++ query language
• ACID SQL Transactions
• Cost-based query optimizer
• SDKs for 12+ languages, including mobile

Affordable:
• Networks of distributed & mobile databases
• Predictable price/performance

Future-Proof:
• Fully-managed DBaaS w/o cloud lock-in
• Self-managed Kubernetes Autonomous Operator
• Deploy Anywhere clusters

Couchbase-Managed and Customer-Self-Managed Cloud Deployments

Fast:
• Memory-and-network-centric architecture, with an integrated cache delivering high throughput and sub-millisecond latency
• Asynchronous clusters
• Always-on, fault-tolerant design
• Microservices architecture with built-in auto-sharding, replication, and failover
• Isolated and independent scaling of workloads, with no downtime for upgrades or code changes

Flexible:
• Flexible JSON data model supports continuous delivery. Make schema changes without downtime
• Extract value using a broad set of multi-model data access capabilities (full-text search, real-time analytics, data streaming, change-data-capture, and Python- and JavaScript-based User-Defined Functions)
• Deliver and sync data to the edge and to mobile devices

Familiar:
• Includes foundational RDBMS concepts like schema, ACID transactions, and user-defined functions
• Leverage common SQL++ query patterns for joins, aggregations, and more
• Patented cost-based query optimizer
• SDKs for 12+ languages, including mobile
• Full-stack security with end-to-end encryption and role-based access control

Future-Proof:
• Fully-managed database-as-a-service without cloud vendor lock-in
• Self-managed Kubernetes-based cloud native database automation with Autonomous Operator
• Global deployment with low write latency using active-active cross-cloud replication
• Infrastructure-agnostic support across local, virtual machine, cloud, and containerized environments

The original multi-model NoSQL database

Couchbase was originally founded through the merger of two open source database companies, CouchOne and Membase. CouchOne employed developers of Apache CouchDB, an original, highly reliable document database, while Membase employed developers of memcached, a high-performance, memory-first, key-value database. The merger of these teams led to the design of Couchbase, a reliable, scalable, fast in-memory, key-value database with document-based access and storage. In this model, document identification "keys" store "value" data as a JSON document. Couchbase was the first of its kind, dual-model access database, setting the standard for advancing consolidation of single-access NoSQL datastores.

Couchbase further distanced itself from its origin sources by adding support for SQL++ (aka N1QL) as its primary query language. Today, multimodel convergence continues to grow in order to address the variety of functional demands from modern applications.

Unfortunately, many people still confuse Couchbase with CouchDB even though they have evolved along their own diverging paths and no longer resemble each other whatsoever.

Couchbase is an open source database company

Couchbase favors and supports the open source development model. The source code to the Community Editions of the Couchbase database and its mobile product line is available for non-commercial use under the Business Software License (BSL 1.1), which converts to the permissive Apache 2.0 license after four years. Software development kits (SDKs) for more than a dozen application and mobile programming languages are available as Apache 2.0 open source. Couchbase also maintains a robust library of open source projects at couchbaselabs.

These principles of speed, flexibility, familiarity, and affordability have been built into the very core of the database engine to ensure low latency and reliable, yet easy to manage, replication. Around this core are a set of data access services that run and scale independently of each other. These are delivered through a unified programming API, established security capabilities, and external technology integrations, and made available through fully managed and self-managed offerings, including Couchbase Capella, a fully-hosted database-as-a-service, and the Kubernetes-based Cloud Native Database Automation product line for self-managed deployments.

Core performance design principles

To effectively deliver the above features, three guiding principles have been followed when developing Couchbase: a memory- and network-centric architecture, workload isolation, and an asynchronous approach to everything.

Memory- and network-centric architecture for speed and low latency

• The most used data and indexes are transparently cached in memory for fast reads.
• Writes are performed in memory and replicated or persisted synchronously or asynchronously. Using transaction guarantees ensures consistency, but may introduce lags in performance.
• The internal Database Change Protocol (DCP) streams data mutations from memory to memory at network speeds to support replication, indexing, and mobile synchronization.

Multimodel data access blending JSON flexibility with key/value speed

Couchbase is a pioneer in offering multiple data access methods for read and update access to its foundational JSON and key/value storage structures. This type of NoSQL database is referred to as "multimodel" because many NoSQL systems have only one access method, which is bound to their physical storage design structures on disk to minimize access latency.

• Couchbase's original access methods are key/value, derived from the memcached-based design, and JSON, the emerged standard format for document databases like CouchDB, whose authors based the Couchbase design on their earlier work. While many developers may confuse the two, Couchbase and CouchDB are not the same; rather, Couchbase should be considered the unique offspring of the developers of CouchDB and memcached, sharing DNA from both.
• As it evolved, Couchbase has added multiple data access models, including a SQL++ query service, a Full-Text Search service, an Eventing service, an Analytics aggregation service, and a Backup service. In the Couchbase design, each of these access models can simultaneously utilize the cluster's data.

Workload isolation

• All databases perform different tasks in support of an application. These include persisting, indexing, querying, aggregating, and searching data. Each of these workloads has slightly different performance and resource requirements.
• Multi-Dimensional Scaling (MDS) isolates these workloads from one another at both a process and a node level.
• MDS allows these workloads to be scaled independently from one another and their resources to be optimized as necessary.
• MDS allows the database to be performance-matched to the needs of the application, and the database to its available infrastructure. For cloud deployments, it is advantageous from a cost perspective to "red-line" infrastructure instances before adding them, and to avoid idle and under-utilized node instances.
• Couchbase manages the topology, process management, statistics gathering, high availability, and data movement between these services transparently.

Asynchronous approach to everything

• Traditional databases increase latency and block application operations while running synchronous operations, for example, persisting data to disk or maintaining indexes.
• Couchbase allows write operations to happen at memory and network speeds while asynchronously processing replication, persistence, and index management.
• Spikes in write operations don't block read or query operations, while background processes persist data as fast as possible without slowing down the rest of the system.
• ACID transactions are available to the developer to ensure durability and consistency while data is in flight. Multiple transaction options allow the developer to decide when and where to increase latency in exchange for durability and consistency. Somewhat higher latency can be anticipated as multi-document and cross-collection transactions are implemented (see the sketch below).
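As one concrete illustration of these per-operation durability choices, here is a minimal sketch using the Couchbase Java SDK 3.x, assuming an already-connected collection; the bucket contents and key names are hypothetical.

Java:
// Sketch: a write that does not return until the mutation has been
// replicated to a majority of data-service nodes. Without the durability
// option, the SDK returns as soon as the active node has the mutation
// in memory (the asynchronous default described above).
import com.couchbase.client.core.msg.kv.DurabilityLevel;
import com.couchbase.client.java.Collection;
import com.couchbase.client.java.json.JsonObject;
import static com.couchbase.client.java.kv.UpsertOptions.upsertOptions;

void durableWrite(Collection collection) {
    JsonObject order = JsonObject.create()
        .put("status", "paid")
        .put("total", 129.95);
    collection.upsert("order::1001", order,
        upsertOptions().durability(DurabilityLevel.MAJORITY));
}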

JSON DATA MODEL AND ACCESS METHODS

This section outlines the foundational JSON data model handling in Couchbase, then introduces the multiple ways to access that data. These methods include basic key-value operations, SQL++ querying, full-text searching, real-time analytics, server-side eventing, and mobile application synchronization.

JSON data model

The JSON data model supports basic and complex data types: numbers, strings, nested objects, and arrays. JSON provides rapid serialization and deserialization, is native to JavaScript, and is the most common REST API data format. Consequently, JSON is extremely convenient for web application programming.

Couchbase stores data as individual documents, comprised of a key and a value. When the value is JSON-formatted, Couchbase provides rich access capabilities; when not, the document is stored as a binary BLOB and has more limited access characteristics.

A document often represents a single instance of an application object (or nested objects). It can also be considered analogous to a row in a relational table, with the document attributes acting similar to columns. Couchbase provides greater flexibility than the rigid schemas of relational databases, allowing JSON documents with varied schemas and nested structures. Developers may express many-to-many relationships without requiring a reference or junction table. Subcomponents of documents can be accessed and updated directly as well, and multiple document schemas can be aggregated together into a virtual table with a single query.

JSON document flexibility

In the Couchbase document model, a schema is the result of an application's structuring of its documents and their containment structures such as Buckets, Scopes, and Collections. Schemas are defined by application developers and managed by applications. This is in contrast to the relational model, where the database (and the database administrator) manages the schema. Couchbase created the Bucket-Scope-Collection-Document organizational hierarchy, explained below, to allow maximum flexibility in defining application data meta models. Notice how easily traditional RDBMS constructs map to those of Couchbase:

Couchbase        RDBMS
Bucket           Database
Scopes           Schema
Collections      Tables
Documents        Rows
Key/Value pairs  Cells

A single JSON document's structure offers even more flexibility for the developer beyond the dynamic nature of Scopes and Collections. A JSON document's structure consists of its inner arrangement of attribute-value pairs. For example, both of the following JSON document examples are valid data models that Couchbase can manage and query. How the documents are designed or updated over time is up to the application developer – normalized or denormalized, or a hybrid, depending on the needs and evolution of the application. Using JSON, the developer can avoid the lengthy schema design, testing, and deployment cycles of traditional RDBMS-based systems.

Normalized — 4 documents:

invoice1:
{
  "BillTo": "Lynn Hess",
  "InvoiceDate": "2018-01-15",
  "InvoiceNum": "ABC123",
  "ShipTo": "H. Trisler, 41 Oak Drive"
}

invoice1:item1:
{
  "InvoiceId": "1",
  "Price": "100",
  "Product": "Brake Pad",
  "Quantity": "24"
}

invoice1:item2:
{
  "InvoiceId": "1",
  "Price": "10",
  "Product": "Rotor",
  "Quantity": "5"
}

invoice1:item3:
{
  "InvoiceId": "1",
  "Price": "20",
  "Product": "Tire",
  "Quantity": "2"
}

Denormalized — 1 document:

invoice1:
{
  "BillTo": "Lynn Hess",
  "InvoiceDate": "2018-01-15",
  "InvoiceNum": "ABC123",
  "ShipTo": "H. Trisler, 41 Oak Drive",
  "Items": [
    { "Price": "100", "Product": "Brake Pad", "Quantity": "24" },
    { "Price": "10", "Product": "Rotor", "Quantity": "5" },
    { "Price": "20", "Product": "Tire", "Quantity": "2" }
  ]
}

Couchbase does not enforce uniformity: document structures can vary, even across multiple documents where each contains
a type attribute with a common value. This allows differences between objects to be represented efficiently. It also allows a
schema to progressively evolve for an application, as required – properties and structures can be added to the
document without other documents needing to be updated in the same way. This allows applications to
change their behavior without having to overhaul all the source data or take applications offline to make
a basic change.

Document access methods

Managing JSON data is at the core of Couchbase's document database capabilities, but there are several ways for applications to access the data. Each of these methods is described in further detail later in this paper, but the following provides a basic explanation and a coding example of each.

Key-value – An application provides a document ID (the key), and Couchbase returns the associated JSON or binary object. The inverse occurs with a write or update request.

Java:
GetResult myAirline = collection.get("airline_5209");

Query and Analytics – SQL-based query syntax to interact with JSON data, similar to relational databases; returns matching JSON results. Comprehensive DML, DQL, and DDL syntax supports nested data and non-uniform schema.

Python:
cluster.query(
    """SELECT fname, lname, age
       FROM default
       WHERE age > $age""",
    age=22,
)

Full-text search – Using text analyzers with tokenization and language awareness, a search is made for a variety of field and boolean matching functions. Search returns document IDs, relevance scoring, and optional context data.

.NET:
await cluster.SearchQueryAsync(
    "fts-index",
    new QueryStringQuery("sushi"),
    options => {
        options.Limit(10);
    }
);

Eventing – Custom JavaScript functions are executed within the database as data changes or based on timers, with support for accessing and updating data, writing out to a log, or calling out to an external system.

JavaScript:
function OnUpdate(doc, meta) {
    log('document', doc);
    doc["ip_num_start"] = ip_trim_func(doc["ip_start"]);
    tgt[meta.id] = doc;
}

In addition to the above server functions, data can also be synchronized with mobile applications. Couchbase Mobile is the end-to-end stack, comprised of Sync Gateway and Couchbase Lite, an embeddable instance of Couchbase. On a mobile device or embedded system, data is created, updated, searched, and queried whether online or offline. The data can then be synchronized with Couchbase Server and used by both mobile and server-based applications.

Couchbase Mobile – The Couchbase Lite SDK provides common create, update, and delete tasks as well as query, full-text search, and triggers on the device. Sync Gateway keeps data updated with a Couchbase Server database and other devices.

Java:
MutableDocument mutableDoc = new MutableDocument()
    .setString("version", "2.0")
    .setString("type", "SDK");
database.save(mutableDoc);

Swift (iOS):
let newDoc = MutableDocument()
    .setString("2.0", forKey: "version")
    .setString("SDK", forKey: "type")
try database.saveDocument(newDoc)

Key, values, and sub-documents
Keys and values are fundamental parts of JSON documents and have some limits that are important to understand.

Keys
Each value is identified by a unique key, or ID, defined by the user or application when the item is originally created. The key is
immutable: once the item is saved, the key cannot be changed.

Each key must be a UTF-8 string with no spaces. Special characters, such as (, %, /, ", and _, are acceptable, but the key may be no longer than 250 bytes and must be unique within its bucket.

Values
The maximum size of a value is 20 MB. Each document consists of one or more attributes, each of which has its own value. An
attribute’s value can be a basic type, such as a number, string, or boolean; or a complex type, such as an embedded document
or an array.

JSON documents can be parsed, indexed, and queried by other Couchbase services. A value can also be any form of binary,
though it won’t be parsed, indexed, or queried.

Sub-documents
A sub-document is an inner component of a JSON document. The sub-document API uses a path syntax to specify attributes
and array positions to read/write. This makes it unnecessary to transfer entire documents over the network when only partial
modifications are required.

Document key: user200

{
  "age": 21,
  "fav_drinks": {
    "soda": [ "fizzy", "lemon" ]
  },
  "addresses": [
    { "street": "pine" },
    { "street": "maple" }
  ]
}

Path examples and results:

age – a numeric value – result: 21
fav_drinks.soda – an array of strings – result: fizzy, lemon
fav_drinks.soda[0] – first string in the array – result: fizzy
addresses[1].street – string value in the second element of the array – result: maple
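A minimal sketch of the sub-document API with the Couchbase Java SDK 3.x, using the user200 document above; only the addressed paths travel over the network.

Java:
import java.util.Arrays;
import com.couchbase.client.java.Collection;
import com.couchbase.client.java.kv.LookupInResult;
import static com.couchbase.client.java.kv.LookupInSpec.get;
import static com.couchbase.client.java.kv.MutateInSpec.upsert;

void subdocOps(Collection collection) {
    // Read one inner field without fetching the whole document.
    LookupInResult result = collection.lookupIn("user200",
        Arrays.asList(get("addresses[1].street")));
    String street = result.contentAs(0, String.class); // "maple"

    // Update one inner field without rewriting the whole document.
    collection.mutateIn("user200",
        Arrays.asList(upsert("age", 22)));
}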

Key Organizing Concepts for Documents

Flexible, dynamic data containment model

Couchbase offers a flexible multi-level data containment and organization structure to organize documents, which helps optimize cluster performance and facilitate horizontal scaling. This data containment model consists of four levels – Buckets, Scopes, Collections, and Documents – and maps easily to the familiar RDBMS constructs of databases, schemas, tables, and rows. The model structure is further explained below:

• Buckets: The topmost container in Couchbase is the Bucket. One or many Buckets can be defined and assigned to a Couchbase cluster. A Bucket is the logical equivalent of a database in relational systems.

• vBuckets: Virtual Buckets are how Couchbase creates and uses its data location map, by segmenting a bucket into 1024 vBuckets. vBuckets are distributed across nodes in a cluster and replicated for availability and redundancy (a simplified sketch of this mapping follows this list).

• Ephemeral Buckets: Ephemeral buckets exist in memory only. This allows the database to support applications where data is processed, but not permanently persisted.

• Scopes: Scopes are an intermediate data organization structure similar to a relational database schema. Scopes are defined by the collections of documents that they contain or can access.

• Collections: Collections are categorical or logically organized groups of documents. The premise of Collections is to behave as traditional table structures. Most group-oriented access activities are processed at the Collection level, to minimize full-database operations, simplify replication logic, and streamline indexing options.
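The key-to-vBucket mapping can be pictured with a simplified sketch (illustrative only; the production hashing details may differ): hash the document key, take it modulo the number of vBuckets, and look that vBucket up in the cluster map to find the owning node.

Java:
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class VBucketSketch {
    static final int NUM_VBUCKETS = 1024; // fixed bucket segmentation

    // Illustrative mapping: key -> hash -> vBucket in 0..1023. The client
    // then consults the cluster map to find which node owns that vBucket.
    static int vbucketFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % NUM_VBUCKETS);
    }
}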

Buckets

Buckets hold scopes, collections, and JSON documents – these are the primary organizing structures in Couchbase. Applications connect to a specific bucket that pertains to their application scope, and query data by inquiring about documents within collections inside that scope. Memory quotas are managed on a per-bucket and per-service basis. Security roles are applied to users with various bucket-level, scope-level, collection-level, and document-level constraints.

In addition to standard Couchbase buckets, there are two specialized bucket types useful for different use cases. Ephemeral buckets do not persist data but allow highly consistent in-memory performance, without disk-based fluctuations. This delivers faster node rebalances and restarts. Memcached buckets also do not persist data. This is a legacy bucket type designed to be used alongside other database platforms specifically for in-memory distributed caching. Memcached buckets lack most of the core benefits of Couchbase buckets, including compression.

vBuckets

vBuckets are shards or partitions of data (and replicas) that are automatically distributed across nodes. Couchbase automatically segments buckets into 1024 vBuckets. They are managed internally by Couchbase and not interacted with directly by the user. The Couchbase SDK automatically distributes data and workloads across vBuckets, using this information as the data location map for the application.

Scopes

Scopes are a mid-level organizing structure in Couchbase that simplifies support for isolating data and access to that data, for the purposes of supporting concepts such as multi-tenancy or separating departmental access to sensitive data.

Collections

Collections offer a similar construct to relational tables, in that they contain documents that are alike. Collection-level processing enables Couchbase to isolate operations on smaller data sets, hence improving performance. Indexes are built and managed at the collection level, for example, making queries not only fast but portable. Data sharding and rebalancing is also a collection-level operation. Collections, like tables, can be joined and filtered.

Documents

Documents are the foundational construct of Couchbase and conform to JSON structural standards.

Cluster Design Concepts

Nodes

Couchbase nodes are physical or virtual machines that host single instances of Couchbase Server. Multiple instances of Couchbase Server cannot be installed on a node.

Clusters

A cluster consists of one or more nodes running Couchbase Server. Nodes can be added to or removed from a cluster. Replication of data occurs between nodes, and cross-datacenter replication occurs between different clusters that are geographically distributed.

Services

The core of Couchbase is the Data Service, which feeds and supports all the other systems and data access methods. There are multiple services that offer different types of data access or processing, including Query, Indexing, Backup, Full-Text Search, Analytics, and Eventing. A service is an isolated set of processes dedicated to particular tasks. For example, indexing, search, and querying are each managed as separate services. One or more services can be run on one or more nodes as needed.

COUCHBASE SERVICES

Couchbase implements the above data access methods through a set of dedicated services, with the Data Service at its center. Each service has its own resource quotas and, where applicable, related indexing and inter-node communication capabilities. This provides several very flexible methods to scale services when needed – not just scaling up to larger machines or scaling out to more nodes. Couchbase provides both options, as well as the ability to scale specific services independently from one another. Multi-Dimensional Scaling is covered in a later section, but it is the foundation for Couchbase workload isolation and scaling capabilities.

This is different from other platforms, where a monolithic set of services is installed on every node in a cluster. Instead, Couchbase uses a core data capability that then feeds all the other services. A shared-nothing architecture allows developer control over workload isolation. Small-scale environments can share the same workloads across one or more nodes, while higher scale and performance can be achieved with dedicated nodes to handle specific workloads – the ultimate in scale-out flexibility. The cluster can be scaled in or out and its service topology changed on demand with zero interruption or change to the application.

Applications communicate directly with each service through a common SDK that is always aware of the topology of the cluster and how services are configured. Developers do not have to know anything about how the services and nodes are configured; the SDK gets all the information it needs from the cluster manager. The same application can be deployed against a cluster of any size or configuration without changing its behavior. Having this knowledge built into the SDK results in reduced latency (direct application-to-database access), less complexity (no proxy or router components), better performance, and simplified auto-scaling.

The core data service handles all the document mutations and streams them to all the related services described below. The remainder of this section walks through each service and describes how they work on a node, in a cluster, and with each other. Later in this paper, the Distributed Foundation section will discuss inter-node connectivity, data flow, cluster topology, and data streaming.

Data Service and Key/Value Engine

The Data Service is the foundation for storing data in Couchbase and must run on at least one node of every cluster – it is responsible for caching, persisting, and serving data to applications and other services within the cluster. The principal component of the data service architecture is the key-value management system known simply as the KV Engine.

The KV Engine is composed of a multi-threaded, append-only storage layer on disk with a tightly integrated managed object cache. The cache provides consistent low latency for individual document read and write operations, and streams documents to other services via DCP.

Each node running the data service has its own KV Engine process and is responsible for persisting and caching a portion of the overall dataset (both active and replica).

Couchstore and Couchbase Magma, High Data-Density Storage

Couchbase has also introduced a new storage engine format, which is defined as buckets are created. Users may choose between the original Couchstore or the new high-density storage engine, Magma. High-density storage is the long-term preferred storage engine for the KV Engine. High-density storage has both compute and storage separation advantages: processing is up to 4x faster while utilizing up to 10x less memory, and storage capacity holds 3x larger data sets per node (from 3TB to 10TB per node). These advantages result in smaller, more affordable clusters, holding and processing more data with higher processing throughput.

Magma combines the performance of log-structured merge trees (LSM) with the compaction, reorganizability, and immutability of sorted string tables (SSTables) to provide a high-performance, well-organized, low-latency engine that suits write-heavy, low-latency point-lookup workloads. This design minimizes the disk space growth called "storage amplification", and the accompanying complexity, which occur when documents are heavily mutated without being reorganized.

Managed object cache

The managed object cache of each node hashes the document into an in-memory hash table based upon the document ID (key). The hash table stores the key, the value, and some metadata associated with each document. Since the hash table is in memory and lookups are fast, it offers a quick way of detecting whether the document exists in memory or not.

The cache is both read-through and write-through: if a document being read is not in the cache, it is fetched from disk, and write operations are written to disk after being first stored in memory.

Disk fetch requests are batched to the underlying storage engine, and corresponding entries in the hash table are filled. After the disk fetch is complete, pending client connections are notified of the asynchronous I/O completion so that they can complete the read operation.

Document expiration

Documents may also be set to expire using a time-to-live (TTL) setting. By default, all documents have a TTL of 0, meaning the document will be kept indefinitely. When you add, set, or replace a document, you can specify a custom TTL, at which time the document becomes unavailable and is marked for deletion (tombstone) to be cleaned up later (a code sketch follows at the end of this section).

Memory management

To keep memory usage of the cache under control, Couchbase employs a background task called the item pager. This pager runs periodically (to clean up expired documents) as well as being triggered based on high and low watermarks of memory usage. The high watermark is based on the memory quota for a given bucket and can be changed at runtime. When the high watermark is reached, the item pager scans the hash table, ejecting eligible (persisted) items that are not recently used (NRU). It repeats this process until memory usage falls below the low watermark.

Compression

End-to-end document compression is available across all features of the database using the open source Snappy library. Client capabilities, and the compression mode configured for each bucket, determine how compression will run. Data can optionally be compressed by the client (SDK) prior to writing into a bucket, within the memory of the bucket, and on disk. It is also compressed between nodes of the cluster and to remote clusters.

Compression modes

Data is always compressed on disk, but each client and bucket can control whether it is also compressed on the wire and/or in memory. The SDKs communicate whether they will be sending or requesting compressed documents, and the compression mode of the bucket determines what happens within the database. The modes are as follows:

Off – Documents are actively decompressed before storing in memory. Clients receive the documents uncompressed.

Passive – Documents are stored in memory either compressed or uncompressed, depending on how the client has sent them. Clients receive compressed documents if they are able to accept them, and uncompressed documents if they are not.

Active – Documents are actively compressed on the server, regardless of how they were received. Clients receive compressed data whenever it is supported by the client, even if the client originally sent uncompressed data.

Compaction

Couchbase writes all data that you append, update, and delete as files on disk. This can eventually lead to gaps in the data file, particularly when you delete data. You can reclaim the empty gaps in all data files by performing a process called compaction. For both data files and index files, perform frequent compaction of the files on disk to help reclaim disk space and reduce disk fragmentation. Auto-compaction is enabled by default for all buckets, but parameters can be adjusted for the entire cluster or for a specific bucket in a cluster.

Mutations

In Couchbase Server, mutations happen at a document level. Clients retrieve the entire document from the server, modify certain fields, and write the document updates back to Couchbase.
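To make the mutation and expiration model concrete, here is a minimal sketch using the Couchbase Java SDK 3.x, assuming an already-connected collection; the key, content, and 24-hour TTL are hypothetical.

Java:
import java.time.Duration;
import com.couchbase.client.java.Collection;
import com.couchbase.client.java.json.JsonObject;
import static com.couchbase.client.java.kv.UpsertOptions.upsertOptions;

void writeSessionWithTtl(Collection collection) {
    // The document becomes unavailable and is tombstoned for cleanup
    // 24 hours after this write; a TTL of 0 (the default) keeps it forever.
    JsonObject session = JsonObject.create().put("user", "lynn");
    collection.upsert("session::9f2a", session,
        upsertOptions().expiry(Duration.ofHours(24)));
}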

[Figure: Couchbase Server asynchronous architecture – an application writes a document into the managed cache on Node 1; the mutation then flows through the replication queue (to other nodes and clusters) and the disk queue (to disk) asynchronously.]

When Couchbase receives a request to write a document, the following occurs:

1. Every server in a Couchbase cluster has its own managed object cache. The client writes a document into the cache, and the server sends the client a confirmation. By default, the client does not have to wait for the server to persist and replicate the document, as that happens asynchronously.

2. The document is added into the intra-cluster replication queue to be replicated to other servers within the cluster.

3. The document is added into the disk-write queue to be asynchronously persisted to disk. The document is persisted to disk after the disk-write queue is flushed.

4. After the document is persisted to disk, it's replicated to other clusters using XDCR and eventually indexed.

Key-value data access

While Couchbase is a document database, at its heart is a distributed key-value (KV) store. A KV store is an extremely simple, schema-less approach to data management that, as the name implies, stores a unique ID (key) together with a piece of arbitrary information (value); it may be thought of as a hash map or dictionary. The KV store itself can accept any data, whether it be a binary blob or a JSON document, and Couchbase features such as the SQL++ query service make use of the KV store's ability to process JSON documents.

Due to their simplicity, KV operations execute with extremely low latency, often sub-millisecond. The KV store is accessed using simple CRUD (Create, Read, Update, Delete) APIs, and provides the simplest interface when accessing documents using their IDs.

The KV store contains the authoritative, most up-to-date state for each item. Query and other services provide eventually consistent indexes, but querying the KV store directly will always access the latest version of data. Applications use the KV store when speed, consistency, and simplified access patterns are preferred over flexible query options.

All KV operations are atomic, which means that Read and Update are individual operations. In order to avoid conflicts that might arise with multiple concurrent updates to the same document, applications may make use of Compare-And-Swap (CAS), which is a per-document checksum that Couchbase modifies each time a document is changed.
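A minimal sketch of CAS-based optimistic concurrency with the Couchbase Java SDK 3.x; the collection and key are hypothetical.

Java:
import com.couchbase.client.core.error.CasMismatchException;
import com.couchbase.client.java.Collection;
import com.couchbase.client.java.json.JsonObject;
import com.couchbase.client.java.kv.GetResult;
import static com.couchbase.client.java.kv.ReplaceOptions.replaceOptions;

void incrementAge(Collection collection) {
    GetResult current = collection.get("user200");
    JsonObject content = current.contentAsObject();
    content.put("age", content.getInt("age") + 1);
    try {
        // Succeeds only if no one else changed the document since the read.
        collection.replace("user200", content,
            replaceOptions().cas(current.cas()));
    } catch (CasMismatchException e) {
        // Another writer won the race: re-read and retry.
    }
}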
Query service

The query service is an engine for processing SQL++ (previously named N1QL) queries. It follows the same scalability paradigm that all the services use, allowing the user to scale query workloads independently of other services as needed.

Couchbase SQL++ (SQL for JSON) combines the flexibility of JSON with the expressive power of SQL, and is an original implementation of the SQL++ standard. SQL++ enables clients to access data from Couchbase using SQL-like language constructs. It includes familiar data definition language (DDL), data manipulation language (DML), and query language statements, but can operate in the face of NoSQL database features such as key-value storage, multi-valued attributes, and nested objects.
SQL++ provides a rich set of features that let users retrieve, manipulate, transform, and create JSON document data. Its key
features include a powerful SELECT statement that extends the functionality of the SQL SELECT statement to work with JSON
documents. Of particular importance are the USE KEYS, NEST, and UNNEST sub-clauses of the FROM clause in SQL++ as well as
the MISSING boolean option in the WHERE clause.

The following examples illustrate three sample queries – showing common SQL capabilities and the JSON responses.

Example 1 – SQL++ supports standard SELECT, FROM, WHERE, and GROUP BY clauses, as well as JOIN capabilities:

SELECT c.name, o.order_date
FROM customers AS c
LEFT OUTER JOIN orders AS o ON c.custid = o.custid
WHERE c.custid = "C41";

{
  "results": [
    {
      "name": "R. Duvall",
      "order_date": "2017-09-02"
    }
  ]
}

Example 2 – Use UNNEST to extract individual items from a nested JSON array:

SELECT o.orderno,
       i.itemno AS item_number,
       i.qty AS quantity
FROM orders AS o
UNNEST o.items AS i
WHERE i.qty > 100;

{
  "results": [
    { "orderno": 1002, "item_number": 680, "quantity": 150 },
    { "orderno": 1005, "item_number": 347, "quantity": 120 },
    { "orderno": 1006, "item_number": 460, "quantity": 120 }
  ]
}

Example 3 – Use the MISSING boolean keyword in the WHERE clause to adapt queries when a schema has changed or lacks specific keys:

SELECT o.orderno, SUM(o.cost) AS cost
FROM orders AS o
WHERE o.cost IS NOT MISSING
GROUP BY o.orderno;

{
  "results": [
    { "orderno": 1002, "cost": 220 },
    { "orderno": 1005, "cost": 623 }
  ]
}

ACID Transactions in SQL++


Couchbase supports the ability to define ACID transactions within SQL++. Transactions can be applied to one or more
documents, and can span one or more collections and across cluster nodes. Due to the guarantees required by ACID,
transactions may execute more slowly than non-transactional queries in Couchbase, but this flexibility is available to the
developer. Transactions in SQL++ have adopted a near identical syntax to SQL for relational databases.

Compare SQL transactions to Couchbase SQL++. In Couchbase 7, the transaction statements are identical to those of a typical RDBMS:

START TRANSACTION;

UPDATE customer SET balance = balance + 100 WHERE cid = 4872;

SELECT cid, name, balance FROM customer;

SAVEPOINT s1;

UPDATE customer SET balance = balance - 100 WHERE cid = 1924;

SELECT cid, name, balance FROM customer;

ROLLBACK WORK TO SAVEPOINT s1;

SELECT cid, name, balance FROM customer;

COMMIT;

Transactions can also be built into and managed by the application using the Java SDK.
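A minimal sketch of such an application-managed transaction, assuming Java SDK 3.3+ (where transactions are integrated into the SDK); the collection and keys are hypothetical.

Java:
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.Collection;
import com.couchbase.client.java.json.JsonObject;
import com.couchbase.client.java.transactions.TransactionGetResult;

void transfer(Cluster cluster, Collection customers) {
    cluster.transactions().run(ctx -> {
        // All reads and writes inside the lambda commit or roll back together.
        TransactionGetResult from = ctx.get(customers, "customer::1924");
        TransactionGetResult to = ctx.get(customers, "customer::4872");
        JsonObject fromDoc = from.contentAs(JsonObject.class);
        JsonObject toDoc = to.contentAs(JsonObject.class);
        fromDoc.put("balance", fromDoc.getLong("balance") - 100);
        toDoc.put("balance", toDoc.getLong("balance") + 100);
        ctx.replace(from, fromDoc);
        ctx.replace(to, toDoc);
    });
}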

Cost-based Query Optimization


The query service uses a cost-based query optimizer to take advantage of indexes that are available. Index nodes can handle
much of the data aggregation pipeline as well, so that less data is sent back to the query node for processing. Cost-based
optimization for queries accessing JSON structures was patented by Couchbase in 2021.

Index service
Secondary indexing is an important part of making queries run efficiently and Couchbase provides a robust set of index types
and management options. The index service is responsible for all of the maintenance and management tasks of indexes, known
as Global Secondary Indexes (GSI). The index service monitors document mutations to keep indexes up to date, using the
database change protocol stream (DCP) from the data service. It is distinct from the query service, allowing their workloads to
be isolated from one another where needed.

The following is a sample of some of the types of indexes supported by the index service:

• Primary – indexes whole bucket or collection using the document key

• Secondary – indexes a scalar, object, or array using a key-value

• Composite/Covered – multiple fields stored in an index or an array index

• Functional – secondary index that allows functional expressions instead of a simple key-value

• Array – an index of array elements ranging from plain scalar values to complex arrays or JSON
objects nested deeper in the array

• Adaptive – secondary array index for all or some fields of a document without having to
define them ahead of time

• Flex – Flex indexes are used for queries containing compound selection criteria, such as those derived from form-based applications with multi-select options. Flex indexes use the inverted indexes from the Full-Text Search engine to accommodate unknown values.
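As an illustration of defining such indexes, here is a hedged sketch that issues SQL++ DDL through the Java SDK; the bucket, scope, collection, and field names are hypothetical.

Java:
import com.couchbase.client.java.Cluster;

void createIndexes(Cluster cluster) {
    // A secondary index on one field of a collection.
    cluster.query(
        "CREATE INDEX idx_user_age ON `travel-sample`.inventory.users(age)");
    // A composite index that can cover queries touching both fields.
    cluster.query(
        "CREATE INDEX idx_user_name_age ON `travel-sample`.inventory.users(lname, age)");
}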

Index Advisor

The index advisor is a built-in query command, ADVISE, that interrogates the database to determine which index (GSI) to use or build, given the object and predicate selections contained in the query statement.

[Figure: Data service and index service. Data service nodes (buckets 1 and 2) stream mutations via DCP through a projector and router to index service nodes, where a supervisor maintains the indexes (Index 1–6) that serve the query service.]

Query consistency

Under the hood, Couchbase indexes are updated asynchronously after the data has been changed by the application. In comparison to other database technologies, this allows for much higher write throughput, but introduces the possibility of inconsistency between the data and its indexes. Couchbase therefore provides several levels of control over query consistency, allowing the application to choose between faster queries (ignoring pending mutations) and stronger consistency.

Indexes are updated as fast as possible, regardless of query consistency requirements. Even a query requesting strong consistency may return extremely quickly if its indexes are not processing large volumes of updates.

The following consistency levels are specified per query, allowing for even finer-grained and dynamic control of these trade-offs (see the sketch at the end of this section):

• not_bounded (default) – Return the query response immediately, knowing that some data may still be flowing through the system. This consistency level is useful for queries that favor low latency over consistency.

• at_plus – Block the query until its indexes have been updated to the timestamp of the last update, knowing that some updates may have happened since but don't need to be consistent. This provides "read-your-own-write" semantics for a single application process/thread.

• request_plus – Block the query until its indexes are updated to the timestamp of the current query request. This is a strongly consistent query and is used to maintain consistency across applications/processes.

Memory-optimized indexes (MOI)

Memory-optimized indexes use a skip list structure, as opposed to B-tree indexes, optimizing memory consumption and concurrent processing of index updates and scans. MOI provide the most optimized index for processing high-velocity mutations and high-frequency scans. MOI is essentially "in-memory" and therefore requires enough RAM to store all indexes.
Search service

The Search service is an engine for performing Full-Text Search (FTS) on the JSON data stored within a bucket or a collection. FTS lets you create, manage, and query inverted indexes for searching free-form text within a document. The service provides analyzers that perform several types of operations, including multi-language tokenization, stemming, and relevance scoring.

Search nodes incorporate both an indexer and a query processor, much like the query and index services, except these don't run on separate nodes – both workloads run on each search node.

As with the other services, data nodes use the DCP stream to send mutations to the FTS indexer process for index updating whenever data changes. Index creation is highly configurable through a JSON index definition file, the Couchbase SDK, or a graphical web interface that is part of the administration console.

Documents can be indexed differently depending on a document type attribute, a document ID, or the value of a designated attribute. Each index definition can be assigned its own set of analyzers, and specific analyzers can be applied to indexes for a subset of fields.

Indexes are tied to a specific bucket or collection, but it is possible to create virtual index aliases that combine indexes from multiple buckets or collections into a single seamless index. These aliases also allow application developers to build new indexes and quickly change over to them without having to take an index offline.

Searching and indexing use the same set of analyzers for finding matching data. All data, when indexed, flows through the analyzer steps as defined by the index. Then search requests are received and passed through the same steps – for example, tokenization, removing stop words, and stemming terms. These analyzed search requests are then looked up by the indexer in the index, and matches are returned. The results include the source request, a list of document IDs, and relevance scoring.

Other indexing and search-time options provide fine-grained control over indexing more or less information depending on the use case. For example, text snippets may also be stored in the index and included in the search response so that retrieving full documents from the data service is not required.

Couchbase developed Bleve, the open source, Go-based search project, for the FTS capabilities, including language support, scoring, etc.

Eventing service

The eventing service supports custom server-side functions (written in JavaScript) that are automatically triggered using an Event-Condition-Action model. These functions receive data from the DCP stream for a particular bucket or collection and execute code when triggered by data mutations. Similar to other services, the eventing service scales linearly and independently.

The eventing service offers both "change data capture"-like features found in event handlers and multi-channel data streaming features found in solutions such as Apache Kafka. Code processes the source data and commits it as a new or updated document in another bucket.

The core of eventing functions is a Google V8 execution container. Functions inherit support for most of the standard ECMAScript constructs that are available through V8. Some capabilities have been removed to support the ability to shard and scale execution automatically. Additionally, to optimize language utilization of the server environment, some new constructs have been added.
Code for functions is written in a web-based JavaScript code editor and features an extensive in-browser debugging environment.

Analytics
The analytics service provides an ad hoc querying capability without the need for indexes, bringing a hybrid operational and
analytical processing (HOAP) model for real-time and operational analytics on the active JSON data within Couchbase. It uses
the same SQL++ language as the query service.

It is designed to efficiently run complex queries over a large number of documents, including query features such as ad hoc
joins, set, aggregation, and grouping operations. In a typical operational or analytical database, any of these kinds of queries
may result in inefficiencies: long running queries, I/O constraints, high memory consumption, and/or excessive network
latency due to data fetching and cross-node coordination.

Because the service supports efficient parallel query processing and bulk data handling, and runs on separate nodes, it is
often preferable for expensive queries, even if the queries are predetermined and could be supported by an operational index.
With traditional database technology it is important to segregate operational and analytic workloads. This is usually done
by batch exporting of data from operational databases into an analytic database or warehouse that does further processing.
Couchbase provides both the operational database as well as a scalable analytics database – all in one NoSQL platform.
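Analytics queries are submitted much like operational SQL++ queries; a minimal Couchbase Java SDK 3.x sketch follows, with a hypothetical dataset name.

Java:
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.analytics.AnalyticsResult;

AnalyticsResult bigAggregation(Cluster cluster) {
    // Runs on analytics nodes against shadowed data, so the operational
    // data service is not slowed by this scan-heavy aggregation.
    return cluster.analyticsQuery(
        "SELECT airline, COUNT(*) AS flights FROM routes GROUP BY airline");
}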

[Figure: Fast ingest via DCP feeds shadow datasets of a Couchbase bucket into the analytics service's MPP architecture (parallelization among cores and servers), enabling complex queries on large datasets and real-time insights for business teams.]

Data is pushed from the DCP stream into what are known as shadow buckets – copies of data that are processed and immediately
ready for analysis by a dedicated massively parallel processing (MPP) analytics engine. As shadowed data is linked directly to the
operational data in real time, queries do not affect the performance of that operational data. You can add more analytics nodes to
reduce analytics query time.

The Couchbase Analytics approach has significant advantages compared to the commonly employed alternatives:

Common data model – supports the same flexible document data model and schema used for operational data with no
transformation required for analysis.

Workload isolation – operational query latency and throughput are protected from slowdowns due to your analytical query
workload – but without the complexity of operating a separate analytical database.

High data freshness – the DCP stream provides a fast memory-to-memory protocol that nodes use to synchronize data among
themselves, allowing analytics to run on data that’s extremely current, without extract/load or other hassles and delays.

High availability – the Analytics service supports failover and recovery to provide high availability for its execution.

Native Tableau integration – Analysts can examine the results of SQL++ queries in Tableau, a popular business intelligence and
visualization tool.

Integrations with data processing and data pipeline tools.

Consolidation from multiple Couchbase clusters – the Analytics service can link to active data among multiple Couchbase clusters, and can be configured as an "analytics cluster" whose sole purpose is to perform analysis upon data provided from other clusters.

Remote Links allow data ingestion from standard data files stored in AWS S3, including Parquet file formats, in Microsoft
Azure Blob storage, or from Google Cloud storage. This allows analysts to enrich the analytic data set without requiring
expensive ETL exercises.

Mobile and the edge

Couchbase Mobile brings the power of a NoSQL database beyond the edge. It includes Couchbase Lite, an embedded NoSQL JSON document store that supports a SQL-based query API and a C API, and Sync Gateway, a synchronization gateway service responsible for synchronizing data across clients and the cloud, and for enforcing access control policies, authentication, authorization, and data routing. With Couchbase Mobile you can build offline-first mobile applications that are responsive and always available, with enterprise-grade end-to-end security and scalability.

COUCHBASE LITE (client) – Lightweight embedded NoSQL database with full CRUD and query functionality.

SYNC GATEWAY (middle tier) – Secure web gateway with synchronization, data access, and data integration APIs for accessing, integrating, and synchronizing data over the web.

COUCHBASE SERVER (storage) – Highly scalable, highly available, high-performance NoSQL database server.

Security – Built-in enterprise-level security throughout the entire stack includes user authentication, data access control, secure transport, and full database encryption.

Couchbase Mobile has several important features that help to build powerful applications with embedded databases:

Offline and offline-first – allows applications to be always-on, able to run and store data on device until the network becomes available again. Once restored, devices can sync back to the cloud datacenter.

Peer-to-peer synchronization – connect and exchange data between mobile devices when there is unreliable network connectivity; keep working regardless of network availability.

Delta sync – significantly reduces network bandwidth by only synchronizing changed data. Used between applications and Sync Gateway, or between clients using peer-to-peer synchronization.

Client SDKs – Couchbase Lite supports applications written in C, Objective-C, Java, Swift, and Kotlin. Couchbase Lite runs on Linux, Windows, iOS, and Android.

Data recovery on the edge – on-device replicas provide high availability and disaster recovery on edge devices. Recover data from a damaged device to a backup.

On-device encryption – delivers end-to-end security to business-critical mobile apps by encrypting the local embedded database to better protect data at rest.

Deploy on premises and in any cloud – enterprise users can deploy the same stack on premises or in the cloud so developers and QA teams can be more productive. Supports hybrid cloud deployments for edge devices operating autonomously or with limited connectivity.

Multi-platform – Couchbase Lite is supported on iOS, Android, and .NET, including desktop and Windows mobile apps. The powerful SQL-based fluent query API allows developers to implement powerful business logic within their mobile apps.

DISTRIBUTED FOUNDATION

The foundation of Couchbase is a clustering approach that provides flexible options for scaling while maintaining performance and availability. This scaling approach is applied across all levels of the cluster, providing high-performance flexibility for nodes, services, buckets, vBuckets, scopes, and collections.

Node-level architecture

A Couchbase cluster consists of a group of interchangeable, largely self-sufficient nodes that operate in a peer-to-peer topology. There is just one Couchbase node type, though the services running on that node can be managed as required (see Multi-Dimensional Scaling).

Having a single node type greatly simplifies the installation, configuration, management, and troubleshooting of a Couchbase cluster, both in terms of what you must do as a human operator and what the automatic management needs to do. There is no concept of master nodes, slave nodes, config nodes, name nodes, or head nodes.

Components of a Couchbase node include the cluster manager and, optionally, the data, query, index, analytics, search, eventing, and backup services, along with the underlying managed cache and storage components. By dividing up potentially conflicting workloads in this way, a Couchbase node can achieve maximum throughput and resource utilization and minimum latency.

Nodes can be added or removed easily through a rebalance process, which redistributes the data evenly across all nodes. The rebalance process is done online and requires no application downtime, and it can be initiated at the click of a button or with one command on the command line.

Couchbase Server is typically deployed on a cluster of commodity servers, virtual machines, or cloud instances, although for development purposes all functionality can be run on a single node. An application can be developed on a small scale, even on a laptop, and then deployed to a distributed cluster without any architecture or behavioral changes to that application.
Capacity can be increased or decreased simply by adding or removing nodes. In this way
a cluster can grow CPU, RAM, disk, and network capacity by adding physical servers or
virtual machines that have the exact same software installed.

The maximum capacity managed by a single node can be up to 10TB when using the high-density storage engine, which allows storage capacity to exceed the node's memory size. Memory sizing and allocation also affect the processing capacity of a node, as do the performance requirements of the services it hosts. Nodes running versions earlier than Couchbase Server 7.1 may support up to 3TB of data.

Cluster architecture
A cluster consists of one or more instances of Couchbase Server, each running on an independent node.
Data and services are shared across the cluster.

The following figure shows application servers communicating with the cluster overall, but because they are also aware
of the individual node topology, they can adapt as needed. For example, if a replica needs to be read, the application server
can access it directly because it knows about the overall cluster configuration.

Figure: Application servers, each with a client library holding the cluster map, communicate with a six-node server cluster. Every Couchbase Server node runs the cluster manager alongside the data, index, query, and search services over a managed cache and storage layer.

The various services that Couchbase provides are also fed data through the DCP stream, which is internally used for sending
new/changed data to these services as well as providing the basis for keeping data in sync between nodes in the cluster.

Cluster/node configuration
When Couchbase is being configured on a node, it can be specified either as its own, new cluster, or as a participant in an existing
cluster. Thus, once a cluster exists, successive nodes can be added to it. When a cluster has multiple nodes, the Couchbase
cluster manager runs on each node: this manages communication between nodes, and ensures that all nodes are healthy.

Services can be configured to run on all or some nodes in a cluster, and can be added/removed as warranted by established
performance needs. For example, given a cluster of five nodes, a small dataset might require the data service on only one of the
nodes; a large dataset might require four or five. Alternatively, a heavy query workload might require the query service to run on multiple nodes, rather than just one. This ability to scale services individually promotes optimal hardware resource utilization.

Cluster manager
The cluster manager supervises server configuration and interaction between servers within a Couchbase cluster. It is a critical component that manages replication and rebalancing operations in Couchbase. Although the cluster manager executes locally on each cluster node, it elects a clusterwide orchestrator node to oversee cluster conditions and carry out appropriate cluster management functions.

If a machine in the cluster crashes or becomes unavailable, the cluster orchestrator notifies all other machines in the cluster, and promotes to active status all the replica partitions associated with the server that's down. The cluster map is updated on all the cluster nodes and the clients. This process of activating the replicas is known as failover. You can configure failover to be automatic or manual. Additionally, you can trigger failover through external monitoring scripts via the REST API.

If the orchestrator node crashes, existing nodes will detect that it is no longer available and will elect a new orchestrator immediately so that the cluster continues to operate without disruption.

In addition to the cluster orchestrator, there are three primary cluster manager components on each Couchbase node:

• The heartbeat watchdog – periodically communicates with the cluster orchestrator using the heartbeat protocol, providing regular health updates for the server. If the orchestrator crashes, existing cluster server nodes will detect the failed orchestrator and elect a new orchestrator.

• The process monitor – monitors local data manager activities, restarts failed processes as required, and contributes status information to the heartbeat process.

• The configuration manager – receives, processes, and monitors a node's local configuration. It controls the cluster map and active replication streams. When the cluster starts, the configuration manager pulls the configuration of other cluster nodes and updates its local copy.

Client connectivity
To talk to all the services of a cluster, applications use the Couchbase SDK. Support is available for a variety of languages
including Java, Scala, .NET, PHP, Python, Go, Node.js, and C/C++. These clients are continually aware of the cluster topology
through cluster map updates from the cluster manager. They automatically send requests from applications to the appropriate
nodes for KV access, query, etc.

When creating documents, clients apply a hash function (CRC32) to the ID of every document that needs to be stored in Couchbase, and the document is sent to the server where it should reside. Because a common hash function is used, it is always possible for a client to determine on which node the source document can be found.

Figure: A document ID/key is hashed with CRC32 to a vBucket/partition ID (1 through 1024); the cluster map then resolves each vBucket to one of the nodes in the Couchbase Server cluster.
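To make the mapping concrete, the sketch below models it in Python. It is a simplified illustration of the idea rather than the SDKs' exact implementation – the bit manipulation and the three-node cluster map are assumptions for the example.

    import zlib

    NUM_VBUCKETS = 1024  # Couchbase divides each bucket into 1024 vBuckets

    def vbucket_for_key(doc_id: str) -> int:
        # Hash the document ID; the SDKs use a CRC32-based function like this.
        crc = zlib.crc32(doc_id.encode("utf-8"))
        return crc % NUM_VBUCKETS

    # Toy cluster map: which node owns which vBucket (illustrative only).
    nodes = ["node1", "node2", "node3"]

    def node_for_key(doc_id: str) -> str:
        # Every client computes the same vBucket for a given key, so every
        # client agrees on which node owns the document.
        return nodes[vbucket_for_key(doc_id) % len(nodes)]

    print(node_for_key("airline_10"))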

Topology-aware client
After a client first connects to the cluster, it requests the cluster map from the Couchbase cluster and maintains an open connection with the server for streaming updates. The cluster map is shared with all the servers in a Couchbase cluster and with the Couchbase clients. Data flows from a client to the server using the following steps:

1. A user interacts with an application, resulting in the need to update or retrieve a document in Couchbase Server.

2. The application server contacts Couchbase Server via the smart client SDKs.

3. The client SDK takes the document that needs to be updated and hashes its document ID to a partition ID. With the partition ID and the cluster map, the client can figure out on which server and on which partition this document belongs. The client can then update the document on this server.

4. When a document arrives in a cluster, Couchbase Server replicates the document, caches it in memory, and asynchronously stores it on disk.
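In application code, all of this routing is invisible. The following is a minimal sketch using the Couchbase Python SDK; the hostname, credentials, and bucket names are illustrative:

    from couchbase.auth import PasswordAuthenticator
    from couchbase.cluster import Cluster
    from couchbase.options import ClusterOptions

    # Connect once; the SDK fetches the cluster map and keeps it current.
    cluster = Cluster.connect(
        "couchbase://db.example.com",
        ClusterOptions(PasswordAuthenticator("app_user", "app_password")),
    )
    collection = cluster.bucket("travel-sample").scope("inventory").collection("airline")

    # The SDK hashes the key to a partition ID and routes the request to the
    # node that owns it - no node addresses appear in application code.
    collection.upsert("airline_10", {"name": "40-Mile Air", "country": "United States"})
    print(collection.get("airline_10").content_as[dict])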
Data transport via Database Change Protocol (DCP)
DCP is the protocol used to stream bucket-level mutations. It is used for high-speed replication of data as it mutates – to maintain replica vBuckets, global secondary indexes, full-text search, analytics, eventing, XDCR, and backups. Connectors to external services, such as Elasticsearch, Spark, or Kafka, are also fed from the DCP stream.

DCP is a memory-based replication protocol that is ordered, resumable, and consistent. DCP streams changes made in memory to items by means of a replication queue.

An external application client sends operation requests (read, write, update, delete, query) to access or update data on the cluster. These clients can then receive or send data to DCP processes running on the cluster. External data connectors, for example, often sit and wait for DCP to start sending the stream of their data when mutations start to occur.

An internal DCP client, used by the cluster itself, streams data between nodes to support replication, indexing, cross datacenter replication, incremental backup, and mobile synchronization. Sequence numbers are used to track each mutation in a given vBucket, providing a means to access data in an ordered manner or to resume from a given point in time.

Multi-Dimensional Scaling (MDS)
Couchbase MDS features improve performance and throughput for mission-critical systems by enabling independent scaling of data, query, and indexing workloads. Scale-out and scale-up are the two scalability models typical for databases – Couchbase takes advantage of both. There are unique ways to combine and mix these models in a single cluster to maximize throughput and latency. With MDS, admins can achieve both the existing homogeneous scalability model and the newer independent scalability model.

Homogeneous scaling model
To better understand the multi-dimensional scaling model, it is beneficial to take a look at a typical homogeneous scaling model.

In this model, application workloads are distributed equally across a cluster made up of a homogeneous set of nodes. Each node that does the core processing takes a similar slice of the work and has the same hardware resources.

This model is simple to implement but has a couple of drawbacks. Components processing core data operations, index maintenance, or executing queries all compete with each other for resources. It is impossible to fine-tune each component because each of them has different demands on hardware resources. This is a common problem with other NoSQL databases. While core data operations can benefit greatly from scale-out with smaller commodity nodes, many low latency queries do not always benefit from wider fan-out.
Independent scaling model
MDS is designed to minimize interference between services. When you separate the competing workloads into independent
services and isolate them from each other, interference among them is minimized. The figure below demonstrates a deployment
topology that can be achieved with MDS. In this topology, each service is deployed to an independent zone within the cluster.

Figure: An eight-node cluster (node1 through node8) in which the data service occupies its own set of nodes while the query service and the index service each run in separate zones.

Each service zone within a cluster (data, query, and index services) can now scale independently so that the best computational
capacity is provided for each of them.

Figure: Each zone scales independently – the data service scales out with an additional node, while the query and index services scale up on their own nodes (node8, node9).

In the figure above, the additions signify the direction of scaling for each service. In this case, the query and index services scale up over fewer, more powerful nodes, and the data service scales out with an additional node.

Data distribution
Couchbase partitions data into vBuckets (synonymous with shards or partitions) to automatically distribute data across nodes, a process sometimes known as auto-sharding.

Figure: Documents (user/application data, on the app side) are read from and written to buckets (logical key spaces), which are distributed across the dynamically scalable cluster (on the server side).
vBuckets help enable data replication, failover, and dynamic cluster reconfiguration. Unlike data buckets, users and applications do not manipulate vBuckets directly. Couchbase automatically divides each bucket into 1024 active vBuckets and 1024 replica vBuckets per replica, and then distributes them evenly across the nodes running the data service within a cluster. vBuckets do not have a fixed physical location on nodes; therefore, there is a mapping of vBuckets to nodes known as the cluster map. Through the Couchbase SDK, the application automatically and transparently distributes the data and workload across these vBuckets.

Partitioning other services
All Couchbase services partition data and workloads across available nodes. Full-text search, eventing, and analytics services partition and replicate data and processes. They also all redistribute these across new cluster topologies when rebalancing occurs.

• Search service – automatically partitions its indexes across all search nodes in the cluster, ensuring that during rebalance, the distribution across all nodes is balanced.

• Eventing service – vBucket processing ownership is distributed across eventing nodes.

• Analytics service – a single copy of all analytics data is partitioned across all cluster nodes that run the service.

Index partitions and replicas
Global secondary indexes (GSI) can also be partitioned using several approaches depending on the pattern of queries that need to be supported. Commonly, GSIs are stored identically on each index node so that any node can help with queries. But in some cases it is better for performance or storage management to spread the indexes across several nodes. There are many options for creating these partitions – different attributes in a document can be used as a hash key (one or more), partitions can be assigned to specific nodes, and more.

As documents are added and updated, those mutations are streamed from memory to a local projector process, which then forwards them to the relevant index service node(s). Every index node has a supervisor process running the index service. This process listens to changes from the projector processes on the data nodes. It evaluates the incoming stream of changes for the specific indexes created on the node running the index service. Each index is then updated independently with that data.

Individual indexes can be automatically replicated to other nodes in the cluster to achieve high availability, ensuring that an index continues to function even if a node hosting the index is unavailable. Queries will load balance across the indexes, and if one of the indexes becomes unavailable, all requests are automatically rerouted to the remaining available indexes without application or admin intervention.

Similarly, Couchbase supports independent partitioning of indexes to distribute the data volumes and load of a single index across multiple processes and/or nodes.
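As an illustrative sketch, a partitioned index with a replica can be declared in N1QL, here issued through the Python SDK's query interface (reusing the cluster handle from the earlier sketch; the bucket and field names are assumed for the example):

    # Hash-partition the index on the document key and keep one replica,
    # spreading a single index's data and load across index nodes.
    cluster.query("""
        CREATE INDEX idx_route_by_source
        ON `travel-sample`.inventory.route(sourceairport)
        PARTITION BY HASH(META().id)
        WITH {"num_replica": 1}
    """).execute()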

Rebalancing the cluster
When the number of servers in the cluster changes due to scaling out or node failures, data partitions must be redistributed.
This ensures that data is evenly distributed across the cluster, and that application access to the data is load balanced evenly
across all the servers. This process is called rebalancing. All Couchbase services are rebalance-aware and follow their own set
of internal processes for rebalance as needed.

Rebalancing is triggered using an explicit action from the admin web UI or through a REST call. When initiated, the rebalance
orchestrator calculates a new cluster map based on the current pending set of servers to be added and removed from the
cluster. It streams the cluster map to all the servers in the cluster. During rebalance, the cluster moves data via partition
migration directly between two server nodes in the cluster. As the cluster moves each partition from one location to another,
an atomic and consistent switchover takes place between the two nodes, and the cluster updates each connected client library
with a current cluster map.
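For example, a rebalance can be scripted against the cluster manager's REST interface on port 8091. The node names and credentials below are illustrative, and the parameters should be checked against the documentation for your server version.

    import requests

    # otpNode names typically look like "ns_1@<hostname>".
    known = "ns_1@node1.example.com,ns_1@node2.example.com,ns_1@node3.example.com"

    resp = requests.post(
        "http://node1.example.com:8091/controller/rebalance",
        auth=("Administrator", "password"),
        data={"knownNodes": known, "ejectedNodes": ""},
    )
    resp.raise_for_status()  # success means the rebalance has been started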

Throughout migration and redistribution of partitions among servers, any given partition on a server will be in one of three states:

• Active – the server hosting the partition is servicing all requests for this partition.

• Replica – the server hosting the partition cannot handle client requests, but can receive replication commands. Rebalance
marks destination partitions as replica until they are ready to be switched to active.

• Dead – the server is not in any way responsible for this partition.

The node health monitor receives heartbeat updates from individual nodes in the cluster, updating configuration and raising alerts
as required. The partition state and replication manager is responsible for establishing and monitoring the current network of
replication streams.

High availability
"Couch," in the name Couchbase, is actually an original acronym that stands for "Clusters Of Unreliable Commodity Hardware." We highlight this to remind readers that scalability, reliability, and performance are foundational characteristics of the database. To meet high availability requirements, all Couchbase maintenance operations can be done while the system remains online, without requiring modifications or interrupting running applications. The system never needs to be taken offline for routine maintenance such as software upgrades, data rebalancing, index building, compaction, instance refreshes, or any other operation. Even provisioning or removing nodes can be done entirely online without any interruption to running applications, and without requiring developers to modify their applications.

Built-in fault tolerance mechanisms protect against downtime caused by arbitrary unplanned incidents, including server failures. Replication and failover are important mechanisms that increase system availability. Couchbase replicates data across multiple nodes to support failover and durability. Ensuring that additional copies of the data are available is automated to deal with the inevitable failures that large distributed systems are designed to recover from. All of this is done automatically without need for manual intervention or downtime.

Couchbase and the CAP theorem
Couchbase is usually considered a CP system, especially when deployed as a single cluster. The letters in the CAP acronym represent:

• Consistency (the latest information is always available everywhere)

• Availability (every request receives a response)

• Partition tolerance (which can be thought of as a form of fault tolerance)

The CAP theorem states that a distributed database cannot simultaneously provide all three of the above guarantees. Practically speaking, most NoSQL databases are forced to choose whether to favor consistency or availability in specific scenarios. For local clusters, Couchbase favors consistency among cluster members, especially when the cluster is supporting SQL transactions. However, when creating geographically dispersed, edge, or mobile systems, Couchbase can be configured to favor AP, when ensuring local data availability and durability is more important than systemic consistency.

Intra-cluster replication
Up to three replicas can be defined for every bucket. Each replica is itself implemented as 1024 vBuckets. A vBucket that is part of the original implementation of the defined bucket is referred to as an active vBucket; a bucket defined with two replicas therefore has 1024 active vBuckets and 2048 replica vBuckets. Typically, only active vBuckets are accessed for read and write operations, although replica vBuckets are able to support read requests. Replica vBuckets receive a continuous stream of mutations from their active vBucket by means of DCP, and are thereby kept constantly up to date.

To ensure maximum availability of data in case of node failures, the master services for the cluster calculate and implement the optimal vBucket distribution across available nodes: consequently, the chance of data loss through the failure of an individual node is minimized, since replicas remain available on the surviving nodes.

Active and replica vBuckets correspond to a single, user-defined bucket, for which a single replication instance has been specified. No replica resides on the same node as its active equivalent.
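A short sketch of both sides of this, using the Python SDK's bucket management API to define a bucket with two replicas and then reading explicitly from a replica (reusing the cluster handle from the earlier sketch; names and quotas are illustrative):

    from couchbase.management.buckets import CreateBucketSettings

    cluster.buckets().create_bucket(
        CreateBucketSettings(
            name="orders",
            ram_quota_mb=1024,
            num_replicas=2,  # 1024 active + 2048 replica vBuckets
        )
    )

    # Replica reads: return whichever copy (active or replica) answers first.
    doc = cluster.bucket("orders").default_collection().get_any_replica("order::1001")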

Figure: App servers, each with a client library holding the cluster map, read and write to the active vBuckets on each Couchbase Server node, while replica vBuckets on the other nodes in the server cluster are kept continuously up to date.

When a node becomes unavailable, failover can be performed, and the cluster manager is instructed to read and write data only on available nodes. Failover can be performed by manual intervention or automatically, promoting replica vBuckets to active status when needed. Automatic failover can be performed for one node at a time, and only up to a configurable number of times, the maximum being three. Multi-node failover can also be configured to tolerate larger scale failures, such as multiple nodes in a rack or VMs on a host all failing at once; failover can then switch to vBuckets or collections on an available node.

The cluster manager never performs automatic failover where data loss might result. The number of times failover can be safely performed depends on how many nodes and replicas exist. For example, in a five-node cluster with one replica, a single node can be failed over without danger, but if a second node fails, failover might result in data loss, due to required replicas no longer being available. Similarly, in a five-node cluster with two replicas, two nodes can be failed over without danger, but a third cannot. In such cases, a manual recovery process is required. Failover of multiple nodes or instances can be configured based on the number of failures that can be tolerated.

Node failover
Failover is the process in which a node of a Couchbase cluster is removed quickly, as opposed to intentional removal and rebalancing.

Auto-failover allows unresponsive servers to be failed over automatically by the cluster manager. Data partitions in Couchbase are always served from a single master node. As a result, if that node is down, the data will not be available until restored. The server will either need to be manually or automatically failed over in order to promote replica data partitions on replica servers to active data partitions, which become the new master, so that they can be accessed by the application.

The administrator will not always be able to manually fail servers over quickly enough to avoid application downtime, so Couchbase provides an auto-failover feature. This feature allows the cluster manager to automatically fail over nodes that are down and bring the cluster back to a healthy state as quickly as possible.

In Couchbase Server Enterprise Edition, nodes can also be automatically failed over when the data service reports sustained disk I/O failures.

Failover choices
As a node failover has the potential to reduce the performance of your cluster, you should consider how best to handle a failed node situation and also size your cluster to plan for failover.

Manual or monitored failover
Manual failover is performed by either human monitoring or by using a system external to the cluster. An external monitoring system can monitor both the cluster and the node environment so that you can make a more data-driven decision.

Human intervention
Humans are uniquely capable of considering a wide range of data, observations, and experiences to resolve a situation in the best possible way. Many organizations disallow automated failover because they want a human to consider the implications. Human intervention tends to be slower than using a computer-based monitoring system.

External monitoring
Another option is to have a system monitoring the cluster via the Couchbase REST API. Such an external system can failover nodes successfully because it can take into account system components that are outside the scope of Couchbase Server.

For example, monitoring software can observe that a network switch is failing and that there is a dependency on that switch by the Couchbase cluster. The system can determine that failing nodes will not help the situation and will, therefore, not failover the node. The monitoring system can also determine if the components around Couchbase Server are functioning and if the various nodes in the cluster are healthy. If the monitoring system determines the problem is only with a single node, and the remaining nodes in the cluster can support aggregate traffic, then the system may safely failover the node using the REST API or command-line tools.

Automatic failover
The cluster manager handles the detection, determination, and initiation of the processes to failover a node without user intervention. Once the problem has been identified and fixed, it still requires you to initiate a rebalance to return the cluster to a healthy state.

Server group awareness
Server group awareness provides enhanced availability. Specifically, it protects a cluster from large-scale infrastructure failure through the definition of groups. Each group is created by an appropriately authorized administrator, and specified to contain a subset of the nodes within a Couchbase cluster. Following group definition and rebalance, the active vBuckets for any defined bucket are located on one group, while the corresponding replicas are located on another group. This allows group failover to be enabled, so that if an entire group goes offline, its replica vBuckets, which remain available on another group, can be automatically promoted to active status.

Groups should be defined in accordance with the physical distribution of nodes. For example, a group should only include the nodes that are in a single server rack, or in the case of cloud deployments, a single availability zone. Thus, if the server rack or availability zone becomes unavailable due to a power or network failure, group failover can allow continued access to the affected data.

Data protection is optimal when groups are assigned equal numbers of nodes, and vBuckets are therefore distributed such that none ever occupies the same group as its associated active vBucket. By contrast, when groups are not assigned equal numbers of nodes, rebalance can only produce a best effort redistribution of replica vBuckets. This may result in replica vBuckets occupying the same group as their associated active vBuckets, meaning that data may be lost if such a group becomes unavailable.
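Such a monitoring system might drive failover through the same REST interface; the following is a minimal sketch requesting a graceful failover of one suspect node (the node name and credentials are illustrative):

    import requests

    resp = requests.post(
        "http://node1.example.com:8091/controller/startGracefulFailover",
        auth=("Administrator", "password"),
        data={"otpNode": "ns_1@node3.example.com"},
    )
    resp.raise_for_status()
    # Once the underlying problem is fixed, trigger a rebalance (see the
    # earlier sketch) to return the cluster to a fully healthy state.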

Server group awareness considerations:

• Server group awareness only applies to the data service.

• Failover should be enabled for server groups only if three or more server groups
have been established, and sufficient capacity exists to absorb the load of any
failed-over group.

• The first node, and all subsequent nodes, are automatically placed in a server group named
Group 1. Once you create additional server groups, you are required to specify a server group when
adding additional cluster nodes.
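Server groups are usually defined in the web UI, but the same step can be scripted. The sketch below creates a second group over REST; the hostname and credentials are illustrative.

    import requests

    # Mirror a second rack or availability zone with a second server group.
    requests.post(
        "http://node1.example.com:8091/pools/default/serverGroups",
        auth=("Administrator", "password"),
        data={"name": "Group 2"},
    ).raise_for_status()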

Cross Datacenter Replication (XDCR)

Cross datacenter replication provides an easy way to replicate active data, while still in-memory, to multiple, geographically diverse datacenters, either for disaster recovery or to bring data closer to the edge and its users.

Figure: Bidirectional XDCR – two datacenters, each with its own Couchbase Server cluster serving local read-write requests, replicate data to each other; within each cluster, active vBuckets are also replicated to local replica vBuckets.

XDCR and intra-cluster replication (replication among local cluster nodes for durability) occur simultaneously. For example, intra-cluster replication takes place within the clusters at both Datacenter 1 and Datacenter 2, while at the same time XDCR replicates documents across datacenters. On each node, after a document is persisted to disk, XDCR pushes the replica documents to other clusters. On the destination cluster, replica documents received are stored in the Couchbase managed object cache so that replica data on the destination cluster can undergo low latency read/write operations.

XDCR can be set up on a per-bucket or per-collection basis. Depending on your application requirements, you might want to replicate only a subset of the data in Couchbase Server between two clusters. With XDCR you can selectively pick which buckets or collections to replicate between two clusters in a unidirectional or bidirectional fashion. In the example shown in the figure below, bidirectional XDCR is set up between Bucket C on Clusters 1 and 2, there is unidirectional XDCR between Bucket B on both clusters, and Bucket A is not replicated.

Figure: Cluster 1 and Cluster 2 each contain BucketA, BucketB, and BucketC; BucketC is replicated bidirectionally, BucketB unidirectionally, and BucketA not at all.

XDCR provides only a single basic mechanism from which replications are built: unidirectional replication. A bidirectional topology is created by implementing two unidirectional replications, in opposite directions, between two clusters, such that a bucket or collection on each cluster functions as both source and target.

Used in different combinations, unidirectional and bidirectional replication can support complex topologies; an example being the ring topology, where multiple clusters each connect to exactly two peers, so that a complete ring of connections is formed.
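Replications themselves can also be scripted. The sketch below creates a continuous replication of BucketC to a previously defined remote cluster reference; the reference name, hostname, and credentials are illustrative.

    import requests

    requests.post(
        "http://node1.example.com:8091/controller/createReplication",
        auth=("Administrator", "password"),
        data={
            "fromBucket": "BucketC",
            "toCluster": "datacenter2",   # remote cluster reference name
            "toBucket": "BucketC",
            "replicationType": "continuous",
        },
    ).raise_for_status()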

When a bucket is specified as the source for an XDCR replication, all data in the bucket is replicated. Thus, if replication is started between source and target buckets that initially contain different datasets, the replication process eventually establishes a complete superset of data within each bucket.

XDCR can also apply filters to replication streams within buckets and collections. This can further scope the data being delivered to its target.

XDCR supports continuous replication of data. Data mutations are replicated to the destination cluster after they are written to disk. By default, there are 32 data streams per server per XDCR connection. These streams are spread across the partitions. They move data in parallel from the source cluster to the destination cluster.

The source and destination clusters can have a different number of servers in varying topologies. If a server in the destination cluster goes down, XDCR is able to get the updated cluster topology information and continue replicating data to the available servers in the destination cluster.

XDCR is push-based. The source cluster regularly checkpoints the XDCR replication queue per partition and keeps track of what data the destination cluster last received. If the replication process is interrupted, for example, due to a server crash or intermittent network connection failure, it is not required to restart replication from the beginning. Instead, once the replication link is restored, replication can continue from the last checkpoint seen by the destination cluster.

By default, XDCR in Couchbase is designed to optimize bandwidth. This includes optimizations like mutation de-duplication as well as checking the destination cluster to see if a mutation is required to be shipped across. If the destination cluster has already seen the latest version of the data mutation, the source cluster does not send it across. However, in some use cases with active-active XDCR, you may want to skip checking whether the destination cluster has already seen the latest version of the data mutation, and optimistically replicate all mutations.

Conflict resolution
Within a cluster, Couchbase provides strong consistency at the document level. Likewise, XDCR provides eventual consistency across clusters. Built-in conflict resolution will pick the same "winner" on both clusters if the same document was mutated on both clusters before it was replicated across. If a conflict occurs, the document with the most updates will be considered the "winner." If both the source and the destination clusters have the same number of updates for a document, additional metadata such as numerical sequence, CAS value, document flags, and expiration TTL value are used to pick the "winner." XDCR applies the same rule across clusters to make sure document consistency is maintained.

Priority
For high throughput clusters, multiple XDCR replications may compete for resources in terms of memory and bandwidth. Different replications can be assigned a high, medium, or low priority.

Security
Couchbase Server can be rendered highly secure, so as to preserve the privacy and integrity of data, and account for access attempts. Couchbase provides the following security facilities:

• Authentication – All administrators, users, and applications (all formally considered users) must authenticate in order to gain server access. Users can be authenticated by means of either the local or an external password registry such as LDAP or PAM. Authentication can be achieved by either passing credentials directly to the server, or by using a client certificate, in which the credentials are embedded. Connections can be secured by means of SCRAM and TLS.

• Authorization – Couchbase Server uses Role-Based Access Control (RBAC) to associate users with specifically assigned roles; each role corresponds to system-defined privileges, which allow degrees of access to specific system resources. During authentication, user roles are determined; therefore, authorization is granted if the role has approved system access, otherwise it is denied. LDAP groups can be mapped to Couchbase roles, which authorize users to perform specific database functions (i.e., read, write, update, query) on a whole cluster, a single or set of buckets, scopes within buckets, collections within scopes, and even documents within collections.

• Auditing – Actions performed on Couchbase Server can be audited. This allows administrators to ensure that system management tasks are being appropriately performed. Monitoring information can be collected and reported using the Couchbase integration to Prometheus, a popular cloud native monitoring tool.
Couchbase Server supports the encryption of data that is at rest on disk, on the wire, and held by applications. Additionally, it provides a system of secret management, which allows information essential to the security and maintenance of Couchbase Server to be stored in encrypted form, and then decrypted and appropriately used at cluster startup.

Encryption at rest
Couchbase can run on top of commonly used third-party encryption tools, such as Linux Unified Key Setup (LUKS), Vormetric, Gemalto, and Protegrity, to provide a complete solution for data encryption at rest while also being accessible to all Couchbase services.

Couchbase SDKs provide methods for encrypting portions of documents for added security at the application layer. Encrypted data sent back to the server is then always stored in that encrypted state, at rest, and is inaccessible without the key used by the original application layer. Couchbase cannot index or query application-encrypted fields.

Encryption over-the-wire
For an application to communicate securely with Couchbase Server, SSL/TLS must be enabled on the client side. When a TLS connection is established between a client application and Couchbase Server (running on port 18091), a handshake occurs, as defined by the TLS Handshake Protocol. As part of this exchange, the client must send the server a cipher-suite list, which indicates the cipher-suites that the client supports, in order of preference. The server replies with a notification of the cipher-suite it has duly selected from the list. Additionally, symmetric keys to be used by client and server are selected by means of the RSA key-exchange algorithm.

Couchbase SDKs support SSL/TLS encryption and must use the Couchbase network port 11207 for secure communication. Couchbase Server uses ciphers that are accepted by default by OpenSSL. The default behavior employs high-security ciphers, built into OpenSSL, but can be overridden depending on the security level required by the cluster. Couchbase also supports self-signing of TLS certificates, and integrates with popular key and secret management services.

Moving data between nodes
Couchbase Server replicates data across a cluster to ensure high availability of data. When encrypting documents, replica copies are duly transmitted and stored in encrypted form.

Couchbase supports node-to-node transport level security. When enabled, Couchbase will use TLS to encrypt traffic for intra-cluster communication between nodes. There are two settings: Control and All. When Control level security is enabled, only communication between the cluster manager and related services is encrypted, but data between nodes is transmitted in the clear. When All is enabled, all communication between cluster nodes is encrypted. Control level prevents attackers who have gotten onto the cluster network from affecting the cluster configuration, but they could still sniff data between nodes. All prevents any unauthorized user on the network from reading any of the data over the wire.

For added security, use IPSec on the network that connects the nodes. Note that IPSec has two modes: tunnel and transport. Transport mode is recommended, as it is the easier of the two to set up, and does not require the creation of tunnels between all pairs of Couchbase nodes.
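On the client side, enabling over-the-wire encryption is largely a matter of using the secure connection scheme and trusting the cluster's certificate. A minimal Python SDK sketch follows, with the hostname and certificate path assumed:

    from couchbase.auth import PasswordAuthenticator
    from couchbase.cluster import Cluster
    from couchbase.options import ClusterOptions

    # "couchbases://" (note the trailing s) switches the SDK to TLS and the
    # secure ports, including 11207 for key-value traffic.
    secure_cluster = Cluster.connect(
        "couchbases://db.example.com",
        ClusterOptions(
            PasswordAuthenticator(
                "app_user",
                "app_password",
                cert_path="/etc/ssl/certs/couchbase-ca.pem",  # illustrative path
            )
        ),
    )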

Figure: Security across tiers – clients (web, mobile, and embedded systems with Couchbase Lite) use full-database AES-256 encryption for local storage and secure transport over the internet; the middle tier (Sync Gateway and web services) provides pluggable authentication, secure transport, and role-based access control; the data tier (Couchbase Server) adds role-based access control, secure data storage, and geo-fencing with secure, filtered XDCR.
Enabling both Couchbase's node-to-node transport level security and IPSec security is not recommended. This form of double encryption does not increase the amount of security, and it will dramatically affect performance and make the cluster less resilient to denial of service attacks.

Moving data between datacenters
To protect data transmitted between datacenters, TLS is used to encrypt XDCR connections. When TLS in XDCR is enabled, Couchbase uses TLS certificates. TLS versions 1.0 to 1.3 are supported. All traffic between source and destination datacenters is encrypted.

Mobile client synchronization
Couchbase clusters can be extended to mobile applications running on edge devices. This is done by adding a Sync Gateway layer as an intermediary between device applications and Couchbase Server. Couchbase Lite's SDK is then used to develop mobile applications communicating via this synchronization layer.

Using the Sync Gateway service, data can be seamlessly extended to connect with remote edge devices that are occasionally disconnected or connected. Sync Gateway monitors data changes and maintains synchronization between Couchbase Server and mobile applications. Sync Gateway also supports hierarchical replication to and from its own peers, allowing organizations to create multiple Sync Gateway endpoints (edgepoints), located in airports or ports of call, to fully accommodate mobile applications on the move.

Sync Gateway provides the facility to ensure that all writes happen. Along with security and replication, metadata is managed by Sync Gateway and abstracted from applications reading and writing data directly to Couchbase Server. Sync Gateway uses a feature of Couchbase Server called extended attributes (XATTRs) to store that metadata in an external document fragment. Mobile, web, and desktop applications can therefore write to the same bucket in a Couchbase cluster.

Figure: Clients running Couchbase Lite synchronize with Sync Gateway over the data sync protocol; Sync Gateway communicates with Couchbase Server via DCP and N1QL, web clients reach data through Sync Gateway's RESTful API, and server-side applications use the SDKs.
Peer-to-peer synchronization
A unique capability of Couchbase Lite is its ability to synchronize data from one mobile client to another, eliminating the need for data to complete a long internet traversal to and from an internet-hosted Couchbase Server. Peer-to-peer synchronization only requires a common network connection, such as Bluetooth or a local area network, to support the movement of data among clients. This is especially helpful when internet connectivity cannot be reliably maintained but data must still be shared, such as when creating pop-up medical clinics, or when synchronizing data among pilots and aviation maintenance crews.

Conflict resolution
As Couchbase Mobile can handle several kinds of scenarios where multiple users may make updates – e.g., offline, peer-to-peer, and live database sync modes – being able to manage mutation conflicts is essential. The system, therefore, keeps track of the changes and prevents new documents from simply overriding others. The developer is given tools to control the save and delete operations so that a last-write-wins or last-write-fails protocol can automatically resolve any conflicts.

RESOURCES

This paper describes the architecture of Couchbase Server, but the best way to get to know the technology is to download
and use it. Couchbase Server is a good fit for a number of use cases including social gaming, ad targeting, content store, high
availability caching, and more.

Couchbase has a rich ecosystem of adapters that support other systems:

• SQL integration resources include: Tableau, CData, Knowi, Talend, and more

• Big data integration includes Spark, Kafka, and Elasticsearch

Several flexible deployment options are available to support different environments:

• To download Couchbase Server for free, visit http://www.couchbase.com/downloads

• Couchbase Autonomous Operator for Kubernetes and OpenShift

• Docker Hub hosts official Couchbase Docker images

• Managed cloud service

Training is available through in-class, instructor-led, and free online courses.
See the course catalog at: https://learn.couchbase.com/

Comprehensive Couchbase documentation is available at: https://docs.couchbase.com/

At Couchbase, we believe data is at the heart of the enterprise. We empower

developers and architects to build, deploy, and run their mission-critical applications.

Couchbase delivers a high-performance, flexible and scalable modern database that

runs across the data center and any cloud. Many of the world’s largest enterprises

rely on Couchbase to power the core applications their businesses depend on.

For more information, visit couchbase.com.


© 2022 Couchbase. All rights reserved.
