Moving Alfresco to Amazon Web Services

A Step by Step Guide
December 2017

© 2017 Technology Services Group, Inc. All Rights Reserved.


Alfresco – Step by Step Guide for Moving to Amazon Web Services

Contents
Executive Summary
Amazon Web Services Overview
    Amazon Web Services Components
Alfresco Components
Installing Alfresco in the AWS Cloud
    Alfresco Install from Scratch
    Alfresco Install with AWS Quick Start
Migrating On-Premise Legacy ECM to Alfresco on the AWS Cloud
    Migration Methodology
    Migration Approaches
    Alfresco Considerations
    AWS Considerations
Migrating On-Premise Alfresco to the AWS Cloud
    Lift and Shift to AWS
    Alfresco-to-Alfresco Migration
Taking Advantage of Amazon Services for your ECM Implementation

EXECUTIVE SUMMARY
As Amazon Web Services (AWS) continues to innovate and become a common alternative to on-premise
Alfresco environments, progressive Alfresco customers are looking to move Alfresco or other
legacy ECM solutions from internal environments to AWS. Amazon Web Services, an Infrastructure
as a Service (IaaS) provider, has several significant advantages over traditional on-premise
environments, including:

• Ease of Deployment and Configuration


Amazon Web Services simplifies customers’ deployments, particularly of Alfresco. Both Alfresco
and Technology Services Group’s solutions for Alfresco offer Quick Starts that deploy not only
Alfresco but also all of the other necessary infrastructure components, as detailed later in
this paper.

• Cost
AWS can replace many of the infrastructure vendors commonly required for an Alfresco
infrastructure with software components to reduce the overall cost of ownership. This
includes load balancing, routers, application servers, databases, and storage.

• Elastic Scaling
AWS offers a variety of server scaling options to allow customers to scale Alfresco
environments up or scale down as required without additional hardware costs.

• Cloud Benefits
AWS offers other benefits associated with movement to the cloud, including geographic
expansion of systems, increased bandwidth through dedicated network connections and
content distribution networks, fault tolerance enabled by elasticity and agility, and
significant cost savings by paying for only the services used.

Technology Services Group (TSG) has been moving Alfresco clients to AWS since 2009 and is both a
platinum Alfresco partner and a Standard Amazon partner. This guide will provide a step by step
understanding of how to move Alfresco to Amazon Web Services. Whether moving an existing on-
premise solution to AWS, or starting a new Alfresco effort on AWS, this guide will provide the
following helpful information:

• Amazon Web Services Overview


The overview provides an understanding of the basic components of AWS and what is
required for a fault tolerant Alfresco infrastructure.

• Alfresco Components on AWS


This section presents a basic architecture for Alfresco on AWS.

• Installing Alfresco on AWS


This section shares an understanding of what is required for an installation of Alfresco, as
well as how to leverage Amazon Marketplace and Quick Starts.

• Migrating On-Premise Legacy ECM to Alfresco on AWS


This section describes alternatives of how to move content from other repositories to
Alfresco on AWS.

• Migrating On-Premise Alfresco to Alfresco on AWS


Learn about additional options for existing Alfresco instances to move to AWS.

• Innovative Advantages of AWS over On-Premise Alfresco


Discusses additional advantages of running Alfresco in the AWS cloud versus on-premise
and advanced architecture implementations of Alfresco on AWS.

AMAZON WEB SERVICES OVERVIEW


Amazon Web Services Components
Below are the minimum components TSG recommends for fault tolerant AWS/Alfresco
environments:

• Load Balancers (ELB)


AWS’s Elastic Load Balancers (ELB) act as security gateways and enable fault tolerance and
scalability for multi-tiered applications. On-premise customers should understand that this is
a software replacement for F5 or other typically hardware-based load balancing
components. ELB is an AWS service and provides a consistent, always-available point of
contact for applications. TSG recommends ELB for its combination of resiliency, security, and
fault tolerance, which we feel exceeds that of comparable high-end solutions.

• Application Servers (EC2)


The EC2 compute cloud hosts a variety of OS options, CPU cores, and memory sizes for
flexibility in sizing the AWS server infrastructure. Application servers are used to run both
Alfresco Content Services and browser-based solutions like Alfresco Share or TSG’s
OpenContent Management Suite (OCMS) under Tomcat. Application servers can be either
Linux or Microsoft based. The Linux servers come in several “flavors” and price points, such
as Amazon Linux, Red Hat, and CentOS. Microsoft servers will cost more given the Microsoft
licensing model. Application servers take the place of dedicated hardware or on-premise
virtual instances.

• Relational Database Service (RDS)


AWS provides a relational database service to meet Alfresco’s database requirements. AWS
manages the backup, underlying infrastructure, OS, and application layers for RDS,
reducing the need for a dedicated database administrator. Typical clients will consider Aurora,
AWS’s proprietary MySQL-compatible database, which Alfresco has certified and used to run a
billion-object benchmark test
(https://www.alfresco.com/blogs/how-alfresco-powered-a-1-2-billion-document-deployment-on-amazon-web-services/).
Other common solutions on AWS include the MySQL, Microsoft SQL Server, PostgreSQL, or Oracle
databases that clients are used to supporting on-premise. Each of these options has a different
price point, and there is the option to Bring Your Own License (BYOL) for Microsoft SQL Server
and Oracle.

• Storage

AWS’s Elastic Block Store (EBS), Elastic File System (EFS), Simple Storage Service (S3), and
Glacier are storage tiers that can be used separately or combined to meet varying
requirements. Each storage tier has its own performance level and price point, with Elastic
Block Store offering the best performance at the highest cost, and Glacier being the most cost
effective but truly targeted for archival only. Most Alfresco customers choose S3 for content
storage, given typical document management needs. Alfresco offers an S3 Connector
adapter that we recommend be included in the Alfresco purchase. More thoughts on S3 and
innovative Alfresco solutions will come later in this paper.

• VPC (Virtual Private Cloud)


Each client account on AWS is deployed within its own Virtual Private Cloud. The VPC
protects the services and infrastructure and provides a private network landscape for the
applications. Clients can combine multiple VPCs with private and public subnets to suit a
variety of needs. Typical clients will build a Virtual Private Network (VPN) to their AWS VPC
to allow AWS to be seen as an extension of their on-premise network.

• Direct Connect
Direct Connect is a dedicated network connection service linking the AWS VPC to client data
centers. Direct Connect is available through AWS telecom partners who have connected their
networks directly to AWS’s, providing a secure non-Internet connection. This connection extends
AWS to the on-premise network, providing the capability to grow IT capacity without capital
expenditure. Direct Connect is a high-speed connection with different price points for
different bandwidth capabilities.

• Snowball/Snowmobile
For clients where Direct Connect isn’t feasible, or where a large volume of legacy data needs
to be moved to AWS, Amazon offers physical storage devices that can be shipped to
the existing on-premise data center and then shipped back to AWS for upload. Snowball is a
terabyte-scale storage device and also has the capability to run programs to manipulate the
stored data. Snowmobile can store and transfer up to 100 petabytes of data. After data is
loaded on the devices, AWS can transfer and store the data in S3 or Glacier.

Compared to typical on-premise solutions, AWS’s ability to provide all of the above solutions as
software solutions designed to work together significantly reduces the effort required to support the
overall solution.

Other advantages to the AWS solution include:

• Procurement (https://aws.amazon.com/free/)

AWS provides procurement “on demand” without the need to await hardware ordering and
shipment delays. AWS offers a free tier of products for 12 months, and many services are
always free. For example, 10 GB of Glacier storage, 100 GB of the AWS Storage Gateway, and
1 million Lambda (serverless) function requests are always free.

• Cost Optimization (https://aws.amazon.com/pricing/cost-optimization/)


AWS operates similarly to a utility, allowing customers to pay only for what they use. Over the
last ten years, AWS has focused on reducing costs over time. Multiple times a year, AWS
announces price decreases and cost optimization improvements. As an example, in the third
quarter of 2017, AWS announced that EC2 on-demand billing for Linux instances would change from
per-hour to per-second increments (with a one-minute minimum), reducing costs significantly
since instances that run for only a few minutes are no longer charged for a full hour. AWS
provides several online tools to help customers achieve cost optimization.

• Cloud Migration Methodology (https://aws.amazon.com/cloud-migration/)


AWS has a focus on customer satisfaction, just like Amazon.com. There are well-established
Cloud Adoption Frameworks, Application Migration Methodologies, and Well-Architected
Framework Pillars that AWS partners can bring to clients to ensure best practices and project
success. AWS provides services to review plans and architectures for clients and will make
time to meet one-on-one.

• Disaster Recovery & Business Continuity (https://aws.amazon.com/disaster-recovery/)


AWS is built for resiliency and failure management. Tools such as EC2 instance snapshots
(images), multi-region edge location services (Route 53, CloudFront), and regional and
availability zone replication for storage and database services allow for application
deployments that can survive the loss of one or more AWS regions. The link below lists the AWS
services available by region. Using this information, a cost-optimized DR and BC solution can be
available to a business of any size.
https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/

• Elasticity and Scalability (https://aws.amazon.com/autoscaling/)


AWS CloudWatch and Auto Scaling work hand-in-hand with compute services to expand and
contract the number and size of EC2 instances based on metrics from the AWS environment
or based on a schedule. Auto Scaling ensures the application environment maintains a
consistent level of performance as demand increases or decreases, usually by adding more
instances to an environment and using the ELB to evenly distribute traffic among the
instances.
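
The scaling behavior described above can be sketched as a simple proportional rule. This is an
illustration only, not the algorithm AWS Auto Scaling actually uses; the metric (average CPU
utilization), the target value, and the instance limits are all assumptions made for the example.

```python
import math


def desired_instances(current: int, avg_cpu: float, target_cpu: float,
                      minimum: int, maximum: int) -> int:
    """Scale the instance count in proportion to load, within fixed bounds."""
    if current < 1:
        raise ValueError("current must be at least 1")
    # If average CPU is 1.5x the target, ask for 1.5x the instances (rounded up).
    wanted = math.ceil(current * avg_cpu / target_cpu)
    # Never go below the minimum or above the maximum group size.
    return max(minimum, min(maximum, wanted))


# Hypothetical group of 2 to 6 Alfresco servers with a 60% CPU target.
scale_out = desired_instances(current=2, avg_cpu=90.0, target_cpu=60.0, minimum=2, maximum=6)
scale_in = desired_instances(current=4, avg_cpu=15.0, target_cpu=60.0, minimum=2, maximum=6)
```

The minimum bound is what keeps a fault tolerant environment from shrinking to a single server
during quiet periods, and the maximum bound caps cost during spikes.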

ALFRESCO COMPONENTS
Components of a typical Alfresco environment include:

• Alfresco Content Services (with TSG OpenContent Web Services Extension)


Alfresco Content Services provides open, flexible, highly scalable Enterprise Content
Management (ECM) capabilities. TSG’s OpenContent Web Services offer a REST-based API that
adds content management functionality such as annotations, PDF overlays, and PDF
manipulation/combination.

• Activiti
Alfresco provides workflow capabilities with Activiti, an open source workflow/Business
Process Management (BPM) engine. Basic ECM workflow functionality with Activiti is included
as part of the Alfresco Content Services installation.

• Transformation Server(s)
Alfresco offers two types of transformation. By default, converting documents from Office
formats to PDF in Alfresco relies on LibreOffice, an open source tool. The LibreOffice
transformer is included in the Alfresco Content Services with no additional servers or cost.
While very quick, LibreOffice isn’t always the most accurate. Alfresco offers an external
transformation server that launches Microsoft Office and initiates a print command to
provide a more accurate PDF rendition. This transformation server requires additional
Alfresco licensing, licensing for Microsoft Office, as well as an additional server or virtual
machine.

• Solr Full-Text Indexing & Search


Alfresco provides the advanced searching capabilities of Solr, an open source search
infrastructure, as part of its deployment. Solr is used for both metadata searching, as well as
full-text search within documents. Typical Alfresco implementations include the Solr
indexing server within the Alfresco deployment. Alfresco has an option to move Solr to an
external server for better performance and fault tolerance.

• Application (OpenContent Management Suite or Share)


The application layer for Alfresco “out of the box” includes Alfresco Share. Alfresco Share can
be deployed to the same servers as Alfresco Content Services, or on separate servers for
better performance and isolation.

OpenContent Management Suite provides users of Enterprise Content Management (ECM)
systems with a highly configurable interface to find and retrieve documents. Leveraging a
powerful web services layer, it easily integrates with ECM solutions. Each module of
OCMS has been developed over the past 20 years based on our clients’ specific business
needs and processes.

Most ECM users want to be able to target searches by particular types and attributes, rather
than using the generic single-field approach that searches everything or manually
browsing through countless folders of content. Quickly looking at a collection of
documents and seeing all related documents and metadata allows users to have all content
accessible in a single screen. Regardless of industry, OpenContent Management Suite can
meet a user’s requirements. For instance, users can search for invoices due in the next 30
days, or procedure documents that have recently become effective.

INSTALLING ALFRESCO IN THE AWS CLOUD


Once the decision is made to deploy Alfresco on AWS, there are different options for creating the
Alfresco stack on AWS and installing and configuring the Alfresco components. As with a traditional
on-premise installation, Alfresco can be installed manually from scratch on AWS or from a defined
AWS template. Alfresco and TSG also offer Quick Starts, allowing for an entire Alfresco stack to be
created from a template.

Alfresco Install from Scratch


For customers requiring full configurability of AWS and Alfresco components, a “from scratch”
installation will most likely be required. Some reasons for installation from scratch may include:

• The need to install Alfresco on corporate standard Amazon Machine Images (AMIs)
• Integration of Alfresco stack into an already existing network topology on AWS
• The desire to deploy Alfresco on a different RDS type than Aurora, such as MySQL, Oracle,
PostgreSQL, or Microsoft SQL Server
• EC2 architecture that differs from what is offered with Alfresco’s AWS Quick Start

To get an Alfresco environment up and running from scratch, the following steps are required:

1. Provision Required AWS Components

As mentioned in a previous section, there are a number of AWS components required to
build an Alfresco stack on AWS, including database, storage, compute, and networking
resources. These components can be provisioned manually via the AWS Management
Console, or scripted using AWS CloudFormation. CloudFormation provides the advantage
of creating a repeatable process for building additional environments (development, test,
production, etc.) and for rebuilding an environment if needed.

2. Install Alfresco Software


Once AWS components are available, the Alfresco software must be installed on the EC2
instances. Alfresco simplifies this process by wrapping all of the necessary components (Java,
Tomcat, LibreOffice, etc.) into a single package that includes an installation wizard. Generally,
customers execute the installation manually and document the procedure for repeating the
process in any additional environments. Alfresco also allows the installer to be run in silent
mode, which can be executed using DevOps automation software, such as Chef.

3. Deploy Additional Modules


Typically, Alfresco environments require add-on modules to be deployed to be fully
functional. Modules include the AWS S3 Connector, Transformation Server, and TSG’s
OpenContent services, which enable the TSG OpenContent Management Suite in Alfresco.
Modules can be easily deployed manually or via script using Alfresco’s command-line
Module Management Tool (MMT), which is included with Alfresco Content Services.

4. Configure Alfresco
Alfresco and its supporting modules are configured using properties files located on the
application servers. Default configuration values can be adjusted to meet the needs of the
specific Alfresco implementation. Configuration can be performed manually or using
DevOps automation.
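
As a minimal sketch of the configuration step, the properties for each environment can be
generated from a handful of per-environment values rather than edited by hand. The example
assumes a PostgreSQL database and uses the `db.*` and `dir.root` keys typically found in
`alfresco-global.properties`; the host name, user name, and content path are hypothetical, and
values are written without any escaping.

```python
def environment_settings(db_host: str, db_user: str, content_root: str) -> dict:
    """Collect the settings that differ from one environment to the next."""
    return {
        "db.driver": "org.postgresql.Driver",
        "db.url": f"jdbc:postgresql://{db_host}:5432/alfresco",
        "db.username": db_user,
        "dir.root": content_root,
    }


def render_properties(settings: dict) -> str:
    """Render key/value pairs in Java .properties style, one per line, sorted by key."""
    lines = [f"{key}={value}" for key, value in sorted(settings.items())]
    return "\n".join(lines) + "\n"


# Hypothetical values for a test environment.
test_props = render_properties(
    environment_settings("test-db.example.internal", "alfresco", "/opt/alfresco/alf_data")
)
print(test_props)
```

Secrets such as the database password are deliberately left out of the sketch; they would
normally be supplied separately rather than written into a generated file.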

One item to consider is the temptation to go “all-in” on automating the Alfresco installation and
configuration using AWS CloudFormation and DevOps tools like Chef. It’s important to consider the
amount of time required to automate and be sure that it doesn’t outweigh the benefits of the
automation. Manual installation, once completed and documented in one environment, can be
repeated in other environments quickly. Given how infrequently Alfresco needs to be installed
and configured from scratch, TSG recommends most customers choose a manual approach for the
initial installation. If automation is desired, TSG recommends using manual installation to get
all Alfresco environments up and running as quickly as possible, and then adding automation
later as time allows.
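
For teams that do decide to script provisioning, a CloudFormation template is a JSON (or YAML)
document that declares the resources to create. The sketch below builds a deliberately
incomplete template in Python. The logical names `ContentBucket` and `AlfrescoServer`, the
instance type, and the AMI ID are placeholders assumed for illustration; a real Alfresco stack
would also need networking, database, and load balancer resources.

```python
import json


def build_template(instance_type: str, ami_id: str) -> dict:
    """Return a minimal CloudFormation template as a Python dict."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative Alfresco stack fragment (not a complete deployment)",
        "Resources": {
            # S3 bucket that could back the Alfresco content store.
            "ContentBucket": {"Type": "AWS::S3::Bucket"},
            # A single application server; the AMI ID is a placeholder.
            "AlfrescoServer": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "InstanceType": instance_type,
                    "ImageId": ami_id,
                },
            },
        },
    }


template_json = json.dumps(build_template("m4.xlarge", "ami-00000000"), indent=2)
print(template_json)
```

Because the template is plain data, the same function can produce development, test, and
production variants by changing only its arguments, which is the repeatability benefit
CloudFormation is meant to provide.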

Alfresco Install with AWS Quick Start


For customers looking to launch an Alfresco environment quickly, Alfresco offers an AWS Quick
Start. The Quick Start uses the power of AWS CloudFormation and Chef recipes to allow for a
scalable Alfresco stack to be created within an AWS account just by answering questions in a form.
Building on Alfresco’s Quick Start, in 2018 TSG will be releasing a Quick Start enhanced with OCMS
and several pre-configured solutions: insurance policy & claims, controlled documentation, and
accounts payable.

Some of the advantages of using the Alfresco Quick Start include:

• All AWS components, including EC2, RDS, S3, VPC, and ELB, are provisioned automatically
and with optimal settings for Alfresco deployment
• Stack components, such as operating system and database platform, are created in
accordance with Alfresco’s supported platforms specification
• Alfresco’s AWS Quick Start offers the option to create a new VPC for the new Alfresco stack,
or add Alfresco to an existing VPC
• Alfresco is deployed across multiple availability zones within an AWS region, making the
system highly available
• Autoscaling is built into the deployment, allowing for additional Alfresco servers to be spun
up during times of high utilization, and turned off during periods of low utilization for cost
savings
• The Alfresco installation and base configuration is performed automatically with optimal
default configuration settings for the EC2 instance sizes selected

The AWS components deployed by Alfresco’s AWS Quick Start are depicted below:

Figure 1 Source: https://github.com/Alfresco/alfresco-cloudformation-chef

After the Alfresco environment has been created, additional Alfresco modules can be deployed into
the environment using the same methods as for a manual installation. Similarly, any additional
configuration that is needed for the environment can be performed as described for the manual
installation.
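
Deploying a module with the Module Management Tool (MMT) comes down to running its `install`
command against the Alfresco WAR file. The sketch below only assembles and prints that command;
every file path and the module name are hypothetical and would need to match the actual
installation.

```python
import shlex


def mmt_install_command(mmt_jar: str, amp_path: str, war_path: str,
                        force: bool = False) -> list:
    """Build the argument list for an MMT 'install' invocation."""
    command = ["java", "-jar", mmt_jar, "install", amp_path, war_path]
    if force:
        # -force installs the module even if MMT reports a version conflict.
        command.append("-force")
    return command


cmd = mmt_install_command(
    "/opt/alfresco/bin/alfresco-mmt.jar",         # hypothetical location of the MMT jar
    "/opt/alfresco/amps/example-module.amp",      # hypothetical module package
    "/opt/alfresco/tomcat/webapps/alfresco.war",  # hypothetical WAR location
)
print(" ".join(shlex.quote(part) for part in cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it. Alfresco should be
stopped while the WAR is modified and restarted afterwards so the module is picked up.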

Alfresco’s AWS Quick Start offers a time saving way to jumpstart an Alfresco environment with
minimal effort. If at all possible, it’s recommended to utilize the Quick Start for new installations. In
some cases, configuration tweaks may be required after running the Quick Start, but these updates
are usually much easier than installing from scratch.

MIGRATING ON-PREMISE LEGACY ECM TO ALFRESCO ON THE AWS CLOUD
Many customers looking to implement Alfresco on AWS need to migrate content from legacy ECM
systems to Alfresco. Legacy ECM systems might include Documentum, FileNet, Stellent,
Hummingbird, OpenText, SharePoint, shared network drives, or other home-grown
database/filesystem-based platforms. While the process for migrating from these various systems
may differ significantly, the end goal remains the same: to extract metadata (a.k.a. properties,
attributes, tags) and binary content (PDFs, Office documents, images, etc.) from the legacy ECM
system and import them into Alfresco.

A migration tool, such as TSG’s OpenMigrate, is usually required for performing migrations from
legacy ECM systems to Alfresco. Tools like OpenMigrate have connectors for extracting metadata
and content from legacy ECM systems using native APIs. After extraction, metadata and content can
be transformed, if necessary, before importing into Alfresco using native Alfresco APIs.

Migration Methodology
A typical migration would include the following steps:

1. Plan
Identify what, when, and how to migrate.

2. Configure
Configure OpenMigrate to migrate the content from legacy ECM to Alfresco. Configure the target
Alfresco repository to accept content from legacy ECM.

3. Test
Run the migration in a test environment to confirm all objects are being migrated as expected.
Test on a large enough data set to calculate the expected time to complete the full migration.

4. Migrate
Execute the legacy ECM to Alfresco migration in the production environment.

5. Verify
Verify the integrity of the content, metadata, and any related objects in Alfresco.
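
One way to approach the final verification step is to compare content checksums between the
source and the target. The sketch below is a generic illustration, not a description of how
OpenMigrate verifies migrations; the in-memory dictionaries stand in for content read from the
legacy ECM and from Alfresco, and the object IDs are hypothetical.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return the SHA-256 digest of a piece of content as a hex string."""
    return hashlib.sha256(data).hexdigest()


def find_mismatches(source: dict, target: dict) -> list:
    """Return IDs whose content is missing from the target or differs from the source."""
    mismatched = []
    for object_id, data in source.items():
        migrated = target.get(object_id)
        if migrated is None or content_digest(migrated) != content_digest(data):
            mismatched.append(object_id)
    return sorted(mismatched)


# Stand-ins for content exported from the legacy ECM and read back from Alfresco.
source = {"doc-1": b"invoice", "doc-2": b"procedure", "doc-3": b"policy"}
target = {"doc-1": b"invoice", "doc-2": b"procedur"}  # doc-2 truncated, doc-3 missing

problems = find_mismatches(source, target)
print(problems)
```

In practice the same comparison would also cover metadata values and related objects such as
versions and renditions, since matching content alone does not prove a complete migration.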

Migration Approaches
There are several different approaches that can be taken when migrating from a legacy ECM to
Alfresco. Choosing the right approach is important, and will depend on many factors, including:

• Volume – How much content will be migrated?


• Speed/Throughput – How long will it take to migrate the content from the legacy ECM to
Alfresco on AWS?
• Availability – Can the legacy ECM be shut down or be put in read-only mode during
migration, or does it have to be available until Alfresco is live with the migrated content?
• Training – How many users will need to be trained on the new Alfresco system, and how
much training is required?

Common migration approaches include:

• Big Bang Migration - All content is migrated at once during an outage. All users begin using
Alfresco after migration is complete.
  Pros
    ▪ Legacy ECM can be immediately decommissioned after migration
    ▪ Migration only has to be planned, configured, tested, and executed once
  Cons
    ▪ Can require a significant (sometimes unacceptable) amount of downtime
    ▪ Highest risk: if anything goes wrong, must back out and start over
    ▪ Highest chance of exceeding timeline and budget
    ▪ Everyone must move and be trained on Alfresco simultaneously

• Delta Migration - Bulk migration occurs while legacy ECM is still in use. Before cutover to
Alfresco, a smaller delta migration takes place to sync any changes since the bulk migration.
  Pros
    ▪ Large portion of the migration can be completed while users are still in the legacy ECM
      system
    ▪ Delta migration of changes is generally a small subset of content, minimizing downtime
    ▪ Migration and new environment can be proven and verified prior to cutover to
      significantly reduce risk
  Cons
    ▪ All users must move and be trained on Alfresco simultaneously
    ▪ Additional verification is needed to ensure that delta migration was successful

• Gradual Migration - Departments or subsets of users are migrated individually in phases.
Legacy ECM is decommissioned after all departments have been migrated.
  Pros
    ▪ Less risky because departments move on their own timelines
    ▪ Departments or subsets of users can be trained on Alfresco individually
    ▪ Alfresco and other system resources can be tuned gradually as more users are brought on
  Cons
    ▪ Migration timeline can extend months or years
    ▪ Additional effort to plan, configure, test, migrate, and verify multiple times
    ▪ Legacy ECM cannot be immediately decommissioned

• Rolling Migration - Users begin using Alfresco immediately. Content is migrated from legacy
ECM on demand when requested by a user.
  Pros
    ▪ All users can immediately take advantage of the new user interface
    ▪ No system downtime for initial bulk migration
    ▪ Content is migrated as needed, making it easy to identify content that is not used
  Cons
    ▪ Requires custom user interfaces that know when to initiate migration from legacy system
    ▪ Legacy ECM cannot be immediately decommissioned
    ▪ Bulk migration may eventually be required
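
To make the delta approach concrete, the core of a delta run is selecting only what changed
after the bulk migration began. The sketch below assumes each legacy object exposes a
last-modified timestamp; the record layout, IDs, and dates are hypothetical and not tied to any
particular ECM or migration tool.

```python
from datetime import datetime


def select_delta(objects: list, bulk_started_at: datetime) -> list:
    """Return IDs of objects created or modified after the bulk extract began."""
    return [obj["id"] for obj in objects if obj["modified"] > bulk_started_at]


# Hypothetical start time of the bulk migration.
bulk_started_at = datetime(2017, 12, 1, 8, 0)

# Hypothetical metadata rows read from the legacy ECM.
legacy_objects = [
    {"id": "doc-1", "modified": datetime(2017, 11, 20, 9, 30)},
    {"id": "doc-2", "modified": datetime(2017, 12, 3, 14, 0)},
    {"id": "doc-3", "modified": datetime(2017, 12, 5, 10, 15)},
]

delta_ids = select_delta(legacy_objects, bulk_started_at)
print(delta_ids)
```

Comparing against the time the bulk run started, rather than when it finished, avoids missing
documents edited while the bulk migration was in progress. Deletions in the legacy system are
not detected by a timestamp filter and need separate handling.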

Alfresco Considerations
When migrating content from legacy ECM systems to Alfresco, the following factors should be
considered during the planning stage of the migration. Some of the items may be analogous to
concepts that exist in the legacy ECM, while others may be unique to Alfresco.

• Content Modeling
What is the type hierarchy and what are the metadata fields for the content being migrated?
Since Alfresco supports aspects, can they be used to simplify the content model?

• System Metadata
Are there any system metadata fields (e.g. creation date, creator name, last modified date,
modifier, unique identifier) that need to be preserved from the legacy ECM when migrating
to Alfresco?

• Folder Structure
How will the migrated content be organized into a folder structure in Alfresco? Alfresco
requires that all content be placed into a folder. For performance reasons, it’s important
that folders not contain too many objects or subfolders. If the legacy ECM doesn’t have a
folder structure, one must be designed when migrating to Alfresco.

• Versions
Does the legacy ECM system support versioning? Does the version history need to be
migrated to Alfresco?

• Renditions
Does the legacy ECM system support multiple renditions (Word documents with PDF
renditions, images/videos with multiple formats) for a piece of content?

• Annotations
Does the legacy ECM support the creation of content annotations? Do annotations need to
be migrated, and can they be converted to a standard format?

• Security
What were the access controls for the content in the legacy ECM? How will the permissions
be migrated/translated to Alfresco?

• Audit Trail
Is there any audit trail data in the legacy ECM? Does it need to be migrated/preserved?
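
For the folder structure consideration in particular, one possible design when the legacy ECM
has no folders is to derive a path from each document's type, creation date, and ID so that no
single folder grows too large. The sketch below is one such scheme under assumed inputs; the
path layout and the bucket count of 16 are illustrative choices, not Alfresco requirements.

```python
import hashlib
from datetime import date


def target_folder(doc_type: str, created: date, object_id: str, buckets: int = 16) -> str:
    """Spread content across /type/year/month/bucket folders."""
    # Hash the ID so documents from the same month are distributed evenly across buckets.
    digest = hashlib.sha256(object_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % buckets
    return f"/{doc_type}/{created.year}/{created.month:02d}/{bucket:02d}"


# Hypothetical document from the legacy system.
path = target_folder("Invoices", date(2017, 12, 4), "doc-42")
print(path)
```

The path is deterministic, so rerunning a migration places a given document in the same folder.
Raising the bucket count reduces the number of objects per folder at the cost of a wider tree.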

AWS Considerations
Below are some additional factors to consider when migrating from an on-premise legacy ECM
system to Alfresco on AWS.

Data Movement to AWS

Depending on the amount of content to be migrated and the network connection between the on-
premise datacenter and AWS, different approaches might be considered for moving the content to
AWS. For smaller migrations, or for customers that have Direct Connect with AWS, content can be
migrated over the network into Alfresco on AWS.

For larger migrations, and for customers with only a VPN connection to AWS, it’s often faster and
more cost-effective to use an AWS Snowball device to move content from on-premise to AWS. A
typical migration using TSG’s OpenMigrate and AWS Snowball would include the following steps:

1. Configure OpenMigrate and execute the phase 1 migration to extract content from the legacy
ECM system to temporary on-premise storage.
2. Request an AWS Snowball device, attach it to the on-premise network, and copy the content
from the temporary storage area to the Snowball.
3. Ship the Snowball device back to AWS, where the content on the device is loaded into an S3
bucket.
4. Configure OpenMigrate and execute the phase 2 migration to import the content into Alfresco
on AWS.
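
As a rough aid for choosing between a network transfer and Snowball, the sketch below estimates how long a transfer would take over a given link. The function name, the 80% utilization factor, and the sample volumes and link speeds are illustrative assumptions, not measurements from an actual migration.

```python
def network_transfer_days(content_tb: float, link_mbps: float,
                          utilization: float = 0.8) -> float:
    """Estimate the days needed to push content_tb terabytes over a link.

    Assumes decimal units (1 TB = 10**12 bytes, 1 Mbps = 10**6 bits/s) and
    that only `utilization` of the nominal bandwidth is usable (assumed value).
    """
    if content_tb < 0 or link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("size must be >= 0, rate > 0, utilization in (0, 1]")
    total_bits = content_tb * 1e12 * 8
    seconds = total_bits / (link_mbps * 1e6 * utilization)
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    # Hypothetical 50 TB repository over two assumed link speeds.
    for label, mbps in [("100 Mbps VPN", 100), ("1 Gbps Direct Connect", 1000)]:
        days = network_transfer_days(50, mbps)
        print(f"50 TB over {label}: {days:.1f} days")
```

If the estimate runs to several weeks, that is usually a sign that the Snowball approach is worth pricing out before committing to a network transfer.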

Direct Content Linking with S3

OpenMigrate supports the ability to create objects in Alfresco, and then link objects to content
stored on Amazon S3. Because the content does not need to be streamed through the Alfresco
API/application server, direct content linking can significantly increase migration speeds (for
example, 250 to 450 documents per second).

Using the direct content linking approach, content is extracted from the legacy ECM system and then
dumped into the S3 bucket used for Alfresco’s content store. From there, a migration would be run
to create objects in Alfresco and set metadata. Then the content (already in S3) is linked to the
objects.
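
To put the quoted throughput range in perspective, the short calculation below converts documents per second into an overall migration window. The 10 million document repository is a hypothetical figure, and the estimate assumes the rate is sustained for the whole run.

```python
def migration_hours(document_count: int, docs_per_second: float) -> float:
    """Hours needed to link document_count documents at a sustained rate."""
    if document_count < 0 or docs_per_second <= 0:
        raise ValueError("document_count must be >= 0 and rate must be > 0")
    return document_count / docs_per_second / 3600


if __name__ == "__main__":
    documents = 10_000_000  # hypothetical repository size
    for rate in (250, 450):  # range quoted in the text above
        hours = migration_hours(documents, rate)
        print(f"{rate} docs/s: {hours:.1f} hours")
```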

Alfresco Environment Scaling

Large migrations can put a heavy load on an Alfresco system, especially when using multi-threaded,
high throughput migration tools like OpenMigrate. Migration is a great opportunity to take
advantage of the scalability of Alfresco on AWS. During a large migration, additional Alfresco servers
can be added to the cluster to increase migration throughput and prevent migration activities from
impacting the performance of the Alfresco system for any users that might be accessing the system
while the migration is running. After the migration is complete, the extra servers can be
decommissioned to save on AWS operating costs.

MIGRATING ON-PREMISE ALFRESCO TO THE AWS CLOUD
Existing Alfresco customers with implementations on-premise may want to migrate to AWS to take
advantage of the many benefits of AWS. Two approaches can be used for this type of migration.

Lift and Shift to AWS

Because deployment of Alfresco on AWS is very similar to an on-premise deployment, it’s possible to
lift and shift the Alfresco content store, database, and Solr index data to AWS. The steps for this
process would include:

1. Install and configure the same version of Alfresco that’s on-premise in AWS. Alfresco’s S3
connector module must also be installed.
2. Shut down the on-premise Alfresco system.
3. Export on-premise database and load into Amazon RDS instance of the same database
platform and version.
4. Export Alfresco content store from on-premise Alfresco system and load into AWS S3 bucket
designated for the Alfresco content store.
5. Export Solr index data from on-premise Alfresco system and load onto EBS volume(s) on
Alfresco indexing server(s) on AWS.
6. Develop and execute database script to update URLs for all content to be the new locations
of the content that’s been migrated to S3.
7. Start the AWS Alfresco system and test that the lift and shift was successful.
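
The content URL update in step 6 amounts to a prefix rewrite applied to every stored URL. The sketch below shows only that rewrite logic so it can be checked on sample values before any database script is written. Both prefixes are assumptions: the actual URL formats depend on the Alfresco and S3 connector versions in use and should be confirmed against the real systems.

```python
from typing import List

# Assumed URL schemes, for illustration only.
OLD_PREFIX = "store://"  # assumed on-premise file content store scheme
NEW_PREFIX = "s3://"     # hypothetical scheme expected by the S3 connector


def rewrite_content_url(url: str, old_prefix: str, new_prefix: str) -> str:
    """Swap old_prefix for new_prefix; leave non-matching URLs untouched."""
    if url.startswith(old_prefix):
        return new_prefix + url[len(old_prefix):]
    return url


def rewrite_all(urls: List[str]) -> List[str]:
    """Apply the rewrite to a batch of URLs, preserving order."""
    return [rewrite_content_url(u, OLD_PREFIX, NEW_PREFIX) for u in urls]


if __name__ == "__main__":
    sample = [
        "store://2017/12/5/10/30/example.bin",  # made-up sample value
        "s3://already/migrated.bin",            # already rewritten, unchanged
    ]
    for before, after in zip(sample, rewrite_all(sample)):
        print(f"{before} -> {after}")
```

Running the rewrite against a copy of the database first, and comparing row counts before and after, is a low-cost way to validate the script ahead of the cutover.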

Additional Considerations

• For customers wanting to change database platforms (e.g. from Oracle on-premise to Aurora
RDS on AWS), database conversion utilities might be available on AWS. However, it’s critical
to thoroughly test the database conversion in advance to determine if it’s a viable option. In
the case of changing database platforms, TSG would typically recommend a migration
approach, so that the migration and the new environment can be tested as the migration is
being run.
• For large repositories, and for customers that don’t have AWS Direct Connect between their
on-premise datacenter and AWS, it may be necessary to utilize AWS Snowball to move data
from on-premise to AWS.
• For Alfresco installations on Linux, it may be possible to utilize Amazon Elastic File System
(EFS) as the Alfresco content store, rather than S3, to avoid having to update content URLs in
the database.

Alfresco-to-Alfresco Migration
Alfresco-to-Alfresco migration is another option for moving an on-premise Alfresco system to AWS.
A migration tool like TSG’s OpenMigrate can be configured with the on-premise Alfresco repository
as the migration source, and the AWS Alfresco repository as the migration target, and then the same
migration methodology and approaches as described in previous sections for legacy ECM to Alfresco
migration apply.

While Alfresco-to-Alfresco migration may require additional planning, the approach has some
distinct advantages over a lift and shift:

• Alfresco versions on-premise and on AWS do not have to match. A newer version of Alfresco
can be installed on AWS, and then content can be migrated from the older on-premise
repository to the new repository on AWS. In other words, an Alfresco upgrade can be
included as part of the migration.
• The platform and version of the AWS Alfresco database does not have to match the platform
and version of the on-premise Alfresco database. For example, content can be migrated
from an on-premise Oracle or Microsoft SQL Server implementation to AWS Aurora RDS
without the need for a risky database conversion.
• There is no need to perform database updates to modify the content URLs with the
migration approach. The migration tool takes care of migrating content from on-premise
storage into an S3 content store on AWS with no need for readdressing.
• Migration offers the opportunity to perform content cleanup before importing into the
Alfresco repository on AWS. Cleanup activities may include:
o Modification/consolidation of content model
o Reorganization of folder structure
o Updates to security model
o Removal of any “junk” that shouldn’t be migrated

TAKING ADVANTAGE OF AMAZON SERVICES FOR YOUR ECM IMPLEMENTATION
As mentioned throughout this paper, hosting Alfresco on AWS provides many benefits over on-
premise solutions. For innovative Alfresco customers, further benefits are available, particularly in
how the documents themselves are stored in S3. The figure illustrates AWS services used by several
of our clients. As described in earlier sections, only a minimal set of components is required for a
successful deployment; the diagram shows how our clients are innovating beyond that baseline on
AWS. This section presents some of those ideas for consideration.

• Glacier Archives & Vault Lock
With the large volumes of data stored in S3 there is the potential to save money by moving
content to Glacier’s low-cost storage tier. Glacier data is stored within Archives that are then
stored within Vaults. Beyond the simple storage of content, Glacier works with S3’s content
lifecycle rules and Glacier Vaults can implement a Vault Lock policy to enforce compliance
controls concerning the retention and disposition of documents.
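
As an illustration of what a retention-oriented Vault Lock policy can look like, the sketch below builds a policy document that denies archive deletion until an archive reaches a minimum age. The vault ARN, account number, and 2,555-day retention period are hypothetical values; the policy is only constructed and printed here, not applied to any vault.

```python
import json

RETENTION_DAYS = 2555  # assumed 7-year retention period


def vault_lock_policy(vault_arn: str, retention_days: int = RETENTION_DAYS) -> dict:
    """Build a policy that denies archive deletion before retention_days."""
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-before-retention",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {
                    "NumericLessThan": {
                        "glacier:ArchiveAgeInDays": str(retention_days)
                    }
                },
            }
        ],
    }


# Hypothetical vault ARN, for illustration only.
EXAMPLE_VAULT_ARN = "arn:aws:glacier:us-east-1:123456789012:vaults/alfresco-archive"

if __name__ == "__main__":
    print(json.dumps(vault_lock_policy(EXAMPLE_VAULT_ARN), indent=2))
```

Because a locked policy is intended to be immutable, any policy of this kind deserves careful review and testing before the lock is completed.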

• S3 API
S3 has a robust API for directly accessing stored objects. TSG has taken advantage of this
capability within our OpenContent Management Suite to upload/download content directly
from the S3 object store to increase performance, particularly for large files and video
streaming, while reducing the load on the Alfresco server.

• Metadata on S3
Amazon supports user-defined metadata on S3 objects, up to a 2 KB limit. TSG still recommends
all metadata be stored in Alfresco, but having some of the metadata on S3 allows for some
creative solutions, including replication as well as the potential for searching S3 directly for
objects. An additional method to store metadata on S3 is as either JSON or XML files alongside
content files in the S3 bucket. Storing the metadata in a separate file makes it available to
additional AWS services such as Athena, CloudSearch, and Redshift.
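
The two options above can be combined: keep metadata on the object when it fits within the limit, and write a JSON file alongside the content otherwise. The sketch below checks the size of a metadata set and derives a sidecar key. The `.metadata.json` suffix and the sample values are hypothetical conventions, and the size calculation assumes the limit is measured as the UTF-8 byte length of all keys and values.

```python
import json
from typing import Dict

S3_USER_METADATA_LIMIT_BYTES = 2 * 1024  # 2 KB limit on user-defined metadata


def user_metadata_size(metadata: Dict[str, str]) -> int:
    """Total UTF-8 byte length of all keys and values."""
    return sum(len(k.encode("utf-8")) + len(v.encode("utf-8"))
               for k, v in metadata.items())


def fits_in_object_metadata(metadata: Dict[str, str]) -> bool:
    """True if the metadata is small enough to store on the S3 object itself."""
    return user_metadata_size(metadata) <= S3_USER_METADATA_LIMIT_BYTES


def sidecar_key(content_key: str) -> str:
    """Key of the JSON metadata file stored next to the content (assumed naming)."""
    return content_key + ".metadata.json"


def sidecar_body(metadata: Dict[str, str]) -> bytes:
    """Serialize metadata as UTF-8 JSON, ready to upload as the sidecar file."""
    return json.dumps(metadata, sort_keys=True).encode("utf-8")


if __name__ == "__main__":
    meta = {"title": "Invoice 1001", "department": "Finance"}  # sample values
    key = "contentstore/2017/12/invoice-1001.pdf"              # hypothetical key
    if fits_in_object_metadata(meta):
        print("store on object:", meta)
    else:
        print("write sidecar:", sidecar_key(key))
```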

• S3 Lifecycles
Lifecycles are an optional S3 feature for controlling the storage behavior of an object within
an S3 bucket. For example, a lifecycle can specify that after 60 days an object should move to
S3 Infrequent Access (S3-IA) storage and then after an additional 30 days move to Glacier
and finally after 2,555 days (7 years) from its creation be destroyed. A good lifecycle can
enforce compliance rules and save significant storage costs.
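
The lifecycle described above can be written down as a lifecycle configuration document. The sketch below builds one in the shape accepted by the S3 lifecycle configuration API (for example, the `LifecycleConfiguration` argument of boto3's `put_bucket_lifecycle_configuration`); it only constructs and prints the document and does not call AWS. The rule ID and the empty prefix (apply to the whole bucket) are assumptions.

```python
import json


def retention_lifecycle(ia_after_days: int = 60,
                        glacier_after_more_days: int = 30,
                        expire_after_days: int = 2555,
                        prefix: str = "") -> dict:
    """Build a lifecycle configuration: Standard -> S3-IA -> Glacier -> delete."""
    # "An additional 30 days" after the S3-IA transition means day 90 overall.
    glacier_day = ia_after_days + glacier_after_more_days
    if not 0 < ia_after_days < glacier_day < expire_after_days:
        raise ValueError("transitions must occur in order, before expiration")
    return {
        "Rules": [
            {
                "ID": "alfresco-content-retention",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_day, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(retention_lifecycle(), indent=2))
```

Note that all day counts in a rule are measured from the object's creation, which is why the Glacier transition is expressed as day 90 rather than 30.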

• AWS CloudFront
CloudFront is a Content Delivery Network (CDN) with capabilities to publish S3 objects to
edge storage locations around the world. CloudFront provides for streaming and quick
access to the S3 object store without going through the Alfresco API to store or retrieve the
object, a feature not easily replicated with an on-premise Alfresco solution. TSG has taken
advantage of this capability within our OpenContent Management Suite to upload/download
content directly from the S3 object store to increase performance, particularly for large files
and video streaming, while reducing the load on the Alfresco server.

• AWS Elastic Transcoder
Amazon has been processing video for years. AWS Elastic Transcoder can handle a myriad of
video and audio formats, transforming them from one file format to another. This is ideal for
rendering videos into formats for streaming or annotating. It becomes possible to accept
several common video formats from users and transcode them into a single format for
consistency and use in OpenAnnotate Video as mp4 files. Elastic Transcoder uses AWS
Simple Queue Service (SQS) to process transcoding jobs, moving the files to and from S3
buckets.

• AWS Auto Scaling
Over time, the typical document management solution has a slow and steady increase of
content and usage. However, for scenarios which require a large ingestion of content
initially or a huge increase in users, Amazon EC2 provides the flexibility to scale up or down
as needed. Alfresco’s Quick Start uses Chef to bootstrap and dynamically add and remove
instances from the auto-scaling group.

• AWS CloudWatch
AWS CloudWatch is used to monitor the health and behavior of EC2 instances and other
AWS services. Applications may also send metrics to CloudWatch so they can be observed.
AWS Auto Scaling can be triggered by CloudWatch metrics; for example, a decision to launch
another EC2 instance can be made if CPU usage on the existing instances exceeds 80% for 5
minutes. Multiple thresholds for behavior can be defined, and alarms can be set to notify a
Simple Notification Service (SNS) topic, which might send an email or text message to alert an
administrator. The CloudWatch service provides the toolset to establish proactive
management of an AWS solution.
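
The CPU example above maps onto a small set of alarm settings. The sketch below assembles them as the keyword arguments that boto3's CloudWatch `put_metric_alarm` call accepts, without calling AWS. The alarm name, Auto Scaling group name, and scaling policy ARN are hypothetical placeholders.

```python
from typing import Any, Dict


def cpu_scale_out_alarm(group_name: str, action_arn: str,
                        threshold_percent: float = 80.0,
                        minutes: int = 5) -> Dict[str, Any]:
    """Alarm settings: average CPU above threshold for `minutes` minutes."""
    if not 0 < threshold_percent <= 100 or minutes <= 0:
        raise ValueError("threshold must be in (0, 100] and minutes positive")
    return {
        "AlarmName": f"{group_name}-cpu-high",
        "AlarmDescription": "Scale out when average CPU stays high",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": group_name}],
        "Statistic": "Average",
        "Period": minutes * 60,   # one evaluation window, in seconds
        "EvaluationPeriods": 1,   # a single 5-minute window must breach
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [action_arn],  # e.g. a scaling policy or SNS topic ARN
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    params = cpu_scale_out_alarm(
        "alfresco-asg",
        "arn:aws:autoscaling:us-east-1:123456789012:scalingPolicy:example",
    )
    for key, value in params.items():
        print(f"{key}: {value}")
```

Using one 5-minute period is a design choice; five 1-minute periods would react to the same condition but requires that the instances publish metrics at 1-minute granularity.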

• Encryption
AWS provides encryption within several services. For Alfresco solutions, AWS offers
encryption within the S3 object store, EBS volumes, and RDS databases. AWS Certificate
Manager provides a hassle-free means for creating and managing SSL certificates. With AWS
encryption, additional software components like Alfresco encryption are no longer required.
Encryption keys can be controlled, rotated, and renewed by AWS or by the customer using
AWS Key Management Service (KMS).

• AWS CloudTrail & VPC Flow Logs
These two AWS services provide the means to monitor and respond to low-level application
and network traffic. AWS CloudTrail records AWS API-level traffic and ships the logs to S3.
These logs can be used to track actions against services and to troubleshoot issues. The VPC
Flow Logs are available in AWS CloudWatch and will monitor the IP traffic going into and out
of the VPC. This data can help with troubleshooting configurations and knowing what traffic
is coming into the instances and from where.

Technology Services Group, Inc.
22 West Washington Street, 5th Floor
Chicago, IL 60602
www.tsgrp.com

Readers are free to distribute this report within their own organizations, provided the
Technology Services Group footer at the bottom of every page is also present.

© 2017 Technology Services Group, Inc. All Rights Reserved.
