Moving Alfresco To Amazon Web Services: A Step by Step Guide
Contents
Executive Summary
Amazon Web Services Overview
Amazon Web Services Components
Alfresco Components
Installing Alfresco in the AWS Cloud
Alfresco Install from Scratch
Alfresco Install with AWS Quick Start
Migrating On-Premise Legacy ECM to Alfresco on the AWS Cloud
Migration Methodology
Migration Approaches
Alfresco Considerations
AWS Considerations
Migrating On-Premise Alfresco to the AWS Cloud
Lift and Shift to AWS
Alfresco-to-Alfresco Migration
Taking Advantage of Amazon Services for your ECM Implementation
EXECUTIVE SUMMARY
As Amazon Web Services (AWS) continues to innovate and become a common alternative to
on-premise Alfresco environments, progressive Alfresco customers are looking to move Alfresco or
other legacy ECM solutions from internal environments to AWS. Amazon Web Services, an
Infrastructure as a Service (IaaS) provider, has several significant advantages over traditional
on-premise environments, including:
• Cost
AWS can replace many of the infrastructure vendors commonly required for an Alfresco
infrastructure with software components to reduce the overall cost of ownership. This
includes load balancing, routers, application servers, databases, and storage.
• Elastic Scaling
AWS offers a variety of server scaling options to allow customers to scale Alfresco
environments up or scale down as required without additional hardware costs.
• Cloud Benefits
AWS offers other benefits associated with movement to the cloud, including geographic
expansion of systems, increased bandwidth through dedicated network connections and
content distribution networks, fault tolerance enabled by elasticity and agility, and
significant cost savings by paying for only the services used.
Technology Services Group (TSG) has been moving Alfresco clients to AWS since 2009 and is both a
platinum Alfresco partner and a Standard Amazon partner. This guide will provide a step by step
understanding of how to move Alfresco to Amazon Web Services. Whether moving an existing on-
premise solution to AWS, or starting a new Alfresco effort on AWS, this guide will provide the
following helpful information:
• Storage
AWS’s Elastic Block Storage (EBS), Elastic File System (EFS), Simple Storage Service (S3), and
Glacier are storage tiers that can be used separately or combined to meet varying
requirements. Each storage tier comes with a performance and price point, with Elastic Block
Storage being the best performance for the most cost, and Glacier being the most cost
effective but truly targeted for archival only. Most Alfresco customers choose S3 for content
storage, given typical document management needs. Alfresco offers an S3 Connector
adapter that we recommend be included in the Alfresco purchase. More thoughts on S3 and
innovative Alfresco solutions will come later in this paper.
• Direct Connect
Direct Connect is a dedicated network connection between the AWS VPC and client data centers. Direct
Connect is available through AWS telecom partners who have connected their networks
directly to AWS’s, providing a secure non-Internet connection. This connection extends AWS
to the on-premise network providing the capability for growing IT capacity without capital
expenditure. Direct Connect is a high-speed connection with different price points for
different bandwidth capabilities.
• Snowball/Snowmobile
For clients where Direct Connect isn’t feasible, or a large volume of legacy data needs to be
moved to AWS, Amazon offers more conventional file storage devices that can be shipped to
the existing on-premise data center and then shipped back to AWS for upload. Snowball is a
terabyte-scale storage device that can also run programs to manipulate the stored
data. Snowmobile can store and transfer up to 100 petabytes of data. After loading data on
the devices, AWS can transfer and store the data in S3 or Glacier.
Compared to typical on-premise solutions, AWS’s ability to provide all of the above solutions as
software solutions designed to work together significantly reduces the effort required to support the
overall solution.
• Procurement (https://aws.amazon.com/free/)
AWS provides procurement “on demand” without the need to await hardware ordering and
shipment delays. AWS offers a free tier of products for 12 months, and many services are
always free. For example, 10 GB of Glacier storage, 100 GB of the AWS Storage Gateway, and
1 million Lambda (serverless) requests per month are always free.
ALFRESCO COMPONENTS
Components of a typical Alfresco environment include:
• Activiti
Alfresco provides workflow capabilities with Activiti, an open source workflow/Business
Process Management (BPM) engine. Basic ECM workflow functionality with Activiti is included
as part of the Alfresco Content Services installation.
• Transformation Server(s)
Alfresco offers two types of transformation. By default, converting documents from Office
formats to PDF in Alfresco relies on LibreOffice, an open source tool. The LibreOffice
transformer is included in the Alfresco Content Services with no additional servers or cost.
While very quick, LibreOffice isn’t always the most accurate. Alfresco offers an external
transformation server that launches Microsoft Office and initiates a print command to
provide a more accurate PDF rendition. This transformation server requires additional
Alfresco licensing, licensing for Microsoft Office, as well as an additional server or virtual
machine.
Most ECM users want to target searches by particular types and attributes, rather than rely
on a generic single-field approach that searches everything or browse manually through
countless folders and endless content. Quickly looking at a collection of
documents and seeing all related documents and metadata allows users to have all content
accessible in a single screen. Regardless of industry, OpenContent Management Suite can
meet a user’s requirements. For instance, users can search for invoices due in the next 30 days, or
procedure documents that have recently become effective.
• The need to install Alfresco on corporate standard Amazon Machine Images (AMIs)
• Integration of Alfresco stack into an already existing network topology on AWS
• The desire to deploy Alfresco on a different RDS type than Aurora, such as MySQL, Oracle,
PostgreSQL, or Microsoft SQL Server
• EC2 architecture that differs from what is offered with Alfresco’s AWS Quick Start
To get an Alfresco environment up and running from scratch, the following steps are required:
4. Configure Alfresco
Alfresco and its supporting modules are configured using properties files located on the
application servers. Default configuration values can be adjusted to meet the needs of the
specific Alfresco implementation. Configuration can be performed manually or using
DevOps automation.
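As a minimal sketch of the automated option, the snippet below applies a set of overrides to the text of a properties file. The function name is hypothetical, and the keys and values shown (dir.root, db.username, db.url, the host name) are illustrative placeholders only; the actual properties to change depend on the specific Alfresco implementation.

```python
def apply_overrides(properties_text: str, overrides: dict[str, str]) -> str:
    """Return properties text with the given keys replaced or appended."""
    remaining = dict(overrides)
    out = []
    for line in properties_text.splitlines():
        stripped = line.strip()
        # Comments and blank lines are passed through untouched.
        if stripped and not stripped.startswith(("#", "!")) and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in remaining:
                line = f"{key}={remaining.pop(key)}"
        out.append(line)
    # Keys that were not already present are appended at the end.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# Illustrative values only (assumptions, not recommended settings).
defaults = "# example defaults\ndir.root=/opt/alfresco/alf_data\ndb.username=alfresco\n"
print(apply_overrides(defaults, {
    "db.username": "alfresco_prod",
    "db.url": "jdbc:postgresql://example-host:5432/alfresco",
}))
```

Keeping the overrides in a small per-environment dictionary is one way to document the manual configuration first and automate it later.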
One item to consider is the temptation to go “all-in” on automating the Alfresco installation and
configuration using AWS CloudFormation and DevOps tools like Chef. It’s important to weigh the
time required to build the automation against the benefits the automation provides.
Manual installation, once completed and documented in one environment, can be
repeated in other environments quickly. Given how infrequently Alfresco needs to
be installed and configured from scratch, TSG recommends most customers choose a manual
approach for the initial installation. If automation is desired, TSG recommends a manual
installation first to get all Alfresco environments up and running as quickly as possible, with
automation added later as time allows.
• All AWS components, including EC2, RDS, S3, VPC, and ELB, are provisioned automatically
and with optimal settings for Alfresco deployment
• Stack components, such as operating system and database platform, are created in
accordance with Alfresco’s supported platforms specification
• Alfresco’s AWS Quick Start offers the option to create a new VPC for the new Alfresco stack,
or add Alfresco to an existing VPC
• Alfresco is deployed across multiple availability zones within an AWS region, making the
system highly available
• Autoscaling is built into the deployment, allowing for additional Alfresco servers to be spun
up during times of high utilization, and turned off during periods of low utilization for cost
savings
• The Alfresco installation and base configuration is performed automatically with optimal
default configuration settings for the EC2 instance sizes selected
The AWS components deployed by Alfresco’s AWS Quick Start are depicted below:
After the Alfresco environment has been created, additional Alfresco modules can be deployed into
the environment using the same methods as manual installation. Similarly, any additional
configuration that is needed for the environment can be performed as described in the manual
installation as well.
Alfresco’s AWS Quick Start offers a time saving way to jumpstart an Alfresco environment with
minimal effort. If at all possible, it’s recommended to utilize the Quick Start for new installations. In
some cases, configuration tweaks may be required after running the Quick Start, but these updates
are usually much easier than installing from scratch.
attributes, tags) and binary content (PDFs, Office Documents, images, etc.) from the legacy ECM
system and import into Alfresco.
A migration tool, such as TSG’s OpenMigrate, is usually required for performing migrations from
legacy ECM systems to Alfresco. Tools like OpenMigrate have connectors for extracting metadata
and content from legacy ECM systems using native APIs. After extraction, metadata and content can
be transformed, if necessary, before importing into Alfresco using native Alfresco APIs.
Migration Methodology
A typical migration would include the following steps:
• Test
Run migration in test environment to confirm all objects are being migrated as expected.
Test on a large enough data set to calculate the expected time to complete full migration.
• Verify
Verify the integrity of the content, metadata, and any related objects in Alfresco.
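The timing calculation from the Test step can be sketched as a simple linear extrapolation. The function name and the sample numbers below are hypothetical, and the sketch assumes throughput in the full migration matches the test run, which will not hold exactly in practice.

```python
def estimate_full_migration_hours(test_docs: int, test_seconds: float, total_docs: int) -> float:
    """Extrapolate full migration time from a test run, assuming constant throughput."""
    if test_docs <= 0 or test_seconds <= 0:
        raise ValueError("test run must contain documents and take measurable time")
    docs_per_second = test_docs / test_seconds
    return total_docs / docs_per_second / 3600


# Hypothetical test run: 50,000 documents in 1,000 seconds, 9 million documents in total.
hours = estimate_full_migration_hours(50_000, 1_000, 9_000_000)
print(f"Estimated full migration time: {hours:.1f} hours")
```

An estimate like this is only as good as the test data set, which is why the test run should be large and representative.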
Migration Approaches
There are several different approaches that can be taken when migrating from a legacy ECM to
Alfresco. Choosing the right approach is important, and will depend on many factors, including:
• Big Bang Migration - All content is migrated at once during an outage. All users begin using
Alfresco after migration is complete.
Pros
Legacy ECM can be immediately decommissioned after migration
Migration only has to be planned, configured, tested, and executed once
Cons
Can require a significant (sometimes unacceptable) amount of downtime
Highest risk – if anything goes wrong, must back out and start over
Highest chance of exceeding timeline and budget
Everyone must move and be trained on Alfresco simultaneously
• Delta Migration - Bulk migration occurs while legacy ECM is still in use. Before cutover to
Alfresco, a smaller delta migration takes place to sync any changes since the bulk migration.
Pros
Large portion of the migration can be completed while users are still in the
legacy ECM system
Delta migration of changes is generally a small subset of content, minimizing
downtime
Migration and new environment can be proven and verified prior to cutover
to significantly reduce risk
Cons
All users must move and be trained on Alfresco simultaneously
Additional verification is needed to ensure that delta migration was
successful
• Rolling Migration - Users begin using Alfresco immediately. Content is migrated from the legacy
ECM on demand when requested by a user.
Pros
All users can immediately take advantage of the new user interface
No system downtime for initial bulk migration
Content is migrated as needed, making it easy to identify content that is not
used
Cons
Requires custom user interfaces that know when to initiate migration from
legacy system
Legacy ECM cannot be immediately decommissioned
Bulk migration may eventually be required
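The selection step of a delta migration can be sketched as follows. This is an illustration under stated assumptions, not how any particular migration tool works: LegacyObject and select_delta are hypothetical names, and the sketch assumes the legacy ECM exposes a reliable last-modified timestamp for each object.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LegacyObject:
    """Hypothetical stand-in for an object read from the legacy ECM."""
    object_id: str
    modified: datetime


def select_delta(objects: list[LegacyObject], bulk_started_at: datetime) -> list[LegacyObject]:
    """Return objects created or changed since the bulk migration started."""
    return [obj for obj in objects if obj.modified >= bulk_started_at]


bulk_started_at = datetime(2017, 6, 1, 8, 0)
objects = [
    LegacyObject("doc-1", datetime(2017, 5, 20, 9, 30)),  # unchanged since bulk run
    LegacyObject("doc-2", datetime(2017, 6, 3, 14, 0)),   # edited after bulk run
    LegacyObject("doc-3", datetime(2017, 6, 5, 11, 15)),  # created after bulk run
]
print([obj.object_id for obj in select_delta(objects, bulk_started_at)])
```

Comparing against the start of the bulk run, rather than its end, catches objects changed while the bulk migration was still running. Deletions in the legacy system are not covered by this sketch and would need separate handling.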
Alfresco Considerations
When migrating content from legacy ECM systems to Alfresco, the following factors should be
considered during the planning stage of the migration. Some of the items may be analogous to
concepts that exist in the legacy ECM, while others may be unique to Alfresco.
• Content Modeling
What is the type hierarchy and what are the metadata fields for the content being migrated?
Since Alfresco supports aspects, can they be used to simplify the content model?
• System Metadata
Are there any system metadata fields (e.g. creation date, creator name, last modified date,
modifier, unique identifier) that need to be preserved from the legacy ECM when migrating
to Alfresco?
• Folder Structure
How will the migrated content be organized into a folder structure in Alfresco? Alfresco
requires that all content be placed into a folder. For performance reasons, it’s important
that folders not contain too many objects or subfolders. If the legacy ECM doesn’t have a
folder structure, one must be designed when migrating to Alfresco.
• Versions
Does the legacy ECM system support versioning? Does the version history need to be
migrated to Alfresco?
• Renditions
Does the legacy ECM system support multiple renditions (Word documents with PDF
renditions, images/videos with multiple formats) for a piece of content?
• Annotations
Does the legacy ECM support the creation of content annotations? Do annotations need to
be migrated, and can they be converted to a standard format?
• Security
What were the access controls for the content in the legacy ECM? How will the permissions
be migrated/translated to Alfresco?
• Audit Trail
Is there any audit trail data in the legacy ECM? Does it need to be migrated/preserved?
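For the Folder Structure consideration, one possible design, sketched here under assumptions, derives a folder path from the document type, creation date, and a stable hash of the object identifier so that no single folder grows too large. The path layout, the function name, and the bucket count of 100 are hypothetical choices, not an Alfresco requirement.

```python
import zlib
from datetime import date


def folder_path(doc_type: str, created: date, object_id: str, buckets: int = 100) -> str:
    """Build a type/year/month/bucket path that spreads objects across folders."""
    # crc32 is stable across runs, so the same object always maps to the same bucket.
    bucket = zlib.crc32(object_id.encode("utf-8")) % buckets
    return f"/{doc_type}/{created.year}/{created.month:02d}/{bucket:02d}"


print(folder_path("Invoices", date(2017, 3, 9), "legacy-id-000123"))
```

A deterministic path also lets a migration be rerun without scattering the same object across different folders.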
AWS Considerations
Below are some additional factors to consider when migrating from an on-premise legacy ECM
system to Alfresco on AWS.
Depending on the amount of content to be migrated and the network connection between the on-
premise datacenter and AWS, different approaches might be considered for moving the content to
AWS. For smaller migrations, or for customers that have Direct Connect with AWS, content can be
migrated over the network into Alfresco on AWS.
For larger migrations and for customers with only a VPN connection with AWS, it’s often more cost
effective and faster to utilize an AWS Snowball device to move content from on-premise to AWS. A
typical migration utilizing TSG’s OpenMigrate and AWS Snowball would include the following steps:
1. Configure OpenMigrate and execute phase 1 migration to extract content from legacy ECM
system to temporary storage on-premise
2. Request AWS Snowball device, attach to on-premise network, and copy content from the
temporary storage area to the Snowball
3. Ship the Snowball device back to AWS, where the content on the device is loaded into an S3
bucket
4. Configure OpenMigrate and execute phase 2 migration to import content into Alfresco on
AWS
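To help decide between a network transfer and Snowball, a rough transfer-time estimate can be calculated as sketched below. The function name is hypothetical, the 80% link utilization is an assumed figure, and the sample volume and bandwidth are illustrative only; terabytes are treated as decimal (10^12 bytes).

```python
def network_transfer_days(data_tb: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Estimate days needed to move data_tb terabytes over a link of link_mbps megabits/s."""
    if data_tb < 0 or link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("invalid input")
    total_bits = data_tb * 1e12 * 8
    effective_bits_per_second = link_mbps * 1e6 * utilization
    return total_bits / effective_bits_per_second / 86_400


# Illustrative case: 50 TB of content over a 100 Mbps connection.
print(f"{network_transfer_days(50, 100):.1f} days")
```

If the estimate runs to weeks or months, the shipping time of a Snowball device is likely to compare favorably.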
OpenMigrate supports the ability to create objects in Alfresco, and then link objects to content
stored on Amazon S3. Because the content does not need to be streamed through the Alfresco
API/application server, direct content linking can significantly increase migration speeds (for
example, 250-450 documents per second).
Using the direct content linking approach, content is extracted from the legacy ECM system and then
dumped into the S3 bucket used for Alfresco’s content store. From there, a migration would be run
to create objects in Alfresco and set metadata. Then the content (already in S3) is linked to the
objects.
Large migrations can put a heavy load on an Alfresco system, especially when using multi-threaded,
high throughput migration tools like OpenMigrate. Migration is a great opportunity to take
advantage of the scalability of Alfresco on AWS. During a large migration, additional Alfresco servers
can be added to the cluster to increase migration throughput and prevent migration activities from
impacting the performance of the Alfresco system for any users that might be accessing the system
while the migration is running. After the migration is complete, the extra servers can be
decommissioned to save on AWS operating costs.
1. Install and configure the same version of Alfresco that’s on-premise in AWS. Alfresco’s S3
connector module must also be installed.
2. Shut down the on-premise Alfresco system.
3. Export on-premise database and load into Amazon RDS instance of the same database
platform and version.
4. Export Alfresco content store from on-premise Alfresco system and load into AWS S3 bucket
designated for the Alfresco content store.
5. Export Solr index data from on-premise Alfresco system and load onto EBS volume(s) on
Alfresco indexing server(s) on AWS.
6. Develop and execute database script to update URLs for all content to be the new locations
of the content that’s been migrated to S3.
7. Start the AWS Alfresco system and test that the lift and shift was successful.
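The URL update in step 6 amounts to a prefix rewrite. The sketch below shows only that logic in isolation. Both prefixes in the example are placeholders (assumptions), so the real values must be confirmed against the content URLs in the on-premise database and the format expected by the S3 connector, and the actual update would be run as a tested database script.

```python
def rewrite_content_url(url: str, old_prefix: str, new_prefix: str) -> str:
    """Swap old_prefix for new_prefix; URLs with a different prefix are left unchanged."""
    if not url.startswith(old_prefix):
        return url
    return new_prefix + url[len(old_prefix):]


# Placeholder prefixes for illustration only.
OLD_PREFIX = "old-store://"
NEW_PREFIX = "new-store://"

print(rewrite_content_url("old-store://2017/6/1/8/0/example.bin", OLD_PREFIX, NEW_PREFIX))
```

Leaving non-matching URLs untouched makes the rewrite safe to rerun if the script is interrupted.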
Additional Considerations
• For customers wanting to change database platforms (e.g. from Oracle on-premise to Aurora
RDS on AWS), database conversion utilities might be available on AWS; however, it’s critical
to thoroughly test the database conversion in advance to determine if it’s a viable option. In
the case of changing database platforms, TSG would typically recommend a migration
approach to be able to test the migration and new environment as the migration is being
run.
• For large repositories, and for customers that don’t have AWS Direct Connect between their
on-premise datacenter and AWS, it may be necessary to utilize AWS Snowball to move data
from on-premise to AWS.
• For Alfresco installations on Linux, it may be possible to utilize AWS Elastic File System (EFS)
as the Alfresco content store, rather than S3, to avoid having to update content URLs in the
database.
Alfresco-to-Alfresco Migration
Alfresco-to-Alfresco migration is another option for moving an on-premise Alfresco system to AWS.
A migration tool like TSG’s OpenMigrate can be configured with the on-premise Alfresco repository
as the migration source, and the AWS Alfresco repository as the migration target, and then the same
migration methodology and approaches as described in previous sections for legacy ECM to Alfresco
migration apply.
While Alfresco-to-Alfresco migration may require additional planning, the approach has some
distinct advantages over a lift and shift:
• Alfresco versions on-premise and on AWS do not have to match. A newer version of Alfresco
can be installed on AWS, and then content can be migrated from the older on-premise
repository to the new repository on AWS. In other words, an Alfresco upgrade can be
included as part of the migration.
• The platform and version of the AWS Alfresco database does not have to match the platform
and version of the on-premise Alfresco database. For example, content can be migrated
from an on-premise Oracle or Microsoft SQL Server implementation to AWS Aurora RDS
without the need for a risky database conversion.
• There is no need to perform database updates to modify the content URLs with the
migration approach. The migration tool takes care of migrating content from on-premise
storage into an S3 content store on AWS with no need for readdressing.
• Migration offers the opportunity to perform content cleanup before importing into the
Alfresco repository on AWS. Cleanup activities may include:
o Modification/consolidation of content model
o Reorganization of folder structure
o Updates to security model
o Leave behind any “junk” that shouldn’t be migrated
• S3 API
S3 has a robust API for directly accessing stored objects. TSG has taken advantage of this
capability within our OpenContent Management Suite to upload/download content directly
from the S3 object store to increase performance, particularly for large files and video
streaming, while reducing the load on the Alfresco server.
• Metadata on S3
Amazon supports metadata on S3, up to a 2 KB limit. TSG still recommends all metadata be
stored in Alfresco, but having some of the metadata on S3 allows for some creative
solutions, including replication as well as potential for searching S3 directly for objects. An
additional method to store metadata on S3 is as either JSON or XML files alongside content
files in the S3 bucket. By storing the metadata in a separate file, it is available to additional
AWS services such as Athena, CloudSearch, and Redshift.
• S3 Lifecycles
Lifecycles are an optional S3 feature for controlling the storage behavior of an object within
an S3 bucket. For example, a lifecycle can specify that after 60 days an object should move to
S3 Standard-Infrequent Access (S3 Standard-IA) storage, then after an additional 30 days move
to Glacier, and finally be destroyed 2,555 days (7 years) after its creation. A good lifecycle can
enforce compliance rules and save significant storage costs.
• AWS CloudFront
CloudFront is a Content Distribution Network (CDN) with capabilities to publish S3 objects to
edge storage locations around the world. CloudFront provides for streaming and quick
access to the S3 object store without going through the Alfresco API to store or retrieve the
object, a feature not easily replicated with an on-premise Alfresco solution. TSG has taken
advantage of this capability within our OpenContent Management Suite to upload/download
content directly from the S3 object store to increase performance, particularly for large files
and video streaming, while reducing the load on the Alfresco server.
Simple Queue Service (SQS) to process transcoding jobs, moving the files to and from S3
buckets.
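The S3 lifecycle example described earlier (Infrequent Access after 60 days, Glacier 30 days later, deletion after 7 years) can be sketched as a rule definition. The structure below follows the lifecycle configuration format accepted by the S3 API; the rule ID is a hypothetical name and the empty prefix, which applies the rule to every object in the bucket, is an assumption.

```python
import json

lifecycle_configuration = {
    "Rules": [
        {
            "ID": "example-retention-rule",  # hypothetical rule name
            "Filter": {"Prefix": ""},        # assumption: rule covers the whole bucket
            "Status": "Enabled",
            "Transitions": [
                {"Days": 60, "StorageClass": "STANDARD_IA"},  # Infrequent Access after 60 days
                {"Days": 90, "StorageClass": "GLACIER"},      # Glacier 30 days after that
            ],
            "Expiration": {"Days": 2555},  # delete 7 years after creation
        }
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))
```

With the AWS SDK for Python (boto3), a dictionary like this could be passed as the LifecycleConfiguration argument of the S3 client's put_bucket_lifecycle_configuration call. Day counts are measured from object creation, so the Glacier transition is expressed as day 90 rather than 30.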
• AWS AutoScaling
Over time, the typical document management solution has a slow and steady increase of
content and usage. However, for scenarios which require a large ingestion of content
initially or a huge increase in users, Amazon EC2 provides the flexibility to scale up or down
as needed. Alfresco’s Quick Start uses Chef to bootstrap and dynamically add and remove
instances from the auto-scaling group.
• AWS CloudWatch
AWS CloudWatch is used to monitor the health and behavior of EC2 instances and other
AWS services. Applications may also send metrics to CloudWatch so they can be observed.
AWS AutoScaling can be triggered by CloudWatch metrics. For example, a decision to launch
another EC2 instance can be made if CPU usage on the existing instances exceeds 80% for 5 minutes.
Multiple thresholds for behavior can be defined and alarms set to alert a Simple
Notification Service (SNS) topic which might send an email or text message to alert an
administrator. The CloudWatch service provides the toolset to establish proactive
management of an AWS solution.
• Encryption
AWS provides encryption within several services. For Alfresco solutions, AWS offers
encryption within the S3 object store, EBS volumes, and RDS databases. AWS Certificate
Manager provides a hassle-free means for creating and managing SSL/TLS certificates. With AWS
encryption, additional software components like Alfresco encryption are no longer required.
Encryption keys can be controlled, rotated, and renewed by AWS or by the customer using
AWS Key Management Service (KMS).
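The scaling rule from the CloudWatch discussion above (CPU usage above 80% for 5 minutes) can be illustrated with the small sketch below. It models only the decision logic, not the CloudWatch or Auto Scaling APIs; the function name is hypothetical and the samples are assumed to arrive once per minute.

```python
def should_scale_out(cpu_samples: list[float], threshold: float = 80.0, window: int = 5) -> bool:
    """Return True when the most recent `window` samples all exceed the threshold."""
    if len(cpu_samples) < window:
        return False  # not enough data yet to make a decision
    return all(sample > threshold for sample in cpu_samples[-window:])


# One sample per minute (assumed); the last five readings are all above 80%.
print(should_scale_out([50, 85, 90, 88, 91, 95]))
```

Requiring the whole window to exceed the threshold avoids launching instances in response to a brief spike.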
Readers are free to distribute this report within their own organizations, provided the
Technology Services Group footer at the bottom of every page is also present.