The CloudForecast Resource Tagging Guide - Book
Part 1: An Introduction to Tagging Strategies
If you’ve worked in Amazon Web Services for long, you’ve probably seen or used tags to organize
your team’s resources. AWS allows you to attach metadata to most resources in the form of key-value
pairs called tags. In this part of our three-part guide on tagging resources in AWS, we’ll cover some of
the most common use-cases for tags and look at some best practices for selecting and organizing
your tags. Finally, we’ll explore some examples of tagging strategies used by real companies to
improve visibility into their resource utilization in AWS.
Understanding and controlling your costs isn’t the only reason you should tag your resources. You
can use tags to answer a variety of questions, including:
Before you start adding tags to all of your resources, it’s essential to create a strategy that will help
you sustainably manage your tags. Tags can be helpful, but without a consistently applied plan, they
can become an unsustainable mess.
● Technical Tags help engineers identify and work with the resource. These might include an
application or service name, an environment, or a version number.
● Business Tags allow stakeholders to analyze costs and the teams or business units
responsible for each resource. For example, you might want to know what percentage of your
AWS spend is going towards the new product you launched last year so you can determine
the return on investment of that effort.
● Security Tags ensure compliance and security standards are met across the organization.
These tags might be used to limit access or denote specific data security requirements for
HIPAA or SOC compliance.
● Automation Tags can be used to automate the cleanup, shutdown, or usage rules for each
resource in your account. For example, you could tag sandbox servers and run a script to
delete them after they’re no longer in use.
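As a rough illustration of the automation use case, the sketch below picks out sandbox servers that are candidates for cleanup. The env=sandbox tag, the seven-day threshold, and the shape of the instance records are all assumptions made for this example, and age is only a stand-in for "no longer in use"; in practice the data would come from the AWS API.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Hypothetical tag and threshold, chosen only for illustration.
SANDBOX_TAG = ("env", "sandbox")
MAX_AGE = timedelta(days=7)


def find_expired_sandboxes(instances: List[Dict], now: datetime) -> List[str]:
    """Return the IDs of sandbox-tagged instances older than MAX_AGE.

    Each instance is a dict with "id", "launched" (a timezone-aware
    datetime), and "tags" (a plain dict of tag keys to values).
    """
    key, value = SANDBOX_TAG
    expired = []
    for instance in instances:
        is_sandbox = instance.get("tags", {}).get(key) == value
        if is_sandbox and now - instance["launched"] > MAX_AGE:
            expired.append(instance["id"])
    return expired
```

The function only selects candidates. Keeping the deletion step separate makes it easier to review the list before anything is removed.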
A common pattern for tags is to use lowercase letters with hyphens between words and colons to
namespace them. For example, you might use something like this:
mifflin:eng:os-version 1.0
Where mifflin is the name of your company, eng designates this tag as being relevant to the
engineering team, os-version indicates the purpose of the tag, and 1.0 is the value.
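A convention like this can be checked mechanically. The following is a minimal sketch, assuming the rules are exactly "lowercase letters and digits, hyphens between words, colons between namespace segments"; the function names are made up for this example. It also rejects the aws: prefix, which AWS reserves for its own tags.

```python
import re
from typing import List

# Assumed convention: lowercase words joined by hyphens, with colons
# separating namespace segments (for example company:team:purpose).
_SEGMENT = r"[a-z0-9]+(?:-[a-z0-9]+)*"
TAG_KEY_PATTERN = re.compile(rf"{_SEGMENT}(?::{_SEGMENT})*")

# AWS reserves this prefix for tags it generates itself.
RESERVED_PREFIX = "aws:"


def is_valid_tag_key(key: str) -> bool:
    """Return True if the key follows the lowercase/hyphen/colon convention."""
    if key.startswith(RESERVED_PREFIX):
        return False
    return TAG_KEY_PATTERN.fullmatch(key) is not None


def split_tag_key(key: str) -> List[str]:
    """Split a namespaced key into its colon-separated segments."""
    if not is_valid_tag_key(key):
        raise ValueError(f"tag key does not follow the convention: {key!r}")
    return key.split(":")
```

For the key above, split_tag_key("mifflin:eng:os-version") yields the company, team, and purpose segments in order.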
Fortunately, many tags can be avoided by relying on AWS’s built-in resource metadata. For example,
you don’t have to store the creator of an EC2 instance yourself because AWS can apply an
aws:createdBy tag for you once you activate it as a cost allocation tag in the Billing console.
Decide which tags you need and try to limit the creation of new tags.
6. Automate tag management
As the number of resources in your AWS account grows, keeping up with your tags, enforcing
conventions, and updating tags will get increasingly difficult. In Parts 2 and 3 of this guide, we’ll look
at how you can use Terraform, CloudFormation, and Cloud Custodian to manage tags across your
resources.
Amazon also offers tag policies, tagging by resource group, and a resource tagging API to help you
govern and assign tags in bulk. Automating as much of the tag management process as possible will
result in higher quality, more maintainable tags in the long run.
Amazon Web Services provides a comprehensive document of their recommended practices for
tagging resources. Be sure to review it if you’re new to tags and want to dive deeper into some of
these best practices.
If two services, cart and search, share a single RDS instance, then the database can be tagged
service=cart|search to indicate that this resource serves both services.
If you choose a strategy like the one above, you have to consider how tags will change over time. For
example, if you add a new service that shares the same RDS instance, you’ll have to update the
database’s tags to include the name of the new service. For this reason, some teams opt to use a
single tag to indicate that a resource may be used by all services (eg: service=common).
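To show how such a service tag might feed a cost report, here is a small sketch. It assumes shared costs are split evenly between the services named in the tag, which is a simplification, and the function names and input shape are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

SERVICE_SEPARATOR = "|"  # as in service=cart|search
SHARED_VALUE = "common"  # as in service=common


def services_for(tag_value: str, all_services: List[str]) -> List[str]:
    """Expand a service tag value into the list of services it covers."""
    if tag_value == SHARED_VALUE:
        return list(all_services)
    return [name for name in tag_value.split(SERVICE_SEPARATOR) if name]


def cost_by_service(
    resources: Iterable[Tuple[str, float]], all_services: List[str]
) -> Dict[str, float]:
    """Split each resource's cost evenly across the services in its tag.

    `resources` is an iterable of (service tag value, cost) pairs.
    """
    totals: Dict[str, float] = defaultdict(float)
    for tag_value, cost in resources:
        services = services_for(tag_value, all_services)
        for service in services:
            totals[service] += cost / len(services)
    return dict(totals)
```

With service=common, adding a new service only means updating the all_services list rather than retagging the shared database.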
Service-based tagging strategies like this are usually a good starting point if you’d like to understand
which services contribute the most to your AWS costs. The business team can use these tags to see
how much they’re paying for each service or environment and reach out to the appropriate contact if
they have questions.
In this example, the company tags resources that contain user data with user-data=true so that
they can audit them more frequently and ensure they meet specific standards. All resources have a
contact and env tag to designate the responsible team member and ensure someone is accountable
for keeping them up to date.
Using a compliance strategy does not preclude you from using other strategies as well. One of the
advantages of tags is that they let you segment your AWS resources in a nearly infinite number of
ways.
In this example, the organization assigns business unit and team tags to each resource, with each
environment having a separate AWS account.
This allows them to generate reports in each environment to compare resource costs for the
marketing (mktg) unit with those for the data warehousing (data) unit. If the team uses this method of
account-segmented tagging, they’ll need to use a management (formerly “master”) account to see
resource usage across their entire organization. You can also use CloudForecast to generate regular
cost reports and breakdowns across multiple AWS accounts.
Conclusion
Any organization that uses AWS at scale will need to develop a tagging strategy that works for them.
Consider the best practices and examples above, as well as your organization’s goals.
Once you decide on a strategy, you will need a plan for adding and maintaining tags. In the next part
of this guide, we’ll look at two tools you can adopt to ensure your engineering teams are using tags
consistently across all your AWS resources.
Part 2: Using Terraform and CloudFormation to Enforce Your AWS Tagging Strategy
Once you have adopted a tagging strategy, you’ll need to make sure that all your existing AWS
resources and any new ones you create abide by it. Consistency is key: if you don’t proactively
enforce your AWS tagging strategy, you’ll always be playing catch up and chasing down team
members to make sure they add the right tags to their resources.
While you can tag your AWS resources manually using the AWS CLI or AWS Tag Editor, you’ll
probably find this cumbersome and error-prone at scale. A better approach is to automatically apply
AWS cost allocation tags to your resources and use rules to enforce their consistent usage.
Depending on the tool you use to maintain your infrastructure on AWS, your method of proactively
enforcing AWS tags on new resources may vary. In this guide, I’ll highlight two tools: Terraform and
CloudFormation. You’ll see how to use each to create and update AWS cost allocation tags on your
resources and then enforce the proper use of specific tags for new resources. By proactively enforcing
your AWS tagging strategy, you’ll minimize your time spent auditing and correcting improper AWS
tags and force developers to learn best practices for tagging their AWS resources.
Terraform
The first infrastructure management tool I’ll cover is Terraform. Terraform works across a variety of
cloud hosting providers, including AWS, to help you provision and maintain your resources. With
Terraform, you can define your servers, databases, and networks in code and apply your changes
programmatically to your AWS account.
If you’re new to Terraform, HashiCorp offers a well-documented Getting Started guide and several
AWS template examples on GitHub. In this section, I’ll show you some snippets from a demo
Terraform project and module that are available on GitHub.
    # (The start of this aws_instance resource and its connection block is not shown.)
    private_key = file(var.private_key_path)
  }

  instance_type          = "t2.micro"
  ami                    = var.aws_amis[var.aws_region]
  key_name               = aws_key_pair.auth.key_name
  vpc_security_group_ids = [aws_security_group.default.id]
  subnet_id              = aws_subnet.default.id

  provisioner "remote-exec" {
    inline = [
      "sudo apt-get -y update",
      "sudo apt-get -y install nginx",
      "sudo service nginx start",
    ]
  }

  tags = {
    contact = "j-mark"
    env     = "dev"
    service = "cart"
  }
}
The above example includes three tags: contact, env, and service with values described as
strings. When you apply this configuration, Terraform will connect to AWS and create an EC2
instance having the tags you specified.
Updating Tags
Terraform makes it easy to update your resources in reversible and consistent ways. If you’re using
tags to keep track of a resource’s contact (e.g., j-mark in the above example), you’re likely to need to
update the tag when the team member leaves or changes roles.
To update the tags on your resource, simply update the corresponding tags in your Terraform
configuration. The new tags will overwrite any previous tags assigned to the resource, including tags
added outside of Terraform.
For example, to change the contact tag on the EC2 instance above, you might update the tags
block above with the following:
tags = {
  contact = "l-duke"
  env     = "dev"
  service = "cart"
}
When you apply this configuration, your tags will be automatically updated in AWS.
If you keep your Terraform configuration files in version control, which is probably a good idea,
you will be able to see how tags have changed over time. You can also review changes using the same
code review process that your application code goes through to help you catch mistakes in the
execution of your tagging strategy.
Enforcing Tags
As your infrastructure grows, a code review process likely won’t be enough to prevent improper
tagging. Fortunately, you can enforce tag names and values using variables and custom validation
rules in Terraform.
In the examples above, the tags map was hard-coded into the EC2 instance definition. A more
scalable pattern would be to break your EC2 instance template into its own module and use a tags
variable. You can then write a custom validation rule to check that the tags comply with your
strategy.
variable "tags" {
  description = "The tags for this resource."
  type        = map(string)

  validation {
    # lookup() supplies a default so a missing key fails validation
    # instead of raising an evaluation error.
    condition = (
      contains(["j-mark", "l-duke"], lookup(var.tags, "contact", "")) &&
      lookup(var.tags, "env", "") != "" &&
      contains(["cart", "search", "cart:search"], lookup(var.tags, "service", ""))
    )
    error_message = "Invalid resource tags applied."
  }
}
Now when you run terraform plan with a missing or invalid tag, Terraform stops with an error that
includes the error_message defined above.
Your rules can be as complex as Terraform’s Configuration Language allows, so functions like
regex(), substr(), and distinct() are all available. That said, there are some caveats to this
approach.
First, custom variable validation was introduced as an experimental feature in Terraform 0.12.20 and
became a stable feature in Terraform 0.13. If you are still on a 0.12.x release, the feature is subject
to change, meaning that you might need to pay closer attention to Terraform updates, and you have
to opt in by adding the following to your terraform block:
terraform {
  experiments = [variable_validation]
}
On Terraform 0.13 and later, leave this setting out.
Second, Terraform’s variable validation only happens during the terraform plan phase of your
infrastructure’s lifecycle. It can’t prevent users from accidentally changing your tags directly in the
AWS console, and it’s only as good as the validation rules you write. If you start using a new resource
but forget to add validation rules, you might end up with lots of resources that don’t adhere to your
tagging strategy.
Another option for paid Terraform Cloud customers is Sentinel, which allows you to create custom
policies for your resources. I won’t cover this method here, but HashiCorp has published an example
policy to show you how to enforce mandatory tags in AWS.
CloudFormation
Similar to Terraform, CloudFormation lets you provision AWS resources based on configuration
files. Unlike Terraform, CloudFormation is part of Amazon’s offerings, so it won’t necessarily help
you if you want to use another infrastructure provider. The approach to tagging your resources in
CloudFormation is similar to that used by Terraform, but as you’ll see, the configuration format is
different.
If you’re new to CloudFormation, Amazon’s official walkthrough will help you get started deploying
some basic templates. In this section, I’ll show you some snippets from a demo CloudFormation
template which is also available on GitHub.
For example, to create a new EC2 instance with the same three tags used in the Terraform example
above, add an array of Tags to the resource’s Properties block:
"Resources" : {
  "WebServerInstance": {
    "Type": "AWS::EC2::Instance",
    "Metadata" : {...},
    "Properties": {
      "Tags" : [
        {
          "Key" : "contact",
          "Value" : "j-mark"
        },
        {
          "Key" : "env",
          "Value" : "dev"
        },
        {
          "Key" : "service",
          "Value" : "cart"
        }
      ],
      ...
    }
  },
  ...
},
Using the AWS Command Line Interface, you can deploy this template as a new stack with the
create-stack command. This validates your template and creates the specified resources with their
tags on AWS.
If you have lots of similar resources in your template, you can add tags to all the resources in the
stack at once using the --tags flag with the create-stack or update-stack commands.
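As a sketch of what such a command could look like, the helper below assembles an aws cloudformation create-stack invocation with stack-level tags. The stack name and template path are placeholders, and the helper itself is hypothetical; only the --stack-name, --template-body, and --tags options come from the AWS CLI.

```python
from typing import Dict, List


def build_create_stack_command(
    stack_name: str, template_path: str, tags: Dict[str, str]
) -> List[str]:
    """Build an argument list for `aws cloudformation create-stack`.

    Stack-level tags are passed as Key=...,Value=... pairs after --tags.
    """
    command = [
        "aws", "cloudformation", "create-stack",
        "--stack-name", stack_name,
        "--template-body", f"file://{template_path}",
    ]
    if tags:
        command.append("--tags")
        command.extend(f"Key={key},Value={value}" for key, value in tags.items())
    return command
```

The resulting list can be handed to subprocess.run, or joined with shlex.join to print the equivalent shell command. Swapping create-stack for update-stack would follow the same shape.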
Updating Tags
If you want to change the contact on your EC2 instance created above, simply change the Tags
section of your template file and use the update-stack command to deploy your changes.
"Tags" : [
  {
    "Key" : "contact",
    "Value" : "l-duke"
  },
  ...
],
CloudFormation behaves the same way that Terraform does when you update tags outside your
template file. Any tags set manually will be overridden by the update-stack command, so be sure
that everyone on your team adds tags through CloudFormation.
Enforcing Tags
AWS provides Organization Tag Policies and Config Managed Rules to help you find improperly
tagged resources, but neither of these tools prevents you from creating resources with missing or
invalid tags. One way to proactively enforce your tagging strategy is by using the CloudFormation
linter.
cfn-lint is a command-line tool that will make sure your CloudFormation template is correctly
formatted. It checks the formatting of your JSON or YAML file, proper typing of your inputs, and a
few hundred other best practices. While the presence of specific tags isn’t checked by default, you can
write a custom rule to do so in Python.
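For example, if you want to ensure that your CloudFormation web servers follow the same rules as
the Terraform example above (at least one tag, an allowed contact, an env tag, and an allowed
service), you could start with a rule like this: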
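from cfnlint.rules import CloudFormationLintRule, RuleMatch

class TagsRequired(CloudFormationLintRule):
    id = 'E9000'
    shortdesc = 'Tags are properly set'
    description = 'Check all Tag rules for WebServerInstances'

    def match(self, cfn):
        matches = []
        # The original listing is truncated here. This loop is an assumed
        # reconstruction that only covers the "at least one tag" check.
        for name, resource in cfn.get_resources(['AWS::EC2::Instance']).items():
            web_server = ['Resources', name, 'Properties', 'Tags']
            tags = resource.get('Properties', {}).get('Tags')
            if not tags:
                message = "All resources must have at least one tag"
                matches.append(RuleMatch(web_server, message))
        return matches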
Using linting to validate your CloudFormation templates is a great way to enforce your tagging strategy
proactively. If you’re storing your CloudFormation templates in version control, you can run
cfn-lint using pre-commit hooks or by making it part of your continuous integration workflow.
Because these rules are written in Python, they can be as complex as you need them to be, but they
have drawbacks as well. Like Terraform’s custom variable validation, linting rules won’t tell you
about existing problems in resources that aren’t managed by CloudFormation, so they work best
when combined with a reactive tag audit and adjustment strategy.
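To make the checks themselves concrete without depending on cfn-lint, here is a standalone sketch that applies the same rules to a JSON template that has already been parsed into a dict. The function name is hypothetical, the allowed values mirror the Terraform example, and intrinsic functions such as Ref inside tag values are not resolved, so they are treated as not allowed.

```python
from typing import Dict, List

# Allowed values mirror the Terraform validation example.
ALLOWED_CONTACTS = ("j-mark", "l-duke")
ALLOWED_SERVICES = ("cart", "search", "cart:search")


def check_template_tags(template: Dict) -> List[str]:
    """Return one message per tag rule broken by an EC2 instance."""
    problems = []
    for name, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::EC2::Instance":
            continue
        tag_list = resource.get("Properties", {}).get("Tags", [])
        tags = {tag["Key"]: tag["Value"] for tag in tag_list}
        if tags.get("contact") not in ALLOWED_CONTACTS:
            problems.append(f"{name}: contact tag missing or not allowed")
        if not tags.get("env"):
            problems.append(f"{name}: env tag missing")
        if tags.get("service") not in ALLOWED_SERVICES:
            problems.append(f"{name}: service tag missing or not allowed")
    return problems
```

A script like this could run in the same pre-commit hook or CI job, loading the template with json.load and failing the build when the returned list is not empty.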
Conclusion
Properly tagged resources will help you predict and control your costs, but your tagging strategy can’t
just be reactive. Having proactively enforced patterns will require an up-front investment, but will
save you time and money in the long-run.
Once you’ve adopted a tagging strategy and proactive enforcement method, the last piece of the
puzzle is catching up when you fall behind. In the final part of this guide, you’ll see how to audit and
find mistagged resources to ensure your tagging strategy continues to succeed in the future. If you’re
interested in getting help with your tagging, reach out to our CTO to receive a free tagging
compliance report.
Part 3: Maintaining AWS Tags When You Fall Behind
After adopting an AWS tagging strategy, you face two new challenges: finding improperly tagged
AWS resources and enforcing your tagging strategy going forward. In the previous part of this guide,
you saw how to enforce your AWS tagging strategy when creating new resources with Terraform or
CloudFormation. In this part, you’ll see how to find improperly tagged AWS resources.
Tagging your resources in AWS will help prevent misuse and maintain your infrastructure. When
combined with a tool like CloudForecast, tagging your resources can help you predict and control
your monthly spending, so it’s usually worthwhile to invest in tag maintenance.
Unfortunately, it’s easy for AWS tags to get out of date. If you’re adopting a new tagging strategy, you
likely have a lot of catching up to do, but even if you’ve been following one for years, things can get
stale. Your team may want to change the name of a tag or update the rules surrounding certain tags.
Developers may leave without handing off resources they managed, or someone might forget to
change an outdated tag on an AWS resource. Even the best AWS tagging policy will require ongoing
maintenance, especially as the number of resources you manage grows.
To help maintain your AWS tags, you should regularly audit them to make sure they’re still accurate.
At a small company, it might be possible to do these audits manually, but you’ll need to use tools to
help you automate the process as you grow. This article will show you six tools you can use to help
find and fix outdated AWS tags. You’ll see some of the use cases for each so you can decide which
ones are best for your organization. I’ll do a deep dive into the first two because they’re the most
commonly used today, but it’s a good idea to have a few options to choose from.
Cloud Custodian
Cloud Custodian is an open-source collection of scripts that help developers manage their public
cloud accounts. For this article, I’ll limit my focus to Cloud Custodian’s tag maintenance features in
AWS, but it supports a variety of use cases and most of the major cloud hosting providers.
One use case for Cloud Custodian is to find AWS resources that don’t comply with your tagging
policy. Let’s say you have an application consisting of an elastic load balancer and three EC2
instances, and you want all your EC2 instances to have an env tag, a contact tag set to j-mark or
l-duke, and a service tag set to cart or search.
Cloud Custodian’s policies are stored in YAML files so there’s no state to maintain outside of the
policy itself. To create a policy that will tell you if any of your EC2 instances aren’t in compliance with
the above rules, create a new policy file called policy.yml:
policies:
  - name: ec2-tag-compliance-report
    resource: ec2
    comment: Report on ec2 instances without required tags
    filters:
      - or:
          - "tag:env": empty
          - type: value
            key: "tag:contact"
            op: ni
            value: ["j-mark", "l-duke"]
          - type: value
            key: "tag:service"
            op: ni
            value: ["cart", "search"]
You can run Cloud Custodian from your local machine or set up a cron job to run it from a server on
a recurring basis. Once installed, run the above policy from your command line with the custodian
run command, passing it an output directory and the policy file.
Cloud Custodian will output a “count” of the number of EC2 instances that are missing one of your
required tags and generate a report in your output folder (./ in this case) that includes more details
about the resources so you can remedy their tags.
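The report can also be post-processed. The sketch below assumes the matched instances have been loaded from the resources.json file that Cloud Custodian writes under the output folder, in a subfolder named after the policy, and that each record carries the InstanceId and Tags fields of the EC2 API. The function name and the summary format are made up for this example.

```python
from typing import Dict, List

# Allowed values mirror the policy above.
ALLOWED = {
    "contact": ("j-mark", "l-duke"),
    "service": ("cart", "search"),
}


def summarize_tag_problems(resources: List[Dict]) -> Dict[str, List[str]]:
    """Map each instance ID to the tag keys that need attention."""
    summary = {}
    for resource in resources:
        # EC2 records list tags as [{"Key": ..., "Value": ...}, ...].
        tags = {tag["Key"]: tag["Value"] for tag in resource.get("Tags", [])}
        problems = []
        if not tags.get("env"):
            problems.append("env")
        for key, allowed in ALLOWED.items():
            if tags.get(key) not in allowed:
                problems.append(key)
        if problems:
            summary[resource["InstanceId"]] = problems
    return summary
```

A summary keyed by instance ID makes it easier to hand each resource to the right person for a fix.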
Cloud Custodian can also perform actions to remediate or alert you to issues automatically. For
example, you may want to call a webhook that triggers a notification when an EC2 instance doesn’t
adhere to your tagging policy.
To add a webhook action, replace your policy.yml file with the following:
policies:
  - name: ec2-tag-compliance-webhook
    resource: ec2
    comment: Trigger a webhook when ec2 instances are incorrectly tagged
    filters:
      - or:
          - "tag:env": empty
          - type: value
            key: "tag:contact"
            op: ni
            value: ["j-mark", "l-duke"]
          - type: value
            key: "tag:service"
            op: ni
            value: ["cart", "search"]
    actions:
      - type: webhook
        url: https://example.com/hooks?id=1
Now when you run Cloud Custodian, it will call https://example.com/hooks?id=1 if any EC2
instances fail to adhere to your policy. The Cloud Custodian docs include many examples of using
filters and actions to perform other tasks. While running Cloud Custodian locally from your command
line is a good way to test your configuration, you should ultimately deploy your policies to a server
to run them against your production environment on a schedule. You can even deploy Cloud
Custodian policies as a periodic Lambda function to help minimize the cost.
AWS Config
Using AWS Config, you can specify which resources should have tags and the expected values for
each tag. AWS Config allows you to run remediation steps when a violation is found so that your
team can quickly fix tagging mistakes, but it doesn’t prevent you from creating resources with
improper tags. For that, you’ll have to go back to the previous installment of this guide.
To reproduce the Cloud Custodian policy above, go to AWS Config > Rules > Add rule. Enter
“required-tags” in the search bar and select the required tags rule.
Name the rule, add a description, and select the resources you want this rule to apply to. In this case,
enter EC2: Instance only if you just want to make sure your EC2 instances are correctly tagged.
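Next, enter the tags and allowed values that you want to enforce. To replicate the Cloud Custodian
policy above, set tag1Key to env, tag2Key to contact with a tag2Value of j-mark,l-duke, and tag3Key
to service with a tag3Value of cart,search.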
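If you would rather script the rule than click through the console, the parameters for the required-tags managed rule can be generated. This is a minimal sketch: the helper name is hypothetical, while the tag1Key/tag1Value through tag6Key/tag6Value parameter names and the comma-separated value format are those of the managed rule.

```python
import json
from typing import Dict, List, Optional

MAX_TAGS = 6  # the required-tags managed rule accepts tag1Key..tag6Key


def required_tags_parameters(rules: Dict[str, Optional[List[str]]]) -> str:
    """Build the JSON parameter string for the required-tags rule.

    `rules` maps a tag key to its allowed values, or to None when any
    value is acceptable as long as the tag is present.
    """
    if len(rules) > MAX_TAGS:
        raise ValueError(f"required-tags supports at most {MAX_TAGS} tags")
    params = {}
    for index, (key, values) in enumerate(rules.items(), start=1):
        params[f"tag{index}Key"] = key
        if values:
            # Allowed values are passed as one comma-separated string.
            params[f"tag{index}Value"] = ",".join(values)
    return json.dumps(params)
```

Calling it with env mapped to None, contact mapped to j-mark and l-duke, and service mapped to cart and search yields the same three rules as the Cloud Custodian policy.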
Skip the remediation details for now and click “Save.” You’ll be taken back to the list of AWS Config
Rules. The initial check will take a few minutes to run, so wait a bit then refresh the page. AWS
Config will show you a list of which resources are in compliance and which are not. Be aware that
AWS Config can take up to 6 hours to reindex your tags after they’ve been changed on your
resources.
If you want to call a webhook like you did for Cloud Custodian above, you will need to create an SNS
topic that calls a webhook. After creating the topic, edit the rule and select
AWS-PublishSNSNotification.
Add an IAM Role ARN with SNS access to the AutomationAssumeRole field, add a message, and
paste the TopicArn from your SNS topic. This will allow AWS Config to trigger the referenced SNS
topic when the config rule fails.
Click “Save.” Now, when new EC2 instances are found that fail this compliance check, the SNS topic
will call your webhook. This isn’t the only remediation step available, so read more about your
remediation options in the AWS Config docs.
While the AWS Config rules aren’t as powerful as Cloud Custodian’s, there are some advantages. It’s
built into AWS and works with CloudFormation. This means there are fewer extra files to manage,
and AWS Config provides a visual configuration timeline that will help you track down improper
tagging practices.
That said, AWS Config has drawbacks of its own, such as its cost and its lack of support for some
resources. If you’re deciding between AWS Config and Cloud Custodian, you might want to do a
deeper dive into each tool to determine which will work best for your team.
Retro Tag
Unlike the previous two tools, Retro Tag wasn’t built for finding untagged resources directly, but
instead, it focuses on helping you track down the creator of every resource in your AWS account. This
is a useful step in auditing your tags because the person who created the resource usually knows the
most about it. Once you find the creator, you can ask them how the resource is being used (if at all)
and put them in charge of fixing the tags.
Retro Tag is open-source and built on the Auto Tag engine by GorillaStack. It works by using Amazon
Athena to export a CSV of your relevant CloudTrail events. The Retro Tag script uses this CSV to tag
any resources that still exist and don’t yet have an AutoTag_Creator or AutoTag_CreateTime
tag.
If you’re using Auto Tag to add creator tags to all new resources, Retro Tag is a great way to catch up
on resources that were created before Auto Tag was installed. That said, you don’t have to run both of
them together. You may just want to add creator tags to old resources so you can assign
responsibility for updating tags to the creator rather than doing it yourself.
Gold Fig
While Cloud Custodian and AWS Config audit your resources on the fly, Gold Fig takes a different
approach to the problem. It loads all your AWS resource data into a Postgres database and gives you
a command-line interface to run queries against it. Gold Fig is also open-source, so you can deploy it
to your local machine or a server that automatically syncs the data regularly.
Gold Fig doesn’t offer remediation actions like Cloud Custodian, but developers familiar with SQL
might find the query syntax more natural than learning another flavor of YAML. Because loading
your data from AWS takes several minutes, it also isn’t ideal for tracking real-time updates to your
resources. But, if you’d like to perform periodic audits on your resources using complex logic that’s
tricky to build in Cloud Custodian or AWS Config, Gold Fig might be worth a try.
free report, CloudForecast sends you a daily AWS cost report and savings plan so that you can spot
opportunities to save money on your AWS bill every month.
This compliance report will save you a lot of time as looking for gaps in your tagged resources can be
pretty mind-numbing. We’ll help you quickly discover untagged resources and sort them by cost so
you can fix the low-hanging fruit and increase compliance.
Conclusion
While managing all your resources in Terraform or CloudFormation will help you create resources
with the right tags in the future, you’re probably not starting your AWS account with a clean slate.
Maybe you started out creating resources manually via the CLI and only recently adopted
infrastructure automation. Maybe some teams are using CloudFormation, and others prefer the GUI.
At some point, you’ll need to find and fix your improperly tagged resources.
The tools above should help, but if you’re still lost, feel free to reach out to our CTO, Francois. We’d
love to help you set up your organization’s tags for success and are happy to offer a free tagging
compliance report as well.