ECS-ComposeX

ECS ComposeX aims to make it easy to take your application to AWS ECS, with AWS Fargate as the primary focus (it still allows you to run on EC2 should you need to).

AWS AppMesh integration

AWS AppMesh is a service mesh which allows you to define how services talk to each other at the application (layer 7) level and, optionally, at the TCP (layer 4) level. It is extremely powerful.

Since the beginning of the project, we have been using AWS Cloud Map to create a private DNS hosted zone linked to the VPC created at the same time. This allowed us to very simply register services into the PHZ (private hosted zone) via Service Discovery.

We are going to use these entries to make a 1-1 mapping between our services defined in the services section of the docker-compose file and the nodes listed in the x-appmesh section.

AppMesh uses Envoy as a side-car proxy that captures our services' packets and routes them to their defined backends. Using AWS AppMesh empowers developers to declare how services are supposed to communicate with each other and what to do in case of errors, while administrators can define whether the traffic between all the components should use end-to-end TLS, to ensure no man-in-the-middle attack can happen.

The syntax for AppMesh in ECS ComposeX is a mix of Istio, Envoy and AWS AppMesh definitions.

See also

x-appmesh

Services autoscaling integration

You can now define scaling for your ECS Services using:

* CPU / RAM target tracking scaling
* SQS messages (visible) depth with step scaling

For example, we want to scale our frontend based on CPU usage, and our backend, which deals with queues, based on the number of messages.

services:
  frontend:
    ports:
      - 80:80
    image: my-nginx
    deploy:
      replicas: 2 # by default I want 2 containers
    x-configs:
      scaling:
        range: "1-10" # 1 to 10 containers to deploy for the service
        target_scaling:
          cpu_target: 80 # Means 80% average for all containers in the service.
  backend:
    image: my-worker
    deploy:
      replicas: 1  # Initially I want 1 container running to make sure everything is working
    x-configs:
      scaling:
        range: "0-10" # I can have between 0 to 10 containers. 0 because I am happy not paying when nothing to do

x-sqs:
  jobs-queue:
    Properties: {}
    Settings: {}
    Services:
      - name: frontend
        access: RWMessages
      - name: backend
        access: RWMessages
        scaling:
          steps:
            - lower_bound: 0
              upper_bound: 10
              count: 1
            - lower_bound: 10
              upper_bound: 20
              count: 2
            - lower_bound: 20
              count: 21

As you can see, we defined SQS-based scaling only on the backend, as we don't need to scale the frontend based on that. Also, we set the count for the final step to 21, which is higher than the range indicated.

Our frontend will be managed by ECS itself, which will ensure that the average CPU usage across the service remains under 80%.

Hint

In ComposeX, you must define a generic range first; if a scaling policy overrides it, the highest count across all scaling policies is used.
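To make the step definitions concrete, here is a minimal Python sketch of how visible-message counts map to a desired container count. This is an illustration of the intended behaviour of the step bounds, not ComposeX's actual implementation, and it does not model the interaction with the scaling range described in the hint above:

```python
def desired_count(visible_messages, steps):
    """Map SQS visible messages to a container count via step scaling.

    Each step is (lower_bound, upper_bound, count); the last step
    has upper_bound None, meaning no upper limit.
    """
    for lower, upper, count in steps:
        if visible_messages >= lower and (upper is None or visible_messages < upper):
            return count
    return 0

# Steps from the backend example above.
STEPS = [(0, 10, 1), (10, 20, 2), (20, None, 21)]
```

With these steps, 5 visible messages resolve to 1 container, 15 to 2, and anything at 20 or above to the final step's count of 21.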

Note

Scaling with target tracking based on ELBv2 metrics is coming too.

Fargate CPU/RAM auto configuration

When you want to create services on ECS, you first need to create a Task Definition. Among the IAM permissions and the network configuration, the Task definition also defines how much CPU and RAM you want to have available for all your containers in the task.

If you have only one service, you might as well just not put any limits at the Container Definition level, and let it use all the available CPU and RAM defined in the Task Definition.

Hint

The Task Definition CPU and RAM are the maximum CPU and RAM that your containers will be able to use. The amount of CPU and RAM provisioned in AWS Fargate is what determines how much you pay.

But when you start to add side-cars, such as Envoy or X-Ray, or your WAF and reverse proxy, you want to start setting how much CPU and RAM these containers can use out of the Task Definition.

In docker-compose (or with swarm), you already have the ability to define the CPU limits and reservations you want to give to each individual service in the compose file.

To save you from having to know the different CPU/RAM combinations supported by AWS Fargate, ECS ComposeX will automatically use the limits and reservations set in your docker-compose file, if defined, and determine the closest CPU/RAM configuration that your services can run in.

Hint

Set at least the reservation values so your containers are guaranteed some capacity in case other containers use more resources than expected.

See also

deploy reference.

We have the following example:

---
# Blog applications

version: '3.8'
services:
  rproxy:
    image: ${IMAGE:-nginx}
    ports:
      - 80:80
    deploy:
      replicas: 2
      resources:
        reservations:
          cpus: "0.1"
          memory: "32M"
        limits:
          cpus: "0.25"
          memory: "64M"
    depends_on:
      - app01

  app01:
    image: ${IMAGE:-nginx}
    ports:
      - 5000
    deploy:
      resources:
        reservations:
          cpus: "0.25"
          memory: "64M"
    environment:
      LOGLEVEL: DEBUG
      SHELLY: ${SHELL}
      TERMY: "$TERM"
    links:
      - app03:dateteller

  app02:
    image: ${IMAGE:-nginx}
    ports:
      - 5000
    deploy:
      resources:
        reservations:
          cpus: "0.25"
          memory: "64M"
    environment:
      LOGLEVEL: DEBUG

  app03:
    image: ${IMAGE:-nginx}
    ports:
      - 5000
    deploy:
      resources:
        reservations:
          cpus: "0.25"
          memory: "64M"
    environment:
      LOGLEVEL: DEBUG

We have CPU and RAM values set for both limits and reservations. So we can take the limits, add them up, and this indicates our CPU configuration.

Hint

In docker-compose, you indicate the CPU as a portion of a vCPU. A value of 1.0 means 1024 CPU units, or 1 vCPU. A value of 0.25 equals 256 CPU units, which is a quarter of a vCPU.

We get:

* 0.75 vCPU (limits)
* 192MB of RAM

The closest Fargate configuration that caters for that amount of vCPU is 1024 CPU units. With only 512, we could run low on CPU.

From there, we know that Fargate requires a minimum of 2GB of RAM at that CPU level. So our CPU/RAM configuration will be 1024 CPU units and 2048MB of RAM.
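The rounding logic described above can be sketched as follows. This is a minimal Python illustration, not ComposeX's actual implementation; the table reflects the standard Fargate CPU/memory combinations (CPU in units, 1024 units = 1 vCPU; memory in MB):

```python
# Standard Fargate CPU (units) -> allowed memory (MB) values.
FARGATE_CONFIGS = {
    256: (512, 1024, 2048),
    512: (1024, 2048, 3072, 4096),
    1024: tuple(range(2048, 8193, 1024)),
    2048: tuple(range(4096, 16385, 1024)),
    4096: tuple(range(8192, 30721, 1024)),
}


def closest_fargate_config(cpus, ram_mb):
    """Return the smallest (cpu_units, memory_mb) combination that fits.

    `cpus` is the docker-compose fractional vCPU value; 1.0 == 1024 units.
    """
    cpu_units = int(cpus * 1024)
    for cpu in sorted(FARGATE_CONFIGS):
        if cpu < cpu_units:
            continue
        for memory in FARGATE_CONFIGS[cpu]:
            if memory >= ram_mb:
                return cpu, memory
    raise ValueError("Requirements exceed the largest Fargate configuration")
```

With the example above, `closest_fargate_config(0.75, 192)` resolves to `(1024, 2048)`: 0.75 vCPU exceeds the 512-unit tier, and the 1024-unit tier starts at 2048MB of RAM.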

Now, let's say we know that our rproxy (NGINX based) will need at most 0.1 vCPU and 128M of RAM, and we want to make sure that the application container does not take all the CPU and RAM away from it, but also that the rproxy does not go over these limits.

So we are going to set these limits for the rproxy container.

Hint

If you do not set the reservations, the container can yield compute resources to the benefit of the others, but at the risk of having none available when it needs them.

Now, let’s say we know our application will use a minimum of 256M, and up to .25 of a CPU.

Let's count:

* 0.1 vCPU (limit+reservation) and 0.25 vCPU (reservation): we get 0.35 vCPU.
* 128MB of RAM (limit+reservation) and 256MB (reservation): we get 384MB.

The closest Fargate configuration is 0.5 vCPU and 1024MB of RAM. But also, our application container can use up to 1024-128 = 896MB of RAM, as we did not set a limit for it. For applications where you are not totally sure of the RAM you might need, this is a good way to keep some free headroom, just in case.

Note

Chances are, if you are using such low CPU/RAM for your microservice, you might as well run it in AWS Lambda!

Hint

You might think that the CPU you need, i.e. 1 vCPU, which forces you into at least 2GB of RAM for the corresponding Fargate profile, wastes a lot of RAM.

However, in this configuration, the CPU represents ~80% of the cost ($29.5 + $6.5 = $36 per month).
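For illustration, the figures above roughly correspond to on-demand Fargate pricing in us-east-1 at the time of writing. The rates below are an assumption and vary by region:

```python
# Hypothetical monthly cost breakdown for a 1 vCPU / 2GB Fargate task,
# assuming us-east-1 on-demand rates (subject to change).
VCPU_HOUR = 0.04048   # USD per vCPU per hour (assumed rate)
GB_HOUR = 0.004445    # USD per GB of RAM per hour (assumed rate)
HOURS_PER_MONTH = 730

cpu_cost = 1 * VCPU_HOUR * HOURS_PER_MONTH    # ~$29.5
ram_cost = 2 * GB_HOUR * HOURS_PER_MONTH      # ~$6.5
cpu_share = cpu_cost / (cpu_cost + ram_cost)  # ~0.82
```

Even doubling the RAM would only add a few dollars a month, which is why the "wasted" RAM matters less than it first appears.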

Multiple services, one microservice

Developers regularly build multiple services locally that are aimed to work together as a group. And sometimes these services have such low latency requirements and dependency on each other that they are best executed together.

In our example before, where we use NGINX to implement webserver logic, configuration and security, leveraging the power of purpose-built software as opposed to re-implementing all that logic directly in the application, we might want to run these two together.

On your workstation, when you run docker-compose up, it is obviously going to run everything locally. However, by default, these are defined as individual services.

To allow multiple services to be merged into a single Task Definition, and still treat your docker images separately, you can use a specific label that ECS ComposeX will recognize to group services into what we called a family.

ECS already has a notion of family, so I thought, we should use that naming to group services logically.

The deploy labels are ignored at the container level, therefore none of these labels will show on the containers when you deploy the services.

Hint

The labels can be either a list of strings, or a “document” (dictionary).

Here is an example where we use the label as a dictionary (it can also be given as a string, which requires the `=` to be present to define the key/value). The family for this case is bignicefamily.

---
# Blog applications

version: '3.8'

services:
  rproxy:
    deploy:
      labels:
        ecs.task.family: bignicefamily
  app01:
    secrets:
      - testsecret
    deploy:
      labels:
        ecs.task.family: bignicefamily
      replicas: 2
    environment:
      name: toto
      LOGLEVEL: INFO
      ACCOUNT_ID: "${AWS::AccountId}"
      REGION: "${AWS::Region}"
    x-configs:
      scaling:
        range: "1-10"
        target_scaling:
          cpu_target: 80
      use_xray: True
      network:
        is_public: True
        lb_type: application

  app02:
    x-configs:
      scaling:
        range: "0-20"

  app03:
    x-configs:
      scaling:
        range: "1-10"
        allow_zero: True
        target_scaling:
          cpu_target: 75
          memory_target: 80
          disable_scale_in: False
          scale_out_cooldown: 60
      use_xray: True
      iam:
        boundary: arn:aws:iam::aws:policy/PowerUserAccess
        policies:
          - name: somenewpolicy
            document:
              Version: 2012-10-17
              Statement:
                - Effect: Allow
                  Action:
                    - ec2:Describe*
                  Resource:
                    - "*"
                  Sid: "AllowDescribeAll"


x-appmesh:
  Properties:
    MeshName: root
  Settings:
    nodes:
      - name: app03
        protocol: http
      - name: app02
        protocol: http
      - name: bignicefamily
        protocol: http
        backends:
          - dateteller # Points to the dateteller service, not router!
    routers:
      - name: dateteller
        listener:
          port: 5000
          protocol: http
        routes:
          http:
            - match:
                prefix: /date
                method: GET
                scheme: http
              nodes:
                - name: app02
                  weight: 1
            - match:
                prefix: /date/utc
              nodes:
                - name: app03
                  weight: 1
    services:
      - name: api
        node: bignicefamily
      - name: dateteller
        router: dateteller

x-dns:
  PrivateNamespace:
    Name: mycluster.lan
  PublicNamespace:
    Name: lambda-my-aws.io

x-sqs:
  queueA:
    Properties:
      QueueName: abcd
      RedrivePolicy:
        deadLetterTargetArn: queueB
        maxReceiveCount: 4
    Settings:
      EnvNames:
        - QUEUEA
    Services:
      - name: app03
        access: RWMessages
        scaling:
          steps:
            - lower_bound: 0
              upper_bound: 10
              count: 1
            - lower_bound: 10
              upper_bound: 20
              count: 2
            - lower_bound: 20
              count: 21

  queueB:
    Properties:
      QueueName: x-y-z
    Settings:
      EnvNames:
        - QUEUEB
    Services:
      - name: rproxy
        access: RWMessages
      - name: app02
        access: RWMessages
        scaling:
          steps:
            - lower_bound: 0
              upper_bound: 10
              count: 1
            - lower_bound: 10
              upper_bound: 20
              count: 2
  queueC:
    Properties:
      FifoQueue: True
      QueueName: x-y-z
    Settings:
      EnvNames:
        - QUEUEB
    Services:
      - name: rproxy
        access: RWMessages

x-sns:
  Topics:
    topicA:
      Properties: {}
      Services:
        - name: bignicefamily
          access: Publish
        - name: app03
          access: Publish
    topicB:
      Properties:
        Subscription:
          - Endpoint: queueB
            Protocol: sqs

x-rds:
  dbA:
    Properties:
      Engine: "aurora-mysql"
      EngineVersion: "5.7.12"
    Settings:
      EnvNames:
        - DBA
    Services:
      - name: bignicefamily
        access: RW
      - name: app03
        access: RW

x-dynamodb:
  tableA:
    Properties:
      AttributeDefinitions:
        - AttributeName: "ArtistId"
          AttributeType: "S"
        - AttributeName: "Concert"
          AttributeType: "S"
        - AttributeName: "TicketSales"
          AttributeType: "S"
      KeySchema:
        - AttributeName: "ArtistId"
          KeyType: "HASH"
        - AttributeName: "Concert"
          KeyType: "RANGE"
      GlobalSecondaryIndexes:
        - IndexName: "GSI"
          KeySchema:
            - AttributeName: "TicketSales"
              KeyType: "HASH"
          Projection:
            ProjectionType: "KEYS_ONLY"
          ProvisionedThroughput:
            ReadCapacityUnits: 5
            WriteCapacityUnits: 5
      ProvisionedThroughput:
        ReadCapacityUnits: 5
        WriteCapacityUnits: 5
    Services:
      - name: app03
        access: RW
      - name: bignicefamily
        access: RO

  tableB:
    Properties:
      AttributeDefinitions:
        - AttributeName: "Album"
          AttributeType: "S"
        - AttributeName: "Artist"
          AttributeType: "S"
        - AttributeName: "Sales"
          AttributeType: "N"
        - AttributeName: "NumberOfSongs"
          AttributeType: "N"
      KeySchema:
        - AttributeName: "Album"
          KeyType: "HASH"
        - AttributeName: "Artist"
          KeyType: "RANGE"
      ProvisionedThroughput:
        ReadCapacityUnits: "5"
        WriteCapacityUnits: "5"
      GlobalSecondaryIndexes:
        - IndexName: "myGSI"
          KeySchema:
            - AttributeName: "Sales"
              KeyType: "HASH"
            - AttributeName: "Artist"
              KeyType: "RANGE"
          Projection:
            NonKeyAttributes:
              - "Album"
              - "NumberOfSongs"
            ProjectionType: "INCLUDE"
          ProvisionedThroughput:
            ReadCapacityUnits: "5"
            WriteCapacityUnits: "5"
        - IndexName: "myGSI2"
          KeySchema:
            - AttributeName: "NumberOfSongs"
              KeyType: "HASH"
            - AttributeName: "Sales"
              KeyType: "RANGE"
          Projection:
            NonKeyAttributes:
              - "Album"
              - "Artist"
            ProjectionType: "INCLUDE"
          ProvisionedThroughput:
            ReadCapacityUnits: "5"
            WriteCapacityUnits: "5"
      LocalSecondaryIndexes:
        - IndexName: "myLSI"
          KeySchema:
            - AttributeName: "Album"
              KeyType: "HASH"
            - AttributeName: "Sales"
              KeyType: "RANGE"
          Projection:
            NonKeyAttributes:
              - "Artist"
              - "NumberOfSongs"
            ProjectionType: "INCLUDE"

    Services:
      - name: app02
        access: RW
    Settings:
      EnvNames:
        - TABLEB
        - tableb

  tableC:
    Lookup:
      Tags:
        - name: tableC
    Services:
      - name: app02
        access: RO

x-kms:
  keyA:
    Properties: {}
    Settings:
      Alias: alias/keyA
    Services:
      - name: bignicefamily
        access: EncryptDecrypt

  keyB:
    Properties:
      PendingWindowInDays: 14
    Settings:
      Alias: keyB
    Services:
      - name: app02
        access: SQS
      - name: app03
        access: EncryptOnly
      - name: bignicefamily
        access: DecryptOnly

x-tags:
  costcentre: abcd
  contact: you@me.com

x-vpc:
  Create:
    VpcCidr: 10.21.42.0/24
    SingleNat: True

#  Lookup:
#    VpcId:
#      tags:
#        - Name: vpcwork
#    AppSubnets:
#      tags:
#        - vpc::usage: application
#    StorageSubnets:
#      tags:
#        - vpc::usage: storage
#    PublicSubnets:
#      tags:
#        - vpc::usage: public


x-cluster:
#  Properties:
#    CapacityProviders:
#      - FARGATE
#      - FARGATE_SPOT
#    ClusterName: testabcd
#    DefaultCapacityProviderStrategy:
#      - CapacityProvider: FARGATE_SPOT
#        Weight: 4
#        Base: 2
#      - CapacityProvider: FARGATE
#        Weight: 1

  Lookup: test2

secrets:
  testsecret:
    external: true
    x-secrets:
      Name: /path/to/my/secret

  testabcdsecret:
    file: /dev/null
    x-secrets:
      Name: /nowhere
      LinksTo:
        - EcsTaskRole

But then you might wonder: how are the permissions going to work for the services?

Remember, the permissions are set at the Task Definition level, so any container within that task will get the same permissions.

However, take the database as an example: it creates a secret in AWS Secrets Manager, which we then expose to the service via the Secrets attribute of the Container Definition. ECS ComposeX will add that secret to that specific container only. Equally, for the services linked to SQS queues, SNS topics, etc., the environment variable providing the ARN of the resource will only be exposed to the container set specifically.

In case you want to allow an entire family of services to get access to a resource, you can also give, as the service name in the definition, the name of one of your families defined via the labels.

For example,

services:
  worker01:
    image: worker01
    deploy:
      labels:
        ecs.task.family: app01

  worker02:
    image: worker02
    deploy:
      labels:
        ecs.task.family: app01

x-sqs:
  Queue01:
    Properties: {}
    Services:
      - name: app01
        access: RWMessages

ACM Certificates auto-create for public services

AWS CloudFormation now supports auto-validating the certificate by adding the CNAME validation entry into your Route53 hosted zone on your behalf.

See also

x-acm