Scaling to 1 million users on AWS

1 User

You could start by deploying a simple application on a single box. For example, you may be a LAMP stack developer or a .NET developer and decide to host your web server, application server, and database on the same EC2 instance.

Next, you want to deploy this EC2 instance inside a secure virtual network, such as a public subnet inside a VPC. To host DNS records for your server, you could use Amazon Route 53.

> 1 User

A quick solution is to vertically scale by spinning up a bigger box. EC2 instances can have up to 244 GB of RAM or 40 virtual cores, and upgrading to a more powerful instance is easy.

Problems with vertical scaling:

  • Vertical scaling eventually hits a ceiling: once you are on the largest instance type, there is nothing bigger to move to.
  • You are running on a single machine, so you have no redundant server.
  • Your machine resides in a single Availability Zone (AZ).

To address the vertical scaling challenge, start with decoupling your application tiers. Application tiers are likely to have different resource needs and those needs might grow at different rates. By separating the tiers, you can compose each tier using the most appropriate instance type based on different resource needs. When I apply the concept of decoupling to the original architecture, which hosted the entire stack on a single EC2 instance, the new architecture will look like the following diagram.

Next, try to design your application so it can function in a distributed fashion. Store application state independently so that subsequent requests do not need to be handled by the same server. Once the servers are stateless, you can scale by adding more instances to a tier and load balance incoming requests across EC2 instances using Elastic Load Balancing (ELB).
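To make the idea of stateless servers concrete, here is a minimal sketch in plain Python. It is not AWS code: `SESSION_STORE` is a hypothetical stand-in for an external store such as DynamoDB or ElastiCache, and `RoundRobinBalancer` is a simplified stand-in for a load balancer (the routing algorithms ELB actually uses may differ). All class and variable names are illustrative assumptions.

```python
import itertools

# Hypothetical shared session store. In a real deployment this would live
# outside the web servers (for example in DynamoDB or ElastiCache).
SESSION_STORE = {}


class WebServer:
    """A stateless server: it keeps no per-user data between requests."""

    def __init__(self, name):
        self.name = name

    def handle(self, session_id, item):
        # Load the cart from the shared store, update it, and return a copy.
        cart = SESSION_STORE.setdefault(session_id, [])
        cart.append(item)
        return self.name, list(cart)


class RoundRobinBalancer:
    """Simplified stand-in for a load balancer: cycles through servers."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def route(self, session_id, item):
        return next(self._cycle).handle(session_id, item)


balancer = RoundRobinBalancer([WebServer("web-1"), WebServer("web-2")])
first = balancer.route("user-42", "book")   # handled by web-1
second = balancer.route("user-42", "pen")   # handled by web-2, same cart
```

Because the cart lives in the shared store, the second request sees the item added by the first even though a different server handled it. That property is what lets you add or remove instances in a tier freely.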

> 10K Users

My goal is to achieve an architecture that will serve over 10,000 users. I start with the database options for creating an independent data tier, and I enlist the help of several AWS services along the way.

Database Options on AWS

When it comes to deploying databases on AWS, there are two general directions:

1. Use a managed database service such as Amazon Relational Database Service (Amazon RDS) or Amazon DynamoDB

2. Host your own database software on Amazon EC2

For the example that I walk you through in the following sections, I use RDS. I start with the assumption that I will need to extract relationships from the data to search for things such as the top five searched products or the number of users who visited last Tuesday and looked at a specific product category.

The following illustration shows an updated architecture. The green arrow between the master and standby databases represents the synchronous data replication when the Multi-AZ feature is enabled.

When your application becomes popular and you need to serve a high volume of read requests, you can scale the database tier with Amazon RDS Read Replicas. Read Replicas are available if you are using MySQL, PostgreSQL, or Amazon Aurora. RDS MySQL and RDS PostgreSQL allow up to five Read Replicas and rely on each engine's native replication, which is asynchronous and therefore subject to replication lag. Amazon Aurora allows up to 15 Replicas and experiences minimal replication lag because Aurora Replicas share the same underlying storage. The following illustration shows an updated architecture with Amazon RDS Read Replicas.
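On the application side, using Read Replicas usually means sending writes to the primary and spreading reads across the replicas. The sketch below shows one simple way to do that in plain Python. The endpoint names are made up, the `ReplicaRouter` class is hypothetical, and the "starts with SELECT" test is a deliberate simplification, not how any particular driver decides.

```python
import itertools


class ReplicaRouter:
    """Picks a database endpoint: replicas for reads, primary for writes."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Cycle through replicas so reads are spread evenly.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        # Simplification: treat statements starting with SELECT as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


# Hypothetical endpoint names, for illustration only.
router = ReplicaRouter(
    primary="primary.example.internal",
    replicas=["replica-1.example.internal", "replica-2.example.internal"],
)
```

Because of replication lag, a read that must see a write made just before it may still need to go to the primary. A real router would need a way to opt specific queries out of replica routing.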

Horizontal scaling at the application layer, RDS Multi-AZ, and RDS Read Replicas will allow your system to scale pretty far. But for serving over ten thousand users, I suggest we decouple the architecture even further.


• Design a horizontally scalable architecture across Availability Zones to increase application availability. AWS offers Elastic Load Balancing (ELB) to distribute traffic across EC2 instances.

• Leverage RDS features to achieve redundancy and increase availability with ease. Multi-AZ creates a standby database to increase availability. Read Replicas can be used to scale and increase performance for read-heavy workloads.

• Think about decoupling and spreading workload over appropriate AWS services to scale and to improve performance and efficiency. S3 is great for storing static assets; CloudFront helps you deliver content with lower latency; DynamoDB and ElastiCache allow you to offload session and cache data from your application servers and database.
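Offloading cache data typically follows the cache-aside pattern: check the cache first, and only on a miss go to the database and store the result. The sketch below illustrates the pattern with an in-memory dictionary standing in for a cache such as ElastiCache. The `CacheAside` class, the TTL value, and the fake product table are all assumptions made for illustration.

```python
import time


class CacheAside:
    """Cache-aside lookup with a time-to-live, using a dict as the cache."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db      # called only on a cache miss
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so expiry can be tested
        self._cache = {}               # key -> (value, expiry time)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss or expired entry: read from the database and repopulate.
        self.misses += 1
        value = self._load(key)
        self._cache[key] = (value, now + self._ttl)
        return value


# Hypothetical "database" table for the usage example.
fake_products = {"p1": "Blue Widget"}
product_cache = CacheAside(fake_products.get, ttl_seconds=30.0)
name_first = product_cache.get("p1")   # miss: loaded from fake_products
name_again = product_cache.get("p1")   # hit: served from the cache
```

The TTL bounds how stale a cached value can get. Choosing it is a trade-off between database load and freshness.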

> 500K Users

Auto Scaling: After you shift workload components to the appropriate AWS services and decouple your application (as discussed in part 1 and part 2), you can introduce Auto Scaling to squeeze more efficiency out of your infrastructure.

How does Auto Scaling know when to automatically resize a group of EC2 instances in a tier of your application? You can write policies based on metrics or a time schedule. For example, you could write a metrics-based policy that adds more EC2 instances when CPU utilization has been above 60% for the past five minutes. Alternatively, you could have a policy that provisions and ensures a fixed number of instances every weekday at 9:00 AM.
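The metrics-based policy described above can be sketched as a small decision function. This is a plain-Python illustration of the logic, not the Auto Scaling API. The `ScaleOutPolicy` class, the one-sample-per-minute assumption, the step size, and the maximum group size are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class ScaleOutPolicy:
    """Add instances when CPU stays above a threshold for several periods."""

    threshold_percent: float = 60.0
    periods: int = 5       # assumed: one CPU sample per minute
    step: int = 1          # instances to add when the policy fires
    max_size: int = 10     # assumed upper bound for the group

    def desired_capacity(self, current, cpu_samples):
        recent = cpu_samples[-self.periods:]
        # Fire only if a full window exists and every sample breaches.
        breached = len(recent) == self.periods and all(
            sample > self.threshold_percent for sample in recent
        )
        if breached:
            return min(current + self.step, self.max_size)
        return current


policy = ScaleOutPolicy()
busy = policy.desired_capacity(2, [70, 65, 80, 61, 90])    # all above 60%
mixed = policy.desired_capacity(2, [70, 65, 50, 61, 90])   # one dip below
```

Requiring every sample in the window to breach the threshold keeps a single short CPU spike from triggering a scale-out.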

The following illustration shows a simple, single-application tier that uses Auto Scaling to automatically scale web servers. You could have another Auto Scaling group for your application tier in private subnets as well. When used in combination with Elastic Load Balancing (ELB), Auto Scaling registers new EC2 instances with the load balancer and deregisters instances it terminates. It can also use ELB-based health checks to decide when to replace an instance, while the load balancer forwards traffic only to healthy EC2 instances.
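The interplay between registration and health checks can be sketched as a small bookkeeping class. This is an illustrative model only: `TargetPool`, its method names, and the threshold of two consecutive failures are assumptions, not the behavior or API of ELB or Auto Scaling.

```python
class TargetPool:
    """Tracks registered instances and which ones may receive traffic."""

    def __init__(self, unhealthy_threshold=2):
        self.unhealthy_threshold = unhealthy_threshold
        self._failures = {}   # instance id -> consecutive failed checks

    def register(self, instance_id):
        self._failures[instance_id] = 0

    def deregister(self, instance_id):
        self._failures.pop(instance_id, None)

    def record_check(self, instance_id, passed):
        if instance_id not in self._failures:
            return  # ignore checks for instances that are not registered
        if passed:
            self._failures[instance_id] = 0
        else:
            self._failures[instance_id] += 1

    def healthy(self):
        # Only instances below the failure threshold receive traffic.
        return sorted(
            instance_id
            for instance_id, failures in self._failures.items()
            if failures < self.unhealthy_threshold
        )


pool = TargetPool()
pool.register("i-aaa")
pool.register("i-bbb")
pool.record_check("i-bbb", passed=False)
pool.record_check("i-bbb", passed=False)   # second failure: marked unhealthy
routable = pool.healthy()
```

In this model an unhealthy instance stays registered but receives no traffic until a check passes again or the scaling group replaces it.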

> 1 Million Users

In the first three parts I discussed the following foundational AWS concepts:

  • Decoupling application tiers to prepare for horizontal scaling
  • Choosing among the database options available on the AWS platform
  • Shifting components of workload to appropriate AWS services for better performance and efficiency
  • Using Auto Scaling to ensure that you have the correct number of EC2 instances available to handle the load for your application
  • Automating your software development lifecycle with AWS services

Let’s assume my application has become hugely popular, and it’s now getting close to a million visitors. Serving a million users and more is going to require that we bring together all the approaches I’ve discussed so far:

  • Deploying across multiple Availability Zones
  • Using Elastic Load Balancing (ELB) between tiers
  • Using Auto Scaling
  • Using service-oriented architecture
  • Serving content using appropriate AWS services (for example, local instance storage vs. Elastic Block Store vs. S3)
  • Caching database reads with ElastiCache to take load off the database
  • Moving state off tiers so you can horizontally scale

The following diagram illustrates usage of the aforementioned approaches.

