How to Build a Microservices Architecture on AWS | Configuration Patterns and Service Selection Guide


As features are added and released more frequently, more and more companies are moving toward microservices in search of systems that are resilient to change and highly independent. Amazon Web Services (AWS) allows you to build a flexible microservices architecture by combining execution platforms such as Amazon ECS, Amazon EKS, and AWS Lambda with integration services such as API Gateway and Step Functions.

In this article, we will organize and explain the key points to keep in mind when selecting services to design and build microservices on AWS, typical configuration patterns, and design issues to be aware of when implementing them.

Microservices Basics and AWS Implementation Concepts

Microservices is a design concept that divides a large system into multiple independent small services, each of which can be developed and operated independently. To support this concept, AWS allows you to build independent execution environments, communication methods, and data management for each service.

Below we will summarize three points that are particularly important during design: "division criteria," "communication method," and "data independence."

Splitting criteria (using domain-driven design)

Microservices are divided based on "business domains" rather than functional units. The Domain-Driven Design (DDD) approach defines business boundaries as "bounded contexts" and extracts each context as an independent service.

On AWS, these boundaries are used to separate API Gateway, ECS tasks, Lambda functions, DynamoDB tables, etc. for each service. By clearly separating areas with different change frequencies and scope of responsibility, you can reduce the cost of coordinating between teams and parallelize releases.

Communication and Coupling (REST/gRPC vs. Event-Driven)

The communication method between microservices is an important design element that determines the degree of coupling. Synchronous communication such as REST and gRPC has clear requests and responses, making it easy to handle and suitable for systems that require real-time performance. In AWS, a common configuration is to combine Amazon API Gateway for externally exposed APIs with App Mesh or Service Connect for internal communication.

On the other hand, event-driven asynchronous communication reduces coupling and increases fault tolerance. By using Amazon EventBridge, Amazon SQS, or AWS Step Functions, you can operate each service independently using an event as a trigger. This configuration is effective for systems that require control of processing order and tolerance to delays.
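As a minimal sketch of this loosely coupled style, the snippet below builds an EventBridge event entry as a plain dictionary. The source name, detail-type, and bus name are hypothetical examples, not fixed AWS values; with credentials configured, the entry could be sent with `boto3.client("events").put_events(Entries=[entry])`.

```python
import json

def build_order_event(order_id, total):
    """Build an EventBridge entry for a hypothetical 'order placed' event.

    Subscribing services attach EventBridge rules that match on
    Source / DetailType, so the publisher never needs to know who consumes it.
    """
    return {
        "Source": "shop.orders",         # hypothetical event source name
        "DetailType": "OrderPlaced",     # hypothetical detail type
        "Detail": json.dumps({"orderId": order_id, "total": total}),
        "EventBusName": "app-event-bus", # hypothetical custom bus name
    }

entry = build_order_event("o-123", 4200)
```

Because the publisher only emits a fact ("an order was placed"), new consumers can be added later by creating rules, without changing the publishing service.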

Data isolation for each service

The principle of microservices is to maintain data independence. If multiple services share a single database, the scope of impact of changes will expand, making it difficult to isolate the cause of a failure.

In AWS, each service has its own data store, and Amazon Aurora, RDS, DynamoDB, S3, or other options are selected depending on the purpose. If there is a reference relationship, access is via API or asynchronous replication is performed using event delivery. In addition, to maintain consistency, the Saga pattern is introduced, and a design that allows for eventual consistency is adopted.
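To illustrate the Saga pattern mentioned above, here is a minimal, framework-free sketch: each step pairs an action with a compensation, and a failure rolls back the completed steps in reverse order. The step names are invented for illustration; in practice this orchestration is often delegated to Step Functions.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; if any action raises,
    run the compensations of the completed steps in reverse order."""
    done = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(done):
                undo()
            return False
        done.append(compensate)
    return True

log = []

def fail_payment():
    # Simulates the second service failing mid-saga
    raise RuntimeError("payment failed")

ok = run_saga([
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (fail_payment, lambda: log.append("refund payment")),
])
```

Because the compensation undoes the stock reservation rather than relying on a distributed transaction, each service keeps sole ownership of its own data store and the system converges to a consistent state eventually.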

Selection of execution platform

When running microservices on AWS, the choice of execution platform directly affects the stability of the configuration and operational costs. We will make an appropriate selection based on the characteristics of each platform, scale, team structure, and future scalability.

Amazon ECS/Fargate: Easy to operate and standard

Amazon ECS is suitable for situations where you want to reduce operational load and promote container operations. When used in conjunction with Fargate, server management is unnecessary and you can specify and deploy only the resources necessary to run containers.

Because it can be built using only standard functions, including load balancing and log collection, it is easy to adopt as a foundation for the initial introduction of microservices. It is an easy option to use for small to medium-sized systems when the team does not have specialized knowledge of Kubernetes operation.

Amazon EKS: Flexible and Built for Scale

Amazon EKS is suitable for large-scale systems and organizations with advanced requirements. Because configurations are managed through the standard Kubernetes API, you can operate polyglot services and systems with complex network control in a consistent way.

It offers a high degree of freedom, allowing you to implement a service mesh and build your own CI/CD pipeline, and it works well for companies that already use Kubernetes internally. While it allows for detailed control, the burden of cluster management is high, so adopt it only after the team has an operational framework in place.

AWS Lambda: Event-driven, small-scale

AWS Lambda is suitable for situations where you need to execute small processing units quickly. Because it executes functions in response to events rather than running servers all the time, it is a platform that works well with services that experience large fluctuations in access volume.

It is also suitable for highly independent processes such as batch processing and image conversion. It is easy to develop and expand in a short time, but it is not suitable for services that run for a long time or involve complex state management, so it is necessary to clearly define the purpose when using it.
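As a sketch of the "small processing unit" idea, the handler below follows the standard AWS Lambda handler signature and the SQS event record shape; the message fields themselves are hypothetical. It can be invoked locally with a fake event, which is also how such functions are typically unit-tested.

```python
import json

def handler(event, context):
    """Minimal Lambda handler sketch: process a batch of SQS-style records
    and return a summary. The 'id' field in the body is illustrative."""
    processed = []
    for record in event.get("Records", []):
        body = json.loads(record["body"])
        processed.append(body["id"])
    return {"statusCode": 200, "body": json.dumps({"processed": processed})}

# Local invocation with a fake SQS event (context is unused here)
result = handler({"Records": [{"body": json.dumps({"id": "img-1"})}]}, None)
```

Keeping the handler this small is what makes Lambda a good fit: one event in, one bounded piece of work out, with no long-lived state in the function itself.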

Comparison table: operational burden/flexibility/cost

To get an overview of the features of the execution platform, organizing them in terms of operational load, flexibility, and cost will make it easier to grasp the direction of your selection.

| Platform | Operational load | Flexibility | Cost | Suitable cases |
|---|---|---|---|---|
| ECS/Fargate | Low | Medium | Medium to slightly high | Medium-sized services that require stable operation with a standard configuration |
| EKS | High | High | Medium to high | Large-scale systems that require a high degree of freedom |
| Lambda | Lowest | Low to medium | Optimized according to traffic volume | Small event-driven services |

Typical configuration patterns

Amazon API Gateway + BFF + ECS

This configuration is often adopted by medium-sized web services. The API Gateway acts as the entrance for the public API, and a BFF (Backend for Frontend) optimized for each client runs on ECS.

BFF isolates the processing for each client, making it less likely that changes will affect the entire system. It also allows for deployment on a service-by-service basis in line with front-end changes, making it easier to promote continuous improvement.
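A minimal sketch of what a BFF does, with the backend calls injected as plain callables (the service names, fields, and the "show only three orders on mobile" rule are all hypothetical):

```python
def bff_mobile(user_id, user_svc, order_svc):
    """BFF sketch: aggregate two backend services into one payload trimmed
    for a mobile client. The callables stand in for HTTP/gRPC clients."""
    user = user_svc(user_id)
    orders = order_svc(user_id)
    return {
        "name": user["name"],
        # Mobile screen only shows the three most recent orders
        "recentOrders": [o["id"] for o in orders[:3]],
    }

payload = bff_mobile(
    "u-1",
    lambda uid: {"name": "Alice", "address": "Tokyo"},
    lambda uid: [{"id": f"o-{i}"} for i in range(5)],
)
```

The point of the pattern is visible here: the mobile payload can change shape freely (drop the address, trim the order list) without touching the user or order services themselves.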

Amazon EKS + App Mesh (gRPC communication)

In a large-scale microservice environment, the combination of EKS and App Mesh is ideal. App Mesh manages inter-service communication and unifies traffic control, retries, visualization, and more.

gRPC enables high-speed communication even among services that use a mix of languages and frameworks. This configuration is used in environments that require high availability and complex service networks.

Amazon EventBridge + AWS Lambda + AWS Step Functions

This configuration is suitable for event-driven processing and business logic including workflows. EventBridge enables loosely coupled integration between services, Lambda executes unit processing, and Step Functions manages a series of processes.

It is suitable for processes that are not dependent on timing, such as asynchronous processing, batch processing, and back-office automation. It is also characterized by its ability to keep the amount of code small and easy to start and expand.
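To make the "Step Functions manages a series of processes" part concrete, here is a minimal Amazon States Language definition built as a Python dictionary: two Lambda tasks in sequence, with a retry on the first. The state names and the `REGION`/`ACCOUNT` placeholders in the ARNs are illustrative.

```python
import json

# Minimal ASL sketch: Validate -> Persist, with retry/backoff on Validate.
state_machine = {
    "StartAt": "Validate",
    "States": {
        "Validate": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:validate",
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 2,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
            "Next": "Persist",
        },
        "Persist": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:persist",
            "End": True,
        },
    },
}

definition = json.dumps(state_machine)  # JSON passed to CreateStateMachine
```

Moving retries and sequencing into the state machine definition is what keeps the individual Lambda functions small: each one only does its unit of work.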

Common design points

When operating microservices on AWS, it is necessary to unify not only the design of each individual service but also the infrastructure design that supports the entire system.

Network design (VPC, ALB/NLB, PrivateLink)

In a microservices environment, a network design that clearly separates routes for communication between services and those for external access is essential. Each service is separated within a VPC, and an ALB or NLB is used for ingress communication. Using PrivateLink for communication between internal services allows services to work together without exposing them outside the VPC. The basic policy is to have a configuration that allows only necessary communication while clearly defining network boundaries.

Authentication and authorization (Amazon Cognito, IAM roles)

Authentication and authorization are designed separately, with Cognito for user authentication and IAM roles for inter-service permission management. By adopting Cognito, user management and token issuance can be unified.

Meanwhile, backend service integration uses IAM roles and assigns least privileges to each service, allowing services to operate safely while limiting unnecessary access.
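As a sketch of "least privilege per service", the function below builds an IAM policy document that lets one service's role access only its own DynamoDB table. The table ARN and the chosen action list are illustrative; the policy grammar itself follows the standard IAM JSON format.

```python
import json

def table_policy(table_arn):
    """Least-privilege IAM policy sketch: this service's role may only
    read and write the single table it owns."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "dynamodb:GetItem",
                "dynamodb:PutItem",
                "dynamodb:Query",
            ],
            "Resource": table_arn,  # scoped to exactly one table
        }],
    }

policy = table_policy(
    "arn:aws:dynamodb:ap-northeast-1:123456789012:table/orders")
policy_json = json.dumps(policy)
```

Scoping `Resource` to a single ARN per service is what prevents one compromised or buggy service from reaching another service's data store.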

Observability (Amazon CloudWatch, X-Ray, log normalization)

With microservices, the cause of a failure may span multiple services. It is effective to aggregate metrics and logs with CloudWatch and visualize the processing paths between services with X-Ray. Also, standardizing the log format across each service makes analysis and troubleshooting easier. By increasing observability, you can maintain stability while reducing operational load.
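One way to standardize the log format is a formatter that emits one JSON object per line, so CloudWatch Logs Insights can query the same fields across every service. The field names (`level`, `service`, `message`) are a sketch of such a shared convention, not an AWS requirement.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line with a fixed field set,
    so queries work identically across all services."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "service": getattr(record, "service", "unknown"),
            "message": record.getMessage(),
        })

# Format a record directly to show the resulting line
record = logging.LogRecord(
    "app", logging.INFO, __file__, 1, "order created", None, None)
line = JsonFormatter().format(record)
```

In a real service the formatter would be attached to a handler once at startup; the payoff is that a single Logs Insights query like `filter level = "ERROR"` works across every service.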

Deployment (AWS CodePipeline, Blue/Green)

In microservices, where updates are made on a service-by-service basis, a standardized deployment method is necessary. Using CodePipeline, you can unify the flow from build to deployment. Furthermore, by adopting blue/green deployment, you can verify the new version before switching over, minimizing the impact of a release. The more frequently services are updated, the more a well-organized deployment infrastructure pays off.

Reliability and cost optimization

Microservices require both reliability design and cost optimization to ensure availability while continuing to use resources efficiently. AWS has a range of mechanisms to support these two points, and combining them appropriately leads to stable operation.

Scaling and fault-tolerant design (Auto Scaling, retry control)

Because service load fluctuates depending on the time of day and events, introducing an auto-scaling mechanism can ensure stability. Amazon ECS, EKS, and Lambda can all be integrated with Auto Scaling, automatically adjusting the number of instances and tasks based on factors such as CPU, memory, and queue length.

Additionally, on the assumption that failures will occur whenever services communicate over a network, we implement controls such as retries, backoffs, and circuit breakers. This creates a configuration that can withstand transient failures.
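The retry-with-backoff control above can be sketched in a few lines. This version uses exponential backoff with full jitter; the `sleep` function is injected so the sketch runs instantly under test, and the flaky callable simulates a dependency that fails twice before succeeding.

```python
import random

def call_with_retry(fn, max_attempts=3, base_delay=0.1, sleep=lambda s: None):
    """Retry sketch: exponential backoff with full jitter.
    `sleep` is injectable so tests need no real waiting."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            # full jitter: wait a random time up to base * 2^attempt
            sleep(random.uniform(0, base_delay * 2 ** attempt))

calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient network error")
    return "ok"

result = call_with_retry(flaky)
```

The jitter matters in a microservices setting: without it, many clients that failed at the same moment would all retry at the same moment, turning a brief outage into a retry storm.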

Cost optimization (AWS Graviton, Spot, AWS Savings Plans)

In environments that will be operated for a long time, cost optimization through resource selection is effective. If you want to reduce CPU costs, using an instance type based on AWS Graviton may result in lower prices with equivalent performance. Furthermore, in environments where stable operation is a requirement, you can reduce expenses by making effective use of Savings Plans.

On the other hand, it is more efficient to combine Spot instances with batch processing or re-executable services. By using multiple options depending on the purpose, you can reduce costs while maintaining service quality.

Phased migration and common mistakes

Replacing an existing system with microservices all at once imposes a heavy burden and risks problems during the migration. A step-by-step splitting approach lets you modernize while maintaining safety. Understanding the failures that tend to occur during migration also makes it easier to create a stable release plan.

Safely split with the Strangler Fig pattern

The Strangler Fig pattern is useful when splitting a large monolithic system. This is a method of migrating a part of an existing system by wrapping it in a new service. By extracting a small area and making it an API, and gradually switching traffic to the new service, the migration proceeds while maintaining existing functionality. This makes it possible to split the system with a limited impact range, avoiding situations where the entire service is affected during the migration.
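The routing at the heart of the Strangler Fig pattern can be sketched as a simple prefix check: paths that have been migrated go to the new service, everything else still hits the monolith. The path prefixes and service callables are hypothetical; in practice this logic usually lives in an ALB listener rule or API Gateway route rather than application code.

```python
def route(path, migrated_prefixes, new_svc, legacy_svc):
    """Strangler Fig routing sketch: send migrated path prefixes to the
    new service, and everything else to the existing monolith."""
    if any(path.startswith(p) for p in migrated_prefixes):
        return new_svc(path)
    return legacy_svc(path)

hits = []
new = lambda p: hits.append(("new", p)) or "new"
old = lambda p: hits.append(("old", p)) or "old"

a = route("/orders/1", ["/orders"], new, old)  # already migrated
b = route("/users/1", ["/orders"], new, old)   # still on the monolith
```

Expanding `migrated_prefixes` one bounded context at a time is what keeps the blast radius small: each switch-over affects only one route, and rolling back is just removing the prefix again.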

Examples of failures due to shared/synchronized databases

One of the common problems when migrating to microservices is when multiple services end up sharing a database without being able to separate it. When relying on a common database, schema changes and increased loads tend to affect the entire system, and independence cannot be maintained.

Furthermore, relying on batch integration for synchronization can lead to inconsistencies due to differences in update timing. By ensuring that each service owns its own data and switching to integration via API, you can move to independent operation.

Countermeasures for oversights in monitoring and authority management

As the number of services increases, the monitoring targets and IAM permissions become more complex. Lack of standardized monitoring settings can lead to delayed detection of anomalies, and starting operations with insufficient permission design can lead to access control errors. Ensure observability by combining log and trace infrastructure such as CloudWatch, AWS Config, and X-Ray, and separate IAM roles with least privileges per service. Standardize your settings to ensure stable operation after migration.

FAQ

Amazon ECS vs. Amazon EKS: Which Should You Choose?

ECS is a container platform optimized for AWS, and is suitable for cases where you want to reduce operational burden. When used in conjunction with Fargate, server management is unnecessary. EKS is suitable for those who want to utilize Kubernetes, have multi-cloud requirements, or have advanced operational requirements. The selection criteria are whether or not you have existing Kubernetes assets.

How should you use Amazon API Gateway or ALB?

API Gateway specializes in publishing and managing APIs and is suitable when advanced control such as authentication and rate control is required. ALB focuses on L7 routing to web apps and container services. API Gateway is suitable for strict API requirements, while ALB is suitable for simple app ingress.

When should asynchronous processing be implemented?

Asynchronous processing is effective when you want to maintain the response speed of user operations or when internal processing takes a long time. Processing such as batch processing, external API integration, and file conversion can be stabilized by asynchronous processing using EventBridge or SQS. It is also suitable when you want to avoid direct dependency between services.

Summary

To build a microservices architecture on AWS, it is essential to consider not only basic design such as service division criteria, communication methods, and data independence, but also the execution infrastructure, network, authentication, and monitoring in a consistent manner.

AWS offers a wide range of options, primarily ECS, EKS, and Lambda, allowing you to combine the optimal configuration to meet your requirements. By gradually migrating, streamlining coupling and dependencies, and improving observability and operability, you can achieve a configuration that is resistant to change. Clarifying the role of each service and proceeding with the design while keeping load and cost in mind will lead to stable operation.



About the author
Kazuki Kato

Serverworks Co., Ltd., Marketing Department, Marketing Section 1. After working as a sales representative for an independent ISP and an SIer, optimizing customer systems and networks, he joined Serverworks. Since joining, he has worked on a development standardization project for an electric power carrier and proposed and implemented an in-station reading system for a railway operator. He is currently in charge of event marketing and inside sales. His hobby is washing cars. AWS Certified Database – Specialty (DBS).

We offer end-to-end solutions to address all your AWS-related challenges.
