When it comes to security logs on Amazon Web Services (AWS), CloudTrail, CloudWatch, and various detection services come to mind. However, in practice, settings are often configured without clarifying what should be enabled or how much data is sufficient, and the logs turn out to be unusable when they are needed.
Security logs have two distinct roles: they serve as "evidence" to explain facts later, and they also play a crucial role in "detection" to quickly identify anomalies. Designing security logs without considering this distinction will result in a system that is mediocre in both areas, leading to confusion during operation and investigation.
This article organizes AWS security logs from a design decision perspective. Focusing on CloudTrail, it explains what needs to be decided to make them function as evidence, and where to differentiate the design when early detection is required. While covering the concept of minimal configuration and common pitfalls in practice, it organizes an overall picture of usable log design.
AWS security logs offer numerous configurable options and well-documented procedures. However, configuration often proceeds without a clear understanding of why the logs are being collected, which leads to confusion and hard-to-justify decisions later. This section outlines two common pitfalls in practical implementation.
Security logs don't automatically become valuable just by being enabled. In many workplaces, there's a sense of security simply because "logs are being collected," without actually verifying whether they can be used for investigations or explanations.
For example, even if you try to find out "who did what and when" after an incident occurs, you may not be able to access the necessary information because the data acquisition scope is insufficient or the retention period is too short. What is important for logs is not just that they "exist," but that they "can be accessed when needed," and if this premise is not shared, design decisions will become ambiguous.
AWS has separate mechanisms for recording operation history, accumulating and visualizing logs, and detecting anomalies. However, in practice, these are often treated collectively as "security logs."
As a result, designs arise in which roles and expectations do not align, such as expecting real-time detection from CloudTrail or comprehensive evidence from CloudWatch Logs. If you proceed with configuration without clarifying the role of each service, you will discover later that the logs cannot be used the way you expected.
The first thing you need to do is to clearly define the purpose of each log and service. If you leave this ambiguous, all subsequent design decisions will become inconsistent.
The first thing to understand when designing security logs is that not all logs serve the same purpose. AWS security logs are broadly used for two purposes: "audit trails" and "early detection." If you design them without being aware of this difference, you're likely to end up with logs that are mediocre at both.
The role of audit logs is to make it possible to explain accurately, after the fact, what happened. When unauthorized access or a configuration error is suspected, you need to be able to trace in chronological order who performed which operation and when.
For this application, comprehensiveness and archiving are more important than real-time performance. It is essential that all operations are recorded without omission, retained for the required period, and preserved in a format that can be explained to third parties. Evidence logs serve as the basis for decisions during audits and post-incident investigations.
On the other hand, the purpose of early detection logs is to notice anomalies before the damage escalates. When suspicious logins or unexpected operations occur, it is necessary to detect them as quickly as possible and use that information to initiate an initial response.
For this use, immediacy and notification are prioritized over comprehensiveness. In many cases, noticing quickly matters more than having complete detail, even if the information is somewhat coarse. Expecting early detection from a design built for audit logs tends to cause problems such as delayed detection or alerts that do not fire when they should.
Security logs should be categorized into those "for later explanation" and those "for immediate detection." Failure to make this distinction will result in an ambiguous overall log design.
When using security logs as "evidence," CloudTrail is central. However, simply enabling CloudTrail is not enough to provide sufficient evidence. Only when design decisions are made regarding the scope, conditions, and method of storage can the logs withstand post-incident investigations and accountability.
CloudTrail is a mechanism for recording operations performed within your AWS account. It covers operations performed via the Management Console, CLI, SDK, and APIs through other services, and its key feature is that it allows you to track "which credentials were used," "which operation was performed," and "when it was performed."
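As a rough illustration of "which credentials, which operation, when," the sketch below reduces a single CloudTrail record to those three facts. The record values are made up for illustration; the field names (`eventTime`, `eventSource`, `eventName`, `sourceIPAddress`, `userIdentity.arn`) follow the CloudTrail record format, and the `summarize` helper is a hypothetical name, not part of any AWS SDK.

```python
import json

# Illustrative record only: the values are invented, the field names follow
# the CloudTrail record format.
SAMPLE_RECORD = json.loads("""
{
  "eventTime": "2024-05-01T12:34:56Z",
  "eventSource": "iam.amazonaws.com",
  "eventName": "AttachUserPolicy",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "203.0.113.10",
  "userIdentity": {
    "type": "IAMUser",
    "arn": "arn:aws:iam::111122223333:user/example-admin"
  }
}
""")


def summarize(record: dict) -> dict:
    """Reduce one record to the who / what / when an investigator asks for."""
    identity = record.get("userIdentity", {})
    return {
        # Fall back to the identity type when no ARN is present.
        "who": identity.get("arn") or identity.get("type", "unknown"),
        "what": f'{record.get("eventSource", "?")}:{record.get("eventName", "?")}',
        "when": record.get("eventTime", "?"),
        "from": record.get("sourceIPAddress", "?"),
    }


print(summarize(SAMPLE_RECORD))
```

If a question you expect to ask during an investigation cannot be answered from fields like these, that is a sign the recording scope needs to be revisited.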
This mechanism allows you to review the history of configuration changes and resource operations in chronological order. However, this assumes that the operations to be recorded are correctly configured. If you use it without understanding which operations are recorded and how far back you can track them, you may end up in a situation where "the information you want is not recorded."
CloudTrail records operations in two main categories: "management events" and "data events." Management events primarily involve configuration changes for IAM, EC2, VPC, and similar services, and are the first things to check when investigating a security incident.
On the other hand, data events target operations that are closer to the actual data, such as manipulating S3 objects or invoking Lambda functions. Data events are not logged by default, so these operations are not visible unless you explicitly enable them.
The scope of data acquisition depends on how much factual information you want to preserve as evidence. You need to decide whether management events alone are sufficient, or if data events are also required, based on your intended use.
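To make the scope decision concrete, here is a minimal sketch of the two choices expressed as CloudTrail advanced event selectors, built as plain Python dictionaries. The field names (`eventCategory`, `resources.type`, `resources.ARN`) are those used by advanced event selectors; the bucket name `example-sensitive-bucket` and the helper function names are hypothetical. Nothing here calls AWS; the resulting list is the kind of structure that could be passed as `AdvancedEventSelectors` when configuring a trail.

```python
import json


def management_selector() -> dict:
    """Selector that records all management (control-plane) events."""
    return {
        "Name": "All management events",
        "FieldSelectors": [
            {"Field": "eventCategory", "Equals": ["Management"]},
        ],
    }


def s3_data_selector(bucket_arn_prefix: str) -> dict:
    """Selector that adds S3 object-level data events for one bucket only."""
    return {
        "Name": "S3 object-level events for one bucket",
        "FieldSelectors": [
            {"Field": "eventCategory", "Equals": ["Data"]},
            {"Field": "resources.type", "Equals": ["AWS::S3::Object"]},
            {"Field": "resources.ARN", "StartsWith": [bucket_arn_prefix]},
        ],
    }


# Hypothetical bucket: limit data events to where evidence is actually needed.
selectors = [
    management_selector(),
    s3_data_selector("arn:aws:s3:::example-sensitive-bucket/"),
]
print(json.dumps(selectors, indent=2))
```

Scoping data events to specific resources, rather than enabling them everywhere, is one way to keep the volume proportional to the evidence you actually intend to preserve.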
CloudTrail event history covers only the most recent 90 days of management events. Older operation history can no longer be viewed there, which is insufficient for audit compliance or long-term investigations.
Furthermore, a trail configured for only a specific Region can miss operations performed in other Regions. In AWS environments where multi-Region usage is common, plan log collection that covers all Regions.
If you intend to use CloudTrail logs as evidence, you need long-term log storage and a design that keeps the logs accessible throughout the required retention period.
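The retention gap can be checked with simple date arithmetic. The sketch below assumes the 90-day event history window described above; the function names and the 365-day requirement in the usage lines are hypothetical examples, not recommendations.

```python
from datetime import datetime, timedelta, timezone

EVENT_HISTORY_DAYS = 90  # event history window described in the text


def visible_in_event_history(event_time: datetime, now: datetime) -> bool:
    """True if an event is still inside the 90-day event history window."""
    age = now - event_time
    return timedelta(0) <= age <= timedelta(days=EVENT_HISTORY_DAYS)


def retention_gap_days(required_days: int) -> int:
    """Days of history that must come from long-term storage instead."""
    return max(0, required_days - EVENT_HISTORY_DAYS)


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
incident = now - timedelta(days=120)  # something discovered four months late

print(visible_in_event_history(incident, now))  # False
print(retention_gap_days(365))                  # 275
```

Any positive gap means the requirement can only be met by delivering logs to storage you control, which is a decision to make before the incident rather than after it.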
It is not enough for evidence logs merely to exist. In the event of an incident, you will also be asked whether those logs could have been deleted or tampered with afterward.
Therefore, it is necessary to establish a system that prevents users from freely deleting or modifying logs, including the log storage location and permission settings. Separating log acquisition from log storage, and designing the storage so that it cannot be accessed from normal operational accounts, increases the reliability of the logs as evidence.
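One way to express "users cannot freely delete logs" is a deny statement in the bucket policy of the log storage bucket. The following is a minimal sketch under stated assumptions: the bucket name `example-trail-logs` and the function name are hypothetical, the statement covers only object deletion under the `AWSLogs/` prefix, and a real policy would also need the statements that allow CloudTrail to write to the bucket, which are omitted here.

```python
import json


def deny_log_deletion_policy(bucket_name: str) -> dict:
    """Build a bucket policy fragment that denies deleting stored log objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyLogDeletion",
                "Effect": "Deny",
                "Principal": "*",
                # Deny removal of both current objects and older versions.
                "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
                "Resource": f"arn:aws:s3:::{bucket_name}/AWSLogs/*",
            }
        ],
    }


# Hypothetical bucket name, for illustration only.
print(json.dumps(deny_log_deletion_policy("example-trail-logs"), indent=2))
```

A policy like this only helps if the people who can edit the bucket policy are also separated from day-to-day operators, which is why the storage location and its permissions need to be designed together.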
For CloudTrail to function as an evidence trail, it's crucial to consider three aspects together: "acquisition," "storage," and "preservation." If any of these are lacking, the logs will become unreliable when needed.
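The three aspects can be turned into a simple review checklist. The sketch below inspects a dictionary shaped like one trail entry returned by CloudTrail's `describe_trails` call (the keys `IsMultiRegionTrail`, `S3BucketName`, and `LogFileValidationEnabled` are fields of that response); the `evidence_gaps` function and the sample values are hypothetical, and the mapping of each field to one aspect is an interpretation for illustration rather than an official rule.

```python
def evidence_gaps(trail: dict) -> list[str]:
    """Return human-readable gaps for one trail description."""
    gaps = []
    if not trail.get("IsMultiRegionTrail"):
        gaps.append("acquisition: trail does not cover all Regions")
    if not trail.get("S3BucketName"):
        gaps.append("storage: no S3 bucket configured for long-term retention")
    if not trail.get("LogFileValidationEnabled"):
        gaps.append("preservation: log file integrity validation is off")
    return gaps


# Made-up trail description for illustration.
sample_trail = {
    "Name": "example-trail",
    "S3BucketName": "example-trail-logs",
    "IsMultiRegionTrail": False,
    "LogFileValidationEnabled": True,
}

for gap in evidence_gaps(sample_trail):
    print(gap)
```

An empty result does not prove the logs are sound evidence, but a non-empty one points directly at which of the three aspects needs a design decision.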
CloudTrail is extremely useful as an audit trail, but it has limitations when designed for early detection. If you entrust monitoring to it without understanding this premise, you will end up in a situation where "logs are being collected, but the problem is noticed too late."
CloudTrail does not provide immediate notifications of operations. Events are recorded first and then distributed and stored as logs, which may result in a delay of several minutes to over ten minutes.
While this isn't a problem for reviewing history later, it can be critical in cases where you want to detect anomalies in real time. This is because if there is a delay between an unauthorized operation and its detection, the damage may escalate during that time.
It's important to understand that CloudTrail is a mechanism for "recording the facts of an operation," and it's not primarily designed to "detect anomalies on the spot."
While not all security incidents require immediate attention, there are cases where delays in initial response can have significant consequences. For example, unexpected logins, permission changes, and operations on critical resources are incidents that should be identified as quickly as possible.
In such cases, a practical approach is to use CloudTrail as the primary evidence source while also employing another mechanism to ensure immediacy. The key is not to try to satisfy both requirements with CloudTrail alone.
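As one possible example of "another mechanism," an event-driven rule can react to specific API calls instead of waiting for log files. The sketch below assumes Amazon EventBridge, which the text above does not name, and shows an event pattern for a few IAM permission changes. The `source`, `detail-type`, and `detail` keys follow the EventBridge pattern format for CloudTrail-sourced API call events; the chosen event names are illustrative, and the `matches` function is a deliberately simplified local stand-in for pattern matching, not EventBridge's actual implementation.

```python
import json

# Pattern for a handful of IAM permission changes (illustrative selection).
IAM_PERMISSION_CHANGE_PATTERN = {
    "source": ["aws.iam"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["iam.amazonaws.com"],
        "eventName": ["AttachUserPolicy", "PutUserPolicy", "CreateAccessKey"],
    },
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: nested dicts recurse, lists mean 'any of these'."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Made-up event for illustration.
sample_event = {
    "source": "aws.iam",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {"eventSource": "iam.amazonaws.com", "eventName": "AttachUserPolicy"},
}

print(json.dumps(IAM_PERMISSION_CHANGE_PATTERN, indent=2))
print(matches(IAM_PERMISSION_CHANGE_PATTERN, sample_event))  # True
```

The pattern is intentionally narrow: early detection works on a short list of operationally important events, while the trail itself remains the complete record.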
By separating the concepts of evidence trails and early detection, it becomes possible to design systems that are appropriate for each. Correctly defining the role of CloudTrail leads to a security log design that is neither excessive nor insufficient.
Security logs aren't necessarily better just because there are more of them. The important thing is to first decide what purpose the logs will serve and then consider the minimum configuration necessary to achieve that purpose. Here, we'll organize our thinking based on three common practical purposes.
When the purpose is auditing or internal control, the most important thing is "being able to explain it afterward."
It is necessary to keep records that allow a third party to understand who performed what actions and when.
The core of this system is an audit trail log that comprehensively records the history of operations. In particular, it is essential that all critical control operations, such as permission changes and configuration changes, are recorded without fail and retained for the necessary period. Real-time functionality is not a requirement; a configuration prioritizing comprehensiveness and retention is appropriate.
Incident investigations require not only knowing "what happened," but also tracing "how it happened." To achieve this, it's necessary to be able to track the facts, including not only operation logs but also traces of communications and access.
For this purpose, it's necessary to consider the evidence logs as the core, while also incorporating other logs that provide clues for the investigation. The key is whether it's possible to confirm "what happened before and after this operation" during the investigation, and the focus should be on creating a structure that allows for retrospective analysis of causal relationships.
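The "what happened before and after this operation" question amounts to a time-window query around a pivot event. The sketch below is a minimal version that works on already-parsed CloudTrail records; the record values, the 15-minute window, and the helper names `parse_time` and `events_around` are assumptions for illustration.

```python
from datetime import datetime, timedelta


def parse_time(value: str) -> datetime:
    """Parse a CloudTrail eventTime such as 2024-05-01T12:34:56Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def events_around(records: list, pivot_time: datetime, minutes: int = 15) -> list:
    """Return records within +/- `minutes` of the pivot, in time order."""
    window = timedelta(minutes=minutes)
    nearby = [
        r for r in records
        if abs(parse_time(r["eventTime"]) - pivot_time) <= window
    ]
    return sorted(nearby, key=lambda r: parse_time(r["eventTime"]))


# Made-up records, deliberately out of order.
RECORDS = [
    {"eventTime": "2024-05-01T12:12:00Z", "eventName": "AttachUserPolicy"},
    {"eventTime": "2024-05-01T15:00:00Z", "eventName": "DescribeInstances"},
    {"eventTime": "2024-05-01T12:00:00Z", "eventName": "ConsoleLogin"},
    {"eventTime": "2024-05-01T12:10:00Z", "eventName": "CreateAccessKey"},
]

pivot = parse_time("2024-05-01T12:10:00Z")
for record in events_around(RECORDS, pivot):
    print(record["eventTime"], record["eventName"])
```

The same windowing idea extends to other log sources, provided their timestamps can be normalized to a common format; that requirement is itself a design decision worth making early.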
The goal in daily operations is to detect anomalies as quickly as possible. Rather than meticulously tracking every operation, the key is being able to notice "unusual conditions."
For this application, immediacy and notification are paramount. Rather than relying entirely on audit logs, it's more practical to design a configuration that can detect anomalies by focusing on operationally important points. By designing separate logs for early detection and logs for later review, you can reduce the operational burden while increasing effectiveness.
Even if security logs are collected at considerable time and cost, they will go largely unused in practice if they are poorly designed. Here, we summarize common failure patterns seen in the field. These are less technical problems than the result of insufficient design judgment and unexamined assumptions.
Logging is enabled without a clear purpose
The decision to "just enable it for now" may seem safe at first glance, but in practice, it can be counterproductive. If logs are collected without a clear purpose, no one will be able to explain what the logs are for.
As a result, it becomes impossible to determine which logs to review, and the logs cannot be used in investigations or audits. Unless you decide what the logs will be used for and which decisions they should support, they are simply stored data.
Even if logs are stored for a long period, they are meaningless if you cannot access the necessary information. In practice, it is common to encounter situations where "the logs exist, but we don't know where or how to search for them."
This happens when the log format and storage location are not organized and the system is not designed to allow tracing by time or by individual operation. If the logs are to serve as evidence or support investigations, design them on the assumption that they will need to be traced later.
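Knowing where to look is partly a matter of knowing the storage layout. When a trail delivers to S3, log files are placed under a date-partitioned key of the form `AWSLogs/<account-id>/CloudTrail/<region>/<YYYY>/<MM>/<DD>/`. The sketch below builds that prefix for a given day; the account ID, Region, and function name are hypothetical, and the sketch assumes no custom bucket prefix and a single-account (non-organization) trail.

```python
from datetime import date


def cloudtrail_prefix(account_id: str, region: str, day: date) -> str:
    """Build the S3 key prefix under which one day's log files are stored."""
    return f"AWSLogs/{account_id}/CloudTrail/{region}/{day:%Y/%m/%d}/"


# Made-up account ID and Region for illustration.
print(cloudtrail_prefix("111122223333", "ap-northeast-1", date(2024, 5, 1)))
# AWSLogs/111122223333/CloudTrail/ap-northeast-1/2024/05/01/
```

Writing down this kind of lookup procedure ahead of time, together with who is allowed to run it, addresses the "logs exist, but we don't know where to search" situation directly.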
Security logs are often not referenced during normal operation and are only used when a problem occurs. If it's not clearly defined who will review them and in what order, the response will be delayed.
Designs understood only by specific individuals, or operations that don't consider handover procedures, pose significant risks in actual incident response. Logs only function effectively in practice when they are designed with the intended audience and usage scenarios in mind.
AWS security logs don't offer a simple list of "what to enable." The key is deciding which decisions to make first. Finally, let's summarize the core design principles.
Before designing security logs, it's important to clarify the purpose, not the technical elements. Specifically, you need to define the purpose of use: whether you want to fulfill audit and accountability requirements, facilitate incident investigations, or quickly detect anomalies in daily operations.
Without a clear objective, decisions regarding the scope of data acquisition, retention period, and whether or not to send notifications will all be made on an ad-hoc basis. Since it is difficult to redesign logs later, aligning the decision-making criteria from the beginning will ultimately lead to a minimal configuration and reduced operational burden.
CloudTrail is essential for recording evidence of operations in the AWS environment. However, there's a difference between using CloudTrail as the core of your operations and relying entirely on it.
While CloudTrail is a powerful audit trail, it is not suited to handling immediate detection or operational monitoring on its own. By separating the roles of audit trails and early detection, and positioning CloudTrail as a "foundation for later explanation," you can create a design that is neither excessive nor insufficient.
Security logs are not a panacea. The key is to use CloudTrail as the central tool and supplement it as needed. Understanding this approach is essential for AWS security logs to truly function effectively in practical applications.