Even after configuring your security groups, have you ever felt uneasy and wondered, "Is this design really okay?" The cause is often not a lack of knowledge but decisions that were never clearly organized: you cannot explain why 0.0.0.0/0 is open, or you give a vague answer when asked how network ACLs (NACLs) differ from security groups.
Security groups are not configuration items for listing rules. They are a design mechanism for determining "who is allowed to communicate with whom." If you follow configuration examples without understanding this premise, you will continue to be confused every time the environment changes.
This article will explain the structure and roles of security groups, why vulnerable configurations arise, and how to differentiate their responsibilities from those of NACLs.
AWS security groups are one of the first security features you'll encounter when using AWS. The label "virtual firewall" is often repeated without a clear picture of what security groups actually do and where other mechanisms have to take over.
AWS security groups are a mechanism that controls whether or not communication is allowed for resources such as EC2 instances. They are generally described as "virtual firewalls," but understanding them solely through this description can easily lead to an overestimation of their role.
Security groups control only which sources and destinations, on which ports and protocols, are permitted to communicate. Their inbound and outbound rules decide whether traffic is allowed; in other words, they define the "conditions for the entry and exit points of communication."
On the other hand, security groups do not perform intrusion detection. They do not inspect the content of communications or detect abnormal behavior. It is important to clearly distinguish that their role is simply to decide "whether or not to allow it," and they do not monitor "what happens after it has passed."
If this aspect is left ambiguous, you might mistakenly believe that security groups alone are sufficient, which can easily distort the entire design.
Security groups are not designed to be applied to the entire VPC at once. They are intended to be used in conjunction with individual resources such as EC2, RDS, and ALB.
Because of this nature, it is more appropriate to view security groups not as a "wall protecting the entire network," but rather as "components for defining communication partners for each resource with a specific role." This is also why we consider separate groups for web servers, databases, and so on.
The important point is that even if you configure security groups, not all communications within your VPC will automatically become secure. If you don't understand which security groups are applied to which resources, the design is not truly effective.
The reason why configuring security groups can be confusing isn't because the mechanism itself is unclear, but because the assumptions under which that mechanism should be used haven't been clearly defined. Here, we'll organize basic concepts such as inbound/outbound and stateful security groups in a way that will help you make informed design decisions.
Security groups operate on a "permission model" (an allow list): only explicitly permitted communication passes, and everything else is denied.
If you look at the settings screen without understanding this mechanism, you might feel confused because "no denial rules exist." However, in security groups, "not written = denial" is always true. In other words, adding a rule is similar to "making a decision to increase the number of exceptions."
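The "not written = denial" behavior can be sketched in a few lines of Python. This is a minimal model for illustration, not how AWS evaluates rules internally; the `AllowRule` type, the `is_allowed` function, and the addresses are all hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class AllowRule:
    """One inbound allow rule: protocol, port range, and source CIDR."""
    protocol: str
    from_port: int
    to_port: int
    source_cidr: str


def is_allowed(rules, protocol, port, source_ip):
    """Return True only if at least one allow rule matches."""
    for rule in rules:
        if (
            rule.protocol == protocol
            and rule.from_port <= port <= rule.to_port
            and ip_address(source_ip) in ip_network(rule.source_cidr)
        ):
            return True
    # No rule matched: the traffic is denied without any explicit deny rule.
    return False


# Hypothetical rule set: HTTPS from one documentation-range network only.
web_rules = [AllowRule("tcp", 443, 443, "203.0.113.0/24")]

print(is_allowed(web_rules, "tcp", 443, "203.0.113.10"))  # True
print(is_allowed(web_rules, "tcp", 22, "203.0.113.10"))   # False
print(is_allowed([], "tcp", 443, "203.0.113.10"))         # False
```

Every rule added to `web_rules` can only widen what passes, which is why each addition is a decision to grant one more exception.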
A crucial design principle is to avoid treating inbound and outbound traffic symmetrically. Inbound rules set the external boundary ("who can reach this resource?"), while outbound rules define behavior from the inside out ("where is this resource allowed to communicate?").
If you configure the system without being aware of this difference, you'll likely end up with a design that strictly controls inbound traffic while unconditionally allowing outbound traffic.
Security groups are called "stateful" because they treat the round trip of a connection as a single flow. Return traffic for a connection that was permitted inbound passes automatically, even if no outbound rule explicitly permits it. The same holds in reverse for connections permitted outbound.
This behavior is convenient, but it invites conclusions such as "I don't need to configure outbound settings" or "It's fine to just allow everything for now."
However, from a design perspective, this is no reason to omit outbound settings. If you don't define where that resource is allowed to communicate, you cannot prevent unintended external communication.
Being stateful is not a feature to reduce configuration, but rather a mechanism to simplify communication relationships. If you misunderstand this premise, you'll end up accumulating settings without any judgment, simply because "it's working, so there's no problem."
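The distinction between return traffic and newly initiated outbound traffic can be illustrated with a toy model. The `StatefulFilter` class below is hypothetical and greatly simplified (it tracks flows by peer and port only); it is a sketch of the idea, not a description of AWS internals.

```python
class StatefulFilter:
    """Toy model of stateful filtering; not how AWS implements it."""

    def __init__(self, inbound_ports, outbound_ports):
        self.inbound_ports = set(inbound_ports)    # ports allowed in
        self.outbound_ports = set(outbound_ports)  # ports allowed out
        self.tracked = set()                       # flows that were allowed in

    def inbound(self, peer, port):
        """A new connection from peer to one of our ports."""
        if port in self.inbound_ports:
            self.tracked.add((peer, port))
            return True
        return False

    def reply(self, peer, port):
        """Return traffic for an earlier inbound flow; outbound rules are not consulted."""
        return (peer, port) in self.tracked

    def outbound(self, port):
        """A new connection initiated by the resource itself."""
        return port in self.outbound_ports


# Inbound HTTPS is allowed; no outbound rules are defined at all.
sg = StatefulFilter(inbound_ports=[443], outbound_ports=[])
sg.inbound("198.51.100.7", 443)

print(sg.reply("198.51.100.7", 443))  # True: the reply passes with no outbound rule
print(sg.outbound(443))               # False: a new outbound connection is still denied
```

Statefulness covers only the reply half of a flow that was already permitted. What the resource may initiate on its own remains a separate decision.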
Whenever security groups are discussed, the question of whether 0.0.0.0/0 is appropriate inevitably arises. Many articles condemn it as "dangerous" and "a setting that should never be used," but the fact that this setting repeatedly appears in practice is not simply due to technical negligence.
The problem lies in the fact that the configuration phase is entered before design decisions have been properly organized.
When we break down why 0.0.0.0/0 ends up being used, only a few common causes emerge.
One example is when development proceeds before the requirements have been finalized.
Because it's not determined "who will access it" or "where the communication is coming from," it's not possible to specify a particular IP address or security group, and as a result, the decision is made to "allow everything for now."
Another possibility is that the communication partner cannot be identified.
When integration with external services or future changes to connection destinations are anticipated, the network may be left wide open with the understanding that it will be fixed later.
In all cases, these settings are not so much a configuration error as they are the result of postponing a decision. Ignoring this background and rejecting 0.0.0.0/0 will only lead to the same problem recurring in a different form.
The 0.0.0.0/0 setting should be avoided as a general rule. There is no reason to assume this setting as a permanent feature in security group design.
Nevertheless, the appearance of 0.0.0.0/0 in the field is often due to temporary situations where design decisions cannot keep up. During verification and troubleshooting, it is necessary to isolate communication conditions, and there are cases where the range is temporarily expanded for verification.
However, such measures should only be used in exceptional cases and should not be employed without a clear plan for reverting to the original state. If 0.0.0.0/0 persists in a production environment, it should be considered a situation where the design has not been properly organized before deployment, regardless of the appropriateness of the configuration.
The real issue is not whether the setting is acceptable, but whether you can explain why it was put in place. The longer the ambiguity is left unaddressed, the more difficult it becomes to impose restrictions later.
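One way to keep temporary openings from lingering is to list them periodically. The sketch below assumes rule data shaped like the `SecurityGroups` entries returned by boto3's `describe_security_groups` (`GroupId`, `IpPermissions`, `IpRanges`, `Ipv6Ranges`); the function name and the sample group are hypothetical, and no AWS call is made.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def find_open_ingress(groups):
    """Return one finding per inbound rule whose source is the whole internet."""
    findings = []
    for group in groups:
        for perm in group.get("IpPermissions", []):
            sources = [r.get("CidrIp") for r in perm.get("IpRanges", [])]
            sources += [r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])]
            for cidr in sources:
                if cidr in OPEN_CIDRS:
                    findings.append(
                        (group["GroupId"], perm.get("IpProtocol"),
                         perm.get("FromPort"), perm.get("ToPort"), cidr)
                    )
    return findings


# Hypothetical sample data in the assumed response shape.
sample_groups = [
    {
        "GroupId": "sg-0example1",
        "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
             "IpRanges": [{"CidrIp": "0.0.0.0/0"}], "Ipv6Ranges": []},
            {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
             "IpRanges": [{"CidrIp": "203.0.113.0/24"}], "Ipv6Ranges": []},
        ],
    },
]

for finding in find_open_ingress(sample_groups):
    print(finding)
```

A listing like this does not say whether a rule is wrong. It only surfaces the rules for which someone should be able to give a reason and a plan for reverting.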
Many articles explain the differences between security groups and network ACLs, but focusing only on the differences in terminology and specifications often leaves the question of "which one should I use and how?" unclear. Here, instead of comparing their functions, we will organize the information by focusing on the responsibilities that each should fulfill.
Security groups: "Who is allowed to communicate with whom?"
The responsibility of a security group is clear: to define which parties a particular resource is allowed to communicate with.
Where does the web server accept access from?
Which servers is the database allowed to accept connections from?
In this way, security groups make decisions based on who the "parties involved in the communication" are.
Therefore, security groups are used by linking them on a resource-by-resource basis. From a design perspective, it's easiest to understand them as components that represent the relationships between resources with specific roles.
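Expressing rules as relationships between roles, rather than as lists of addresses, can be sketched as follows. The role names (`alb`, `web`, `db`) and ports are hypothetical examples; in an actual security group the equivalent would be a rule whose source is another security group.

```python
# Each entry reads: "the destination role accepts (source role, port)".
ALLOWED = {
    "web": {("alb", 80)},    # web servers accept HTTP from the load balancer
    "db": {("web", 3306)},   # the database accepts MySQL from web servers only
}


def can_connect(source_role, dest_role, port):
    """True if dest_role has an explicit relationship with source_role on port."""
    return (source_role, port) in ALLOWED.get(dest_role, set())


print(can_connect("web", "db", 3306))  # True
print(can_connect("alb", "db", 3306))  # False: no relationship was defined
```

Because the table names roles instead of IP addresses, it stays valid when servers are replaced or scaled, and it can be read aloud as an explanation of the design.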
Network ACLs: "Is it alright to use this route?"
On the other hand, the responsibility of a network ACL is to control the communication path itself. It doesn't decide who is communicating, but rather "whether or not to allow communication to pass through this subnet."
Network ACLs are applied on a subnet basis and affect all traffic passing through them.
Because of this, network ACLs are not suited to expressing detailed relationships between specific servers.
Instead, it handles path-level control, such as "blocking all unexpected communication paths" and "restricting communication ranges in large units."
Neither security groups nor network ACLs are designed to be complete on their own, because what they protect and the granularity of their decisions are fundamentally different.
Security groups define "relationships," while network ACLs define "pass-through conditions." Based on this division of roles, it becomes easier to make decisions such as "don't over-control with security groups" and "don't cram too many individual rules into network ACLs."
The important thing is to decide where and what to judge. If you can make that distinction, the two will not compete with each other and will coexist naturally in the design.
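The division of roles can be made concrete with a small model in which traffic must satisfy both layers. It relies on one documented behavior not spelled out above: network ACL entries are evaluated in ascending rule-number order and the first match decides. Everything else (function names, addresses, rule contents) is hypothetical, and return traffic, which network ACLs do not track, is not modeled.

```python
from ipaddress import ip_address, ip_network


def nacl_allows(entries, source_ip, port):
    """Subnet-level check: lowest rule number first, first match decides."""
    for _number, cidr, low, high, action in sorted(entries, key=lambda e: e[0]):
        if ip_address(source_ip) in ip_network(cidr) and low <= port <= high:
            return action == "allow"
    return False  # nothing matched: implicit deny


def sg_allows(allowed_sources, source_ip, port):
    """Resource-level check: any matching allow entry is enough."""
    return any(
        ip_address(source_ip) in ip_network(cidr) and port == allowed_port
        for cidr, allowed_port in allowed_sources
    )


# Hypothetical subnet NACL: block one range outright, allow HTTPS otherwise.
subnet_nacl = [
    (200, "0.0.0.0/0", 443, 443, "allow"),
    (100, "198.51.100.0/24", 0, 65535, "deny"),
]
# Hypothetical security group on one server: HTTPS from a partner range only.
server_sg = [("203.0.113.0/24", 443)]


def reaches_server(source_ip, port):
    """Traffic must satisfy both the path condition and the relationship."""
    return nacl_allows(subnet_nacl, source_ip, port) and sg_allows(server_sg, source_ip, port)


print(reaches_server("203.0.113.5", 443))   # True
print(reaches_server("198.51.100.9", 443))  # False: stopped at the subnet
print(reaches_server("192.0.2.1", 443))     # False: passes the subnet, no relationship
```

The network ACL carries a coarse, path-level exclusion, while the security group carries the specific relationship. Neither rule set needs to duplicate the other.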
The reason security group designs often fail isn't a lack of knowledge about configuration options. In many cases, it's because operations begin without a clear understanding of the underlying concepts. Here, we'll address some common misconceptions repeatedly seen in the field and clarify where the misjudgments stem from.
While dividing security groups into smaller ones isn't inherently wrong, designing them with the idea that "more groups mean more security" can actually obscure the overall picture.
When the number of groups becomes too large, it becomes easy to run into situations where "it's difficult to keep track of what applies to which resource" and "similar rules are duplicated, making it impossible to manage the differences."
What we should really be considering is not the number of groups, but whether the communication relationships are organized according to their respective roles. Simply increasing the number of groups without clearly defined roles will not improve security.
Security group rules tend to increase over time. This is because decisions to "just let it through for now" accumulate each time a problem is resolved or a new feature is added.
The problem is that the criteria for adding rules are not clearly defined. Without clarifying why a particular communication is necessary or what scope should be permitted, exceptions simply keep piling up.
As a result, nobody can tell what a rule is for or whether it is safe to delete, and the design is effectively frozen in place.
Security groups are not a one-time configuration. Communication requirements inevitably change with configuration changes and service additions.
Nevertheless, if the decision to "not touch it because it's working now" continues, unnecessary rules will remain in place, making it difficult to assess the risks.
What's crucial during the design phase is managing things with the assumption that they will change. If you don't incorporate operational practices such as regular reviews and not leaving rules with unclear justifications unaddressed, the design will remain incomplete.
As we've seen, the reason for confusion in security groups isn't a lack of understanding of the mechanisms. The problem lies in the fact that the system is designed and operated with ambiguous premises for decision-making. Finally, let's summarize some ways of thinking that will help reduce inconsistency in decision-making in practice.
When designing security groups, there are some basic prerequisites that must be understood before considering detailed rules.
The first is the source of the communication.
Where is the expected source of the communication? Is it a specific IP address, another AWS resource, or is it necessary to temporarily broaden the scope? If this judgment remains ambiguous, the rules will inevitably become excessive.
The second is the destination of the communication.
To which resource is that communication headed? Is it a web server, or a bastion host used for management? If the role of the destination isn't clearly defined, separating security groups loses much of its purpose.
The third is the required period of time.
Is the communication a permanent necessity, or is it a temporary solution? Failure to distinguish between these two can lead to exception rules remaining in place.
If these three points are clear, the decision of "should I proceed?" and "how far should I proceed?" will be largely determined before even looking at the settings screen.
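The three prerequisites can be written down as a small record before any rule is created. The `RuleDecision` type, its field names, and the sample entries below are hypothetical; this is one possible way to capture the decision, not a prescribed format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RuleDecision:
    """The decision behind one rule: source, destination, and required period."""
    source: str
    destination: str
    port: int
    reason: str
    expires: Optional[date] = None  # None means a permanent requirement


def expired(decisions, today):
    """Return temporary decisions whose end date has already passed."""
    return [d for d in decisions if d.expires is not None and d.expires < today]


# Hypothetical decisions; names and dates are examples only.
decisions = [
    RuleDecision("web servers", "database", 5432, "application queries"),
    RuleDecision("vendor office 203.0.113.0/24", "bastion host", 22,
                 "migration support", expires=date(2024, 3, 31)),
]

for d in expired(decisions, today=date(2024, 6, 1)):
    print(f"Review: {d.source} -> {d.destination}:{d.port} ({d.reason})")
```

If any field cannot be filled in, that is a sign the decision has not been made yet, and opening the settings screen would only postpone it.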
Security groups are not a one-time configuration. Communication requirements inevitably change as the configuration changes and the system grows.
The key is to manage things with the assumption that they will change. This means documenting the reasons for adding rules, reviewing them regularly, and deleting unnecessary communications. If you design without considering these operational aspects, you'll end up with a buildup of rules that are difficult to interpret.
It is also important to maintain a state where you can explain to a third party "why this communication is permitted." This ultimately reduces the burden of audits and reviews and helps maintain the soundness of the design.
Security groups are both the front line of defense and a place where a history of decisions is accumulated. Whether or not you treat them with this understanding will make a difference in long-term operation.
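Keeping rules explainable can be partly checked mechanically by listing rules that carry no recorded reason. The sketch assumes the same data shape as boto3's `describe_security_groups` response, in which `IpRanges` and `UserIdGroupPairs` entries may carry a `Description`; the function name and sample group IDs are hypothetical.

```python
def undocumented_rules(groups):
    """Return (group_id, source) pairs for inbound rules with no description."""
    missing = []
    for group in groups:
        for perm in group.get("IpPermissions", []):
            entries = [(r.get("CidrIp"), r) for r in perm.get("IpRanges", [])]
            entries += [(r.get("GroupId"), r) for r in perm.get("UserIdGroupPairs", [])]
            for source, entry in entries:
                # Treat a missing, empty, or whitespace-only description as undocumented.
                if not (entry.get("Description") or "").strip():
                    missing.append((group["GroupId"], source))
    return missing


# Hypothetical sample data in the assumed response shape.
db_groups = [
    {
        "GroupId": "sg-0exampledb",
        "IpPermissions": [
            {
                "IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
                "IpRanges": [
                    {"CidrIp": "10.0.1.0/24", "Description": "App subnet to PostgreSQL"},
                    {"CidrIp": "192.0.2.15/32"},
                ],
                "UserIdGroupPairs": [{"GroupId": "sg-0exampleweb", "Description": ""}],
            },
        ],
    },
]

for group_id, source in undocumented_rules(db_groups):
    print(f"{group_id}: no recorded reason for {source}")
```

A description field is not proof that a rule is justified, but a rule without one is a natural starting point for a regular review.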
AWS security groups are not a collection of configuration items. They are a design mechanism for determining "who is allowed to communicate with whom."
Even if you understand the mechanisms and terminology, if the premises for your decisions aren't clearly defined, risky settings like 0.0.0.0/0 will naturally arise. The problem isn't the setting itself, but what you're deciding before you start setting it.
The responsibility of a security group is to define the relationships between resources. This differs from the role of a network ACL, which controls the communication path as a whole; the two complement each other rather than substitute for one another. Separating these responsibilities simplifies the design.
To avoid confusion in practice, it's essential to first clarify the prerequisites (the source of the communication, the destination, and the required duration) and to treat security groups as something that is operated and revised over time. If these decisions are clearly articulated, the settings will naturally follow.
Security groups are not something to "configure correctly" once, but something to keep in a state where the intent can be explained. That perspective is what makes the difference in both design and operation.