The 20 worst AWS annoyances and misdesigns
Here are twenty significant AWS "gotchas" that can cause unexpected problems in production:
- IAM Permissions Propagation Delay - IAM is eventually consistent: policy changes usually take effect within seconds, but they are not guaranteed to be visible everywhere immediately, so you can still hit "access denied" errors right after updating a policy.
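- S3 Strong Consistency Misconception - S3 has offered strong read-after-write consistency since December 2020; before that it was eventually consistent. Legacy code may still carry workarounds, such as polling loops or artificial delays, that now only add latency and complexity.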
- CloudFormation Drift Detection Limitations - Drift detection covers only a subset of resource types, and only properties explicitly set in the template, so a clean drift report can give false confidence about infrastructure state.
- Lambda Cold Starts - The first invocation in a new execution environment can be significantly slower, especially for JVM-based runtimes, and can push latency-sensitive callers past their timeouts.
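- API Gateway 29-second Timeout - API Gateway integrations time out after 29 seconds by default. For years this was a hard limit; since 2024 it can be raised for Regional and private REST APIs through a quota increase, but other API types still force workarounds (such as asynchronous processing) for longer-running operations.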
- CloudWatch Logs Delayed Delivery - Log events are not guaranteed to appear immediately; delivery usually lags by seconds and can occasionally stretch to minutes, which makes real-time debugging unreliable.
- RDS Connection Limits - The default max_connections value is derived from the instance class's memory, so smaller instances allow far fewer connections, but the AWS console doesn't make this clear.
- ECS Task Definition Immutability - A task definition revision cannot be modified after it is registered; even a simple change means registering a new revision and updating the service to use it.
- Default VPC Deletion Consequences - If you accidentally delete your default VPC, many services that rely on it will fail to launch, often without clear error messages.
- CloudFront Cache Invalidation Quotas - Only the first 1,000 invalidation paths per month are free, with per-path charges afterward, encouraging poor cache practices.
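- CDK Resource Naming - CDK derives CloudFormation logical IDs from each construct's path in the tree, so the generated names look cryptic and change when constructs are renamed or moved during refactoring, causing unexpected resource replacements during deployment.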
- DynamoDB Hot Keys - Throughput limits apply per partition, so a partition key design that concentrates traffic on a few keys can cause throttling even when the table has sufficient overall capacity.
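- S3 Object Ownership Model - With ACLs enabled (the default for buckets created before April 2023), objects uploaded by other accounts are owned by the uploader and can be inaccessible to the bucket owner. Newer buckets default to the "Bucket owner enforced" setting, which disables ACLs.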
- Lambda Environment Variable Size Limits - All environment variables for a function are limited to 4 KB in total, which is easy to exceed with configuration settings.
- ALB Slow Rule Evaluation - Application Load Balancers evaluate listener rules sequentially in priority order and stop at the first match, which can cause performance and maintainability issues when there are many complex rules.
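- Step Functions Express vs. Standard - Two workflow types with different guarantees (exactly-once execution for Standard, at-least-once for asynchronous Express), pricing, and duration limits (up to one year versus five minutes) that aren't clear from the interface.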
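- Route 53 Health Check Limitations - Route 53 health checkers run outside your VPC, so they cannot directly check internal (private) resources without additional infrastructure, such as a health check based on a CloudWatch alarm.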
- SQS FIFO Queue Lambda Limitations - When Lambda processes an SQS FIFO queue, concurrency is capped by the number of active message groups, so a queue that uses only a few message group IDs will not scale out and can become an unexpected bottleneck.
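- RDS Snapshot Restoration Parameter Reset - When restoring from a snapshot, the new instance is associated with the default parameter group unless you explicitly choose a custom one, so custom parameter settings are easy to lose.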
- SES Sandbox Limitations - New accounts start in the "SES sandbox," which restricts sending to verified addresses and domains and imposes low sending quotas, without clear warnings in the console.
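For the IAM propagation item at the top of the list, one common mitigation is to retry a freshly authorized call with exponential backoff rather than treating the first "access denied" as final. The following is a minimal sketch only: `AccessDenied` is a hypothetical stand-in for whatever error your SDK raises, and the attempt count and delays are illustrative assumptions, not AWS recommendations.

```python
import time


class AccessDenied(Exception):
    """Hypothetical stand-in for an SDK's 'access denied' error."""


def retry_on_access_denied(call, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Run call() and retry with exponential backoff while it raises AccessDenied.

    The final failure is re-raised so a genuine permissions problem still surfaces.
    """
    for attempt in range(attempts):
        try:
            return call()
        except AccessDenied:
            if attempt == attempts - 1:
                raise  # out of retries: this is probably a real policy error
            sleep(base_delay * 2 ** attempt)  # waits 0.5s, 1s, 2s, ...
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays. Keep the retry budget small, because a policy that is actually wrong should fail quickly.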
These gotchas typically aren't well-documented in the main AWS documentation and often only become apparent after encountering them in production scenarios.
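As an illustration of working around one of these, the DynamoDB hot-key problem is often addressed with write sharding: appending a computed suffix to a busy partition key so writes spread across several partitions. This is a sketch under assumptions; the function names, the `#` separator, and the shard count of 10 are all hypothetical choices made for the example, not part of any AWS API.

```python
import hashlib


def sharded_key(partition_key, item_id, shards=10):
    """Return partition_key with a stable shard suffix derived from item_id.

    Hashing item_id (rather than picking a random suffix) means the same
    item always maps to the same shard, so it can be read back directly.
    """
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shards
    return f"{partition_key}#{shard}"


def all_shard_keys(partition_key, shards=10):
    """List every sharded key, for queries that must fan out across all shards."""
    return [f"{partition_key}#{i}" for i in range(shards)]
```

The trade-off is on the read side: fetching everything for the original key now means querying each entry from `all_shard_keys` and merging the results.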