The 20 worst AWS annoyances and misdesigns

Here are twenty significant AWS "gotchas" that can cause unexpected problems in production:

  1. IAM Permissions Propagation Delay - IAM is eventually consistent, so a newly attached or edited policy may not take effect immediately. Requests made right after a change can fail with "access denied" even though the policy is already correct.

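  2. S3 Strong Consistency Misconception - S3 has provided strong read-after-write consistency since December 2020. Before that, overwrites and deletes were only eventually consistent, so legacy code may still carry workarounds (extra retries, artificial delays, or a separate consistency layer) that now only add latency and complexity.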

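  3. CloudFormation Drift Detection Limitations - Drift detection supports only some resource types and only checks properties you explicitly set in the template, so a clean result can give false confidence about the real state of your infrastructure.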

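  4. Lambda Cold Starts - The first invocation on a new execution environment (after a deployment, after an idle period, or when scaling out) must initialize the runtime and your code. This can be significantly slower, especially for JVM-based runtimes, and can push requests past their timeouts.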

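  5. API Gateway 29-second Timeout - REST API integrations long had a hard 29-second timeout (HTTP APIs allow at most 30 seconds), forcing asynchronous workarounds for longer-running operations. AWS now allows requesting a higher limit for Regional and private REST APIs, but doing so may require lowering the account's throttle quota.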

  6. CloudWatch Logs Delayed Delivery - Log events do not show up instantly; they can lag noticeably behind the code that emitted them, which makes tight, real-time debug loops frustrating.

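  7. RDS Connection Limits - An RDS instance's default max_connections is computed from its memory by a formula in the DB parameter group, so the limit depends on instance size, but the console does not show the resulting number prominently.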

  8. ECS Task Definition Immutability - Task definitions cannot be modified after registration. Any change, even a new image tag, means registering a new revision and then updating the service to use it.

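  9. Default VPC Deletion Consequences - If you delete your default VPC, launches that implicitly rely on it (for example, EC2 instances started without a subnet) fail unless you specify a VPC and subnet explicitly, and the resulting errors are not always obvious. You can recreate a default VPC with the AWS CLI command "aws ec2 create-default-vpc".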

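  10. CloudFront Cache Invalidation Quotas - Only the first 1,000 invalidation paths each month are free, and every path after that is billed (a wildcard such as /images/* counts as one path). This can encourage poor cache practices such as very short TTLs instead of versioned file names.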

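  11. CDK Resource Naming - CDK derives each resource's CloudFormation logical ID from its construct path plus a hash, and generated physical names are similarly cryptic. Renaming or moving a construct during refactoring changes the logical ID, so CloudFormation replaces the resource on the next deployment.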

  12. DynamoDB Hot Keys - Partition key design can lead to throttling if traffic concentrates on specific partitions, despite having sufficient overall capacity.

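  13. S3 Object Ownership Model - On buckets that use the legacy "Object writer" ownership setting, objects uploaded by other accounts are owned by the uploader, so the bucket owner may be unable to access them. Buckets created since April 2023 default to "Bucket owner enforced", which disables ACLs, but older buckets keep their original setting.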

  14. Lambda Environment Variable Size Limits - Limited to 4KB total for all environment variables, which is easy to exceed with configuration settings.

  15. ALB Rule Evaluation Order - Application Load Balancers evaluate listener rules one at a time in priority order, so with many complex rules, ordering mistakes are easy to make and the per-listener rule quota can be reached.

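  16. Step Functions Express vs. Standard - The two workflow types differ in guarantees, pricing, and limits: Standard workflows run exactly-once for up to a year, while Express workflows run at-least-once (asynchronous) or at-most-once (synchronous) for at most five minutes. The type is fixed at creation, and the interface does not make these trade-offs obvious.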

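  17. Route 53 Health Check Limitations - Route 53 health checkers run on the public internet, so they cannot reach resources inside a private VPC directly; the usual workaround is a health check based on a CloudWatch alarm, which requires additional infrastructure.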

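  18. SQS FIFO Queue Lambda Limitations - With a FIFO queue as its event source, Lambda processes at most one batch at a time per message group, so concurrency is capped by the number of active message group IDs. A queue that uses a single group ID is effectively processed serially, causing unexpected bottlenecks.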

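  19. RDS Snapshot Restoration Parameter Reset - A DB instance restored from a snapshot is associated with the default DB parameter group unless you choose a different one at restore time, so custom parameter settings are silently dropped.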

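  20. SES Sandbox Limitations - New accounts start in the SES sandbox in each AWS Region, where you can send only to verified addresses and domains, under a low daily sending quota, until you request production access. This is easy to overlook in the console.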
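For item 1, a common mitigation is to retry calls that fail with an access-denied error for a short window after changing a policy. The sketch below is a generic Python helper, not an AWS SDK feature; retry_while_denied, the AccessDenied class, and the delay values are hypothetical placeholders. With boto3 you would instead catch botocore's ClientError and inspect its error code, which varies by service.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class AccessDenied(Exception):
    """Hypothetical stand-in for an SDK access-denied error."""


def retry_while_denied(
    call: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying with exponential backoff while it raises AccessDenied.

    Meant only for the window right after a policy change: a denial that
    persists through the last attempt is re-raised as a real permission problem.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts - 1):
        try:
            return call()
        except AccessDenied:
            # Wait base_delay, then 2x, 4x, ... before trying again.
            sleep(base_delay * (2 ** attempt))
    # Final attempt: let a persistent AccessDenied propagate to the caller.
    return call()
```

Injecting sleep keeps the helper testable; in real code you would also cap the total wait so a genuinely missing permission fails quickly.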
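For item 7, the default connection limit is a formula over instance memory rather than a fixed number. As we understand the RDS defaults, MySQL uses {DBInstanceClassMemory/12582880} and PostgreSQL uses LEAST({DBInstanceClassMemory/9531392},5000); treat those constants as assumptions to verify against your engine version's default parameter group. DBInstanceClassMemory is somewhat smaller than the instance's advertised RAM, so feeding nominal memory into this sketch gives an upper bound, not the exact value.

```python
GIB = 1024 ** 3


def mysql_default_max_connections(instance_memory_bytes: int) -> int:
    """Approximates RDS for MySQL's assumed default: {DBInstanceClassMemory/12582880}."""
    return instance_memory_bytes // 12_582_880


def postgres_default_max_connections(instance_memory_bytes: int) -> int:
    """Approximates RDS for PostgreSQL's assumed default: LEAST({DBInstanceClassMemory/9531392},5000)."""
    return min(instance_memory_bytes // 9_531_392, 5000)
```

For a nominal 8 GiB instance this yields 682 for MySQL and 901 for PostgreSQL; the live value (for example from SHOW max_connections on PostgreSQL) will be somewhat lower.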
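For item 8, scripts often copy the output of describe_task_definition, change one field, and pass the result to register_task_definition. That fails unless the read-only fields in the describe response are stripped first. The helper below works on plain dicts so it runs without AWS; next_revision_input is a hypothetical name, and the READ_ONLY_KEYS set reflects our understanding of the response shape, so check it against the current ECS API reference.

```python
import copy
from typing import Any

# Fields DescribeTaskDefinition returns that RegisterTaskDefinition does not
# accept (assumed list; verify against the ECS API reference).
READ_ONLY_KEYS = {
    "taskDefinitionArn",
    "revision",
    "status",
    "requiresAttributes",
    "compatibilities",
    "registeredAt",
    "registeredBy",
    "deregisteredAt",
}


def next_revision_input(described: dict[str, Any], container: str, image: str) -> dict[str, Any]:
    """Build register_task_definition kwargs from a describe result, changing one container image."""
    new_def = {k: v for k, v in copy.deepcopy(described).items() if k not in READ_ONLY_KEYS}
    for c in new_def["containerDefinitions"]:
        if c["name"] == container:
            c["image"] = image
            break
    else:
        raise KeyError(f"no container named {container!r}")
    return new_def


# With boto3 (not executed here):
# ecs = boto3.client("ecs")
# described = ecs.describe_task_definition(taskDefinition="web")["taskDefinition"]
# ecs.register_task_definition(**next_revision_input(described, "app", "repo/app:v2"))
```

The deep copy leaves the caller's dict untouched, and after registering, the service still has to be updated to point at the new revision.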
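Item 10 is easy to put in numbers. Assuming AWS's published pricing at the time of writing (the first 1,000 invalidation paths per month free, then $0.005 per path; confirm on the current CloudFront pricing page), this sketch estimates a month's charge. Because a wildcard path such as /images/* counts as a single path, batching invalidations under wildcards is usually far cheaper than listing files one by one.

```python
from decimal import Decimal

FREE_PATHS_PER_MONTH = 1000        # assumption: current free monthly allowance
PRICE_PER_PATH = Decimal("0.005")  # assumption: USD per path beyond the allowance


def invalidation_cost(paths_this_month: int) -> Decimal:
    """Estimated USD charge for a month's invalidation paths (each wildcard path counts once)."""
    billable = max(0, paths_this_month - FREE_PATHS_PER_MONTH)
    return billable * PRICE_PER_PATH
```

Decimal avoids floating-point rounding in the money arithmetic; 21,000 paths in a month would be estimated at $100.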
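One common mitigation for item 12 is write sharding: spread a hot logical key across several partition key values by appending a suffix, and have readers query every suffix. The sketch below only builds the key strings; the shard count of 10, the "#" separator, and the function names are hypothetical choices, and the actual PutItem and Query calls are left out.

```python
import zlib

SHARDS = 10  # assumption: size this to the write rate of the hottest logical key


def sharded_pk(logical_key: str, item_id: str, shards: int = SHARDS) -> str:
    """Partition key for a write; the suffix is derived from item_id so retries hit the same shard."""
    suffix = zlib.crc32(item_id.encode("utf-8")) % shards
    return f"{logical_key}#{suffix}"


def all_shard_pks(logical_key: str, shards: int = SHARDS) -> list[str]:
    """Every partition key a reader must query to see all items for logical_key."""
    return [f"{logical_key}#{i}" for i in range(shards)]
```

The trade-off is that one logical read becomes one Query per shard key, with results merged client-side.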
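For item 14, a pre-deploy check can catch an oversized environment before the deployment fails. This sketch approximates the total as the UTF-8 byte length of every key and value and assumes 4 KB means 4,096 bytes; AWS's exact accounting may differ slightly, so treat it as an early warning rather than a precise validator. The function names are hypothetical.

```python
LAMBDA_ENV_LIMIT_BYTES = 4 * 1024  # assumption: "4 KB" total across all variables


def env_size_bytes(env: dict[str, str]) -> int:
    """Approximate size: UTF-8 bytes of all keys plus all values."""
    return sum(len(k.encode("utf-8")) + len(v.encode("utf-8")) for k, v in env.items())


def check_env(env: dict[str, str]) -> None:
    """Raise ValueError if the approximate total exceeds the limit."""
    size = env_size_bytes(env)
    if size > LAMBDA_ENV_LIMIT_BYTES:
        raise ValueError(f"environment variables total ~{size} bytes, over the 4 KB limit")
```

Configuration that does not fit is commonly moved to SSM Parameter Store or Secrets Manager and fetched when the function initializes.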
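For item 18, Lambda's parallelism on a FIFO queue is bounded by message groups: messages sharing a message group ID are delivered in order, one batch at a time. The toy model below makes that bound concrete. max_parallel_batches is a hypothetical name, and the model ignores batch sizes, batching windows, and account-level concurrency limits.

```python
from typing import Iterable, Optional


def max_parallel_batches(group_ids: Iterable[str], function_concurrency: Optional[int] = None) -> int:
    """Upper bound on concurrent Lambda batches for the messages currently in a FIFO queue.

    Each active message group has at most one batch in flight, so the bound is the
    number of distinct group IDs, further capped by any function concurrency limit.
    """
    active_groups = len(set(group_ids))
    if function_concurrency is None:
        return active_groups
    return min(active_groups, function_concurrency)
```

A single constant MessageGroupId therefore serializes the whole queue, while grouping by something like a customer ID keeps per-customer ordering and still allows parallel processing.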

These gotchas are typically buried in quota pages, footnotes, or separate guides rather than the main AWS documentation, and they often only become apparent after they surface in production.