How To Reduce Cloud Costs Without Sacrificing Performance
Cloud cost overruns are one of the most common operational surprises for growing SaaS companies. A $300/month AWS bill at launch can become $3,000/month at moderate scale without any deliberate cost control. The good news: most cloud cost problems have the same root causes, and fixing them consistently reduces bills by 30–60% without touching performance.
The Most Common Sources of Unexpected AWS Bills
Before optimising, identify where the money is actually going. These five categories account for the majority of unexpected AWS costs:
- Over-provisioned compute: Instances running at 5–15% average CPU utilisation that were sized for peak load and never right-sized
- Unused resources: Unattached EBS volumes, old snapshots, unused Elastic IPs, and forgotten load balancers running at full cost
- Data transfer (egress) costs: AWS charges for data leaving a region or Availability Zone: to the internet, between regions, and between Availability Zones. High-traffic applications pay significant egress fees
- On-demand pricing for predictable workloads: Running production databases and application servers on on-demand pricing instead of reserved instances costs 30–40% more
- Over-retained logs and backups: CloudWatch Logs never expire by default, and S3 backups and manual RDS snapshots accumulate until someone deletes them
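As a rough illustration of the "unused resources" category, the sketch below filters volume records for unattached EBS volumes and estimates what they cost per month. It assumes records shaped like the `Volumes` list returned by boto3's `describe_volumes`, where an unattached volume reports a `State` of `available`. The function names and the 0.08 USD per GB-month price are illustrative assumptions, so check current pricing for your region and volume type.

```python
from typing import Iterable

# Assumed list price in USD per GB-month; verify for your region and volume type.
ASSUMED_PRICE_PER_GB_MONTH = 0.08


def find_unattached_volumes(volumes: Iterable[dict]) -> list[dict]:
    """Return volumes that are not attached to any instance."""
    return [
        v for v in volumes
        if v.get("State") == "available" and not v.get("Attachments")
    ]


def estimated_monthly_cost(volumes: Iterable[dict],
                           price_per_gb_month: float = ASSUMED_PRICE_PER_GB_MONTH) -> float:
    """Estimate storage cost from the Size (GiB) field of each volume."""
    return round(sum(v["Size"] for v in volumes) * price_per_gb_month, 2)


if __name__ == "__main__":
    # Hand-written sample records; a real run would page through describe_volumes.
    sample = [
        {"VolumeId": "vol-aaa", "State": "in-use", "Size": 100,
         "Attachments": [{"InstanceId": "i-123"}]},
        {"VolumeId": "vol-bbb", "State": "available", "Size": 500, "Attachments": []},
    ]
    unused = find_unattached_volumes(sample)
    print([v["VolumeId"] for v in unused], estimated_monthly_cost(unused))
```

The same filter-then-price pattern applies to old snapshots and idle load balancers; only the "is it unused" test changes.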
Right-Sizing: The Highest-ROI Optimisation
Right-sizing means running each workload on the smallest instance type that meets its performance requirements. Most production systems are significantly over-provisioned:
- Use AWS Compute Optimizer — it analyses CloudWatch metrics and recommends right-sized instance types. Free to use.
- Target CPU utilisation of 40–70% under normal load — this leaves headroom for spikes without the waste of 5% utilisation
- Web application servers: t3.medium or t3.large cover most startup SaaS products under 50,000 DAU
- Background workers (Celery, queue processors): t3.small or t3.medium — these processes are typically CPU-light
- RDS databases: db.t3.medium covers most early-scale products. Switch to db.m5 series only when query latency requires it
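- Use Graviton instances (ARM-based): AWS reports up to 40% better price-performance for Graviton-based instances than for comparable x86 instance types. Benchmark your own workload first, since gains vary and every dependency must support arm64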
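The 40–70% utilisation target above can be turned into a simple triage rule. The following is a minimal sketch, assuming you have already exported average CPU samples (in percent) per instance, for example from CloudWatch. The function name and labels are hypothetical, and a real right-sizing decision should also consider memory, network, and peak load, which an average hides.

```python
from statistics import mean

# Target band taken from the guidance above: 40-70% CPU under normal load.
TARGET_LOW, TARGET_HIGH = 40.0, 70.0


def classify_utilisation(cpu_samples: list[float]) -> str:
    """Classify an instance from its average CPU utilisation (percent)."""
    if not cpu_samples:
        return "no-data"
    avg = mean(cpu_samples)
    if avg < TARGET_LOW:
        return "over-provisioned"
    if avg > TARGET_HIGH:
        return "under-provisioned"
    return "right-sized"


if __name__ == "__main__":
    # Hypothetical fleet: instance name -> CPU samples in percent.
    fleet = {"web-1": [8, 12, 10], "worker-1": [55, 62, 48], "db-1": [85, 91, 78]}
    for name, samples in fleet.items():
        print(name, classify_utilisation(samples))
```

Instances flagged as over-provisioned are candidates for the next size down, not automatic downgrades.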
Reserved Instances and Savings Plans
On-demand pricing is intended for unpredictable workloads. Production infrastructure running 24/7 should use reserved capacity:
- EC2 Reserved Instances (1-year, no upfront): 30–35% discount over on-demand. Commit to an instance family and region.
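- Savings Plans (1-year): 20–25% discount for a Compute Savings Plan, in exchange for an hourly spend commitment. More flexible than Reserved Instances: a Compute Savings Plan applies across instance families, sizes, and regions, and also covers Fargate and Lambda. An EC2 Instance Savings Plan is tied to one instance family in one region and offers a larger discount in return.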
- RDS Reserved Instances (1-year, no upfront): 30–40% discount. Apply to your production database immediately.
- When to commit: After 3 months of stable production traffic. Committing too early, before the right instance size is confirmed, locks you into the wrong size.
- Spot Instances: 60–90% discount for interruptible workloads. Use for background processing, batch jobs, and CI/CD runners — not for web servers.
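The "when to commit" question comes down to simple arithmetic: a commitment bills every hour of the term at the discounted rate, while on-demand bills only the hours you actually run. A minimal sketch of that break-even calculation follows. The function names are hypothetical, and the model assumes a no-upfront commitment and a flat hourly rate.

```python
HOURS_PER_YEAR = 8760


def break_even_utilisation(discount: float) -> float:
    """Fraction of the year an instance must run for a commitment to pay off.

    A commitment bills every hour at (1 - discount) of the on-demand rate,
    while on-demand bills only the hours actually used.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


def annual_saving(on_demand_hourly: float, discount: float,
                  hours_used: float = HOURS_PER_YEAR) -> float:
    """On-demand cost for hours_used minus the full-year committed cost."""
    on_demand_cost = on_demand_hourly * hours_used
    committed_cost = on_demand_hourly * (1 - discount) * HOURS_PER_YEAR
    return round(on_demand_cost - committed_cost, 2)


if __name__ == "__main__":
    # Illustrative numbers only: 0.10 USD/hour on-demand, 30% discount.
    print(break_even_utilisation(0.30))
    print(annual_saving(0.10, 0.30))                   # running 24/7
    print(annual_saving(0.10, 0.30, hours_used=4380))  # running half the year
```

With a 30% discount, the commitment only pays off if the instance runs more than about 70% of the year, which is why it suits 24/7 production workloads and not intermittent ones.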
Caching to Reduce Database and Compute Costs
Caching reduces the number of expensive database queries and compute cycles, directly lowering costs:
- ElastiCache (Redis): A single cache.t3.micro instance ($15–$30/month) can eliminate 50–80% of repetitive database reads, reducing RDS instance requirements
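- CloudFront CDN: Caching static assets and API responses at the edge reduces origin server load. Data transfer from AWS origins to CloudFront is not charged, and CloudFront's per-GB rates to the internet are in most regions at or below EC2 egress rates, so every cache hit saves both compute and transfer cost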
- S3 + CloudFront for static assets: All images, CSS, JS should be served from S3 + CloudFront, not from application servers
- Application-level caching (in-memory): Cache expensive computed results in the application process for the duration of a request or a short TTL
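For the application-level caching item, a short-TTL in-process cache can be sketched in a few lines. This is a minimal illustration and not a production cache: the class name `TTLCache` and its API are hypothetical, it is not thread-safe, and it never evicts entries beyond overwriting expired ones. The injectable `clock` exists only to make the expiry behaviour easy to test.

```python
import time
from typing import Any, Callable, Hashable


class TTLCache:
    """Minimal in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expiry timestamp, cached value)
        self._store: dict[Hashable, tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling compute() on a miss or expiry."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._store[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=30)

    def expensive_report() -> dict:
        # Stand-in for a slow database query or computation.
        return {"active_users": 1234}

    print(cache.get_or_compute("report", expensive_report))  # computed
    print(cache.get_or_compute("report", expensive_report))  # served from cache
```

A short TTL bounds how stale a value can be, which makes this pattern suitable for data that tolerates a few seconds of lag.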
Storage and Data Transfer Optimisation
Storage and transfer costs are frequently overlooked:
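- S3 Intelligent-Tiering: Automatically moves objects between its own access tiers (Frequent Access, Infrequent Access, Archive Instant Access) when they have not been accessed recently. There are no retrieval fees, but a small per-object monitoring fee applies, and objects under 128 KB are not auto-tiered.
- CloudWatch Logs retention: Set a retention policy (30–90 days) on all log groups. Log groups never expire by default, which makes indefinite retention a common source of growing CloudWatch Logs storage costs.
- RDS snapshots: Automated backups are kept for a configurable retention period (7 days by default when the instance is created in the console). Manual snapshots accumulate indefinitely, so audit and delete old ones.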
- Inter-AZ data transfer: AWS charges for data transferred between Availability Zones. Architect services to minimise cross-AZ traffic; place tightly-coupled services in the same AZ.
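The log retention item is easy to audit programmatically. The sketch below assumes records shaped like the `logGroups` list returned by boto3's `describe_log_groups`, where a group that never expires has no `retentionInDays` key. The function names are hypothetical, and the 30-day default is simply the lower end of the range suggested above.

```python
def groups_without_retention(log_groups: list[dict]) -> list[str]:
    """Names of log groups with no retentionInDays, i.e. logs never expire."""
    return [g["logGroupName"] for g in log_groups if "retentionInDays" not in g]


def retention_changes(log_groups: list[dict], days: int = 30) -> list[dict]:
    """Build one set of keyword arguments per log group that needs a policy."""
    return [
        {"logGroupName": name, "retentionInDays": days}
        for name in groups_without_retention(log_groups)
    ]


if __name__ == "__main__":
    # Hand-written sample records; a real run would page through describe_log_groups.
    sample = [
        {"logGroupName": "/app/web", "storedBytes": 5_000_000_000},
        {"logGroupName": "/app/worker", "retentionInDays": 30, "storedBytes": 1_000_000},
    ]
    for change in retention_changes(sample):
        # Each dict could be passed to the CloudWatch Logs client's
        # put_retention_policy(**change) once reviewed.
        print(change)
```

Generating the changes as data first makes it possible to review them before applying anything to production log groups.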
Implementation Checklist
- Run AWS Cost Explorer — identify top 5 cost line items before optimising
- Run AWS Compute Optimizer — review its right-sizing recommendations for EC2 and RDS and apply those that fit your load profile
- Purchase Reserved Instances or Savings Plans for all 24/7 production workloads
- Delete unattached EBS volumes, unused Elastic IPs, and forgotten load balancers
- Set CloudWatch Logs retention policy on all log groups (30–90 days)
- Move static assets to S3 + CloudFront if served from application servers
- Enable S3 Intelligent-Tiering on buckets with infrequently accessed objects
- Tag every AWS resource — untagged resources are unattributable costs
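The first checklist step, finding the top cost line items, can be sketched as a small aggregation. The code assumes data shaped like the `ResultsByTime` list returned by the Cost Explorer `get_cost_and_usage` call when grouped by service and queried for the `UnblendedCost` metric, where each amount arrives as a string. The function name and the sample figures are illustrative only.

```python
from collections import defaultdict


def top_cost_drivers(results_by_time: list[dict], n: int = 5) -> list[tuple[str, float]]:
    """Sum UnblendedCost per group key across periods and return the n largest."""
    totals: dict[str, float] = defaultdict(float)
    for period in results_by_time:
        for group in period.get("Groups", []):
            key = group["Keys"][0]
            totals[key] += float(group["Metrics"]["UnblendedCost"]["Amount"])
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, round(amount, 2)) for name, amount in ranked[:n]]


if __name__ == "__main__":
    # Made-up sample covering two monthly periods.
    sample = [
        {"Groups": [
            {"Keys": ["Amazon EC2"], "Metrics": {"UnblendedCost": {"Amount": "1200.50"}}},
            {"Keys": ["Amazon RDS"], "Metrics": {"UnblendedCost": {"Amount": "400.00"}}},
        ]},
        {"Groups": [
            {"Keys": ["Amazon EC2"], "Metrics": {"UnblendedCost": {"Amount": "1100.00"}}},
            {"Keys": ["Amazon S3"], "Metrics": {"UnblendedCost": {"Amount": "50.25"}}},
        ]},
    ]
    for name, amount in top_cost_drivers(sample, n=2):
        print(name, amount)
```

Ranking by total spend keeps attention on the few line items that dominate the bill.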
Common Mistakes to Avoid
- ✗ Optimising before identifying the top cost drivers: fix the biggest line items first and leave the small ones for later
- ✗ Purchasing 3-year reserved instances before production load is stable: 1-year commitments match startup iteration cycles better
- ✗ No resource tagging strategy: without tags, you cannot attribute costs to products, teams, or environments
- ✗ Deleting "old" snapshots without verifying they are not referenced by a restore process or compliance requirement
- ✗ Treating cloud cost as a finance problem rather than an engineering problem: the most impactful optimisations require code changes, not billing configuration
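The tagging mistake above is straightforward to detect once resource records are in hand. The sketch below is a minimal tag audit, assuming each record carries a `Tags` list of `Key`/`Value` pairs, a shape used by many boto3 responses. The `ResourceId` field, the function name, and the required tag keys are hypothetical and should be replaced with your own conventions.

```python
def missing_tags(resources: list[dict], required: set[str]) -> dict[str, list[str]]:
    """Map each resource id to the sorted required tag keys it lacks."""
    report: dict[str, list[str]] = {}
    for res in resources:
        present = {t["Key"] for t in res.get("Tags", [])}
        absent = sorted(required - present)
        if absent:
            report[res["ResourceId"]] = absent
    return report


if __name__ == "__main__":
    # Hypothetical tagging policy and hand-written sample records.
    required = {"team", "environment", "product"}
    sample = [
        {"ResourceId": "i-0abc", "Tags": [{"Key": "team", "Value": "core"},
                                          {"Key": "environment", "Value": "prod"},
                                          {"Key": "product", "Value": "api"}]},
        {"ResourceId": "vol-0def", "Tags": [{"Key": "team", "Value": "core"}]},
        {"ResourceId": "eip-0123"},
    ]
    for resource_id, absent in missing_tags(sample, required).items():
        print(resource_id, absent)
```

Running a check like this on a schedule keeps cost attribution from eroding as new resources are created.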