Python Backend Development8 min read · March 2026Published Mar 2026

Common Backend Mistakes That Cause Scaling Problems

Most backend scaling problems are not infrastructure problems. They are code problems that infrastructure cannot fix. Adding more servers to a backend with an N+1 query problem makes the problem proportionally worse, not better. These are the eight mistakes that most reliably turn a working backend into a bottleneck — and the specific changes that fix them.

Mistake 1: The N+1 Query Problem

The N+1 problem occurs when a query fetches a list of N records and then makes N additional queries to fetch related data for each record. A query that looks correct in development becomes catastrophic in production:

Example: Fetching 100 orders and then querying the database for each order's customer — 101 queries instead of 2
Fix in SQLAlchemy: Use joinedload() or selectinload() for eager loading of related records
Fix for raw queries: Use a JOIN or a single IN query to fetch all related records in one round trip
Detection: SQLAlchemy's echo=True mode or Django Debug Toolbar reveals query counts per request
A single endpoint with an N+1 problem can go from 50ms at 10 records to 5,000ms at 1,000 records

Mistake 2: Missing Database Indexes

A query without an index on a table with 1 million rows performs a full table scan — reading every row to find the matching ones. This is the single most common cause of slow database queries in production:

Every foreign key column must have an index (PostgreSQL does not create them automatically)
Every column used in a WHERE, ORDER BY, or JOIN clause on a large table needs an index
Composite indexes (multiple columns) are needed for queries that filter on multiple columns simultaneously
Use EXPLAIN ANALYZE in PostgreSQL to identify which queries are missing indexes
The fix is one migration: CREATE INDEX CONCURRENTLY — concurrent creation does not lock the table

Mistake 3: No Connection Pooling

PostgreSQL has a limit on simultaneous connections (typically 100–200). Without connection pooling, each request opens a new connection and closes it on completion. At 50 concurrent users, this saturates the database:

Use asyncpg's built-in connection pool for FastAPI (min_size=5, max_size=20 for most apps)
For larger scale, use PgBouncer as a connection pooler in front of PostgreSQL
Symptom: "FATAL: remaining connection slots are reserved" errors under load
Connection pool size should be approximately: (number of workers × 2) + 1

A backend without connection pooling will collapse under its first traffic spike regardless of server size. This is a day-one configuration requirement, not a performance optimisation.

Mistake 4: Synchronous Operations in Async Routes

Mixing synchronous and async code in FastAPI routes blocks the event loop and eliminates the concurrency advantage:

Using synchronous database drivers (psycopg2) in async routes blocks the event loop for the duration of every query
Using requests library instead of httpx for external API calls blocks the event loop
Using time.sleep() instead of asyncio.sleep() blocks the event loop
Fix: audit every import in your routes — psycopg2 → asyncpg, requests → httpx, time.sleep → asyncio.sleep

Mistake 5: No Caching for Repeated Expensive Queries

An endpoint that runs the same expensive query on every request regardless of whether the underlying data changed wastes database resources proportionally to traffic:

Cache reference data (categories, plans, country lists) in Redis with a 5–60 minute TTL
Cache user permission sets after authentication — avoid re-querying role/permission tables on every request
Cache aggregate queries (dashboard counts, totals) with a 30–300 second TTL
Use cache-aside pattern: check cache → if miss, query DB → store in cache → return result

Mistake 6: Heavy Processing in the Request-Response Cycle

Any operation that takes more than 200ms should not happen synchronously in the HTTP response cycle:

Email sending: send to a task queue, return 200 immediately
PDF generation: offload to Celery, return a job ID, poll for completion
Image processing: use S3 pre-signed URLs for direct upload, process async
Report generation: async job with webhook or polling — never block the HTTP response
Webhook delivery: always async — external endpoints may be slow or unavailable

Implementation Checklist

Run EXPLAIN ANALYZE on your 10 most-called queries — fix missing indexes first
Connection pooling is configured (asyncpg pool or PgBouncer)
No synchronous database drivers or HTTP libraries are used in async routes
N+1 queries are eliminated with eager loading or single batch queries
Redis caching is in place for reference data and expensive aggregates
Background tasks handle email, PDF, webhooks, and any operation over 200ms
Load test your API before launch — identify bottlenecks before real users do

Common Mistakes to Avoid

✗Adding more servers before fixing the code — horizontal scaling multiplies N+1 problems proportionally
✗Caching everything without understanding cache invalidation — stale cache causes data integrity bugs
✗Not measuring before optimising — use profiling (Pyinstrument, cProfile) to find the actual bottleneck
✗Over-indexing — too many indexes slow down writes; index only the columns you query on
✗Ignoring slow query logs — PostgreSQL's log_min_duration_statement setting surfaces every slow query automatically

Frequently Asked Questions

What is the most common cause of slow Python API performance?+

The N+1 query problem is the single most common cause of slow Python API performance in production. It occurs when a list endpoint makes one query to fetch N records and then N additional queries to fetch related data. A list of 1,000 users with their associated subscription plans results in 1,001 database queries instead of 2. The fix is SQLAlchemy's eager loading (joinedload/selectinload) or a JOIN query. Detection requires query count monitoring, not just response time monitoring.

How do I know if my backend has a scaling problem before it fails?+

Run a load test before launch using tools like Locust (Python) or k6. Simulate 50, 100, and 500 concurrent users and measure: response time percentiles (p50, p95, p99), error rate, database query count per request, and database connection count. Problems to look for: response time that grows linearly with users (N+1 pattern), connection pool exhaustion errors, and queries taking >100ms that were fast at low load. Monthly load testing as part of your development process surfaces problems before users report them.

How much can database indexes improve query performance?+

A database query on an unindexed column that performs a full table scan on a 1-million-row table typically takes 500–5,000ms. The same query with a proper B-tree index takes 1–10ms — a 100–1,000× improvement. Index impact is proportional to table size and query selectivity (how many rows the query filters out). On tables under 10,000 rows, the index benefit is minimal. On tables over 100,000 rows, missing indexes are always the first thing to investigate for slow queries.

Work with us

Need help applying these principles to your project? We build exactly this for startups worldwide.

Audit Your Backend Architecture →

Related guides

Designing Scalable Backend Architectures With Python

10 min read

→

Why FastAPI Is Becoming the Preferred Choice for Modern SaaS Products

7 min read

→

FastAPI vs Django: Which Is Better for Startups?

9 min read

→