Scale & performance

Scaling Consumers

Horizontally scale on queue depth, not on CPU.

Why it matters

The signal that you need more consumers is a growing queue backlog, not consumer CPU. Scale on the ApproximateNumberOfMessagesVisible or ApproximateAgeOfOldestMessage CloudWatch metrics.

The right scaling signal

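Queues are buffers. Scale consumers when backlog grows, as shown by ApproximateNumberOfMessagesVisible (messages waiting to be received) or ApproximateAgeOfOldestMessage (how long the oldest message has waited). ApproximateNumberOfMessagesNotVisible counts messages already in flight, so treat it as a secondary signal rather than a measure of backlog. CPU-based scaling lags because workers blocked on I/O show low CPU even while the queue fills up.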
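A minimal sketch of reading these counts straight from the queue, assuming a client that behaves like boto3.client("sqs") is passed in. The queue attribute is named ApproximateNumberOfMessages, while the matching CloudWatch metric is ApproximateNumberOfMessagesVisible. The function name read_backlog and the returned keys are illustrative, not part of any AWS API.

```python
def read_backlog(sqs_client, queue_url):
    """Return approximate visible and in-flight message counts for one queue.

    sqs_client is assumed to expose get_queue_attributes like a boto3 SQS client.
    """
    response = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            "ApproximateNumberOfMessages",            # waiting to be received
            "ApproximateNumberOfMessagesNotVisible",  # received, not yet deleted
        ],
    )
    attrs = response["Attributes"]  # SQS returns attribute values as strings
    return {
        "visible": int(attrs["ApproximateNumberOfMessages"]),
        "in_flight": int(attrs["ApproximateNumberOfMessagesNotVisible"]),
    }
```

Both values are approximate, so it is usually safer to act on a trend over several samples than on a single reading.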

Backlog per instance = visible messages ÷ running consumers.
Target a constant backlog-per-instance (e.g. 5–10 messages). A reasonable target is acceptable latency ÷ average processing time per message.
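The backlog-per-instance rule can be turned into a consumer count with a little arithmetic. The sketch below is one way to do it; the function names and the min_consumers and max_consumers bounds are assumptions added for illustration.

```python
import math


def backlog_per_instance(visible_messages, running_consumers):
    """Visible messages divided by running consumers (treats 0 consumers as 1)."""
    return visible_messages / max(running_consumers, 1)


def desired_consumers(visible_messages, target_backlog_per_instance,
                      min_consumers=1, max_consumers=50):
    """Consumer count that brings backlog-per-instance down to the target.

    The result is clamped to [min_consumers, max_consumers]; both bounds are
    hypothetical defaults, not SQS limits.
    """
    if target_backlog_per_instance <= 0:
        raise ValueError("target_backlog_per_instance must be positive")
    needed = math.ceil(visible_messages / target_backlog_per_instance)
    return max(min_consumers, min(max_consumers, needed))
```

For example, 95 visible messages with a target of 10 per instance gives desired_consumers(95, 10) == 10, and an empty queue falls back to min_consumers.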

Horizontal scaling

Add more consumer instances (EC2, ECS, Kubernetes pods) when the backlog exceeds your threshold. For standard queues, SQS distributes messages across whichever consumers are polling, so new instances start receiving work without any partition assignment.

Lambda event source mapping

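Lambda polls SQS on your behalf and, for standard queues, adds concurrent invocations while messages remain available: historically up to 60 more per minute to a maximum of 1,000, raised in late 2023 to up to 300 more per minute to a maximum of 1,250. Account concurrency limits still apply. Configure batch size and maximum concurrency per event source mapping.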

Standard: batch size 1–10,000; a batch size above 10 requires a batching window of at least 1 second.
FIFO: batch size 1–10; concurrency bounded by active MessageGroupIds.
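The batch-size rules above can be checked before the mapping is created. The sketch below builds a parameter dict using the field names of the Lambda CreateEventSourceMapping API (EventSourceArn, FunctionName, BatchSize, MaximumBatchingWindowInSeconds, ScalingConfig). The helper build_sqs_mapping_params is hypothetical, it validates only the limits listed here, and detecting FIFO queues by the ".fifo" ARN suffix is an assumption of this sketch.

```python
def build_sqs_mapping_params(queue_arn, function_name, batch_size,
                             batching_window_seconds=0, max_concurrency=None):
    """Validate batch settings and return kwargs for an SQS event source mapping."""
    is_fifo = queue_arn.endswith(".fifo")  # FIFO queue names end in ".fifo"
    max_batch = 10 if is_fifo else 10_000
    if not 1 <= batch_size <= max_batch:
        raise ValueError(f"batch_size must be between 1 and {max_batch}")
    if batch_size > 10 and batching_window_seconds < 1:
        raise ValueError("batch_size above 10 needs a batching window of at least 1 second")

    params = {
        "EventSourceArn": queue_arn,
        "FunctionName": function_name,
        "BatchSize": batch_size,
    }
    if batching_window_seconds > 0:
        params["MaximumBatchingWindowInSeconds"] = batching_window_seconds
    if max_concurrency is not None:
        # Caps how many concurrent invocations this mapping may drive.
        params["ScalingConfig"] = {"MaximumConcurrency": max_concurrency}
    return params
```

With boto3 installed, the result could be passed as boto3.client("lambda").create_event_source_mapping(**params).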

FIFO scaling limits

Adding consumers does not help a single hot MessageGroupId — only one consumer processes that group at a time. Scale by increasing the number of distinct message groups, not just consumer count.
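To see why consumer count alone does not help, it can be useful to compute the ceiling on parallelism directly. This sketch assumes one consumer per group at a time, as described above; the function name and the example identifiers are made up for illustration.

```python
def max_fifo_parallelism(consumer_count, active_group_ids):
    """Upper bound on messages processed in parallel for a FIFO queue.

    Each MessageGroupId is handled by one consumer at a time, so parallelism
    is capped by the number of distinct groups that currently have messages.
    """
    return min(consumer_count, len(set(active_group_ids)))


# One hot group: 20 consumers still process one message at a time.
hot = ["tenant-a"] * 100
# Finer-grained groups (hypothetical per-order IDs): all 20 consumers can work.
spread = [f"tenant-a:order-{n}" for n in range(100)]
```

Choosing a finer-grained MessageGroupId only preserves ordering within each new group, so it fits cases where ordering is needed per entity rather than across the whole stream.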

In-flight quota

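A message stays in flight from the moment a consumer receives it until it is deleted or its visibility timeout expires. Many slow consumers can hit the limit of approximately 120,000 in-flight messages on standard queues. Scale consumers and make sure they delete messages promptly after processing.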
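A rough way to estimate how close a fleet is to the quota is Little's law: messages in flight ≈ receive rate × time each message is held before deletion. The sketch below applies that estimate; the function names are illustrative, and the constant is the approximate standard-queue figure quoted above.

```python
STANDARD_IN_FLIGHT_LIMIT = 120_000  # approximate limit for standard queues


def estimated_in_flight(receive_rate_per_second, seconds_until_delete):
    """Little's law estimate of messages held in flight at steady state."""
    return receive_rate_per_second * seconds_until_delete


def in_flight_utilization(receive_rate_per_second, seconds_until_delete,
                          limit=STANDARD_IN_FLIGHT_LIMIT):
    """Fraction of the in-flight quota the fleet is expected to use."""
    return estimated_in_flight(receive_rate_per_second, seconds_until_delete) / limit
```

For instance, receiving 2,000 messages per second and holding each for 30 seconds gives about 60,000 in flight, half of the quota. Halving the time to delete has the same effect as halving the receive rate.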

Gotchas

Scaling on CPU lags the actual problem by minutes: backlog explodes before CPU rises.
FIFO: per-MessageGroupId concurrency is 1, so adding consumers does NOT help a hot group.
More consumers with slow processing can worsen duplicate deliveries: a message that is still being processed when its visibility timeout expires is delivered again.

Related concepts

Backpressure
Queue Throughput
Short vs Long Polling