Engineering · 10 min read

From 0 to 10k Followers: The Engineering Behind Scale

How we built a system capable of handling millions of webhook events without crashing.


PostEngageAI Team, Engineering Team

Feb 20, 2026

System Architecture: High Throughput

When we launched PostEngage.ai, we thought handling 100 requests per minute was a lot. Last week, we peaked at 12,000 requests per second during a major influencer campaign.

Scaling isn't just about adding more servers. It's about rethinking how data flows through your system. Here is the story of how our architecture evolved.

The Bottleneck: Database Locks

In V1, every incoming webhook from Instagram triggered a direct database write: Webhook -> API -> Postgres.

It worked fine until we onboarded our first customer with 500k followers. When they posted, thousands of comments flooded in within seconds. Our database CPU spiked to 100%, and connections timed out. We dropped events. It was a disaster.

The Solution: Async Queues

We decoupled ingestion from processing. Now, the flow looks like this:

Webhook -> API (Edge) -> Redis Queue -> Worker Pool -> Postgres
Edge Ingestion

We use Vercel Edge Functions to accept the webhook instantly (returning 200 OK) and push the payload to a queue. Latency: <50ms.
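To make the "ack first, process later" shape concrete, here is a minimal sketch of the ingestion step. The payload fields and the `enqueue` callback are illustrative assumptions, not our exact code; the point is that the handler does no real work before returning 200.

```typescript
// Hypothetical webhook payload shape for illustration.
interface WebhookEvent {
  userId: string;
  commentId: string;
  text: string;
}

// Build the queue message up front, so the 200 OK never waits on processing.
export function toQueueMessage(event: WebhookEvent): { body: string; deduplicationId: string } {
  return {
    body: JSON.stringify(event),
    // Dedupe on the comment ID so webhook retries don't get processed twice.
    deduplicationId: `comment:${event.commentId}`,
  };
}

// Edge handler: validate, hand off to the queue, return immediately.
export async function handleWebhook(
  req: Request,
  enqueue: (msg: { body: string; deduplicationId: string }) => Promise<void>,
): Promise<Response> {
  const event = (await req.json()) as WebhookEvent;
  if (!event.commentId) return new Response("bad payload", { status: 400 });
  await enqueue(toQueueMessage(event));
  return new Response("OK", { status: 200 });
}
```

In production the `enqueue` call is a single HTTP publish to the queue, which is why the whole handler stays well under the 50ms budget.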
Worker Autoscaling

Our worker fleet scales horizontally based on queue depth. If the queue grows, we spin up more processors.
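The scaling policy itself is simple arithmetic. This sketch shows one way to derive a worker count from queue depth; the thresholds and drain targets are made-up example values, not our production numbers.

```typescript
// Illustrative autoscaling policy: size the fleet so the current backlog
// drains within a target window, clamped between a floor and a ceiling.
export function desiredWorkers(opts: {
  queueDepth: number;         // messages currently waiting
  perWorkerRate: number;      // messages one worker drains per minute
  minWorkers: number;
  maxWorkers: number;
  targetDrainMinutes: number; // how fast we want the backlog cleared
}): number {
  const { queueDepth, perWorkerRate, minWorkers, maxWorkers, targetDrainMinutes } = opts;
  // Workers needed to clear the backlog within the target window.
  const needed = Math.ceil(queueDepth / (perWorkerRate * targetDrainMinutes));
  return Math.min(maxWorkers, Math.max(minWorkers, needed));
}
```

The clamp matters in both directions: the floor keeps latency low when the queue is empty, and the ceiling stops a viral post from scaling costs without bound.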

Distributed Rate Limiting

Sending replies is harder than receiving webhooks because Instagram has strict API limits. We implemented a "Token Bucket" algorithm using Redis Lua scripts.

Before any worker sends a reply, it must "buy" a token from the user's bucket in Redis. This operation is atomic. If the bucket is empty, the job is re-queued with a delay. This ensures we never exceed Instagram's rate limits, no matter how many parallel workers are running.
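The token-bucket logic looks like this. This is an in-memory sketch for clarity only; in production the same refill-then-take step runs inside a single Redis Lua script, which is what makes it atomic across all parallel workers. Capacity and refill-rate values here are examples, not Instagram's actual limits.

```typescript
// Per-user bucket state. In production this lives in Redis, not in memory.
interface Bucket {
  tokens: number;       // tokens currently available
  lastRefillMs: number; // timestamp of the last refill
}

// Try to "buy" one token. Returns false if the bucket is empty,
// in which case the caller re-queues the job with a delay.
export function tryTake(
  bucket: Bucket,
  nowMs: number,
  capacity: number,     // burst size
  refillPerSec: number, // sustained rate the platform allows
): boolean {
  // Refill proportionally to elapsed time, capped at capacity.
  const elapsedSec = (nowMs - bucket.lastRefillMs) / 1000;
  bucket.tokens = Math.min(capacity, bucket.tokens + elapsedSec * refillPerSec);
  bucket.lastRefillMs = nowMs;
  if (bucket.tokens < 1) return false;
  bucket.tokens -= 1; // spend one token on this reply
  return true;
}
```

Doing the refill and the take in one atomic step is the whole trick: two workers checking a plain counter could both see one token and both send, but a Lua script in Redis executes as a single unit, so exactly one of them wins.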

Our Current Stack

  • Framework: Next.js 15 (App Router)
  • Database: PostgreSQL (Neon Serverless)
  • Queue: Upstash Redis + QStash
  • Hosting: Vercel
  • Monitoring: OpenTelemetry + Grafana

The beauty of this stack is that it scales to zero. If no one is posting, we pay almost nothing. When a viral post hits, we scale up instantly to handle the load.
