How we scan a 50,000-field Salesforce org in 12 minutes

The Metadata API wasn't designed for parallelism, so we built a subscriber pool with bounded concurrency. A look at the trade-offs.

When we first shipped OrgLens, our largest scan target was a 9,000-field test org that we had seeded ourselves. A month later, our first real customer asked us to scan an org with 48,200 fields.

The Salesforce Metadata API is a beautiful thing, but it was not built for parallelism. Each request returns a single object, takes 400–1,200 ms, and is rate-limited per connection. Fetched one request at a time, 48,200 fields would take roughly 5 to 16 hours.
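As a rough check of that estimate, here is the arithmetic spelled out. It assumes one request per field and strictly sequential requests; the function name is made up for illustration and is not part of OrgLens.

```typescript
// Back-of-envelope: total wall-clock hours if every field needs one
// request and requests run strictly one after another (an assumption).
function serialScanHours(fieldCount: number, latencyMs: number): number {
  const totalMs = fieldCount * latencyMs;
  return totalMs / (1000 * 60 * 60);
}

const best = serialScanHours(48_200, 400);    // fastest quoted latency
const worst = serialScanHours(48_200, 1_200); // slowest quoted latency
console.log(best.toFixed(1), worst.toFixed(1)); // prints "5.4 16.1"
```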

The subscriber pool

We maintain a pool of six long-lived Salesforce sessions per connected org. Each session gets a slice of the object graph, and we fan out requests within that slice with a bounded concurrency of eight, so at most 48 requests are in flight per org at any moment. Subscribers periodically hand off work to keep the tail latency flat.

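// Six sessions per org, at most eight in-flight requests per session.
const pool = new SubscriberPool({ size: 6, concurrency: 8 });
// Split the object graph into slices for the sessions to work through.
const slices = partitionGraph(metadataGraph);
// Scan all slices concurrently; resolves once the slowest slice is done.
await Promise.all(slices.map(s => pool.run(s)));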
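The internals of SubscriberPool are not shown here, so what follows is only a minimal sketch of the general technique of bounded concurrency, not the OrgLens implementation. The helper name runBounded is hypothetical. A fixed number of workers pull tasks from a shared cursor, so no more than `limit` requests are ever in flight.

```typescript
// Run `tasks` with at most `limit` in flight at once.
// Results come back in the same order as the input tasks.
async function runBounded<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0; // index of the next unclaimed task

  // Each worker claims the next unclaimed task until none remain.
  // Claiming is safe without locks because JavaScript runs this
  // synchronous section on a single thread.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

A session could call something like runBounded(requestsForSlice, 8) for its slice. Because workers pull from a shared cursor, a worker that finishes early simply picks up the next task, which is one simple way to keep workers busy. Whether the hand-off between subscribers works this way is not stated in this post.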

What we gave up

Fairness. A slow partition can delay the final commit by 30 seconds while faster slices are already done. For our use case, that's fine: nobody notices whether the scan takes 11 or 12 minutes, but everyone would notice a 2× regression.
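To make the trade-off concrete: when all slices are awaited together, the scan ends when the slowest slice ends, and every other session waits for it. The sketch below computes that waiting time per slice. The durations are made-up numbers chosen to match the 30-second figure above, not measurements, and the function name is hypothetical.

```typescript
// With Promise.all, the scan finishes when the slowest slice finishes.
// Returns how long each slice's session sits idle waiting for it.
function idleTimesMs(sliceDurationsMs: number[]): number[] {
  const slowest = Math.max(...sliceDurationsMs);
  return sliceDurationsMs.map((d) => slowest - d);
}

// Illustrative durations for six slices, in milliseconds (invented data).
const durations = [690_000, 695_000, 700_000, 692_000, 720_000, 698_000];
console.log(idleTimesMs(durations));
// prints [ 30000, 25000, 20000, 28000, 0, 22000 ]
```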

If you're curious how the HMAC signing on the audit log works, that's our next post.