On Scale, Throughput, and the Normalization of Waste
Software teams routinely overestimate how much compute their systems need. This overestimation is not benign: it drives premature infrastructure scaling, masks inefficiencies, and eventually produces cost structures that feel inevitable rather than pathological. Benchmarks like HammerDB, when interpreted correctly, expose just how wide the gap is between actual hardware capability and typical business demand, and how often large cloud instances are late-stage symptoms rather than legitimate requirements.
This essay argues a simple thesis: xlarge-class machines already provide extraordinary capacity; if infrastructure size scales faster than business value, something is already deeply broken. Expensive 2xlarge–4xlarge deployments are not evidence of success or scale, but often post-hoc rationalizations of accumulated inefficiency.
1. HammerDB NOPM and Intuition for Scale
HammerDB’s NOPM (New Orders Per Minute) is frequently misunderstood. It is not a vanity metric; it is a concrete measure of sustained OLTP throughput under a standardized workload (HammerDB’s TPROC-C, which is derived from TPC-C). Converting it into intuitive units immediately reframes most “scale” discussions.
- 1,000 NOPM ≈ 16.7 orders/sec ≈ 1.44 million orders/day
- 10,000 NOPM ≈ 167 orders/sec ≈ 14.4 million orders/day
These are not edge cases. Single-digit thousands of NOPM already exceed what the vast majority of startups will ever process, even at peak. Tens of thousands of NOPM move into territory occupied by large consumer platforms and financial systems.
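The conversion behind those figures is plain unit arithmetic, and it is worth having on hand when a capacity discussion starts. The sketch below assumes the NOPM figure is a sustained rate held for a full 24 hours, which real traffic rarely is; the function name is illustrative only.

```python
SECONDS_PER_MINUTE = 60
MINUTES_PER_DAY = 24 * 60  # 1,440


def nopm_to_rates(nopm: float) -> tuple[float, float]:
    """Convert a sustained NOPM figure to (orders per second, orders per day)."""
    return nopm / SECONDS_PER_MINUTE, nopm * MINUTES_PER_DAY


for nopm in (1_000, 10_000):
    per_sec, per_day = nopm_to_rates(nopm)
    print(f"{nopm:,} NOPM = {per_sec:.1f} orders/sec = {per_day / 1e6:.2f} million orders/day")
```

Running the same function on your own peak order rate, rather than on a benchmark figure, is usually the more sobering exercise.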
The key insight: the NOPM that commodity hardware can sustain grows much faster than real business demand. This asymmetry is the foundation for everything that follows.
2. r8g and the Power of a Single Core
AWS-published PostgreSQL benchmarks on Graviton (r8g) instances show approximately 17k–25k NOPM per core, depending on configuration. This is the critical, often overlooked detail: even one modern server-class core is extremely powerful.
Once this is acknowledged, several conclusions follow naturally:
- An xlarge instance (4 vCPUs on AWS; on Graviton each vCPU is a full physical core) represents vast headroom.
- Scaling beyond xlarge is not a casual or default decision.
- Most systems will encounter architectural or query-level bottlenecks long before they exhaust raw CPU.
In other words, if a system cannot live comfortably within xlarge-class capacity, the explanation is rarely “we need more cores.” Much more often, it is “we are wasting the ones we have.”
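A rough way to put numbers on that headroom is to multiply the per-core range quoted above by a core count and compare it with a workload's peak rate. This is a minimal sketch under two loud assumptions: throughput scales linearly with cores (real systems scale sublinearly), and an xlarge has 4 cores. The 50 orders/sec peak is a hypothetical workload, not a measurement.

```python
NOPM_PER_CORE_LOW = 17_000   # low end of the per-core range quoted above
NOPM_PER_CORE_HIGH = 25_000  # high end of the same range


def instance_nopm_range(cores: int) -> tuple[int, int]:
    """Naive linear estimate of (low, high) NOPM for an instance with `cores` cores."""
    return cores * NOPM_PER_CORE_LOW, cores * NOPM_PER_CORE_HIGH


def headroom_factor(peak_orders_per_sec: float, cores: int) -> float:
    """How many times the pessimistic capacity estimate exceeds peak demand."""
    low, _ = instance_nopm_range(cores)
    peak_nopm = peak_orders_per_sec * 60
    return low / peak_nopm


low, high = instance_nopm_range(4)  # assumed xlarge core count
print(f"4 cores: {low:,} to {high:,} NOPM (linear estimate)")
print(f"Headroom at a hypothetical 50 orders/sec peak: {headroom_factor(50, 4):.1f}x")
```

Even if the linear estimate is off by a large factor, the gap between capacity and a plausible peak stays wide, which is the point of the argument.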
3. Start Local, Stay Local (If You Can)
Modern laptops are not toys. Even conservatively derated, a local development machine can deliver 10k–30k total NOPM at ~100ms latency. This estimate is intentionally pessimistic. It assumes:
- No server-grade tuning
- No NUMA or IO optimizations
- No attempt to maximize sustained throughput
And yet, even this severely discounted figure represents roughly 14 to 43 million new orders per day if sustained.
High-end laptops (recent i9s, Apple Silicon Pro/Max) can comfortably exceed this. The implication is not that laptops replace production infrastructure, but that raw throughput is almost never the reason teams need to leave local environments early.
Staying local as long as possible has concrete benefits:
- Zero infrastructure tax
- Immediate feedback loops
- Forced efficiency
- Early exposure of bad queries and pathological access patterns
Teams should leave local setups for operational reasons—availability, collaboration, backups—not because they have “outgrown” the hardware.
4. Large–Xlarge Goes a Very Long Way
If local execution is not viable, large to xlarge instances are the correct default. They already provide:
- Multiple highly capable cores
- Memory footprints sufficient for most working sets
- Throughput far beyond early and mid-stage business needs
Importantly, xlarge should not be viewed as “small.” In 2025 hardware terms, it is already a serious machine. Treating it as a stepping stone rather than a destination is a conceptual error that leads directly to premature vertical scaling.
5. When Instance Size Outpaces Business Value
This is the central diagnostic:
If instance size scales faster than business value or revenue, the business is already deeply broken.
This does not mean that needing 16 cores is inherently wrong. There are legitimate reasons for CPU-heavy systems at relatively small scale. But if:
- The system needs large instances, and
- Those instances feel financially painful, and
- Downgrading is not possible without collapse,
then the problem is not cost. It is inefficiency.
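The diagnostic can be made mechanical by comparing growth factors over the same period. The sketch below is one possible formulation, not a standard metric: it uses vCPU count as a stand-in for instance size and daily orders as a stand-in for business value, and both the function names and the example figures are hypothetical.

```python
def growth_factor(before: float, after: float) -> float:
    """How many times larger `after` is than `before`."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return after / before


def infrastructure_outpaces_value(
    vcpus_before: int,
    vcpus_after: int,
    value_before: float,
    value_after: float,
) -> bool:
    """True when compute grew by a larger factor than business value did."""
    return growth_factor(vcpus_before, vcpus_after) > growth_factor(value_before, value_after)


# Hypothetical example: 4 vCPUs grows to 16 vCPUs while daily orders
# grow from 100,000 to 150,000 (4x compute against 1.5x value).
print(infrastructure_outpaces_value(4, 16, 100_000, 150_000))
```

Any proxy for value works here (revenue, active users, orders) as long as the same one is tracked consistently over time.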
Bad queries, over-fetching, N+1 patterns, missing indexes, fan-out reads, and poorly bounded workloads all consume CPU in ways that scale faster than user value. Vertical scaling does not fix these issues; it merely postpones their consequences.
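The N+1 pattern is the easiest of these to demonstrate. The following sketch uses an in-memory SQLite database with a made-up `customers`/`orders` schema (both hypothetical) and counts how many queries each approach issues to compute the same per-customer totals.

```python
import sqlite3


class CountingDB:
    """Thin wrapper that counts how many queries are executed."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.queries = 0

    def run(self, sql: str, params: tuple = ()) -> list:
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def build_db(customers: int, orders_each: int = 3) -> CountingDB:
    """Create a toy schema where every customer has `orders_each` orders of 10.0."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
    ids = range(1, customers + 1)
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(i, f"c{i}") for i in ids])
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(i, 10.0) for i in ids for _ in range(orders_each)],
    )
    return CountingDB(conn)


def totals_n_plus_one(db: CountingDB) -> dict:
    """One query for the customer list, then one more query per customer."""
    totals = {}
    for (customer_id,) in db.run("SELECT id FROM customers"):
        rows = db.run("SELECT total FROM orders WHERE customer_id = ?", (customer_id,))
        totals[customer_id] = sum(total for (total,) in rows)
    return totals


def totals_single_query(db: CountingDB) -> dict:
    """Same result from a single aggregate query."""
    return dict(db.run("SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id"))


db = build_db(customers=100)
slow = totals_n_plus_one(db)
print("N+1 queries issued:", db.queries)
db.queries = 0
fast = totals_single_query(db)
print("Aggregate queries issued:", db.queries)
print("Same result:", slow == fast)
```

With 100 customers the first version issues 101 queries and the second issues 1. The query count of the first grows with the customer list rather than with anything a user would pay for, and a larger instance only hides that cost.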
6. “2xlarge Is Very Expensive” as a Late-Stage Symptom
The complaint “2xlarge–4xlarge is too expensive” is not an early warning sign. It is a late-stage manifestation.
By the time a system reaches this point:
- Inefficiencies have compounded
- Query paths are fragile and risky to change
- Performance is perceived as “hard-won”
- Larger instances feel justified because smaller ones no longer work
But this justification is retrospective. The system did not become inefficient because it used large instances; it used large instances because inefficiency had already been normalized.
Crucially, 2xlarge–4xlarge performance is not extraordinary. In raw throughput terms, it is within roughly an order of magnitude of what a high-end modern laptop can deliver. What makes it feel significant is not capability, but cost.
7. Post-Hoc Rationalization and the Normalization of Waste
When faced with large infrastructure bills, teams often shift their narrative:
“These instances are extremely powerful, so of course they’re expensive.”
This is post-hoc rationalization. It reframes a symptom as a virtue and converts accumulated waste into an implicit baseline. The alternative—admitting that the system consumes far more compute than its value justifies—is uncomfortable, so it is avoided.
Over time, this normalization makes inefficiency invisible. Large bills become “the cost of doing business,” and the original design decisions that caused them fade from memory.
Conclusion
HammerDB benchmarks, modern CPUs, and real-world workload data all point to the same conclusion: hardware capacity is abundant; efficiency is rare.
Xlarge instances already represent enormous power. Local machines, even conservatively evaluated, can handle workloads far beyond what most startups will ever need. When systems grow into 2xlarge–4xlarge territory without commensurate business value, the issue is not scale—it is waste that has been deferred, normalized, and finally monetized.
Large cloud bills do not appear suddenly. They are the late-stage symptoms of problems that began much earlier, when vertical scaling was used as duct tape instead of a last resort.