Intercom merged twice as many pull requests per engineer in nine months. Same team. Same headcount. Same codebase.
That number deserves scrutiny, not because it's implausible, but because most companies trying AI development tools aren't getting anywhere near that result. The difference isn't the tool. It's the system built around it.
The Actual Numbers
100% adoption across all engineers, designers, product managers, and technical program managers. Everyone ships code now. The throughput doubled. The quality metrics held.
Brian Scanlan, Intercom's head of engineering, spoke about this in detail in a Lenny's Newsletter podcast this week. What he describes isn't a product endorsement. It's an operational transformation story.
The turning point came in December 2025, when Claude Opus 4.5 made a performance leap that changed the economics of AI-assisted development. Before that, the tools were interesting. After that, they were infrastructure. The distinction matters for understanding when to invest heavily and when to wait.
Intercom went all-in after that moment. Not every company should. But every company should know what signal they're waiting for before scaling up.
Why Most Companies Don't Get This Result
The companies I see trying AI development tools fall into two categories.
The first type buys subscriptions and tells engineers to figure it out. Usage is uneven. Results are anecdotal. The ROI conversation never gets resolved because nobody measured the baseline.
The second type builds a system. Intercom built a skills repository with enforced quality hooks. They built telemetry using Honeycomb and Snowflake to track skill usage, session data, and output quality across hundreds of engineers. They built accountability frameworks so adoption wasn't just permitted but expected.
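The details of Intercom's pipeline aren't public, but the shape of per-skill telemetry is simple to sketch. The event fields, skill name, and `sink` parameter below are illustrative assumptions, not Intercom's actual schema; a real deployment would ship each event to Honeycomb or Snowflake rather than stdout:

```python
import json
import time
import uuid

def emit_skill_event(skill_name, started_at, outcome, sink=print):
    """Emit one structured event per skill invocation.

    A real pipeline would forward this to an observability store
    (e.g. Honeycomb or Snowflake); `sink` stands in for that here.
    """
    event = {
        "event": "skill_invocation",
        "skill": skill_name,
        "session_id": str(uuid.uuid4()),       # one id per invocation
        "duration_s": round(time.time() - started_at, 3),
        "outcome": outcome,                    # e.g. "merged", "discarded", "retried"
    }
    sink(json.dumps(event))
    return event

start = time.time()
# ... run the skill's workflow here ...
emit_skill_event("flaky-spec-fix", start, "merged")
```

The point of instrumenting at the skill boundary is that usage, latency, and outcome become queryable per workflow, which is what makes the later ROI conversation resolvable.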
The "flaky spec skill" they developed is a specific example worth understanding. It achieved what Scanlan describes as "100x capability" through end-to-end workflow design. That number isn't about the AI. It's about how precisely they built the workflow around it. The difference between a generic AI prompt and a purpose-built skill is the same as the difference between a spreadsheet and purpose-built software. Both use the same underlying technology. One is infrastructure. The other is a workaround.
What This Changes About Engineering Hiring and Capacity
If you accept that a well-configured AI development environment can double throughput, the implications for team structure are real.
A team of 20 engineers with the right AI infrastructure produces output equivalent to 40 engineers under the old model. That doesn't mean you hire 20 fewer engineers. It means you redirect the remaining capacity toward harder problems. The org structure question becomes about what the doubling is used for, not whether it's real.
The organizational challenge Intercom had to solve first was permission. Most engineering cultures resist systematic AI adoption initially because it feels like surveillance or pressure. The companies that get this right are the ones where leadership treats it as infrastructure investment with clear benefits for the engineers, not just for the quarterly velocity metrics.
The engineers at Intercom who adopted this earliest aren't being watched more closely. They're shipping more work that matters and spending less time on the parts of their job they found most tedious.
Build the measurement infrastructure first. Without knowing your baseline throughput per engineer, you can't know whether your AI tools are working. Then identify one high-frequency, high-pain workflow and build a skill for it. Prove the result. Expand from there.
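The baseline metric itself is just merged PRs divided by engineer count over a fixed window. As a minimal sketch, assuming you can export PR records from your code host's API (the field names and sample data below are hypothetical):

```python
from collections import Counter
from datetime import date

# Hypothetical export: (author, merged_on) pairs pulled from your
# code host's API for the measurement window. Illustrative data only.
merged_prs = [
    ("alice", date(2025, 1, 10)),
    ("alice", date(2025, 1, 22)),
    ("bob", date(2025, 1, 15)),
]

def prs_per_engineer(prs):
    """Baseline throughput: merged PRs per engineer over the window."""
    counts = Counter(author for author, _ in prs)
    return sum(counts.values()) / len(counts)

baseline = prs_per_engineer(merged_prs)
print(baseline)  # 1.5 merged PRs per engineer in this toy sample
```

Capture this number before rolling out any tooling; the "2x" claim only means something relative to a window measured the same way afterward.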
The companies getting this right aren't moving faster on instinct. They're building systems that produce velocity as an output.
FAQ
How did Intercom achieve 2x engineering velocity with Claude Code?
They built an end-to-end system around the AI tools, not just subscriptions. Key elements included a custom skills repository with quality hooks, telemetry infrastructure using Honeycomb and Snowflake, and 100% adoption across all roles including designers and PMs. The result was doubled merged PRs per engineer over nine months while maintaining code quality standards.
What are Claude Code skills and why do they matter for productivity?
Skills in the Claude Code context are reusable AI-powered workflows built for specific tasks. Instead of each engineer prompting from scratch, skills encode best practices and quality standards into reusable processes. Intercom's "flaky spec skill" is one example. Building a skills repository turns ad-hoc AI usage into systematic productivity infrastructure.
How should engineering leaders start building AI development infrastructure?
Start with measurement. Build telemetry before you build skills. Without a baseline, you can't prove impact. Then identify one high-frequency, high-pain workflow and build a dedicated skill for it. Prove the result, then expand. The companies that scale AI development productivity fastest are the ones who invested in measurement infrastructure first.
