Edition #1

The Numbers AI Vendors Use Are Wrong

Dan Toma·April 2, 2026·4 min read
Key Takeaway

The benchmark scores that AI vendors publish measure isolated task performance. They don't measure what happens when a human and an AI work together on your actual problem, in your actual context.


FAQ

Are any AI benchmarks actually useful for business decisions?

Some benchmarks are more useful than others. Domain-specific benchmarks that test performance on tasks close to your actual use case are meaningfully better than general capability scores. The issue is that the most commonly cited benchmarks are the general ones. Always ask vendors whether they have benchmarks specific to your use case or industry.

How long does a proper AI evaluation take?

A minimum viable evaluation for a business use case typically takes two to four weeks if done rigorously. That includes defining the evaluation tasks, running the tests, gathering feedback from the people who will actually use the tool, and analyzing the results. Rushing this process is one of the most common causes of failed AI deployments.

What questions should I ask an AI vendor about their benchmarks?

Ask specifically: which benchmark were these scores measured on, when was the evaluation conducted, who conducted it, and whether they have any independent evaluations from customers in similar industries. Vendors who have done solid work will have good answers. Those who haven't will deflect to the chart.

Subscribe to The Weekly Vibe

Every Tuesday. Five to seven original takes on what matters in AI, marketing, and business growth. No spam, no fluff, unsubscribe anytime.