Two separate research releases dropped within 48 hours of each other this week. Read separately they look like routine SEO industry pieces. Read together they describe one structural problem: the playbook most marketing teams adopted in 2025 for measuring AI visibility was built on assumptions the evidence does not support.
Signal One: Schema Did Not Move AI Citations
Ahrefs published a controlled study this week that tested whether adding JSON-LD schema markup to pages improved their citation rates inside Google AI Overviews, Google AI Mode, and ChatGPT. The methodology was clean enough to take seriously. They examined 1,885 pages that added schema and matched each against three control pages from different domains that never adopted it, then measured citation changes across a 30-day window before and after the schema was added.
The results: citations from Google AI Overviews dropped 4.6%, Google AI Mode rose 2.2%, and ChatGPT rose 2.4%. All three shifts were statistically indistinguishable from random variation. Pages that already had structured data were three times more likely to be cited overall, but the controlled test showed that schema itself was not the causal factor: sites that invest in schema typically also invest in better content and earn more links. The schema correlates with the citation. It does not cause it.
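Ahrefs has not published its analysis code, so the sketch below is only an illustration of the shape of that test, not their method: a treated-versus-control comparison of citation changes with a permutation test, run here on fabricated per-page numbers.

```python
# Illustrative matched-control comparison, NOT Ahrefs' actual analysis.
# All citation counts here are fabricated stand-ins.
import numpy as np

rng = np.random.default_rng(0)

n_treated, n_control = 1885, 3 * 1885
# Fake per-page citation deltas (after minus before the schema change).
treated_delta = rng.poisson(3.0, n_treated) - rng.poisson(3.0, n_treated)
control_delta = rng.poisson(3.0, n_control) - rng.poisson(3.0, n_control)

observed = treated_delta.mean() - control_delta.mean()

# Permutation test: if schema had no effect, relabeling pages at random
# should produce gaps at least this large fairly often.
pooled = np.concatenate([treated_delta, control_delta])
gaps = []
for _ in range(2000):
    rng.shuffle(pooled)
    gaps.append(pooled[:n_treated].mean() - pooled[n_treated:].mean())
p_value = np.mean(np.abs(gaps) >= abs(observed))

print(f"diff-in-diff: {observed:+.3f} citations/page, permutation p = {p_value:.2f}")
# A p-value well above 0.05 is what "statistically indistinguishable from
# random variation" means in practice.
```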
A parallel searchVIU experiment quoted in the piece tested how five AI systems retrieve content in real time and found that all five extracted only visible HTML, ignoring JSON-LD, Microdata, and RDFa entirely. The implication is operational. If you have been allocating SEO time to schema additions because someone told you it improves AI citation, the cold data says you are spending against a placebo.
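The visible-HTML finding is easy to sanity-check on a page you control. A rough sketch, assuming the requests and beautifulsoup4 packages and a URL of your own (example.com is a placeholder): pull the page, collect its JSON-LD blocks, then compare against what a visible-text extraction keeps.

```python
# Rough check of visible-text extraction vs. JSON-LD. Illustrative only;
# the retrieval pipelines of the five tested AI systems are not public.
import json
import requests
from bs4 import BeautifulSoup

url = "https://example.com/some-page"   # substitute a page you control
html = requests.get(url, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

# JSON-LD lives inside <script type="application/ld+json"> blocks.
json_ld = []
for tag in soup.find_all("script", type="application/ld+json"):
    try:
        json_ld.append(json.loads(tag.string or ""))
    except json.JSONDecodeError:
        pass   # malformed blocks are common in the wild

# A visible-text extraction drops script/style content entirely, which is
# roughly what the tested retrievers appear to do.
for tag in soup(["script", "style", "noscript"]):
    tag.decompose()
visible_text = soup.get_text(separator=" ", strip=True)

print(f"JSON-LD blocks on the page: {len(json_ld)}")
print(f"visible text length: {len(visible_text)} characters")
# Anything that exists only in those JSON-LD blocks never reaches a
# retriever that works from the visible text.
```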
This does not mean schema is dead. Structured data still matters for traditional rich results and indexing efficiency. It does mean schema is not the AI search visibility lever your team has been told to pull.
Source: Schema Markup Didn't Move AI Citations In Ahrefs Test, Search Engine Journal.
Signal Two: Ranking Inside LLMs Is Not a Stable Metric
SparkToro published the recording from an Office Hours session this week with the cleanest articulation I have seen of what is actually measurable inside LLM-generated answers. The headline finding is that ranking, the position at which your brand appears inside an AI response, is structurally noisy and not worth optimizing against.
SparkToro ran the same prompt 100 times across ChatGPT, Claude, and Google AI. Each run produced a different list of brands. The math worked out to roughly 124 repetitions before the same two-brand combination appeared twice in the same order. Real users, of course, do not ask the same prompt 100 times. They ask different prompts entirely. The semantic similarity between how two users phrase the same underlying need averaged 0.08, which means the inputs hitting the model are almost completely different even when the intent is identical.
Rand Fishkin's argument, restated bluntly: ranking is a bunch of baloney. The metric that holds up under repeated measurement is visibility percentage, meaning how often your brand appears at all across a wide enough sample of user-realistic prompts. Position within the response is essentially random for most categories. Narrow categories like cloud computing showed more consistency. Broad categories like consumer products showed almost none.
From building this measurement infrastructure at GEOflux.ai, I can confirm the pattern at scale. Visibility percentage is stable across runs. Position is not. Teams that built their AI visibility dashboards around ranking are looking at noise dressed up as signal.
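The gap between the two metrics shows up even in a toy simulation. The sketch below uses invented brands and mention probabilities, not GEOflux.ai or SparkToro data; it only mimics the qualitative pattern of a brand that appears in most answers but at an arbitrary position.

```python
# Toy simulation: visibility % is stable, position is not.
# Brands and probabilities are invented for illustration.
import random

random.seed(7)
BRANDS = [f"brand_{i}" for i in range(12)]
MENTION_PROB = dict(zip(BRANDS, [0.75, 0.6, 0.5] + [0.25] * 9))

def fake_answer():
    """One simulated response: which brands it names, and in what order."""
    named = [b for b in BRANDS if random.random() < MENTION_PROB[b]]
    random.shuffle(named)   # ordering treated as arbitrary, per the finding
    return named

runs = [fake_answer() for _ in range(100)]

visibility = sum("brand_0" in r for r in runs) / len(runs)
positions = [r.index("brand_0") + 1 for r in runs if "brand_0" in r]

print(f"visibility: {visibility:.0%}")                      # hovers near 75%
print(f"positions seen across 100 runs: {sorted(set(positions))}")
print(f"position in the most recent run: {positions[-1]}")  # jumps run to run
```

A dashboard that reports the last line is reporting noise; a dashboard that reports the first line is reporting something that replicates.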
Source: Office Hours: Can You Actually Track AI Visibility?, SparkToro.
What These Two Signals Mean Together
Read separately, the two pieces look unrelated. Read together they identify the same gap. The AI visibility tooling category, which sprouted in 2024 and 2025 to give marketing teams something to point at, mostly imported metrics from the traditional SEO playbook without testing whether those metrics survive contact with how LLMs actually retrieve, rank, and present sources.
The operational fix is straightforward. Stop measuring schema additions as an AI visibility lever. Stop reporting ranking position inside LLM answers as a metric. Start measuring visibility percentage across a representative prompt set sized for your category. Track how often your brand appears, not where it appears when it does.
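A minimal sketch of that reporting loop follows. The ask_model() function is a placeholder for whatever client you use to query an LLM, and the prompts and brand names are hypothetical; the point is simply counting how often each brand appears at all across repeated, varied prompts.

```python
# Minimal visibility-percentage report. ask_model() is a placeholder;
# wire it to your own LLM client. Prompts and brands are hypothetical.
from collections import defaultdict

PROMPTS = [
    "best project management tools for small agencies",
    "which project management software should a 10-person team use",
    "alternatives to spreadsheets for tracking client projects",
    # ... a representative set sized for your category
]
BRANDS = ["YourBrand", "Competitor A", "Competitor B"]

def ask_model(prompt: str) -> str:
    """Placeholder: return the model's answer text for one prompt."""
    raise NotImplementedError("connect this to your LLM client")

def visibility_report(prompts, brands, runs_per_prompt=3):
    hits = defaultdict(int)
    total = 0
    for prompt in prompts:
        for _ in range(runs_per_prompt):    # repeat runs to smooth noise
            answer = ask_model(prompt).lower()
            total += 1
            for brand in brands:
                if brand.lower() in answer:
                    hits[brand] += 1
    # Visibility %: share of responses in which the brand appears at all.
    return {brand: hits[brand] / total for brand in brands}

# report = visibility_report(PROMPTS, BRANDS)
# print(report)   # e.g. {"YourBrand": 0.41, "Competitor A": 0.78, ...}
```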
The teams that adjust their AI visibility reporting in the next quarter will be measuring something real. The teams that keep optimizing for schema and ranking will be measuring something that does not move.
