Two separate research releases dropped within 48 hours of each other this week. Read separately they look like routine SEO industry pieces. Read together they describe one structural problem: the playbook most marketing teams adopted in 2025 for measuring AI visibility was built on assumptions the evidence does not support.
Signal One: Schema Did Not Move AI Citations
Ahrefs published a controlled study this week that tested whether adding JSON-LD schema markup to pages improved their citation rates inside Google AI Overviews, Google AI Mode, and ChatGPT. The methodology was clean enough to take seriously. They examined 1,885 pages that added schema and matched each against three control pages from different domains that never adopted it, then measured citation changes across a 30-day window before and after the schema was added.
The results: citations from Google AI Overviews dropped 4.6%, Google AI Mode rose 2.2%, and ChatGPT rose 2.4%. All three shifts were statistically indistinguishable from random variation. Pages that already had structured data were three times more likely to be cited overall, but the controlled test showed that schema itself was not the causal factor: sites that invest in schema typically also invest in better content and earn more links. The schema correlates with the citation. It does not cause it.
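Ahrefs has not published its analysis code, so the sketch below is only an illustration of the shape of that test, not their method: a treated-versus-control comparison of citation changes with a permutation test, run here on fabricated per-page numbers.

```python
# Illustrative matched-control comparison, NOT Ahrefs' actual analysis.
# All citation counts here are fabricated stand-ins.
import numpy as np

rng = np.random.default_rng(0)

n_treated, n_control = 1885, 3 * 1885
# Fake per-page citation deltas (after minus before the schema change).
treated_delta = rng.poisson(3.0, n_treated) - rng.poisson(3.0, n_treated)
control_delta = rng.poisson(3.0, n_control) - rng.poisson(3.0, n_control)

observed = treated_delta.mean() - control_delta.mean()

# Permutation test: if schema had no effect, relabeling pages at random
# should produce gaps at least this large fairly often.
pooled = np.concatenate([treated_delta, control_delta])
gaps = []
for _ in range(2000):
    rng.shuffle(pooled)
    gaps.append(pooled[:n_treated].mean() - pooled[n_treated:].mean())
p_value = np.mean(np.abs(gaps) >= abs(observed))

print(f"diff-in-diff: {observed:+.3f} citations/page, permutation p = {p_value:.2f}")
# A p-value well above 0.05 is what "statistically indistinguishable from
# random variation" means in practice.
```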
A parallel searchVIU experiment quoted in the piece tested how five AI systems retrieve content in real time and found that all five extracted only visible HTML, ignoring JSON-LD, Microdata, and RDFa entirely. The implication is operational. If you have been allocating SEO time to schema additions because someone told you it improves AI citation, the cold data says you are spending against a placebo.
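The visible-HTML finding is easy to sanity-check on a page you control. A rough sketch, assuming the requests and beautifulsoup4 packages and a URL of your own (example.com is a placeholder): pull the page, collect its JSON-LD blocks, then compare against what a visible-text extraction keeps.

```python
# Rough check of visible-text extraction vs. JSON-LD. Illustrative only;
# the retrieval pipelines of the five tested AI systems are not public.
import json
import requests
from bs4 import BeautifulSoup

url = "https://example.com/some-page"   # substitute a page you control
html = requests.get(url, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

# JSON-LD lives inside <script type="application/ld+json"> blocks.
json_ld = []
for tag in soup.find_all("script", type="application/ld+json"):
    try:
        json_ld.append(json.loads(tag.string or ""))
    except json.JSONDecodeError:
        pass   # malformed blocks are common in the wild

# A visible-text extraction drops script/style content entirely, which is
# roughly what the tested retrievers appear to do.
for tag in soup(["script", "style", "noscript"]):
    tag.decompose()
visible_text = soup.get_text(separator=" ", strip=True)

print(f"JSON-LD blocks on the page: {len(json_ld)}")
print(f"visible text length: {len(visible_text)} characters")
# Anything that exists only in those JSON-LD blocks never reaches a
# retriever that works from the visible text.
```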
This does not mean schema is dead. Structured data still matters for traditional rich results and indexing efficiency. It does mean schema is not the AI search visibility lever your team has been told to pull.
Source: Schema Markup Didn't Move AI Citations In Ahrefs Test, Search Engine Journal.
Signal Two: Ranking Inside LLMs Is Not a Stable Metric
SparkToro published the recording from an Office Hours session this week with the cleanest articulation I have seen of what is actually measurable inside LLM-generated answers. The headline finding is that ranking, the position at which your brand appears inside an AI response, is structurally noisy and not worth optimizing against.
SparkToro ran the same prompt 100 times across ChatGPT, Claude, and Google AI. Each run produced a different list of brands. The math worked out to roughly 124 repetitions before the same two-brand combination appeared twice in the same order. Real users, of course, do not ask the same prompt 100 times. They ask different prompts entirely. The semantic similarity between how two users phrase the same underlying need averaged 0.08, which means the inputs hitting the model are almost completely different even when the intent is identical.
Rand Fishkin's argument, restated bluntly: ranking is a bunch of baloney. The metric that holds up under repeated measurement is visibility percentage, meaning how often your brand appears at all across a wide enough sample of user-realistic prompts. Position within the response is essentially random for most categories. Narrow categories like cloud computing showed more consistency. Broad categories like consumer products showed almost none.
From building this measurement infrastructure at GEOflux.ai, I can confirm the pattern at scale. Visibility percentage is stable across runs. Position is not. Teams that built their AI visibility dashboards around ranking are looking at noise dressed up as signal.
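The gap between the two metrics shows up even in a toy simulation. The sketch below uses invented brands and mention probabilities, not GEOflux.ai or SparkToro data; it only mimics the qualitative pattern of a brand that appears in most answers but at an arbitrary position.

```python
# Toy simulation: visibility % is stable, position is not.
# Brands and probabilities are invented for illustration.
import random

random.seed(7)
BRANDS = [f"brand_{i}" for i in range(12)]
MENTION_PROB = dict(zip(BRANDS, [0.75, 0.6, 0.5] + [0.25] * 9))

def fake_answer():
    """One simulated response: which brands it names, and in what order."""
    named = [b for b in BRANDS if random.random() < MENTION_PROB[b]]
    random.shuffle(named)   # ordering treated as arbitrary, per the finding
    return named

runs = [fake_answer() for _ in range(100)]

visibility = sum("brand_0" in r for r in runs) / len(runs)
positions = [r.index("brand_0") + 1 for r in runs if "brand_0" in r]

print(f"visibility: {visibility:.0%}")                      # hovers near 75%
print(f"positions seen across 100 runs: {sorted(set(positions))}")
print(f"position in the most recent run: {positions[-1]}")  # jumps run to run
```

A dashboard that reports the last line is reporting noise; a dashboard that reports the first line is reporting something that replicates.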
Source: Office Hours: Can You Actually Track AI Visibility?, SparkToro.
What These Two Signals Mean Together
Read separately, the two pieces look unrelated. Read together they identify the same gap. The AI visibility tooling category, which sprouted in 2024 and 2025 to give marketing teams something to point at, mostly imported metrics from the traditional SEO playbook without testing whether those metrics survive contact with how LLMs actually retrieve, rank, and present sources.
The operational fix is straightforward. Stop measuring schema additions as an AI visibility lever. Stop reporting ranking position inside LLM answers as a metric. Start measuring visibility percentage across a representative prompt set sized for your category. Track how often your brand appears, not where it appears when it does.
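A minimal sketch of that reporting loop follows. The ask_model() function is a placeholder for whatever client you use to query an LLM, and the prompts and brand names are hypothetical; the point is simply counting how often each brand appears at all across repeated, varied prompts.

```python
# Minimal visibility-percentage report. ask_model() is a placeholder;
# wire it to your own LLM client. Prompts and brands are hypothetical.
from collections import defaultdict

PROMPTS = [
    "best project management tools for small agencies",
    "which project management software should a 10-person team use",
    "alternatives to spreadsheets for tracking client projects",
    # ... a representative set sized for your category
]
BRANDS = ["YourBrand", "Competitor A", "Competitor B"]

def ask_model(prompt: str) -> str:
    """Placeholder: return the model's answer text for one prompt."""
    raise NotImplementedError("connect this to your LLM client")

def visibility_report(prompts, brands, runs_per_prompt=3):
    hits = defaultdict(int)
    total = 0
    for prompt in prompts:
        for _ in range(runs_per_prompt):    # repeat runs to smooth noise
            answer = ask_model(prompt).lower()
            total += 1
            for brand in brands:
                if brand.lower() in answer:
                    hits[brand] += 1
    # Visibility %: share of responses in which the brand appears at all.
    return {brand: hits[brand] / total for brand in brands}

# report = visibility_report(PROMPTS, BRANDS)
# print(report)   # e.g. {"YourBrand": 0.41, "Competitor A": 0.78, ...}
```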
The teams that adjust their AI visibility reporting in the next quarter will be measuring something real. The teams that keep optimizing for schema and ranking will be measuring something that does not move.
