AEO · 2026-03-28 · 8 min · By Michael Saad

Measuring AEO visibility across four platforms, weekly.

A client asked us if they were showing up in AI search. We didn't have a straight answer. Nobody did. So we built a methodology: query sets, four-dimension scoring across ChatGPT, Perplexity, Google AI Overviews, and Claude, weekly cadence. Then we ran it on real accounts. Here's what came out.

A client asked us something we couldn't answer.

"Are we showing up in AI search?"

Simple question. Reasonable question. The kind of question a business owner in 2026 should be able to get a straight answer to from their marketing agency.

We did not have a straight answer. Nobody did. There was no standard methodology, no established scoring framework, no consensus on what "showing up" even meant across four platforms that each worked differently, weighted sources differently, and updated on completely different schedules.

So we built one.

Why the old tools don't work here

Traditional search visibility is measurable with reasonable precision because it is deterministic. Query a keyword. Record a position. Run it again tomorrow and you get the same result.

AI answer engines are not deterministic. Run the same query twice on ChatGPT in the same session and you get two different responses. Perplexity pulls different sources than it did yesterday because its index updated overnight. Google AI Overviews varies its response based on query phrasing in ways that are not fully predictable even with identical intent. Claude synthesizes from different angles depending on how the question is framed.

This means the entire framework of traditional SEO tracking (keyword ranking, position monitoring, SERP snapshots) does not apply. You cannot rank-track your way into understanding AI visibility because there is no stable rank to track. The question is not where you appear. The question is whether you appear, how often, in what context, and whether the citation is working for you or against you.

None of the tools that existed when that client asked us the question were built to answer it. We were measuring a new channel with instruments designed for a different one.

What we built instead

The methodology starts with a query set, not a single query.

For each client we define fifteen to twenty-five queries across three categories. Brand queries are the client's name, branded service names, and variations on how someone would search for them directly. Category queries are non-branded searches for the services the client offers in their primary markets. Problem queries are the questions a prospective client asks before they know which brand to call. "How do I find a [service type] in [city]." "What should I look for in a [provider]." These are the queries that happen at the moment someone is forming an opinion about who to hire.
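To make that concrete, here is a rough sketch, in Python, of how a query set might be laid out. The brand, city, and queries below are placeholders, not a real client's set.

```python
# Illustrative query set for one client: fifteen to twenty-five queries
# across three categories. Every name and query here is a placeholder.
QUERY_SET = {
    "brand": [
        "Acme Plumbing reviews",
        "Acme Plumbing emergency plumbing service",
    ],
    "category": [
        "best plumber in Springfield",
        "water heater installation Springfield",
    ],
    "problem": [
        "how do I find a licensed plumber in Springfield",
        "what should I look for in a plumbing contractor",
    ],
}
```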

Every query runs across all four platforms in the same weekly session: ChatGPT, Perplexity, Google AI Overviews, and Claude. For each platform we record whether the client was cited, what language was used if they were, where in the response the citation appeared, and whether the framing was positive, neutral, or qualified.
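And a sketch of what gets logged for each query on each platform. The field names are ours for illustration; the four things recorded are the ones just described: whether the brand was cited, the language used, where it appeared, and how it was framed.

```python
from dataclasses import dataclass
from typing import Optional

PLATFORMS = ["chatgpt", "perplexity", "google_ai_overviews", "claude"]

@dataclass
class CitationRecord:
    query: str
    category: str           # "brand", "category", or "problem"
    platform: str           # one of PLATFORMS
    cited: bool             # did the brand appear in the response at all
    citation_text: str      # exact language used, empty if not cited
    position: str           # where in the response it appeared, e.g. "lead", "list", "aside"
    framing: Optional[str]  # "positive", "neutral", or "qualified"; None if not cited
```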

We do not run one query. We run the full set. The variance across fifteen to twenty-five queries on the same platform in the same session is the signal. A single query from a non-deterministic system is noise.

The scoring model

One hundred points per platform per week across four dimensions.

Citation rate accounts for forty points: the percentage of queries in the set that produced a brand citation. Quality score accounts for thirty: how the citation appeared, whether it was decision-driving or decorative. Query distribution accounts for twenty: whether citations are concentrated in brand queries only or spread across category and problem queries. Brand-only citations mean name recognition. Problem-query citations mean top-of-funnel presence. Trend accounts for ten: week-over-week movement, because a brand improving from 60 to 65 is doing something different than one static at 65 for two months.
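In code form, the weighting looks roughly like this. The citation rate falls straight out of the logged records; the quality, distribution, and trend inputs are shown as already-normalized 0-to-1 values because their full rubrics are more involved than this sketch.

```python
def citation_rate(records) -> float:
    """Share of the query set that produced a brand citation (0.0-1.0)."""
    return sum(1 for r in records if r.cited) / len(records)

def platform_score(citation_rate: float, quality: float,
                   distribution: float, trend: float) -> float:
    """Weekly score for one platform, out of 100.

    Inputs are normalized to 0.0-1.0 and weighted 40/30/20/10:
    citation rate, citation quality, query distribution, week-over-week trend.
    """
    return 40 * citation_rate + 30 * quality + 20 * distribution + 10 * trend

def composite(platform_scores: dict[str, float]) -> float:
    """Composite across platforms, reported alongside the per-platform
    scores, never instead of them."""
    return sum(platform_scores.values()) / len(platform_scores)
```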

We track all four platforms separately. We report a composite. We never collapse them into one number because the platforms are not equivalent and the optimization levers for each are not the same. A brand invisible on ChatGPT but well-cited on Perplexity has a specific problem with a specific fix. Averaging those into one score buries the diagnosis.

The Univision baseline

Univision Computers in DeLand, Florida, sells and services technology. We established their AEO baseline in Week 1 of the SEO Foundation engagement.

Composite score: 22 out of 100.

ChatGPT: zero citations across twenty queries. Zero. No presence in any query category. Perplexity: two citations, both on direct brand queries, none in category or problem searches. Google AI Overviews: three mentions, all in list-format responses with no differentiation from competitors. Claude: one citation in a geographic services query.

This is a normal baseline for a local service business that has not optimized for AI visibility. It is not a failure. It is a starting point with a clear diagnosis.

The corrective work is direct: structured local data, NAP consistency across every source AI models pull from, review volume and recency, cited presence on the directories and platforms that AI models treat as authoritative, and content that directly answers the problem-query layer. FAQ content that answers the questions a prospective customer asks before they know who to call.
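For the FAQ layer specifically, one common way to make that content machine-readable is schema.org FAQPage markup. A minimal example, built as a Python dict for illustration; the question and answer are placeholders, not Univision's actual content.

```python
import json

# Minimal schema.org FAQPage payload targeting a problem query.
# Question and answer text are placeholders, not real client content.
faq_jsonld = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [{
        "@type": "Question",
        "name": "How do I find a reliable computer repair shop near me?",
        "acceptedAnswer": {
            "@type": "Answer",
            "text": "Look for a shop with a verifiable local address, recent reviews, "
                    "and clear diagnostic pricing before you hand over the machine.",
        },
    }],
}

print(json.dumps(faq_jsonld, indent=2))
```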

Priority actions for Univision: DeLand citations completed, Azure URL redirect resolved to consolidate crawlable content, structured FAQ content targeting problem queries. First meaningful movement in Perplexity and AI Overviews expected by Week 8. ChatGPT has the longest index lag. Local brands typically see first citations in the three to five month range.

We will publish the Week 8 data here.

What we learned building this

The variance finding was the most important thing that came out of the first several weeks of running this methodology across multiple accounts.

For a brand scoring in the 20-35 range, any given query, run repeatedly, returns a citation roughly two to four times out of ten. Running that one query once and getting a no-citation result tells you nothing that wasn't already in the base rate. The only way to distinguish a real visibility gain from natural response variance is to run enough queries in the same session to produce a meaningful sample.
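A quick back-of-envelope illustration, treating citations as independent trials at a 30% base rate, which is a simplification of how these platforms actually behave:

```python
from math import comb

def prob_at_most(k: int, n: int, p: float) -> float:
    """P(at most k citations across n queries), under a binomial model."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

base_rate = 0.30
print(prob_at_most(0, 1, base_rate))   # ~0.70: one query often shows nothing, even for a cited brand
print(prob_at_most(2, 20, base_rate))  # ~0.04: two or fewer citations across 20 queries is a real signal
```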

The platform independence finding was the second important thing. A brand visible in Perplexity at 40% citation rate may score 15% on ChatGPT for the same queries. This happens because the retrieval architectures are different, the source weighting is different, and the types of content each platform treats as authoritative are different. Treating them as one measurement is like treating organic search and paid search as one channel. They are related. They are not the same thing.

The first-mover implication is the finding that matters most for clients who are asking the question late.

Citation patterns, once established, are difficult to displace. An AI model that has learned to cite your competitor for a category of queries will continue citing them unless something changes the underlying signals: their authority weakens, your authority grows, or the model is significantly updated. The window for establishing first-mover citation patterns in most local and regional markets is open now. It will not stay open indefinitely.

Where this is going

Running this methodology manually takes approximately forty-five minutes per client per session. We have templated the logging and scoring to reduce the friction. It is still a workflow problem at scale.

We are building the automated version. A query pipeline that runs the full query set against all four platforms, scores each response against the framework, and delivers a structured report. Not a dashboard to log into. A scored PDF that arrives and tells you exactly where you stand, what changed since last week, and what is worth doing about it.
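Structurally, the pipeline is not complicated. A stripped-down sketch of the weekly run, with the platform calls stubbed out because the real adapters and scoring rubrics are the part still being built:

```python
def ask(platform: str, query: str) -> str:
    # Stub. The real version calls each platform's API or drives a session,
    # then returns the raw answer text.
    return f"[{platform}] placeholder answer for: {query}"

def run_week(query_set: dict[str, list[str]], platforms: list[str], brand: str) -> list[dict]:
    """Run the full query set against every platform and log one record per response."""
    records = []
    for platform in platforms:
        for category, queries in query_set.items():
            for query in queries:
                answer = ask(platform, query)
                records.append({
                    "platform": platform,
                    "category": category,
                    "query": query,
                    # Crude presence check; the real scoring reads context, position, and framing.
                    "cited": brand.lower() in answer.lower(),
                    "answer": answer,
                })
    return records
```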

The methodology is the hard part. The tooling to scale it is the natural next step. We validated the methodology on real accounts with real baselines. The automation comes after the methodology earns it.

That is the sequence that produces something worth building. Not the other way around.

Want to apply this?

Run an AEO Scan against your own stack.

Free written read of your visibility across ChatGPT, Claude, Perplexity, and Google AIO in 24 hours. Same diagnostic we run on every new engagement.