There are now two well-instrumented studies that tell you what shape of content AI engines pick up. Both are primary research with stated methodology — not vendor blog handwaving.
This post compresses them into a checklist.
Study 1 — Princeton GEO (KDD 2024)
Aggarwal et al. ran a 10,000-query benchmark across a Perplexity-style retrieval engine and measured the Position-Adjusted Word Count lift from nine different content interventions. Citation: arXiv:2311.09735.
The headline result, verbatim from the paper (numbers are maximum lift observed on the highest-responsive query subset, not averages — the paper is explicit that effects vary by domain and position):
| Intervention | Position-Adjusted Word Count lift |
|---|---|
| Cite sources | up to +40% |
| Quotation addition | up to +40% |
| Statistics addition | up to +40% |
| Fluency optimization + Statistics addition (combined) | +5.5% over any single method |
| Cite sources on a position-5 result | +115% |
| Keyword stuffing (control) | negative — actively decreased visibility |
Translation: the three highest-leverage moves are (1) cite primary sources inline, (2) drop direct quotes from named authorities, and (3) include verifiable statistics. The marginal lift drops fast: adding all three doesn't give you +120%, it gives you ~+45%. Pick the right one per page. Median lift across all queries is significantly lower than the +40% headline, so treat the table as a ceiling, not an expected value.
Domain-conditional findings:
- Statistics addition dominates on Law/Government and Opinion-type queries.
- Quotation addition dominates on People & Society, Explanation, and History.
- Cite sources is the highest-leverage move for lower-ranked pages — if you're at position 4-6 on Google for the underlying query, this is the +115% intervention.
Keyword stuffing was the only intervention with a negative Subjective Impression score. Old-school SEO over-optimization actively hurts in generative engines.
Study 2 — Profound, 30M ChatGPT citations (Sep 2025)
Profound analyzed 3M ChatGPT responses and 30M citations, 18,012 of which they verified by hand (the study is linked under Sources). The hand-verified subset is 0.06% of the corpus, so this is directional signal, not a controlled experiment, and the cohort skews toward Profound's own customer prompts. Treat the percentages below as patterns we've also seen in our own audit logs, not as a peer-reviewed measurement.
The unintuitive findings, verbatim:
- 44.2% of citations come from the first 30% of a document. They call this the "ski ramp" — steep drop after the first third, long tail to the bottom.
- 53% of citations come from the middle of paragraphs. Only 24.5% from first sentences and 22.5% from last sentences. Front-loading every key insight to the opening sentence is wrong. Uniform information density beats positional gymnastics.
- Proper-noun density of cited text averages 20.6%, vs typical English at 5-8%. That is roughly a 2.6-4.1× concentration of named entities (brands, people, places, products).
- 78.4% of citations tied to questions came from headings in question form (H2/H3 as questions).
- Cited content was 2× more likely to contain a question mark anywhere.
- Flesch-Kincaid grade level of cited content averaged 16 vs 19.1 for non-cited. Plainer prose wins, not denser academic prose.
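
To make the reading-grade finding concrete, here is a minimal sketch of the standard Flesch-Kincaid grade formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The syllable counter is a rough vowel-group heuristic and the sentence splitter is naive; both are assumptions for illustration, not what Profound used, so expect scores to differ somewhat from dedicated readability tools.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, discount a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level of `text`; returns 0.0 for empty input."""
    # Naive sentence split on terminal punctuation; abbreviations will be miscounted.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59
```

Run it per section rather than per page if you want to see which passages push the average up; short sentences with short words pull the grade down, long nominalized sentences push it up.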
Translating the data into a checklist
If you want one paragraph that touches every variable both studies converge on:
Open with a named entity. Drop a statistic with its source link. Use a question as your next H2. Keep paragraphs uniform in density — don't dump everything in the first sentence. Aim for grade-16 prose, not grade-19.
Concretely, here's the checklist we run pages through inside our audit:
- Front 30% audit — does the first third of the page contain the named entities, statistics, and quotes you want cited? If they're in the conclusion, move them up.
- Proper-noun density — count proper nouns / total tokens. Below 12% is under-optimized. Above 25% reads like a press release.
- Question-form headings — at least one H2 should be a question. Aim for 30-50% of H2/H3 in question form.
- Statistic density — at least one verifiable, source-linked number per 500 words.
- Reading grade — target Flesch-Kincaid 14-18. Higher than 19 hurts you in retrieval reranking.
- Citation links — inline `<a>` tags to primary sources, not vague "studies show" phrasing.
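
Several of the checklist items above reduce to simple ratios, which can be sketched as follows. This is an illustrative approximation, not our citability analyzer: the function names are hypothetical, proper nouns are approximated as capitalized tokens that are not sentence-initial (a real audit would use a part-of-speech tagger), and "statistic" here means any numeric token, with no check that it is source-linked.

```python
import re

WORD_RE = re.compile(r"[A-Za-z][A-Za-z'-]*")
NUMBER_RE = re.compile(r"\d[\d,.]*%?")


def proper_noun_density(text: str) -> float:
    """Share of word tokens that are capitalized and not sentence-initial (crude proxy)."""
    total = 0
    proper = 0
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = WORD_RE.findall(sentence)
        total += len(tokens)
        # Skip the first token: it is capitalized regardless of word class.
        proper += sum(1 for t in tokens[1:] if t[0].isupper())
    return proper / total if total else 0.0


def question_heading_ratio(headings: list[str]) -> float:
    """Fraction of headings that end with a question mark."""
    if not headings:
        return 0.0
    return sum(h.strip().endswith("?") for h in headings) / len(headings)


def stats_per_500_words(text: str) -> float:
    """Numeric tokens per 500 whitespace-separated words."""
    words = len(text.split())
    return 500 * len(NUMBER_RE.findall(text)) / words if words else 0.0


def front_share(text: str, fraction: float = 0.3) -> float:
    """Share of number-bearing words that fall in the first `fraction` of the text."""
    words = text.split()
    hits = [i for i, w in enumerate(words) if NUMBER_RE.search(w)]
    if not hits:
        return 0.0
    cutoff = len(words) * fraction
    return sum(i < cutoff for i in hits) / len(hits)


def audit(text: str, headings: list[str]) -> dict[str, bool]:
    """Apply the thresholds from the checklist above."""
    density = proper_noun_density(text)
    return {
        "proper_noun_density_ok": 0.12 <= density <= 0.25,
        "question_headings_ok": question_heading_ratio(headings) >= 0.30,
        "statistic_density_ok": stats_per_500_words(text) >= 1.0,
    }
```

Because sentence-initial tokens are skipped, the density figure undercounts proper nouns that open a sentence, so treat results near the 12% floor as borderline rather than failing.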
What does not move citations (despite vendor claims)
A pattern across 2026 vendor blogs is to claim 1.4-3.2× lifts for "freshness," "FAQ schema," and "long-form content." Those numbers don't appear in primary research:
- "Long-form wins" — Ahrefs' Dec 2025 study found 53.4% of AI Overview citations go to pages under 1,000 words. Average cited length: 1,282 words. Length is not the lever; density and structure are.
- "FAQ schema lifts citations 44%" — no published methodology. Empty schema can actively hurt (our `detectHtmlMismatches()` audit catches this).
- "Freshness × 3.2" — fresh content does help, but the multiplier is closer to 1.2-1.4 in AI Overviews per the Ahrefs Q1 2026 data, not 3.2.
Optimize for what the primary studies measured. Skip claims with no methodology link.
Sources
- Princeton GEO paper (KDD 2024) — https://arxiv.org/abs/2311.09735
- ACM record — https://dl.acm.org/doi/10.1145/3637528.3671900
- Profound 30M citation study — https://www.tryprofound.com/blog/ai-platform-citation-patterns
- Ahrefs short vs long content — https://ahrefs.com/blog/short-vs-long-content-in-ai-overviews/
Want the checklist applied to your pages? Run a free audit — our citability analyzer scores the front-30% concentration, proper-noun density, question-heading ratio, and reading grade for every crawled page.