PDF Accessibility Guide: Tags & PDF/UA

PDFs are the quiet accessibility problem inside almost every organization. Websites get audited, redesigned, and tested with screen readers — but the annual report, the policy document, the benefit statement, and the application form that live behind a download link are too often shipped exactly as they came out of the export dialog. To a sighted reader they look polished. To someone using a screen reader, a magnifier, or keyboard-only navigation, the very same file can be an impenetrable wall: no headings to jump between, images with no description, tables that read as a meaningless stream of numbers, and form fields that cannot be filled in at all.

This guide explains why PDFs are so frequently inaccessible and what actually makes one usable by assistive technology. It covers the structural building blocks — tags, reading order, alternative text, tables, forms, and metadata — and the standards that govern them: WCAG 2.2 and PDF/UA, the ISO 14289 specification for accessible tagged PDFs. Throughout, the goal is the one QualiBooth applies to every document we touch: a file that works in practice, confirmed with real assistive technology, not just blessed by an automated checker.

Why PDFs are so often inaccessible

A PDF is, at heart, a description of how to paint marks on a page. The format was designed to preserve visual fidelity — to make a document look identical on any screen or printer. That design goal is exactly what makes accessibility hard. Visual fidelity tells you nothing about meaning. A line of 18-point bold text looks like a heading to a human eye, but unless the file explicitly records “this is a heading,” assistive technology has no way to know it is anything other than some larger glyphs.

Many PDFs in circulation are untagged. They contain the visual content but none of the underlying structure: no information about what is a heading, a paragraph, a list, a table, or an image. A screen reader confronted with an untagged PDF either refuses to read it meaningfully or falls back on guesswork, inferring a reading order from the position of marks on the page. The results range from awkward to unusable: a two-column newsletter read straight across both columns, a caption read before the paragraph it belongs to, or footnotes interrupting the middle of a sentence.

Several common production habits make things worse:

Scanned documents. A scan is just an image of a page. Without optical character recognition (OCR), there is no real text at all — nothing to read, search, or select.
Exports that drop structure. Many “Save as PDF” and “Print to PDF” paths discard the heading and list structure that existed in the source document.
Design-tool layouts. Files built in page-layout software can have visually correct pages whose underlying object order bears no relation to the intended reading sequence.
Decorative clutter. Background images, rules, and ornaments get exposed to assistive technology and announced as if they carried meaning.

None of this is visible on screen, which is precisely why the problem persists. The fix is to add the structural layer the format leaves optional — the work of PDF remediation.

Tags and document structure

Tags are the foundation of an accessible PDF. A tagged PDF carries a hidden hierarchy — the structure tree — that sits alongside the visual content and describes what each piece of the page actually is. This is directly analogous to the semantic HTML behind a well-built web page: where HTML uses <h1>, <p>, <ul>, and <table>, a tagged PDF uses structure elements such as <H1>, <P>, <L> (list), and <Table>.

The tag tree is what gives assistive technology something to navigate. With it in place, a screen reader can do the things its users rely on:

Jump by heading. Users move through a long document heading to heading rather than listening to every word in sequence. This requires real heading tags (<H1> through <H6>) applied in a logical, nested order — never skipping levels, never faking a heading by bolding a paragraph.
Understand lists. A <L> tag with its <LI> items tells the screen reader “this is a list of five items,” so the user knows where they are and how much remains.
Distinguish content from decoration. Genuine content gets tagged; purely decorative marks are designated as artifacts so they are skipped entirely.

A correct, logically nested heading structure is the single highest-impact thing you can get right in a PDF, because it transforms a linear listening experience into a navigable one. Getting it wrong — or omitting it — is one of the common accessibility issues that surfaces again and again in document audits.
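To make the nesting rule concrete, here is a minimal Python sketch of the kind of check an audit script might run over a tag tree. The Node class is a hypothetical, simplified stand-in for a real structure tree (it is not the API of any PDF library), and the only rule it applies is the one described above: heading levels should not skip.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical, simplified structure element (not a real PDF library type)."""
    tag: str                     # e.g. "Document", "H1", "P", "L"
    text: str = ""
    children: list = field(default_factory=list)


def heading_outline(node):
    """Walk the tree in document order and collect (level, text) for H1..H6."""
    found = []
    if len(node.tag) == 2 and node.tag[0] == "H" and node.tag[1] in "123456":
        found.append((int(node.tag[1]), node.text))
    for child in node.children:
        found.extend(heading_outline(child))
    return found


def skipped_levels(outline):
    """Return a message for every heading that jumps more than one level deeper."""
    problems = []
    previous = 0  # 0 means "no heading seen yet", so the first heading should be H1
    for level, text in outline:
        if level > previous + 1:
            if previous == 0:
                problems.append(f"H{level} '{text}' is the first heading")
            else:
                problems.append(f"H{level} '{text}' follows H{previous}")
        previous = level
    return problems


doc = Node("Document", children=[
    Node("H1", "Annual Report"),
    Node("P", "Introductory paragraph."),
    Node("H3", "Revenue"),          # skips H2
    Node("H2", "Outlook"),
])

outline = heading_outline(doc)
problems = skipped_levels(outline)
print(problems)
```

A check like this only confirms that levels are contiguous. Whether "Revenue" deserves to be a heading at all, and at which level, is still a judgement for a human reviewer.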

Reading order

Tags say what each element is. Reading order says in what sequence those elements are presented to someone who cannot see the page. The two are related but distinct, and reading order is where many otherwise well-tagged PDFs fall down.

A screen reader announces content in the order defined by the document’s structure, not in the order the marks happen to sit in the file. In a single-column document the two usually align. In anything more complex (multi-column layouts, sidebars, pull quotes, captions, text wrapping around an image) they frequently diverge. A sighted reader reorders content effortlessly; assistive technology follows the order it is given, and if that order is wrong the meaning collapses.

Good reading order means the content is announced in the sequence a sighted reader would naturally follow: the headline before the body, the introduction before the sidebar, a caption after the figure it describes. Setting it correctly is a manual judgement about how the document is meant to be read, which is why automated tools alone cannot guarantee it. It is one of the core deliverables of professional PDF remediation, and one of the first things experienced testers check.
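The gap between positional guesswork and intended order can be shown with a small Python sketch. The Block type, the coordinates, and the column numbers are all hypothetical. In particular, the column value stands in for the manual judgement described above: it is supplied by a reviewer, not inferred by the code.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """Hypothetical text block on a two-column page."""
    text: str
    x: float       # left edge, in points from the left of the page
    y: float       # top edge, in points from the top of the page
    column: int    # assigned by a human reviewer, not inferred


def positional_order(blocks):
    """Naive fallback: top to bottom, then left to right."""
    return [b.text for b in sorted(blocks, key=lambda b: (b.y, b.x))]


def intended_order(blocks):
    """Finish each column before moving on to the next one."""
    return [b.text for b in sorted(blocks, key=lambda b: (b.column, b.y))]


page = [
    Block("Left column, paragraph 1", x=50, y=100, column=1),
    Block("Right column, paragraph 1", x=320, y=100, column=2),
    Block("Left column, paragraph 2", x=50, y=200, column=1),
    Block("Right column, paragraph 2", x=320, y=200, column=2),
]

print(positional_order(page))  # interleaves the two columns
print(intended_order(page))    # reads the left column, then the right
```

The positional sort reads straight across both columns, which is the two-column newsletter failure in miniature. Real layouts with sidebars and captions need more than a column number, which is why this stays a manual task.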

Alternative text for images

Every image that carries information needs a text equivalent so it can be described to people who cannot see it. The principles are the same as for the web, applied through PDF tags.

Informative images — charts, diagrams, photographs that convey meaning, infographics — need concise, accurate alternative text that communicates the same information the image does. For a chart, that often means summarizing the takeaway (“Revenue grew 12% year over year”) rather than describing the visual (“a bar chart in blue”).
Complex images — a detailed process diagram or a data-heavy figure — may need both short alt text and a longer description, or the underlying data presented in an accessible form elsewhere in the document.
Decorative images — borders, background textures, ornamental dividers, a logo repeated in a footer — should be marked as artifacts so assistive technology skips them. Forcing a screen reader to announce “image, image, image” for decoration is its own accessibility failure.
Text inside images — a graphic of a quote, a scanned letterhead, a button image with a label — must have that text captured, either as alt text or, better, as real selectable text.

Writing good alt text is a content task, not a technical one. It requires understanding what the image is for in its context — the same skill our accessibility consulting team brings to web content.
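What a script can do is narrow the field for the human reviewer. The Python sketch below flags images with missing alt text and alt text that looks like a file name or a generic placeholder. The figure records and the placeholder pattern are illustrative assumptions, not the output of any particular PDF tool, and a value that passes these checks may still be a poor description.

```python
import re

# Heuristic, assumed patterns: values that usually mean nobody wrote real alt text.
PLACEHOLDER = re.compile(
    r"^(image|img|picture|figure|graphic|photo)?[\s_-]*\d*$"  # "image", "Figure 3", "12"
    r"|\.(png|jpe?g|gif|tiff?|bmp)$",                          # ends in a file extension
    re.IGNORECASE,
)


def review_figures(figures):
    """Return (id, message) pairs for figures that need a human look."""
    findings = []
    for fig in figures:
        alt = (fig.get("alt") or "").strip()
        if fig.get("decorative"):
            # Decorative images should be artifacts and carry no description.
            if alt:
                findings.append((fig["id"], "decorative image has alt text"))
        elif not alt:
            findings.append((fig["id"], "missing alt text"))
        elif PLACEHOLDER.search(alt):
            findings.append((fig["id"], "alt text looks like a file name or placeholder"))
    return findings


figures = [
    {"id": "fig1", "alt": "Revenue grew 12% year over year"},
    {"id": "fig2", "alt": ""},
    {"id": "fig3", "alt": "chart_final.png"},
    {"id": "fig4", "alt": "Image 4"},
    {"id": "fig5", "decorative": True, "alt": ""},
]

findings = review_figures(figures)
for fig_id, message in findings:
    print(fig_id, message)
```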

Accessible tables

Tables are where PDF accessibility gets genuinely difficult, and where automated exports fail most often. A data table communicates meaning through the relationship between a cell and its row and column headers. Sighted readers reconstruct those relationships visually by glancing up and to the left. A screen-reader user cannot — they depend on the table being marked up so that header associations are explicit.

An accessible PDF table needs:

A proper <Table> structure containing <TR> (rows), <TH> (header cells), and <TD> (data cells), rather than a loose grid of text positioned to look like a table.
Header cells correctly identified, with scope (row or column) where the table layout requires it, so that as a user moves through the data the relevant headers are re-announced (“Q3, Revenue, 1.2 million”).
Sensible handling of merged or spanned cells, which complicate the header relationships and frequently confuse automated tooling.
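The sketch below illustrates, in Python, why header cells and scope matter. It models a table as a grid of hypothetical Cell records and builds the announcement for a data cell from the headers that apply to it. It assumes a simple grid with no merged or spanned cells, and it is an illustration of the principle, not a description of how any particular screen reader is implemented.

```python
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    """Hypothetical table cell: a tag, its text, and an optional header scope."""
    tag: str                      # "TH" or "TD"
    text: str
    scope: Optional[str] = None   # "Row" or "Column", only meaningful on TH


def announce(table, r, c):
    """Build the announcement for the cell at row r, column c."""
    parts = []
    # Row headers: TH cells to the left in the same row, scoped to the row.
    parts += [cell.text for cell in table[r][:c]
              if cell.tag == "TH" and cell.scope == "Row"]
    # Column headers: TH cells above in the same column, scoped to the column.
    parts += [row[c].text for row in table[:r]
              if row[c].tag == "TH" and row[c].scope == "Column"]
    parts.append(table[r][c].text)
    return ", ".join(parts)


tagged = [
    [Cell("TH", "Quarter", "Column"), Cell("TH", "Revenue", "Column")],
    [Cell("TH", "Q2", "Row"), Cell("TD", "1.1 million")],
    [Cell("TH", "Q3", "Row"), Cell("TD", "1.2 million")],
]
# The same grid with every cell tagged as plain data, as a poor export might produce.
untagged = [[Cell("TD", cell.text) for cell in row] for row in tagged]

print(announce(tagged, 2, 1))    # Q3, Revenue, 1.2 million
print(announce(untagged, 2, 1))  # 1.2 million
```

With headers identified, the user hears the value in context. With every cell tagged as data, the same cell is just a number.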

A common anti-pattern is the layout table — a grid used purely to position content visually, with no real data relationships. Layout tables should not be tagged as tables at all, because doing so forces assistive technology to announce phantom rows and columns. Distinguishing a data table from a layout artifact, and then encoding the right relationships, is detailed manual work that benefits enormously from review by people who actually use screen readers every day.

Accessible PDF forms

Forms are the highest-stakes documents an organization publishes, because they are transactional: an application, a claim, a consent, a registration. If a PDF form cannot be completed with assistive technology, the person is not merely inconvenienced — they are excluded from a service.

An accessible PDF form requires:

Labelled fields. Every field — text input, checkbox, radio button, dropdown — needs an accessible name (a tooltip/label in PDF terms) so a screen reader announces what the field is for, not just “edit text.”
Logical tab order. Keyboard users move through fields with Tab. The tab order must follow the visual and logical flow of the form, not the order fields were added in the editor.
Grouped controls. Related radio buttons and checkboxes should be grouped so their shared question is announced once and the options are understood as a set.
Required fields and instructions. Mandatory fields, formatting requirements, and error guidance must be conveyed in text, not only by color or visual cues.
Full keyboard operability. Every field must be reachable and operable without a mouse.
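Some of these requirements can be spot-checked mechanically. The Python sketch below runs over a hypothetical list of field records and reports missing accessible names, required fields whose label does not say so, and a tab order that differs from top-to-bottom visual order. The record layout is an assumption for illustration, the "required" substring test is a crude heuristic, and visual order is only a fair proxy for logical order in a simple single-column form.

```python
def form_problems(fields):
    """Return a list of human-readable problems for a list of field records."""
    problems = []
    for f in fields:
        label = f.get("label", "")
        if not label.strip():
            problems.append(f"{f['name']}: no accessible name")
        elif f.get("required") and "required" not in label.lower():
            problems.append(f"{f['name']}: required status not stated in text")

    # Compare the keyboard sequence with a top-to-bottom, left-to-right reading.
    tab_order = [f["name"] for f in sorted(fields, key=lambda f: f["tab_index"])]
    visual_order = [f["name"] for f in sorted(fields, key=lambda f: (f["y"], f["x"]))]
    if tab_order != visual_order:
        problems.append(
            "tab order " + " > ".join(tab_order)
            + " differs from visual order " + " > ".join(visual_order)
        )
    return problems


fields = [
    {"name": "full_name", "label": "Full name (required)", "required": True,
     "tab_index": 1, "x": 50, "y": 100},
    {"name": "email", "label": "", "required": False,
     "tab_index": 3, "x": 50, "y": 140},
    {"name": "phone", "label": "Phone number", "required": False,
     "tab_index": 2, "x": 50, "y": 180},
]

problems = form_problems(fields)
for problem in problems:
    print(problem)
```

Passing a check like this does not show the form is usable. Grouping, error guidance, and whether the labels make sense when heard out of context still need testing with a screen reader and a keyboard.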

Forms sit at the intersection of structure, interaction, and content, which makes them the part of PDF work where doing it properly matters most. The same discipline applies to other transactional documents — it is closely related to the care needed for accessible email, where structure and labelling determine whether a message can actually be used.

Language, title, and metadata

Some of the most impactful PDF fixes are also the smallest. A handful of document-level properties materially change how assistive technology handles a file.

Document language. The PDF must declare its primary language (for example, en-GB) so a screen reader uses the correct pronunciation rules. A French paragraph read with English phonetics, or vice versa, is barely intelligible. Passages in a different language from the main document should carry their own language markers.
Document title. PDF metadata should include a meaningful title, and the viewer should be set to display that title rather than the file name. “Annual Accessibility Report 2026” is announced and shown; “final_v3_FORWEB.pdf” is not.
Bookmarks. Bookmarks (the document outline) give all users, and especially those navigating non-visually, a way to jump to major sections of a long document.
Tagged-PDF and clean metadata flags. The file should be marked as a tagged PDF and carry consistent, accurate metadata.

These properties take minutes to set, and most of them are conformance requirements, yet they are routinely skipped in published PDFs.
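Because these are simple document-level settings, they are easy to check by script. The Python sketch below inspects two plain dictionaries that mirror the PDF document catalog and document information dictionary, using the key names from the PDF specification (/Lang, /MarkInfo, /ViewerPreferences, /Outlines, /Title). How those dictionaries are read from a real file depends on the PDF library you use and is left out here as an assumption. PDF/UA validators also inspect the XMP metadata for the title, which this sketch does not model.

```python
def metadata_problems(catalog, info):
    """Check document-level accessibility properties on dictionary-like inputs."""
    problems = []
    if not catalog.get("/Lang"):
        problems.append("no document language (/Lang)")
    if not catalog.get("/MarkInfo", {}).get("/Marked"):
        problems.append("not flagged as a tagged PDF (/MarkInfo /Marked)")
    if not (info.get("/Title") or "").strip():
        problems.append("no document title (/Title)")
    if not catalog.get("/ViewerPreferences", {}).get("/DisplayDocTitle"):
        problems.append("viewer will show the file name, not the title (/DisplayDocTitle)")
    if "/Outlines" not in catalog:
        problems.append("no bookmarks (/Outlines)")
    return problems


# Hypothetical values for a file that declares language, tagging, and a title
# but has no viewer preference for the title and no bookmarks.
catalog = {
    "/Lang": "en-GB",
    "/MarkInfo": {"/Marked": True},
    "/ViewerPreferences": {},
}
info = {"/Title": "Annual Accessibility Report 2026"}

problems = metadata_problems(catalog, info)
for problem in problems:
    print(problem)
```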

WCAG 2.2 and PDF/UA (ISO 14289)

Two standards govern accessible PDFs, and they work together rather than competing.

WCAG 2.2 is the technology-agnostic baseline for digital accessibility. Its success criteria (text alternatives, info and relationships, meaningful sequence, contrast, keyboard operability, and the rest) apply to PDFs just as they apply to web pages. WCAG is the standard most laws point to, though often in an earlier version: Section 508 incorporates WCAG 2.0 Level AA, and the European standard EN 301 549 references WCAG 2.1. The W3C also publishes specific techniques for satisfying WCAG with PDF features (tagging headings, providing alt text, defining reading order, and so on). If you are working through general conformance, our guide to making content WCAG compliant and the WCAG compliance overview both apply directly to documents.

PDF/UA — formally ISO 14289 — is the technical specification for accessible PDF. Where WCAG describes outcomes (“provide text alternatives”), PDF/UA prescribes exactly how a PDF must be constructed to be a correctly tagged, machine-readable, accessible document: which structure types to use, how the tag tree must be formed, how artifacts must be marked, and how forms and tables must be encoded. The two are complementary — the most robust approach is to remediate against PDF/UA’s technical requirements while validating user-facing outcomes against WCAG 2.2.

These standards are the benchmarks that accessibility laws rely on across jurisdictions. PDFs published by covered organizations fall within the scope of the European Accessibility Act, the ADA, and Section 508, all of which treat downloadable documents as part of the digital experience that must be accessible.

Remediating existing PDFs vs authoring accessible ones

There are two routes to accessible PDFs, and most organizations need both.

Remediating existing PDFs means taking a finished file — a report, a back catalogue of statements, a scanned form — and adding or correcting the accessibility layer: running OCR where needed, building the tag tree, setting reading order, writing alt text, fixing tables, and labelling form fields. Remediation is essential when the source files are gone, when documents were produced by third parties, or when you have a published archive that needs to be brought into conformance. Crucially, remediation changes the underlying structure, not the visual design — the document looks identical and becomes usable for everyone. This is the core of QualiBooth’s PDF remediation service, which scopes batches by importance and reach and prioritizes the documents that matter most first.

Authoring accessible PDFs means building accessibility into the production process so documents are born accessible. That involves using real heading styles, list styles, and alt text in the source application; designing tables as data tables; setting language and title; and choosing an export path that preserves the tag tree. Authoring accessibly is dramatically cheaper than repairing the same document later, and it is the only sustainable answer for organizations that publish PDFs continuously.

The two approaches are not either/or. The practical pattern is to remediate the documents already in the wild while fixing the upstream process so new documents do not recreate the problem. Embedding that change is exactly what accessibility process improvement addresses — turning accessible publishing from a one-off project into the default way your team works. A broader view of where document and web work fit together is laid out in our accessibility services overview.

Validating with screen readers — and why overlays don’t help

A PDF is only accessible if it actually works for the people who depend on it. That is why validation cannot stop at an automated checker. Tools that scan a PDF against PDF/UA rules are valuable: they catch missing tags, undefined languages, and structural errors at scale. But they verify the presence of structure, not its quality. An automated tool can confirm that an image has alt text; it cannot tell you the alt text is wrong. It can confirm that a heading exists and that no level is skipped; it cannot tell you whether the heading sits at the right level for the document’s actual outline.

Real validation combines both:

Automated checking to catch structural and metadata failures broadly and consistently. Software like the QualiBooth accessibility scanning platform excels at flagging machine-detectable issues across large volumes.
Manual testing with assistive technology — navigating the document with a screen reader, moving by heading, reading tables, tabbing through a form — to confirm the experience is coherent. This is the only way to verify reading order, alt-text quality, and form usability. Our manual audit methodology explains why human testing is irreplaceable, and audits conducted by people with disabilities surface problems that no checker and no sighted tester would ever notice.

A word of caution on shortcuts. Accessibility overlays — third-party scripts or widgets that claim to fix accessibility automatically — do not solve PDF accessibility, and QualiBooth does not endorse them. They cannot author a correct tag tree, judge reading order, or write meaningful alt text, because those tasks require understanding the document’s content and intent. There is no automated substitute for proper remediation. Genuine PDF accessibility comes from correct structure plus human verification — the approach behind our PDF remediation work.

Frequently asked questions

Is an untagged PDF ever acceptable? No. An untagged PDF is inaccessible to assistive technology by definition and fails both WCAG 2.2 and PDF/UA. Any PDF you publish for the public or for employees should be tagged.

Does making a PDF accessible change how it looks? No. Remediation adds and corrects the hidden structural layer — tags, reading order, metadata — without altering the visual design. The page looks identical.

Should I just provide an HTML version instead of an accessible PDF? An accessible HTML alternative is often the better experience and is worth offering. But if you publish the PDF, the PDF itself must be accessible — an HTML alternative does not exempt the document from conformance requirements.

Can scanned documents be made accessible? Yes, but they must be OCR’d first to create real text, after which the normal remediation steps — tagging, reading order, alt text, tables — apply.

How do I keep new PDFs accessible without remediating each one? Fix the authoring process: use real styles and alt text in the source, design proper data tables, set language and title, and export through a path that preserves tags. Pairing remediation with process improvement makes accessible documents the default.

Conclusion

PDF accessibility is not an optional polish step — it is the difference between a document everyone can use and one that silently excludes the people who rely on assistive technology. The work is concrete and well understood: tag the structure, set a correct reading order, describe images, encode tables and forms properly, declare language and title, and validate the result against WCAG 2.2 and PDF/UA with real screen readers as well as automated tools. Remediate the documents you already publish, fix the process that produces new ones, and skip the overlay shortcuts that promise accessibility without delivering it.

If your reports, statements, brochures, or forms have never been checked, that is the place to begin. You can start with a free accessibility scan, request a demo of the QualiBooth platform, or talk to our team about PDF remediation for a single critical document or an entire back catalogue.