PDF Remediation

Six Strategies. One Goal:
Every PDF Accessible.

One platform with six built-in fix strategies. OCR, AI vision, structure rebuilds — everything is included. The system automatically selects the right approach for each document and escalates until Section 508 verification passes or review is required.

2,400+ PDFs Fixed
99%+ Compliance
6 Fix Strategies

The 6-Stage Remediation Pipeline

1. Quick Fix

Repairs document metadata, language tags, structure tree, and PDF/UA identifiers. Handles the most common compliance failures — missing titles, incorrect role mappings, broken tag hierarchies — without altering page content.

2. OCR Remediation

Extracts text from scanned or image-heavy PDFs using optical character recognition, then rebuilds the document with a proper structure tree, tagged content, and reading order. Turns flat images into searchable, screen-reader-accessible documents.

3. Vision AI Analysis

When structural repairs aren’t enough, a vision model analyzes each page visually — understanding headings, tables, lists, and reading flow from the rendered layout. Generates a semantic structure informed by what a human reader would see, not just what the file’s byte stream contains.

4. Structure Rebuild

For documents with deeply corrupted tag trees, the structure is linearized and rebuilt from scratch. Content is re-extracted, re-tagged, and re-validated, preserving the original appearance while replacing the accessibility layer entirely.

5. Strip & Rebuild

The most aggressive structural fix. All existing markup, annotations, and metadata are stripped to raw content streams, then rebuilt with a clean structure tree, proper artifact wrapping, and Section 508 validation evidence — effectively a fresh accessibility pass on the original visual content.

6. Render & Rebuild

The last resort for documents that resist all other strategies. Each page is rendered to a high-fidelity image, then OCR and vision AI reconstruct the document from pixels up — new text layer, new structure, new metadata. The visual appearance is preserved pixel-for-pixel.

Adaptive Escalation, Not Brute Force

Fix → Validate → Pass? → Done (if yes) or Escalate (if no)

The system validates after every stage against the same 104 Matterhorn Protocol rules, using veraPDF, the validation engine used by national archives. If a fix doesn’t satisfy the Section 508 gate, the next useful strategy takes over, each one more powerful than the last.
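The fix, validate, escalate loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the platform's implementation: the document is modeled as a plain dict, `remediate`, `toy_validate`, `quick_fix`, and `structure_rebuild` are hypothetical names, and strategies are simply tried in order rather than selected by usefulness.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical types: a strategy is (name, function), a validator returns
# the list of rule IDs that still fail (an empty list means the gate passes).
Strategy = Tuple[str, Callable[[dict], dict]]
Validator = Callable[[dict], List[str]]


@dataclass
class Outcome:
    document: dict
    passed: bool
    strategy_used: Optional[str]      # last strategy applied, None if none was needed
    remaining_failures: List[str]


def remediate(document: dict, strategies: List[Strategy], validate: Validator) -> Outcome:
    """Apply strategies in order, validating after each one, until validation passes."""
    failures = validate(document)
    if not failures:
        return Outcome(document, True, None, [])
    last_used = None
    for name, fix in strategies:
        document = fix(document)
        last_used = name
        failures = validate(document)   # validate after every stage
        if not failures:
            return Outcome(document, True, name, [])
    # Every strategy was tried and failures remain: flag for human review.
    return Outcome(document, False, last_used, failures)


# Toy stand-ins (assumptions for illustration only).
def toy_validate(doc: dict) -> List[str]:
    failures = []
    if not doc.get("title"):
        failures.append("missing-title")
    if not doc.get("tagged"):
        failures.append("untagged")
    return failures


def quick_fix(doc: dict) -> dict:
    return {**doc, "title": doc.get("title") or "Untitled"}


def structure_rebuild(doc: dict) -> dict:
    return {**doc, "tagged": True}


result = remediate(
    {"title": "", "tagged": False},
    [("quick_fix", quick_fix), ("structure_rebuild", structure_rebuild)],
    toy_validate,
)
print(result.passed, result.strategy_used)
```

In this toy run the quick fix repairs the title but leaves the document untagged, so the loop escalates to the second strategy before validation passes.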

Clause-Targeted Refinement

After initial remediation, remaining failures are analyzed by specific PDF/UA clause. The system generates targeted fixes for each individual violation, re-validates, and iterates until Section 508 verification passes or the document is flagged for human review.

This iterative, clause-level approach is what pushes compliance from ~90% to 99%+. Instead of applying broad fixes that may introduce new issues, each iteration addresses exactly the violations that remain — surgical precision at scale.
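A rough sketch of clause-level iteration follows, assuming failures arrive as `(clause, detail)` pairs and that a registry maps each clause to a targeted fixer. The clause labels (`figure-alt`, `doc-lang`, `encryption`), the fixer functions, and the `max_rounds` cap are all hypothetical and chosen only to make the example runnable; they are not real PDF/UA clause numbers or the platform's API.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Failure = Tuple[str, str]                      # (clause label, detail)
Fixer = Callable[[dict, List[str]], dict]      # targeted fix for one clause


def group_by_clause(failures: List[Failure]) -> Dict[str, List[str]]:
    grouped = defaultdict(list)
    for clause, detail in failures:
        grouped[clause].append(detail)
    return dict(grouped)


def refine(document: dict,
           validate: Callable[[dict], List[Failure]],
           fixers: Dict[str, Fixer],
           max_rounds: int = 5):
    """Return (document, remaining_failures, needs_review)."""
    for _ in range(max_rounds):
        failures = validate(document)
        if not failures:
            return document, [], False
        grouped = group_by_clause(failures)
        fixable = [clause for clause in grouped if clause in fixers]
        if not fixable:
            break                              # nothing left that we know how to fix
        for clause in fixable:
            # Touch only the violations that remain for this clause.
            document = fixers[clause](document, grouped[clause])
    remaining = validate(document)
    return document, remaining, bool(remaining)


# Toy validator and fixers (illustrative assumptions).
def toy_validate(doc: dict) -> List[Failure]:
    failures = []
    for i, fig in enumerate(doc["figures"]):
        if not fig.get("alt"):
            failures.append(("figure-alt", f"figure {i}"))
    if not doc.get("lang"):
        failures.append(("doc-lang", "catalog"))
    if doc.get("encrypted"):
        failures.append(("encryption", "catalog"))   # no fixer registered
    return failures


def fix_alt(doc: dict, details: List[str]) -> dict:
    figures = [{**f, "alt": f.get("alt") or "Description pending"} for f in doc["figures"]]
    return {**doc, "figures": figures}


def fix_lang(doc: dict, details: List[str]) -> dict:
    return {**doc, "lang": "en-US"}


FIXERS = {"figure-alt": fix_alt, "doc-lang": fix_lang}

doc = {"figures": [{"alt": ""}, {"alt": "Chart"}], "lang": None, "encrypted": True}
fixed, remaining, needs_review = refine(doc, toy_validate, FIXERS)
print(remaining, needs_review)
```

Here the two fixable clauses are resolved in the first round, while the failure with no registered fixer survives and the document is flagged for review, mirroring the "passes or is flagged for human review" outcome described above.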

What You Get Back

.zip Archive or Individual Downloads

All verified PDFs are available as a single .zip download or through individual file links. Each is a drop-in replacement: same appearance, now with Section 508 validation evidence.

Compliance Verification Reports

Each PDF includes a validation report showing pass/fail against all 104 Matterhorn Protocol rules — documentation for auditors, legal, and procurement.
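As a rough idea of what a per-file pass/fail report might contain, the sketch below summarizes a list of rule results into a JSON-friendly dict. The field names, the `RuleResult` type, and the rule IDs are assumptions for illustration; the actual report format is not specified here.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class RuleResult:
    rule_id: str          # hypothetical rule identifier
    passed: bool
    message: str = ""


def build_report(filename: str, results: List[RuleResult]) -> dict:
    """Summarize per-rule results into a pass/fail report for one PDF."""
    failed = [r for r in results if not r.passed]
    return {
        "file": filename,
        "rules_checked": len(results),
        "rules_passed": len(results) - len(failed),
        "rules_failed": len(failed),
        "compliant": not failed,
        "failures": [asdict(r) for r in failed],
    }


results = [
    RuleResult("rule-001", True),
    RuleResult("rule-002", False, "Figure has no alternate text"),
]
print(json.dumps(build_report("example.pdf", results), indent=2))
```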

Ongoing Monitoring

Weekly rescans detect new PDFs and regressions automatically. New documents enter the same pipeline — compliance is maintained, not just achieved once.
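One way a weekly rescan could detect new PDFs and regressions is to compare two scan snapshots. The sketch below assumes each snapshot maps a URL to a content hash and a compliance flag; the snapshot shape and the `diff_scans` name are hypothetical.

```python
from typing import Dict, List


def diff_scans(previous: Dict[str, dict], current: Dict[str, dict]) -> Dict[str, List[str]]:
    """Compare two scans shaped like {url: {"sha256": str, "compliant": bool}}."""
    new = sorted(url for url in current if url not in previous)
    changed = sorted(
        url for url in current
        if url in previous and current[url]["sha256"] != previous[url]["sha256"]
    )
    regressed = sorted(
        url for url in current
        if url in previous and previous[url]["compliant"] and not current[url]["compliant"]
    )
    # New and regressed documents go back into the remediation pipeline.
    queue = sorted(set(new) | set(regressed))
    return {"new": new, "changed": changed, "regressed": regressed, "queue": queue}


last_week = {
    "a.pdf": {"sha256": "1", "compliant": True},
    "b.pdf": {"sha256": "2", "compliant": True},
}
this_week = {
    "a.pdf": {"sha256": "1", "compliant": True},
    "b.pdf": {"sha256": "9", "compliant": False},   # replaced with a non-compliant version
    "c.pdf": {"sha256": "3", "compliant": False},   # newly published
}
print(diff_scans(last_week, this_week))
```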

All six strategies, including OCR, AI vision analysis, and full page rendering, are built into one platform. The system uses deeper strategies only when validation evidence shows they are needed.

Proven at Scale

2,400+ PDFs fixed and verified
7,000+ PDFs discovered & scanned
99%+ Section 508 compliance

Every fix is verified with veraPDF, the same validation engine used by the Library of Congress and European national archives.

Ready to Make Your PDFs Accessible?

Start a free scan and see how many of your PDFs need remediation.