
Accessibility Testing Strategy

Building accessible software requires more than good intentions. It requires a structured testing program that catches issues early, measures progress over time, and balances automation with human judgment. This guide helps teams build a comprehensive accessibility testing strategy. It covers what to automate, what requires manual testing, and how to measure progress. Whether you are starting from scratch or maturing an existing program, the framework here scales from a single developer running lint rules to a full organization with dedicated accessibility champions and user testing panels.

Shift-Left Testing

The principle behind shift-left testing is simple: the earlier you catch a problem, the cheaper it is to fix. An accessibility issue caught during design review might take five minutes to resolve. The same issue found in production after launch could require a full sprint of rework, affect real users in the meantime, and carry legal risk. Every stage in your development pipeline presents an opportunity to catch different categories of accessibility issues.

The key insight is that each stage catches different types of problems. Design review catches structural issues (missing focus states, insufficient contrast in mockups, unclear interaction patterns). Development catches implementation issues (missing ARIA attributes, broken keyboard navigation). Code review catches patterns that automated tools miss (whether alt text is actually meaningful, whether focus management makes sense in context). CI catches regressions. Staging catches issues that only appear when components interact. Production monitoring catches issues that slip through everything else.

Stage | What It Catches | Cost to Fix
Design Review | Missing focus states, contrast issues in mockups, unclear interaction patterns, missing keyboard flows | Very low
Development | Missing labels, broken ARIA, heading hierarchy issues, form association errors | Low
Code Review | Meaningless alt text, poor focus management logic, incorrect live region usage | Low-Medium
CI Pipeline | Regressions in screen reader output, broken heading structure, missing accessible names | Medium
Staging | Component interaction issues, focus loss between views, announcement timing problems | Medium-High
Production | Real-world assistive technology incompatibilities, edge cases in user flows | High

A comprehensive testing strategy covers all six stages. Most teams start at the late end of the pipeline (reacting to production issues) and gradually shift left over time. The goal is not to eliminate later-stage testing. It is to ensure that most issues are caught before they reach users. Even a mature team will still find issues in production that slipped through earlier checks, but those should be rare exceptions, not the norm.

What to Automate

Automated accessibility testing tools typically catch between 30% and 50% of accessibility issues. That number might sound low, but the issues they catch are the ones that appear most frequently and are easiest to prevent with consistent tooling. Automation is not a replacement for manual testing: it is a safety net that catches the low-hanging fruit so your manual testing time can focus on the complex, nuanced problems that require human judgment.

The types of issues best suited to automation include:

  • Missing form labels: inputs without associated labels or aria-label attributes
  • Broken heading hierarchy: skipping heading levels (h1 → h3) or multiple h1 elements
  • Color contrast violations: text that does not meet WCAG minimum contrast ratios
  • Invalid ARIA usage: roles with missing required attributes, invalid state values
  • Focus order structure: positive tabindex values, focusable elements inside content hidden from assistive technology (for example, aria-hidden containers)
  • Missing accessible names: buttons, links, and images without discernible text
  • Landmark structure: missing main landmark, duplicate landmarks without labels
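Color contrast is a good illustration of why these checks automate well: the pass/fail decision is pure arithmetic. The sketch below computes the WCAG 2.x contrast ratio from relative luminance and compares it to the AA thresholds (4.5:1 for normal text, 3:1 for large text). The type and function names are illustrative and do not come from any particular tool.

```typescript
// Sketch of the WCAG 2.x contrast ratio calculation for sRGB colors.
// Names (Rgb, contrastRatio, meetsAA) are illustrative, not a library API.
type Rgb = { r: number; g: number; b: number }; // channels in 0..255

// Convert one 8-bit sRGB channel to its linear-light value.
function channelToLinear(value: number): number {
  const c = value / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance: weighted sum of the linearized channels.
function relativeLuminance(color: Rgb): number {
  return (
    0.2126 * channelToLinear(color.r) +
    0.7152 * channelToLinear(color.g) +
    0.0722 * channelToLinear(color.b)
  );
}

// Contrast ratio: (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// WCAG AA: 4.5:1 for normal text, 3:1 for large text.
function meetsAA(foreground: Rgb, background: Rgb, largeText = false): boolean {
  return contrastRatio(foreground, background) >= (largeText ? 3 : 4.5);
}

const white: Rgb = { r: 255, g: 255, b: 255 };
const black: Rgb = { r: 0, g: 0, b: 0 };
const midGray: Rgb = { r: 119, g: 119, b: 119 }; // #777777

const grayOnWhitePasses = meetsAA(midGray, white); // false: just under 4.5:1
const grayOnWhiteLargePasses = meetsAA(midGray, white, true); // true: above 3:1
```

A check like this can flag a violation but cannot say whether the text is readable over a background image or gradient, which is one reason contrast still shows up in manual review.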

How Tools Complement Each Other

No single tool covers everything. A strong automated strategy combines tools that approach accessibility from different angles:

Speakable: Screen Reader Output Prediction

Shows exactly what screen reader users will hear when they navigate your content. This catches issues that rule-based tools miss: an element might technically have an accessible name, but the announced output might be confusing, redundant, or missing context. Speakable makes invisible output visible, so developers can evaluate quality (not just presence) of accessible content.

axe-core: Rule-Based Violation Detection

Checks HTML against a comprehensive ruleset based on WCAG success criteria. Flags definitive violations with high confidence. Strong at catching structural issues: missing roles, broken associations, invalid attributes. Reports issues with severity levels and links to remediation guidance.
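To make "rule-based" concrete, here is a toy rule in the same spirit: it takes heading levels in document order and reports skipped levels and repeated h1 elements. This is only a sketch of the idea; it is not axe-core's implementation, and the rule names `heading-skip` and `single-h1` are made up for this example.

```typescript
// A toy rule-based check over heading levels (1..6) in document order.
// Rule names are hypothetical and do not correspond to axe-core rule ids.
interface Finding {
  rule: "heading-skip" | "single-h1";
  index: number; // position of the offending heading in the input
  message: string;
}

function checkHeadings(levels: number[]): Finding[] {
  const findings: Finding[] = [];
  let previous = 0; // 0 means no heading seen yet
  let h1Count = 0;

  levels.forEach((level, index) => {
    if (level === 1) {
      h1Count += 1;
      if (h1Count > 1) {
        findings.push({ rule: "single-h1", index, message: "more than one h1 on the page" });
      }
    }
    // Going deeper by more than one level (h1 -> h3) skips a level.
    if (previous > 0 && level > previous + 1) {
      findings.push({
        rule: "heading-skip",
        index,
        message: `heading jumps from h${previous} to h${level}`,
      });
    }
    previous = level;
  });

  return findings;
}

const cleanPage = checkHeadings([1, 2, 2, 3]); // no findings
const skippedLevel = checkHeadings([1, 3]); // one heading-skip finding at index 1
```

The check is deterministic and cheap, which is what makes it suitable for CI. It also shows the limit of the approach: it can confirm the heading structure is well-formed, not that the headings describe the content.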

Lighthouse: Overall Scoring and Auditing

Provides a high-level accessibility score and runs a subset of axe-core rules in the context of a full page load. Useful for tracking trends over time and giving stakeholders a quick health metric. Less granular than axe-core for individual component testing but valuable for page-level monitoring.

Used together, these tools form a layered defense: Speakable shows you what screen reader users actually experience, axe-core flags rule violations you might overlook, and Lighthouse provides a trend metric for stakeholders. None of them replaces manual testing with real assistive technology, but they catch the majority of common, preventable issues before code reaches users.

What Requires Manual Testing

The remaining 50-70% of accessibility issues require human judgment to identify. These are problems that automated tools cannot reliably detect because they depend on context, intent, timing, or subjective evaluation of user experience quality. Manual testing is not optional: it is where you catch the issues that matter most to real users.

1. Focus Management in Dynamic Interfaces

When a modal opens, does focus move to it? When it closes, does focus return to the trigger? When content is dynamically inserted, is focus handled appropriately? These flows depend on interaction sequences that static analysis cannot evaluate.
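The behavior a manual tester is verifying here can be written down as a small contract: opening moves focus into the modal, closing returns it to the trigger. The sketch below models that contract with a minimal `Focusable` interface standing in for a DOM element, so it runs without a browser. `ModalFocusManager` and its method names are hypothetical.

```typescript
// Minimal stand-in for a DOM element; in a browser this would be an HTMLElement.
interface Focusable {
  focus(): void;
}

// Hypothetical helper: remembers which element opened each modal so focus
// can be restored on close. A stack handles modals opened from modals.
class ModalFocusManager {
  private triggers: Focusable[] = [];

  // On open: remember the trigger, then move focus into the modal.
  open(trigger: Focusable, firstFocusableInModal: Focusable): void {
    this.triggers.push(trigger);
    firstFocusableInModal.focus();
  }

  // On close: return focus to the element that opened the modal.
  close(): void {
    const trigger = this.triggers.pop();
    if (trigger !== undefined) {
      trigger.focus();
    }
  }
}

// Fake elements that record the order in which they receive focus.
const focusLog: string[] = [];
const fakeElement = (name: string): Focusable => ({
  focus: () => {
    focusLog.push(name);
  },
});

const manager = new ModalFocusManager();
manager.open(fakeElement("settings-button"), fakeElement("dialog-close-button"));
manager.close();
// focusLog is now ["dialog-close-button", "settings-button"]
```

Code like this can be unit tested, but whether the real interface honors the contract across route changes, animations, and dynamically inserted content still has to be confirmed by walking through the flow.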

2. Screen Reader Announcement Timing

Live regions need to announce at the right moment: not too early (before the user has context), not too late (after they have moved on), and not so frequently that they overwhelm. Timing and interruption behavior vary by screen reader and require real-world testing to validate.
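One common way to avoid overwhelming users is to coalesce bursts of updates so that only the last message in a burst reaches the live region. The sketch below models that policy as a pure function over timestamped updates. The `Update` shape and the `quietMs` parameter are assumptions for illustration; this is a debounce model, not a description of how any screen reader schedules speech.

```typescript
// A timestamped status message destined for a live region (hypothetical shape).
interface Update {
  atMs: number; // when the update was produced
  text: string; // what would be announced
}

// Keep only the last update of each burst: an update survives when no newer
// update follows it within `quietMs` milliseconds.
function coalesceAnnouncements(updates: Update[], quietMs: number): Update[] {
  const sorted = [...updates].sort((a, b) => a.atMs - b.atMs);
  return sorted.filter((update, i) => {
    const next = sorted[i + 1];
    return next === undefined || next.atMs - update.atMs >= quietMs;
  });
}

// Result counts changing while the user types, then a later, separate update.
const searchUpdates: Update[] = [
  { atMs: 0, text: "12 results" },
  { atMs: 150, text: "7 results" },
  { atMs: 300, text: "3 results" },
  { atMs: 2000, text: "No results" },
];

const announced = coalesceAnnouncements(searchUpdates, 500).map((u) => u.text);
// announced is ["3 results", "No results"]
```

Choosing the quiet window is exactly the kind of judgment that needs testing with real screen readers: too short and users hear a stream of stale counts, too long and the announcement arrives after they have moved on.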

3. Cognitive Accessibility

Clear language, predictable behavior, consistent navigation, and logical page structure all require human evaluation. No tool can determine whether instructions are confusing or whether an interaction pattern is intuitive for the target audience.

4. Complex Widget Keyboard Interaction

Custom widgets (comboboxes, date pickers, data grids, drag-and-drop interfaces) implement keyboard patterns that need to be tested holistically. Does the full interaction flow work? Are all states reachable? Do keyboard shortcuts conflict with assistive technology commands?
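The question "are all states reachable?" can be partly explored in code when a widget's keyboard logic is written as a pure function. The sketch below assumes a simple listbox-style model (ArrowDown, ArrowUp, Home, End, no wrapping) and walks every key from the first item to find items no key sequence can reach. The function names are hypothetical, and real widgets have more keys and states than this.

```typescript
type NavKey = "ArrowDown" | "ArrowUp" | "Home" | "End";
type StepFn = (current: number, key: NavKey, itemCount: number) => number;

const ALL_KEYS: NavKey[] = ["ArrowDown", "ArrowUp", "Home", "End"];

// Assumed keyboard model for a listbox: arrows move by one without wrapping,
// Home and End jump to the first and last item.
function nextActiveIndex(current: number, key: NavKey, itemCount: number): number {
  if (itemCount <= 0) return -1;
  switch (key) {
    case "ArrowDown":
      return Math.min(current + 1, itemCount - 1);
    case "ArrowUp":
      return Math.max(current - 1, 0);
    case "Home":
      return 0;
    case "End":
      return itemCount - 1;
  }
}

// Breadth-first walk from item 0: returns items that no key sequence reaches.
function unreachableItems(itemCount: number, step: StepFn): number[] {
  const seen: Record<number, boolean> = { 0: true };
  const queue: number[] = [0];
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === undefined) break;
    for (const key of ALL_KEYS) {
      const next = step(current, key, itemCount);
      if (seen[next] !== true) {
        seen[next] = true;
        queue.push(next);
      }
    }
  }
  const missing: number[] = [];
  for (let i = 0; i < itemCount; i += 1) {
    if (seen[i] !== true) missing.push(i);
  }
  return missing;
}

const unreachable = unreachableItems(5, nextActiveIndex); // empty: every item is reachable
```

A check like this covers the state machine only. Whether the shortcuts collide with screen reader commands, and whether the flow feels coherent, still requires a person at the keyboard with assistive technology running.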

5. Content Meaning and Context

Automated tools can check that an image has alt text. They cannot judge whether the alt text is meaningful, accurate, or appropriate for the context. A link can have text content but still be incomprehensible out of context. These quality judgments require human evaluation.

Recommendation: Combine manual testing using actual screen readers (NVDA on Windows, VoiceOver on macOS/iOS, TalkBack on Android) with periodic user testing sessions involving people with disabilities. Developers testing with screen readers catch implementation issues. User testing catches experience issues: problems that are technically correct but practically unusable.

Testing Frequency

Different types of testing belong at different cadences. Automated checks should run on every commit because they are fast and cheap. Manual testing should happen on a regular schedule because it is expensive but essential. User testing should happen quarterly because it requires coordination with external participants but provides insights that nothing else can replicate.

Every Commit

Run automated lint rules and accessibility checks. Include Speakable in CI to generate screen reader output snapshots for changed components. This catches regressions instantly: if a code change alters what a screen reader announces, you know about it before the PR merges.
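The snapshot comparison itself is conceptually simple. The sketch below assumes a hypothetical snapshot format (component name mapped to the list of announced strings) and reports components whose output was added, removed, or changed. This is not Speakable's actual file format or API, and the announcement strings are illustrative.

```typescript
// Hypothetical snapshot format: component name -> announced lines, in order.
type Snapshot = Record<string, string[]>;

interface Change {
  component: string;
  before: string[] | null; // null when the component is new
  after: string[] | null; // null when the component was removed
}

function diffSnapshots(baseline: Snapshot, current: Snapshot): Change[] {
  // Union of component names from both snapshots, in a stable order.
  const names = Object.keys({ ...baseline, ...current }).sort();
  const changes: Change[] = [];
  for (const component of names) {
    const before = baseline[component] ?? null;
    const after = current[component] ?? null;
    const same =
      before !== null &&
      after !== null &&
      before.length === after.length &&
      before.every((line, i) => line === after[i]);
    if (!same) {
      changes.push({ component, before, after });
    }
  }
  return changes;
}

const baselineSnapshot: Snapshot = {
  SaveButton: ["button, Save"],
  SearchField: ["edit text, Search"],
};
const currentSnapshot: Snapshot = {
  SaveButton: ["button"], // accessible name was lost
  SearchField: ["edit text, Search"],
  Toast: ["alert, Saved"], // new component
};

const changes = diffSnapshots(baselineSnapshot, currentSnapshot);
// changes covers SaveButton (changed) and Toast (added)
```

A CI job built on this idea would surface the list in the PR and let a reviewer decide which changes are intentional, since a diff alone cannot tell an improvement from a regression.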

Every Sprint

Manual screen reader walkthrough of new features and changed interactions. A developer spends 30-60 minutes navigating new UI with NVDA or VoiceOver, verifying focus management, announcement quality, and keyboard operability. Document findings and file issues for the next sprint.

Every Release

Full regression test across multiple screen readers (NVDA, JAWS, VoiceOver) covering core user flows. Test with different browsers as well, because screen reader behavior varies between Chrome, Firefox, and Safari. This catches cross-reader inconsistencies that per-commit automation cannot detect.

Quarterly

User testing with assistive technology users. Recruit participants who use screen readers, switch controls, voice navigation, or magnification in their daily workflow. Observe them completing real tasks. This reveals usability issues that even expert manual testers miss because they do not rely on assistive technology full-time.

CI/CD Integration Guide

Learn how to add Speakable to your CI pipeline for automated screen reader output testing on every commit.

Metrics to Track

Good metrics make progress visible and help teams prioritize. The right accessibility metrics tell you whether your testing strategy is working, where gaps exist, and whether you are improving over time. Avoid vanity metrics that create a false sense of security. Track metrics that drive action.

Automated Issues Per Component

Track the number of axe-core violations and Speakable warnings per component or page. Trend this over time. A healthy codebase shows this number decreasing or staying at zero for existing components.
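As a rough sketch of how this trend could be computed, the code below counts findings per component for two scans and lists the components that got worse. The `ScanRecord` shape is an assumption: in practice you would map your tool's results into something like it. The rule ids in the example (`label`, `color-contrast`, `button-name`) are real axe-core rule ids, used here only as sample data.

```typescript
// One finding from an automated scan, attributed to a component (assumed shape).
interface ScanRecord {
  component: string;
  ruleId: string;
}

// Count findings per component.
function countByComponent(records: ScanRecord[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const record of records) {
    counts[record.component] = (counts[record.component] ?? 0) + 1;
  }
  return counts;
}

// Components whose finding count increased between two scans.
function componentsGettingWorse(
  previous: Record<string, number>,
  current: Record<string, number>
): string[] {
  return Object.keys(current)
    .filter((name) => (current[name] ?? 0) > (previous[name] ?? 0))
    .sort();
}

const lastWeek = countByComponent([
  { component: "CheckoutForm", ruleId: "label" },
  { component: "CheckoutForm", ruleId: "color-contrast" },
  { component: "Header", ruleId: "button-name" },
]);
const thisWeek = countByComponent([
  { component: "CheckoutForm", ruleId: "color-contrast" },
  { component: "Header", ruleId: "button-name" },
  { component: "Header", ruleId: "color-contrast" },
]);

const worse = componentsGettingWorse(lastWeek, thisWeek);
// worse is ["Header"]: CheckoutForm improved from 2 to 1, Header went from 1 to 2
```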

Test Coverage Percentage

What percentage of your components have dedicated accessibility tests? This includes unit tests with testing-library queries that verify accessible names, integration tests that check keyboard navigation, and Speakable snapshots that track screen reader output.
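One possible way to compute this number is sketched below, assuming a component inventory that records which of the three test types exist for each component. The `ComponentTests` shape and the rule "covered means at least one kind of accessibility test" are assumptions; a stricter team might require all three.

```typescript
// Inventory entry for one component (hypothetical shape).
interface ComponentTests {
  name: string;
  hasAccessibleNameTest: boolean; // unit test asserting the accessible name
  hasKeyboardTest: boolean; // integration test for keyboard navigation
  hasOutputSnapshot: boolean; // screen reader output snapshot
}

interface CoverageReport {
  percent: number; // rounded to one decimal place
  untested: string[]; // components with no accessibility test at all
}

function accessibilityCoverage(components: ComponentTests[]): CoverageReport {
  const untested = components
    .filter((c) => !c.hasAccessibleNameTest && !c.hasKeyboardTest && !c.hasOutputSnapshot)
    .map((c) => c.name);
  const total = components.length;
  const covered = total - untested.length;
  const percent = total === 0 ? 0 : Math.round((covered / total) * 1000) / 10;
  return { percent, untested };
}

const report = accessibilityCoverage([
  { name: "Button", hasAccessibleNameTest: true, hasKeyboardTest: true, hasOutputSnapshot: true },
  { name: "Modal", hasAccessibleNameTest: false, hasKeyboardTest: true, hasOutputSnapshot: false },
  { name: "DatePicker", hasAccessibleNameTest: false, hasKeyboardTest: false, hasOutputSnapshot: false },
  { name: "Tabs", hasAccessibleNameTest: true, hasKeyboardTest: false, hasOutputSnapshot: true },
]);
// report.percent is 75 and report.untested is ["DatePicker"]
```

The list of untested components is usually more useful than the percentage, because it tells the team where to write the next test.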

Screen Reader Announcement Regression Rate

When using Speakable diffs in CI, track how often PRs introduce changes to screen reader output. Not all changes are regressions (some are improvements), but unexpected changes should trigger review. A high rate of unintentional changes suggests accessibility is not being considered during development.

Time to Fix Accessibility Bugs

Measure the time between an accessibility bug being filed and its fix being deployed. Set an SLA (for example: critical issues fixed within one sprint, moderate issues within two sprints). This metric reveals whether accessibility is being prioritized alongside other bugs.
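The example SLA above can be checked mechanically. The sketch below assumes a 14-day sprint and a simple `Bug` record with day numbers instead of dates; both are assumptions for illustration. It lists bugs that exceeded their SLA (including ones still open) and computes the mean days to fix for resolved bugs.

```typescript
type Severity = "critical" | "moderate";

// Simplified bug record using day numbers rather than dates (assumed shape).
interface Bug {
  id: string;
  severity: Severity;
  filedDay: number;
  fixedDay: number | null; // null while the bug is still open
}

const SPRINT_DAYS = 14; // assumption: two-week sprints
const SLA_DAYS: Record<Severity, number> = {
  critical: SPRINT_DAYS, // fixed within one sprint
  moderate: 2 * SPRINT_DAYS, // fixed within two sprints
};

// Bugs whose age (if open) or time to fix (if closed) exceeds their SLA.
function slaBreaches(bugs: Bug[], today: number): string[] {
  return bugs
    .filter((bug) => (bug.fixedDay ?? today) - bug.filedDay > SLA_DAYS[bug.severity])
    .map((bug) => bug.id);
}

// Mean days from filing to fix, over resolved bugs only.
function meanDaysToFix(bugs: Bug[]): number | null {
  const durations: number[] = [];
  for (const bug of bugs) {
    if (bug.fixedDay !== null) durations.push(bug.fixedDay - bug.filedDay);
  }
  if (durations.length === 0) return null;
  return durations.reduce((sum, d) => sum + d, 0) / durations.length;
}

const bugs: Bug[] = [
  { id: "A11Y-1", severity: "critical", filedDay: 0, fixedDay: 10 },
  { id: "A11Y-2", severity: "critical", filedDay: 5, fixedDay: 30 },
  { id: "A11Y-3", severity: "moderate", filedDay: 10, fixedDay: null },
  { id: "A11Y-4", severity: "moderate", filedDay: 20, fixedDay: 40 },
];

const breaches = slaBreaches(bugs, 40);
// breaches is ["A11Y-2", "A11Y-3"]: 25 days for a critical bug, 30 days open for a moderate one
```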

Speakable Coverage

Track the number of pages and components tested with Speakable versus your total inventory. This shows how much of your application has verified screen reader output and where blind spots remain.

What Not to Track

Avoid tracking a single "compliance score" as your primary metric. Compliance is binary per WCAG success criterion: you either meet a given criterion or you do not. A percentage score (like "87% accessible") is misleading because it obscures which specific requirements are unmet and creates false confidence. A page with a 95% score might still be completely unusable for screen reader users if the 5% failure is in navigation or form submission. Track specific, actionable metrics instead.

Team Maturity Model

Accessibility maturity does not happen overnight. Teams typically progress through distinct levels as they build knowledge, adopt tooling, and integrate accessibility into their workflow. Use this model to assess where your team is today and identify concrete next steps. Moving up one level at a time is more sustainable than trying to jump from Level 1 to Level 5 in a single quarter.

Level 1: Reactive

Fix accessibility issues only when reported by users or flagged in audits. No automated testing in place. Team has basic awareness that accessibility matters but no systematic approach. Issues are treated as one-off bugs rather than symptoms of process gaps.

Level 2: Aware

Lint rules for accessibility enabled in CI (eslint-plugin-jsx-a11y or equivalent). Team members occasionally test with keyboard navigation. Some training has happened: at minimum, developers know what ARIA is and why semantic HTML matters. Issues are tracked but fixes are not prioritized consistently.

Level 3: Proactive

Speakable integrated in CI/CD pipeline to catch screen reader output regressions. Regular manual testing with screen readers (at least once per sprint). Accessibility is an explicit item in code review checklists. Team uses a dedicated testing checklist for new features. Issues have SLAs for resolution.

Level 4: Integrated

Accessibility is part of the design process. Designers annotate mockups with focus order, heading levels, and ARIA states before handoff. Automated regression testing covers all critical user flows. User testing program with assistive technology users runs quarterly. Cross-reader output is tracked and compared across NVDA, JAWS, and VoiceOver.

Level 5: Mature

Dedicated accessibility champion role embedded in the team. Continuous feedback loop with users who rely on assistive technology. Proactive pattern library with verified accessible components that the entire organization can use with confidence. Internal training program onboards new developers with accessibility fundamentals. The team contributes back to the accessibility community through shared tooling, documentation, or standards participation.

Building Your Strategy

Start where the impact is highest and the effort is lowest. Automation provides the quickest wins because it runs continuously without human effort once configured. From there, layer in manual testing, training, and user testing as your team matures. Here is a practical sequence for building your accessibility testing program from the ground up.

1. Start With Automation

Enable eslint-plugin-jsx-a11y (or your framework equivalent) and axe-core in your test suite. These catch the most common issues with zero ongoing effort. Fix existing violations to establish a clean baseline, then treat new violations as CI failures.
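If fixing every existing violation before turning on CI enforcement is not realistic, a common interim approach is to record the current violations as a baseline and fail the build only on new ones. The sketch below shows that comparison under assumed names: the `Violation` shape, the rule-plus-selector key, and `ciExitCode` are illustrative and not part of axe-core or any lint plugin.

```typescript
// One violation, identified by the rule and where it occurred (assumed shape).
interface Violation {
  ruleId: string;
  selector: string;
}

const keyOf = (v: Violation): string => `${v.ruleId}@${v.selector}`;

// Violations present now that were not in the recorded baseline.
function newViolations(baseline: Violation[], current: Violation[]): Violation[] {
  const known: Record<string, boolean> = {};
  for (const v of baseline) {
    known[keyOf(v)] = true;
  }
  return current.filter((v) => known[keyOf(v)] !== true);
}

// Exit code for a CI step: non-zero when anything new appears.
function ciExitCode(baseline: Violation[], current: Violation[]): number {
  return newViolations(baseline, current).length > 0 ? 1 : 0;
}

const recordedBaseline: Violation[] = [
  { ruleId: "color-contrast", selector: "footer .legal" },
];
const latestScan: Violation[] = [
  { ruleId: "color-contrast", selector: "footer .legal" }, // known, tolerated for now
  { ruleId: "label", selector: "#newsletter-email" }, // new, should fail the build
];

const introduced = newViolations(recordedBaseline, latestScan);
// introduced contains only the "label" violation, so ciExitCode returns 1
```

The baseline should shrink over time. Once it is empty, the comparison reduces to the rule in the text: any violation fails the build.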

2. Add Speakable to Your CI Pipeline

Generate screen reader output snapshots for your components. When a PR changes what a screen reader announces, reviewers see the diff and can evaluate whether the change is intentional. This catches a category of regressions that lint rules miss entirely. See the CI/CD Integration guide for setup instructions.

3. Train the Team on Screen Reader Basics

Every developer should be able to turn on a screen reader and navigate a page. This does not require expert proficiency: basic navigation (headings, landmarks, tab key, reading mode) is sufficient to catch most issues. Even 30 minutes of hands-on experience changes how developers think about their markup. See How Screen Readers Work for background knowledge.

4. Build a Code Review Checklist

Create a lightweight checklist that reviewers reference during code review. Does the component have an accessible name? Is keyboard navigation handled? Are state changes announced to screen readers? A checklist ensures consistency and helps less-experienced reviewers catch issues. See the Testing Checklist for a ready-to-use template.

5. Establish Quarterly User Testing

Partner with organizations that connect you with assistive technology users for usability testing. Services like AccessibilityOz, Fable, and Deque offer user testing panels. Budget for at least four sessions per year (one per quarter) to maintain a consistent feedback loop with real users.

This sequence is a starting point, not a rigid prescription. Some teams will skip straight to step 3 because they already have automation in place. Others will start with user testing because they have an immediate compliance deadline. Adapt the sequence to your context, but aim to cover all five areas within a year. The combination of automated checks, screen reader output tracking, team education, structured reviews, and user feedback creates a testing program that catches issues across the full spectrum of accessibility concerns.
