Accessibility and the value of user testing
An in–depth look at a test methodology, by Tina Holmboe, Greytower Technologies (UK) Ltd., July 2005
Abstract
This article will attempt to sort out some of the issues regarding two fundamentally different methods of evaluating accessibility: automated assessments and user testing.
It will also debunk two common myths regarding the former method, and clarify some issues normally overlooked in regard to the latter.
Introduction
There has been, and is to this day, much controversy regarding how to test a website for accessibility issues.
Various methods have been suggested, for instance W3C's Preliminary Review and EuroAccessibility's Evaluation Methodology for Accessibility Evaluation and Repair Tools, but the debate rages on, centring on one question and one statement:
“Is automated testing a valuable and/or useful tool when testing how accessible a website is?” and “Having the site audited by disabled people is the true test of accessibility”. We will look at both issues.
Automated Testing
Let me start on a positive note: automated accessibility testing is not only possible, but highly useful, well worth the effort, and an important part of any methodology.
It is not, however, a trivial matter. The automation tool must be used with care, and the results evaluated thoroughly, a sentiment reflected in the article that inspired this work: GAWDS' Automated testing — How useful is it? by Grant Broome.
While there is much to agree on in the GAWDS article, it does propagate the two major misunderstandings regarding accessibility testing: automated testing cannot provide a useful status on its own, and user testing is the only true measure of accessibility.
To quote Grant Broome:
One such vendor actually publishes a league table based purely on automated testing results leading customers to believe that the status of a website can be determined by the automated test alone! This is far from being true …
The inherent difficulty with such statements, both the one made by the un–named vendor and the one made by Mr. Broome, is that they are absolute, and by their very nature insensitive to the nuances of real life.
We’ll have a look at how close both claims come to being true, and start by proposing these definitions:
- Baseline: a predefined set of statements against which a website can be measured.
- One–point: the site is accessible at one point in its life, typically just prior to actual launch.
- Any–point: the site is accessible at any point during its life.
Several baselines have already been defined, such as the Web Content Accessibility Guidelines 1.0 (WCAG 1.0) by the World Wide Web Consortium, Section 508 (508) by the United States government, and the See It Right (SiR) standard by the Royal National Institute of the Blind.
We also suggest these two statements as axioms:
- No automated tool can test all parts of a baseline if any of the checkpoints associated with it are subjective, such as the readability of text.
- An automated tool can test all parts of a baseline if all checkpoints associated with it are objective, such as the presence or absence of alternative content.
WCAG 1.0 contains both subjective and objective checkpoints; seen in relation to such a baseline, no automated tool can test all parts, but it can test some. This, in turn, leads us to conclude:
- An automated tool can determine the status of a website by reporting those objective checkpoints of a baseline that have been violated.
Should the automated test reveal that a site is sorely lacking in alternative content, a status has been determined, and it is a useful indication of the health of the site.
Let us return, for a moment, to the GAWDS article:
Used properly, they will help to cut an auditor’s workload in half. That’s because automated testing tools identify around half of the problems with web pages
From this statement it would follow that an automated testing tool can create a status report covering fifty per cent of a baseline. It is reasonable to say that should half of the checkpoints be found violated, the site has an accessibility problem.
The flip side of this coin is, of course, that should none of the checkpoints fail, no conclusion as to the health of the site can be drawn, except that a manual check is needed in order to further analyse the situation.
An automated tool can be used to prove that a site is inaccessible, but cannot prove that the site is accessible. To do the latter, manual inspection is needed.
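The asymmetry can be illustrated with a minimal Python sketch using only the standard library's `html.parser`. It assumes just two illustrative rules, far short of a full WCAG 1.0 check: an `img` element with no `alt` attribute is treated as an objective failure, and every `frame` or `iframe` is merely flagged for manual review, because whether its title is useful (checkpoint 12.1) is a judgement call. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class AccessibilityChecker(HTMLParser):
    """Hypothetical checker: records objective failures and manual-review items."""

    def __init__(self):
        super().__init__()
        self.failures = []       # objective checkpoint violations
        self.manual_review = []  # subjective checkpoints: located, not judged

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        line, _ = self.getpos()
        # Objective rule (assumed): an image either has an alt attribute or it does not.
        if tag == "img" and "alt" not in attributes:
            self.failures.append(f"line {line}: img without alt")
        # Subjective rule (assumed): find every frame, let a human judge its title.
        if tag in ("frame", "iframe"):
            title = attributes.get("title")
            self.manual_review.append(f"line {line}: {tag} title={title!r}")


def check_page(html):
    """Run the checker over one page and return (failures, manual_review)."""
    checker = AccessibilityChecker()
    checker.feed(html)
    checker.close()
    return checker.failures, checker.manual_review


def verdict(failures):
    """A tool can prove a page inaccessible, but never prove it accessible."""
    if failures:
        return "inaccessible"
    return "undetermined: manual check needed"


if __name__ == "__main__":
    page = '<p>\n<img src="logo.png">\n<iframe src="map.html" title="Map"></iframe>\n</p>'
    failures, review = check_page(page)
    print(verdict(failures))  # inaccessible
    print(failures)           # ['line 2: img without alt']
    print(review)             # ["line 3: iframe title='Map'"]
```

Note that an empty failure list never yields "accessible": it only means the objective checks this sketch knows about passed, and the flagged frames, like every other subjective checkpoint, still need a human.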
The usefulness of automated testing is, however, beyond dispute. In summary:
- Checkpoints which cannot be objectively measured can be flagged up for manual review. For instance, WCAG 1.0 checkpoint 12.1 states that frames should have useful titles. An automated tool can flag up all occurrences of frames on a site, including iframes, which could otherwise be difficult to find manually.
- After a site has been launched and tested for accessibility problems using both a tool and a manual approach, automated reviews can be run at appropriate intervals to maintain any–point accessibility.
- Comparing the output of a tool run on a number of sites can give an overview of the state of, for instance, government websites. Using automated tools, such a comparison can be repeated at intervals, which is very useful for policy–makers and activists alike.
- Sites can be tested, and a preliminary or partial status reported, regardless of size. Only automated tools make it practical to test all pages on large sites.
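The any–point and cross–site uses listed above both come down to comparing reports. A minimal Python sketch, assuming each run of a tool yields a hypothetical report format that maps a page URL to a set of violation labels (not the output format of any particular tool): `new_violations` lists what has appeared since the previous run, and `summarise` reduces each site's report to a single count for a coarse comparison between sites.

```python
def new_violations(previous, current):
    """Per page, the violations found in `current` that were absent from `previous`.

    Reports map a page URL to a set of violation labels (assumed format).
    A page missing from `previous` has all of its violations reported.
    """
    regressions = {}
    for page, found in current.items():
        added = found - previous.get(page, set())
        if added:
            regressions[page] = sorted(added)
    return regressions


def summarise(reports):
    """Total violation count per site, for a coarse cross-site comparison."""
    return {
        site: sum(len(found) for found in pages.values())
        for site, pages in reports.items()
    }


if __name__ == "__main__":
    last_month = {"/": {"img without alt"}, "/about": set()}
    this_month = {"/": {"img without alt"}, "/about": {"frame without title"}}
    print(new_violations(last_month, this_month))  # {'/about': ['frame without title']}
    print(summarise({"example.org": this_month}))   # {'example.org': 2}
```

A raw count treats every violation as equal, so it is best read as an indicator for follow-up rather than as a league table.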
User Testing
Before we can discuss the benefits and drawbacks of user–based testing, we must decide what it actually is. There are typically three different methods that go by the name:
- Polling. Here a number of users are asked their opinion as to whether a specific website is accessible. The sample is usually drawn from a population defined as "users with disabilities". This method does not include the use of a baseline.
- Expert assessment. One or more accessibility experts assess the site against the specified baseline and give their opinion as to its accessibility.
- User groups. A number of disabled users are asked to browse the site in question, and give their impressions as to its accessibility. A baseline is usually not involved with such testing.
All three methods have benefits and drawbacks.
Polling
The method designated “polling” is the only one of the three to which a number of statistical tools can be efficiently applied. It is a more traditional method of measuring. However:
- Determining a target population, calculating a standard deviation, and estimating sample size is a complicated matter due to the uncertainty in defining who, exactly, accessibility is for. We recommend that, should this method be applied, the population be defined independent of ability, and instead be picked from the set of users expected on the site.
- The method is not cost effective, as sample sizes will most likely need to be large, and the users involved cannot be expected to give their opinions on more than a very few documents from the site in question. Given time and cost constraints, this method can only provide a partial report.
- Cross–site comparison is difficult, not only for practical and economic reasons, but also because it is hard to define a target population common to multiple sites. A comparison between EU websites would, for instance, in theory include the entire population of the European Union.
- Repeated testing of a site requires that a new sample be chosen, as it is unrealistic to expect that the old group of users can be re–used. The first test introduces bias into the group: when re–testing, users go over familiar territory which they may, or may not, know how to handle this time around. This makes ensuring any–point accessibility difficult.
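As an illustration of why sample sizes grow large, the following Python sketch uses the standard textbook formula for the sample size needed to estimate a proportion (say, the share of users who find a site accessible) to within a given margin of error, with an optional finite-population correction. It is not a procedure from this article, and it assumes simple random sampling from a well-defined population, which is precisely what the first point above says is hard to obtain. The function name and its defaults (95 % confidence, z = 1.96, p = 0.5) are illustrative choices.

```python
import math


def sample_size(margin, z=1.96, p=0.5, population=None):
    """Sample size for estimating a proportion p to within +/- margin.

    Uses n0 = z^2 * p * (1 - p) / margin^2 (simple random sampling, assumed),
    then, if a finite population size is given, the correction
    n = n0 / (1 + (n0 - 1) / population). p = 0.5 is the most conservative
    choice when nothing is known in advance.
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


if __name__ == "__main__":
    print(sample_size(0.05))  # 385
```

At 95 % confidence and a margin of five percentage points this gives 385 respondents, and that is for a single site and a single population; halving the margin roughly quadruples the number.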
In conclusion, the polling method of accessibility testing has several drawbacks, in particular in terms of cost effectiveness and precision. It is also a method which requires a high level of statistical skill on the part of the team analysing the results.
Expert Assessment
An expert, to quote Alistair MacLean, is loosely defined as a person who claims to know what he (or she) is doing. With the expert assessment method, the biggest difficulty is finding someone who is genuinely competent in the field.
- There is currently no commonly agreed education for accessibility experts, and the certifications that do exist are not without controversy. We are not likely to see this resolved in a neutral and objective manner any time soon.
- In order to find someone qualified to assess — or audit — a website, a number of possible companies must be listed, and their reliability estimated. This is a difficult and time–consuming task, as the number of snake–oil salespeople is high.
- Using a person who is well versed in the required baseline and experienced in testing websites against it is, however, the single most efficient and precise method of producing a report on the overall accessibility of a site.
The expert should, in our view, employ an automated tool, but also carry out manual inspection, user–agent testing, and interviews. Evaluating, for instance, what policies are in place for updating alternative content is a typical use of the latter technique.
User Groups
Assembling a group of users from the target population and basing a report upon their experiences with a site is, at first glance, an attractive method to use. However:
- It suffers from many of the same problems as polling, in particular when it comes to choosing which users to include in the group, how many documents per site to examine, and how to compare different sites precisely.
- Skill levels among users will inevitably vary drastically, which skews the results. It is not practical to assemble user groups large enough to iron out statistical uncertainties, so personal skill and opinion become important factors. A site that is easy to navigate for a highly experienced user of one browser may be an insurmountable obstacle for someone with less experience. Similarly, an experienced user might subconsciously adapt to a poorly created site and deem it accessible.
- Questions such as whether to allow participants to use their own equipment (that is, the equipment they are most accustomed to), whether to test in a controlled environment, whether that environment hinders or helps participants, and so forth, must be answered.
Testing a site by asking the people who use it how they want it is very tempting. As the above points illustrate, it is a method not without drawbacks. User groups are, however, an important tool for usability testing.
One of the differences between accessibility and usability is that the former focuses on ensuring that any user can get to, and use, the content and functions of a website. The latter focuses on ensuring that content makes sense, that functionality is logical, and that the site is not confusing or difficult to understand.
Reaching content depends on many factors, most of which are external to the user: code quality, browser support, and so forth. Once the information has been retrieved, understanding it is largely independent of such details. In a majority of cases, difficulties in comprehending the information offered do not depend on which browser is used, or on the skill with which a particular user–agent is operated, but rather on abilities and skills inherent to the user.
When an aspiring philosophiae doctor stands up to defend her thesis, it is normally a public event. Anyone who so chooses may enter and observe the proceedings. For the event to be accessible, it must be possible for everyone to reach a point where they can observe. This does not mean that the subject matter must be written so as to be understandable to anyone, regardless of their knowledge of the topic.
The art of accessibility boils down to making information available; not to making information understood.
Being careful with how user groups are employed in testing accessibility does not in any way preclude involving users with different abilities, or groups representing their interests. Working together, in particular when first creating a baseline for testing, is of immense value.
Conclusion
Our conclusion, in short, is that a combination of automated testing tools, selected with care, and manual assessment done by equally carefully chosen experts, is the best and most cost effective method of ensuring one–point and any–point accessibility of websites.
Disclaimer
Greytower Technologies offer, among other services, automated accessibility testing and expert analysis. The author strove to ensure that this article is based in fact, and that all arguments and conclusions are objective.
Acknowledgements
Acknowledgement and kudos go to the following people, without whom this article would never have become a reality:
- David Dorward
- Jörgen Andreasen
- Mark Ng
- Victoria Gladman, Greytower Technologies (UK) Ltd.
Document Information
- Published: 15th of July 2005.
- Revised: 10th of August 2005.
