
General Concepts of Usability Testing 

What is it?

Usability testing means carrying out experiments to find out specific information about a design. Usability tests have their roots in experimental psychology, which once meant a reliance on heavy-duty statistical analysis of data. Today, with more emphasis on interpreting the results rather than on the data-driven figures themselves, the hard numbers get less importance and the other things you find out during the test get more. For example, a lot of tests done today use the thinking-aloud protocol in conjunction with some sort of performance measurement. While performance measurement is still useful, information gathered from the thinking-aloud protocol often makes its way into the product faster--it doesn't need to be aggregated and analyzed before you can act on it.

How do I do it?

The overall process is simple: get some users and find out how they work with the product. Usually you observe individual users performing specific tasks with the product, and you collect data on how they're doing--for example, how long they take to perform a task, or how many errors they make. Then you analyze the data from all your experiments to look for trends. This section, based on Rubin's Handbook of Usability Testing, breaks these phases out in more detail.

Determine what you're trying to find out

What do you want to know about your product? Start with an overall purpose for your investigation; maybe you want to find out why tech support's call rate has gone way up since the last release. Or, your market share has slipped and you wonder if the other guy's product sells better because it's more usable.

Distill this purpose down into a few objectives for your test. "How usable is the product?" is not a good objective. The objective has to be something you can test for, for example: Does the delay in waiting for the Java applet to load cause users to leave the site? How difficult is it for a novice to do their long-form taxes using this software? Does the online help system provide enough tax code information? Is that information in easy-to-understand language, not government jargon?

Design your test

Identify the users you'll test. You'll need this because you have to go out and find some users, and knowing whether you need novices or experts, male or female or both, young or old or both, is important. Who are the target users for your product? If you're testing fighter jet displays, you don't want a horde of high school kids running through your test scenarios. If you're testing soft drink machines, you do want high school kids in your user population, in addition to the fighter pilots. This user profile is important in developing your test design and choosing your sample subjects.

Determine the experimental design. The experimental design refers to how you'll order and run the experiments to eliminate non-interesting variables from the analysis. For example, suppose you're testing tax software. Do you want subjects who have done their taxes using your software before, and thus already have knowledge about the product? Maybe you want to run two groups of users through--rank novices in one group, and semi-experienced folks in another. There's a lot of information on experimental design in the usability testing books. If you want even more, see the references in the statistics and test design section of the bibliography--the quality craze of the '80s gave rise to a lot of interesting test designs that might be applicable to your situation (especially the ones that reduce sample size).
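One common way to eliminate task-order effects from a design like this is counterbalancing--varying the order of tasks across subjects. A minimal sketch (the tax-software task names are hypothetical): a cyclic Latin square gives each task every position exactly once across subjects.

```python
# A cyclic Latin square: each task appears in each position exactly once
# across the subject orderings, so order effects average out.
def latin_square(tasks):
    n = len(tasks)
    return [[tasks[(row + col) % n] for col in range(n)]
            for row in range(n)]

# Hypothetical tax-software tasks
tasks = ["open a return", "enter a W-2", "itemize deductions", "e-file"]
for subject, order in enumerate(latin_square(tasks), start=1):
    print(f"Subject {subject}: {', '.join(order)}")
```

Note that a simple cyclic square doesn't control for carryover effects between adjacent tasks; the balanced designs covered in the test design references handle that too.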

Develop the tasks that your users will perform during each experiment. Of course, these should be derived from tasks that users normally perform when they're using the product. Specify what you need to set up the scenario: the machine or computer states, screens, documentation, and other job aids that must be present. Also, specify what signifies a completed task--for example, the user successfully saves the edited document, or completes the manufacturing operation with a finished, in-spec part.

Specify the test apparatus. In traditional scientific experimentation--for example, biological or chemical research, from which usability testing methodology is ultimately derived--the test apparatus would be the lab glassware, bunsen burners, flasks, and other devices used in the course of the experiment. For usability testing, this is the computer and its software, or the mockup of the manufacturing workstation, or the prototype dashboard of a car.

The test apparatus can also include devices used in the running of the test, like video cameras to record the user's actions, scan converters to record the on-screen action, audio recorders to record verbal protocols, one-way mirrors to help the experimenter stay out of the subject's way, and so on. A lot of importance is placed on these items in usability testing, but it really doesn't have to be that way. Even with a simple home camcorder, or no video recording at all, you can find out a lot of useful information.

Identify the required personnel. You'll need at least one experimenter to run the test, from greeting the subject to explaining the test sequence to working with the subject during each task. You might also want to enlist an observer or two to reduce the data logging load on the experimenter.

Get some users

Assemble a list of users from which you'll draw a sample population of test subjects. There's so much written about picking subjects that it would be hard to list all the issues, but here are a few guidelines. You'll need enough users to fill your sample population with the correct mix of experience, skills, and demographic characteristics--otherwise, other factors might creep into your experimental design and influence the data. The user profile you determined during your experimental design will help you identify the exemplar user for your product. For example, a fighter jet display might be "used" not only by the pilots themselves, but also by maintenance workers, installers, diagnostic testers, and instructors. However, for the purposes of what you're trying to find out ("Does the radar display of surrounding aircraft allow the user to avoid collisions during inverted flight?") you might be concerned with only one segment of the overall user population--the pilots.

Even if you've narrowed the user population down to a single profile, for example, "male or female fighter pilots with 20/20 vision between the ages of 22 and 35, with at least a bachelor's degree or equivalent," you'll still need to gather more information about them. How much experience with this type of display does each user have? Are they used to old-fashioned mechanical gauges, or do they prefer high-tech computerized displays? Are they colorblind? Which eye is dominant? You could go on and on, but the more knowledge you have about your sample subjects, the less likely you are to be surprised by weird factors that skew your experimental data.

How do you find all these users? Well, by any means possible. Recruit from fellow employees and the family and friends of employees. Enlist temporary employment agencies and market research firms to get people (you might need to pay for them, but you'll probably have an easier time sorting through their characteristics). Get customers from Tech Support's call logs, or from Sales' lead lists. Offer free food at college campuses. Put out an ad on the Web, or in newspapers. Contact user groups and industry organizations. Consider other populations, like retirees who might have more spare time. Invite schools to send students over for a field trip.

You might have problems finding particular user populations. If you need to test fighter pilots, can you get enough from each branch of the military to cover their specific biases? If you're testing an executive information system (EIS), can you procure enough executive-level people to test against, given their hectic schedules?

Set up the test

Prepare the test apparatus. For a software test, the apparatus includes the computer and its software; for a hardware test, the machine or mockup. Some tests are run with prototypes; if so, ensure that the states to be encountered during each task scenario will be available in the prototype. Also include the materials that will be provided to the subject and to the experimenter during the test. The subject usually gets a list of the tasks to perform. Often, the steps to perform each task are intentionally omitted if you want to determine the discoverability of certain command sequences. The experimenter usually gets a basic script of how to present the tasks to the subject, and a form to help log observations during the test. Sometimes an agreed-upon shorthand for noting test events is useful in the rush to capture everything that's going on.
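That logging form and shorthand can be as simple as a table of event codes with elapsed-time stamps. A minimal sketch, with invented codes and events for illustration:

```python
import time

# Hypothetical shorthand codes for test events
CODES = {"S": "task start", "E": "error", "H": "asked for help",
         "Q": "notable quote", "C": "task complete"}

log = []
t0 = time.monotonic()

def note(code, detail=""):
    """Record a shorthand event with an elapsed-time stamp (seconds)."""
    log.append((round(time.monotonic() - t0, 1), CODES[code], detail))

note("S", "task 1: save the edited document")
note("E", "clicked Print instead of Save")
note("C")
for stamp, event, detail in log:
    print(f"{stamp:6.1f}s  {event:15s} {detail}")
```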

Prepare the test sample. The sample is the group of subjects you'll run through the test. How many do you need? Most common guidelines recommend at least four to five participants to find the majority of usability problems. Pick your sample based on your objectives and user profiles, and their availability on your test dates.
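The four-to-five guideline comes from problem-discovery models like the one in Virzi's article (see the bibliography): if each subject independently encounters a given problem with probability p, the expected proportion of problems found with n subjects is 1 - (1 - p)^n. A quick sketch (p = 0.31 is an average detection rate reported in that literature; your own rate will vary):

```python
def proportion_found(n, p=0.31):
    """Expected fraction of usability problems uncovered by n subjects,
    assuming each subject hits a given problem with probability p."""
    return 1 - (1 - p) ** n

for n in (1, 3, 5, 10):
    print(f"{n:2d} subjects: {proportion_found(n):.0%} of problems found")
```

With p = 0.31, five subjects find about 84% of the problems, and each additional subject adds less and less--which is why small samples are often good enough.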

Run the test

Prepare the subject for the test. Most people are uncomfortable when they're put into a laboratory and asked to perform tasks while being timed and having their mistakes logged for analysis, so it's crucial to set the subject at ease. Explain that the subject can stop the test at any time, use the bathroom, or take a break if he or she needs to. Emphasize that you're testing the product, not the user, and that they need not feel pressured by the test. Thank the user for participating.

Most tests have each subject sign nondisclosure agreements and recording consent forms prior to the test. As a part of this filling-out-paper step, you can have the user complete a pre-test questionnaire to identify domain knowledge or attitudes, or to get more information about the user's characteristics.

Run the subject through the tasks and collect data. The typical test consists of a subject at a workstation, performing written tasks while the experimenter observes the user and asks questions or provides prompts if necessary.

Tests looking primarily for preference or conceptual data (through thinking aloud, for example) can have a fairly large amount of interaction between the experimenter and the subject. For tests where you're trying to gather empirical data, like error rates, you'll want to reduce the interaction to a minimal influence upon the subject.

Let the subject work through the tasks without much interference. It will be hard to watch them struggle through difficult parts, but it's better to learn from their struggling in the lab than to have them struggle once they've paid for your product and brought it home. Of course, if a user really gets stuck, to the point of tears or of leaving the lab, help them get through the immediate problem or simply move on to another task.

Even if you're not using a thinking-aloud protocol, you might want to ask the subject questions at different times during the test if you feel you'll learn a lot more about why the subject did something a certain way.

Debrief the user

Discuss the test with the user. After the tasks are complete and the test is over, chat with the subject about it. Go over events that happened during the test to gather more information about what the subject was thinking at the time. You can recall specific events and discuss them with the subject, or simply ask the subject which events were the most noteworthy.

Thank the user for participating. Remember, the subjects are here doing you a big favor, and it's important to let them know you appreciate them. Most labs provide a small gift for the subject: a coffee mug, or t-shirt, or free software, after the test. Many times, you'll want to draw from your pool of previous subjects for a future test, so it's important to keep them happy about participating.

Analyze your data

Find the big problems first. Identifying the big problems is probably easiest, since they'll be evident from your observation notes. If every subject had a problem with a particular menu item, obviously that item's design needs to be revisited.

Summarize the performance data you've collected. Performance data like error rates and task durations is evaluated by performing statistical analysis on the data set. Most analysis consists of figuring the mean and standard deviation, and checking the data for validity. Does the data indicate any trends? Were particular parts of the product more difficult?
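A minimal sketch of that summarization, using Python's standard statistics module (the task times in seconds are made up):

```python
import statistics

# Hypothetical task completion times, in seconds, one per subject
task_times = {
    "save the document": [42, 55, 38, 61, 47],
    "print two copies": [95, 130, 88, 142, 101],
}

for task, times in task_times.items():
    mean = statistics.mean(times)
    sd = statistics.stdev(times)  # sample standard deviation
    print(f"{task}: mean {mean:.1f}s, sd {sd:.1f}s (n={len(times)})")
```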

Summarize the preference data you've collected. By observing the user's actions and recording the user's opinions--either during the test, using a thinking-aloud protocol or by asking questions, or before and after the test in the questionnaires--you have amassed a large set of preference data. Most questionnaire designs let you quantify opinions using numerical scales, and the quantitative data gathered this way can be analyzed statistically, much like the raw performance data. You can also summarize this data by selecting quotes from the subjects to highlight in the report as sound bites.
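For example, rating statements on a 1-to-5 agreement scale yields numbers you can summarize; since the scale is ordinal, the median is often a safer summary than the mean. A quick sketch with invented statements and responses:

```python
import statistics

# Hypothetical questionnaire statements rated 1 (strongly disagree)
# to 5 (strongly agree), one rating per subject
responses = {
    "The help system answered my tax questions": [4, 5, 3, 4, 2],
    "The error messages were easy to understand": [2, 1, 3, 2, 2],
}

for statement, ratings in responses.items():
    print(f"{statement}: median {statistics.median(ratings)}, "
          f"mean {statistics.mean(ratings):.1f}")
```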

When should I use this technique?

Usability testing is used throughout the product development lifecycle. In early stages of product development, testing the previous version or competitors' products gives the design team benchmarks to shoot for in the design. In middle stages of development, testing validates the design and provides feedback with which to refine the design. At the later stages, testing ensures that the product meets the design objectives.

1998 Update!

Well, times do change. I got this email back in March of 1998, regarding my citation of Hagan Heller's article on low-cost usability testing from 1994:
You should note in your bibliography that while it has lots of good ideas, much of the information in it is significantly out of date.  Some examples:

   * Scan converters are not nearly as expensive, and are quite portable nowadays.  We have one that does up to 800x600 resolution on both WinPC's and Macs that is in the $2000 range, which is about an order of magnitude less than she cites.
   * Excellent portable labs with built in video and audio mixing capabilities and decent video editing are available in the $15,000 to $30,000 range.  They can be set up in under an hour at any site (well, depending on how many flights of stairs there are) and can do titles and other effects when hooked up to a computer for  editing. They DON'T require a Video Toaster, unless that's what's under the hood of the mix board in my lab now.

However, their note about how tripods are good and you need one is dead-on.

Thanks to Merryl Gross for the info. Your note really rings true. We cobbled together the lab at Cisco on a really, really low budget. It was as if the "Tightwad Gazette" lady had decided to construct a usability lab--scrounged desks and chairs, telephones set on "conference call" as our intercom, etc. Ah, those good old days...

Who can tell me more?

See any of the following references for more information:
 
Dumas, JS, and Redish, Janice, A Practical Guide to Usability Testing, 1993, Ablex, Norwood, NJ
ISBN 0-89391-991-8 (paper)

Lindgaard, G., Usability Testing and System Evaluation: A Guide for Designing Useful Computer Systems, 1994, Chapman and Hall, London, U.K. ISBN 0-412-46100-5

Rubin, Jeffrey, Handbook of Usability Testing, 1994, John Wiley and Sons, New York, NY ISBN 0-471-59403-2 (paper)

Additional Information

Bell, Brigham, Rieman, John, and Lewis, Clayton. "Usability Testing of a Graphical Programming System: Things We Missed in a Programming Walkthrough.'' Communications of the ACM volume/number unknown (1991): 7-12

Chartier, Donald A. "Usability Labs: The Trojan Technology.''

Cline, June A., Omanson, Richard C., and Marcotte, Donald A. "ThinkLink: An Evaluation of a Multimedia Interactive Learning Project.''

Haigh, Ruth, and Rogers, Andrew. "Usability Solutions for a Personal Alarm Device.'' Ergonomics In Design (July 1994): 12-21

Heller, Hagan, and Ruberg, Alan. "Usability Studies on a Tight Budget.'' Design+Software: Newsletter of the ASD (1994)

Jordan, Patrick W., Thomas, Bruce, Weerdmeester, Bernard, (Eds.), Usability Evaluation in Industry, 1996, Taylor & Francis, Inc., London, UK. ISBN: 0-74-840460-0

Lund, Arnold M. "Ameritech's Usability Laboratory: From Prototype to Final Design.''

Whiteside, John, Bennett, John, and Holtzblatt, Karen. "Usability Engineering: Our Experience and Evolution'' from Handbook of Human-Computer Interaction, M. Helander (ed.). Elsevier Science Publishers B.V. (North Holland), 1988: 791-804.

Wiklund, Michael E., Usability in Practice, 1994, AP Professional, Cambridge, MA ISBN 0-12-751250-0

Yuschick, Matt, Schwab, Eileen, and Griffith, Laura. "ACNA--The Ameritech Customer Name and Address Service.''

Usability Testing Techniques and Issues

Bailey, R. W., et al. "Usability Testing versus Heuristic Evaluation: A Head-to-Head Comparison.'' Proceedings of the Human Factors Society 36th Annual Meeting, (1992): 409-413.

Dayton, Tom, et al. "Skills Needed By User-Centered Design Practitioners in Real Software Development Environments: Report on the CHI '92 Workshop.'' SIGCHI Bulletin v25 n3, (July 1993): 16-31.

Jeffries, R., et al., "User Interface Evaluation in the Real World: A Comparison of Four Techniques.'' Reaching through Technology: Proceedings of the 1991 CHI Conference, New Orleans, April-May 1991, NY: Association for Computing Machinery (ACM), 119-124.

Virzi, Robert A. "Refining the Test Phase of Usability Evaluation: How Many Subjects is Enough?'' Human Factors, v34, n4 (1992): 457-468.


All content copyright © 1996 - 2011 James Hom