Distill this purpose down into a few objectives for your test. "How usable is the product?" is not a good objective. The objective has to be something you can test for, for example: Does the delay in waiting for the Java applet to load cause users to leave the site? How difficult is it for a novice to do their long-form taxes using this software? Does the online help system provide enough tax code information? Is that information in easy-to-understand language, not government jargon?
Determine the experimental design. The experimental design refers to how you'll order and run the experiments to eliminate non-interesting variables from the analysis. For example, suppose you're testing tax software. Do you want subjects who have done their taxes using your software before, and who therefore already know the product? Maybe you want to run two groups of users through--rank novices in one group, and semi-experienced folks in another. There's a lot of information on experiment design in the usability testing books. If you want even more information on experiment design, see the references in the statistics and test design section of the bibliography--the quality craze of the 1980s gave rise to a lot of interesting test designs (especially the ones that reduce sample size) that might be applicable to your situation.
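One way to keep task order from becoming one of those non-interesting variables is to rotate the order from subject to subject within each group. The following is only a minimal sketch of that idea, not a prescribed method; the subject IDs, group labels, and task names are hypothetical.

```python
from collections import defaultdict


def rotated_orders(tasks):
    """Return one task order per rotation (a simple Latin square).

    Across the returned orders, each task appears in each position once.
    """
    return [tasks[i:] + tasks[:i] for i in range(len(tasks))]


def build_schedule(subjects, tasks):
    """Give each subject a task order, cycling through the rotations
    separately within each experience group."""
    orders = rotated_orders(tasks)
    seen_in_group = defaultdict(int)
    schedule = []
    for name, group in subjects:
        order = orders[seen_in_group[group] % len(orders)]
        seen_in_group[group] += 1
        schedule.append((name, group, order))
    return schedule


# Hypothetical subjects and tasks for the tax-software example.
subjects = [
    ("S1", "novice"),
    ("S2", "novice"),
    ("S3", "semi-experienced"),
    ("S4", "semi-experienced"),
]
tasks = ["enter income", "itemize deductions", "look up a help topic"]

for name, group, order in build_schedule(subjects, tasks):
    print(name, group, order)
```

With this rotation, no single task is always performed first (when subjects are fresh) or last (when they're tired) within a group, assuming you run at least as many subjects per group as there are tasks.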
Develop the tasks that your users will perform during each experiment. Of course, these would be derived from tasks that the users normally perform when they're using the product. Specify what you need to set up the scenario: the machine or computer states, screens, documentation, and other job aids that must be present. Also, specify what signifies a completed task--for example, the user successfully saves the edited document, or completes the manufacturing operation with a finished, in-spec part.
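If you keep your task definitions in a structured form, it's easier to check before each session that everything the scenario needs is actually in place. Here is a rough sketch of one possible structure; the class name, field names, and sample task are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TaskScenario:
    """One test task: what the subject does, what must be present
    beforehand, and what counts as completion."""
    name: str
    instructions: str
    required_setup: list = field(default_factory=list)
    completion_criterion: str = ""

    def missing_setup(self, available):
        """Return the required setup items not found in `available`."""
        return [item for item in self.required_setup if item not in available]


# Hypothetical task based on the "save the edited document" example.
save_task = TaskScenario(
    name="Save edited document",
    instructions="Open the draft letter, correct the date, and save it.",
    required_setup=[
        "word processor running",
        "draft letter on disk",
        "printed quick-reference card",
    ],
    completion_criterion="The edited document is saved successfully.",
)

# Check the lab setup before the subject arrives.
on_hand = {"word processor running", "draft letter on disk"}
print(save_task.missing_setup(on_hand))
```

Writing the completion criterion down next to the task also means the experimenter and any observers agree ahead of time on when to stop the clock.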
Specify the test apparatus. In traditional scientific experimentation (for example, biological or chemical research), from which usability testing methodology is ultimately derived, the test apparatus would be the lab glassware, Bunsen burners, flasks, and other devices used in the course of the experiment. For usability testing, this is the computer and its software, or the mockup of the manufacturing workstation, or the prototype dashboard of a car.
The test apparatus can also include devices used in the running of the test, like video cameras to record the user's actions, scan converters to record the on-screen action, audio recorders to record verbal protocols, one-way mirrors to help the experimenter stay out of the subject's way, and so on. A lot of importance is placed on these items with regard to usability testing, but it really doesn't have to be that way. Even with simple home camcorders, or no video recording at all, you can find out a lot of useful information.
Identify the required personnel. You'll need at least one experimenter to run the test, from greeting the subject to explaining the test sequence to working with the subject during each task. You might also want to enlist an observer or two to reduce the data logging load on the experimenter.
Even if you've narrowed the user population down to a single profile, for example, "male or female fighter pilots with 20/20 vision between the ages of 22 and 35, with at least a bachelor's degree or equivalent," you'll still need to gather more information about them. How much experience with this type of display does each user have? Are they used to the old-fashioned mechanical gauges or do they prefer high-tech computerized displays? Are they colorblind? Which eye is dominant? You could go on and on, but the more knowledge you have about your sample subjects, the less you can be surprised by weird factors that skew your experimental data.
How do you find all these users? Well, by any means possible. Recruit from fellow employees and the family and friends of employees. Enlist temporary employment agencies and market research firms to get people (you might need to pay for them, but you'll probably have an easier time sorting through their characteristics). Get customers from Tech Support's call logs, or from Sales' lead lists. Offer free food at college campuses. Put out an ad on the Web, or in newspapers. Contact user groups and industry organizations. Consider other populations, like retirees who might have more spare time. Invite schools to send students over for a field trip.
You might have problems finding particular user populations. If you need to test fighter pilots, can you get enough from each branch of the military to cover their specific biases? If you're testing an executive information system (EIS), can you procure enough executive-level people to test against, given their hectic schedules?
Prepare the test sample. The sample is the group of subjects you'll run through the test. How many do you need? Most common guidelines recommend at least four to five participants to find the majority of usability problems. Pick your sample based on your objectives and user profiles, and their availability on your test dates.
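To get a feel for why a handful of participants can go a long way, you can play with a simple model that is often used in the usability literature: if each subject has probability p of running into a given problem, then n subjects are expected to reveal a proportion 1 - (1 - p)^n of the problems. The sketch below is only an illustration of that model; the value of p is an assumption you'd have to estimate for your own product, and the function names are made up.

```python
def proportion_found(p, n):
    """Expected share of problems seen at least once by n subjects,
    assuming each subject hits any given problem with probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return 1 - (1 - p) ** n


def subjects_needed(p, target):
    """Smallest number of subjects whose expected share of problems
    found reaches `target` (a fraction below 1)."""
    if not 0 <= target < 1:
        raise ValueError("target must be in [0, 1)")
    n = 1
    while proportion_found(p, n) < target:
        n += 1
    return n


# Assumed detection probability; replace with your own estimate.
p = 0.3
for n in range(1, 9):
    print(n, f"{proportion_found(p, n):.0%}")

print("subjects needed for 80%:", subjects_needed(p, 0.8))
```

With the assumed p of 0.3, the model reaches 80 percent at five subjects. A lower p (problems that are harder to stumble upon) pushes the number up quickly, so treat the result as a planning aid rather than a guarantee.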
Most tests have each subject sign nondisclosure agreements and recording consent forms prior to the test. As a part of this filling-out-paper step, you can have the user complete a pre-test questionnaire to identify domain knowledge or attitudes, or get more information about the user's characteristics.
Run the subject through the tasks and collect data. The typical test consists of a subject at a workstation, performing written tasks while the experimenter observes the user and asks questions or provides prompts if necessary.
Tests that are looking primarily for preferential or conceptual data (through thinking aloud, for example) can have a fairly large amount of interaction between the experimenter and the subject. For tests where you're trying to collect empirical data, like error rates, you'll want to keep the interaction to a minimum so that it has as little influence on the subject as possible.
Let the subject work through the tasks without much interference. It will be tough to watch them struggle through tough parts, but it's better to learn from their struggling in the lab rather than have them struggle once they've paid for your product and brought it home. Of course, if a user really gets stuck to the point of tears or leaving the lab, assist them with getting through the immediate problem or simply move on to another task.
Even if you're not using a thinking-aloud protocol, you might want to ask the subject questions at different times during the test if you feel you'll learn a lot more about why the subject did something a certain way.
Thank the user for participating. Remember, the subjects are here doing you a big favor, and it's important to let them know you appreciate them. Most labs provide a small gift for the subject: a coffee mug, or t-shirt, or free software, after the test. Many times, you'll want to draw from your pool of previous subjects for a future test, so it's important to keep them happy about participating.
Summarize the performance data you've collected. Performance data like error rates and task durations is evaluated by performing statistical analysis on the data set. Most analysis consists of figuring the mean and standard deviation, and checking the data for validity. Does the data indicate any trends? Were particular parts of the product more difficult?
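As a rough illustration of this kind of summary, the sketch below computes the mean and sample standard deviation of task durations, and flags values far from the mean as candidates for a closer look. The durations are invented, and the two-standard-deviation cutoff is just an assumed rule of thumb, not a validity test recommended by the text.

```python
from statistics import mean, stdev


def summarize(values):
    """Return the count, mean, and sample standard deviation."""
    return {"n": len(values), "mean": mean(values), "stdev": stdev(values)}


def flag_outliers(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from
    the mean. These deserve a second look, not automatic removal."""
    m, s = mean(values), stdev(values)
    return [v for v in values if s > 0 and abs(v - m) > threshold * s]


# Hypothetical task durations in seconds, one value per subject.
durations = {
    "enter income": [212, 185, 240, 198, 260],
    "itemize deductions": [410, 520, 388, 615, 472],
}

for task, values in durations.items():
    stats = summarize(values)
    print(f"{task}: n={stats['n']} mean={stats['mean']:.1f}s "
          f"stdev={stats['stdev']:.1f}s outliers={flag_outliers(values)}")
```

A task with a noticeably higher mean or a wide spread is a good place to go back to the tapes or observer notes and see what actually happened.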
Summarize the preference data you've collected. By observing the user's actions and recording the user's opinions, either during the test (using a thinking-aloud protocol or asking questions) or before and after the test in the questionnaires, you have amassed a large set of preference data. Most questionnaire designs allow you to quantify opinions using numerical scales, and the resulting quantitative data can be analyzed using statistics much like the raw performance data. You can also summarize this data by selecting quotes from the subjects to highlight in the report as soundbites.
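Here is a small sketch of how ratings on a numerical scale might be summarized. It assumes a 1-to-5 agreement scale, which is a common choice but not something the text specifies; the statements and ratings are invented.

```python
from collections import Counter
from statistics import mean, median


def summarize_ratings(ratings, scale_max=5):
    """Summarize ratings given on a 1..scale_max scale.

    Returns the mean, the median, and how many subjects chose each
    point on the scale (index 0 holds the count for a rating of 1).
    """
    counts = Counter(ratings)
    return {
        "mean": mean(ratings),
        "median": median(ratings),
        "distribution": [counts.get(score, 0) for score in range(1, scale_max + 1)],
    }


# Hypothetical post-test questionnaire results, one rating per subject
# (1 = strongly disagree, 5 = strongly agree).
responses = {
    "The help text was easy to understand": [4, 5, 3, 4, 2],
    "I could find the form I needed": [2, 1, 3, 2, 2],
}

for statement, ratings in responses.items():
    print(statement, summarize_ratings(ratings))
```

Reporting the distribution alongside the mean helps when opinions are split: a mean of 3 can hide a group that loved the feature and a group that hated it.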
You should note in your bibliography that while it has lots of good ideas, much of the information in it is significantly out of date. Some examples:
* Scan converters are not nearly as expensive, and are quite portable nowadays. We have one that does up to 800x600 resolution on both WinPC's and Macs that is in the $2000 range, which is about an order of magnitude less than she cites.
* Excellent portable labs with built in video and audio mixing capabilities and decent video editing are available in the $15,000 to $30,000 range. They can be set up in under an hour at any site (well, depending on how many flights of stairs there are) and can do titles and other effects when hooked up to a computer for editing. They DON'T require a Video Toaster, unless that's what's under the hood of the mix board in my lab now.
However, their note about how tripods are good and you need one is dead-on.
Thanks to Merryl Gross for the info. Your note does really ring true. We cobbled together the lab at Cisco on a really, really low budget. It was as if the "Tightwad Gazette" lady decided to construct a usability lab--scrounged desks and chairs, telephones set on "conference call" as our intercom, etc. Ah, those good old days...
Lindgaard, G., Usability Testing and System Evaluation: A Guide for Designing Useful Computer Systems, 1994, Chapman and Hall, London, U.K. ISBN 0-412-46100-5
Rubin, Jeffrey, Handbook of Usability Testing, 1994, John Wiley and Sons, New York, NY ISBN 0-471-59403-2 (paper)
Chartier, Donald A. "Usability Labs: The Trojan Technology."
Cline, June A., Omanson, Richard C., and Marcotte, Donald A. "ThinkLink: An Evaluation of a Multimedia Interactive Learning Project."
Haigh, Ruth, and Rogers, Andrew. "Usability Solutions for a Personal Alarm Device." Ergonomics In Design (July 1994): 12-21
Heller, Hagan, and Ruberg, Alan. "Usability Studies on a Tight Budget." Design+Software: Newsletter of the ASD (1994)
Jordan, Patrick W., Thomas, Bruce, Weerdmeester, Bernard, (Eds.), Usability Evaluation in Industry, 1996, Taylor & Francis, Inc., London, UK. ISBN: 0-74-840460-0
Lund, Arnold M. "Ameritech's Usability Laboratory: From Prototype to Final Design."
Whiteside, John, Bennett, John, and Holtzblatt, Karen. "Usability Engineering: Our Experience and Evolution" from Handbook of Human-Computer Interaction, M. Helander (ed.). Elsevier Science Publishers B.V. (North Holland), 1988: 791-804.
Wiklund, Michael E., Usability in Practice, 1994, AP Professional, Cambridge, MA ISBN 0-12-751250-0
Yuschick, Matt, Schwab, Eileen, and Griffith, Laura. "ACNA--The Ameritech Customer Name and Address Service."
Dayton, Tom, et al. "Skills Needed By User-Centered Design Practitioners in Real Software Development Environments: Report on the CHI '92 Workshop." SIGCHI Bulletin v25 n3 (July 1993): 16-31.
Jeffries, R., et al. "User Interface Evaluation in the Real World: A Comparison of Four Techniques." Reaching through Technology: Proceedings of the 1991 CHI Conference, New Orleans, April-May 1991, NY: Association for Computing Machinery (ACM), 119-124.
Virzi, Robert A. "Refining the Test Phase of Usability Evaluation: How Many Subjects is Enough?" Human Factors, v34, n4 (1992): 457-468.
All content copyright © 1996 - 2011 James Hom