Editor’s Summary

Whether created by user experience designers or information specialists, taxonomies should always be subjected to evaluation processes. The taxonomy’s terms and relationships must be assessed in the context of its implementation as part of information architecture. Traditional card sorting in a face-to-face setting can be performed by any number of testers working independently; Delphi-method card sorting involves iterative review by a succession of testers. Each requires advance preparation of cards showing candidate terms to be arranged in a way that makes sense to each tester. Dialogue with testers during the process provides qualitative insight into their thinking and logic, and data analysis sheds light on users’ mental models of the taxonomy. Usability studies then reveal how testers use the taxonomy to navigate, search and tag content. With similar preparation, online sorting methods facilitate testing and provide rich results for analysis. Regardless of the method used, testers should be selected with care, and terms must be distinguished from navigation and context. Testing should be performed early, using small studies to focus on key areas and larger samples for quantitative analysis.

Keywords

card sorting
taxonomies
index language construction
evaluation


Testing Taxonomies: Beyond Card Sorting

by Alberta Soranzo and Dave Cooksey

In this article, we discuss why user testing of taxonomies is needed, consider how testing taxonomies is different from testing information architecture, review traditional card sorting techniques and introduce testing methods that go beyond card sorting. It is our goal to encourage those who craft taxonomies to test with real users.

The Need for User Testing of Taxonomies

Before we look at taxonomy testing techniques, we would first like to explain why we should test taxonomies with real users. In some cases, taxonomies are created by user experience designers or developers who lack experience or training in taxonomy creation techniques. In these cases, testing with real users is a necessity. But what about taxonomies that are crafted by information specialists who are trained in taxonomy evaluation techniques?

We recommend user testing for these taxonomies as well, because evaluation techniques do not directly consider the user. All the techniques employed to evaluate a taxonomy, such as search log analysis, web analytics review, competitive analysis, review of organizing standards and subject matter expert interviews, are techniques that evaluate the taxonomy’s terms and relationships from the perspective of the expert. But these activities cannot tell us how actual users interpret or utilize a taxonomy. Only testing with real users can.

The Difference Between Testing Taxonomy and Testing IA

We would like to make one more important point before we talk methods. Taxonomy is not information architecture. Often in our work in creating user experiences, we will hear our colleagues and clients talk about taxonomy as if it were just navigation or search, facets or filters. But as practitioners who craft taxonomies, we need to remind ourselves and those we work with that taxonomy and information architecture are not necessarily the same; rather, information architecture is the articulation of a taxonomy.

Taxonomies may contain terms and relationships that go beyond what a user will interact with on a website or in a content management system (CMS). But even if a taxonomy only contains terms and relationships that are to be implemented in an architecture, there may be myriad ways to accomplish this in a user experience.

This distinction is important to keep in mind because as we plan testing, we need to consider whether we are testing the terms and relationships contained in the taxonomy or if we are testing the way they are implemented in the information architecture through navigation, search and filters. The former contains less context than the latter, impacting study design and results.

In-person Testing Methods

Card sorting is considered the traditional route for testing taxonomies, so it’s worth taking a look at how card sorting is performed before moving beyond it.

Traditional Card Sorting. Card sorting has traditionally been used to help design or evaluate the organization of websites. In a card-sorting session, participants organize topics into categories and then label them. Card sorting may be conducted with pieces of paper or one of several online card-sorting software tools. A typical card-sorting exercise is conducted in four stages.

  • Planning. Deciding what to test and how to recruit participants
  • Preparing the cards. Writing one concept/term per card. If any explanation is needed, it can be printed on the back of the card.
  • Sorting. Participants organize cards into logical groups that make sense to them. They then label the created groups.
  • Analysis. Collecting and analyzing the results. This step can be done manually (we recommend using Joe Lamantia’s spreadsheet [1]) or automatically by software.

Traditional card sorts may be conducted in three ways.

  1. Open sort. In an open card sort, there are no preset categories. Participants create and label their own categories, into which they organize the cards. This method helps reveal not only how they classify things but also what terms they use. Open sorting is generative; it is typically used to discover patterns in how participants classify concepts, which in turn helps generate ideas for organizing information.
  2. Closed sort. In a closed card sort, participants are provided with a predetermined set of categories that are already labeled. They then place the index cards into these fixed categories. This exercise helps reveal the degree to which the participants agree with the pre-existing categorization. Closed sorting is evaluative; it is typically used to judge whether a given categorization schema effectively organizes a given collection of content.
  3. Hybrid sort. In a hybrid card sort, one or more predefined groups are provided and the other groups are created by the participants. This type of test is helpful in situations where certain categories are required because of regulatory constraints, technical language or other non-negotiable reasons.

Regardless of the type of sort performed, card sorting gives insight into how people think about organization and allows for the creation of logical structures that support their information seeking needs. For more discussion on card sorting, see Donna Spencer’s Card Sorting [2].
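If you script the analysis yourself rather than relying on a spreadsheet or dedicated software, a common starting point is a co-occurrence count: how often each pair of cards ended up in the same group across participants. The sketch below is a minimal Python illustration; the `cooccurrence` function and the data format are our own, not part of any particular tool.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(sorts):
    """Count how often each pair of cards landed in the same group.

    `sorts` holds one entry per participant: a dict mapping that
    participant's group label to the list of cards placed in it.
    """
    pairs = Counter()
    for groups in sorts:
        for cards in groups.values():
            # Sort so each pair is counted under one canonical key.
            for a, b in combinations(sorted(cards), 2):
                pairs[(a, b)] += 1
    return pairs

# Two hypothetical participants sorting four cards:
sorts = [
    {"Animals": ["cat", "dog"], "Food": ["apple", "bread"]},
    {"Pets": ["cat", "dog", "apple"], "Bakery": ["bread"]},
]
print(cooccurrence(sorts)[("cat", "dog")])  # prints 2: both grouped them together
```

High counts signal pairs that participants agree belong together; pairs that rarely co-occur point to terms with no stable home in the mental model.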

Delphi-Method Card Sorting. Delphi-method card sorting is based on the same principles as traditional card sorting with a few important tweaks that result in focused testing sessions, lower costs and quicker results.

Drs. Kathryn Summers and Celeste Lyn Paul introduced Delphi-method card sorting while Dr. Paul was pursuing her Ph.D. at the University of Maryland Baltimore County [3, 4]. It is a card-sorting technique performed one participant at a time, one after the other, to test both terms and categorization. Briefly, Delphi-method card sorting works in the following manner:

  • Cards containing the terms to be tested are laid out on a table in hierarchical fashion.
  • Participants modify the deck by adding new cards, deleting cards (flipping them over), moving cards (flipping over one and adding a duplicate card in another spot on the table) and re-labeling cards (flipping over the existing card and placing a new card on top of it). Cards are not removed from the table so they may be reviewed by following participants.
  • The test is performed with participants working through the cards, each participant starting with the deck where the previous participant ended. Approximately 8-10 sessions are needed until the hierarchy and labels stabilize, meaning it becomes evident which parts of the structure most participants accept and which areas lack consensus.

So, step-by-step, how do we do Delphi-method card sorting? The steps below assume participants have been recruited and scheduled for testing sessions.

  1. Seed the deck. Decide to seed the deck or let the first user create the seed. We recommend seeding the deck yourself, especially if you have built and/or evaluated the taxonomy. If you start the test by letting the first participant seed the deck, be aware that the first person will set the stage for all the following participants.
  2. Label the cards. Write all the categories on blank index cards or have the first participant do so. If you yourself are seeding the card deck, printing the terms on the cards is an easy way for you to distinguish the seed from participant-added cards. If you are testing a taxonomy of physical things, such as an e-commerce product catalog, you may include an optional representative item image on each of the cards.
  3. Begin recording. Recording with a smartphone or digital camera allows you to check specific details later if needed. In general, written notes should suffice. But if you want to create a presentation or report to share your findings, a recording is particularly useful in capturing compelling quotes.
  4. Interview the participants. Perform brief initial interviews so you understand the background of the participants and any of their particular needs and wants. The interviews will help in interpreting their comments.
  5. Explain the exercise. Describe how to manipulate the cards. Most participants will understand the instructions right away. You may have to help some participants with when to turn over a card, when to write a new one and so on. But this assistance is okay and will actually allow you to build rapport with them.
  6. Let the participants work. Allow the participants to go through the deck at their own pace and style. We have observed that most people go through the cards left to right, top to bottom, given that this is the way most of our participants read. But a few may want to jump around. Keep track of areas not addressed in your notes so you can make sure that the participants cover all the cards during the session.
  7. Watch and probe. While the session is underway, encourage dialogue to get rich, descriptive detail into what the participants are thinking. The main strength of a qualitative method like this is the ability to allow participants to explain their thoughts, providing insights into user mental models.
  8. Repeat. Repeat with participants until you are satisfied. Typically you will need 8-10 sessions before you see the same issues popping up again and again. There may be some items that generate strong opposing opinions. Note which ones those are, as they will be important in terms of polyhierarchy or generating synonyms. Most of the time, you will not be surprised by these disagreements, especially if the taxonomy has been evaluated.
  9. Perform analysis. Finally, analyze the data if more detail is required to describe mental models or preferred terms. For the most part, though, at the end of a day of testing no analysis is needed: the cards themselves and their arrangement are all the data needed to begin implementing an information architecture, a major advantage of Delphi-method card sorting.
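The method itself does not prescribe a stopping metric, but if you want a number to back up your sense that the deck has stabilized, you can compare snapshots of consecutive sessions. The sketch below is one possible approach, assuming each snapshot is recorded as a set of (category, card) placements and using Jaccard similarity; both of those choices are our own assumptions, not part of the published method.

```python
def stability(prev, curr):
    """Jaccard similarity of card placements between two sessions.

    Each session snapshot is a set of (category, card) tuples recorded
    after a participant finishes; 1.0 means nothing changed.
    """
    if not prev and not curr:
        return 1.0
    return len(prev & curr) / len(prev | curr)

# Hypothetical snapshots after sessions 3 and 4: one card was re-homed.
session3 = {("Products", "Laptops"), ("Products", "Phones"), ("Help", "Returns")}
session4 = {("Products", "Laptops"), ("Products", "Phones"), ("Support", "Returns")}
print(stability(session3, session4))  # prints 0.5 (2 shared placements of 4 total)
```

When the score stays near 1.0 across several consecutive participants, the structure has likely settled; persistently low scores in one branch flag the areas lacking consensus.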

Usability Testing. While Delphi-method card sorting is a good way to understand user mental models, usability studies are excellent at measuring how well people interact with the articulation of a taxonomy. But they require a design that participants can interact with, not just index cards, which can mean more work upfront to prepare for a test. However, it is possible to leverage design artifacts already being created in the design process, such as design comps, wireframes or existing prototypes.

In order to test the taxonomy’s articulation in a design, simply insert a few specific taxonomy-related tasks into a usability test plan. The tasks need to be related to the information architecture of the design, such as navigating, searching or tagging content. The advantage to testing the articulation of a taxonomy in a usability test is that the results from these tasks will provide an understanding of how users think about specific terms used in the taxonomy within the context of a design, which is difficult to do in a card sort, where participants are only looking at terms on index cards.

Here are a few things to consider when performing usability testing of a taxonomy:

  • Offer the test moderator example tasks to be inserted into the study (if you are not running the tests yourself) and discuss what question you are hoping to answer.
  • Keep the tasks simple enough for participants to accomplish relatively quickly. Tasks with many steps are difficult to summarize or compare.
  • If you are responsible for creating the test artifacts, use real content. Lorem ipsum and other placeholder copy will cause problems with participants understanding the context of information and performing the tasks.
  • Attend the study sessions in person and take notes. Pay attention to when users discuss their understanding of concepts and terms and ask follow up questions during Q&A.
  • During the tests, explain to stakeholders how the taxonomy is driving the experience. This opportunity is invaluable, given that what we do as creators of taxonomies is not always obvious to others on our teams or to our clients.
  • Finally, offer to create an overview for the study organizer if you are piggybacking on someone else’s study. Give recommendations on how to improve the terms and relationships of the taxonomy as well as their articulation in the design’s information architecture, such as navigation schema, page layouts or content hierarchy.

Online Testing Methods

For our discussion of online taxonomy testing techniques, we will use an example from the Optimal Workshop suite of products because, in our experience, they are easy to use and quickly provide a very detailed level of analysis in a format that is interactive and easy to understand. But many other products are worth considering, depending on preference, budget or availability.

Before we look at the example below, we would like to discuss one method that we do not recommend in isolation: click-path analysis. A click path (or clickstream) is the sequence of hyperlinks website visitors follow on a given site, presented in the order viewed. The problem we find with click-path studies is that we cannot tell what users were thinking when they clicked through a series of links. But a click-path study in conjunction with other kinds of tests, such as a card sort or an online study, can make study results more meaningful and actionable.
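To make that limitation concrete, here is a minimal sketch of what click-path data gives you on its own: frequency counts of the paths taken. The session data and link labels are hypothetical. Nothing in the output explains why a participant took a given path, which is exactly why we pair this data with other methods.

```python
from collections import Counter

# Hypothetical logged sessions: each is the ordered list of links clicked.
sessions = [
    ["Home", "Products", "Laptops"],
    ["Home", "Products", "Laptops"],
    ["Home", "Support", "Returns"],
]

# Tally identical paths; tuples are used because lists are not hashable.
paths = Counter(tuple(s) for s in sessions)
for path, count in paths.most_common():
    print(" > ".join(path), count)
```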

Treejack. Treejack is an online tree-testing tool offered by Optimal Workshop [5] in which an existing structure of categories and subcategories is tested. Participants are given specific tasks and asked to complete them by navigating the tree of category and subcategory labels, starting from the top-level categories and choosing the label most relevant to the given task. Using an online testing method like Treejack ensures that the organization schema is evaluated in isolation and that the effects of navigational aids and visual design are neutralized.

Additionally, there are other views of the resulting data offered by a product like Treejack that provide further detail into participant behavior.

  • Task analysis. This view shows a breakdown of success, directness, time taken and an overall score calculated for each of the given tasks. Additional details are provided for each task to show how participants performed it in the context of the larger study.
  • Click path. This data view, called Pietree in the Optimal suite of products, is an exact breakdown of how a particular user arrived at each destination. It helps detect labels that may be unclear to users.
  • Destinations table. This data view gives a quick overview of the tree structure, highlighting the correct answers and the probable problem areas where participants mistakenly thought they had found the answer to the task.
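As a rough illustration of how task-analysis metrics can be computed, the sketch below scores one task from recorded paths. It is a simplified interpretation, not Optimal Workshop’s actual scoring: we assume success means ending at a correct destination and directness means succeeding without revisiting any node; the function name and data format are our own.

```python
def score_task(paths, correct):
    """Summarize tree-test results for one task.

    `paths` holds one list per participant: the sequence of tree nodes
    visited. `correct` is the set of acceptable destinations.
    Success: the final node is a correct destination.
    Directness: success without visiting any node twice (no backtracking).
    """
    n = len(paths)
    successes = sum(p[-1] in correct for p in paths)
    direct = sum(p[-1] in correct and len(p) == len(set(p)) for p in paths)
    return {"success": successes / n, "directness": direct / n}

# Hypothetical results for one task with one correct destination:
paths = [
    ["Home", "Products", "Laptops"],                     # direct success
    ["Home", "Support", "Home", "Products", "Laptops"],  # success after backtracking
    ["Home", "Support", "Returns"],                      # failure
]
print(score_task(paths, {"Laptops"}))
```

A large gap between success and directness is itself a finding: participants get there eventually, but the labels do not point them there on the first try.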

Here are a few things to consider when performing Treejack testing.

  • If you are testing a site with multiple navigation systems (for example, top menu and in-page navigation), conduct a discrete test for each.
  • Keep the tasks focused and limit the number of correct destinations per task. If you notice before you run the test that there are too many targets, it is an indication that your structure isn’t as well organized as you may have thought.
  • Consider labeling for navigation and content carefully. Poor naming leads to inconclusive test results.
  • Choose your test participants carefully to ensure they are truly representative of the intended audience of your project.
  • Do a dry run of the test. Mistakes are common. By testing the test before launch, you are bound to discover a few insights that will help clarify concepts and tasks.
  • Print all the results and charts, especially the Pietrees, and stick them to a wall before beginning analysis. Having a clear visual of the paths your users took will help you (and your stakeholders) get an immediate sense of the strengths and weaknesses of your IA and spot patterns that may otherwise be more difficult to detect.

Things to Keep in Mind

Before you set off to test taxonomies, here are a few things we’ve learned over the years of performing taxonomy testing. These tips will help you craft better studies and get actionable results that drive better designs.

  • Level of expertise. The level of expertise or specific training will affect the way participants talk about terms in a taxonomy, so consider testing user groups separately. For example, physicians and patients use very different terms to refer to the same concepts, such as physician and carcinoma versus doctor and cancer.
  • Terms and context. As stated above, card sorts work really well for understanding mental models and user acceptance of terms outside specific contexts, while assigned tasks within usability studies are great for understanding how users negotiate terms in the context of wayfinding and information search.
  • Study focus. Taxonomy is abstract, and a study can become too long or complicated very quickly if you are trying to examine too much. Craft smaller studies focused on key problem areas. And in keeping your studies small, you’ll be able to repeat them more often.
  • Confusing taxonomy and navigation. We mentioned this point above but it bears repeating. Taxonomy isn’t simply navigation, but the underlying structure that enables the creation of navigation. Testing your taxonomy will not ensure that a navigation schema is usable.
  • Multiple methods. Pairing small-sample qualitative studies with large-sample quantitative studies will provide rich descriptive detail alongside numerical data.
  • Visual design. Beware when performing tests with artifacts that contain high-fidelity design because it can affect user behavior and perceptions. Be sure to isolate tasks and questions concerning taxonomy and information architecture from implementation in visual design and interactions.
  • Test early. Testing taxonomy terms and information architecture early on allows design to proceed knowing that information is easily findable by users. And it allows for subsequent user tests to focus on interaction and visual design without worrying about untested terms or navigation schema.
  • Not testing at all. From our perspective, not testing is the biggest sin of all. Much as we think we know our users, we cannot be sure how others find, retrieve and consume information without testing. We may perform evaluation techniques on our taxonomies but this step will not guarantee that they are user friendly.

We hope this article has given you information necessary to create and execute taxonomy tests with real users. Happy testing!

Resources Mentioned in the Article

[1] Lamantia, J. (August 26, 2003). Analyzing card sort results with a spreadsheet template. Boxes and Arrows. Retrieved from http://boxesandarrows.com/analyzing-card-sort-results-with-a-spreadsheet-template/

[2] Spencer, D. (2009). Card sorting: Designing usable categories. Brooklyn, NY: Rosenfeld Media.

[3] Paul, C. L. (2008). A modified Delphi approach to a new card sorting methodology. Journal of Usability Studies, 4(1), 7-30. Retrieved from http://uxpajournal.org/a-modified-delphi-approach-to-a-new-card-sorting-methodology/

[4] Paul, C. L. (2007). A Delphi approach to card sorting [slides]. Presentation at the 2007 Information Architecture Summit. Retrieved from www.asis.org/Conferences/IA07/Sunday/Celeste_Lynn_Paul-Introduction%20to%20Delphi%20Card%20Sorting.pdf

[5] Optimal Workshop: www.optimalworkshop.com/

For Further Reading

Hedden, H. (2010). The accidental taxonomist. Medford, NJ: Information Today.
Rosenfeld, L., Morville, P., & Arango, J. (2015). Information architecture: For the web and beyond (4th ed.). Sebastopol, CA: O’Reilly Media.


Alberta Soranzo is a user experience consultant. She can be reached at albertasoranzo.com.

Dave Cooksey is founder and lead consultant in user experience at saturdave. He can be reached at dave<at>saturdave.com