It's a small world after all: Western usability guidelines predict behavior of Chinese users of on-line bookstores - 3 July 2005
by Josephine K. Y. Yau and William G. Hayward
Read this article in Chinese (translated by Ke Chen and Kevin Huang, proof read by Qiaoqiao Huang and Christina Li)
Abstract
The present study examined whether Western usability guidelines apply to Chinese web sites. Nielsen et al. (2000) proposed a set of 207 usability guidelines derived from observations in the field. We took a subset of 48 rules, and looked at the compliance rate (number of guidelines a web site complied with, divided by the total number of guidelines), task completion time, task accuracy, and users’ perceived usability and likeability for four Chinese online bookstores. Results showed a clear relationship between adherence to the rules and usability of the site: as the web site’s compliance rate increased, so did the usability and the impression the web site received from its users. These results suggest that the rules governing behavior of Chinese users are similar to those of Western users. More generally, this study calls into question the widely-held intuition that usability for Asian web sites should be different from usability for Western sites.
Keywords: web usability, culture, e-commerce, human-computer interaction
1. Introduction
It is a truism that usability ultimately determines the success or failure of an interface, be it an aircraft cockpit, a new first-person shooter game, or a web site. As the New Economy meltdown of 2000 showed us, no amount of funding and attention to features, marketing, and branding can make people buy a system if they can’t figure out how to use it. Perversely, one of the benefits of the crisis in the tech sector has been an increased understanding of the role of usability in software, web site, and appliance design.
1.1 What is usability?
Recognizing the importance of usability is one thing; actually creating a more usable interface is another. A number of difficulties stand in the way of the web designer who wants to create a more usable site. First, a definition of usability is required; in particular, one must decide whether usability should be considered an attribute or a process. The former view is characterized by Jakob Nielsen; in various forums (e.g., Nielsen et al., 2000; www.useit.com), he has argued that usability is a state of a system that can be achieved by following a set protocol (testing the right number of subjects, using the right set of usability guidelines). According to the Nielsen view, usability on the web is created by homogenization, if not of the entire web then at least of sectors (book sellers, travel sites, etc.). Nielsen’s widely quoted axiom is that you should design your site to follow the patterns of other sites, as users will spend most of their time on sites other than yours (Nielsen, 2000). Following this logic, Nielsen and colleagues (2000) created a set of usability guidelines for e-commerce sites by synthesizing the elements of a great many usable web sites. According to the authors, greater adherence to the guidelines results in a more usable site. Finally, Nielsen argues that flaws in usability are fairly easily detected, and recommends conducting user tests with only five subjects (which, he argues, will catch 85% of the problems; Nielsen & Landauer, 1993).
Jared Spool and associates at User Interface Engineering have a quite different view of usability. According to these authors, usability is a process rather than an attribute of a site. Thus, if a web site (or other interface) goes through a process of usability enhancement it will become more usable; if it does not, it is likely that the site’s usability will be lacking. Key to Spool’s conception of usability is the idea that it cannot be prescribed. Consider two of Spool’s findings. First, he has argued that, contrary to Nielsen, five users is nowhere near enough to detect usability problems with a web site; alternative estimations of necessary numbers range from 18 to 90 or more (Spool & Schroeder, 2001). One reason for this difference with Nielsen is Spool’s conception of the usability task – as Hudson (2001) has pointed out, Spool advocates an open-ended, relatively unconstrained usability test, which requires many more users in order to see potential usability problems with the site.
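The disagreement over sample size can be made concrete with the problem-discovery model of Nielsen and Landauer (1993), in which the expected share of problems found by n users is 1 - (1 - L)^n. The sketch below is only an illustration: the value L = 0.31 is the average per-user detection rate commonly quoted from that paper and is treated here as an assumption, the value 0.05 is purely hypothetical, and the function names are ours.

```python
from math import ceil, log


def proportion_found(n_users: int, detect_rate: float = 0.31) -> float:
    """Expected share of problems seen at least once by n_users testers."""
    return 1.0 - (1.0 - detect_rate) ** n_users


def users_needed(target: float, detect_rate: float) -> int:
    """Smallest number of testers whose expected coverage reaches target."""
    return ceil(log(1.0 - target) / log(1.0 - detect_rate))


# Assumed average detection rate (0.31) versus a hypothetical low rate (0.05).
for rate in (0.31, 0.05):
    print(f"rate={rate}: 5 users find {proportion_found(5, rate):.0%}, "
          f"85% coverage needs {users_needed(0.85, rate)} users")
```

Under the assumed rate of 0.31, five users are expected to uncover roughly 84% of problems, which matches the commonly cited 85% figure. If each user detects far fewer problems, as might happen in open-ended tests of large sites, the same model calls for many more users.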
1.2 The validity of usability guidelines
A second point at which Spool’s views deviate from Nielsen’s concerns the utility of usability guidelines. In keeping with the “usability as process” view, Spool argues that nothing can fix usability guidelines except a protocol for usability testing. Guidelines, he argues, are easy to produce but have shown relatively little validity. In a recent paper (Spool, 2002), he argues that guidelines are often difficult to interpret, or are simply incorrect. For example, he tested a guideline from Bernard (2002), that common features of e-commerce sites, like the shopping cart button, should be placed in positions on a web page where users expect them. According to Spool, user expectations actually had no validity as a prediction of site usability when the locations of such features were manipulated. Thus, the logic of the guideline did not carry through to predicting user behavior. Certainly, if usability guidelines are not tested, it is difficult to argue that adherence to them will lead to a more usable interface.
1.3 Cultural specificity in usability
In addition to Spool’s concerns about the validity of usability guidelines, we must also worry about their generality. Most such guidelines (e.g., Bernard, 2002; Nielsen et al., 2000) are developed by observing users in the United States and, to some lesser extent, Europe. The web now is truly worldwide, however, and so designers from every country are becoming concerned with usability (in addition, US designers are concerned about a site’s usability for international users). To what extent will guidelines developed for one cultural and/or linguistic group be able to predict usability with another? There are several potential problems. First, translation of a computer interface into other languages is not always feasible or appropriate (Kukulska, 2000). This is particularly true of translating English into Asian languages such as Chinese. Written Chinese uses a semantic-based logography in which the structure-meaning relationships of linguistic elements are much closer than in English. English, by contrast, is phonologically based, so we can use its visual form as a cue for pronunciation, but cannot derive a word’s meaning merely from its structure. The difference in language systems between Chinese and English may produce differences in cognitive functioning (as argued by the well-known Whorfian hypothesis of linguistic relativity; Whorf, 1956; see Roberson et al., 2000, for a more recent formulation).
The second problem with generalizing standards derived from one culture to another comes from differences in socio-cultural norms and cognitive styles. Many aspects of psychological functioning, from aesthetics to interpersonal dynamics to motivations, will vary from culture to culture. As such, behavioral rules derived from one culture may not transfer to another. For example, Choong and Salvendy (1999) investigated the effects of cultural difference on the design of appropriate interfaces for Chinese and American users. They found that Chinese participants showed stronger performance advantages with concrete knowledge representation and thematic interface structures than Americans.
On the other hand, web usability guidelines mainly address basic human information processes, like memory and attention span. In this sense, the guidelines should be applicable in the East as well as the West. Thus, determining whether web usability guidelines derived in the US and Europe can be applied to Chinese web users is an important problem. Of course, as Spool notes, web designers and consultants have suggested a variety of usability guidelines for others to take into consideration when constructing web sites, but little if any empirical evaluation is usually performed to ensure that the guidelines actually do produce better sites. Hence, as well as investigating their cross-cultural validity, studies are needed to systematically test and verify these guidelines within any context.
1.4 General methodology of the study
One of the most popular recent sets of guidelines comes from Nielsen et al. (2000). They tested twenty business-to-customer e-commerce web sites. Sixty-four participants from the United States and Denmark were asked to perform some shopping tasks on these sites. Users were asked to think aloud as they worked, and a trained observer took notes. Based on observations during the study, as well as the experts’ experience, Nielsen et al. (2000) derived a set of 207 design guidelines (hereafter referred to as the Nielsen guidelines) for creating a good e-commerce user experience. These guidelines covered a wide range of topics, including selling strategies, trust, category pages, search, product pages, checkout and registration, and international users. By producing guidelines, Nielsen et al. provided a relatively objective measure of web site usability. However, the study has a number of drawbacks. First, use of the guidelines is very much dependent on the skills, experience, and ability of the observers and experts. Not all organizations will have usability experts who are sufficiently skilled to implement all the guidelines. Second, and more crucially, the authors provide no empirical test of the efficacy of their guidelines. In effect, the reader is told to take their word for the validity of the guidelines. In order to correct this problem, an external empirical study is needed to validate their usefulness.
In this study, we evaluated whether the Nielsen guidelines predicted usability of Hong Kong web sites. Although the entire set contains 207 guidelines, we paid particular attention to the guidelines related to the finding of information. This decision was taken for a few reasons. First, locating target information is fundamental to most web site tasks, particularly on-line shopping. Therefore, examining how the web site is organized to facilitate search, and whether appropriate information is readily available, are important concerns for on-line success. According to Nielsen et al.’s (2000) study, inability to find an item was the most common reason for task failure. Second, verification of many of the guidelines requires knowledge of the structure of the site and its programming, which was not available in this study. Therefore, we selected 48 guidelines related to navigation and product and service information. The selected rules include areas related to classification, search function, winnowing tools, product listing pages, and product and customer service information.
Our hypothesis was that if the Nielsen guidelines were appropriate for Hong Kong, a web site should become easier to use as it complied with more of them. We chose four Chinese on-line bookstores, in order to reduce differences between sites, and measured usability by task completion time, task accuracy and a questionnaire. In addition, we also measured users’ overall impressions of each site.
2. Method
2.1 Design
This experiment was a one-way within-subjects design. There were four web sites with varying usability guideline compliance rates. The dependent variables were task completion time, accuracy, perceived usability and perceived likeability of the web sites.
2.2 Participants
Twenty undergraduate students (ten males and ten females) from the Chinese University of Hong Kong participated in the experiment. All participants spoke Chinese (Cantonese) as their native language, and all were familiar with Chinese word processing procedures.
2.3 Materials
2.3.1 Web usability guidelines.
Forty-eight rules were used from Nielsen et al.’s (2000) study. Three kinds of rules were included: rules related to product information (e.g. provide reviews and/or ratings of products); rules related to customer service information (e.g. provide links on the home page to shipping and delivery information); and rules related to methods of navigation, namely i) search function (e.g. put the search box on every page) and ii) classification (e.g. consider multiple classification schemes). Some of the rules were further broken down if they contained multiple meanings or criteria. Only rules that were concrete and had a reasonably objective interpretation were selected. In addition, we had two independent raters assess the adherence of each web site to the guidelines; this process produced a relatively high agreement rate (see next section).
2.3.2 Web sites.
Four Chinese on-line bookstores were selected, with varying guideline compliance rates. They were (A) isubculture.ichannel.com.hk ; (B) www.compubook.com.hk ; (C) www.hongkongbooks.com.hk ; and (D) www.cp1897.com.hk. The compliance rates for these online bookstores were 21%, 40%, 52% and 73% respectively. The inter-rater reliabilities for the four web sites ranged from .73 to .92.
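The compliance rate defined in the abstract (guidelines complied with, divided by the total number of guidelines) is simple to compute once each guideline has been scored as met or not met. The following sketch uses hypothetical ratings for a single site. The checklist length of 48 is an assumption, since some rules were broken down further, and simple percent agreement is shown only as one possible agreement measure; the statistic behind the reported reliabilities is not specified.

```python
def compliance_rate(checks):
    """Share of guidelines a site complies with (checks: list of booleans)."""
    return sum(checks) / len(checks)


def percent_agreement(rater_a, rater_b):
    """Share of guidelines on which two raters gave the same verdict."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


# Hypothetical verdicts from two raters on an assumed 48-item checklist.
rater_a = [True] * 35 + [False] * 13
rater_b = [True] * 33 + [False] * 15

print(f"compliance (rater A): {compliance_rate(rater_a):.0%}")
print(f"percent agreement:    {percent_agreement(rater_a, rater_b):.0%}")
```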
2.3.3 Task.
There were three types of tasks: product information (e.g. In ‘Introduction to e-commerce’, did the authors mention the electronic pay system?); customer service information (e.g. Can I use an ATM to pay the bill?); and search for a particular product according to some predefined criteria (e.g. Please find a book which can tell me my fortune in the Year of the Snake).
2.3.4 Questionnaire.
The questionnaire measured participants’ impressions of the web sites on seven-point bipolar scales, and consisted of two subscales. The first subscale concerned likeability, which measured participants’ subjective feelings, evaluation and degree of trust towards the web site (e.g. Do you find this web site attractive? How likely would you be to purchase goods from this web site? Do you find this web site reliable?). The second subscale measured participants’ perception of the degree of usability of the web sites. Questions covered whether the web sites provided sufficient product-related information (e.g. Do you find this web site provides sufficient information about the books?) and the quality of navigation and search functions (e.g. Do you find this web site easy to navigate or difficult to navigate? Do you think it is easy or difficult to search for particular books in this web site?).
2.4 Procedure
All participants were tested individually. The testing sequence for the web sites and tasks was randomized for each participant. Participants were given one minute to familiarize themselves with each web site, and then the first task was given to them. Their task completion time was measured as the period from the first move of the mouse until they first tried to write the answer. After having finished all three tasks for a particular web site, participants needed to complete a questionnaire measuring their perceived likeability and usability of the web site. The same procedures were repeated for each web site. An observer was present during the experiment to record the task completion time for each task and facilitate the experimental process. The observer was not allowed to answer any questions related to task performance during the experiment.
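A randomized testing sequence of this kind could be generated as in the sketch below. It is only an illustration under stated assumptions: the site labels follow Section 2.3.2, the three task types follow Section 2.3.3, and the function name, the seed, and the use of a per-participant random generator are hypothetical choices rather than the procedure actually used.

```python
import random

SITES = ["A", "B", "C", "D"]
TASK_TYPES = [
    "product information",
    "customer service information",
    "product search",
]


def make_schedule(participant_id, seed=2005):
    """Return a list of (site, task order) pairs in random order.

    A generator seeded per participant makes each schedule reproducible.
    """
    rng = random.Random(f"{seed}-{participant_id}")
    sites = SITES[:]
    rng.shuffle(sites)
    schedule = []
    for site in sites:
        tasks = TASK_TYPES[:]
        rng.shuffle(tasks)
        schedule.append((site, tasks))
    return schedule


for site, tasks in make_schedule(participant_id=1):
    print(site, tasks)
```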
3. Results and Discussion
Web usability was measured by task completion time, task accuracy and users’ perceptions of the usability of the site, while the likeability scale measured user web site preferences. Task completion time and accuracy (objective measures) were determined for each subject by averaging their performance across the three tasks for each site. Times were only included in the completion time measure if the task was successfully completed. Perceived usability and likeability (subjective measures) were determined by calculating the average score for the web site from relevant items in the questionnaire, and then dividing by the total of the scale, resulting in a percentage in which high scores denote high perceived usability and likeability. Repeated-measures one-way ANOVAs were then performed to compare the means of the four web sites on each of the dependent variables.
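The scoring rules just described can be sketched as follows. The trial values and questionnaire responses are hypothetical, the function names are ours, and the scale maximum of 7 follows from the seven-point scale in Section 2.3.4.

```python
from statistics import mean


def mean_completion_time(trials):
    """Average time over successfully completed tasks only.

    trials: list of (seconds, correct) pairs for one participant on one site.
    Returns None if no task was completed successfully.
    """
    times = [seconds for seconds, correct in trials if correct]
    return mean(times) if times else None


def accuracy(trials):
    """Share of tasks answered correctly."""
    return sum(correct for _, correct in trials) / len(trials)


def scale_percentage(item_scores, scale_max=7):
    """Mean item score expressed as a percentage of the scale maximum."""
    return 100 * mean(item_scores) / scale_max


# Hypothetical data: three tasks on one site, then four usability items.
trials = [(60, True), (90, False), (120, True)]
print(mean_completion_time(trials), accuracy(trials))
print(scale_percentage([5, 6, 4, 5]))
```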

Figure 1. Measures of user behavior. On the right is shown the mean completion time of tasks for each web site; on the left is the accuracy at completing those tasks. Here and in Figure 2, web sites are plotted in increasing order of guideline adherence.
3.1 Objective Measures
The first two dependent variables measured the users’ behavioral interactions with the web sites. As can be observed from Figure 1, both showed a pattern of better performance as the web site adhered to more of the usability guidelines. These differences were statistically significant for both task completion time, F(3,57)=24.00, p<.001, and completion accuracy, F(3,57)=8.92, p<.001. Thus, there is a clear association between adherence to the Nielsen guidelines and the usability of the web sites.
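For readers unfamiliar with the test, the following sketch shows how a one-way repeated-measures F ratio and its degrees of freedom are obtained. With four web sites and twenty participants the degrees of freedom are (4 - 1) = 3 and (4 - 1)(20 - 1) = 57, as in the statistics reported above. The data in the example are hypothetical, and obtaining a p value would additionally require the F distribution from a statistics library.

```python
def rm_anova(data):
    """One-way repeated-measures ANOVA.

    data[i][j] is the score of subject i under condition j.
    Returns (F, df_conditions, df_error).
    """
    n = len(data)       # number of subjects
    k = len(data[0])    # number of conditions (web sites)
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    # Variance left after removing condition and subject effects.
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_ratio = (ss_cond / df_cond) / (ss_error / df_error)
    return f_ratio, df_cond, df_error


# Hypothetical completion times (seconds) for three participants on sites A-D.
times = [
    [210, 180, 150, 100],
    [240, 170, 160, 120],
    [200, 190, 130, 110],
]
f_ratio, df1, df2 = rm_anova(times)
print(f"F({df1},{df2}) = {f_ratio:.2f}")
```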
3.2 Subjective Measures
The remaining two dependent variables measured subjects’ judgments about different aspects of the web sites, and are shown in Figure 2. As with the objective measures, judgments show a clear relationship with web site adherence to the guidelines; the more guidelines are adhered to, the more positive are the ratings of both likeability and usability. These differences were statistically significant for both perceived likeability, F(3,57)=30.22, p<.001, and perceived usability, F(3,57)=33.56, p<.001. These results show that the Nielsen guidelines do not simply predict user behavior; they also predict users’ judgments about their interaction with the site, and whether they enjoyed that interaction.
3.3 Linear Trends
To further verify the hypothesis, planned comparisons were employed to analyze the trends of the four web sites on the dependent variables. The linear contrasts for the four web sites on task completion time (p<.001), task accuracy (p<.001), perceived likeability (p<.001) and perceived usability (p<.001) were all statistically significant. Thus, not only were there significant differences in performance among the web sites, but those differences followed a linear trend. From Figures 1 and 2 we can see that for each dependent variable this trend is for scores to get better (completion time lower, other measures higher) as adherence to the usability guidelines increased.
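One common way to test a linear trend in a within-subjects design is to compute a contrast score for each participant and test whether the mean score differs from zero. The sketch below assumes equally spaced weights (-3, -1, 1, 3) for the four sites ordered by compliance rate; the weights actually used in the analysis are not stated, and the data shown are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Assumed equally spaced linear weights for four ordered conditions.
LINEAR_WEIGHTS = (-3, -1, 1, 3)


def linear_contrast_t(data, weights=LINEAR_WEIGHTS):
    """One-sample t test on per-subject linear contrast scores.

    data[i][j] is the score of subject i on site j (sites ordered by
    compliance rate). Returns (t, degrees of freedom).
    """
    scores = [sum(w * x for w, x in zip(weights, row)) for row in data]
    n = len(scores)
    t = mean(scores) / (stdev(scores) / sqrt(n))
    return t, n - 1


# Hypothetical perceived-usability percentages for three participants.
ratings = [
    [40, 55, 60, 80],
    [35, 50, 65, 75],
    [45, 50, 70, 85],
]
t, df = linear_contrast_t(ratings)
print(f"t({df}) = {t:.2f}")
```

A positive t indicates scores that rise across the ordered sites; for completion time, where lower is better, the expected sign is negative.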

Figure 2. Measures of users’ subjective ratings. The graph on the left shows users’ ratings of how much they liked the web site, whereas the right graph shows their judgments of perceived usability.
4. Conclusions
4.1 Implications for designers of Chinese web sites
This study provides empirical support for Nielsen’s guidelines as being appropriate for Chinese web sites. The results showed that the more guidelines a web site complied with, the higher its usability and the more positive users’ impressions of it. The importance of this result is that it gives the guidelines empirical support for predicting interactions with web sites that are rooted in neither the English language nor Euro-American culture. In addition, the measures were both performance-based and subjective, and all were influenced by adherence to the guidelines. From these results it is extremely difficult to argue that usability is not important, or that Chinese usability is fundamentally different from that found for other populations. It appears that complying with these guidelines can help a Chinese web site enhance its usability and make a more favorable impression on users, as the time and accuracy with which users perform a task will be improved.
We should note that these results do not mean that usability guidelines for Chinese web sites will be identical to those for American or European sites. Clearly, some aspects of culture and/or language may affect usability considerations, and these should rightly show up in culture-specific guidelines. In addition, some usability rules that are very important in other contexts may be less important in China and Asia. However, simply because these general statements may be true does not mean that we should say “Chinese/Asian usability is different from that in other places”. Rather, the onus is on Asian internet professionals to demonstrate what aspects of web usage may be culturally specific. On the basis of this research, we argue that unless such evidence exists, web designers should assume that Western usability standards will hold in Asia.
Despite these commonalities, however, a number of issues of web design related specifically to the Chinese context became apparent during the study. First, among twelve Hong Kong online bookstores available in a Yahoo search page, almost none supported combined search and search operators (like +, &, the Chinese word ‘and’/‘or’). Combined search allows the user to narrow down the search scope and pinpoint the information needed. Such search tools are popular in English language sites, but appear not to be commonly used in Chinese sites.
Second, among twelve online bookstores initially considered for the study, almost none provided mechanisms for Chinese input. Since Chinese typing is more complicated than English typing, it is not unusual for users to be unable to type Chinese words. Even if the users are skilled with Chinese word processing, the computer they are using may not have the necessary software to support it. Therefore, if web sites do not provide a way for these users to enter Chinese characters into text boxes, the search function will be unusable.
4.2 Implications for web design in general
At the beginning of this paper we discussed differences in the views of usability between Jakob Nielsen and Jared Spool. Having noted Spool’s criticism of usability guidelines, and, in particular, their lack of validation, this study can be taken as some rectification of that problem. Although we did not test the validity of individual guidelines (that is, which ones are important and which ones are essentially useless), we did find a general relationship between use of the guidelines and usability. Thus, we can conclude that although designers should rightly remain skeptical about usability guidelines, the set we used at least predicted both usability and subjective perceptions of a site. While there is no substitute for usability testing (“usability as process”), adherence to valid usability guidelines can be expected to improve usability of the web site in both Western and Eastern cultures.
Despite our apparent validation of the Nielsen guidelines in this study, however, we agree with Spool that usability cannot be guaranteed without a specific evaluation being carried out. Although guidelines such as those tested here may signal a general measure of usability, it is likely that many important issues with each site were not captured by the tests we performed. Indeed, it would be difficult for us to make suggestions about exactly how each site should improve its usability on the basis of the results shown here (though we will make some comments in the next section). To get more specific information, qualitative testing should be a crucial component of any development cycle. Thus, we would recommend the use of properly-validated guidelines as one tool of many used to enhance usability, rather than using guidelines as a simple validation tool of the finished product.
One interesting aspect of the results was the close relationship between usability and preference judgments. In addition to shortening the time required for users to search for information, guideline adherence also increased users’ positive impressions of the web site and the web site provider. In this information era, a homepage or company’s web site may be the first contact point between the company and its customers. Thus, web usability is important because users’ first impressions of the web site have a lasting effect on their feelings toward the company.
4.3 Identification of specific problems with the web sites
Generally, the web site designs we tested were far from perfect. This is partly due to the fact that web design, particularly in Hong Kong, has just started to grow. On the basis of these results, we encourage web designers to follow experimentally-validated usability guidelines when constructing their sites. Three specific types of guidelines verified in this study are, first, to provide sufficient information and detailed descriptions about products; second, to provide customer service information, for example, information about delivery time and postage fees; and third, to provide clear and systematic structures for classification, navigation system and search.
When the participants were performing the experiment, most of them encountered problems in:
- i) Search – participants were astonished when they typed in the exact words of a book’s title, but search results failed to reveal the corresponding answer. Most of the participants simply believed that there was no such product in the bookstore instead of trying to find the product in other ways. This result suggests that web sites need to check the mechanism of their search database very carefully and avoid such mistakes that could cause major losses in sales.
- ii) Membership registration – most participants stopped searching when they were requested to register as a member. This finding supports many reports that membership registration is an obstacle for users. Therefore, web sites should minimize registration processes.
- iii) Unclear information – one of the main reasons for lengthening of task completion time is that participants were often not able to find the information they needed. It may be due to the fact that some useful information was written with a very small font, or that it was located in places difficult to recognize or nested within less important information. Thus, we suggest that sites should highlight or emphasize key or important points and place them into an easily recognized location so that users will not overlook them.
4.4 Summary
Whether one considers usability an attribute or a process, two conclusions can be drawn from this study. First, guidelines can be sensitive to usability differences between web sites. Using such guidelines within the design process can be expected to generally improve usability. Second, usability in Asia (specifically, Hong Kong) seems to follow very similar rules to usability in the United States or Europe. As noted, we expect to find some differences in East-West usability, but such differences are remarkably difficult to verify, and may turn out to be more apparent than real. Thus, designers of Asian sites should assume that usability rules from the West will transfer to their sites unless they have a specific reason to think otherwise.
5. References:
Bernard, M. (2002). Examining user expectations for the location of common e-commerce web objects. Usability News, 4.1. Downloadable from http://psychology.wichita.edu/surl/usabilitynews/41/web_object-ecom.htm
Choong, Y.Y., & Salvendy, G. (1999). Implications for design of computer interfaces for Chinese users in Mainland China. International Journal of Human Computer Interaction, 11, 29 – 46.
Hudson, W. (2001). How many users does it take to change a web site. SIGCHI Bulletin, May/June 2001, 6.
Kukulska, H.A. (2000). Communication with users: Insights from second language acquisition. Interacting with Computers, 12, 587-599.
Nielsen, J. (2000). End of web design: Alertbox for July 23, 2000. Available at www.useit.com/alertbox/20000723.html
Nielsen, J., & Landauer, T. K. (1993). A mathematical model of the finding of usability problems. Proc. ACM INTERCHI’93 Conf. (Amsterdam, the Netherlands, 24-29 April), 206-213.
Nielsen, J., Molich, R., Snyder, C., & Farrell, S. (2000). E-commerce user experience. Nielsen Norman Group: Fremont.
Roberson, D., Davies, I., & Davidoff, J. (2000). Color categories are not universal: Replications and new evidence from a stone-age culture. Journal of Experimental Psychology: General, 129, 369-398.
Spool, J. (2002). Evolution trumps usability guidelines. UIE-tips newsletter, September 9, 2002. Downloadable from http://www.uie.com/Articles/evolution_trumps_usability.htm
Spool, J., & Schroeder, W. (2001). Testing web sites: Five users is nowhere near enough. Proc. CHI 2001, Extended Abstracts, ACM 285-286.
William Hayward received a BA and MA from the University of Canterbury in New Zealand, and a Ph.D. in Psychology from Yale University in the USA. After teaching at the University of Wollongong in Australia, he has been Assistant Professor in the Psychology Department at the Chinese University of Hong Kong since 1999. Prof. Hayward’s research interests include the psychology of user interface interaction, as well as perception, cognition, and attention.