BASELINE2

Frequently Asked Questions about User Validation:
User Validation Methods and Tools

 



  1. Which evaluation tools should one acquire first ?

  2. Can the same user validation methods be applied to groups which differ radically in experience and capability ?

  3. What is the difference between qualitative and quantitative evaluation methods ?

  4. What is a 'dimensionless' metric ?

  5. What is the difference between a dimensionless and a parametrised ('non-dimensionless') metric ?

  6. How can one find a usability testing laboratory in one's own country ?

  7. Is it true that the best information for redesign comes from users' complaints, suggestions and post-trial discussions, which provide qualitative data, or should the focus be on collecting quantitative data ?

  8. Do users prefer a single user interface for both simple and complex tasks, or do they prefer different interfaces suited to each kind of task ?

  9. Which criteria other than the ISO criteria are important to measure ?

  10. What is a sufficiently detailed description of quality dimensions ?

  11. How many scenarios are sufficient for user testing ?

  12. Should scenarios be designed such that different user groups (e.g. technology user and end user) work through the same scenario, or is it better to design specific scenarios for specific user groups ?

  13. Is it important to measure cognitive workload ?

  14. Are there specific usability methods which can reveal emotional behaviour by users (e.g. opinions about colour, form, childishness) ?


  1. Which evaluation tools should one acquire first ?

    None, as long as you do not know what you need them for. Do not buy tools to keep "in stock". Instead, develop your user validation plan first; you will then be in a position to select the most appropriate evaluation tools for your validation project. Check whether you can apply the selected tools immediately or whether training is needed. Then choose either to buy and use the tools yourself or to ask external experts for help. If a tool is expensive, requires much learning effort before it can be used, and you plan to use it only once, it is more economical not to acquire the tool but to look for an expert who has both the tool and the expertise. For example, an SME would not set up a usability lab if the plan is to use it only 20 days per year.



  2. Can the same user validation methods be applied to groups which differ radically in experience and capability ?

    Most validated tools are broadly applicable. Search for a standard method and tool first. Develop specialised techniques only if the available methods and tools are not applicable or if the information needed is very specific to the application being evaluated.



  3. What is the difference between qualitative and quantitative evaluation methods ?

    A qualitative method is one that yields data in the form of words or pictures but does not lend itself to yielding numbers. For instance, collecting video clips of frequently occurring user errors is a classic qualitative method used in HCI. Interviewing is another.

    A quantitative method is one which yields numbers. Some quantitative metrics are easy to interpret (for instance pounds per square inch, or kilometers per hour) but others need careful consideration (for instance rate of change of heart rate or number of errors per user hour).

    And of course some quantitative measures are simply meaningless (for instance, number of checklist items ticked or number of guidelines adhered to). It is a poor defence for a pilot to claim that he satisfied 95% of his checklist items but nevertheless attempted to land with his undercarriage up, or for a screen designer to say that 97% of all guidelines were adhered to, except the one which stipulated that text should not be smaller than 6 points!



  4. What is a 'dimensionless' metric ?

    A dimensionless metric is one whose scale is independent of the object being measured to such an extent that two or more objects can be compared meaningfully on the same metric. For instance, the speed at which a user can carry out a certain task is highly dependent on the task characteristics as well as on the user's experience level and the usability of the software being studied. However, if we take the ratio of the end user's speed to that of an expert performing the same task, we have controlled for the task characteristics: we are simply reporting the difference between two levels of performance. This Relative User Efficiency metric can be applied to tasks on a booking system, for instance, and compared with quite different tasks carried out with the aid of a word processor, if such a comparison is meaningful in the context of an evaluation.

    Another dimensionless metric is the average of a set of responses by a user to a set of standardised questions about the usability of a system. Such questionnaires can be applied to a wide range of computer systems and enable us to discuss differences in usability between user groups, types of task, or even different computer systems.

    Dimensionless metrics are important because they enable us to make increasingly general statements about the usability of computer systems.
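
    The questionnaire-based metric described above can be sketched in Python. This is only an illustration under stated assumptions: the 1-5 response scale, the system names, and all response values are invented for the example, and the text does not prescribe any particular questionnaire.

```python
from statistics import mean

# Hypothetical answers on a 1-5 agreement scale (5 = most favourable),
# one list per user, all users answering the same standardised questions.
responses = {
    "booking system": [[4, 5, 4, 3], [3, 4, 4, 4]],
    "word processor": [[2, 3, 3, 2], [3, 3, 2, 3]],
}


def user_score(answers: list[int]) -> float:
    """Average of one user's responses to the standardised questions."""
    return mean(answers)


def system_score(users: list[list[int]]) -> float:
    """Average of the per-user scores for one system."""
    return mean(user_score(answers) for answers in users)


scores = {name: system_score(users) for name, users in responses.items()}
for name, score in scores.items():
    print(f"{name}: {score:.2f} out of 5")
```

    Because every user answers the same questions on the same scale, the resulting scores can be set side by side even though the two systems support quite different tasks.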



  5. What is the difference between a dimensionless and a parametrised ('non-dimensionless') metric?

    Performance measurement provides the opportunity to develop systematic metrics databases, especially if 'dimensionless' metrics such as Relative User Efficiency are calculated. Its main drawback is the cost in time and resources. The performance measurement schemes we have been able to review are clearly well informed, although they have cost, or will cost, a lot in terms of resources.

    In general, it is more advantageous to use 'dimensionless' metrics, in which accidental information such as central tendency and spread has been eliminated. For instance, the Relative User Efficiency metric (RUE) is a dimensionless metric which is computed as:

             Expert User Time on Task
    RUE = ----------------------------- × 100
          Average End User Time on Task

    Although the spread of the user scores is not controlled for, RUE controls for the central tendency of the data by comparing average end user performance with that of an expert user. We would therefore expect statements such as '60% of expert user performance', made without referring to the actual performance values obtained. This allows one to compare relative user performance on different kinds of tasks, and perhaps even on different kinds of systems.
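
    As a worked illustration, the RUE formula can be written as a short Python function. This is a minimal sketch: the function name and all task times are hypothetical, chosen so that the first task reproduces the '60% of expert user performance' statement.

```python
from statistics import mean


def relative_user_efficiency(expert_time: float, user_times: list[float]) -> float:
    """Return RUE as a percentage: expert time / average end user time * 100."""
    if expert_time <= 0 or not user_times or any(t <= 0 for t in user_times):
        raise ValueError("task times must be positive and user_times non-empty")
    return expert_time / mean(user_times) * 100


# Hypothetical task times in seconds. The units cancel in the ratio,
# so any consistent unit of time gives the same result.
booking_rue = relative_user_efficiency(60.0, [90.0, 100.0, 110.0])
editing_rue = relative_user_efficiency(200.0, [240.0, 250.0, 260.0])

print(f"Booking task RUE: {booking_rue:.0f}%")
print(f"Editing task RUE: {editing_rue:.0f}%")
```

    The two tasks take very different absolute times, yet the two percentages can be compared directly. Note that the spread of the end user times does not enter the result, as stated above.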



  6. How can one find a usability testing laboratory in one's own country ?

    The INUSE project issued a list of usability laboratories in Europe. To our knowledge, this list has not been updated. One of the objectives of the European Usability Professionals Association (EUPA) is to co-ordinate independent usability practitioners around Europe; however, uptake of EUPA has been slow so far. We at BASELINE have developed an extensive set of contacts in the usability engineering profession around Europe, and we may be able to help if you send a private enquiry to the help desk.



  7. Is it true that the best information for redesign comes from users' complaints, suggestions and post-trial discussions, which provide qualitative data, or should the focus be on collecting quantitative data ?

    The best information for redesign comes from user validation activities that measure "user problems and design deficiencies". Collecting both objective, quantitative data (e.g. number of errors in a defined time interval, time for error recovery) and subjective, qualitative data (e.g. type of error, user comments about the error) according to a plan is the most reliable strategy for getting information for redesign. In addition, user complaints, suggestions, and opinions about the interface provide valuable feedback for redesigning and improving the interface.
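
    A minimal Python sketch of how the quantitative and qualitative data named above might be recorded together and summarised. The record layout, the function name, and every observation are assumptions made for illustration only; they are not taken from any BASELINE tool.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ErrorEvent:
    """One observed user error (hypothetical record layout)."""
    kind: str                # qualitative: type of error
    recovery_seconds: float  # quantitative: time needed to recover
    comment: str = ""        # qualitative: what the user said about it


def summarise(events: list[ErrorEvent], session_minutes: float) -> dict:
    """Combine quantitative measures with a qualitative breakdown by error type."""
    if session_minutes <= 0:
        raise ValueError("session_minutes must be positive")
    by_kind: dict[str, int] = {}
    for event in events:
        by_kind[event.kind] = by_kind.get(event.kind, 0) + 1
    return {
        "errors_per_hour": len(events) * 60 / session_minutes,
        "mean_recovery_seconds": mean(e.recovery_seconds for e in events) if events else 0.0,
        "errors_by_kind": by_kind,
        "comments": [e.comment for e in events if e.comment],
    }


# Invented observations from a single 30-minute test session.
session = [
    ErrorEvent("wrong menu", 12.0, "I expected Save under File"),
    ErrorEvent("wrong menu", 8.0),
    ErrorEvent("mistyped date", 25.0, "The date format is not shown"),
]
report = summarise(session, session_minutes=30)
print(report)
```

    Keeping the error type and the user's comment next to each timing makes it possible to see not only how often errors occur but also which design deficiency they point to.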



  8. Do users prefer a single user interface for both simple and complex tasks, or do they prefer different interfaces suited to each kind of task ?

    Ask your prospective users. Only the users can give you the answer.



  9. Which criteria other than the ISO criteria are important to measure ?

    For innovative products and services, subjective quality factors such as appeal, attractiveness, and aesthetics are important. For consumer products, and especially entertainment products, enjoyment and thrill are further relevant factors.



  10. What is a sufficiently detailed description of quality dimensions ?

    A practical description of quality dimensions is reached when the quality aspects described can be easily mapped to measures.



  11. How many scenarios are sufficient for user testing ?

    That depends to a large extent on the type of application, on the complexity of the tasks which will be performed with the application, and on what you want to measure.



  12. Should scenarios be designed such that different user groups (e.g. technology user and end user) work through the same scenario, or is it better to design specific scenarios for specific user groups ?

    It is far more efficient to design specific scenarios for specific user groups. A scenario should focus only on those tasks, task sequences and workflows which are relevant for a specific user group.



  13. Is it important to measure cognitive workload ?

    Yes, because the information processing resources of a user are limited. These resources are allocated to task performance and to the use of the applications needed for task performance. If new applications are difficult to use, the user must allocate too many information processing resources to using the application and will have too few resources left for performing the real work tasks. The user has to cope with additional cognitive workload imposed on him by the application.

    Cognitive workload is an important quality dimension for business applications where the primary task of a user is to perform his work tasks in an efficient way. The secondary task, to perform work tasks with an innovative application, should not disturb the user too much.

    Cognitive workload measurements are also important for the development of safety-critical applications, or when products or services are created for use in safety-critical environments.



  14. Are there specific usability methods which can reveal emotional behaviour by users (e.g. opinions about colour, form, childishness)?

    Procedures for measuring these aspects exist. However, they are not publicly documented and take considerable time to perform. For more information, contact ACit.

