
Questionnaires in Usability Engineering

A List of Frequently Asked Questions (3rd Ed.)

Compiled by: Jurek Kirakowski,
Human Factors Research Group, Cork, Ireland.
This edition: 2nd June, 2000.


Over the years, I have seen many questions asked about the use of questionnaires in usability engineering. The list on this page is a compilation of the questions I have heard most often and the answers I gave, should have given, or would have given if I had thought of it first.

A number of folk have given me feedback on this document, and they are gratefully acknowledged below.

There is a mailto: box at the bottom of the page: I will be delighted to receive more comments, questions, or corrections.



What is a questionnaire?

A questionnaire is a method for the elicitation, recording, and collecting of information. The four key words in this definition (method, elicitation, recording, and collecting) summarise the essence of what questionnaires are about. I can give a 50-minute lecture explaining this definition with examples and anecdotes, but the notes below summarise the gist of it.
  • Method: This means that a questionnaire is a tool to be used rather than an end in itself or a work of modern art. Before you even start thinking of using a questionnaire, a useful question to ask yourself is: 'what do I need to know, and how best can I find this out?' Some kinds of information are not very reliably gathered using questionnaires (e.g. how often people do things, or self-reports about aspects of life where status is involved). It is also very useful at the start to ask yourself 'how will I summarise the information I am seeking to give me a true picture of what I want to know?'
  • Elicitation: A questionnaire may bring out information from the respondent or it may start the respondent thinking or even doing some work on their own in order to supply the requested information. In any case, a questionnaire is a device that starts off a process of discovery in the respondent's mind.
  • Recording: The answers the respondent makes are somehow recorded onto a permanent medium which can be re-played and brought into analysis. Usually by writing, but also possibly by recording voice or video.
  • Collecting: People who use questionnaires are collectors. Given the amount of effort involved in creating a questionnaire, if you only ever needed to use it for one respondent, chances are you'd find some more efficient method of getting the information. However, unless you intend to leave piles of questionnaires mouldering in your filing cabinet, you must also consider what you are going to do with the information you have amassed. Which brings one neatly back to the first point: a questionnaire is a method.

Questionnaires are made up of items to which the user supplies answers or reactions.

Answering a questionnaire focuses the respondent's mind on a particular topic and, almost by definition, on a certain way of approaching the topic. We try hard to avoid bias when we construct questionnaires; when a respondent has to react to very tightly focussed questions (so-called closed-ended questionnaires) bias is a real problem. When a respondent has to react to a looser set of questions (so-called open-ended), bias is still there, but it's most probably more deeply hidden.

Are there different kinds of questions?

There are three basic types of questions:

Factual-type questions

Such questions ask about public, observable information that it would be tedious or inconvenient to get any other way. For instance, the number of years that a respondent has been working with computers, or what kind of education the respondent received. Or how many times the computer broke down in a two-hour session, or how quickly a user completed a certain task. If you are going to include such questions you should spend time and effort to ensure that the information you are collecting is accurate, or at least to determine the amount of bias in the answers you are getting.

Opinion-type questions

These ask the respondent what they think about something or someone. There's no right or wrong answer, all we have to do is give the strength of our feeling: do we like it or not, or which do we prefer? Will we vote for Mr A or Mr B? An opinion survey does not concern itself with subtleties of thought in the respondent, it is concerned with finding out how popular someone or something is. Opinion questions direct the thought of the respondent outwards, towards people or artefacts in the world out there. Responses to opinion questions can be checked against actual behaviour of people, usually, in retrospect ('Wow! It turned out that those soft, flexible keyboards were a lot less popular than we imagined they would be!')

Attitude questions

Attitude questions focus the respondent's attention inwards, on their internal response to events and situations in their lives. There are a lot of questionnaires consisting of attitude questions about experiences with Information Technology, the Internet, Multi-media and so on. These tend to be of interest to the student of social science. Of more use to the HCI practitioner are questionnaires that ask respondents about their attitudes to working with a particular product they have had some experience of. These are generally called satisfaction questionnaires.

In our research, we have found that users' attitudes to working with a particular computer system can be divided up into attitudes concerning:

  • The user's feeling of being efficient
  • The degree to which the user likes the system
  • How helpful the user feels the system is
  • To what extent the user feels in control of the interactions
  • The extent to which the user feels they can learn more about the system by using it

We can't directly cross-check attitude results against behaviours in the way we can with factual and opinion type questions. However, we can check whether attitude results are internally consistent and this is an important consideration when developing attitude questionnaires.

What are the advantages of using questionnaires in usability research?

  • The biggest single advantage is that a usability questionnaire gives you feedback from the point of view of the user. If the questionnaire is reliable, and you have used it according to the instructions, then this feedback is a trustworthy sample of what you (will) get from your whole user population.
  • Another big advantage is that measures gained from a questionnaire are, to a large extent, independent of the system, users, or tasks to which the questionnaire was applied. You could therefore compare
    • the perceived usability of a word processor with an electronic mailing system,
    • the ease of use of a database as seen by a novice and an expert user,
    • the ease with which you can do graphs and statistical computations on a spreadsheet.
  • Additional advantages are that questionnaires are usually quick and therefore cost effective to administer and to score and that you can gather a lot of data using questionnaires as surveys. And of course, questionnaire data can be used as a reliable basis for comparison or for demonstrating that quantitative targets in usability have been met.

What are the disadvantages?

  • The biggest single disadvantage is that a questionnaire tells you only the user's reaction as the user perceives the situation. Thus some kinds of questions, for instance, to do with time measurement or frequency of event occurrence, are not usually reliably answered in questionnaires. On the whole it is useful to distinguish between subjective measures (which is what questionnaires are good for) and performance measures (which are publicly-observable facts and are more reliably gathered using direct event and time recording techniques).
  • There is an additional smaller disadvantage. A questionnaire is usually designed to fit a number of different situations (because of the costs involved). Thus a questionnaire cannot tell you in detail what is going right or wrong with the application you are testing. But a well-designed questionnaire can get you near to the issues, and an open-ended questionnaire can be designed to deliver specific information if properly worded.
  • Those who have worked with questionnaires for a long time in industry will also be aware of the seductive power of the printed number. Getting hard, quantitative data about user attitudes or opinions is good, but this is not the whole story. If the aim of the investigation is to analyse the overall usability of a piece of software, then the subjective data must be enhanced with performance, mental effort, and effectiveness data. In addition, one should also ask, why? This means talking to the users and observing them.

How do questionnaires fit in with other HCI evaluation methods?

The ISO 9241 standard, part 11, defines usability in terms of effectiveness, efficiency, and satisfaction. If you are going to do a usability laboratory type of study, then you will most probably be recording user behaviour on video, or at least timing and counting events such as errors. This is known as performance or efficiency analysis.

You will also most probably be assessing the quality of the outputs that the end user generates with the aid of the system you are evaluating. Although this is harder to do, and more subjective, this is known as effectiveness analysis.

But these two together don't add up to a complete picture of usability. You want to know what the user feels about the way they interacted with the software. In many situations, this may be the single most important item arising from an evaluation! Enter the user satisfaction questionnaire.

It is important to remember that these three items (effectiveness, efficiency, and satisfaction) don't always give the same answers: a system may be effective and efficient to use, but users may hate it. Or the other way round.

Questionnaires of a factual variety are also used very frequently in evaluation work to keep track of data about users such as their age, experience, and what their expectations are about the system that will be evaluated.

What is meant by reliability?

The reliability of a questionnaire is the ability of the questionnaire to give the same results when filled out by like-minded people in similar circumstances. Reliability is usually expressed on a numerical scale from zero (very unreliable) to one (extremely reliable).

What is meant by validity?

The validity of a questionnaire is the degree to which the questionnaire is actually measuring or collecting data about what you think it should be measuring or collecting data about. Note that validity issues are not confined to opinion surveys: factual questionnaires may have very serious validity issues if, for instance, respondents interpret the questions in different ways.

Should I develop my own questionnaire?

If you have a lot of time, patience, and resources, then go right ahead. You are well advised to do a course in psychological measurement, including a heavy dose of statistics beforehand, and to gain experience with administering and interpreting questionnaires that have already been devised, for purposes outside usability engineering as well as for purposes within. You should ensure that your questionnaire has adequate reliability and validity and that you have an idea of what the expected values are. If this list of qualifications sounds ominous to you, then take the sensible option: use a questionnaire that has already been developed and standardised by someone else.

What's wrong with putting a quick-and-dirty questionnaire together?

The problem with a quick-and-dirty questionnaire is that you usually have no notion of how reliable or valid the questionnaire is. You may be lucky and have developed a very good questionnaire; you may be unlucky. However, until you put your questionnaire through the intensive statistical and methodological procedure involved in creating a questionnaire, you just won't know.

A poor questionnaire will be insensitive to differences between versions of software, releases, etc. and will not show significant differences. You are then left in a quandary: does the questionnaire fail to show differences because they do not actually exist, or is it simply because your questionnaire is insensitive and unreliable? If your questionnaire does show differences, is this because it is biased, or is it because one version is actually better?

The crux of the matter is: you can't tell unless the questionnaire has been through the standard development and test process.

Factual-type questionnaires are easy to do, though, aren't they?

A factual, or 'survey', questionnaire is one that asks for relatively straightforward information and does not need personal interpretation to answer. Answers to factual questions can be proven right or wrong. An opinion-based questionnaire is one that asks the respondent what they think of something. An answer to an opinion question cannot be proven right or wrong: it is simply the opinion of the respondent and is inaccessible to independent verification.

Although it is important to check that the respondents understand the questions of both kinds of questionnaires clearly, the burden of checking is much greater with opinion style questionnaires because we cannot sanity check the answers against reality.

What's the difference between a questionnaire which gives you numbers and one that gives you free text comments?

A closed-ended questionnaire is one that leaves no room for individual comments from the respondent. The respondent replies to a set of questions in terms of pre-set responses for each question. These responses can then be coded as numbers. An open-ended questionnaire requests the respondent to reply to the questions in their own words, maybe even to suggest topics to which replies may be given. The ultimate open-ended questionnaire is a 'critical incident' type of questionnaire in which respondents explain several good or bad experiences, and the circumstances which led up to them, and what happened after, all in their own words.

  • Closed-ended questionnaires are good if you are going to be processing massive quantities of data, or if your questionnaire is appropriately scaled to yield meaningful numeric data. If you are using a closed-ended questionnaire, however, encourage the respondents to leave their comments either in a special space provided on the page, or in the margins. You'll be surprised what this gives you.
  • Open-ended questionnaires are good if you are in an exploratory phase of your research or you are looking for some very specific comments or answers that can't be summarised in a numeric code.

Can you mix factual and opinion questions, closed and open ended questions?

It doesn't do to be too purist about this. It's a good idea to mix some open-ended questions in a closed-ended opinion questionnaire and it's also not a bad thing to have some factual questions at the start of an opinion questionnaire to find out who the respondents are, what they do, and so on. Some of your factual questions may need to be open-ended, for instance if you are asking respondents for the name of the hardware they are using.

This also means you can construct your own questionnaire booklets by putting together a reliable opinion questionnaire, for instance, and then add some factual questions at the front and maybe some open ended opinion questions at the end.

How do you analyse open-ended questionnaires?

The standard method is called 'content analysis' and is a subject all of its own. Content analysis usually lets you boil down responses into categories, and then you can count the frequency of occurrence of different categories of response.
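
If it helps to see the counting step concretely, here is a minimal Python sketch. It assumes the hard part of content analysis (deciding the categories and coding each free-text answer) has already been done by the analyst; the category labels below are purely hypothetical.

```python
# Minimal sketch of the counting step of content analysis.
# Assumes each open-ended answer has already been coded by the analyst
# into one or more category labels; the labels here are hypothetical.
from collections import Counter

coded_answers = [
    ["navigation", "error messages"],
    ["navigation"],
    ["terminology"],
    ["error messages", "navigation"],
]

counts = Counter(label for answer in coded_answers for label in answer)
for category, n in counts.most_common():
    print(f"{category}: mentioned in {n} answers")
```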

What is a Likert-style questionnaire? One with five response choices to each statement, right?

No indeed not. A Likert-style questionnaire is one in which you have been able to prove that each item of the questionnaire has a similar psychological 'weight' in the respondent's mind, and that each item is making a statement about the same construct. Likert scaling is quite tricky to get right, but when you do have it right, you are able to sum the scores on the individual items to yield a questionnaire score that you can interpret as differentiating between shades of opinion from 'completely against' to 'completely for' the construct you are measuring.

It is possible to find questionnaires which seem to display Likert-style properties in which many of the items are simply re-wordings of other items. Such questionnaires may show some fantastic reliability data, but basically they're a cheat because you're just adding in extra items that bulk up the statistics without telling you anything really new.

And of course there are plenty of questionnaires around which are masquerading as Likert-style questionnaires but which have never had their items tested for any of the required Likert properties. Summing item scores of such questionnaires is just nonsense. Treat such questionnaires as checklists (see below) until you are able to do some psychometric validation on them.

How can I tell if a question belongs to a Likert scale or not?

The essence of a Likert scale is that the scale items, like a shoal of tropical fish, are all of approximately the same size, and are going in the same direction.

People who design Likert scales are concerned with developing a batch of items that all have approximately the same level of importance (size) to the respondent, and that are all more or less talking about the same concept (direction), namely the concept the scale is trying to measure. Designers use various statistical criteria to quantify these two ideas.

To start with, we have to get a bunch of people to fill out the first draft of the questionnaire we are trying to design. We should ideally have about 100 respondents with varied views on the topic we are trying to measure, and certainly, more respondents than questions. We then compute various statistical summaries of this data.

Do the items all have the same level of importance to the respondent? To measure this we look at the reliability coefficient of the questionnaire. If the reliability coefficient is low (near to zero) this means that some of the items may be more important to the respondents than others. If the reliability coefficient is high (near to one) then the items are most probably all of the same psychological 'size.'

Are the items all more or less talking about the same concept? To measure this we look at the statistical correlation between each item and the sum of the rest of the items. This is sometimes called the item-whole correlation. Items which don't correlate well are clearly not part of the scale (going in a different 'direction') and should be thrown out or amended.

It's fascinating to use an interactive statistical package and to watch how reliabilities and item-whole correlations change as you take items in and out of the questionnaire.
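
For readers who want to try this themselves, here is a minimal Python sketch, assuming a small respondents-by-items matrix of hypothetical scores. It computes Cronbach's alpha (one common reliability coefficient) and the item-whole correlation for each item; in practice you would of course want far more respondents than this.

```python
# Minimal sketch: reliability (Cronbach's alpha) and item-whole correlations
# for a draft Likert-style questionnaire. `responses` is a respondents x items
# matrix of hypothetical item scores; aim for ~100 real respondents in practice.
import numpy as np

responses = np.array([
    [4, 5, 4, 3, 5],
    [2, 1, 2, 2, 1],
    [3, 4, 3, 4, 4],
    [5, 5, 4, 5, 5],
    [1, 2, 1, 2, 2],
])

n_items = responses.shape[1]

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score)
item_vars = responses.var(axis=0, ddof=1)
total_var = responses.sum(axis=1).var(ddof=1)
alpha = (n_items / (n_items - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")

# Item-whole correlation: each item against the sum of the remaining items.
for i in range(n_items):
    rest = np.delete(responses, i, axis=1).sum(axis=1)
    r = np.corrcoef(responses[:, i], rest)[0, 1]
    print(f"item {i + 1}: item-whole r = {r:.2f}")
```

Dropping an item from `responses` and re-running the sketch shows exactly the effect described above: both statistics shift as items go in and out.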

A very real risk a developer runs when constructing a scale is that they start to 'model the data.' That is, they take items in and out and they compute their statistics, but their conclusions are only applicable to the sample that evaluated the questionnaire. What the developer must do next is to try the new questionnaire on a fresh sample, and re-compute all the above statistics again. If the statistics hold on the fresh sample, then well and good. If not, then it's back to the drawing board.

Warning: one sometimes sees some very good-looking statistics reported on the basis of analysis of the original sample, without any check on a fresh sample. Take these with a large pinch of salt. The statistics will most probably be a lot less impressive when re-sampled.

In general, in answer to the question: is this a real Likert scale or not, the onus is on the person who created the scale to tell you to what extent the above criteria have been met. If you are not getting this level of re-assurance from the scale designer, then it really is a fishy business. A scale item which may work very nicely in one questionnaire may be totally out of place in another.

How many response options should there be in a numeric questionnaire?

There are two sets of issues here. One is whether to have an odd or even number of response options. The general answer is that, if there is a possibility of a 'neutral' response to a set of questions, then you should have an odd number of response options, with the central point being the neutral one. On the other hand, if it is a question of whether something is good/bad, male/female (bi-polar), then basically you are looking at two response options. You may wish to assess the strength of the polarity; you are then actually asking two questions in one: firstly, is it good or bad, and secondly, is it really very good or very bad. This leads you to an even number of response options.

Some people use even numbers of response options to 'force' the respondents to go one way or another. What happens in practice is that respondents end up giving random responses between the two middle items. Not very useful.

The other set of issues is how wide the response options should be. A scale of 1 to 3, 1 to 5, or even 1 to 12? The usual answer is that it depends on how accurately the majority of respondents can distinguish between flavours of meaning in the questions. If you suspect that the majority of respondents are going to be fairly uninformed about the topic, then stick with a small number of response options. If you are going to be dealing with experts, then you can use a much larger set of response options.

A sure way of telling if you are using too many response options is to listen to the respondents talking after they have done the questionnaire. When people have to differentiate between fine shades of meaning that may be beyond their ability, they will complain that the questionnaire was 'long' and 'hard.'

How many anchors should a questionnaire have?

The little verbal comments above the numbers ('strongly agree', etc.) are what we call anchors. In survey work, where the questions are factual, it is considered a good idea to have anchors above all the response options, and this will give you accurate results. In opinion or attitude work, you are asking a respondent to express their position on a scale of feeling from strong agreement to strong disagreement, for instance. Although it would be helpful to indicate the central (neutral) point if it is meaningful to do so, having numerous anchors may not be so important. Indeed, some questionnaires on attitudes have been proposed with a continuous line and two end anchors for each statement. The respondent has to place a mark on the line indicating the amount of agreement or disagreement they wish to express. Such methods are still relatively new.

A related question is, should I include a 'no answer' option for each item. This depends on what kind of questionnaire you are developing. A factual style questionnaire should most probably not have a 'no answer' option unless issues of privacy are involved. If in an opinion questionnaire, many of your respondents complain about items 'not being applicable' to the situation, you should consider carefully whether these items should be changed or re-worded.

In general, I tend to distrust 'not applicable' boxes in questionnaires. If the item is really not applicable, it shouldn't be there in the first place. If it is applicable, then you are simply cutting down on the amount of data you are going to get. But this is a personal opinion.

My respondents are continually complaining about my questionnaire items. What can I do?

People always complain. It's a fact of life. And everybody thinks of themselves as a 'questionnaire expert.' If you get the odd grumble from your respondents, this usually means that the person doing the grumbling has something extra they want to tell you, beyond the questionnaire. So listen to them.

If you get a lot of grumbles, this may mean that you have badly miscalculated and it's time to go back to the drawing board. When you listen to people complaining about a questionnaire, listen carefully: are they unhappy about what the questionnaire is attempting to measure, or are they unhappy about the wordings of some of your items?

What other kinds of questionnaires are there?

You mean, what other kinds of techniques can you employ to construct a questionnaire? There are two main other varieties:
  1. Semantic differential type questionnaires, in which the user is asked to say where their opinion lies between two anchor points which have been shown to represent some kind of polar opposition in the respondent's mind.
  2. Guttman scaling type questionnaires which are a collection of statements which gradually get more extreme, and you calculate at what statement the respondent begins to answer negatively rather than positively.

Of the two, semantic differential scales are more frequently encountered in practice, although they are not used as much as Likert scales, and professionals seem to have relegated Thurstone and Guttman scaling techniques into the research area.

Should favourable responses always be checked on the left (or right) hand side of the scale?

Usually no. The reason for not constructing a questionnaire in this manner is because response bias can come into play. A respondent can simply check off all the 'agrees' without having to consider each statement carefully, so you have no guarantee that they've actually responded to your statements -- they could be working on 'auto-pilot'. Of course, such questionnaires will also produce fairly impressive statistical reliabilities, but again, that could be a cheat.

Is a long questionnaire better than a short one? How short can a questionnaire be?

You have to ensure that you have enough statements which cover the most common shades of opinion about the construct being rated. But this has to be balanced against the need for conciseness: you can produce a long questionnaire that has fantastic reliabilities and validities when tested under controlled conditions with well-motivated respondents, but ordinary respondents may just switch off and respond at random after a while. In general, because of statistical artefacts, long questionnaires will tend to produce good reliabilities with well-motivated respondents, and shorter questionnaires will produce less impressive reliabilities but short questionnaires may be a better test of overall opinion in practice.

A questionnaire should not be judged by its statistical reliability alone. Because of the nature of statistics, especially the so-called law of large numbers, we will find that what was only a trend with a small sample becomes statistically significant with a large sample. Statistical 'significance' is a technical term with a precise mathematical meaning. Significance in the everyday sense of the word is a much broader concept.
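
A small worked example makes the point. The sketch below (using scipy, purely for illustration) tests the same weak correlation of 0.1 with two sample sizes: with 50 respondents it is nowhere near 'significant', with 5,000 it is highly 'significant', yet the relationship is just as weak in both cases.

```python
# Minimal sketch: the same weak effect becomes "statistically significant"
# once the sample is large enough. The correlation (r = 0.1) is identical
# in both cases; only the sample size changes.
from math import sqrt
from scipy.stats import t

r = 0.1
for n in (50, 5000):
    t_stat = r * sqrt(n - 2) / sqrt(1 - r ** 2)  # t-test for a correlation
    p = 2 * t.sf(t_stat, df=n - 2)               # two-tailed p-value
    print(f"n = {n}: t = {t_stat:.2f}, p = {p:.2g}")
```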

So high statistical reliability is not the 'gold standard' to aim for?

If a short (say 8 - 10 items) questionnaire exhibits high reliabilities (above 0.85, as a rule of thumb) then you should look at the items carefully and examine them for spurious repetitions. Longer questionnaires (12 - 20 items) if well constructed should yield reliability values of 0.70 or more.

I stress these are rules of thumb: there is nothing absolute about them.

What's the minimum and maximum figure for reliability?

Theoretically, the minimum is 0.00 and the maximum is 1.0. Suspect a questionnaire whose reliability falls below 0.50 unless it is very short (3-4 items) and there is a sound reason to adopt it.

The problem with questionnaires of low reliability is that you simply don't know whether they are telling you the truth about what you are trying to measure or not. It's the lack of assurance that's the problem.

Can you tell if a respondent is lying?

The polite way of saying this, is, can you tell if the respondent is giving you 'socially desirable' answers. You can, but the development of a social desirability scale within your questionnaire (so-called 'lie scale') is a topic all of its own. 'Lie scales' work on the principle that if someone is trying to make themselves look good, they will also strongly agree to an inordinate number of statements that ask about impossible behaviours, such as
  • 'I have never been late for an appointment in my life.'
  • 'I always tell the truth no matter what the cost.'
Now, some respondents may strongly agree with some of these items but they'd have to be a saint to be able to honestly agree to all of them.

'Lie scales' generally bulk up a questionnaire and are generally not used in HCI. If you are really concerned with your respondents giving you socially desirable answers, you could always put a social desirability questionnaire into the test booklet and look hard at those respondents who give you high scores on social desirability.

Why do some questionnaires have sub-scales?

Suppose that the overall construct you are getting the respondents to rate is complex: there are different components to the construct. Thus for instance, overall user satisfaction is a complex construct that can be broken down into a number of separate components, like 'attractiveness', 'helpfulness', 'feelings of efficiency' and so on. If you can identify these components, it makes sense to create a number of sub-scales in your questionnaire, each of which is a 'mini questionnaire' in its own right, measuring one component, but which also contributes to the overall construct.

How do you go about identifying component sub-scales?

The soundest way of doing this is to carry out a statistical procedure called 'factor analysis' on a large set of questions, to find out how many underlying (latent) factors the respondents are operating with; but often, received opinion or expert analysis of the overall construct may be used instead. The crucial questions are:
  1. Are these factors truly independent? That is, if they are, we would expect items that make up the factors to be more highly correlated with each other than with items from other factor scales.
  2. What use can the analyst make of the different factors? Extracting a bunch of factors that actually contributes little to our understanding of what is going on is pseudo-science. On the other hand, separating factors which are fairly highly inter-correlated but which make sense to separate out practically makes for a more usable questionnaire. For instance, 'screen layout' and 'menu structure' are two factors which may be fairly strongly inter-correlated in a statistical sense but separately they may give the analyst useful information about these two aspects of an interface.
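
For the statistical route, here is a minimal sketch using scikit-learn's FactorAnalysis. The data are random stand-ins for a real respondents-by-items matrix (so the loadings will be unremarkable), and the choice of two factors is arbitrary, purely for illustration; with real data you would look at which items load strongly on which factor and then ask the two questions above of the result.

```python
# Minimal sketch of exploratory factor analysis on questionnaire data.
# `responses` is a respondents x items matrix; here it is random stand-in
# data, so no meaningful factor structure should be expected.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(100, 10)).astype(float)  # 100 respondents, 10 items

fa = FactorAnalysis(n_components=2, random_state=0)  # two factors, for illustration
fa.fit(responses)

# fa.components_ has one row per factor, one column per item. Items that load
# strongly on the same factor are candidates for the same sub-scale.
for f, loadings in enumerate(fa.components_, start=1):
    strong = [i + 1 for i, w in enumerate(loadings) if abs(w) > 0.4]
    print(f"factor {f}: items loading above 0.4 -> {strong}")
```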

How much can I change the wordings in a standardised opinion questionnaire?

In general, if a questionnaire has been through the standardisation process the danger in changing, deleting, or adding items is that you undo the statistical basis for the questionnaire: you set yourself back by unknown amounts. You are generally advised not to do this unless you have all the background statistical data and have access to user samples on which you can re-validate your amended version.

There is one general exception. If statements in the questionnaire refer to something like 'this system' or 'this software' you can usually change these words to refer explicitly to the system you are evaluating without doing too much damage to the questionnaire. For instance:

  • (1) 'Using this system gives me a headache.'
  • (2) 'Using Word-Mate gives me a headache'.
Changing (1) to (2) is called 'focussing' the questionnaire and is usually no problem.

You may be able to do a more radical change of focus, without affecting the statistical properties too much if for instance you were to change all occurrences of (3) to (4):

  • (3) 'using this system...'
  • (4) 'configuring this system...'
...but you should examine the result very carefully to check that you are not introducing shifts of meaning by doing so. If the questionnaire you are intending to change has an associated database of reference values, then changing the focus in this way is most probably not a good idea if you want to still use the database of reference values.

What's the difference between a questionnaire and a checklist?

A checklist is simply a list of statements or features that it may be desirable or undesirable to have. It is not usually a scale in the psychometric sense of the term. A checklist is not amenable to Likert scaling, for instance, so summing the items of a checklist does not make sense. As an example, consider a checklist for landing a plane. You may have 95% of the items checked, but if you haven't checked that the wheels are down, your landing may be a disaster. But if you haven't checked that the passengers have put their safety belts on, the consequences may not be nearly as grave.

Individual items within a checklist may be averaged across users, so you can get a percentage strength of agreement on each item (thus you are trying to establish truth by consensus) but even then, an expert's opinion may outweigh an averaged opinion of a group of less well informed users (a class of 30 children may decide by vote that a hamster is female, for instance, but an expert may have to over-ride that opinion after a detailed inspection).
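
As a small illustration of averaging checklist items across users (and of why no total score is computed), here is a sketch with a hypothetical users-by-items matrix of 0/1 responses.

```python
# Minimal sketch: per-item percentage agreement for a checklist.
# `checks` is a users x items matrix of 0/1 responses (hypothetical data).
# Items are reported individually; summing them into a total score would
# not be meaningful for a checklist.
import numpy as np

checks = np.array([
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
])  # 3 users x 4 checklist items

percent_agreement = checks.mean(axis=0) * 100
for i, pct in enumerate(percent_agreement, start=1):
    print(f"item {i}: checked by {pct:.0f}% of users")
```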

Where can I find out more about questionnaires?

Please don't take seriously those books which devote a chapter to Likert scaling and then urge you to go out and try doing a questionnaire yourself. These authors are doing everyone a disservice. Here is a minimalist list of reference sources for questionnaire construction that I have found useful as teaching material.

Aiken, Lewis R., 1996, Rating Scales and Checklists. Wiley. ISBN 0-471-12787-6. Good general introduction including discussions of personality and achievement questionnaires.

Czaja, Ronald, and Johnny Blair, 1996, Designing Surveys. Pine Forge Press. ISBN 0-8039-9056-1. A useful resource for factual-style surveys, including material on interviews as well as mail surveys.

DeVellis, Robert F., 1991, Scale Development, Theory and Applications. Sage Publications, Applied Social Research Methods Series vol. 26. ISBN 0-8039-3776-8. Somewhat theoretical, but important information if you want to take questionnaire development seriously.

Ghiselli, Edwin E., John P. Campbell, and Sheldon Zedeck, 1981, Measurement Theory for the Behavioural Sciences. WH Freeman & Co. ISBN 0-7167-1252-0. A useful reference for statistical issues. Considered 'very readable' by some.

Kline, Paul, 1986, A Handbook of Test Construction. Methuen. ISBN 0-416-39430-2. Practically-orientated, with a lot of good, helpful advice for all stages of questionnaire construction and testing. Some people find it tough going but it is a classic.

Stecher, Brian M. and W. Alan Davis, 1987, How to Focus an Evaluation. Sage Publications. ISBN 0-803903127-1. About more than just questionnaires, but it serves to remind the reader that questionnaires are always part of a broader set of concerns when carrying out an evaluation.

Any comments? How are we doing, so far?

Please don't copy this page since I hope it's going to change over time, but you are very welcome to create a link to it from your site. Reciprocal links would be very nice, please mail me if you'd like a reciprocal link from the HFRG site. Excerpts may be made from reasonable portions of this page and included in information material so long as my authorship is acknowledged.

If you have any comments on the FAQ, or want to suggest some extra questions or resources, please contact me: jzk@ucc.ie.

Acknowledgements

As always thanks to Dr Murray Porteous for keeping me straight. Dick Miller, Owen Daly-Jones, Cynthia Toryu, Julianne Chatelain, Anne-Mari Flemming and Carolyn Snyder have all commented and stimulated. Thanks, folks!
