Wednesday, December 12, 2007

Where Do Tests Come From?

I'm writing this post as a follow-up to my previous post, "How This Psychiatrist Thinks About Psychological Tests". In that post I wrote about the different types of psychological tests and why psychiatrists and psychologists use them. In this post I'm going to talk about how psychological tests get invented.

I've always thought it would be a great gig to have: invent a psychological test, get a copyright, make sure it's good for something, then set up nationwide seminars to train and certify people to use it and sell the test to them. Talk about a self-made entrepreneur!

But there's a reason why everybody isn't doing this: inventing a test---I mean one that is meaningful and useful---is actually quite hard to do. Drug companies spend loads of money inventing new drugs only to have them go down in flames during the clinical trials; the same thing happens with psychological tests.

To illustrate the process, let's imagine that we are going to invent a test that would be useful to the blog. We want a tool that will measure the degree to which a post (or blogger, or podcast guest) will entertain a reader or listener. Let's call it the Shrink Rap Silliness Inventory (SRSI).

The first thing you do is scour the literature looking for existing tests that are supposed to do what you want. In our case, there is nothing out there already in use that measures silliness. If we had found such a test, we'd look at the research behind it to see what is presently known about the silliness measuring business. This literature review might tell us that there are various characteristics that are indicators of silliness: a tendency to wear big floppy shoes, to talk in a funny voice, to be a Monty Python fan, or to be named Roy (sorry Roy, couldn't resist). We'd use this information to put together the items used in the SRSI. The items might be questions that the subject/patient has to answer (e.g., "Is your name Roy?") or observations that the test administrator makes (e.g., "On a scale of 1 to 7, how big and floppy are this subject's shoes?"). Once you have a series of experimental test items put together, you're ready to start taking your SRSI for a test run (pardon the pun) to see how well it works.

The first thing you have to figure out is whether or not the test actually measures what you want it to measure---this is known as validity. We want the SRSI to measure silliness when it's present and to rule out subjects who aren't silly. To check this, you give your test to groups of people known to be silly and others who aren't, and compare their scores. If SRSI scores are high for known silly folks (say, students at the local clown college or improv group) and low for non-silly folks (maybe your local newscasters), then this suggests your test is valid because it can distinguish between the groups. This is analogous to using a medical laboratory test to distinguish between diseased and healthy people. There are other ways of establishing test validity, but this is the usual starting point.
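For the number-minded, here is a rough sketch of what that group comparison could look like in Python. Everything in it is made up for illustration: the scores, the group names, and the choice of Cohen's d (a standardized mean difference) as the yardstick. Real validation studies use much bigger samples and fancier statistics.

```python
from statistics import mean, stdev

# Hypothetical SRSI totals; these numbers are invented for illustration.
clown_college = [31, 28, 35, 30, 33]
newscasters = [12, 15, 9, 14, 11]


def cohens_d(group_a, group_b):
    """Standardized mean difference using a pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


# Raw gap between the group averages, in SRSI points.
gap = mean(clown_college) - mean(newscasters)

# The same gap expressed in units of the pooled standard deviation.
effect = cohens_d(clown_college, newscasters)

print(f"mean gap: {gap:.1f} points, Cohen's d: {effect:.2f}")
```

A big gap between the groups is what we're hoping for; if the clowns and the newscasters scored about the same, the SRSI wouldn't be telling them apart.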

The second thing you have to show is test reliability: in other words, that you can trust the test to measure things stably over time. We want the SRSI to work every time, like a car that will start in cold weather. You check for this by giving the test repeatedly to the same person or group of people over time and comparing their scores. Assuming silliness is a stable trait, we want SRSI scores to be stable too---this is known as test-retest reliability. We also want lots of people to be able to use the test and have it work well for all of them. So we have a lot of different people administer the SRSI to the same subject. If the SRSI scores all turn out the same, or close to it, we know our test has good inter-rater reliability.
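Here is a back-of-the-envelope sketch of both reliability checks in Python. The scores are invented, and the methods are just simple stand-ins I picked for illustration: a Pearson correlation between two testing sessions for test-retest reliability, and the spread between raters' scores as a crude look at inter-rater agreement. Published tests usually report more formal agreement statistics.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Invented SRSI scores: the same five subjects, tested on two occasions.
first_visit = [30, 12, 25, 18, 33]
second_visit = [29, 14, 24, 19, 34]
test_retest_r = pearson_r(first_visit, second_visit)

# Invented SRSI scores: three different raters scoring the same subject.
ratings_same_subject = [27, 28, 26]
rater_spread = max(ratings_same_subject) - min(ratings_same_subject)

print(f"test-retest r: {test_retest_r:.2f}")
print(f"spread between raters: {rater_spread} points")
```

A correlation close to 1 means people kept roughly the same standing from one session to the next, and a small spread means the raters mostly agreed with each other.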

Finally, you want to know how likely it is that your test score is going to be wrong. There are two ways a test score could be wrong: if a silly person gets a low SRSI score that would be an error known as a false negative test; if a non-silly person gets a high score that's a false positive test. We would have to look at our test data and figure out the percentage of times the SRSI gets a wrong score, either false positive or false negative.
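To make the arithmetic concrete, here is a small Python sketch. The subjects, their scores, and the cutoff of 20 for calling a score "high" are all hypothetical; a real test would need research to decide where that line belongs.

```python
# Hypothetical cutoff: a score at or above this counts as a "high" SRSI score.
CUTOFF = 20

# Each pair is (truly_silly, srsi_score). All data invented for illustration.
subjects = [
    (True, 31), (True, 28), (True, 14), (True, 33),
    (False, 12), (False, 24), (False, 9), (False, 15),
]

# Silly people who scored low: false negatives.
false_negatives = sum(1 for silly, score in subjects if silly and score < CUTOFF)

# Non-silly people who scored high: false positives.
false_positives = sum(1 for silly, score in subjects if not silly and score >= CUTOFF)

silly_total = sum(1 for silly, _ in subjects if silly)
non_silly_total = len(subjects) - silly_total

# Share of silly people the test missed, and share of non-silly people it flagged.
false_negative_rate = false_negatives / silly_total
false_positive_rate = false_positives / non_silly_total

# Share of all scores that were wrong in either direction.
overall_error_rate = (false_negatives + false_positives) / len(subjects)

print(f"false negative rate: {false_negative_rate:.0%}")
print(f"false positive rate: {false_positive_rate:.0%}")
print(f"overall error rate: {overall_error_rate:.0%}")
```

Moving the cutoff trades one kind of error for the other: raise it and fewer non-silly people get flagged, but more silly people slip through.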

This is just a portion of the research that has to go into inventing a good psychological or medical test. If we managed to clear all these hurdles, we'd go on to do research to see if the test actually gives us useful information---whether podcast guests with high SRSI scores give us better iTunes ratings and downloads, or higher visitor counts on days when they guest blog. We could even have SRSI scores for each of us Shrink Rappers! But I guess that comes back to my original issue with psychological tests---I don't need a test to tell me that Roy would be silliest.


(Alright, you have to admit that inkblot looks like a pelvis. I can't be the only one seeing body parts here.)