Wednesday, April 23, 2008

Humpty Dumpty on statistical sampling

One of my statistics lecturers used to annoy me by oft asserting that "When people say random they do not mean random. They mean haphazard."

"Oh no, they do not," I used to hiss. Silently.

These days some of the confusion is removed by the adoption of the term probability, rather than random, sampling to describe the method of selecting the people you are going to measure, study or question.

It is disconcerting for someone with this kind of training to find that other scientific disciplines do not have this type of sample selection as an article of faith. "Oh, we cannot use a random sample because we are only interested in diabetes."

It is particularly difficult to understand how the medical profession can so loftily dismiss the need for probability samples, even on straightforwardly practical grounds. "We cannot go around taking a MORI sample of people's sperm counts."

I have just read what I see as a very important paper: "Design priorities and disciplinary perspectives: the case of the US National Children's Study".

The authors [Michael & O'Muircheartaigh] carefully, lucidly & sympathetically analyse the reasons why mature scientific disciplines develop different research tools.

Surprise, surprise, the main reason is the nature of the research question that is asked. But close behind come assumptions about the nature & extent of existing knowledge.

Since the paper is published by the Royal Statistical Society, it is perhaps not a surprise to find that in the end the US decided to go with a probability sample for its child development study.

Nigel Hawkes wrote a full-page article in The Times not all that long ago which covered some of this ground, in an attempt to explain to readers how science can come up with seemingly contradictory findings along the lines of "coffee is good for you – no, it is bad for you". If I remember correctly, he used the word epistemology to cover some of the grounds for dispute. But even he did not, again if I remember correctly, give weight to the idea of using probability sampling in these areas.

The media generally have got hold of the idea about the need for proper sample selection, because of their experience with opinion polls, I guess. Which is a bit ironic, since opinion pollsters usually use quota samples.

It is doubly ironic because the media generally prefer the term "scientific sample", when scientists do not usually use probability samples (if they are studying people).

The media have also got hold of the idea that size matters in a survey - though they seem a bit hazy about exactly how or how much. They do love a survey, however - well, they are fun, and add a bit of lightness to a 3-hour news programme.

Even when it is totally non-scientific, taken off somebody's website or done by a PR company for a client with a deodorant to sell. And they will say, "Although it is not scientific, they did speak to 8,000 people."

Which leaves them in a bit of a pickle when they have to say, "Now this one is serious, it comes from National Statistics."
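
For anyone hazy about why 8,000 self-selected people do not beat a much smaller probability sample, here is a toy simulation. Everything in it is invented for illustration: the population of 100,000, the 30% who have the trait, and the assumption that people with the trait are three times as likely to answer a website poll. It is a sketch of the general point, not a model of any real survey.

```python
import random


def simulate(seed=2008):
    """Compare a small probability sample with a large self-selected poll.

    All figures are hypothetical and chosen only to illustrate the idea.
    """
    rng = random.Random(seed)

    # Hypothetical population: 100,000 people, exactly 30% have the trait.
    population = [1] * 30_000 + [0] * 70_000
    true_share = sum(population) / len(population)

    # Probability sample: a simple random sample of 1,000, in which every
    # person has the same known chance of being selected.
    srs = rng.sample(population, 1_000)
    srs_estimate = sum(srs) / len(srs)

    # Self-selected poll: people with the trait are ASSUMED to respond with
    # probability 0.15, everyone else with probability 0.05. That yields
    # roughly 8,000 respondents, but they are not representative.
    volunteers = [
        person for person in population
        if rng.random() < (0.15 if person == 1 else 0.05)
    ]
    volunteer_estimate = sum(volunteers) / len(volunteers)

    return true_share, srs_estimate, len(volunteers), volunteer_estimate


if __name__ == "__main__":
    true_share, srs_est, n_vol, vol_est = simulate()
    print(f"true share of the population:       {true_share:.3f}")
    print(f"probability sample (n=1000):        {srs_est:.3f}")
    print(f"self-selected poll (n={n_vol}):       {vol_est:.3f}")
```

Under these made-up assumptions the small probability sample should land close to the true 30%, while the big self-selected poll should settle well above it, and asking more volunteers would not fix that. Size narrows the uncertainty around whatever the selection method is aiming at; it does not correct the aim.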