Question

Topic: Research/Metrics

Statistical Validation Of A Sample Size

Posted by Anonymous on 125 Points
I am trying to establish whether the characteristics of a sample of a population reflect those of the population as a whole.

Live example. Total population is 10,000 people who carried out a certain action on our website. Of these, a proportion (ranging from 60 to 5,000 depending on what action I am measuring) have given me their demographic information. I want to produce a profile of the people who completed these actions, but this needs to be statistically valid (to a level of confidence, probably 95%).

Is there a simple rule of thumb or equation I can use to determine this? My colleague thinks that as long as it's more than 200 we're OK - is that right?

RESPONSES

  • Posted by koen.h.pauwels on Member
    Hi,

    There is indeed a statistical formula for sample size. It applies if you want to reach 95% confidence and you have a 'tolerable error' in mind (e.g. you want to be 95% confident that the true value is within 1 unit of the sample mean of the action):

    sample size = (1.96)^2 * variance / error^2

    The variance is your best guess of the variance of the action in the population (often you just square the standard deviation of your sample).

    The error is the "tolerable error", i.e. half the width of the entire desired confidence interval.

    Hope this helps
  • Posted on Author
    Thanks Koen. However, I'm not that conversant with stats terms (my college degree feels like a long time ago!) so how do I apply this from a layman's viewpoint?

    What I really want to be able to work out is for 10,000 actions, we need a sample of xxx minimum who have given us their profile information for this profile to be reasonably representative of the population. Not sure how I get to this from your equation?

    (Also, what does ^ mean?!)

  • Posted on Moderator
    Jo, you need one more variable in order to do the calculation Koen has provided: How exact do you need the result to be?

    For example, if the possible options are 1-10, and you want to know if the population average is really 8, would you be satisfied with 7.6, or do you need it to be 7.9, or even 8.0?

    You want to be 95% confident in the accuracy of the finding, but it's easy to be highly confident if you can accept a wide range of answers. (In the example above, if anything from 3-10 would qualify as being an 8, then you won't need such a large sample.)

    The ^ symbol means that the number that follows is an exponent.

    Your question is a reasonable one, but the answer includes some "it depends."
  • Posted by koen.h.pauwels on Accepted
    Thanks for the most excellent translation, Mike! ^ is indeed the exponent, and ^2 means: squared.

    Jo, continuing with Mike's example of the 1-10 scale, assuming a variance of 1, and your definition of 'reasonable' being 0.1 away from the mean answer in the population, we would need a sample size of:

    3.84*1 / 0.01 = 384 respondents

    Indeed it all depends. Interestingly, it does not depend on the size of the population. Intuitively, you need a larger sample size (a) the higher the variance (i.e. the 'spread' of answers in the population, which I typically guesstimate by asking 10-20 people and calculating the variance in their answers) and (b) the more precise your definition of 'reasonable'. For instance, if you can live with your sample mean answer being 0.2 away from the population mean, the formula gives us:

    3.84 * 1 / 0.25 = 96
  • Posted on Accepted
    I think Koen has a typo in the last formula in the previous post. It should be:

    3.84 * 1 / 0.04 = 96

    The 0.04 figure is simply 0.2 squared.

    The number would have been 0.25 if the error were 0.5, because 0.5 squared is 0.25. In that case the sample size would be just 15.36. You'd only need 16 people to give you the 95% confidence you seek.

    I know all of this seems like the keynote speech at a convention of statisticians, but it really does give you the correct answer to your question.

    So how do you know what error rate you are willing to accept? It obviously makes a huge difference in determining the sample size.
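
For anyone who wants to check the arithmetic in this thread, here is a minimal Python sketch of the formula sample size = (1.96)^2 * variance / error^2. The function names raw_sample_size and required_respondents are made up for illustration, and rounding up to a whole number of respondents is an assumption on top of what the posts above state.

```python
import math

Z_95 = 1.96  # z-value for 95% confidence, as used in the thread


def raw_sample_size(variance: float, error: float, z: float = Z_95) -> float:
    """Unrounded result of z^2 * variance / error^2."""
    if variance < 0 or error <= 0:
        raise ValueError("variance must be >= 0 and error must be > 0")
    return (z ** 2) * variance / (error ** 2)


def required_respondents(variance: float, error: float, z: float = Z_95) -> int:
    """Round the raw result up, since you cannot survey a fraction of a person."""
    return math.ceil(raw_sample_size(variance, error, z))


if __name__ == "__main__":
    # Same scenario as the thread: variance of 1 on a 1-10 scale,
    # with tolerable errors of 0.1, 0.2 and 0.5.
    for error in (0.1, 0.2, 0.5):
        raw = raw_sample_size(variance=1.0, error=error)
        needed = required_respondents(variance=1.0, error=error)
        print(f"error={error}: raw={raw:.2f}, respondents needed={needed}")
```

The posts above round 1.96^2 to 3.84 and so quote 384 and 96. With the unrounded 3.8416 and rounding up to whole people, the script gives 385, 97 and 16 for the three error levels.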
