Quote:
Originally Posted by sound dropouts
The higher the sample size, the closer the mean of the sample should be to the mean of the population distribution (the standard deviation is smaller). The central limit theorem says that the sample mean is distributed normally in regards to the population mean if the sample is large enough, and the chi square distribution describes the distribution of the sample standard deviation.
This is exactly what I said. It isn't just the mean either; it's the entire distribution, which takes into account the mean (average), range, and standard deviation. The chi-square distribution and the Z-score distribution (which is a standardized function) describe the distribution of the deviation as well.
What this means to people reading: if you expect every single "production" movie ever made to follow a normal distribution (which I believe), in that most of the density would fall around average and very little density would be "perfect" or "terribly bad", then:
If you had chosen 10 movies at random from the entire pool of production movies (hypothetical)
vs
If you had chosen 1,000 movies at random from the entire pool of production movies (hypothetical)
Then the distribution of the 1,000 randomly selected movies would more closely resemble the hypothetical distribution. All this simply means, in layman's terms: have large sample sizes, include everyone's opinion, and sample at random.
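For anyone who wants to see the 10 vs 1,000 comparison in numbers, here is a rough sketch in Python. It is only an illustration under assumptions I made up: the population is normal with mean 5.5 and standard deviation 1.5, and "resemblance" is measured as the largest gap between the sample's empirical CDF and the population CDF (the Kolmogorov-Smirnov distance). The names `ks_distance` and `average_gap` are hypothetical helpers, not from any library.

```python
import random
from statistics import NormalDist

# Hypothetical population of movie ratings: normal, mean 5.5, sd 1.5 (assumed values).
POPULATION = NormalDist(mu=5.5, sigma=1.5)

def ks_distance(sample, dist):
    """Largest gap between the sample's empirical CDF and the CDF of dist."""
    xs = sorted(sample)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        gap = max(gap, abs((i + 1) / n - f), abs(i / n - f))
    return gap

def average_gap(n, trials=200, seed=0):
    """Average KS distance over many random samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(POPULATION.mean, POPULATION.stdev) for _ in range(n)]
        total += ks_distance(sample, POPULATION)
    return total / trials

gap_10 = average_gap(10)
gap_1000 = average_gap(1000)
print(f"average gap, n=10:   {gap_10:.3f}")
print(f"average gap, n=1000: {gap_1000:.3f}")
```

A smaller gap means the sample looks more like the population. Averaged over many random draws, the 1,000-movie samples should come out far closer than the 10-movie samples, but only because the draws are random.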
Remember, Sound Dropouts, that the central limit theorem rests on random sampling as well. Without random sampling, it's meaningless.
Quote:
Basically, you are incorrect when you say that the sample more accurately describes the population distribution, it just better describes the mean of the population distribution. Also, the central limit theorem doesn't say anything about accurately describing the mean, that fact follows from the sample mean being an unbiased estimator for the population mean.
When you have a small sample size, your distribution is less likely to resemble the hypothetical distribution, which increases the likelihood of making type I/II errors (and also lowers power). I see no problem calling extremely small sample sizes an inaccurate depiction of the hypothetical distribution. In the same way, Neighbor's 5.2/10 rating with barely over 1,000 votes is probably not an accurate depiction of what people think of this title, signifying a pretty small effect size. Maybe it is? Maybe it isn't? We really don't know until there's a larger sample. Most likely, the score would increase, especially as it builds cult status. "Nightbreed" would probably do the same.
What all this means for the people listening:
Movies at review sites that have low samples (vote counts) may not be an accurate representation of what they would look like with more votes. Then we can move on to the topic of whether or not these votes follow random selection principles. Does everyone use one site to cast their vote (no)? What about people who don't have internet access (no)? So no, review sites do not follow random sampling procedures, and they are inaccurate in themselves. However, we can still use the framework to explain why some ratings seem lower or higher than others, what role sample size plays, and how much of a "grain of salt" you should take with all this.
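To put a rough number on the "grain of salt", here is a small Python sketch of the usual margin of error for an average, z * sd / sqrt(n). The function name `margin_of_error` and the vote spread of 2.5 points are my own assumptions for illustration, not figures from any review site, and the formula only holds if votes are independent random draws.

```python
import math

def margin_of_error(vote_sd, vote_count, z=1.96):
    """Approximate 95% margin of error for an average rating.

    Assumes votes are independent random draws, which review sites
    do not guarantee.
    """
    return z * vote_sd / math.sqrt(vote_count)

# vote_sd=2.5 is an assumed spread of individual votes on a 10-point scale.
for votes in (10, 100, 1000, 100000):
    print(votes, round(margin_of_error(2.5, votes), 2))
```

The margin shrinks with the square root of the vote count, so it covers random sampling error only. It says nothing about who chooses to vote, which is the bigger problem with review sites.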