Hypotheses are about populations.
It is helpful to think of the population as the group to whom you want
to generalize your experimental results.
Once we have conducted our experiment, taken our measurements, and calculated our statistics, we ask: how reasonable is it that the null hypothesis is true? This judgment is based on probability: if the null hypothesis were true, what is the probability of observing a test statistic (such as a correlation, an F, or a t) at least as extreme as the one obtained from our sample? If that probability is low, the null hypothesis is not a very reasonable description of the population our sample came from, and we reject it. If the probability is high, we fail to reject the null hypothesis, which is not the same as showing that it is true. Inferential statistics gives us the tools to estimate this probability, usually reported as the p-value.
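To make "the probability of a result at least this extreme if the null hypothesis is true" concrete, here is a minimal sketch in Python. It uses a permutation test, a technique swapped in for illustration: the t, F, and correlation statistics mentioned above usually get their p-values from known theoretical distributions rather than from reshuffling. The function name `permutation_p_value` and the control/treatment scores are hypothetical. The logic is that if the null hypothesis (no difference between groups) holds, the group labels are interchangeable, so shuffling them shows how often chance alone produces a difference as large as the one observed.

```python
import random
from statistics import fmean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference between two group means.

    Under the null hypothesis the labels "a" and "b" carry no information,
    so we shuffle the pooled data and count how often the shuffled
    difference in means is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical scores from a two-group experiment.
control = [4.1, 5.0, 4.8, 5.3, 4.6, 5.1]
treatment = [5.9, 6.2, 5.7, 6.4, 6.0, 5.8]
print(f"estimated p-value: {permutation_p_value(control, treatment):.4f}")
```

Because every treatment score here exceeds every control score, only the original split (and its mirror image) is as extreme as what was observed, so the estimated p-value comes out well below 0.05 and we would reject the null hypothesis.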
Because we cannot measure every single person in the population we are interested in, we have to select a subset, or sample, of people to participate in our study. Our main concern is that the sample be representative of the entire population; that is, the sample should have the same characteristics as the population, in roughly the same proportions. If a sample is representative, we can infer that the findings we observe in it apply to the larger population. If it is not representative, the findings cannot safely be generalized to the population, and we cannot draw conclusions about our hypothesis of interest. Every sample differs somewhat from the population by chance alone (sampling error), but a non-representative sample adds a systematic distortion, called sampling bias, which produces a biased estimate of the population parameter.
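For example, suppose we were conducting a study to estimate the percentage of people in the general population who own horses, and the only people who participated were farmers. The results would not reflect the population as a whole, and our estimate would be badly biased. This kind of bias can be avoided by randomly sampling from the population of interest, so that every member has the same chance of being selected. Random sampling does not eliminate sampling error, but it ensures that the remaining error is due to chance rather than to a systematic difference between the sample and the population.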
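The horse-ownership example can be simulated in a few lines. This is a sketch under invented assumptions: the population size, the share of farmers, and the ownership rates (40% of farmers, 1% of everyone else) are hypothetical numbers chosen only to make the bias visible, and the helpers `make_population` and `proportion_owning` are illustrative names.

```python
import random

# Hypothetical rates, invented for illustration only.
FARMER_RATE = 0.40  # share of farmers who own a horse
OTHER_RATE = 0.01   # share of non-farmers who own a horse


def make_population(n_farmers=5_000, n_others=95_000, seed=1):
    """Build a hypothetical population of (is_farmer, owns_horse) records."""
    rng = random.Random(seed)
    farmers = [(True, rng.random() < FARMER_RATE) for _ in range(n_farmers)]
    others = [(False, rng.random() < OTHER_RATE) for _ in range(n_others)]
    return farmers + others


def proportion_owning(records):
    """Fraction of records whose owns_horse flag is True."""
    return sum(owns for _, owns in records) / len(records)


population = make_population()
sampler = random.Random(2)

true_rate = proportion_owning(population)
farmers_only = [record for record in population if record[0]]

# Biased design: only farmers can end up in the sample.
biased_estimate = proportion_owning(sampler.sample(farmers_only, 1_000))
# Random design: every member of the population has the same chance of selection.
random_estimate = proportion_owning(sampler.sample(population, 1_000))

print(f"population rate:     {true_rate:.3f}")
print(f"farmers-only sample: {biased_estimate:.3f}")
print(f"random sample:       {random_estimate:.3f}")
```

The farmers-only estimate lands near the invented 40% rate, far from the population rate of about 3%, and collecting more farmers would not fix that. The random sample lands close to the population rate and misses it only by chance.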
There is an important relationship between the size of the sample and our ability to infer something about a population from our sample:
Sample statistics calculated from large samples cluster more tightly around the value of the population parameter than sample statistics calculated from small samples. Therefore, a statistic from a large sample is, on average, a more accurate estimate of the population parameter than one from a small sample.
This idea can be seen clearly
by examining the notion of a sampling distribution.
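A short simulation gives a first look at a sampling distribution and at why larger samples cluster more tightly. This is a sketch under stated assumptions: the population of 100,000 normally distributed scores (mean 100, standard deviation 15) and the helper `sampling_distribution` are hypothetical. For each sample size, the code draws many random samples, records each sample's mean, and measures how spread out those means are; the spread should roughly follow the standard error formula σ/√n.

```python
import random
from statistics import fmean, stdev


def sampling_distribution(population, sample_size, n_samples=1_000, seed=3):
    """Draw n_samples random samples of sample_size and return their means."""
    rng = random.Random(seed)
    return [fmean(rng.sample(population, sample_size)) for _ in range(n_samples)]


# Hypothetical population: 100,000 scores drawn from a normal distribution
# with mean 100 and standard deviation 15.
pop_rng = random.Random(42)
population = [pop_rng.gauss(100, 15) for _ in range(100_000)]

sd_by_n = {}
for n in (10, 100, 1_000):
    means = sampling_distribution(population, n)
    sd_by_n[n] = stdev(means)
    print(f"n={n:5d}  average of sample means={fmean(means):7.2f}  "
          f"spread (SD) of sample means={sd_by_n[n]:5.2f}  "
          f"15/sqrt(n)={15 / n ** 0.5:5.2f}")
```

The average of the sample means stays near 100 at every sample size, while their spread shrinks roughly in proportion to 1/√n: multiplying the sample size by ten makes the spread about √10 ≈ 3.2 times smaller.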