Lesson 7: Understanding Sampling

Selecting representatives of a target population

In this step, we introduce the principles, purpose and types of sampling methods used in epidemiological studies in eye care. To obtain correct estimates of prevalence or other epidemiological measures, all analyses must be weighted according to the sampling method used.

As you read through this article, consider the challenges that a researcher may face when trying to conduct a cross-sectional study in a densely populated urban setting versus a sparsely populated rural setting.

Definition of sampling

Sampling is a procedure by which some members of a given population are selected as representatives of the entire target population. A sample is, therefore, a subset of the target population.

Why do we use samples?

Collecting information from everyone in a large population would be logistically impossible and financially prohibitive. By using a sample from the population of interest we can:

  • Lower the cost of the study
  • Shorten the study time
  • Obtain better quality data (through increased accuracy and enhanced tools).

Sampling and representativeness

To ensure that a sample is representative, we must clearly define the target population, sampling frame and the sample selection process. The purpose is to make sure that the study findings (say, for example, on the prevalence of cataract) are similar to those we would find if we conducted a census of the whole population.

In other words, we trade off the practical advantages listed above against the certainty that we would have from taking a census of the entire population.

IMPORTANT: The idea of a sample is to make inferences about the whole population. Key terms in sampling include:

  • Target population: the population to which the results from the sample are to be extrapolated (projected)
  • Sampling frame: the population from which the sample is selected
  • Sampling unit: the unit that is selected from the sampling frame, chosen according to your target population, e.g. village, household or school
  • Selected sample: the people who are randomly selected from the sampling frame
  • Study sample: those who actually participate. A high response rate supports generalising our results to the target population.

Sample size

We determine the sample size at the planning stage of any research study. Selecting the sample size is not an exact procedure. It is an exercise in balance between cost (resources needed) and the precision of the prevalence estimates that are to be made.

We should not carry out any study unless we are confident that the sample size is large enough to give the minimum required precision.
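The balance between precision and sample size can be illustrated with a small calculation. The lesson does not give a formula, so the sketch below assumes one commonly used approximation for a prevalence estimate under simple random sampling, n = z² × p × (1 − p) / d², where p is the expected prevalence, d is the desired precision (half the width of the confidence interval) and z is 1.96 for roughly 95% confidence. The function name and the example values are hypothetical.

```python
import math

def sample_size_for_prevalence(expected_prevalence, precision, z=1.96):
    """Approximate sample size for a prevalence estimate under simple random sampling.

    expected_prevalence: anticipated prevalence as a proportion (an assumed value)
    precision: desired half-width of the confidence interval, as a proportion
    z: normal quantile (1.96 corresponds to roughly 95% confidence)
    """
    p = expected_prevalence
    n = (z ** 2) * p * (1 - p) / (precision ** 2)
    return math.ceil(n)  # round up to a whole person

# Hypothetical example: expected prevalence 5%, precision of +/- 1 percentage point
n_required = sample_size_for_prevalence(0.05, 0.01)
print(n_required)

# Asking for tighter precision (+/- 0.5 percentage points) needs a larger sample
n_tighter = sample_size_for_prevalence(0.05, 0.005)
print(n_tighter)
```

Halving the margin of error roughly quadruples the sample size, which is why the choice of precision has such a strong effect on cost.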

Sampling methods

Probability sampling methods

Simple random sampling: Each individual in a target population is enumerated and assigned a number. The required number of individuals is then selected at random, for example by use of a table of random numbers. Everyone has an equal chance of being selected.
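A minimal sketch of simple random sampling, assuming the population has already been enumerated as a list of ID numbers. Python's random number generator stands in for the table of random numbers; the function name and the population of 1,000 people are hypothetical.

```python
import random

def simple_random_sample(population_ids, sample_size, seed=None):
    """Draw sample_size IDs without replacement; every ID has an equal chance."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population_ids, sample_size)

# Hypothetical enumerated population: people numbered 1 to 1000
population = list(range(1, 1001))
sample = simple_random_sample(population, 50, seed=7)
print(sorted(sample)[:10])
```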

Systematic random sampling: We do this by taking every nth person on a list, for example every 6th person. The sampling interval is kept regular and is chosen so that the required sample size is obtained. This method can result in a biased sample if the list follows a pattern. For example, if we use an electoral list, family members may be listed in groups. Systematic random sampling can be simpler to administer than simple random sampling and in some circumstances, such as in a large sampling area, it may ensure a more uniform distribution throughout the survey area.
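The every-nth-person procedure can be sketched as follows. This is an illustration under assumptions: the sampling interval is taken as the list length divided by the required sample size, and the starting point is chosen at random within the first interval. The function name and the list of 600 people are hypothetical.

```python
import random

def systematic_sample(population_list, sample_size, seed=None):
    """Select every nth entry from a list, starting at a random position."""
    if sample_size <= 0 or sample_size > len(population_list):
        raise ValueError("sample_size must be between 1 and the list length")
    interval = len(population_list) // sample_size  # the sampling interval (n)
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random start within the first interval
    return population_list[start::interval][:sample_size]

# Hypothetical list of 600 people; a sample of 100 gives an interval of 6
people = list(range(1, 601))
selected = systematic_sample(people, 100, seed=1)
print(selected[:5])
```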

The challenges in carrying out random or systematic sampling are that, in some settings, population lists may not be available or populations may be scattered, which makes the process of sampling inefficient. No sample will be exactly the same as the true population. There will always be some effect of sampling, which is known as the sampling error.

Simple random sampling has simple statistical properties, so we can easily measure our likely sampling error and establish the range in which the true prevalence is likely to be found.
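As an illustration of how this range can be worked out, the sketch below assumes the usual normal approximation for a proportion from a simple random sample: standard error = √(p × (1 − p) / n), with an approximate 95% range of p ± 1.96 × standard error. The function name and the survey figures are hypothetical, and the approximation becomes unreliable for very small samples or very rare conditions.

```python
import math

def prevalence_confidence_interval(cases, sample_size, z=1.96):
    """Return the prevalence, its standard error and an approximate confidence interval.

    Assumes simple random sampling and the normal approximation.
    """
    p = cases / sample_size
    se = math.sqrt(p * (1 - p) / sample_size)
    lower = max(0.0, p - z * se)  # a prevalence cannot be below 0
    upper = min(1.0, p + z * se)  # or above 1
    return p, se, (lower, upper)

# Hypothetical survey: 50 people with cataract among 1000 examined
p, se, (low, high) = prevalence_confidence_interval(50, 1000)
print(f"prevalence={p:.3f}, standard error={se:.4f}, range={low:.3f} to {high:.3f}")
```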

Multi-stage cluster sampling

We use cluster sampling in situations where it is not possible to carry out simple or systematic random sampling, for example when creating a list of all households in a large population is not feasible. Clusters are smaller groups within the population, within which we then carry out random sampling.

In its simplest form, multi-stage cluster sampling is carried out in two stages:

  1. We first create a list of clusters (a sampling frame), e.g. a list of all villages/towns, schools or households, instead of a list of individuals, and select clusters from it (stage one).
  2. We then select individuals from within the selected clusters, for example through systematic random sampling (stage two). For example, to obtain a representative sample of school children in a district, we first list all the schools. This creates our sampling frame from which some schools (clusters) are selected randomly. Then, within each school selected, we randomly draw a sample of children.
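The school example above can be sketched in code. This is a simplified illustration, not a survey protocol: it assumes schools are selected with equal probability and that children within each selected school are drawn by simple random sampling rather than systematic sampling. The school names, child identifiers and function name are hypothetical.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """clusters: dict mapping a cluster name to the list of individuals in it."""
    rng = random.Random(seed)
    # Stage one: randomly select clusters from the sampling frame
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage two: randomly select individuals within each selected cluster
    result = {}
    for name in chosen:
        members = clusters[name]
        k = min(n_per_cluster, len(members))  # a small cluster may have fewer people
        result[name] = rng.sample(members, k)
    return result

# Hypothetical district: 10 schools with 40 children each
schools = {
    f"School {i}": [f"school{i}-child{j}" for j in range(1, 41)]
    for i in range(1, 11)
}
study_sample = two_stage_sample(schools, n_clusters=3, n_per_cluster=10, seed=42)
for school, children in study_sample.items():
    print(school, children[:3])
```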

Sampling frames and probability proportional to size

Within a sampling frame, sampling units may vary in size. That is, there may be different numbers of people within each sampling unit and this will affect the chance of selection between large and small units.

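Therefore, a technique called probability proportional to size (PPS) is applied to ensure an equal chance of selection across the sampling frame. With PPS, larger units are more likely to be selected as clusters, and typically the same number of people is then sampled within each selected cluster. As a result, people in larger population units have the same overall chance of being included as those in smaller population units.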
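One common way of carrying out PPS selection is to list the units with their cumulative population sizes and then select systematically along that cumulative list. The lesson does not specify the mechanics, so the sketch below is an assumption, and the village names, sizes and function name are hypothetical.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def pps_select_clusters(cluster_sizes, n_clusters, seed=None):
    """Select clusters with probability proportional to their population size.

    cluster_sizes: dict mapping a cluster name to its population.
    A unit larger than the sampling interval can be selected more than once.
    """
    names = list(cluster_sizes)
    cumulative = list(accumulate(cluster_sizes[name] for name in names))
    total = cumulative[-1]
    interval = total / n_clusters              # sampling interval along the cumulative list
    rng = random.Random(seed)
    start = rng.random() * interval            # random start within the first interval
    selected = []
    for i in range(n_clusters):
        point = start + i * interval
        index = bisect_left(cumulative, point)  # the unit whose cumulative range contains the point
        selected.append(names[min(index, len(names) - 1)])
    return selected

# Hypothetical sampling frame of villages and their populations
villages = {"Village A": 1200, "Village B": 300, "Village C": 500,
            "Village D": 2000, "Village E": 1000}
chosen_villages = pps_select_clusters(villages, n_clusters=2, seed=3)
print(chosen_villages)
```

If a fixed number of people, say k, is then sampled in every selected cluster, a person's overall chance of inclusion is approximately (number of clusters × cluster size / total population) × (k / cluster size). The cluster size cancels out, which is why people in large and small units end up with the same chance.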

Advantages of cluster sampling

  • Simpler, as a complete list of people/households within the population is not required.
  • Less travel/resources required

Disadvantages of cluster sampling

  • Individuals within a cluster may be more alike (homogeneous) to one another than to those in other clusters.
  • Increased sampling error (loss of precision). This is primarily because sampling is concentrated in the selected clusters and significant portions of the population remain unsampled.
  • Need to increase the sample size to account for the increased sampling error (the design effect).
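As a rough illustration of the last point, the design effect is often approximated as 1 + (m − 1) × ρ, where m is the number of people sampled per cluster and ρ is the intracluster correlation (how alike people within a cluster are). This formula and the example values are assumptions for illustration and do not come from the lesson. The sample size calculated for simple random sampling is then multiplied by the design effect.

```python
import math

def design_effect(people_per_cluster, intracluster_correlation):
    """Approximate design effect: 1 + (m - 1) * rho."""
    return 1 + (people_per_cluster - 1) * intracluster_correlation

def inflate_sample_size(srs_sample_size, deff):
    """Scale a simple-random-sampling sample size up by the design effect."""
    return math.ceil(srs_sample_size * deff)

# Hypothetical values: 50 people per cluster, intracluster correlation of 0.02
deff = design_effect(50, 0.02)
cluster_sample_size = inflate_sample_size(1825, deff)  # 1825 is an assumed SRS sample size
print(round(deff, 2), cluster_sample_size)
```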

Sampling within clusters

There are various methods that can be used:

  • Simple random method – but this can be very time consuming and requires a full population list of the selected cluster
  • Random walk method – a bottle is spun at the centre of the cluster to set the direction; the first household in that direction is selected, followed by the next and so on until the required sample is obtained. This method lacks objectivity and may cause bias, e.g. through subjective researcher choices or because central households may differ from peripheral households.
  • Compact segment method – maps are drawn and divided into equal segments. A segment is selected randomly and everyone in that segment is included.

Differences between sampling error and non-sampling error

Sampling error ALWAYS occurs when samples are examined instead of whole populations. It can be measured (standard error). It cannot be prevented but it can be reduced by increasing the sample size.
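The effect of sample size on sampling error can be seen in a small simulation. The sketch below repeatedly draws hypothetical surveys from a population with an assumed true prevalence of 10% and measures how much the survey estimates vary at different sample sizes. All values and names are illustrative.

```python
import random
import statistics

def simulate_prevalence_estimates(true_prevalence, sample_size, n_surveys, seed=0):
    """Simulate repeated surveys and return the prevalence estimate from each."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_surveys):
        # Each sampled person has the condition with probability true_prevalence
        cases = sum(1 for _ in range(sample_size) if rng.random() < true_prevalence)
        estimates.append(cases / sample_size)
    return estimates

spreads = {}
for n in (100, 400, 1600):
    estimates = simulate_prevalence_estimates(0.10, n, n_surveys=200)
    spreads[n] = statistics.stdev(estimates)  # variation between surveys
    print(n, round(spreads[n], 4))
```

The spread of the estimates shrinks as the sample size grows, but it never reaches zero unless the whole population is examined.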

Non-sampling error cannot be measured. It can be prevented or minimised by carrying out a well-designed and well-conducted study. When considering the results from a prevalence survey, both types of error have to be considered, as both affect the estimate of the prevalence obtained in the sample.

In summary, the “true prevalence” in a population of interest can be expressed as:

  • Estimate from sample
  • +/- Sampling error
  • +/- Non-sampling error (bias)