Sampling Theory

In statistics, the first step before any estimation is to draw a Sample Set from the Population Set. The Population Set can be pictured as an entire tree from which data could be collected, whereas the Sample Set is a single branch on which the actual observation and estimation are done. The population is usually very large, and studying all of it is exhausting in both time and money. To cut down on time and resources, a Sample Set is therefore created from the Population Set.

Process of Sampling:

  1. Identify the Population Set.
  2. Determine the size of the sample set.
  3. Provide a basis (the sampling frame) from which samples can be selected out of the Population.
  4. Pick samples from that frame using one of the many sampling techniques, such as Simple Random, Systematic or Stratified Sampling.
  5. Check whether the elements of the formed sample set match the attributes of the population set without large variations.
  6. Check the formed sample set for errors or inaccurate estimations that may have occurred.
  7. The set obtained after performing the above steps is the Sample Set.
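
The steps above can be sketched roughly in Python as follows. This is only a minimal illustration under stated assumptions: the population is synthetic (made-up weights in kg), the frame lists every population element, and the helper name `draw_sample` is hypothetical rather than part of any library.

```python
import random
import statistics

def draw_sample(frame, sample_size, seed=None):
    """Draw a simple random sample (without replacement) from a sampling frame."""
    rng = random.Random(seed)
    return rng.sample(frame, sample_size)

# Step 1: a hypothetical population of 10,000 weights in kg (synthetic data).
rng = random.Random(0)
population = [rng.gauss(70, 12) for _ in range(10_000)]

# Step 2: decide on the sample size.
sample_size = 500

# Step 3: the sampling frame; here it simply lists every population element.
frame = list(population)

# Step 4: pick the sample with a sampling technique (simple random here).
sample = draw_sample(frame, sample_size, seed=1)

# Steps 5-6: compare a sample attribute with the population and inspect the error.
population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
sampling_error = sample_mean - population_mean
print(f"population mean={population_mean:.2f}, "
      f"sample mean={sample_mean:.2f}, error={sampling_error:+.2f}")
```

In a real study the population mean is unknown, so the comparison in steps 5 and 6 would be made against whatever attributes of the population are already known.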

A simple illustration of how sampling is done at its basic stages.

Population

The Population is the whole set of variables, elements or entities considered for a statistical study. It is also known as the universal set about which inferences are drawn. The Population set contains every individual or element under consideration, but performing estimations on the whole Population is very exhausting in terms of both resources and time.

Example: Consider the mean weight of all men on Earth. This is a hypothetical population because it includes every man who has ever lived on Earth, including men who lived before us and men who will exist in the future. A difficulty arises with such a measurement: not all men in the population are observable (consider men who will exist in the future and men who lived earlier and are no longer alive). Also, performing statistics on the whole population (if it were hypothetically possible) would require a great deal of time and resources, which would be exhausting and inefficient.



What is done instead is to take a subset of the available population, perform statistics on it, and draw inferences about the entire population. Taking a subset makes the task easier, since scrutinizing the subset takes less time than scrutinizing the whole Population. Statistics are computed on the sample set to draw conclusions about the entire population. These calculations are only estimates for the population set, because they are not made on the actual data of the whole population and so are not free from errors. This is expected: the sample set has fewer members than the population, so some information is lost, which results in errors.
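
To make the point about lost information concrete, here is a small sketch that measures how far a sample mean tends to fall from the population mean for two sample sizes. The data are synthetic and the function name `mean_abs_error` is a hypothetical helper, so treat it as an illustration rather than a general result.

```python
import random
import statistics

def mean_abs_error(population, sample_size, trials, rng):
    """Average absolute gap between the sample mean and the population mean."""
    true_mean = statistics.mean(population)
    errors = []
    for _ in range(trials):
        sample = rng.sample(population, sample_size)
        errors.append(abs(statistics.mean(sample) - true_mean))
    return statistics.mean(errors)

rng = random.Random(42)
# Synthetic population: 5,000 made-up weights in kg.
population = [rng.uniform(50, 100) for _ in range(5_000)]

small = mean_abs_error(population, 10, 200, rng)
large = mean_abs_error(population, 1_000, 200, rng)
print(f"average error with n=10:   {small:.3f}")
print(f"average error with n=1000: {large:.3f}")
```

With this setup the larger sample gives the smaller average error, which is the trade-off between the cost of the study and the size of the sampling error.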

Sampling Frame

The Sampling Frame is the basis from which the sample is drawn. It is the list of all elements that could be selected for observation. It often happens that not all elements in the sampling frame take part in the actual statistics. In that case, the elements that took part in the study are called Samples, while the Sampling Frame consists of every potential element: those that took part as well as those that could have taken part but did not. Thus, the Sampling Frame is the potential list of elements from which the sample for our statistics is drawn.
Coming up with a good sampling frame is essential because it determines how well the results of the statistics reflect the population set. A sampling frame is not just a random set of handpicked elements; it also contains identifiers that help to identify each and every element in the set.

Example: GeeksForGeeks organized a meetup in Delhi for all the Geek Interns across India to perform a statistical study on their performances. GfG sent an invitation email to all 500 interns, but since the interns are scattered all over India, only 200 of the 500 actually showed up. Thus, GfG had to perform its study on those 200 interns only (the Sample Set). The full list of 500 invited interns, including the 300 potential candidates who decided not to show up, forms the Sampling Frame.
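
The relationship between frame and sample in this example can be sketched as below, using the standard definition in which the frame is the full list of units that could be selected. The intern identifiers are made up, and attendance is modelled as a random draw purely for illustration; real attendees are self-selected and may not be representative.

```python
import random

# Hypothetical sampling frame: one identifier per invited intern (IDs are made up).
sampling_frame = [f"intern-{i:03d}" for i in range(1, 501)]

# Assumption for illustration only: the 200 attendees are a random draw.
rng = random.Random(7)
sample = rng.sample(sampling_frame, 200)

# Interns who were in the frame but did not end up in the sample.
not_sampled = set(sampling_frame) - set(sample)

print(len(sampling_frame), len(sample), len(not_sampled))  # prints: 500 200 300
```

The identifiers are what make it possible to say exactly which elements of the frame ended up in the sample.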

Methods and Types of sampling:

  1. Simple Random Sampling
  2. Systematic Sampling
  3. Stratified Sampling

These are the most widely used sampling methods, each with its own advantages and disadvantages.

Let us look at each of these sampling methods in detail:

  1. Simple Random Sampling: Simple Random Sampling (SRS) is the most elementary form of sampling. In this method, elements are drawn from the population purely at random, so no identifier or property of an element influences whether it is picked. Thus every element has an equal probability of being selected. For a sample of size n drawn without replacement from a population of size N:
    P(of getting selected) = \frac{n}{N}

    The basic steps for employing SRS are:

    • Choose the Population Set
    • Identify the basis of sampling (the sampling frame)
    • Use a random number generator to pick elements from the frame.

    Simple Random Sampling

    Pros:



    • Less demanding in terms of time, as it is the most elementary form of sampling
    • Very useful for population sets with a small number of elements
    • SRS can be employed anywhere, anytime, even without special random generators

    Cons:

    • Not efficient for large population sets
    • Tends to produce the largest sampling error of the three methods mentioned here
    • There is a chance that the drawn sample is unrepresentative, in which case SRS won't be able to provide a correct result
    • Does not provide a specific identifier to separate statistically similar samples
  2. Systematic Sampling: Systematic Sampling is a type of probability sampling. It is often more accurate than SRS, and its standard error is typically low, but it is not error-free. In this method, the population elements are first arranged in a specific order or scheme, i.e. sorted. The order can be anything and depends entirely on the person performing the statistics: ascending, descending, lexicographic or any other ordering deemed fit by the tester. The starting point, however, needs to be random every time. Once the elements are arranged, the sample elements are picked at a pre-defined interval or by a pre-defined function.
    Example: Take a random set of numbers with elements ranging from 1 to 100. The elements are first sorted in either ascending or descending order. Then, say, every 4th element is picked to be a part of the sample. This kind of sampling is known as Systematic Sampling.

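    P(of getting selected) = \frac{1}{k} for a random start and a fixed interval k (k = 4 in the example above); which elements end up together in the sample depends on the ordered population after it has been sorted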

    The basic steps for employing Systematic Sampling are:

    • Choosing the Population Set wisely
    • Checking whether Systematic Sampling will be an efficient method or not
    • If yes, applying a sorting method to get an ordered list of population elements
    • Choosing a periodicity (interval) with which to pick out elements

    Systematic Way of Sampling

    Pros:

    • Accuracy is higher than SRS.
    • The standard error is lower.
    • Bias is unlikely to creep in while the sample is being created.

    Cons:

    • Not very efficient in terms of time
    • Periodicity in the population elements can lead to absurd results.
    • Systematic sampling can provide either a very accurate result or a badly misleading one.
  3. Stratified Sampling: Stratified Sampling is the most complex of the three sampling methods mentioned here. It is a hybrid method that builds on both simple random sampling and systematic sampling. It is one of the most advanced sampling methods available and can give the tester a close-to-accurate result. In this method, the population is divided into sub-segments known as strata (singular: stratum). Each stratum can have its own unique property. After the population has been divided into strata, SRS or Systematic Sampling can be used within each stratum to pick out samples for performing statistics.
    The elementary steps for Stratified Sampling are:

    • Choosing the population wisely.
    • Checking for periodicity or any other features, so that the elements can be divided into different strata
    • Dividing the population into sub-sets and sub-groups on the basis of a selected property.
    • Using SRS or Systematic Sampling on each individual stratum to form the sample.
    • We can even apply different sampling methods to different sub-sets.

    Visual Representation of Stratified Sampling.

    Pros:

    • Provides results with high accuracy.
    • Different results can be obtained just by changing the sampling method.
    • This method also allows different strata to be compared when samples are being drawn.

    Cons:

    • Inefficient and expensive in terms of both resources and money.
    • This method fails only in rare cases where the elements are homogeneous.

These three are the most widely used methods of sampling today. Each of them has its own advantages as well as disadvantages. So the sampling method must be chosen wisely, because a wrong choice can lead to erroneous answers.
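
As a rough illustration, the three methods described above could be sketched in Python like this. The function names and the "low"/"high" strata are hypothetical choices made for the example, and the stratified version assumes proportional allocation with SRS inside each stratum, which is only one of several possible designs.

```python
import random
from collections import defaultdict

def simple_random_sample(population, n, rng):
    """Draw n elements at random, without replacement."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Sort, pick a random start in [0, k), then take every k-th element."""
    ordered = sorted(population)
    start = rng.randrange(k)
    return ordered[start::k]

def stratified_sample(population, stratum_of, fraction, rng):
    """Group elements into strata, then draw a simple random sample from each."""
    strata = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)
    sample = []
    for members in strata.values():
        # Proportional allocation (an assumption): the same fraction per stratum.
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

rng = random.Random(3)
population = list(range(1, 101))  # the numbers 1 to 100

srs = simple_random_sample(population, 25, rng)
systematic = systematic_sample(population, 4, rng)
# Hypothetical strata: "low" for 1..50 and "high" for 51..100.
stratified = stratified_sample(
    population, lambda x: "low" if x <= 50 else "high", 0.2, rng
)

print(len(srs), len(systematic), len(stratified))  # prints: 25 25 20
```

Note that the systematic sample always contains elements exactly 4 apart, which is why hidden periodicity in the ordered population can distort its results, while the stratified sample is guaranteed to contain elements from both strata.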


