Exploring Categorical Data

Categorical Variable/Data (or Nominal variable):

Such variables take on a fixed and limited number of possible values, for example grades, gender, or blood group type. A purely categorical variable is one that lets you assign each observation to a category but gives you no clear way to order the categories. For example, gender is a categorical variable with the categories male and female, and there is no intrinsic ordering between them. Also note that the lexical (alphabetical) order of category labels is not the same as their logical order, e.g. “one”, “two”, “three”: sorted as plain text these come out as “one”, “three”, “two”. Sorting follows the logical order only when that order has been explicitly defined on the categories.

Terms related to Variability Metrics :



  • Mode : The most frequently occurring value in the given data.
    Example –

    Data = ["Car", "Bat", "Bat", "Car", "Bat", "Bat", "Bat", "Bike"]
    Mode = "Bat"
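
The mode of the list above can be checked with Python's standard library. This is a minimal sketch: it uses `collections.Counter` to count the categories and `statistics.multimode` (Python 3.8+) so that ties, if any, are all reported.

```python
from collections import Counter
from statistics import multimode

data = ["Car", "Bat", "Bat", "Car", "Bat", "Bat", "Bat", "Bike"]

# Count how often each category occurs.
counts = Counter(data)

# multimode returns every value tied for the highest count,
# so it also handles data with more than one mode.
modes = multimode(data)

print(counts.most_common())  # [('Bat', 5), ('Car', 2), ('Bike', 1)]
print(modes)                 # ['Bat']
```
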
  • Expected Value : When working in machine learning, categories often have to be associated with a numeric value so that the model can work with them. The expected value is the average of those values, weighted by each category’s probability of occurrence.
    It is calculated by –

    -> Multiply each outcome by its probability of occurring.
    -> Sum these values.

    So, it is the sum of the values times their probabilities of occurrence, and it is often used to summarize the levels of a factor variable.
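
The two steps above can be sketched in a few lines of Python. The categories, the numeric value attached to each one (number of wheels), and the probabilities are all made up for illustration; `expected_value` is a hypothetical helper, not a library function.

```python
import math


def expected_value(values, probabilities):
    """Return the sum of value * probability over all outcomes."""
    if len(values) != len(probabilities):
        raise ValueError("values and probabilities must have the same length")
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Step 1: multiply each outcome by its probability.
    # Step 2: sum the products.
    return sum(v * p for v, p in zip(values, probabilities))


# Assumed numeric value per category (wheels) and assumed probabilities.
wheels = {"Car": 4, "Bike": 2, "Truck": 6}
probability = {"Car": 0.5, "Bike": 0.3, "Truck": 0.2}

categories = list(wheels)
ev = expected_value(
    [wheels[c] for c in categories],
    [probability[c] for c in categories],
)
print(round(ev, 2))  # 3.8
```

Here 4 × 0.5 + 2 × 0.3 + 6 × 0.2 = 3.8, so a randomly chosen vehicle has 3.8 wheels on average under these assumed probabilities.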

  • Bar Charts : Frequency of each category plotted as bars.

    Loading Libraries –

    import matplotlib.pyplot as plt
    import numpy as np

    Data –

    label = ['Car', 'Bike', 'Truck', 'Cycle', 'Jeeps', 'Ambulance']
    no_vehicle = [941, 854, 4595, 2125, 942, 509]

    Indexing Data –

    index = np.arange(len(label))
      
    print ("Total Labels : ", len(label))
    print ("Indexing : ", index)

    Output:

    Total Labels :  6
    Indexing :  [0 1 2 3 4 5]

    Bar Graph –

    plt.bar(index, no_vehicle)
    plt.xlabel('Type', fontsize = 15)
    plt.ylabel('No of Vehicles', fontsize = 15)
    plt.xticks(index, label, fontsize = 10, rotation = 30)
    plt.title('Number of Vehicles by Type')
      
    plt.show()

    Output: a bar chart with one bar per vehicle type, whose height is the number of vehicles.
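
The bar chart example starts from counts that are already tallied. When the data arrives as raw observations instead, the frequencies have to be computed first. The sketch below does that with `collections.Counter`; the `observations` list is invented for illustration, and the resulting `labels` and `frequencies` lists could be passed to `plt.xticks` and `plt.bar` in the same way as `label` and `no_vehicle`.

```python
from collections import Counter

# Hypothetical raw data: one entry per vehicle observed.
observations = ["Car", "Truck", "Bike", "Truck", "Car", "Truck", "Cycle"]

# Tally the number of occurrences of each category.
counts = Counter(observations)

# most_common() orders categories from most to least frequent;
# ties keep the order in which the categories first appeared.
ordered = counts.most_common()
labels = [category for category, _ in ordered]
frequencies = [count for _, count in ordered]

print(labels)       # ['Truck', 'Car', 'Bike', 'Cycle']
print(frequencies)  # [3, 2, 1, 1]
```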

  • Pie Charts : Frequency of each category plotted as a wedge of a circle. It is a circular graph in which the arc length of each slice is proportional to the quantity it represents.

    plt.figure(figsize =(8, 8))
    plt.pie(no_vehicle, labels = label, 
            startangle = 90, autopct ='%.1f %%')
    plt.show()

    Output: a pie chart with one wedge per vehicle type, each labelled with its percentage share.
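
The percentages that `autopct = '%.1f %%'` prints on each wedge are simply each category's share of the total. As a rough check, they can be computed by hand from the same counts; this sketch uses only plain Python and repeats the data so that it runs on its own.

```python
label = ['Car', 'Bike', 'Truck', 'Cycle', 'Jeeps', 'Ambulance']
no_vehicle = [941, 854, 4595, 2125, 942, 509]

# Each wedge's share is its count divided by the total, as a percentage.
total = sum(no_vehicle)
shares = {name: 100 * count / total for name, count in zip(label, no_vehicle)}

for name, share in shares.items():
    print(f"{name}: {share:.1f} %")
```

With these counts the total is 9966 vehicles, and trucks account for roughly 46.1 % of them, which is why that wedge covers almost half the circle.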


