With this article, get ready to get your hands dirty with ML algorithms, concepts, maths and coding.
Libraries play a very important role when working with ML code in Python. We will study them in detail later, but here is a brief description of the most important ones:
- NumPy (Numerical Python) : One of the most widely used scientific and mathematical computing libraries for Python. Platforms like Keras and TensorFlow have embedded NumPy operations on tensors. The feature we are concerned with here is how powerful and easy it makes handling and operating on arrays.
- Pandas : This package is very useful for handling data. It makes it much easier to manipulate, aggregate and visualize data.
- Matplotlib : This library makes it simple to create powerful visualizations.
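As a rough illustration of the array handling and aggregation mentioned above, here is a minimal sketch. The variable names and the four sample rows are chosen only for this example.

```python
import numpy as np
import pandas as pd

# NumPy: operations apply to the whole array at once, no explicit loop needed.
ages = np.array([44, 27, 30, 38])
ages_next_year = ages + 1
mean_age = ages.mean()

# Pandas: labeled columns make grouping and aggregating straightforward.
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain"],
    "Age": ages,
})
mean_by_country = df.groupby("Country")["Age"].mean()

print(ages_next_year)
print(mean_age)
print(mean_by_country)
```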
There are many more libraries, but we do not need them right now. So, let’s begin.
Download the dataset :
Go to the link and download Data_for_Missing_Values.csv.
I would suggest installing Anaconda on your system and then launching Spyder or Jupyter. The reason is that Anaconda comes with all the basic Python libraries pre-installed.
Below is the Python code :
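What follows is a minimal sketch of the steps this article walks through, written under a few assumptions rather than taken from an original listing. The CSV content is inlined from the rows printed in the output so that the snippet runs without the file, and `sklearn.impute.SimpleImputer` stands in for the `Imputer` class described later, which recent scikit-learn releases no longer ship.

```python
import io

import pandas as pd
from sklearn.impute import SimpleImputer

# Assumption: inline copy of the ten rows shown in this article's output.
# With the real file, pass the path of Data_for_Missing_Values.csv instead.
CSV_TEXT = """Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
France,48,79000,Yes
Germany,50,83000,No
France,37,67000,Yes
"""

# Empty fields are read as NaN.
data_sets = pd.read_csv(io.StringIO(CSV_TEXT))

print("Data Head :\n", data_sets.head())
print("\nData Describe :\n", data_sets.describe())

# Features: every column except the last. Target: the "Purchased" column.
X = data_sets.iloc[:, :-1].values
Y = data_sets.iloc[:, 3].values
print("\nInput :\n", X)
print("\nOutput :\n", Y)

# Replace NaN in the numeric columns (Age, Salary) with each column's mean.
imputer = SimpleImputer(strategy="mean")
X[:, 1:3] = imputer.fit_transform(X[:, 1:3].astype(float))
print("\nNew Input with Mean Value for NaN :\n", X)
```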
Data Head :
   Country   Age   Salary Purchased
0   France  44.0  72000.0        No
1    Spain  27.0  48000.0       Yes
2  Germany  30.0  54000.0        No
3    Spain  38.0  61000.0        No
4  Germany  40.0      NaN       Yes

Data Describe :
             Age        Salary
count   9.000000      9.000000
mean   38.777778  63777.777778
std     7.693793  12265.579662
min    27.000000  48000.000000
25%    35.000000  54000.000000
50%    38.000000  61000.000000
75%    44.000000  72000.000000
max    50.000000  83000.000000

Input :
[['France' 44.0 72000.0]
 ['Spain' 27.0 48000.0]
 ['Germany' 30.0 54000.0]
 ['Spain' 38.0 61000.0]
 ['Germany' 40.0 nan]
 ['France' 35.0 58000.0]
 ['Spain' nan 52000.0]
 ['France' 48.0 79000.0]
 ['Germany' 50.0 83000.0]
 ['France' 37.0 67000.0]]

Output:
['No' 'Yes' 'No' 'No' 'Yes' 'Yes' 'No' 'Yes' 'No' 'Yes']

New Input with Mean Value for NaN :
[['France' 44.0 72000.0]
 ['Spain' 27.0 48000.0]
 ['Germany' 30.0 54000.0]
 ['Spain' 38.0 61000.0]
 ['Germany' 40.0 63777.77777777778]
 ['France' 35.0 58000.0]
 ['Spain' 38.77777777777778 52000.0]
 ['France' 48.0 79000.0]
 ['Germany' 50.0 83000.0]
 ['France' 37.0 67000.0]]
CODE EXPLANATION :
- Part 1 – Importing Libraries : The above code imports numpy, pandas and matplotlib, but only pandas is used.
- Part 2 – Importing Data :
Import Data_for_Missing_Values.csv by giving its path to the pandas read_csv function. Now “data_sets” is a DataFrame (a two-dimensional tabular data structure with labeled rows and columns).
- Then print the first 5 entries of the DataFrame using the head() function. The number of entries can be changed; for example, dataframe.head(3) returns the first 3 rows. Similarly, the last rows can be retrieved using the tail() function.
- Then use the describe() function. It gives a statistical summary of the data, which includes the count, min, max, percentiles (.25, .5, .75), mean and standard deviation of each numeric column.
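The calls described above can be tried on a small frame. This is only a sketch: the frame is built by hand from the first five rows of the output (without the Purchased column), and the variable names are illustrative.

```python
import pandas as pd

# First five rows from this article's output; NaN marks the missing salary.
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany"],
    "Age": [44.0, 27.0, 30.0, 38.0, 40.0],
    "Salary": [72000.0, 48000.0, 54000.0, 61000.0, float("nan")],
})

first_three = df.head(3)   # rows 0, 1, 2
last_two = df.tail(2)      # rows 3, 4
summary = df.describe()    # numeric columns only; NaN entries are skipped

print(first_three)
print(last_two)
print(summary.loc[["count", "mean"]])
```

Note that describe() leaves out the non-numeric Country column by default, and the count for Salary is lower than for Age because the missing value is not counted.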
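Imputer(missing_values=’NaN’, strategy=’mean’, axis=0, verbose=0, copy=True) is the constructor of the Imputer class from the sklearn.preprocessing package. Its role is to replace missing values (NaN) with a value computed according to the chosen strategy. Note that Imputer was deprecated in scikit-learn 0.20 and removed in 0.22; newer versions provide sklearn.impute.SimpleImputer instead.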
Syntax : sklearn.preprocessing.Imputer()
Parameters :
-> missing_values : integer or “NaN”
-> strategy : what to impute with - mean, median or most_frequent along the axis
-> axis (default=0) : 0 means along columns and 1 means along rows
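To see how the strategy parameter changes the filled-in value, here is a small sketch using `sklearn.impute.SimpleImputer`, the class that current scikit-learn versions offer in place of `Imputer`. It has no `axis` parameter and always imputes column by column, which corresponds to `axis=0` above. The array values are made up for the illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data: one missing value in each column.
X = np.array([[1.0, 10.0],
              [2.0, np.nan],
              [2.0, 30.0],
              [np.nan, 30.0]])

filled = {}
for strategy in ("mean", "median", "most_frequent"):
    imputer = SimpleImputer(missing_values=np.nan, strategy=strategy)
    filled[strategy] = imputer.fit_transform(X)
    # Show the two values that replaced the NaN entries.
    print(strategy, "->", filled[strategy][3, 0], filled[strategy][1, 1])
```

Each strategy is computed per column from the non-missing entries only, so the first column is filled from 1, 2, 2 and the second from 10, 30, 30.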