Python – Random Sample Training and Test Data from dictionary
Last Updated : 25 Apr, 2023
Sometimes, while working with machine learning algorithms, we need to split a dataset randomly into training and testing data. This is a very common problem, and a solution to it is useful across machine learning domains. This article discusses an approach that solves it without external libraries.
Method : Using keys() + the random module + computations. This problem can be solved with a combination of these. The keys are extracted with keys(), a random subset of them is chosen with a function from the random module to form the test set, and the remaining keys form the training set. The test keys must be distinct (drawn without replacement); otherwise the test set can end up smaller than the requested percentage.
Python3
import random

# initializing dictionary
test_dict = {'gfg': 4, 'is': 12, 'best': 6, 'for': 7, 'geeks': 10}

# printing original dictionary
print("The original dictionary is : " + str(test_dict))

# percentage of keys that go to the test set; the rest are training data
test = 40
training = 100 - test  # 60, implied by the test percentage

# number of keys to place in the test set (integer arithmetic)
key_list = list(test_dict.keys())
test_key_count = len(key_list) * test // 100

# draw distinct test keys; random.sample() picks without replacement
test_keys = random.sample(key_list, test_key_count)
train_keys = [ele for ele in key_list if ele not in test_keys]

# build the two dictionaries from the selected keys
testing_dict = {key: test_dict[key] for key in test_keys}
training_dict = {key: test_dict[key] for key in train_keys}

print("The testing dictionary is : " + str(testing_dict))
print("The training dictionary is : " + str(training_dict))
Output : (one possible run, since the split is random)
The original dictionary is : {'gfg': 4, 'is': 12, 'best': 6, 'for': 7, 'geeks': 10}
The testing dictionary is : {'is': 12, 'for': 7}
The training dictionary is : {'gfg': 4, 'best': 6, 'geeks': 10}
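Because the keys are chosen at random, the output differs from run to run. When a repeatable split is needed, one option is to draw from a seeded generator. The sketch below wraps the same steps in a function. The name `split_dict` and its `seed` parameter are hypothetical and not part of the article's code, and it assumes the test percentage is a number between 0 and 100.

```python
import random


def split_dict(data, test_percent, seed=None):
    """Return (training_dict, testing_dict) for a random split of `data`.

    `split_dict` and `seed` are illustrative names, not from the article.
    """
    # A private generator leaves the global random state untouched;
    # the same seed always yields the same split.
    rng = random.Random(seed)

    key_list = list(data.keys())
    test_key_count = int(len(key_list) * test_percent / 100)

    # sample() draws distinct keys, so the test set has exactly
    # test_key_count entries.
    test_keys = rng.sample(key_list, test_key_count)
    train_keys = [key for key in key_list if key not in test_keys]

    testing_dict = {key: data[key] for key in test_keys}
    training_dict = {key: data[key] for key in train_keys}
    return training_dict, testing_dict


if __name__ == "__main__":
    sample_dict = {'gfg': 4, 'is': 12, 'best': 6, 'for': 7, 'geeks': 10}
    training_dict, testing_dict = split_dict(sample_dict, 40, seed=42)
    print("The testing dictionary is : " + str(testing_dict))
    print("The training dictionary is : " + str(training_dict))
```

Calling the function twice with the same seed returns the same pair of dictionaries. Leaving `seed` as None gives a fresh split each time.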
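Time Complexity: O(n*n) in the worst case, where n is the number of keys in test_dict, because every key is checked against the test_keys list when the training keys are built
Auxiliary Space: O(n), for the key lists and the two new dictionaries, where n is the number of keys in test_dict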
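The quadratic term comes from testing each key for membership in a list. A possible alternative, sketched below under the same assumptions, is to shuffle the keys once and cut the shuffled list at the test count, which takes O(n) time. The function name `shuffle_split` is hypothetical, and this is a variation on the article's method rather than the method itself.

```python
import random


def shuffle_split(data, test_percent):
    """Return (training_dict, testing_dict) using shuffle-and-slice.

    `shuffle_split` is an illustrative name, not from the article.
    """
    key_list = list(data.keys())

    # shuffle() reorders the list in place in O(n) time.
    random.shuffle(key_list)

    # The first `cut` shuffled keys become the test set and the rest
    # the training set, so no membership test is needed.
    cut = len(key_list) * test_percent // 100
    testing_dict = {key: data[key] for key in key_list[:cut]}
    training_dict = {key: data[key] for key in key_list[cut:]}
    return training_dict, testing_dict


if __name__ == "__main__":
    sample_dict = {'gfg': 4, 'is': 12, 'best': 6, 'for': 7, 'geeks': 10}
    training_dict, testing_dict = shuffle_split(sample_dict, 40)
    print("The testing dictionary is : " + str(testing_dict))
    print("The training dictionary is : " + str(training_dict))
```

Both dictionaries come out in shuffled key order rather than the original insertion order, which usually does not matter for a train/test split.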