
PyBrain – Dataset Types

Last Updated : 21 Feb, 2022

Datasets give convenient access to training, test, and validation data. Instead of having to juggle raw arrays, PyBrain gives you a more sophisticated data structure that makes your data easier to work with.

DataSets In PyBrain

The most commonly used datasets that PyBrain supports are SupervisedDataSet and ClassificationDataSet.

SupervisedDataSet: It is the simplest form of a dataset and, as the name says, is meant to be used with supervised learning tasks. It consists of the fields ‘input’ and ‘target’, the pattern size of which must be set upon creation:

Python3




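from pybrain.datasets import SupervisedDataSet
  
# Each sample has 3 input values and 2 target values
DS = SupervisedDataSet(3, 2)

# Adding one linked (input, target) sample
DS.appendLinked([1, 2, 3], [4, 5])

# Number of samples and contents of the 'input' field
print(len(DS))
print(DS['input'])


Output:

1
[[1. 2. 3.]]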
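To make the idea of linked fields with a fixed pattern size more concrete, here is a minimal pure-Python sketch. The class LinkedDataSet and its method append_linked are hypothetical names invented for this illustration; they only mimic the behaviour described above and are not PyBrain's implementation.

```python
class LinkedDataSet:
    """Hypothetical stand-in for a dataset with linked 'input' and 'target' fields."""

    def __init__(self, input_size, target_size):
        # Pattern sizes are fixed when the dataset is created
        self.input_size = input_size
        self.target_size = target_size
        self.fields = {"input": [], "target": []}

    def append_linked(self, inp, target):
        # Reject samples whose lengths do not match the declared sizes
        if len(inp) != self.input_size or len(target) != self.target_size:
            raise ValueError("sample does not match the declared pattern sizes")
        # Both fields grow together, so row i of 'input' belongs to row i of 'target'
        self.fields["input"].append([float(v) for v in inp])
        self.fields["target"].append([float(v) for v in target])

    def __len__(self):
        return len(self.fields["input"])

    def __getitem__(self, name):
        return self.fields[name]


ds = LinkedDataSet(3, 2)
ds.append_linked([1, 2, 3], [4, 5])
print(len(ds))
print(ds["input"])
```

Because both fields are appended in a single call, the inputs and targets cannot drift out of step, which is the main convenience over managing two separate arrays by hand.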

ClassificationDataSet: It is mainly used for classification problems. Besides the input and target fields, it has an extra field called “class”, which is an automated backup of the targets given. The targets are class labels: for example, the output is either 1 or 0, or, more generally, each input falls into exactly one of a fixed set of classes.
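The relationship between integer class labels and a one-of-many (one-hot) target can be sketched in plain Python. The function to_one_of_many and the variables class_field and target_field are hypothetical names chosen for this illustration, and the labels are made-up sample values; this is a sketch of the idea, not PyBrain's own code.

```python
def to_one_of_many(labels, nb_classes):
    """Return one row per label with a 1 in the label's column and 0 elsewhere."""
    encoded = []
    for label in labels:
        row = [0] * nb_classes
        row[label] = 1
        encoded.append(row)
    return encoded


# Original integer labels, kept unchanged (the role played by the "class" field)
class_field = [0, 2, 1, 2]

# One column per class, suitable for a network with one output unit per class
target_field = to_one_of_many(class_field, nb_classes=3)

for label, row in zip(class_field, target_field):
    print(label, row)
```

Keeping the original labels next to the encoded targets makes it possible to compare predictions against plain class indices later on.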

Python3




# Importing all the necessary libraries
from sklearn import datasets
import matplotlib.pyplot as plt
from pybrain.datasets import ClassificationDataSet
from pybrain.utilities import percentError
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.structure.modules import SoftmaxLayer
from numpy import ravel
  
# Loading iris dataset from sklearn datasets
iris = datasets.load_iris()
  
# Defining feature variables and target variable
X_data = iris.data
y_data = iris.target
  
# Defining classification dataset model
classification_dataset = ClassificationDataSet(4, 1, nb_classes=3)
  
# Adding sample into classification dataset
for i in range(len(X_data)):
    classification_dataset.addSample(ravel(X_data[i]), y_data[i])
  
# Splitting the data: 30% for testing
# and 70% for training
testing_data, training_data = classification_dataset.splitWithProportion(0.3)
  
# Classification dataset for test data
test_data = ClassificationDataSet(4, 1, nb_classes=3)
  
# Adding sample into testing classification dataset
for n in range(0, testing_data.getLength()):
    test_data.addSample(testing_data.getSample(
        n)[0], testing_data.getSample(n)[1])
  
# Classification dataset for train data
train_data = ClassificationDataSet(4, 1, nb_classes=3)
  
# Adding sample into training classification dataset
for n in range(0, training_data.getLength()):
    train_data.addSample(training_data.getSample(
        n)[0], training_data.getSample(n)[1])
  
# Converting targets to one-of-many encoding;
# the original labels remain in the 'class' field
test_data._convertToOneOfMany()
train_data._convertToOneOfMany()
  
# Building network with outclass as SoftmaxLayer
# on training data
build_network = buildNetwork(
    train_data.indim, 4, train_data.outdim, outclass=SoftmaxLayer)
  
# Building a backproptrainer on training data
trainer = BackpropTrainer(
    build_network, dataset=train_data, learningrate=0.01, verbose=True)
  
# Training for 20 epochs on the training data
trainer.trainEpochs(20)
  
# Testing data
print('Error percentage on testing data=>', percentError(
    trainer.testOnClassData(dataset=test_data), test_data['class']))


Output:

Total error:  0.0892390931641
Total error:  0.0821479733597
Total error:  0.0759327938967
Total error:  0.0722385583142
Total error:  0.0690818068826
Total error:  0.0667645311923
Total error:  0.0647079622731
Total error:  0.0630345245312
Total error:  0.0608030839912
Total error:  0.0595356750412
Total error:  0.0586635639408
Total error:  0.0573043661487
Total error:  0.0559188704413
Total error:  0.0548155819544
Total error:  0.0535537679931
Total error:  0.0527051106108
Total error:  0.0515783629912
Total error:  0.0501025301423
Total error:  0.0499123823243
Total error:  0.0482250742606
Error percentage on testing data=> 20.0
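The final figure is the share of test samples whose predicted class differs from the true class. The sketch below shows that calculation in plain Python. The function percent_error is a hypothetical helper written for this illustration rather than PyBrain's percentError, and the label lists are made-up values, assuming a test set of 45 samples (30% of the 150 iris samples).

```python
def percent_error(predicted, actual):
    """Percentage of positions where the predicted label differs from the actual one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return 100.0 * wrong / len(actual)


# Made-up labels for 45 test samples across 3 classes
actual = [0] * 15 + [1] * 15 + [2] * 15

# Start from perfect predictions, then misclassify 9 samples
predicted = list(actual)
for i in range(9):
    predicted[i] = 1

print(percent_error(predicted, actual))
```

Under that assumption, an error of 20.0 corresponds to 9 misclassified samples out of 45. The exact value varies between runs, since the split and the initial network weights are random.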

