Working with XML Files in R Programming
  • Last Updated : 01 Jun, 2020

XML (Extensible Markup Language) is a text format built from markup tags, where each tag describes the data it encloses. In R, XML files can be read and manipulated with the XML package. The package is not part of the base installation and has to be installed explicitly using the following command:

install.packages("XML")

Creating XML file

An XML file can be created in any text editor by wrapping each piece of data in tags that describe it and saving the file with the ‘.xml’ extension.
We will use the following XML file, ‘sample.xml’, to demonstrate the operations that can be performed on such a file:




<RECORDS>
  <STUDENT>
      <ID>1</ID>
      <NAME>Alia</NAME>
      <MARKS>620</MARKS>
      <BRANCH>IT</BRANCH>
  </STUDENT>
  <STUDENT>
      <ID>2</ID>
      <NAME>Brijesh</NAME>
      <MARKS>440</MARKS>
      <BRANCH>Commerce</BRANCH>
   </STUDENT>
  <STUDENT>
      <ID>3</ID>
      <NAME>Yash</NAME>
      <MARKS>600</MARKS>
      <BRANCH>Humanities</BRANCH>
   </STUDENT>
  <STUDENT>
      <ID>4</ID>
      <NAME>Mallika</NAME>
      <MARKS>660</MARKS>
      <BRANCH>IT</BRANCH>
   </STUDENT>
  <STUDENT>
      <ID>5</ID>
      <NAME>Zayn</NAME>
      <MARKS>560</MARKS>
      <BRANCH>IT</BRANCH>
   </STUDENT>
</RECORDS>
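The same file can also be generated from R instead of being typed by hand. The following is a minimal sketch, assuming the records are first held in a data frame (the `students` data frame and the loop variable names are illustrative and not part of the original article); it uses newXMLNode() and saveXML() from the XML package.

```r
library(XML)

# Illustrative data frame holding the same records as sample.xml
students <- data.frame(
  ID = 1:5,
  NAME = c("Alia", "Brijesh", "Yash", "Mallika", "Zayn"),
  MARKS = c(620, 440, 600, 660, 560),
  BRANCH = c("IT", "Commerce", "Humanities", "IT", "IT"),
  stringsAsFactors = FALSE
)

# Build the tree: one <STUDENT> node per row, one child node per column
root <- newXMLNode("RECORDS")
for (i in seq_len(nrow(students))) {
  student <- newXMLNode("STUDENT", parent = root)
  for (field in names(students)) {
    newXMLNode(field, as.character(students[i, field]), parent = student)
  }
}

# Write the tree to sample.xml in the current working directory
saveXML(root, file = "sample.xml")
```

Building the tree node by node keeps every opening tag paired with its closing tag, which is easy to get wrong when the file is written manually.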

Reading XML File

Once the package is installed, an XML file can be read by parsing it with the xmlParse() function, which takes the file name as input and returns the parsed document; printing that document displays the content of the file. The file should be located in the current working directory, or its full path should be given. The ‘methods’ package is loaded as well; it ships with R and does not need to be installed separately. The following code reads the contents of the file “sample.xml”.




# loading the library and other important packages
library("XML")
library("methods")
  
# the contents of sample.xml are parsed
data <- xmlParse(file = "sample.xml")
  
print(data)

Output:

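<?xml version="1.0"?>
<RECORDS>
  <STUDENT>
    <ID>1</ID>
    <NAME>Alia</NAME>
    <MARKS>620</MARKS>
    <BRANCH>IT</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>2</ID>
    <NAME>Brijesh</NAME>
    <MARKS>440</MARKS>
    <BRANCH>Commerce</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>3</ID>
    <NAME>Yash</NAME>
    <MARKS>600</MARKS>
    <BRANCH>Humanities</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>4</ID>
    <NAME>Mallika</NAME>
    <MARKS>660</MARKS>
    <BRANCH>IT</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>5</ID>
    <NAME>Zayn</NAME>
    <MARKS>560</MARKS>
    <BRANCH>IT</BRANCH>
  </STUDENT>
</RECORDS>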
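xmlParse() can also parse XML held in a character string, which is convenient for quick experiments when no file is at hand. The sketch below assumes a shortened, one-record version of the sample data (the `xml_text` variable is illustrative) and passes asText = TRUE so that the input is treated as XML content rather than as a file name.

```r
library(XML)

# A shortened, illustrative version of the sample data held in a string
xml_text <- "<RECORDS><STUDENT><ID>1</ID><NAME>Alia</NAME><MARKS>620</MARKS><BRANCH>IT</BRANCH></STUDENT></RECORDS>"

# asText = TRUE tells xmlParse() that the argument is XML content, not a path
doc <- xmlParse(xml_text, asText = TRUE)

# Inspect the parsed document through its root node
root <- xmlRoot(doc)
print(xmlName(root))   # name of the root element
print(xmlSize(root))   # number of child nodes under the root
```

When reading from a file instead, getwd() shows the directory in which R will look for a relative file name such as "sample.xml".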

Extracting information about the XML file

Once an XML file has been parsed, operations can be performed on its individual components. The XML package provides functions to extract the root node, to count the number of nodes under it, and to access a particular record or a specific field of a record.






# Load the packages required to read XML files.
library("XML")
library("methods")
  
# Give the input file name to the function.
res <- xmlParse(file = "sample.xml")
  
# Extract the root node.
rootnode <- xmlRoot(res)
  
# Number of nodes (records) under the root.
nodes <- xmlSize(rootnode)
  
# Get the entire contents of the second record.
second_node <- rootnode[2]
  
# Get the 3rd field (MARKS) of the 4th record as text.
attri <- xmlValue(rootnode[[4]][[3]])
  
cat('number of nodes:', nodes, '\n')
cat('details of 2 record:\n')
print(second_node)
  
# Print the marks of the fourth record.
cat('3rd attribute of 4th record:', attri, '\n')

Output:

number of nodes: 5
details of 2 record:
$STUDENT
<STUDENT>
  <ID>2</ID>
  <NAME>Brijesh</NAME>
  <MARKS>440</MARKS>
  <BRANCH>Commerce</BRANCH>
</STUDENT>

attr(,"class")
[1] "XMLInternalNodeList" "XMLNodeList"
3rd attribute of 4th record: 660
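Positional indexing works well when the position of a record is known in advance. When records have to be selected by their content, an XPath query is often more convenient. The following is a minimal sketch using xpathSApply() from the XML package; it parses an inline copy of the sample data (the `xml_text` variable is illustrative) so that it runs on its own.

```r
library(XML)

# Inline copy of the sample records so the example is self-contained
xml_text <- "<RECORDS>
  <STUDENT><ID>1</ID><NAME>Alia</NAME><MARKS>620</MARKS><BRANCH>IT</BRANCH></STUDENT>
  <STUDENT><ID>2</ID><NAME>Brijesh</NAME><MARKS>440</MARKS><BRANCH>Commerce</BRANCH></STUDENT>
  <STUDENT><ID>3</ID><NAME>Yash</NAME><MARKS>600</MARKS><BRANCH>Humanities</BRANCH></STUDENT>
  <STUDENT><ID>4</ID><NAME>Mallika</NAME><MARKS>660</MARKS><BRANCH>IT</BRANCH></STUDENT>
  <STUDENT><ID>5</ID><NAME>Zayn</NAME><MARKS>560</MARKS><BRANCH>IT</BRANCH></STUDENT>
</RECORDS>"
doc <- xmlParse(xml_text, asText = TRUE)

# Names of all students whose BRANCH is "IT"
it_names <- xpathSApply(doc, "//STUDENT[BRANCH='IT']/NAME", xmlValue)
print(it_names)

# Marks of every student, converted from text to numbers
all_marks <- as.numeric(xpathSApply(doc, "//STUDENT/MARKS", xmlValue))
print(max(all_marks))
```

xmlValue is applied to each matching node, so the results come back as plain character vectors rather than as XML nodes.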

Conversion of XML to dataframe

To make the data easier to read and process, the XML data can be converted into a data frame, that is, a table of rows and columns. The XML package provides the function xmlToDataFrame(), which takes the XML file as input and returns the corresponding data as a data frame. This makes it easier to handle and process large amounts of data.




# Load the required packages.
library("XML")
library("methods")
  
# Convert the input xml file to a data frame.
dataframe <- xmlToDataFrame("sample.xml")
print(dataframe)

Output:

  ID    NAME MARKS     BRANCH
1  1    Alia   620         IT
2  2 Brijesh   440   Commerce
3  3    Yash   600 Humanities
4  4 Mallika   660         IT
5  5    Zayn   560         IT
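xmlToDataFrame() reads every field as text, so numeric columns usually need to be converted before any arithmetic is done on them. The sketch below assumes an inline copy of the sample data (the `xml_text`, `students`, and `it_students` names are illustrative) and shows the conversion followed by a simple filter.

```r
library(XML)

# Inline copy of the sample records so the example is self-contained
xml_text <- "<RECORDS>
  <STUDENT><ID>1</ID><NAME>Alia</NAME><MARKS>620</MARKS><BRANCH>IT</BRANCH></STUDENT>
  <STUDENT><ID>2</ID><NAME>Brijesh</NAME><MARKS>440</MARKS><BRANCH>Commerce</BRANCH></STUDENT>
  <STUDENT><ID>3</ID><NAME>Yash</NAME><MARKS>600</MARKS><BRANCH>Humanities</BRANCH></STUDENT>
  <STUDENT><ID>4</ID><NAME>Mallika</NAME><MARKS>660</MARKS><BRANCH>IT</BRANCH></STUDENT>
  <STUDENT><ID>5</ID><NAME>Zayn</NAME><MARKS>560</MARKS><BRANCH>IT</BRANCH></STUDENT>
</RECORDS>"
doc <- xmlParse(xml_text, asText = TRUE)

# Convert the parsed document to a data frame; all columns arrive as text
students <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Convert the numeric columns explicitly
students$ID <- as.integer(students$ID)
students$MARKS <- as.numeric(students$MARKS)
str(students)

# Ordinary data frame operations now apply, e.g. filtering by branch
it_students <- students[students$BRANCH == "IT", ]
print(it_students)
print(mean(students$MARKS))
```

Converting the columns once, right after the import, avoids surprises later, such as marks being sorted alphabetically instead of numerically.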
