Working with XML Files in R Programming

XML, which stands for Extensible Markup Language, is made up of markup tags, where each tag describes the information carried by the corresponding element of the XML file. In R, we can work with XML files using the XML package, which has to be installed explicitly with the following command:

install.packages("XML")

Creating an XML file

An XML file can be created by writing the data inside tags that describe its content and saving the result with the ‘.xml’ extension.
We will use the following XML file ‘sample.xml’ to demonstrate the operations that can be performed on an XML file:

<RECORDS>
  <STUDENT>
    <ID>1</ID>
    <NAME>Alia</NAME>
    <MARKS>620</MARKS>
    <BRANCH>IT</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>2</ID>
    <NAME>Brijesh</NAME>
    <MARKS>440</MARKS>
    <BRANCH>Commerce</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>3</ID>
    <NAME>Yash</NAME>
    <MARKS>600</MARKS>
    <BRANCH>Humanities</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>4</ID>
    <NAME>Mallika</NAME>
    <MARKS>660</MARKS>
    <BRANCH>IT</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>5</ID>
    <NAME>Zayn</NAME>
    <MARKS>560</MARKS>
    <BRANCH>IT</BRANCH>
  </STUDENT>
</RECORDS>
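
If you prefer to generate ‘sample.xml’ from R instead of typing it by hand, base R's sprintf() and writeLines() are enough. The sketch below assumes the records start out in a data frame (the name `students` is hypothetical) and that no value contains characters such as `&` or `<`, which would need escaping in XML.

```r
# Hypothetical input: the same five records shown in sample.xml
students <- data.frame(
  ID     = 1:5,
  NAME   = c("Alia", "Brijesh", "Yash", "Mallika", "Zayn"),
  MARKS  = c(620L, 440L, 600L, 660L, 560L),
  BRANCH = c("IT", "Commerce", "Humanities", "IT", "IT"),
  stringsAsFactors = FALSE
)

# Build one <STUDENT> block per row; sprintf() is vectorised over the columns.
# Assumption: the values need no XML escaping (no &, <, > characters).
records <- sprintf(
  "  <STUDENT>\n    <ID>%d</ID>\n    <NAME>%s</NAME>\n    <MARKS>%d</MARKS>\n    <BRANCH>%s</BRANCH>\n  </STUDENT>",
  students$ID, students$NAME, students$MARKS, students$BRANCH
)

# Wrap the records in the root element and write the file
# into the current working directory.
writeLines(c("<RECORDS>", records, "</RECORDS>"), "sample.xml")
```

For real data with arbitrary text, building the tree with the XML package's own node functions, which handle escaping, would be the safer choice.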


Reading XML File

After installing the package, an XML file can be read by parsing it with the xmlParse() function, which takes the XML file name as input and returns a parsed document object; printing that object displays the content of the file. A relative file name such as “sample.xml” is looked up in the current working directory, so the file should be located there (or a full path should be given). The code also loads the ‘methods’ package, which ships with R and does not need to be installed separately. The following code reads the contents of “sample.xml”:

# loading the library and other important packages
library("XML")
library("methods")
  
# the contents of sample.xml are parsed
data <- xmlParse(file = "sample.xml")
  
print(data)


Output:

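<?xml version="1.0"?>
<RECORDS>
  <STUDENT>
    <ID>1</ID>
    <NAME>Alia</NAME>
    <MARKS>620</MARKS>
    <BRANCH>IT</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>2</ID>
    <NAME>Brijesh</NAME>
    <MARKS>440</MARKS>
    <BRANCH>Commerce</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>3</ID>
    <NAME>Yash</NAME>
    <MARKS>600</MARKS>
    <BRANCH>Humanities</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>4</ID>
    <NAME>Mallika</NAME>
    <MARKS>660</MARKS>
    <BRANCH>IT</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>5</ID>
    <NAME>Zayn</NAME>
    <MARKS>560</MARKS>
    <BRANCH>IT</BRANCH>
  </STUDENT>
</RECORDS>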
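
To see what xmlParse() actually returns, the sketch below parses a small in-memory string instead of a file, using the function's asText argument, so it runs even without ‘sample.xml’. The two-record string is made up for illustration and follows the same layout as the sample file.

```r
library("XML")

# A small, made-up document in the same layout as sample.xml
xml_text <- "<RECORDS>
  <STUDENT><ID>1</ID><NAME>Alia</NAME><MARKS>620</MARKS><BRANCH>IT</BRANCH></STUDENT>
  <STUDENT><ID>2</ID><NAME>Brijesh</NAME><MARKS>440</MARKS><BRANCH>Commerce</BRANCH></STUDENT>
</RECORDS>"

# asText = TRUE tells xmlParse() to treat the string as XML content, not a file name
doc <- xmlParse(xml_text, asText = TRUE)

# The result is a parsed document object rather than an ordinary R list
print(class(doc))

# The tree can be queried further, e.g. the tag name of the root element
print(xmlName(xmlRoot(doc)))
```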

Extracting information about the XML file

Once an XML file has been parsed, operations can be performed on its individual components. The XML package provides functions to extract the root node of the document, count the number of child nodes under it, and retrieve a specific child element of a particular node.



# Load the packages required to read XML files.
library("XML")
library("methods")

# Parse the input file.
res <- xmlParse(file = "sample.xml")

# Extract the root node (<RECORDS>).
rootnode <- xmlRoot(res)

# Number of child nodes (records) under the root.
nodes <- xmlSize(rootnode)

# Entire contents of the second record.
second_node <- rootnode[[2]]

# Third child element (<MARKS>) of the fourth record.
marks_node <- rootnode[[4]][[3]]

cat("number of nodes:", nodes, "\n")
cat("details of 2nd record:\n")
print(second_node)

# Print the marks of the fourth record.
cat("3rd element of 4th record:", xmlValue(marks_node), "\n")


Output:

number of nodes: 5
details of 2nd record:
<STUDENT>
  <ID>2</ID>
  <NAME>Brijesh</NAME>
  <MARKS>440</MARKS>
  <BRANCH>Commerce</BRANCH>
</STUDENT>
3rd element of 4th record: 660
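
Positional indexing such as rootnode[[4]][[3]] silently returns the wrong element if the order of the children ever changes. A sketch of a more robust alternative, assuming the same RECORDS/STUDENT layout: look a child up by its tag name with [["NAME"]], and apply a function to every record with the XML package's xmlSApply(). The inline document is a made-up subset of ‘sample.xml’ so the example runs on its own.

```r
library("XML")

# Made-up subset of sample.xml
xml_text <- "<RECORDS>
  <STUDENT><ID>1</ID><NAME>Alia</NAME><MARKS>620</MARKS><BRANCH>IT</BRANCH></STUDENT>
  <STUDENT><ID>2</ID><NAME>Brijesh</NAME><MARKS>440</MARKS><BRANCH>Commerce</BRANCH></STUDENT>
  <STUDENT><ID>3</ID><NAME>Yash</NAME><MARKS>600</MARKS><BRANCH>Humanities</BRANCH></STUDENT>
</RECORDS>"

rootnode <- xmlRoot(xmlParse(xml_text, asText = TRUE))

# Look a child up by tag name rather than by position
second_marks <- xmlValue(rootnode[[2]][["MARKS"]])

# Apply a function to every <STUDENT> child of the root;
# unname() drops the tag-name labels on the result
student_names <- unname(xmlSApply(rootnode, function(s) xmlValue(s[["NAME"]])))

cat("marks of 2nd record:", second_marks, "\n")
print(student_names)
```

Note that xmlValue() returns text, so values such as marks still need as.numeric() before any arithmetic.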

Conversion of XML to a data frame

To make the data easier to read, XML data can be converted into a data frame made up of rows and columns. The XML package provides the function xmlToDataFrame(), which takes an XML file as input and returns the corresponding data as a data frame. This makes it easier to handle and process large amounts of data.

# Load the required packages.
library("XML")
library("methods")
  
# Convert the input xml file to a data frame.
dataframe <- xmlToDataFrame("sample.xml")
print(dataframe)


Output:

  ID    NAME MARKS     BRANCH
1  1    Alia   620         IT
2  2 Brijesh   440   Commerce
3  3    Yash   600 Humanities
4  4 Mallika   660         IT
5  5    Zayn   560         IT
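
Because xmlToDataFrame() builds each column from the text content of the elements, numeric fields such as MARKS may come back as text (or as factors, depending on the package version and settings). The sketch below, which uses a made-up inline subset of the data in the same record layout, converts MARKS to numbers and then summarises the frame with base R's aggregate().

```r
library("XML")

# Made-up subset of sample.xml
xml_text <- "<RECORDS>
  <STUDENT><ID>1</ID><NAME>Alia</NAME><MARKS>620</MARKS><BRANCH>IT</BRANCH></STUDENT>
  <STUDENT><ID>2</ID><NAME>Brijesh</NAME><MARKS>440</MARKS><BRANCH>Commerce</BRANCH></STUDENT>
  <STUDENT><ID>4</ID><NAME>Mallika</NAME><MARKS>660</MARKS><BRANCH>IT</BRANCH></STUDENT>
</RECORDS>"

# xmlToDataFrame() also accepts an already-parsed document
df <- xmlToDataFrame(xmlParse(xml_text, asText = TRUE))

# Convert MARKS to numbers; as.character() first guards against factor columns
df$MARKS <- as.numeric(as.character(df$MARKS))

# Average marks per branch
avg_by_branch <- aggregate(MARKS ~ BRANCH, data = df, FUN = mean)
print(avg_by_branch)
```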


