Working with XML Files in R Programming
XML (Extensible Markup Language) is a text format built from markup tags, where each tag describes the piece of data it encloses. In R, XML files can be read and processed with the XML package, which is not part of base R and has to be installed explicitly with the following command:
install.packages("XML")
Creating XML file
An XML file is created by wrapping each piece of data in tags that describe its content and saving the result with the ‘.xml’ extension.
We will use the following XML file ‘sample.xml’ to see the various operations that can be performed on the file:
<RECORDS>
  <STUDENT>
    <ID>1</ID>
    <NAME>Alia</NAME>
    <MARKS>620</MARKS>
    <BRANCH>IT</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>2</ID>
    <NAME>Brijesh</NAME>
    <MARKS>440</MARKS>
    <BRANCH>Commerce</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>3</ID>
    <NAME>Yash</NAME>
    <MARKS>600</MARKS>
    <BRANCH>Humanities</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>4</ID>
    <NAME>Mallika</NAME>
    <MARKS>660</MARKS>
    <BRANCH>IT</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>5</ID>
    <NAME>Zayn</NAME>
    <MARKS>560</MARKS>
    <BRANCH>IT</BRANCH>
  </STUDENT>
</RECORDS>
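A file with this layout can also be generated from R instead of being typed by hand. The following is a minimal sketch, assuming the XML package's newXMLNode() and saveXML() functions; the two-row data frame, the variable names, and the output file name students_demo.xml are illustrative and not part of the original example.

```r
library("XML")

# Hypothetical two-record subset of the sample data
students <- data.frame(
  ID = c(1, 2),
  NAME = c("Alia", "Brijesh"),
  MARKS = c(620, 440),
  BRANCH = c("IT", "Commerce"),
  stringsAsFactors = FALSE
)

# Build <RECORDS>, then one <STUDENT> per row with one child element per column
root <- newXMLNode("RECORDS")
for (i in seq_len(nrow(students))) {
  student <- newXMLNode("STUDENT", parent = root)
  for (field in names(students)) {
    newXMLNode(field, as.character(students[i, field]), parent = student)
  }
}

# Write the tree to a file in the session's temporary directory (assumed location)
out_file <- file.path(tempdir(), "students_demo.xml")
saveXML(root, file = out_file)
```

Writing to tempdir() keeps the sketch from overwriting an existing ‘sample.xml’; change the path to save the file elsewhere.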
Reading XML File
Once the package is installed, the file can be read by parsing it with the xmlParse() function, which takes the XML file name as input and returns the parsed document as an R object; printing that object displays the contents of the file. The file should be located in the current working directory (or be referenced by its full path). The ‘methods’ package is also loaded here; it ships with R, so it does not need to be installed separately. The following code reads the contents of the file “sample.xml”.
# Load the XML package (and methods, which ships with R)
library("XML")
library("methods")

# Parse the file into an XML document object and print it
data <- xmlParse(file = "sample.xml")
print(data)
Output:
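<?xml version="1.0"?>
<RECORDS>
  <STUDENT>
    <ID>1</ID>
    <NAME>Alia</NAME>
    <MARKS>620</MARKS>
    <BRANCH>IT</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>2</ID>
    <NAME>Brijesh</NAME>
    <MARKS>440</MARKS>
    <BRANCH>Commerce</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>3</ID>
    <NAME>Yash</NAME>
    <MARKS>600</MARKS>
    <BRANCH>Humanities</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>4</ID>
    <NAME>Mallika</NAME>
    <MARKS>660</MARKS>
    <BRANCH>IT</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>5</ID>
    <NAME>Zayn</NAME>
    <MARKS>560</MARKS>
    <BRANCH>IT</BRANCH>
  </STUDENT>
</RECORDS>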
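xmlParse() is not limited to files on disk. As a small sketch of what the function returns, the example below parses XML held in a character string by passing asText = TRUE and then inspects the resulting object. The inline two-record document and the variable names are assumptions made for illustration.

```r
library("XML")

# Hypothetical inline document with the same layout as sample.xml
xml_text <- paste0(
  "<RECORDS>",
  "<STUDENT><ID>1</ID><NAME>Alia</NAME><MARKS>620</MARKS><BRANCH>IT</BRANCH></STUDENT>",
  "<STUDENT><ID>2</ID><NAME>Brijesh</NAME><MARKS>440</MARKS><BRANCH>Commerce</BRANCH></STUDENT>",
  "</RECORDS>"
)

# asText = TRUE tells xmlParse() the argument is XML content, not a file name
doc <- xmlParse(xml_text, asText = TRUE)

doc_class <- class(doc)             # class vector of the parsed document object
root_name <- xmlName(xmlRoot(doc))  # name of the top-level element
print(doc_class)
print(root_name)
```

Parsing from a string is convenient for quick experiments, since it needs no file in the working directory.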
Extracting information about the XML file
Once parsed, the document can be navigated and its components examined. The XML package provides functions to get the root node of the document (xmlRoot()), count the child nodes under a node (xmlSize()), and pick out an individual record or a particular element inside a record by position.
library("XML")
library("methods")

# Parse the file and get the root node (<RECORDS>)
res <- xmlParse(file = "sample.xml")
rootnode <- xmlRoot(res)

# Number of child nodes (<STUDENT> records) under the root
nodes <- xmlSize(rootnode)

# Second record, and the third child (<MARKS>) of the fourth record
second_node <- rootnode[2]
attri <- rootnode[[4]][[3]]

cat("number of nodes:", nodes, "\n")
print("details of 2nd record:")
print(second_node)
cat("3rd element of 4th record:", xmlValue(attri), "\n")
Output:
number of nodes: 5
[1] "details of 2nd record:"
$STUDENT
<STUDENT>
  <ID>2</ID>
  <NAME>Brijesh</NAME>
  <MARKS>440</MARKS>
  <BRANCH>Commerce</BRANCH>
</STUDENT>

attr(,"class")
[1] "XMLInternalNodeList" "XMLNodeList"
3rd element of 4th record: 660
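Positional indexing works well when the layout of the file is known in advance. As a hedged sketch of two other ways to pull values out of the tree, the example below uses xmlSApply() to visit every record and xpathSApply() to select nodes with an XPath expression. XPath is not covered in the walkthrough above, and the inline three-record document and variable names are assumptions made so the example is self-contained.

```r
library("XML")

# Hypothetical inline document with the same layout as sample.xml
xml_text <- paste0(
  "<RECORDS>",
  "<STUDENT><ID>1</ID><NAME>Alia</NAME><MARKS>620</MARKS><BRANCH>IT</BRANCH></STUDENT>",
  "<STUDENT><ID>2</ID><NAME>Brijesh</NAME><MARKS>440</MARKS><BRANCH>Commerce</BRANCH></STUDENT>",
  "<STUDENT><ID>4</ID><NAME>Mallika</NAME><MARKS>660</MARKS><BRANCH>IT</BRANCH></STUDENT>",
  "</RECORDS>"
)
doc <- xmlParse(xml_text, asText = TRUE)
root <- xmlRoot(doc)

# Apply a function to every <STUDENT> child and collect the text of its <NAME>
student_names <- unname(
  xmlSApply(root, function(node) xmlValue(node[["NAME"]]))
)

# XPath alternative: <NAME> of every <STUDENT> whose <BRANCH> is "IT"
it_names <- xpathSApply(doc, "//STUDENT[BRANCH='IT']/NAME", xmlValue)

print(student_names)
print(it_names)
```

Selecting children by name, as in node[["NAME"]], keeps working if the order of elements inside a record changes, which positional access such as rootnode[[4]][[3]] does not.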
Conversion of XML to dataframe
To make the data easier to read and process, the XML data can be converted into a data frame of rows and columns. The XML package provides the function xmlToDataFrame(), which takes the XML file as input and returns the corresponding data as a data frame, with one row per record and one column per child element. This makes large amounts of data easier to handle and process.
library("XML")
library("methods")

# Convert the records in the XML file into a data frame and print it
dataframe <- xmlToDataFrame("sample.xml")
print(dataframe)
Output:
  ID    NAME MARKS     BRANCH
1  1    Alia   620         IT
2  2 Brijesh   440   Commerce
3  3    Yash   600 Humanities
4  4 Mallika   660         IT
5  5    Zayn   560         IT
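Values read from XML are text, so numeric columns usually need to be converted before any calculation. The sketch below assumes that xmlToDataFrame() also accepts an already parsed document and a stringsAsFactors argument; the inline three-record document and the variable names are illustrative only.

```r
library("XML")

# Hypothetical inline document with the same layout as sample.xml
xml_text <- paste0(
  "<RECORDS>",
  "<STUDENT><ID>1</ID><NAME>Alia</NAME><MARKS>620</MARKS><BRANCH>IT</BRANCH></STUDENT>",
  "<STUDENT><ID>2</ID><NAME>Brijesh</NAME><MARKS>440</MARKS><BRANCH>Commerce</BRANCH></STUDENT>",
  "<STUDENT><ID>4</ID><NAME>Mallika</NAME><MARKS>660</MARKS><BRANCH>IT</BRANCH></STUDENT>",
  "</RECORDS>"
)
doc <- xmlParse(xml_text, asText = TRUE)

# One row per <STUDENT>; keep text columns as character rather than factor
students <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Convert the text columns that hold numbers
students$ID <- as.integer(students$ID)
students$MARKS <- as.numeric(students$MARKS)

# Ordinary data frame operations now apply: filter by branch, then average
it_students <- students[students$BRANCH == "IT", ]
mean_it_marks <- mean(it_students$MARKS)
print(mean_it_marks)
```

Without the as.numeric() step, mean() would be applied to character data and return NA with a warning.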
Last Updated: 01 Mar, 2023