Material Analysis using Python
Combining basic principles from more than one field can solve complicated problems that are very difficult to solve using a single area of knowledge. This kind of approach also lets one redefine problems outside their usual boundaries and, through a new understanding, reach solutions to situations that were earlier considered impenetrable.
In layman’s terms, material analysis is the field of study that examines materials and their properties.
Scientifically, it is the study of a material’s fundamental properties to determine whether the material is suitable for its intended use case or needs doping (or some other treatment) to qualify it for the purpose.
Use case: Material analysis is increasingly being integrated with computer science to extract better and more precise insights from data without requiring much experimental work.
Example: Suppose one has a database of Mn (manganese) compounds and their magnetic behavior. A machine learning model trained on that data can predict the magnetic properties of magnetic compounds whose properties have not yet been characterized.
In the dictionary, a descriptor is defined as a word or expression used to describe or identify something.
In material analysis, a descriptor describes a compound to a computing algorithm. Many representations of an element's properties can be converted into a mathematical format of vectors and matrices (for example, a one-hot vector encoding of an element's electronic configuration) and then passed as input to a machine learning algorithm.
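As a minimal sketch of the idea, the snippet below one-hot encodes the periodic-table block (s, p, d, or f) that holds an element's valence electrons, then combines the vectors into a compound descriptor weighted by atomic fraction. The lookup table `VALENCE_BLOCK`, the helper names, and the weighting scheme are illustrative assumptions and do not come from any library.

```python
# Illustrative assumption: a tiny hand-written lookup of the block that holds
# each element's valence electrons. A real project would use a full table.
VALENCE_BLOCK = {"Na": "s", "O": "p", "Mn": "d", "Ce": "f"}
BLOCKS = ["s", "p", "d", "f"]


def one_hot_block(symbol):
    """Return a one-hot vector marking the block of the given element."""
    block = VALENCE_BLOCK[symbol]
    return [1 if b == block else 0 for b in BLOCKS]


def compound_descriptor(counts):
    """Average the one-hot vectors of a compound, weighted by atomic fraction.

    `counts` maps element symbols to the number of atoms in the formula.
    """
    total = sum(counts.values())
    vector = [0.0] * len(BLOCKS)
    for symbol, n in counts.items():
        for i, bit in enumerate(one_hot_block(symbol)):
            vector[i] += bit * n / total
    return vector


mn_vector = one_hot_block("Mn")                       # Mn is a d-block element
mno2_vector = compound_descriptor({"Mn": 1, "O": 2})  # descriptor for MnO2
print(mn_vector)
print(mno2_vector)
```

A fixed-length vector like this can be fed to most machine learning algorithms regardless of how many elements the compound contains.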
Pymatgen is a short form for Python Materials Genomics. It is a robust, open-source, and widely used Python library for material analysis.
Note: Merely retrieving the electronic configuration, atomic number, or other very basic material properties does not amount to material analysis.
Pymatgen is widely preferred because it offers:
- Highly flexible classes for representation of Element, Site, Molecule, Structure objects, Nearest Neighbors.
- Variety of input/output formats like CIF, Gaussian, XYZ, VASP.
- Electronic structure analyses, such as the density of states and band structure.
- Powerful analysis tools, such as the generation of phase diagrams and Pourbaix diagrams.
- Integration with Materials Project REST API, Crystallography Open Database, and other external data sources.
- Free to use, well documented, open, and fast.
Pymatgen is not part of the Python standard library, so it must be installed separately.
The most straightforward way to install it is with conda. After installing conda, run:
conda install --channel conda-forge pymatgen
Pymatgen uses gcc for compilation, so an up-to-date version of gcc is required to compile pymatgen.
conda install gcc
Pymatgen is open source, and new features are added regularly. To upgrade pymatgen to the latest version:
conda upgrade pymatgen
Alternatively, install pymatgen using pip:
pip install pymatgen
and to upgrade pymatgen:
pip install --upgrade pymatgen
To install pymatgen on Google Colab:
!pip install pymatgen
Details of an element and a compound
The Element class of the pymatgen library fetches the details of an element, such as its atomic mass and melting point. Pass the element symbol as a parameter to the Element class.
The details of a compound can be retrieved in a similar way.
Structure & file formats
Pymatgen has many modules, grouped according to the properties they represent. Here, a diagonal lattice matrix is created first, and a structure is then built on it. When the structure is exported, a string is returned if no filename is given; otherwise, the output is written to the file. If only the filename is provided, the format is inferred from the file extension.
Pymatgen also allows the user to read a structure from an external source. This can be done in two ways: from a string or from a file. The file read here is the computed version of MnO2.cif.
Pymatgen can also work as a file converter, since it can read a molecule from a file in one format and write the same molecule to a file in another format.
External Data Sources
As mentioned above, pymatgen can be linked to different external data sources. The Materials Project's data can be accessed in pymatgen through the MPRester interface.
The Materials Project is one such external database. It makes its data and scientific analyses available through the open Materials Application Programming Interface (API), which is based on REpresentational State Transfer (REST) principles. Although this API can conceivably be used with any programming language that supports basic HTTP requests, a wrapper for it, MPRester, has already been implemented in the pymatgen library for researchers who want to use the data.
To generate an API key, refer to this website -> https://materialsproject.org/open
First, an MPRester object is created with the API key, and then the property data of a particular task id is queried (a task id can be thought of as the unique identifier of each entry in the Materials Project database).
Note: The property names are listed under properties. If no data exists for a particular property, a null object is returned for that property.
The output is a dictionary, which gives easy and understandable access to the required property.
Second, all the defined property data for iron (Fe), covering both the element and its compounds, is fetched.
The output of querying all the data for Fe is a nested dictionary that is too large to show in the console, so it is first converted to a pandas DataFrame and then saved as a .csv file.
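The conversion step can be sketched without calling the API. The records below are made-up placeholders with the kind of nested shape described above, not real Materials Project output, and `pd.json_normalize` is one possible way to flatten them; the original article may have built the DataFrame differently.

```python
import os
import tempfile

import pandas as pd

# Made-up placeholder records with the nested shape an API response might
# have; the identifiers and values are NOT real Materials Project data.
records = [
    {"material_id": "example-1", "formula": "Fe",
     "spacegroup": {"symbol": "Im-3m", "number": 229}},
    {"material_id": "example-2", "formula": "Fe2O3",
     "spacegroup": {"symbol": "R-3c", "number": 167}},
]

# Flatten nested dictionaries into dotted column names, one row per record.
frame = pd.json_normalize(records)

# Save the table as a .csv file for inspection outside the console.
csv_path = os.path.join(tempfile.mkdtemp(), "fe_data.csv")
frame.to_csv(csv_path, index=False)
print(frame.columns.tolist())
```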
Real-Life Use case
Here, we count the number of atoms in a compound. This can easily be done by fetching the structural details of the compound in CIF format, since the file contains the coordinates of every atom of the compound.
First, remove all the unnecessary text from the file and then count the number of remaining lines.
The CoNi3 compound is associated with the id mp-1183751.
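The counting idea can be sketched in plain Python as follows. Since the CIF for mp-1183751 is not reproduced here, the example uses a small hand-written CsCl CIF as a stand-in, and the helper `count_atom_sites` is a hypothetical name. The sketch assumes a CIF that lists every atom of the cell explicitly (space group P 1); for a CIF that relies on symmetry operations, the number of listed sites would be smaller than the number of atoms.

```python
# Hand-written stand-in CIF (illustrative values), listing every atom explicitly.
SAMPLE_CIF = """\
data_CsCl
_symmetry_space_group_name_H-M   'P 1'
_cell_length_a   4.2
_cell_length_b   4.2
_cell_length_c   4.2
_cell_angle_alpha   90.0
_cell_angle_beta   90.0
_cell_angle_gamma   90.0
loop_
 _symmetry_equiv_pos_site_id
 _symmetry_equiv_pos_as_xyz
  1  'x, y, z'
loop_
 _atom_site_type_symbol
 _atom_site_label
 _atom_site_fract_x
 _atom_site_fract_y
 _atom_site_fract_z
 _atom_site_occupancy
  Cs  Cs0  0.0  0.0  0.0  1
  Cl  Cl1  0.5  0.5  0.5  1
"""


def count_atom_sites(cif_text):
    """Count the data rows of the _atom_site_ loop, skipping all other text."""
    count = 0
    headers = []
    reading_headers = False
    in_atom_loop = False
    for raw in cif_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no atoms
        if line == "loop_":
            headers, reading_headers, in_atom_loop = [], True, False
            continue
        if line.startswith("data_"):
            reading_headers, in_atom_loop = False, False
            continue
        if line.startswith("_"):
            if reading_headers:
                headers.append(line.split()[0])  # column name of the loop
            else:
                in_atom_loop = False  # a plain key-value tag ends any loop
            continue
        if reading_headers:
            # First data row of a loop: decide whether it is the atom loop.
            reading_headers = False
            in_atom_loop = bool(headers) and all(
                h.startswith("_atom_site_") and not h.startswith("_atom_site_aniso")
                for h in headers
            )
        if in_atom_loop:
            count += 1
    return count


n_atoms = count_atom_sites(SAMPLE_CIF)
print(n_atoms)
```

With pymatgen installed, parsing the CIF into a Structure and taking its length is an alternative that also handles symmetry-reduced files.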