What is Data Curation?
The process of managing data throughout its life cycle, collecting it from various sources, integrating it into various repositories, and making sure that it is easily available and retrievable for future use is called data curation.
Data curation is mainly concerned with maintaining and managing metadata rather than the database itself. It includes various processes and activities related to organizing and integrating data collected from various sources. It is an iterative, active, and ongoing process that can add value to data.
Stages of Data Curation’s iterative process:
The iterative process of data curation has three stages:
- Preserving –
Collecting data from various sources and then storing and managing it is called preserving.
- Sharing –
Making sure that data is available and retrievable for the future needs of authenticated users.
- Discovering –
Reusing data in different combinations to generate new data comes under discovering.
Data Curation Life Cycle:
The data curation life cycle represents all the stages of data throughout its life, from its creation for a study to its distribution and reuse. The data curation life-cycle model is made up of several components.
These components are as follows:
- Data or Databases or Digital Objects –
This is the first layer of the data curation life-cycle model. Data, databases, or digital objects are the key, or core, components of the model. A digital object can be any type of file or a complex digital object.
- Description and Representation of Information –
This is the second layer of the data curation life-cycle model. Use appropriate standards to describe metadata so that it can be controlled over the long term. In this phase, administrative, structural, descriptive, technical, and preservation metadata are assigned. Make sure that the metadata is represented and understood in a proper format.
- Preservation and Planning –
This is the third layer of the data curation life-cycle model. In this phase, the preservation of data throughout its life cycle is planned. Planning for preservation requires managing and administering data throughout its life cycle, starting from its creation.
- Community watch and Participation –
This is the fourth layer of the data curation life-cycle model. In this phase, relevant community activities are tracked using various standards and tools. Monitor data creation activities and help develop standards. This role suits librarians and archivists well, as they can help with the management duties of making sure that data is created appropriately and preserved.
- Curate and Preserve –
This is the fifth layer of the data curation life-cycle model. In this phase, planned actions that promote curation and preservation are carried out throughout the life cycle. This includes monitoring and managing the administrative actions that support curation and preservation. Paying close attention to how data is created, and encouraging best practices through policies and standards, improves the organization of data throughout its life cycle.
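To make the five kinds of metadata named in the second layer (description and representation of information) more concrete, here is a minimal Python sketch of a metadata record. The class name, the grouping into five dictionaries, and the example values are hypothetical and chosen only for illustration; they do not follow a published standard such as Dublin Core or PREMIS, which a real repository would map these fields onto.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class MetadataRecord:
    """Hypothetical record grouping the five metadata types of the second layer."""
    descriptive: dict = field(default_factory=dict)     # what the data is: title, creator, subject
    administrative: dict = field(default_factory=dict)  # who manages it: rights, access, owner
    structural: dict = field(default_factory=dict)      # how parts relate: files, ordering
    technical: dict = field(default_factory=dict)       # format, size, software needed to read it
    preservation: dict = field(default_factory=dict)    # checksums, provenance, actions taken

    def missing_sections(self):
        """Return the names of metadata sections that are still empty."""
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self):
        """Serialize to JSON so the record can be stored alongside the data."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Hypothetical example values, for illustration only.
record = MetadataRecord(
    descriptive={"title": "Survey responses 2021", "creator": "Research team A"},
    technical={"format": "text/csv", "size_bytes": 20480},
)
print(record.missing_sections())  # ['administrative', 'structural', 'preservation']
```

Listing the empty sections is one simple way to check that every metadata type has been assigned before the data moves further along the life cycle.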
The sixth layer of the data curation life-cycle model includes the following phases:
- Create and Receive –
Various types of data are created in this phase, and data is also received in various formats. Create data with descriptive and technical metadata, and include preservation metadata. Build a collection policy so that you are prepared to receive data from data creators, other archives, and data centers.
- Appraise and Select –
In this phase, data is evaluated and selected for long-term preservation. Data creators and data curators must create an appraisal and selection policy together. This policy is then established, and data is evaluated against it and selected for long-term preservation and curation.
- Ingest –
Data is transferred to an archive, repository, or data center. Ensure that this transfer is completed properly.
- Preservation Action –
Processes such as data cleaning, validation, assigning representation information, and ensuring acceptable data structures or file formats are carried out in this phase. These steps help ensure the long-term preservation of the data.
- Store –
Store the data securely.
- Access, use and reuse –
Making sure data is accessible to authenticated users and can be retrieved easily whenever required.
- Transform –
The original data is converted here into a different format required by the user.
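The ingest, preservation action, and store phases above can be sketched as one small Python function. This is a minimal sketch under several assumptions that do not come from the life-cycle model itself: the incoming data is a CSV file, "validation" means the file has a hypothetical fixed set of columns (`id`, `value`) and no empty values, "cleaning" means trimming whitespace, an in-memory dictionary stands in for the archive, repository, or data center, and a SHA-256 checksum is recorded so that later corruption of the stored copy can be detected.

```python
import csv
import hashlib
import io

REQUIRED_COLUMNS = ["id", "value"]  # hypothetical schema used for validation


def sha256_of(data: bytes) -> str:
    """Fixity value: any later change to the bytes changes this digest."""
    return hashlib.sha256(data).hexdigest()


def ingest(raw: bytes, archive: dict, key: str) -> list:
    """Validate, clean, and store one CSV file; return a list of problems found."""
    reader = csv.DictReader(io.StringIO(raw.decode("utf-8")))
    if reader.fieldnames != REQUIRED_COLUMNS:
        return [f"unexpected columns: {reader.fieldnames}"]

    problems, clean_rows = [], []
    for row_no, row in enumerate(reader, start=1):
        if None in row:  # DictReader puts surplus fields under the key None
            problems.append(f"row {row_no}: too many fields")
            continue
        # Cleaning: strip stray whitespace; short rows give None, treated as empty.
        row = {k: (v or "").strip() for k, v in row.items()}
        if not all(row[c] for c in REQUIRED_COLUMNS):
            problems.append(f"row {row_no}: missing value")
            continue
        clean_rows.append(row)

    if problems:
        return problems  # failed validation: nothing is stored

    # Store: keep the cleaned data together with its checksum.
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=REQUIRED_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(clean_rows)
    cleaned = out.getvalue().encode("utf-8")
    archive[key] = {"data": cleaned, "sha256": sha256_of(cleaned)}
    return []


def verify(archive: dict, key: str) -> bool:
    """Recompute the checksum to confirm the stored copy is unchanged."""
    entry = archive[key]
    return sha256_of(entry["data"]) == entry["sha256"]


archive = {}
print(ingest(b"id,value\n1, 42 \n2,7\n", archive, "survey.csv"))  # []
print(verify(archive, "survey.csv"))                              # True
print(ingest(b"id,value\n3,\n", archive, "bad.csv"))              # ['row 1: missing value']
```

Returning the list of problems instead of raising an exception lets the caller decide what happens to data that fails validation, for example sending it back for reappraisal.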
The other four important phases of the data curation life-cycle model are as follows:
- Conceptualize –
Conceptualizing data means choosing the methods for generating, capturing, and storing data. In this phase, data is essentially collected and created.
- Dispose –
If some data is no longer useful, this unwanted data is discarded and disposed of in this phase to free up space for new data.
- Reappraise –
Data that fails the validation process is returned for further appraisal and selection in this phase.
- Migrate –
Data is migrated to various locations and, depending on the need, converted to suit a new environment.
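One way to see how the phases of the sixth layer and the four other phases fit together is to treat them as a state machine. The Python sketch below is one possible reading, not the model's official definition: the phase names come from the text above, but the exact set of allowed transitions (for example, reappraised data returning to appraisal, migrated data returning to storage, and transformed data re-entering as newly created data) is an assumption made for illustration, and `CuratedItem` is a hypothetical helper.

```python
from enum import Enum, auto


class Phase(Enum):
    CONCEPTUALIZE = auto()
    CREATE_RECEIVE = auto()
    APPRAISE_SELECT = auto()
    INGEST = auto()
    PRESERVATION_ACTION = auto()
    STORE = auto()
    ACCESS_USE_REUSE = auto()
    TRANSFORM = auto()
    REAPPRAISE = auto()
    MIGRATE = auto()
    DISPOSE = auto()


# Assumed transitions: the text names the phases but not every edge between them.
ALLOWED = {
    Phase.CONCEPTUALIZE: {Phase.CREATE_RECEIVE},
    Phase.CREATE_RECEIVE: {Phase.APPRAISE_SELECT},
    Phase.APPRAISE_SELECT: {Phase.INGEST, Phase.DISPOSE},        # not selected -> dispose
    Phase.INGEST: {Phase.PRESERVATION_ACTION},
    Phase.PRESERVATION_ACTION: {Phase.STORE, Phase.REAPPRAISE},  # failed validation -> reappraise
    Phase.REAPPRAISE: {Phase.APPRAISE_SELECT},
    Phase.STORE: {Phase.ACCESS_USE_REUSE, Phase.MIGRATE, Phase.DISPOSE},
    Phase.ACCESS_USE_REUSE: {Phase.TRANSFORM, Phase.STORE},
    Phase.TRANSFORM: {Phase.CREATE_RECEIVE},                     # transformed output is new data
    Phase.MIGRATE: {Phase.STORE},                                # converted copy goes back to storage
    Phase.DISPOSE: set(),                                        # terminal: nothing follows
}


class CuratedItem:
    """Tracks which life-cycle phase one (hypothetical) data item is in."""

    def __init__(self, name):
        self.name = name
        self.phase = Phase.CONCEPTUALIZE
        self.history = [self.phase]

    def move_to(self, new_phase):
        """Move to new_phase, rejecting transitions not listed in ALLOWED."""
        if new_phase not in ALLOWED[self.phase]:
            raise ValueError(f"{self.phase.name} -> {new_phase.name} is not allowed")
        self.phase = new_phase
        self.history.append(new_phase)


# An item that fails validation once, is reappraised, and is then stored.
item = CuratedItem("survey.csv")
for step in [Phase.CREATE_RECEIVE, Phase.APPRAISE_SELECT, Phase.INGEST,
             Phase.PRESERVATION_ACTION, Phase.REAPPRAISE, Phase.APPRAISE_SELECT,
             Phase.INGEST, Phase.PRESERVATION_ACTION, Phase.STORE]:
    item.move_to(step)
print(item.phase.name)  # STORE
```

Keeping the transitions in a single table makes the assumed rules easy to inspect and change if a curation team adopts a different reading of the model.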