Characteristics of Biological Data (Genome Data Management)
Biological data has several characteristics that, taken together, make managing biological information a particularly challenging problem. Here we focus mainly on those characteristics and on the multidisciplinary field that deals with them, bioinformatics, which has grown to the point that several universities now offer graduate degree programs in it.
Characteristics of Biological Information:
- The amount and range of variability in the data is high.
Biological systems must therefore be flexible in the data types and values they accept. With such a wide range of possible values, constraints on data types should be kept to a minimum, because strict constraints can exclude unexpected values (for example, outliers, which are common in biology), and excluding such values causes a loss of information.
- Different biologists are likely to represent the same data differently.
This happens even when they use the same system, because there are multiple ways to model any given entity, and the resulting model often reflects the particular focus of the scientist.
Systems should therefore be able to link data elements across a network of schemas.
- Defining and representing complex queries is extremely important to biologists.
Biological systems must therefore support complex queries. Without knowledge of the data structure, average users cannot construct a complex query across data sets on their own, so to be truly useful, systems must provide tools for building these queries.
- Biological data is highly complex compared with the data of most other domains or applications.
Data definitions must therefore be able to represent complex substructures of data as well as relationships, and must ensure that no information is lost during biological data modelling. The structure of biological data also often provides additional context for interpreting the information.
- Schemas of biological databases change rapidly.
Systems should support schema evolution and data object migration, so that information flows better between generations or releases of a database.
Extending the schema is a frequent occurrence in the biological setting, yet most relational and object database systems offer only limited support for it.
- Most biologists are unlikely to know the internal structure of the database or how its schema was designed.
Database interfaces should display information in a form that is applicable to the problem the user is trying to address, and should reflect the underlying data structure in an easily understandable way. Relational schemas, however, give users no cues about what the schema means, and web interfaces often offer only preset search interfaces, which may limit access to the database.
- Users of biological data generally do not need write access to the database; read access is enough.
Write access is limited to privileged users called curators. Only a small number of users need write access, but users as a whole generate a wide variety of read access patterns against the database.
- Users of biological data often need access to “old” values of the data, most often when verifying previously reported results.
Changes to data values in the database must therefore be supported through a system of archives. In the biological domain, access to both the most recent version of a data value and its previous versions is important.
- The context of the data gives it added meaning for its use in biological applications.
Whenever appropriate, context must be maintained and conveyed to the user. To make a biological data value as interpretable as possible, it should be possible to integrate as many contexts as possible.
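The requirement that systems provide tools for building complex queries (the complex-queries item above) can be illustrated with a small sketch. It assumes a hypothetical SQLite database with `gene` and `expression` tables and a hypothetical `build_query` helper: the user only picks fields, operators, and values, and the helper assembles a parameterized join, so the user never needs to know how the tables are linked. Real bioinformatics query tools are far richer; this only shows the idea.

```python
import sqlite3

# Hypothetical two-table schema, for illustration only.
SCHEMA = """
CREATE TABLE gene (gene_id TEXT PRIMARY KEY, organism TEXT, length_bp INTEGER);
CREATE TABLE expression (gene_id TEXT REFERENCES gene(gene_id), tissue TEXT, level REAL);
"""

# Fields a user may filter on, mapped to qualified columns. Anything outside
# these whitelists is rejected, so user input never becomes raw SQL.
FIELDS = {
    "organism": "g.organism",
    "length_bp": "g.length_bp",
    "tissue": "e.tissue",
    "level": "e.level",
}
OPERATORS = {"=", "<", "<=", ">", ">="}


def build_query(filters):
    """Turn (field, operator, value) triples into a parameterized join query."""
    clauses, params = [], []
    for field, op, value in filters:
        if field not in FIELDS or op not in OPERATORS:
            raise ValueError(f"unsupported filter: {field!r} {op!r}")
        clauses.append(f"{FIELDS[field]} {op} ?")
        params.append(value)
    # The join between the two tables is hidden from the user.
    sql = (
        "SELECT DISTINCT g.gene_id FROM gene AS g "
        "JOIN expression AS e ON e.gene_id = g.gene_id"
    )
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY g.gene_id", params


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO gene VALUES (?, ?, ?)", [
    ("g1", "H. sapiens", 1200), ("g2", "H. sapiens", 5400), ("g3", "M. musculus", 3000),
])
conn.executemany("INSERT INTO expression VALUES (?, ?, ?)", [
    ("g1", "liver", 2.5), ("g2", "liver", 8.1), ("g3", "liver", 9.0), ("g2", "brain", 0.4),
])

# "Human genes strongly expressed in liver", built without writing any SQL.
sql, params = build_query([
    ("organism", "=", "H. sapiens"), ("tissue", "=", "liver"), ("level", ">", 5.0),
])
rows = [r[0] for r in conn.execute(sql, params)]
print(rows)  # ['g2']
```

Whitelisting fields and operators and binding values as `?` parameters keeps the tool safe to expose to users who type free-form values.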
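Schema evolution and data object migration (the item on rapidly changing schemas above) are often handled with versioned migration scripts. The sketch below shows one such approach, not a prescribed one: it uses Python's built-in `sqlite3` module and SQLite's `user_version` pragma to record which release a database file is at, and applies any pending upgrades in order, each inside its own transaction. The `sequence_record` table, the `migrate` helper, and the three releases are hypothetical. Release 2 shows data object migration: a new column is added and then backfilled from the existing rows.

```python
import sqlite3

# Hypothetical migrations: entry i upgrades the database from release i to i + 1.
MIGRATIONS = [
    # Release 1: initial schema.
    ["CREATE TABLE sequence_record (accession TEXT PRIMARY KEY, sequence TEXT NOT NULL)"],
    # Release 2: add a derived attribute and migrate existing rows to fill it.
    [
        "ALTER TABLE sequence_record ADD COLUMN seq_length INTEGER",
        "UPDATE sequence_record SET seq_length = length(sequence)",
    ],
    # Release 3: record provenance; existing rows receive the default value.
    ["ALTER TABLE sequence_record ADD COLUMN source TEXT NOT NULL DEFAULT 'unknown'"],
]


def current_release(conn):
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate(conn, up_to=None):
    """Apply pending migrations in order; each release upgrade is one transaction."""
    target_release = len(MIGRATIONS) if up_to is None else up_to
    for target in range(current_release(conn) + 1, target_release + 1):
        conn.execute("BEGIN")
        try:
            for stmt in MIGRATIONS[target - 1]:
                conn.execute(stmt)
            # PRAGMA does not accept bound parameters, so the integer is formatted in.
            conn.execute(f"PRAGMA user_version = {int(target)}")
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")  # SQLite DDL is transactional, so this undoes the release
            raise
    return current_release(conn)


# isolation_level=None lets the code issue BEGIN/COMMIT itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
migrate(conn, up_to=1)
conn.execute("INSERT INTO sequence_record VALUES ('X1', 'ACGTAC')")  # data from release 1
final = migrate(conn)
row = conn.execute("SELECT accession, seq_length, source FROM sequence_record").fetchone()
print(final, row)  # 3 ('X1', 6, 'unknown')
```

Because the release number is stored in the database file itself, calling `migrate` again on an up-to-date database does nothing.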
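The split between a few curators with write access and many read-only users can be enforced at the connection level. SQLite has no user accounts, so in this minimal sketch the "role" is simply which hypothetical helper (`open_as_curator` or `open_as_reader`) the application uses to open the database: curators get an ordinary connection, while everyone else gets one opened with SQLite's `mode=ro` URI parameter, which makes any write fail. A client/server DBMS would instead grant privileges per user (for example with SQL `GRANT`). The `annotation` table is also hypothetical.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_as_curator(path):
    """Curators get an ordinary read-write connection."""
    return sqlite3.connect(path)


def open_as_reader(path):
    """Everyone else gets a connection that SQLite itself opens read-only."""
    uri = Path(path).as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "annotations.db")

    curator = open_as_curator(db_path)
    with curator:  # commits the curator's changes
        curator.execute("CREATE TABLE annotation (gene_id TEXT, note TEXT)")
        curator.execute("INSERT INTO annotation VALUES ('g1', 'putative kinase')")
    curator.close()

    reader = open_as_reader(db_path)
    notes = reader.execute("SELECT note FROM annotation").fetchall()
    try:
        reader.execute("DELETE FROM annotation")
        write_blocked = False
    except sqlite3.OperationalError:  # "attempt to write a readonly database"
        write_blocked = True
    reader.close()

print(notes, write_blocked)  # [('putative kinase',)] True
```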
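A system of archives for "old" values can be sketched with an archive table and a trigger. This sketch assumes a hypothetical `measurement` table in which each row records the database release (`db_release`) in which its value was set; a SQLite trigger copies the old row into `measurement_archive` before a value is overwritten. The current value and all earlier ones then remain queryable, which is what re-checking a previously reported result requires. The helpers `history` and `value_as_of` are hypothetical, and this is only one common pattern; temporal tables or full release snapshots are alternatives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE measurement (
    sample_id TEXT PRIMARY KEY,
    value REAL NOT NULL,
    db_release INTEGER NOT NULL   -- release in which this value was set
);

-- One row per superseded value.
CREATE TABLE measurement_archive (
    sample_id TEXT NOT NULL,
    value REAL NOT NULL,
    db_release INTEGER NOT NULL
);

-- Before a value is overwritten, copy the old row into the archive.
CREATE TRIGGER archive_measurement
BEFORE UPDATE OF value ON measurement
BEGIN
    INSERT INTO measurement_archive (sample_id, value, db_release)
    VALUES (OLD.sample_id, OLD.value, OLD.db_release);
END;
""")

with conn:
    conn.execute("INSERT INTO measurement VALUES ('S1', 4.2, 1)")
for new_value, release in [(4.7, 2), (4.5, 3)]:  # later releases revise the value
    with conn:
        conn.execute(
            "UPDATE measurement SET value = ?, db_release = ? WHERE sample_id = ?",
            (new_value, release, "S1"),
        )


def history(conn, sample_id):
    """Every known value for a sample as (release, value), oldest first."""
    return conn.execute(
        """
        SELECT db_release, value FROM measurement_archive WHERE sample_id = ?
        UNION ALL
        SELECT db_release, value FROM measurement WHERE sample_id = ?
        ORDER BY db_release
        """,
        (sample_id, sample_id),
    ).fetchall()


def value_as_of(conn, sample_id, release):
    """The value that was current in a given release, or None if it did not exist yet."""
    past = [value for rel, value in history(conn, sample_id) if rel <= release]
    return past[-1] if past else None


print(history(conn, "S1"))         # [(1, 4.2), (2, 4.7), (3, 4.5)]
print(value_as_of(conn, "S1", 2))  # 4.7
```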