Large Objects (LOBs) for Semi-structured and Unstructured Data

Semi-structured data

Semi-structured data is data that does not conform to a fixed data model but still has some structure. It lacks a fixed or rigid schema. Such data does not reside in a relational database, yet it has organisational properties that make it easier to analyse. With some processing, it can be stored in a relational database.

Characteristics of semi-structured Data:

  • Data does not conform to a data model but has some structure
  • Data cannot be stored directly in the form of rows and columns as in databases
  • Semi-structured data contains tags and elements (metadata) which are used to group data and describe how the data is stored
  • Similar entities are grouped together and organised in a hierarchy
  • Entities in the same group may or may not have the same attributes or properties
  • Does not contain sufficient metadata, which makes automation and management of the data difficult
  • Size and type of the same attributes in a group may differ
  • Due to the lack of a well-defined structure, it cannot be used by computer programs easily
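
The points about tags, hierarchy, and differing attributes can be seen in a small XML document. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree; the sample document, the element names, and the helper attributes_per_entity are made up for illustration and are not part of any particular database product.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two <employee> entities in the same group,
# each carrying a different set of child elements.
XML_TEXT = """
<employees>
  <employee id="1">
    <name>Asha</name>
    <salary>52000</salary>
  </employee>
  <employee id="2">
    <name>Ravi</name>
    <email>ravi@example.com</email>
    <phone>555-0100</phone>
  </employee>
</employees>
"""


def attributes_per_entity(xml_text):
    """Map each employee id to the sorted list of child tags it contains."""
    root = ET.fromstring(xml_text)
    return {
        emp.get("id"): sorted(child.tag for child in emp)
        for emp in root.findall("employee")
    }


fields = attributes_per_entity(XML_TEXT)
for emp_id, tags in fields.items():
    print(emp_id, tags)
```

Both entities sit under the same parent element, but they do not share the same set of fields, which is why such data does not map directly onto one fixed set of columns.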

Using LOBs for Semi-structured Data

Document files such as XML documents or word processor files are examples of semi-structured data. These types of documents contain data in a logical structure that is interpreted or processed by an application, and the data is not broken down into smaller logical units when stored in the database.

Applications that work with semi-structured data typically use large amounts of character data. The Character Large Object (CLOB) and National Character Large Object (NCLOB) datatypes are available for storing and manipulating this kind of data.

Binary File objects (the BFILE datatype) can also be used to store character data. BFILEs can also be used to load read-only data from the operating system into CLOB or NCLOB instances so that you can manipulate the data in your application.
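
The idea of loading a document from the operating system into a character LOB can be sketched as follows. This example uses Python's built-in sqlite3 module purely as a stand-in: SQLite accepts CLOB as a declared column type and treats it as text, but it has no NCLOB or BFILE types, so the file is simply read by the application. The table name documents, the helper load_text_file_into_clob, and the sample file contents are assumptions made for illustration.

```python
import os
import sqlite3
import tempfile


def load_text_file_into_clob(conn, doc_id, path):
    """Read a text file from the operating system and store it whole in a CLOB column."""
    with open(path, "r", encoding="utf-8") as f:
        body = f.read()
    conn.execute("INSERT INTO documents (id, body) VALUES (?, ?)", (doc_id, body))
    conn.commit()


# In-memory database standing in for a real server (assumption).
clob_conn = sqlite3.connect(":memory:")
clob_conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body CLOB)")

# Create a small sample document on disk to play the role of the external file.
SAMPLE_XML = "<note><to>Asha</to><body>Meeting at 10</body></note>"
with tempfile.NamedTemporaryFile(
    "w", suffix=".xml", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(SAMPLE_XML)
    source_path = tmp.name

load_text_file_into_clob(clob_conn, 1, source_path)
os.remove(source_path)

# The document comes back as one character value, not as separate fields.
stored = clob_conn.execute(
    "SELECT body, typeof(body) FROM documents WHERE id = 1"
).fetchone()
print(stored)
```

The document is stored and retrieved as a single value; interpreting its internal structure is left to the application.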

Unstructured data

Unstructured data is data that does not conform to a data model and has no easily identifiable structure, so it cannot be used easily by a computer program. It is not organised in a pre-defined manner and does not have a pre-defined data model, so it is not a good fit for a mainstream relational database.

Characteristics of Unstructured Data:

  • Data neither conforms to a data model nor has any structure
  • Data cannot be stored in the form of rows and columns as in databases
  • Data does not follow any semantics or rules
  • Data lacks any particular format or sequence
  • Data has no easily identifiable structure
  • Due to the lack of an identifiable structure, it cannot be used by computer programs easily

Using LOBs for Unstructured Data

Unstructured data cannot be broken into standard components. For example, data about an employee can be separated into a name, stored as a string; an ID number, stored as an integer; a salary; and so on. A photograph, on the other hand, consists of a long stream of 1s and 0s. These bits are used to switch pixels on and off so that the picture appears on a display, but they are not broken down into any finer structure for database storage.
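
The contrast between the two kinds of data can be sketched in Python. The Employee class, the four sample bytes, and the helper render_bitmap are hypothetical; real image formats are far more involved, and this toy treats each bit as one pixel only to illustrate the point.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    # Structured data: each component has its own name and type.
    name: str
    emp_id: int
    salary: float


def render_bitmap(data, width=8):
    """Turn raw bytes into rows of '#' (bit on) and '.' (bit off)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    rows = [bits[i:i + width] for i in range(0, len(bits), width)]
    return [row.replace("1", "#").replace("0", ".") for row in rows]


employee = Employee(name="Asha", emp_id=101, salary=52000.0)

# Unstructured data: just a stream of bits with no named fields.
photo = bytes([0b00111100, 0b01000010, 0b01000010, 0b00111100])

print(employee.name, employee.emp_id, employee.salary)
for row in render_bitmap(photo):
    print(row)
```

The employee can be addressed field by field, while the photo only becomes meaningful once a program interprets the whole bit stream.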

Also, unstructured data such as graphic images, still video clips, motion video, and sound waveforms tends to be large. A typical employee record may be a few hundred bytes, while even a small amount of multimedia data can be thousands of times larger.

The datatypes best suited to large amounts of unstructured data include the BLOB datatype (Binary Large Object) and the BFILE datatype (Binary File object).
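
Storing binary content in a BLOB column can be sketched as follows. As an assumption for illustration, the example uses Python's built-in sqlite3 module, which supports a BLOB type but has no BFILE equivalent; the table name media and the generated payload are made up, with the payload standing in for real image or audio bytes.

```python
import sqlite3

# In-memory database standing in for a real server (assumption).
blob_conn = sqlite3.connect(":memory:")
blob_conn.execute(
    "CREATE TABLE media (id INTEGER PRIMARY KEY, name TEXT, content BLOB)"
)

# 1024 bytes of made-up binary data in place of a real photograph.
payload = bytes(range(256)) * 4

blob_conn.execute(
    "INSERT INTO media (name, content) VALUES (?, ?)", ("photo.bin", payload)
)
blob_conn.commit()

# The database hands back the same byte stream without interpreting it.
blob_row = blob_conn.execute(
    "SELECT content, length(content), typeof(content) FROM media WHERE name = ?",
    ("photo.bin",),
).fetchone()
print(blob_row[1], blob_row[2])
```

The database stores and returns the bytes as one opaque value; any decoding into pixels or sound is done by the application.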


