Big Data is characterized by huge volume, high velocity, and a wide variety of data. The data falls into three types: structured data, semi-structured data, and unstructured data.
- Structured data –
Structured data is data whose elements are addressable for effective analysis. It is organised into a formatted repository, typically a database. It covers all data that can be stored in a SQL database in tables with rows and columns. Such data has relational keys and can easily be mapped into pre-designed fields. Today, it is the most commonly processed kind of data in development and the simplest way to manage information. Example: relational data.
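As a minimal sketch of the rows, columns, and relational keys described above, the following uses Python's built-in `sqlite3` module with an in-memory database. The table names (`customers`, `orders`), the sample rows, and the helper functions are hypothetical and chosen only for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related tables (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    # Each table has pre-designed fields; every row must fit them.
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, "
        "customer_id INTEGER NOT NULL REFERENCES customers(id), "
        "amount REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")]
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0)],
    )
    conn.commit()
    return conn


def total_per_customer(conn):
    """Join the two tables on the relational key and sum order amounts per customer."""
    query = (
        "SELECT c.name, SUM(o.amount) "
        "FROM customers c JOIN orders o ON o.customer_id = c.id "
        "GROUP BY c.id, c.name ORDER BY c.name"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(total_per_customer(build_demo_db()))
```

Because every row fits the same fields, the join and aggregation can be expressed in a single query.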
- Semi-structured data –
Semi-structured data is information that does not reside in a relational database but has some organizational properties that make it easier to analyze. With some processing, it can be stored in a relational database (which can be very hard for some kinds of semi-structured data), but the semi-structured form exists to save space. Example: XML data.
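The organizational properties mentioned above can be sketched with a small XML document parsed by Python's standard `xml.etree.ElementTree` module. The document content, the tag names, and the `parse_people` helper are assumptions made for illustration. Note that the two records do not carry the same fields, which a fixed table schema would not allow without extra work.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: tags give the data structure, but records may differ.
XML_DOC = """<people>
  <person id="1"><name>Asha</name><email>asha@example.com</email></person>
  <person id="2"><name>Ravi</name><phone>555-0100</phone></person>
</people>"""


def parse_people(xml_text):
    """Turn each <person> element into a dict of whatever fields it contains."""
    root = ET.fromstring(xml_text)
    people = []
    for person in root.findall("person"):
        record = {"id": person.get("id")}
        for child in person:
            # The tag name acts as a self-describing field label.
            record[child.tag] = child.text
        people.append(record)
    return people


if __name__ == "__main__":
    for record in parse_people(XML_DOC):
        print(record)
```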
- Unstructured data –
Unstructured data is data that is not organised in a pre-defined manner or does not have a pre-defined data model, so it is not a good fit for a mainstream relational database. Alternative platforms exist for storing and managing unstructured data. It is increasingly prevalent in IT systems and is used by organizations in a variety of business intelligence and analytics applications. Examples: Word documents, PDFs, text, media logs.
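With no data model to rely on, working with unstructured data usually starts from the raw text itself. The sketch below is one simple way to do a textual search over free-form documents using only the Python standard library. The document names, their contents, and the helpers `word_counts` and `search` are hypothetical.

```python
import re
from collections import Counter


def word_counts(text):
    """Count lowercase words in free text; there are no fields to read from."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def search(documents, keyword):
    """Return the names of documents whose text contains the keyword as a word."""
    keyword = keyword.lower()
    return [
        name for name, text in documents.items() if keyword in word_counts(text)
    ]


if __name__ == "__main__":
    docs = {
        "memo.txt": "The server restarted twice. Server load was high.",
        "note.txt": "Lunch is at noon.",
    }
    print(search(docs, "server"))
```

The search can only match words in the text; it cannot filter by a column or join on a key, because none exist.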
Differences between Structured, Semi-structured and Unstructured data:
|Properties||Structured data||Semi-structured data||Unstructured data|
|Technology||It is based on relational database tables||It is based on XML/RDF||It is based on character and binary data|
|Transaction management||Mature transaction management and various concurrency techniques||Transactions are adapted from DBMS and are not mature||No transaction management and no concurrency|
|Version management||Versioning over tuples, rows, and tables||Versioning over tuples or graphs is possible||Versioned as a whole|
|Flexibility||It is schema dependent and less flexible||It is more flexible than structured data but less flexible than unstructured data||It is very flexible, and there is no schema|
|Scalability||It is very difficult to scale the DB schema||Its scaling is simpler than that of structured data||It is very scalable|
|Robustness||Very robust||New technology, not very widespread||-|
|Query performance||Structured queries allow complex joins||Queries over anonymous nodes are possible||Only textual queries are possible|