Data Mining | Sources of Data that can be mined

In this post, we will discuss the different sources of data used in the data mining process. Data from multiple sources is often integrated into a common source known as a data warehouse.

Let’s discuss what types of data can be mined:

  1. Flat Files
  2. Relational Databases
  3. Data Warehouse
  4. Transactional Databases
  5. Multimedia Databases
  6. Spatial Databases
  7. Time Series Databases
  8. World Wide Web (WWW)
  1. Flat Files
    • Flat files are data files in text or binary form whose structure can be easily parsed by data mining algorithms.
    • Data stored in flat files carries no relationships or links between files. For example, if a relational database is exported to flat files, the relations between its tables are lost.
    • The layout of a flat file is described by a data dictionary. Eg: CSV file.
    • Application: Used in data warehousing to store data, used to carry data to and from a server, etc.
  2. Relational Databases
    • A relational database is a collection of data organized in tables with rows and columns.
    • The physical schema of a relational database defines how the data is physically stored.
    • The logical schema of a relational database defines the tables and the relationships among them.
    • The standard query language for relational databases is SQL.
    • Application: Data Mining, ROLAP model, etc.
  3. Data Warehouse
    • A data warehouse is a collection of data integrated from multiple sources that supports queries and decision making.
    • There are three types of data warehouse: enterprise data warehouse, data mart and virtual warehouse.
    • Two approaches can be used to update data in a data warehouse: the query-driven approach and the update-driven approach.
    • Application: Business decision making, Data mining, etc.
  4. Transactional Databases
    • A transactional database is a collection of data organized by time stamps, dates, etc. to represent transactions.
    • This type of database can roll back (undo) its operations when a transaction is not completed or committed.
    • It is a highly flexible system in which users can modify information without changing any sensitive information.
    • It follows the ACID properties of a DBMS.
    • Application: Banking, Distributed systems, Object databases, etc.
  5. Multimedia Databases
    • Multimedia databases consist of audio, video, image and text media.
    • They can be stored in object-oriented databases.
    • They are used to store complex information in pre-specified formats.
    • Application: Digital libraries, video-on-demand, news-on-demand, music databases, etc.
  6. Spatial Databases
    • Store geographical information.
    • Store data in the form of coordinates, topology, lines, polygons, etc.
    • Application: Maps, Global positioning, etc.
  7. Time Series Databases
    • Time series databases contain data such as stock exchange data and logs of user activity.
    • They handle arrays of numbers indexed by time, date, etc.
    • They often require real-time analysis.
    • Examples: eXtremeDB, Graphite, InfluxDB, etc.
  8. WWW
    • The World Wide Web (WWW) is a collection of documents and resources such as audio, video and text that are identified by Uniform Resource Locators (URLs), linked by hypertext (HTML pages), and accessed through web browsers over the Internet.
    • It is the most heterogeneous repository, as it collects data from multiple resources.
    • It is dynamic in nature, as the volume of data is continuously increasing and changing.
    • Application: Online shopping, Job search, Research, studying, etc.
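
To make the first two sources concrete, here is a minimal Python sketch that reads a small CSV flat file and loads it into a relational table, where it can then be queried with SQL. The CSV content, the `sales` table, and the helper names `load_flat_file` and `total_per_customer` are hypothetical and chosen only for illustration; an in-memory SQLite database stands in for a real relational database.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for a flat file on disk.
FLAT_FILE = """customer,amount
alice,120.50
bob,75.00
alice,30.25
"""


def load_flat_file(conn, text):
    """Parse CSV text and copy its rows into a relational table."""
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    reader = csv.DictReader(io.StringIO(text))
    rows = [(r["customer"], float(r["amount"])) for r in reader]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def total_per_customer(conn):
    """Aggregate the loaded rows with plain SQL."""
    cur = conn.execute(
        "SELECT customer, SUM(amount) FROM sales "
        "GROUP BY customer ORDER BY customer"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
loaded = load_flat_file(conn, FLAT_FILE)
totals = total_per_customer(conn)
print(loaded, totals)
```

The flat file itself carries no notion of tables or keys. That structure appears only once the rows are loaded into the database, which is roughly what happens when data from several sources is integrated before mining.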

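The rollback behaviour of transactional databases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `accounts` table, the `transfer` and `balances` helpers, and the "no negative balance" rule are all hypothetical, and an in-memory SQLite database stands in for a real transactional system.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts atomically; return True on commit."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Hypothetical business rule: abort the whole transaction.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds and is committed
failed = transfer(conn, "alice", "bob", 500)  # violates the rule and is rolled back
print(ok, failed, balances(conn))
```

The second transfer updates both rows before the check fails, yet neither update survives. This all-or-nothing behaviour is the atomicity part of ACID.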

