
Types of Sources of Data in Data Mining

In this post, we discuss the different sources of data used in the data mining process. Data from multiple sources is often integrated into a common repository known as a data warehouse. Let's look at the types of data that can be mined:

  1. Flat Files
    • Flat files are data files in text or binary form whose structure data mining algorithms can easily extract.
    • Data stored in flat files has no relationships or paths among records. For example, if a relational database is exported to flat files, the relations between its tables are not preserved.
    • The structure of a flat file is typically described by a data dictionary. Example: a CSV file.

In summary, flat files are a simple and efficient way to store and transfer small to medium-sized data sets, but they are not well-suited for large data sets or complex data relationships.
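As an illustration, here is a minimal sketch of mining a flat file with Python's standard `csv` module. The CSV content and its columns (`customer_id`, `city`, `amount`) are hypothetical. Each row is parsed on its own and nothing in the file links rows together, so any aggregation has to be done by hand.

```python
import csv
import io

# Hypothetical CSV content; a real file would be opened with open("sales.csv", newline="")
CSV_TEXT = """customer_id,city,amount
1,Paris,120.50
2,Berlin,75.00
1,Paris,30.25
"""

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        # Every field arrives as a string; numeric columns must be converted explicitly.
        row["customer_id"] = int(row["customer_id"])
        row["amount"] = float(row["amount"])
        rows.append(row)
    return rows

rows = load_rows(CSV_TEXT)

# Aggregate total spend per customer by scanning the rows ourselves.
total_by_customer = {}
for row in rows:
    cid = row["customer_id"]
    total_by_customer[cid] = total_by_customer.get(cid, 0.0) + row["amount"]
print(total_by_customer)
```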

  2. Relational Databases
    • A relational database is a collection of data organized in tables with rows and columns.
    • The physical schema of a relational database describes how the data is stored (files, indexes, and other storage structures).
    • The logical schema of a relational database defines the tables, their columns, and the relationships (keys) among the tables.
    • SQL is the standard language for querying and manipulating relational databases.
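The sketch below uses Python's built-in `sqlite3` module to show tables, a relationship between them, and an SQL query that uses that relationship. The table and column names are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two tables related through customer_id (a foreign key).
cur.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(customer_id), "
    "amount REAL)"
)
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ben")])
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 120.5), (11, 2, 75.0), (12, 1, 30.25)],
)

# SQL joins the tables on their shared key and aggregates spend per customer.
cur.execute(
    "SELECT c.name, SUM(o.amount) FROM customers c "
    "JOIN orders o ON o.customer_id = c.customer_id "
    "GROUP BY c.name ORDER BY c.name"
)
results = cur.fetchall()
print(results)
conn.close()
```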
  3. Data Warehouse
    • A data warehouse is a collection of data integrated from multiple sources that supports queries and decision making.
    • There are three types of data warehouse: enterprise data warehouse, data mart, and virtual warehouse.
    • Two approaches can be used to update data in a data warehouse: the query-driven approach and the update-driven approach.
    • Applications: business decision making, data mining, etc.
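To make the update-driven approach concrete, here is a minimal sketch in plain Python. Two hypothetical sources use different conventions (dollars vs. cents, region codes vs. names); they are transformed into one common schema ahead of time and stored, and later queries read only that integrated copy. In the query-driven approach, the sources would instead be queried and combined each time a query arrives. All source names and fields here are assumptions for illustration.

```python
# Two hypothetical operational sources with different conventions.
store_sales = [  # amounts in dollars, region codes
    {"date": "2024-01-05", "region": "N", "amount": 100.0},
    {"date": "2024-01-05", "region": "S", "amount": 40.0},
]
web_sales = [  # amounts in cents, full region names
    {"day": "2024-01-05", "area": "North", "cents": 2500},
    {"day": "2024-01-06", "area": "South", "cents": 1000},
]

REGION_NAMES = {"N": "North", "S": "South"}

def extract_transform():
    """Map both sources onto one schema: (date, region, amount_usd, channel)."""
    unified = []
    for r in store_sales:
        unified.append((r["date"], REGION_NAMES[r["region"]], r["amount"], "store"))
    for r in web_sales:
        unified.append((r["day"], r["area"], r["cents"] / 100, "web"))
    return unified

# Update-driven: integrate in advance and keep the result as the warehouse copy.
warehouse = extract_transform()

def total_by_region(rows):
    """An analytical query that runs against the warehouse, not the sources."""
    totals = {}
    for _date, region, amount, _channel in rows:
        totals[region] = totals.get(region, 0.0) + amount
    return totals

print(total_by_region(warehouse))
```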
  4. Transactional Databases
    • A transactional database is a collection of data organized by timestamps, dates, etc. to represent transactions.
    • This type of database can roll back (undo) its operations when a transaction is not completed or committed.
    • Users can modify data flexibly, while the transaction mechanism keeps the rest of the data from being left in an inconsistent state.
    • It follows the ACID properties (atomicity, consistency, isolation, durability) of a DBMS.
    • Applications: banking, distributed systems, object databases, etc.
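Rollback can be demonstrated with Python's built-in `sqlite3` module. In this sketch (the `accounts` table and the `transfer` helper are hypothetical), a money transfer is two updates inside one transaction. When the second update violates a constraint, the whole transaction is rolled back, so the first update is undone as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: balances may never go negative.
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit to dst was rolled back too.
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts"))

ok_small = transfer(conn, "A", "B", 30)   # fits within A's balance
ok_large = transfer(conn, "A", "B", 500)  # would make A negative
final = balances(conn)
print(ok_small, ok_large, final)
```

The failed transfer leaves both balances exactly as they were after the successful one, which is the atomicity part of ACID.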
  5. Multimedia Databases
    • Multimedia databases consist of audio, video, image, and text media.
    • They can be stored in object-oriented databases.
    • They are used to store complex information in pre-specified formats.
    • Applications: digital libraries, video on demand, news on demand, music databases, etc.
  6. Spatial Databases
    • They store geographical information.
    • They store data in the form of coordinates, topology, lines, polygons, etc.
    • Applications: maps, global positioning, etc.
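A typical spatial query asks whether a point lies inside a region stored as a polygon. Spatial databases usually provide such predicates built in; the sketch below implements the classic ray-casting test from scratch only to show the idea. The polygon and points are hypothetical, and treating longitude/latitude as flat x/y coordinates is a simplification.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count polygon edges crossed by a horizontal ray from (x, y) to the right.

    polygon is a list of (x, y) vertices; an odd crossing count means the point is inside.
    Points lying exactly on an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical region stored as a polygon of (x, y) vertices.
park = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
places = {"cafe": (1.0, 1.0), "station": (5.0, 1.0)}
membership = {name: point_in_polygon(px, py, park) for name, (px, py) in places.items()}
print(membership)
```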
  7. Time-Series Databases
    • Time-series databases contain data such as stock exchange data and logged user activities.
    • They handle arrays of numbers indexed by time, date, etc.
    • They often require real-time analysis.
    • Examples: eXtremeDB, Graphite, InfluxDB, etc.
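A common operation on time-indexed data is downsampling: grouping points into fixed time buckets and summarizing each bucket. This plain-Python sketch averages hypothetical readings per hour; time-series databases offer this kind of aggregation natively, but the underlying idea is the same.

```python
from datetime import datetime
from statistics import mean

# Hypothetical readings: (timestamp, value), ordered by time.
readings = [
    (datetime(2024, 3, 1, 9, 5), 101.0),
    (datetime(2024, 3, 1, 9, 40), 103.0),
    (datetime(2024, 3, 1, 10, 15), 99.0),
    (datetime(2024, 3, 1, 10, 50), 97.0),
    (datetime(2024, 3, 1, 11, 30), 110.0),
]

def hourly_average(series):
    """Group points by the hour they fall in and average each group."""
    buckets = {}
    for ts, value in series:
        # Truncate the timestamp to the start of its hour to get the bucket key.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets.setdefault(hour, []).append(value)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}

hourly = hourly_average(readings)
for hour, avg in hourly.items():
    print(hour.strftime("%H:%M"), avg)
```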
  8. WWW
    • WWW (the World Wide Web) is a collection of documents and resources, such as audio, video, and text, that are identified by Uniform Resource Locators (URLs), linked through HTML pages, accessed with web browsers, and reachable via the Internet.
    • It is the most heterogeneous repository, as it collects data from many sources.
    • It is dynamic in nature: the volume of data is continuously increasing and changing.
    • Applications: online shopping, job search, research, studying, etc.
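Because web resources are identified by URLs and linked through HTML pages, a first step in mining the web is often extracting the links from a page. This sketch uses Python's standard `html.parser` and `urllib.parse.urljoin`; the page content and base URL are hypothetical, and a real crawler would fetch pages over HTTP.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags in an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))

# Hypothetical page content.
page = '<p>See <a href="/jobs">jobs</a> and <a href="https://example.org/shop">shop</a>.</p>'
parser = LinkExtractor("https://example.com/index.html")
parser.feed(page)
print(parser.links)
```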
  9. Structured Data: This type of data is organized into a specific format, such as a database table or spreadsheet. Examples include transaction data, customer data, and inventory data.
  10. Semi-Structured Data: This type of data has some structure, but less than structured data. Examples include XML and JSON files and email messages.
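Semi-structured records share some fields but not others, so mining them usually starts by projecting them onto a fixed set of columns. This sketch parses hypothetical JSON with Python's standard `json` module and fills in missing fields with defaults; the field names and the `phone_count` column are assumptions for illustration.

```python
import json

# Hypothetical JSON records: they share some keys, but not every record has every key.
RAW = """[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ben", "phones": ["555-0100", "555-0101"]},
  {"id": 3, "name": "Chen", "email": "chen@example.com", "phones": []}
]"""

records = json.loads(RAW)

# Project the records onto fixed, table-like columns, filling in missing fields
# (None for a missing email, 0 phones when "phones" is absent).
COLUMNS = ["id", "name", "email", "phone_count"]
table = [
    [r["id"], r.get("name"), r.get("email"), len(r.get("phones", []))]
    for r in records
]
print(COLUMNS)
for row in table:
    print(row)
```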
  11. Unstructured Data: This type of data does not have a specific format and can include text, images, audio, and video. Examples include social media posts, customer reviews, and news articles.
  12. External Data: This type of data is obtained from external sources, such as government agencies and industry reports, and includes weather data, satellite images, GPS data, etc.
  13. Time-Series Data: This type of data is collected over time, such as stock prices, weather data, and website visitor logs.
  14. Streaming Data: This type of data is generated continuously, such as sensor data, social media feeds, and log files.
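Since a stream never ends, it has to be processed incrementally with bounded memory. This sketch computes a sliding-window average over a hypothetical sensor feed using a Python generator and `collections.deque`; only the last `window_size` values are kept at any time.

```python
from collections import deque

def sliding_average(stream, window_size):
    """Yield the mean of the last `window_size` values for each new item in the stream.

    Only `window_size` values are held in memory, so the stream can be unbounded.
    """
    window = deque(maxlen=window_size)  # oldest value is dropped automatically
    running_sum = 0.0
    for value in stream:
        if len(window) == window_size:
            running_sum -= window[0]  # this value is about to be evicted
        window.append(value)
        running_sum += value
        yield running_sum / len(window)

# Hypothetical sensor feed; a real stream might come from a socket or message queue.
def sensor_feed():
    for reading in [10, 12, 14, 16, 18]:
        yield reading

averages = list(sliding_average(sensor_feed(), window_size=3))
print(averages)
```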
  15. Relational Data: This type of data is stored in a relational database and can be accessed through SQL queries.
  16. NoSQL Data: This type of data is stored in a NoSQL database and can be accessed through a variety of data models, such as key-value, document, column-family, or graph.
  17. Cloud Data: This type of data is stored and processed in cloud computing environments such as AWS, Azure, and GCP.
  18. Big Data: This type of data is characterized by its huge volume, high velocity, and high variety, and can be stored and processed using big data technologies such as Hadoop and Spark.
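Hadoop popularized the MapReduce programming model, in which work is split into map, shuffle, and reduce phases that can run across many machines. The sketch below imitates that model for a word count in plain, single-process Python; it is not the Hadoop or Spark API, and the input partitions are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input split into partitions, as a distributed file system might store it.
partitions = [
    ["big data needs big storage"],
    ["spark and hadoop process big data"],
]

def map_phase(lines):
    """Map: emit (word, 1) for every word in one partition."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}

# In a cluster each partition would be mapped on a different node; here they run in sequence.
mapped = chain.from_iterable(map_phase(p) for p in partitions)
counts = reduce_phase(shuffle(mapped))
print(counts["big"], counts["data"])
```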