
Data Mining – Time-Series, Symbolic and Biological Sequences Data

Last Updated : 21 Feb, 2022

Data mining refers to extracting or mining knowledge from large amounts of data. In other words, data mining is the science, art, and technology of exploring large and complex bodies of data in order to discover useful patterns. Theoreticians and practitioners are continually seeking improved techniques to make the process more efficient, cost-effective, and accurate.

This article discusses sequence data. The kinds of data being collected have grown considerably and are likely to keep growing. To organize them, complex data are broadly classified as sequence data, graph and network data, and other kinds of data.


A sequence is an ordered list of events. Sequence data are classified based on their characteristics as:

  • Time-Series data (data with respect to time)
  • Symbolic data (ordered events, recorded with or without a concrete notion of time)
  • Biological data (data related to DNA and protein)

Time-Series Data:

In this type of sequence, the data are numeric values recorded at regular intervals. They are generated by processes such as stock market activity and medical observations, and they are also useful for studying natural phenomena.

Time series are often reduced with piecewise approximations before further analysis. In similarity search on time-series data, we look for a subsequence that matches a given query.
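Both ideas can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method prescribed by the article: the function names `paa` and `best_match` are hypothetical, the piecewise approximation shown is a simple segment average, and similarity is measured with plain Euclidean distance over a sliding window.

```python
import math

def paa(series, segments):
    """Piecewise approximation: replace each equal-sized segment by its mean.

    Assumes len(series) is divisible by `segments` (a simplification).
    """
    if segments <= 0 or len(series) % segments != 0:
        raise ValueError("series length must be a positive multiple of segments")
    size = len(series) // segments
    return [sum(series[i * size:(i + 1) * size]) / size for i in range(segments)]

def best_match(series, query):
    """Return (start_index, distance) of the window of `series` closest to `query`."""
    m = len(query)
    if m == 0 or m > len(series):
        raise ValueError("query must be non-empty and no longer than the series")
    best_start, best_dist = 0, math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        # Euclidean distance between the window and the query
        dist = math.sqrt(sum((w - q) ** 2 for w, q in zip(window, query)))
        if dist < best_dist:
            best_start, best_dist = start, dist
    return best_start, best_dist

# Made-up sample values, for illustration only
series = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0, 4.0, 3.0]
query = [7.0, 8.0, 9.5]

print(paa(series, 4))             # four segment means
print(best_match(series, query))  # start index and distance of the closest window
```

The sliding-window scan compares the query against every window, which is fine for short series; larger collections usually rely on an index or on the reduced representation to prune candidates first.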

  • Time Series Forecasting: Forecasting makes predictions about the future based on past and present data. Trend analysis is one method of forecasting time series: it fits a function to the historical patterns in the series and uses that function for short- and long-term predictions. Time series can show several kinds of patterns with respect to time or season, such as trend movements, cyclic movements, and seasonal movements. ARIMA, SARIMA, and long-memory time-series modeling are some of the popular methods for such analysis.
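As a minimal sketch of trend analysis, the example below fits a straight line to past values by ordinary least squares and extends it into the future. It is deliberately much simpler than ARIMA or SARIMA and ignores cyclic and seasonal movements; the function names and the sample values are hypothetical.

```python
def fit_linear_trend(values):
    """Least-squares fit of y = intercept + slope * t, with t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    intercept = y_mean - slope * t_mean
    return slope, intercept

def forecast(values, steps):
    """Extend the fitted trend line `steps` time points past the last observation."""
    slope, intercept = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps)]

# Made-up observations that grow by 2 per time step
history = [10.0, 12.0, 14.0, 16.0, 18.0]
print(fit_linear_trend(history))
print(forecast(history, 2))
```

A straight line only captures the trend movement; data with strong seasonality would need a model that represents it explicitly.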

Symbolic Data:

A symbolic sequence is an ordered set of elements or events, recorded with or without a concrete notion of time. Customer shopping sequences and web clickstreams are examples of symbolic data. Sequential pattern mining is the main technique used for symbolic sequences.

Constraint-based pattern mining is a useful way to focus the search on user-defined constraints. Apriori-based algorithms are commonly used for this type of analysis. Below is an example of symbolic data in which customers c1 and c2 purchase products at different times.

Tid   Time       Cid   Event (purchased products)
t1    11:45:30   c1    wheat, rice, fruit
t2    11:36:50   c2    rice, fruit
t1    12:00:01   c1    juice, rice
t2    01:00:34   c2    sugar, milk
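To make the idea of a sequential pattern concrete, the sketch below groups the rows of the table above by customer and computes the support of a pattern, that is, the fraction of customers whose purchase sequence contains it. It assumes the rows are already listed in chronological order for each customer; the helper names `contains` and `support` are hypothetical, and this is only the support-counting step, not a full Apriori-style miner.

```python
from collections import defaultdict

# Rows copied from the table above: (Tid, Time, Cid, purchased items)
rows = [
    ("t1", "11:45:30", "c1", {"wheat", "rice", "fruit"}),
    ("t2", "11:36:50", "c2", {"rice", "fruit"}),
    ("t1", "12:00:01", "c1", {"juice", "rice"}),
    ("t2", "01:00:34", "c2", {"sugar", "milk"}),
]

# Build one purchase sequence per customer, keeping the listed row order
# (assumption: rows are chronological per customer).
sequences = defaultdict(list)
for _tid, _time, cid, items in rows:
    sequences[cid].append(items)

def contains(sequence, pattern):
    """True if each itemset of `pattern` is a subset of a later and later purchase."""
    pos = 0
    for itemset in pattern:
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next itemset must match a strictly later purchase
    return True

def support(db, pattern):
    """Fraction of customers whose sequence contains the pattern."""
    return sum(contains(seq, pattern) for seq in db.values()) / len(db)

print(support(sequences, [{"rice", "fruit"}]))   # bought rice and fruit together
print(support(sequences, [{"rice"}, {"rice"}]))  # bought rice, then rice again later
print(support(sequences, [{"fruit"}, {"milk"}])) # bought fruit, then milk later
```

A miner would keep only patterns whose support meets a user-chosen minimum and grow longer candidates from the frequent shorter ones.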

Biological Data:

Biological sequences are DNA and protein sequences, that is, sequences of nucleotides or amino acids. They are very long and complicated but carry hidden meaning. Biological sequence analysis aligns, indexes, and analyzes these sequences, and it plays a crucial role in bioinformatics and modern biology. Substitution matrices are used to represent the probabilities of amino acid substitutions and the probabilities of insertions and deletions. BLAST (Basic Local Alignment Search Tool) is one of the most widely used tools for biological sequence analysis.
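The notion of local alignment can be illustrated with a small dynamic-programming sketch in the style of the Smith-Waterman algorithm. This is not how BLAST works internally (BLAST uses faster heuristics), and the scoring values below (match +2, mismatch -1, gap -1) are assumptions chosen for illustration rather than a real substitution matrix.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b (Smith-Waterman style).

    The scoring parameters are illustrative assumptions, not a real
    substitution matrix such as those used for proteins.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # gap in b
            left = H[i][j - 1] + gap  # gap in a
            H[i][j] = max(0, diag, up, left)  # 0 restarts the alignment
            best = max(best, H[i][j])
    return best

# Made-up DNA fragments sharing the local region "ACG"
print(local_alignment_score("TTACGTTT", "GGACGGG"))
print(local_alignment_score("ACGT", "ACGT"))
```

Because scores are floored at zero, the alignment may start and end anywhere in either sequence, which is what makes it local rather than global.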
