A data warehouse is designed for query and analysis rather than for transaction processing. It is subject-oriented, integrated, non-volatile, and time-variant, and it holds historical data accumulated over long periods of time. It serves as the foundation for business intelligence (BI) and data mining algorithms.
Data marts :
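Data marts are subsets of a data warehouse, each focused on a single subject area or department such as sales or finance. Like the warehouse itself, a data mart is oriented toward analysis (OLAP) rather than transaction processing (OLTP): it typically holds historical data in denormalized tables and supports slice, dice, roll-up, and drill-down operations.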
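To make the terms slice and roll-up concrete, here is a minimal sketch in plain Python. The `sales` rows, the dimension names, and the helper functions `slice_rows` and `roll_up` are all hypothetical and exist only for illustration; a real warehouse or data mart would run such operations through its own query engine.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions (region, year, product) and one measure.
sales = [
    {"region": "East", "year": 2023, "product": "A", "amount": 100},
    {"region": "East", "year": 2023, "product": "B", "amount": 150},
    {"region": "West", "year": 2023, "product": "A", "amount": 200},
    {"region": "East", "year": 2024, "product": "A", "amount": 120},
    {"region": "West", "year": 2024, "product": "B", "amount": 80},
]

def slice_rows(rows, **fixed):
    """Slice: keep only the rows where each given dimension has the given value."""
    return [r for r in rows if all(r[k] == v for k, v in fixed.items())]

def roll_up(rows, dimension, measure="amount"):
    """Roll-up: aggregate the measure to the level of a single dimension."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[dimension]] += r[measure]
    return dict(totals)

sales_2023 = slice_rows(sales, year=2023)
by_region_2023 = roll_up(sales_2023, "region")
by_year = roll_up(sales, "year")
```

Drill-down is the reverse of roll-up: moving from the totals in `by_year` back to finer-grained rows such as those in `sales_2023`.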
Spatial Databases :
Spatial databases are based on standards from the Open Geospatial Consortium (OGC). They are used to draw or describe 3D geometric shapes such as regular polygons, allow existing 3D shapes to be extended or modified, have an observer reference, and come with built-in 3D spatial indexes such as HHCode and Z-order.
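As an illustration of the Z-order idea, the sketch below interleaves the bits of three integer grid coordinates into a single sortable key (often called a Morton code). The function name `morton3d` and the 10-bit coordinate width are assumptions made for this example; production spatial indexes are considerably more elaborate.

```python
def morton3d(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y and z into one Z-order key."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)       # bit i of x goes to position 3i
        code |= ((y >> i) & 1) << (3 * i + 1)   # bit i of y goes to position 3i + 1
        code |= ((z >> i) & 1) << (3 * i + 2)   # bit i of z goes to position 3i + 2
    return code

# Sorting by the key gives a one-dimensional order in which nearby cells
# usually (not always) end up close to each other.
cells = [(2, 0, 0), (1, 1, 1), (0, 0, 1), (0, 1, 0), (1, 0, 0)]
ordered = sorted(cells, key=lambda c: morton3d(*c))
```

Because the key is a plain integer, it can be stored in an ordinary one-dimensional index such as a B-tree.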
Text Databases :
Text databases store their data as text files and word-processing documents and respond to ad hoc queries.
Nominal / Ordinal / Interval / Ratio attributes :
- Nominal attributes –
Neither the order of the data nor the difference between values is meaningful; the values are only labels.
Example: zip codes of a country.
- Ordinal attributes –
The order of the data is meaningful, but the difference between values is not.
Example: socio-economic status.
- Interval attributes –
Both the order of the data and the difference between values are meaningful, but there is no true zero.
Example: temperature in Celsius.
- Ratio attributes –
The order of the data and the difference between values are meaningful, and 0.0 is a true zero that stands for none of the quantity.
Examples: enzyme concentrations, temperature in Kelvin.
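The four measurement scales differ in which operations give meaningful results. The sketch below shows one typical operation per scale; all sample values and variable names are made up for illustration.

```python
from statistics import mode

# Nominal: values are labels, so counting (and the mode) is meaningful; ordering is not.
zip_codes = ["10001", "94105", "10001", "60601"]
most_common_zip = mode(zip_codes)

# Ordinal: order is meaningful, so a median rank makes sense; gaps between ranks do not.
levels = ["low", "middle", "high"]
status = ["middle", "low", "high", "middle", "low"]
ranks = sorted(levels.index(s) for s in status)
median_status = levels[ranks[len(ranks) // 2]]

# Interval: differences are meaningful, but ratios are not (0 degrees C is not "no temperature").
celsius = [10.0, 20.0]
difference = celsius[1] - celsius[0]

# Ratio: a true zero exists, so ratios are meaningful as well.
kelvin = [c + 273.15 for c in celsius]
ratio = kelvin[1] / kelvin[0]
```

Note that 20 °C is not "twice as hot" as 10 °C: on the Kelvin scale the same two temperatures differ by a factor of only about 1.035.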
Components of a Data Warehouse :
- Centralized database.
- Query and optimization tools.
- Data warehouse bus architecture.
- Data marts.
Decision Tree :
A decision tree is a data mining algorithm that is built top-down. The ID3 algorithm builds the tree by choosing, at each node, the attribute that splits the data items into the most homogeneous subsets. Entropy measures how homogeneous a set of data items is: it is 0 for a perfectly homogeneous set and 1 when the items are split evenly between two classes. Entropy can be computed from a frequency table of a single attribute, or from a frequency table of two or more attributes.
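The entropy calculation can be sketched as follows, using the standard Shannon entropy with a base-2 logarithm. The tiny `rows` data set and the helper names `entropy` and `split_entropy` are hypothetical; the sketch covers only the entropy step, not a full ID3 implementation.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of one attribute, computed from its frequency table."""
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())

def split_entropy(rows, attribute, target):
    """Entropy of `target` after splitting on `attribute` (a two-attribute table)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    total = len(rows)
    # Weight the entropy of each subset by the share of rows it contains.
    return sum(len(g) / total * entropy(g) for g in groups.values())

# Hypothetical training data: should we play, given the outlook?
rows = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "overcast", "play": "yes"},
    {"outlook": "overcast", "play": "yes"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "rainy", "play": "no"},
]

before = entropy([r["play"] for r in rows])     # evenly split target
after = split_entropy(rows, "outlook", "play")  # only the "rainy" subset is still mixed
information_gain = before - after
```

ID3 computes this gain for every candidate attribute and splits on the one with the largest value, then repeats the process on each subset.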
Artificial Neural Networks :
Artificial Neural Networks (ANNs) are computational models used in AI and machine learning that simulate nerve cells and the synaptic connections between them. Feed-forward ANNs have no feedback loop and may have a single layer or multiple layers. Recurrent ANNs contain feedback loops and may likewise have a single layer or multiple layers.
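The difference between the two kinds of network can be sketched in a few lines of plain Python. The weights, the `tanh` activation, and the function names below are arbitrary choices made for illustration, and no training is shown.

```python
from math import tanh

def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum per neuron, then tanh activation."""
    return [
        tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def feed_forward(inputs, layers):
    """Feed-forward network: data flows layer by layer, with no feedback loop."""
    out = inputs
    for weights, biases in layers:
        out = dense(out, weights, biases)
    return out

def recurrent_step(x, h_prev, w_in, w_rec, bias):
    """One recurrent unit: the new state depends on the input AND the previous state."""
    return tanh(w_in * x + w_rec * h_prev + bias)

# Two-layer feed-forward network: 2 inputs -> 2 hidden neurons -> 1 output.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.0]),
]
output = feed_forward([1.0, 1.0], layers)

# Recurrent unit fed a short sequence; the state h carries information forward.
h = 0.0
states = []
for x in [1.0, 0.0, 0.0]:
    h = recurrent_step(x, h, w_in=1.0, w_rec=0.5, bias=0.0)
    states.append(h)
```

In the recurrent example the later inputs are zero, yet the state stays non-zero because the feedback connection `w_rec` carries the first input forward in time.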
Discrete and Continuous attributes :
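Discrete attributes take countable, separate values such as counts, whereas continuous attributes take any value within a range, such as measurements. When the data is grouped into class intervals, discrete attributes use non-overlapping, inclusive intervals (for example 1-10, 11-20) in which both limiting values belong to the interval. Continuous attributes use exclusive intervals whose boundaries touch (for example 0-10, 10-20), in which only one limiting value belongs to the interval. On a graph, discrete attributes are represented by isolated points and continuous attributes by connected points.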
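One way to read the remark about limiting values is in terms of class intervals. The sketch below assumes classes of width 10 and uses two hypothetical helpers, `inclusive_class` and `exclusive_class`, to show which interval a value falls into under each convention.

```python
def inclusive_class(value: int, width: int = 10, start: int = 1):
    """Discrete data: classes 1-10, 11-20, ... contain both of their limits."""
    lower = start + ((value - start) // width) * width
    return (lower, lower + width - 1)

def exclusive_class(value: float, width: float = 10.0, start: float = 0.0):
    """Continuous data: classes [0, 10), [10, 20), ... contain only the lower limit."""
    lower = start + ((value - start) // width) * width
    return (lower, lower + width)

# A count of 10 belongs to the class 1-10, whose upper limit is included.
count_class = inclusive_class(10)

# A measurement of exactly 10.0 belongs to [10, 20), not to [0, 10).
measure_class = exclusive_class(10.0)
```

With the exclusive convention the classes share their boundaries, so a rule is needed to decide which class a boundary value joins; here it joins the upper class.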