Schema Integration in DBMS
Definition: Schema integration merges two or more database schemas into a single schema that can store data from all of the original databases. For large databases with many expected users and applications, designers can use the integration approach: design an individual schema (view) for each user group or application, then merge them. This keeps the individual views relatively small and simple. Schema integration is divided into the following subtasks.
1. Identifying correspondences and conflicts among the schema:
Because the schemas are designed individually, it is necessary to specify which constructs in the schemas represent the same real-world concept. We must identify these correspondences before proceeding with the integration. During this process, several types of conflicts may occur, such as:
- Naming conflict –
Naming conflicts are of two types: synonyms and homonyms. A synonym occurs when two schemas use different names for the same concept; for example, an entity type CUSTOMER in one schema may describe the same concept as an entity type CLIENT in another schema. A homonym occurs when two schemas use the same name for different concepts; for example, an entity type CLASSES may represent train classes in one schema and aeroplane classes in another schema.
- Type conflicts –
The same concept may be represented in two schemas by different modeling constructs. For example, DEPARTMENT may be an entity type in one schema and an attribute in another.
- Domain conflicts –
A single attribute may have different domains in different schemas. For example, we may declare Ssn as an integer in one schema and as a character string in another. A unit-of-measure conflict could occur if one schema represented weight in pounds and the other used kilograms.
- Conflicts among constraints –
Two schemas may impose different constraints; for example, the key of an entity type may be different in each schema.
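As a minimal sketch of this first subtask, the Python below compares two toy schemas and reports synonym and domain conflicts. The dictionary representation, the `correspondences` table, and the function name `find_conflicts` are assumptions made for illustration; the sketch relies on designer-supplied correspondences, so it does not try to detect homonyms or type conflicts.

```python
# Hypothetical schemas: entity type -> {attribute: domain}
schema_a = {"CUSTOMER": {"Ssn": "integer", "Name": "string", "Weight": "pounds"}}
schema_b = {"CLIENT": {"Ssn": "string", "Name": "string", "Weight": "kilograms"}}

# Correspondences supplied by the designer: entity in A -> entity in B.
# Entries whose names differ are synonyms.
correspondences = {"CUSTOMER": "CLIENT"}


def find_conflicts(a, b, correspondences):
    """Return synonym and domain conflicts between two schemas."""
    conflicts = []
    for ent_a, ent_b in correspondences.items():
        if ent_a != ent_b:
            conflicts.append(("synonym", ent_a, ent_b))
        # Compare the domains of attributes that both entity types share.
        shared = sorted(a[ent_a].keys() & b[ent_b].keys())
        for attr in shared:
            dom_a, dom_b = a[ent_a][attr], b[ent_b][attr]
            if dom_a != dom_b:
                conflicts.append(("domain", attr, dom_a, dom_b))
    return conflicts


conflicts = find_conflicts(schema_a, schema_b, correspondences)
for conflict in conflicts:
    print(conflict)
```

Here the Ssn and Weight attributes are reported as domain conflicts, while Name is not, because both schemas declare it with the same domain.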
2. Modifying views to conform to one another:
Some schemas are modified so that they conform to other schemas more closely. Some of the conflicts identified in the first step are resolved in this step.
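One way to picture this step is as a set of renames and domain changes applied to one view so that it matches another. The sketch below is illustrative only: the dictionary representation and the names `conform`, `entity_renames`, and `domain_overrides` are hypothetical, and real conforming usually also involves restructuring that this sketch ignores.

```python
def conform(schema, entity_renames, domain_overrides):
    """Return a copy of `schema` with entity types renamed and domains aligned.

    schema:           entity type -> {attribute: domain}
    entity_renames:   old entity name -> agreed entity name
    domain_overrides: agreed entity name -> {attribute: agreed domain}
    """
    result = {}
    for entity, attrs in schema.items():
        new_name = entity_renames.get(entity, entity)
        new_attrs = dict(attrs)  # copy so the original view is left untouched
        for attr, domain in domain_overrides.get(new_name, {}).items():
            if attr in new_attrs:
                new_attrs[attr] = domain
        result[new_name] = new_attrs
    return result


# Hypothetical view that uses CLIENT and an integer Ssn.
view_b = {"CLIENT": {"Ssn": "integer", "Name": "string"}}

conformed = conform(
    view_b,
    entity_renames={"CLIENT": "CUSTOMER"},            # resolve the synonym
    domain_overrides={"CUSTOMER": {"Ssn": "string"}},  # resolve the domain conflict
)
print(conformed)
```

The original view is not modified, which makes it easy to keep both versions while the designers negotiate the agreed names and domains.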
3. Merging of Views and Restructuring:
The global schema is created by merging the individual schemas. Corresponding concepts are represented only once in the global schema, and mappings between the views and the global schema are specified. This is the hardest step to achieve in real-world databases, which can involve hundreds of entities and relationships. It requires a considerable amount of human intervention and negotiation to resolve conflicts and to settle on the most reasonable and acceptable solution for a global schema.
Restructuring: As a final, optional step, the global schema may be analyzed and restructured to remove any redundancies or unnecessary complexity.
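The merge itself can be sketched as a union of views that have already been conformed, with a mapping recorded for each view. Everything in the example is hypothetical (the view names `sales` and `support`, the TICKET entity type, and the function `merge_views`), and the sketch simply refuses to merge when a domain conflict is still unresolved rather than attempting the negotiation described above.

```python
def merge_views(views):
    """Merge conformed views into one global schema.

    views: view name -> {entity type: {attribute: domain}}
    Returns (global_schema, mappings), where mappings records which
    attributes of each global entity type a view uses.
    """
    global_schema = {}
    mappings = {}
    for view_name, schema in views.items():
        mappings[view_name] = {}
        for entity, attrs in schema.items():
            # Each concept appears only once in the global schema.
            target = global_schema.setdefault(entity, {})
            for attr, domain in attrs.items():
                existing = target.setdefault(attr, domain)
                if existing != domain:
                    raise ValueError(
                        f"unresolved domain conflict on {entity}.{attr}"
                    )
            mappings[view_name][entity] = sorted(attrs)
    return global_schema, mappings


views = {
    "sales": {"CUSTOMER": {"Ssn": "string", "Name": "string"}},
    "support": {
        "CUSTOMER": {"Ssn": "string", "Phone": "string"},
        "TICKET": {"Id": "integer"},
    },
}

global_schema, mappings = merge_views(views)
print(global_schema)
print(mappings)
```

CUSTOMER appears once in the result with the attributes of both views combined, and the mappings show which part of the global schema each view corresponds to.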
Schema integration is the process of combining multiple database schemas into a single schema that can support data sharing and other data management tasks. It is necessary when working with multiple databases or when integrating data from multiple sources, as in data warehousing and business intelligence applications.
The process of schema integration involves several steps:
- Identify the source schemas: The first step in schema integration is to identify the schemas of the databases or data sources that need to be integrated.
- Analyze the source schemas: Once the source schemas have been identified, they should be analyzed to identify common attributes and data structures that can be used to integrate the data.
- Define the target schema: The target schema is the schema that will be used to represent the integrated data. The target schema should be designed to support the requirements of the application or task for which the data will be used.
- Map the source schemas to the target schema: The next step is to map the attributes and data structures of the source schemas to the target schema. This involves identifying the common attributes and creating mappings between each source schema and the target schema.
- Merge the schemas: Once the source schemas have been mapped to the target schema, the schemas can be merged to create a single schema that represents the integrated data.
- Resolve conflicts: Inevitably, conflicts will arise during the schema integration process, such as data type conflicts, naming conflicts, or conflicts in data models. These conflicts must be resolved to ensure the integrity of the integrated data.
- Test the integrated schema: The final step in schema integration is to test the integrated schema to ensure that it meets the requirements of the application or task for which the data will be used.
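The mapping and testing steps above can be sketched at the level of individual rows. The source names (`crm`, `billing`), the attribute names, and the helper `to_target` are all hypothetical; the point is only that each source gets an explicit attribute mapping to the target schema, and that a simple check confirms rows from different sources end up in the same shape.

```python
# Hypothetical target schema and source-to-target attribute mappings.
TARGET_ATTRIBUTES = ("customer_id", "full_name")
MAPPINGS = {
    "crm": {"cust_no": "customer_id", "name": "full_name"},
    "billing": {"client_id": "customer_id", "client_name": "full_name"},
}


def to_target(source, row):
    """Translate one source row into the target schema using MAPPINGS."""
    mapping = MAPPINGS[source]
    out = {target: row[src] for src, target in mapping.items() if src in row}
    missing = [attr for attr in TARGET_ATTRIBUTES if attr not in out]
    if missing:
        raise ValueError(f"{source} row lacks target attributes: {missing}")
    return out


# A simple test of the integrated schema: rows from both sources
# must translate into the same target shape.
crm_row = to_target("crm", {"cust_no": 7, "name": "Ada"})
billing_row = to_target("billing", {"client_id": 7, "client_name": "Ada"})
assert crm_row == billing_row == {"customer_id": 7, "full_name": "Ada"}
```

Keeping the mappings as data rather than code makes them easier to review with the owners of each source schema.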
Advantages of schema integration:
- Provides a unified view of data: Schema integration enables the creation of a single database that can be accessed by different users, departments, or applications. This makes it easier for users to access and work with data from different sources.
- Reduces data redundancy: By combining multiple databases into a single integrated schema, data redundancy can be reduced, leading to more efficient use of storage space and improved data consistency.
- Increases productivity: An integrated schema simplifies data management and enables users to work more efficiently. This can lead to increased productivity and reduced costs.
- Enables better data analysis: An integrated schema provides a more comprehensive view of data, making it easier to analyze and identify patterns and relationships between different data sources.
Disadvantages of schema integration:
- Complexity: Integrating multiple database schemas into a single schema can be a complex and time-consuming process. It requires a detailed understanding of each database schema and the relationships between them.
- Data inconsistencies: Combining multiple database schemas can result in data inconsistencies if the schemas are not properly integrated. This can lead to errors and incorrect results when querying the integrated database.
- Performance issues: The performance of the integrated database may be negatively impacted if the integration is not properly optimized. This can result in slower query response times and reduced system performance.
- Security concerns: Integrating multiple databases into a single schema can increase the risk of security breaches, as it can be more difficult to control access to data from different sources. Proper security measures must be put in place to prevent unauthorized access to sensitive data.