Open In App

Last Minute Notes – DBMS

Database Management System is an organized collection of interrelated data that helps in accessing data quickly, along with efficient insertion, and deletion of data into the DBMS. DBMS organizes data in the form of tables, schemas, records, etc. 

DBMS over File System



The file system has numerous issues, which were resolved with the help of DBMS, the issues with the file system are:

ER-Model

ER Diagram: An ER diagram is a model of a logical view of the database which is represented using the following components:



Some Important Terms

Cardinality of DBMS: Cardinality of relation expresses the maximum number of possible relationship occurrences for an entity participating in a relationship. Cardinality of a relationship can be defined as the number of times an entity of an entity set participates in a relationship set. Let’s suppose a binary relationship R between two entity sets A and B.  The relationship must have one of the following mapping cardinalities:

The most commonly asked question in ER diagram is the minimum number of tables required for a given ER diagram. Generally, the following criteria are used:    

Cardinality Minimum No. of tables
1:1 cardinality with partial participation of both entities 2
1:1 cardinality with a total participation of at least 1 entity 1
1:n cardinality 2
m:n cardinality 3

Note: This is a general observation. Special cases need to be taken care of. We may need an extra table if the attribute of a relationship can’t be moved to any entity side. 

Specialization: It is a  top-down approach in which one entity is divided/specialized into two or more sub-entities based on its characteristics.

Generalization: It is a bottom-up approach in which common properties of two or more subentities are combined/generalized to form one entity. It is exactly the reverse of Specialization. In this, two or lower level entities are generalized to one higher level entity.

Aggregation: Aggregation is an abstraction process through which relationships are represented as higher-level entity sets. 

Participation Constraint: It specifies the maximum or a minimum number of relationship instances in which any entity can participate. In simple words, participation means how an entity is linked to a relationship.

Database Design

Database design Goals: The prime goal of designing a database is:

According to E.F. Codd, “All the records of the table must be unique”. 

Keys of a relation: There are various types of keys in a relation which are:

Functional Dependency: It is a constraint that specifies the association/ relationship between a set of attributes. In functional dependency, one set can accurately determine the value of another set.  It is represented as A->B, where set A can determine the values of set B correctly. The A is known as the Determinant, and B is known as the Dependent. 

Functional dependencies are further categorized into two types:

Armstrong’s Axioms: It is a statement that is always considered true and used as a starting point for further arguments. Armstrong axiom is used to generate a closure set in a relational database. 

Armstrong Axiom

Attribute Closure(X+): All attributes of the set are functionally determined by X. 

Example: If the relation R(ABCD) {A->B, B->C, C->D}, then the attribute closure of 

A will be (A+)={ABCD} [A can determine B, B can determine C, C can determine D]
B will be (B+)={BCD}    [B can determine C, C can determine D]
C will be (C+)={CD}      [C can determine D]      
D will be (D+)={D}        [D can determine itself]    

Note: With the help of Attribute closure, we can easily determine the Superkey [The set of attributes whose closure contains all attributes of a relation] of a relation, So in the above example A is the superkey of the given relation.  There can be more than one superkey in a relationship. 

Example: If the relation R(ABCDE) {A->BC, CD->E, B->D, E->A}, then the attribute closure will be

A+= {ABCDE}
B+= {BD}
C+= {C}
D+= {D}
E+= {ABCDE}

Equivalence sets of Functional Dependency: If two sets of a functional dependency are equivalent, i.e. if A+= B+. Every FD in A can be inferred from B, and every FD in B can be inferred from A, then A and B are functionally equivalent. 

Minimal set of Functional Dependency: A set of FD  will be minimal if it satisfies the following conditions:

Normalization: Normalization is used to eliminate the following anomalies:

Normalization was introduced to achieve integrity in the database and make the database more maintainable.

1. First Normal Form: A relation is in first normal form if it does not contain any multi-valued or composite attribute.  If the data is in 1NF then it will have high redundancy. 

2. Second Normal Form: A relation is in the second normal form if it is in the first normal form and if it does not contain any partial dependency.

Partial Dependency

           Y->A will be partial dependency iff, Y is a proper subset of candidate key, and A is a non-prime attribute. 

3. Third Normal Form: A relation is in the third normal form if it is in the second normal form and it does not contain any transitive dependency. For a relation to be in Third Normal Form, either LHS of FD should be super key or RHS should be the prime attribute.

4. Boyce-Codd Normal Form: A relation is in Boyce-Codd Normal Form if the LHS of every FD is super key. The relationship between Normal Forms can be represented as 1NF, 2NF, 3NF or BCNF.

Design Goal 1NF 2NF 3NF BCNF
Zero Redundancy High redundancy Less than 1NF Less than 2NF No redundancy
Loss-less decomposition Always Always Always Always
Dependency preservation Always Always Always Sometimes Not possible

Properties of Decomposition:

If F1, F2, F3…..Fn ≣ F, then the decomposition is dependency preserving otherwise not. 

Data Retrieval (SQL, RA)

Commands to Access Database: For efficient data retrieval, insertion, deletion, updation, etc. The commands in the Database are categorized into three categories, which are as follows:

Query Language: Language using which any user can retrieve some data from the database. 

Note: Relational model is a theoretical framework RDBMS is its implementation. 

Relational Algebra: Procedural language with basic and extended operators.  

Basic Operator Semantic
σ(Selection) Select rows based on a given condition
π (Projection) Project some columns
X (Cross Product/ Cartesian Product) Cross product of relations, returns m*n rows where m and n are numbers of rows in R1 and R2 respectively.
U (Union) Return those tuples which are either in R1 or R2. 
Maximum number of rows returned = m+n 
Minimum number of rows returned = max(m,n)
– (Minus) R1-R2 returns those tuples which are in R1 but not in R2. 
Maximum number of rows returned =
Minimum number of rows returned = m-n
ρ(Rename) Renaming a relation to another relation.
Extended Operator Semantic
(Intersection) Returns those tuples which are in both relation R1 and R2. 
Maximum number of rows returned = min(m,n) 
Minimum number of rows returned = 0

(Conditional Join)
Selection from two or more tables based on some condition (Cross product followed by selection)
⋈(Equi Join) It is a special case of conditional join when only an equality condition is applied between attributes.

(Natural Join)
In natural join, equality condition on common attributes holds, and duplicate attributes are removed by default. Note: Natural Join is equivalent to the cross product of two relations have no attribute in common and the natural join of a relation R with itself will return R only.
⟕(Left Outer Join) When applying join on two relations R and S, Left Outer Joins gives all tuples of R in the result set. The tuples of R which do not satisfy the join condition will have values as NULL for attributes of S.
⟖(Right Outer Join) When applying join on two relations R and S, Right Outer Joins gives all tuples of S in the result set. The tuples of S which do not satisfy the join condition will have values as NULL for attributes of R.
⟗(Full Outer Join) When applying join on two relations R and S, Full Outer Joins gives all tuples of S and all tuples of R in the result set. The tuples of S which do not satisfy the join condition will have values as NULL for attributes of R and vice versa.
/     (Division Operator) Division operator A/B will return those tuples in A which is associated with every tuple of B.
Note: Attributes of B should be a proper subset of attributes of A. 
The attributes in A/B will be Attributes of A- Attribute of B.

Relational Calculus: Relational calculus is a non-procedural query language. It explains what to do but not how to do it. It is of two types:

SQL: Structured Query Language, lets you access or modify databases. SQL can execute queries, retrieve data, insert records, update records, delete records, create a new database, create new tables, create views, and set permissions on tables, procedures, or views. 

SQL Commands:

Operator Meaning
SELECT Selects columns from a relation or set of relations. It defines WHAT is to be returned. 
Note: As opposed to Relational Algebra, it may give duplicate tuples for the repeated values of an attribute.
FROM FROM is used to define the Table(s) or View(s) used by the SELECT or WHERE statements
WHERE WHERE is used to define what records are to be included in the query. It uses conditional operators. 
EXISTS EXISTS is used to check whether the result of a correlated nested query is empty (contains no tuples) or not.
GROUP BY GROUP BY is used to group the tuples based on some attribute or set of attributes like counting the number of students GROUP BY the department.
ORDER BY ORDER BY is used to sort the fetched data in either ascending or descending according to one or more columns.
Aggregate functions Find the aggregated value of an attribute. Used mostly with GROUP BY. e.g.; count, sum, min max. 
select count(*) from the student group by dept_id
Note: we can select only those columns which are part of GROUP BY.
Nested Queries When one query is a part of another query. 
UPDATE It is used to update records in a table.
DELETE It is used to delete rows in a table.
LIKE LIKE operator is used with the WHERE clause to search a specified pattern in a column.
IN  IN operator is used to specify multiple values in the WHERE clause.
BETWEEN It selects values within a range. 
Aliases It is used to temporarily rename a table or a column heading. 
HAVING The HAVING clause was added because the WHERE keyword could be used with aggregate functions. 

File Structure

File organization: It is the logical relation between records and it defines how file records are mapped into disk blocks(memory). A database is a collection of files, each file is a collection of records, and each record contains a sequence of fields. The blocking Factor is the average number of records per block. 

Strategies for storing files of records in block:

File organizations is of following types:

Sequential File: In this method, files are stored in sequential order one after another.

Index File: 

Indexing Type:

1. Single level Index

2. Multilevel Index

Transaction and Concurrency Control

A transaction is a unit of instruction or set of instructions that performs a logical unit of work. Transaction processes are always atomic in nature either they will execute completely or do not execute. 

Transaction Properties:

Schedule: Sequences in which instructions of the concurrent transactions get executed. Schedules are of two types: 

∩ (Intersection)Serializability:  A schedule is said to be serializable if it is equivalent to a serial schedule. It is categorized into two categories: Conflict Seriablizability, and View Serializability. 

Conflict Serializability: A schedule will be conflict serializable if it can be transformed into a serial schedule by swapping non-conflicting operations. It is a polynomial-time problem. 

View Serializability: A schedule will be view serializable if it is view equivalent to a serial schedule. It is an NP-Complete Problem. 

Types of Schedule based recoverability:

Concurrency Control with Locks: 

To achieve consistency,  isolation is the most important concept. Isolation can be achieved using locking very easily. A schedule acquires a lock prior to accessing the transaction and the lock is released when the transaction is completed. A locking protocol is a set of rules followed by all transactions while requesting and releasing locks. Locking protocols restrict the set of possible schedules. 

Lock Types:

Binary Locks

Shared/Exclusive Locks:

Two-Phase Locking: This protocol requires that each transaction in a schedule will be two phases: i.e. Growing phase and the shrinking phase. 

Time Stamp Ordering Protocols: Time stamp protocols assign a globally unique timestamp value to each transaction and produce the order for transaction submission. The timestamp protocol is free from deadlock, it may suffer from starvation but it ensures serializability.

Thomas Write Rule: It modify time-stamping protocol in obsolete write case, when Ti requests write (Q) if TS (Ti) < WTS(Q) then roll back T. Otherwise execute successfully. Set RTS (A)= max{TS(T), RTS (A)}.

Thomas write rule is View serializable, hence it guarantees consistency, but there is no guarantee of cascadeness and recoverable system because there is no restriction on write. 

 See Last Minute Notes on all subjects here.


Article Tags :