NOTES - PART
III
The Relational
Data Model
(Based on
Chapter 7 in
Fundamentals of Database Systems by Elmasri and Navathe, Ed.
3)
1 Relational Model Concepts
2 Characteristics of Relations
3 Relational Integrity
Constraints
3.1 Key Constraints
3.2 Entity Integrity Constraints
3.3 Referential Integrity Constraints
4 Update Operations on Relations
5 Relational Algebra Operations
5.1 SELECT s and
PROJECT P
5.2 Set Operations
5.3 JOIN Operations
5.4 Additional Relational Operations
1. Relational Model Concepts
BASIS OF THE
MODEL
• The
relational Model of Data is based on the concept of a Relation.
• A Relation is a mathematical concept based on the
ideas of sets.
• The
strength of the relational approach to data management comes from the formal
foundation provided by the theory of relations.
• We review
the essentials of the relational approach in this chapter.
__________________________________________________________
1. The
relational model is covered in Chapter 7 of the book Fundamentals of Database
Systems, by Elmasri and S.B. Navathe, Ed. III, 2000.
2. The model
was first proposed by Dr. E.F. Codd of IBM in 1970 in the following paper -
"A Relational Model for Large Shared Data Banks," Communications of the ACM, June 1970.
INFORMAL DEFINITIONS
• RELATION:
A
table of values
• A relation
may be thought of as a set of rows.
• A relation
may alternately be though of as a set of
columns.
• Each row of the relation may be given an identifier.
• Each
column typically is called by its column name or column header or attribute name.
FORMAL DEFINITIONS
• A Relation may be defined in multiple
ways.
• The Schema of a Relation:
R (A1,
A2, .....An)
Relation R is defined over attributes A1, A2, .....An
For Example
-
CUSTOMER
(Cust-id, Cust-name, Address, Phone#)
Here, CUSTOMER is a relation defined over the four
attributes Cust-id, Cust-name, Address, Phone#, each of which has a domain or a set of valid values. For example, the domain of Cust-id is 6 digit numbers.
• A tuple is an ordered set of values
• Each value
is derived from an appropriate domain.
• Each row
in the CUSTOMER table may be called as a tuple in the table and would consist
of four values.
<632895, "John Smith", "101 Main St. Atlanta, GA 30332", "(404) 894-2000"> is a triple belonging to the CUSTOMER relation.
• A relation
may be regarded as a set of tuples (rows).
• Columns in
a table are also called as attributes of the relation.
FORMAL DEFINITIONS (contd.)
• The
relation is formed over the cartesian product of the sets; each set has values
from a domain; that domain is used in a specific role which is conveyed by the
attribute name.
• For
example, attribute Cust-name is defined over the domain of strings of 25
characters. The role these strings play
in the CUSTOMER relation is that of the name of customers.
• Formally,
Given R(A1,
A2, .........., An)
r(R) subset-of dom
(A1) X dom (A2) X ....X dom(An)
R: schema of
the relation
r of R: a
specific "value" or population of R.
• R is also
called the intension of a relation
r is also
called the extension of a relation
Let S1 = {0,1}
Let S2
= {a,b,c}
Let R subset-of S1 X S2
for example: r(R) = {<0.a> , <0,b> ,
<1,c> }
DEFINITION SUMMARY
Informal Terms Formal
Terms
Table Relation
Column Attribute/Domain
Row Tuple
Values in a column Domain
Table Definition Schema
of Relation
Populated Table Extension
Notes:
Whereas languages like SQL use the informal terms of
TABLE (e.g. CREATE TABLE), COLUMN (e.g. SYSCOLUMN variable), the relational
database textbooks present the model and operations on it using the formal
terms.

2
Characteristics of Relations
Ordering of tuples in a relation r(R): The tuples are not considered to be
ordered, even though they appear to be in the tabular form.
Ordering of attributes in a relation schema R (and of values within each
tuple): We will consider the attributes in R(A1, A2, ...,
An) and the values in t=<v1, v2, ..., vn>
to be ordered .
(However, a more general alternative definition of relation does not require this ordering).
Values in a tuple: All values are considered atomic (indivisible). A special null value is used to represent values
that are unknown or inapplicable to certain tuples.
Notation:
- We refer to component values of a tuple t by t[Ai]
= vi (the value of attribute Ai for tuple t).
Similarly, t[Au, Av,
..., Aw] refers to the subtuple of t containing the values of
attributes Au, Av, ..., Aw, respectively.

3 Relational
Integrity Constraints
Constraints are conditions that must hold on all valid relation instances. There are three
main types of constraints:
Key
constraints, entity integrity
constraints, and referential integrity
constraints
3.1 Key
Constraints
Superkey of R: A set of attributes SK of R such that no two tuples
in any valid relation instance r(R) will have the same value for SK. That is, for any distinct tuples t1
and t2 in r(R), t1[SK] <> t2[SK].
Key of R: A "minimal" superkey; that is, a superkey
K such that removal of any attribute from K results in a set of attributes that
is not a superkey.
Example: The CAR relation schema:
CAR(State, Reg#,
SerialNo, Make, Model, Year)
has two keys Key1 = {State,
Reg#}, Key2 = {SerialNo}, which are also superkeys. {SerialNo, Make} is a
superkey but not a key.
If a relation has several
candidate keys, one is chosen
arbitrarily to be the primary key.
The primary key attributes are underlined.

3.2 Entity
Integrity
Relational Database Schema: A set S of relation schemas that belong to the
same database. S is the name of the database.
S = {R1, R2, ..., Rn}
Entity Integrity: The primary key attributes PK of each relation schema R in S cannot
have null values in any tuple of r(R). This is because primary key values are
used to identify the individual tuples.
t[PK] <>
null for any tuple t in r(R)
Note: Other attributes of R may
be similarly constrained to disallow
null values, even though they are not members of the primary key.
3.3
Referential Integrity
A constraint involving two relations
(the previous constraints involve a single relation).
Used to specify a relationship among tuples in two relations: the referencing relation and the referenced
relation.
Tuples in the referencing relation R1 have attributes FK (called foreign key attributes) that reference
the primary key attributes PK of the referenced
relation R2. A tuple t1
in R1 is said to reference
a tuple t2 in R2 if t1[FK] = t2[PK].
A referential integrity
constraint can be displayed in a relational database schema as a directed arc
from R1.FK to R2.



4 Update
Operations on Relations
- INSERT a tuple.
- DELETE a tuple.
- MODIFY a tuple.
- Integrity constraints should
not be violated by the update operations.
- Several update operations may
have to be grouped together.
- Updates may propagate to cause other updates automatically. This may be necessary to
maintain integrity constraints.
- In case of integrity violation,
several actions can be taken:
- cancel the operation that
causes the violation (REJECT optiom)
- perform the operation but
inform the user of the violation
- trigger additional updates so
the violation is corrected (CASCADE option, SET NULL option)
- execute a user-specified
error-correction routine
5 The
Relational Algebra
- Operations to manipulate
relations.
- Used to specify retrieval
requests (queries).
- Query result is in the form of
a relation.
Relational Operations:
5.1 SELECT s and PROJECT P operations.
5.2 Set operations: These
include UNION U, INTERSECTION | |, DIFFERENCE -, CARTESIAN PRODUCT X.
5.3 JOIN operations X.
5.4 Other relational
operations: DIVISION, OUTER JOIN, AGGREGATE FUNCTIONS.
5.1 SELECT s and PROJECT P
SELECT operation (denoted bys ):
- Selects the tuples (rows) from
a relation R that satisfy a certain selection
condition c
- Form of the operation: s c(R)
- The condition c is an arbitrary
Boolean expression on the attributes of R
- Resulting relation has the same attributes as R
- Resulting relation includes
each tuple in r(R) whose attribute values satisfy the condition c
Examples:
s DNO=4(EMPLOYEE)
s SALARY>30000(EMPLOYEE)
s(DNO=4 AND SALARY>25000) OR DNO=5(EMPLOYEE)
PROJECT operation (denoted byP ):
- Keeps only certain attributes
(columns) from a relation R specified in an attribute
list L
- Form of operation: P L(R)
- Resulting relation has only
those attributes of R specified in L
Example: P FNAME,LNAME,SALARY(EMPLOYEE)
- The PROJECT operation eliminates duplicate tuples in the resulting relation so that it remains
a mathematical set (no duplicate elements)
Example: P SEX,SALARY(EMPLOYEE)
If several male employees
have salary 30000, only a single tuple <M, 30000> is kept in the
resulting relation.
Duplicate tuples are eliminated by the P operation.

Sequences of operations:
- Several operations can be
combined to form a relational algebra
expression (query)
Example: Retrieve the names and salaries
of employees who work in department 4:
P FNAME,LNAME,SALARY
(s
DNO=4(EMPLOYEE) )
- Alternatively, we specify explicit intermediate relations for each
step:
DEPT4_EMPS <-s DNO=4(EMPLOYEE)
R <-P FNAME,LNAME,SALARY(DEPT4_EMPS)
- Attributes can optionally be renamed
in the resulting left-hand-side relation (this may be required for some
operations that will be presented later):
DEPT4_EMPS <-s DNO=4(EMPLOYEE)
R(FIRSTNAME,LASTNAME,SALARY)
<-
P FNAME,LNAME,SALARY(DEPT4_EMPS)

5.2 Set Operations
- Binary operations from
mathematical set theory:
UNION: R1 U R2,
INTERSECTION: R1 | | R2,
SET DIFFERENCE: R1 - R2,
CARTESIAN PRODUCT: R1
X R2.
- For U, | |, -, the operand
relations R1(A1, A2, ..., An) and R2(B1,
B2, ..., Bn) must have the same number of attributes, and
the domains of corresponding attributes must be compatible; that is, dom(Ai)=dom(Bi)
for i=1, 2, ..., n. This condition is called union compatibility.
- The resulting relation for U, |
|, or - has the same attribute names as
the first operand relation R1 (by convention).

CARTESIAN PRODUCT
R(A1, A2, ..., Am, B1, B2,
..., Bn) <-
R1(A1,
A2, ..., Am) X R2 (B1, B2,
..., Bn)
- A tuple t exists in R for each
combination of tuples t1 from R1 and t2 from R2
such that:
t[A1, A2,
..., Am]=t1 and t[B1, B2, ..., Bn]=t2
- If R1 has n1 tuples and R2 has n2
tuples, then R will have n1*n2 tuples.
- CARTESIAN PRODUCT is a meaningless operation on its own. It can combine related tuples from
two relations if followed by the
appropriate SELECT operation .
Example: Combine each DEPARTMENT
tuple with the EMPLOYEE tuple of the manager.
DEP_EMP <-DEPARTMENT X
EMPLOYEE
DEPT_MANAGER <-s MGRSSN=SSN(DEP_EMP)

5.3 JOIN Operations
THETA JOIN: Similar to a CARTESIAN PRODUCT followed by a
SELECT. The condition c is called a join
condition.
R(A1, A2,
..., Am, B1, B2, ..., Bn) <-
R1(A1, A2, ..., Am) X c R2 (B1, B2, ..., Bn)
EQUIJOIN: The join condition c includes one or more equality comparisons involving attributes from R1 and
R2. That is, c is of the form:
(Ai=Bj)
AND ... AND (Ah=Bk); 1<i,h<m, 1<j,k<n
In the above EQUIJOIN
operation:
Ai, ..., Ah
are called the join attributes of R1
Bj, ..., Bk
are called the join attributes of R2
Example of using EQUIJOIN:
Retrieve each DEPARTMENT's
name and its manager's name:
T <-DEPARTMENT XMGRSSN=SSN
EMPLOYEE
RESULT <-P DNAME,FNAME,LNAME(T)
NATURAL JOIN (*):
In an EQUIJOIN R <- R1 X c R2,
the join attribute of R2 appear redundantly in the result relation R. In a NATURAL JOIN,
the redundant join attributes of R2 are eliminated from R. The
equality condition is implied and need not be specified.
R <- R1 *(join attributes
of R1),(join attributes of R2) R2
Example: Retrieve each EMPLOYEE's
name and the name of the DEPARTMENT he/she works for:
T<- EMPLOYEE *(DNO),(DNUMBER)
DEPARTMENT
RESULT <-P FNAME,LNAME,DNAME(T)
If the join attributes have the same names in both relations, they need not be specified and
we can write R <- R1 * R2.
Example: Retrieve each EMPLOYEE's
name and the name of his/her SUPERVISOR:
SUPERVISOR(SUPERSSN,SFN,SLN)<-
P