NOTES - PART
III
The Relational
Data Model
(Based on
Chapter 7 in
Fundamentals of Database Systems by Elmasri and Navathe, Ed.
3)
1 Relational Model Concepts
2 Characteristics of Relations
3 Relational Integrity
Constraints
3.1 Key Constraints
3.2 Entity Integrity Constraints
3.3 Referential Integrity Constraints
4 Update Operations on Relations
5 Relational Algebra Operations
5.1 SELECT s and PROJECT
P
5.2 Set Operations
5.3 JOIN Operations
5.4 Additional Relational Operations
1. Relational Model Concepts
BASIS OF THE
MODEL
• The
relational Model of Data is based on the concept of a Relation.
• A Relation is a mathematical concept based on the
ideas of sets.
• The strength
of the relational approach to data management comes from the formal foundation
provided by the theory of relations.
• We review
the essentials of the relational approach in this chapter.
__________________________________________________________
1. The
relational model is covered in Chapter 7 of the book Fundamentals of Database
Systems, by Elmasri and S.B. Navathe, Ed. III, 2000.
2. The model
was first proposed by Dr. E.F. Codd of IBM in 1970 in the following paper -
"A Relational Model for Large Shared Data Banks," Communications of the ACM, June 1970.
INFORMAL DEFINITIONS
• RELATION:
A
table of values
• A relation
may be thought of as a set of rows.
• A relation
may alternately be though of as a set of
columns.
• Each row of the relation may be given an identifier.
• Each column
typically is called by its column name or column
header or attribute name.
FORMAL DEFINITIONS
• A Relation may be defined in multiple
ways.
• The Schema of a Relation:
R (A1,
A2, .....An)
Relation R is defined over attributes A1, A2, .....An
For Example
-
CUSTOMER
(Cust-id, Cust-name, Address, Phone#)
Here, CUSTOMER is a relation defined over the four
attributes Cust-id, Cust-name, Address, Phone#,
each of which has a domain or a set of valid values. For example, the domain of Cust-id is 6 digit numbers.
• A tuple is an ordered set of values
• Each value
is derived from an appropriate domain.
• Each row in
the CUSTOMER table may be called as a tuple in the table and would consist of
four values.
<632895, "John Smith", "
• A relation
may be regarded as a set of tuples (rows).
• Columns in
a table are also called as attributes of the relation.
FORMAL DEFINITIONS (contd.)
• The
relation is formed over the cartesian product of the sets; each set has values
from a domain; that domain is used in a specific role which is conveyed by the
attribute name.
• For example,
attribute Cust-name is defined over the domain of strings of 25
characters. The role these strings play
in the CUSTOMER relation is that of the name of customers.
• Formally,
Given R(A1,
A2, .........., An)
r(R) subset-of dom
(A1) X dom (A2) X ....X dom(An)
R: schema of
the relation
r of R: a
specific "value" or population of R.
• R is also
called the intension of a relation
r is also
called the extension of a relation
Let S1 = {0,1}
Let S2 = {a,b,c}
Let R subset-of S1 X S2
for example: r(R) = {<0,a> , <0,b> ,
<1,c> }
DEFINITION SUMMARY
Informal Terms Formal
Terms
Table Relation
Column Attribute/Domain
Row Tuple
Values in a column Domain
Table Definition Schema
of Relation
Populated Table Extension
Notes:
Whereas languages like SQL use the informal terms of
TABLE (e.g. CREATE TABLE), COLUMN (e.g. SYSCOLUMN variable), the relational
database textbooks present the model and operations on it using the formal
terms.

2
Characteristics of Relations
Ordering of tuples in a relation r(R): The tuples are not considered to be
ordered, even though they appear to be in the tabular form.
Ordering of attributes in a relation schema R (and of values within each
tuple): We will consider the attributes in R(A1, A2, ...,
An) and the values in t=<v1, v2, ..., vn>
to be ordered .
(However, a more general alternative definition of relation does not require this ordering).
Values in a tuple: All values are considered atomic (indivisible). A special null value is used to represent values that are unknown or
inapplicable to certain tuples.
Notation:
- We refer to component values of a tuple t by t[Ai]
= vi (the value of attribute Ai for tuple t).
Similarly, t[Au, Av,
..., Aw] refers to the subtuple of t containing the values of
attributes Au, Av, ..., Aw, respectively.

3 Relational
Integrity Constraints
Constraints are conditions that must hold on all valid relation
instances. There are three main types of constraints:
Key
constraints, entity integrity
constraints, and referential integrity
constraints
3.1 Key
Constraints
Superkey of R: A set of attributes SK of R such that no two tuples
in any valid relation instance r(R) will have the same value for SK. That is, for any distinct tuples t1
and t2 in r(R), t1[SK] <> t2[SK].
Key of R: A "minimal" superkey; that is, a superkey
K such that removal of any attribute from K results in a set of attributes that
is not a superkey.
Example: The CAR relation schema:
CAR(State, Reg#,
SerialNo, Make, Model, Year)
has two keys Key1 = {State,
Reg#}, Key2 = {SerialNo}, which are also superkeys. {SerialNo, Make} is a
superkey but not a key.
If a relation has several
candidate keys, one is chosen
arbitrarily to be the primary key.
The primary key attributes are underlined.

3.2 Entity
Integrity
Relational Database Schema: A set S of relation schemas that belong to the
same database. S is the name of the database.
S = {R1, R2, ..., Rn}
Entity Integrity: The primary key attributes PK of each relation schema R in S cannot have
null values in any tuple of r(R). This is because primary key values are used
to identify the individual tuples.
t[PK] <>
null for any tuple t in r(R)
Note: Other attributes of R may
be similarly constrained to disallow
null values, even though they are not members of the primary key.
3.3
Referential Integrity
A constraint involving two relations
(the previous constraints involve a single relation).
Used to specify a relationship among tuples in two relations: the referencing relation and the referenced relation.
Tuples in the referencing relation R1 have attributes FK (called foreign key attributes) that reference
the primary key attributes PK of the referenced
relation R2. A tuple t1
in R1 is said to reference
a tuple t2 in R2 if t1[FK] = t2[PK].
A referential integrity
constraint can be displayed in a relational database schema as a directed arc
from R1.FK to R2.



4 Update
Operations on Relations
- INSERT a tuple.
- DELETE a tuple.
- MODIFY a tuple.
- Integrity constraints should
not be violated by the update operations.
- Several update operations may
have to be grouped together.
- Updates may propagate to cause other updates automatically. This
may be necessary to maintain integrity constraints.
- In case of integrity violation,
several actions can be taken:
- cancel the operation that
causes the violation (REJECT optiom)
- perform the operation but inform
the user of the violation
- trigger additional updates so
the violation is corrected (CASCADE option, SET NULL option)
- execute a user-specified
error-correction routine
5 The
Relational Algebra
- Operations to manipulate
relations.
- Used to specify retrieval
requests (queries).
- Query result is in the form of
a relation.
Relational Operations:
5.1 SELECT s and PROJECT P operations.
5.2 Set operations: These
include UNION U, INTERSECTION | |, DIFFERENCE -, CARTESIAN PRODUCT X.
5.3 JOIN operations X.
5.4 Other relational
operations: DIVISION, OUTER JOIN, AGGREGATE FUNCTIONS.
5.1 SELECT s and
PROJECT P
SELECT operation (denoted bys ):
- Selects the tuples (rows) from
a relation R that satisfy a certain selection
condition c
- Form of the operation: s c(R)
- The condition c is an arbitrary
Boolean expression on the attributes of R
- Resulting relation has the same attributes as R
- Resulting relation includes
each tuple in r(R) whose attribute values satisfy the condition c
Examples:
s DNO=4(EMPLOYEE)
s SALARY>30000(EMPLOYEE)
s(DNO=4 AND SALARY>25000) OR DNO=5(EMPLOYEE)
PROJECT operation (denoted byP ):
- Keeps only certain attributes
(columns) from a relation R specified in an attribute
list L
- Form of operation: P L(R)
- Resulting relation has only
those attributes of R specified in L
Example: P FNAME,LNAME,SALARY(EMPLOYEE)
- The PROJECT operation eliminates duplicate tuples in the resulting relation so that it remains
a mathematical set (no duplicate elements)
Example: P SEX,SALARY(EMPLOYEE)
If several male employees
have salary 30000, only a single tuple <M, 30000> is kept in the
resulting relation.
Duplicate tuples are eliminated by the P operation.

Sequences of operations:
- Several operations can be
combined to form a relational algebra
expression (query)
Example: Retrieve the names and
salaries of employees who work in department 4:
P FNAME,LNAME,SALARY
(s
DNO=4(EMPLOYEE) )
- Alternatively, we specify explicit intermediate relations for each
step:
DEPT4_EMPS <-s DNO=4(EMPLOYEE)
R <-P FNAME,LNAME,SALARY(DEPT4_EMPS)
- Attributes can optionally be renamed
in the resulting left-hand-side relation (this may be required for some
operations that will be presented later):
DEPT4_EMPS <-s DNO=4(EMPLOYEE)
R(FIRSTNAME,LASTNAME,SALARY)
<-
P FNAME,LNAME,SALARY(DEPT4_EMPS)

5.2 Set Operations
- Binary operations from
mathematical set theory:
INTERSECTION: R1 | | R2,
SET DIFFERENCE: R1 - R2,
CARTESIAN PRODUCT: R1
X R2.
- For U, | |, -, the operand
relations R1(A1, A2, ..., An) and R2(B1,
B2, ..., Bn) must have the same number of attributes, and
the domains of corresponding attributes must be compatible; that is, dom(Ai)=dom(Bi)
for i=1, 2, ..., n. This condition is called union compatibility.
- The resulting relation for U, |
|, or - has the same attribute names as
the first operand relation R1 (by
convention).

CARTESIAN PRODUCT
R(A1, A2, ..., Am, B1, B2,
..., Bn) <-
R1(A1,
A2, ..., Am) X R2 (B1, B2,
..., Bn)
- A tuple t exists in R for each
combination of tuples t1 from R1 and t2 from R2
such that:
t[A1, A2,
..., Am]=t1 and t[B1, B2, ..., Bn]=t2
- If R1 has n1 tuples and R2 has n2
tuples, then R will have n1*n2 tuples.
- CARTESIAN PRODUCT is a meaningless operation on its own. It can combine related tuples from
two relations if followed by the
appropriate SELECT operation .
Example: Combine each DEPARTMENT
tuple with the EMPLOYEE tuple of the manager.
DEP_EMP <-DEPARTMENT X
EMPLOYEE
DEPT_MANAGER <-s MGRSSN=SSN(DEP_EMP)

5.3 JOIN Operations
THETA JOIN: Similar to a CARTESIAN PRODUCT followed by a SELECT.
The condition c is called a join
condition.
R(A1, A2,
..., Am, B1, B2, ..., Bn) <-
R1(A1, A2,
..., Am) X c R2 (B1, B2,
..., Bn)
EQUIJOIN: The join condition c includes one or more equality comparisons involving attributes from R1 and R2.
That is, c is of the form:
(Ai=Bj)
AND ... AND (Ah=Bk); 1<i,h<m, 1<j,k<n
In the above EQUIJOIN
operation:
Ai, ..., Ah
are called the join attributes of R1
Bj, ..., Bk
are called the join attributes of R2
Example of using EQUIJOIN:
Retrieve each DEPARTMENT's
name and its manager's name:
T <-DEPARTMENT XMGRSSN=SSN
EMPLOYEE
RESULT <-P DNAME,FNAME,LNAME(T)
NATURAL JOIN (*):
In an EQUIJOIN R <- R1 X c R2,
the join attribute of R2 appear redundantly in the result relation R. In a NATURAL JOIN,
the redundant join attributes of R2 are eliminated from R. The
equality condition is implied and need not be specified.
R <- R1 *(join attributes
of R1),(join attributes of R2) R2
Example: Retrieve each EMPLOYEE's
name and the name of the DEPARTMENT he/she works for:
T<- EMPLOYEE *(DNO),(DNUMBER)
DEPARTMENT
RESULT <-P FNAME,LNAME,DNAME(T)
If the join attributes have the same names in both relations, they need not be specified and we
can write R <- R1 * R2.
Example: Retrieve each EMPLOYEE's name
and the name of his/her SUPERVISOR:
SUPERVISOR(SUPERSSN,SFN,SLN)<-
P SSN,FNAME,LNAME(EMPLOYEE)
T<-EMPLOYEE * SUPERVISOR
RESULT <-P FNAME,LNAME,SFN,SLN(T)

Note: In the original definition of
NATURAL JOIN, the join attributes were required to have the same names in both relations.
There can be a more than one set of join attributes with a different
meaning between the same two
relations. For example:
JOIN ATTRIBUTES RELATIONSHIP
EMPLOYEE.SSN= EMPLOYEE manages
DEPARTMENT.MGRSSN the
DEPARTMENT
EMPLOYEE.DNO= EMPLOYEE works for
DEPARTMENT.DNUMBER the
DEPARTMENT
Example: Retrieve each EMPLOYEE's
name and the name of the DEPARTMENT he/she works for:
T<-EMPLOYEE XDNO=DNUMBERDEPARTMENT
RESULT <-P FNAME,LNAME,DNAME(T)
A relation can have a set of join attributes to join it with itself :
JOIN ATTRIBUTES RELATIONSHIP
EMPLOYEE(1).SUPERSSN= EMPLOYEE(2) supervises
EMPLOYEE(2).SSN EMPLOYEE(1)
- One can think of this as joining two distinct copies of the relation, although only one
relation actually exists
- In this case, renaming
can be useful
Example: Retrieve each EMPLOYEE's
name and the name of his/her SUPERVISOR:
SUPERVISOR(SSSN,SFN,SLN)<-
P SSN,FNAME,LNAME(EMPLOYEE)
T<-EMPLOYEE XSUPERSSN=SSSNSUPERVISOR
RESULT <-P FNAME,LNAME,SFN,SLN(T)
Complete Set of Relational Algebra Operations:
- All the operations discussed so
far can be described as a sequence of only the operations SELECT, PROJECT,
- Hence, the set {s ,P , U, - , X } is called a complete set of relational
algebra operations. Any query language equivalent
to these operations is called relationally complete.
- For database applications,
additional operations are needed that were not part of the original relational algebra.
These include:
1. Aggregate functions and
grouping.
2. OUTER JOIN and OUTER
5.4 Additional
Relational Operations
AGGREGATE FUNCTIONS
- Functions such as SUM, COUNT,
AVERAGE, MIN, MAX are often applied to sets of values or sets of tuples in
database applications
<grouping attributes> F<function list> (R)
- The grouping attributes are
optional
Example 1: Retrieve the average salary
of all employees (no grouping needed):
R(AVGSAL) <- F
AVERAGE SALARY (EMPLOYEE)
Example 2: For each department,
retrieve the department number, the number of employees, and the average salary
(in the department):
R(DNO,NUMEMPS,AVGSAL) <-
DNO F COUNT SSN, AVERAGE
SALARY (EMPLOYEE)
DNO is called the grouping
attribute in the above example

OUTER JOIN
- In a regular EQUIJOIN or
NATURAL JOIN operation, tuples in R1 or R2 that do not
have matching tuples in the other relation do
not appear in the result
- Some queries require all tuples
in R1 (or R2 or
both) to appear in the result
- When no matching tuples are
found, nulls are placed for the
missing attributes
- LEFT OUTER JOIN: R1 X R2 lets every tuple in R1 appear in
the result
- RIGHT OUTER JOIN: R1
X R2 lets every tuple in R2 appear in
the result
- FULL OUTER JOIN: R1 X R2
lets every tuple in R1
or R2 appear in the result
