NOTES - PART III

The Relational Data Model

(Based on Chapter 7 in

Fundamentals of Database Systems by Elmasri and Navathe, Ed. 3)

 

1   Relational Model Concepts

2   Characteristics of Relations

3   Relational Integrity Constraints

     3.1    Key Constraints

     3.2    Entity Integrity Constraints

     3.3    Referential Integrity Constraints

4   Update Operations on Relations

5   Relational Algebra Operations

     5.1    SELECT s   and PROJECT P

     5.2    Set Operations

     5.3    JOIN Operations

     5.4    Additional Relational Operations


 

1. Relational Model Concepts

 

BASIS OF THE MODEL

 

 

  The relational Model of Data is based on the concept of a Relation.

 

  A Relation is a mathematical concept based on the ideas of sets.

 

  The strength of the relational approach to data management comes from the formal foundation provided by the theory of relations.

 

  We review the essentials of the relational approach in this chapter.

 

__________________________________________________________

 

1.  The relational model is covered in Chapter 7 of the book Fundamentals of Database Systems, by Elmasri and S.B. Navathe, Ed. III, 2000.

 

2.  The model was first proposed by Dr. E.F. Codd of IBM in 1970 in the following paper - "A Relational Model for Large Shared Data Banks," Communications of the ACM, June 1970.

 

 

INFORMAL DEFINITIONS

 

 

 

 

  RELATION:  A table of values

 

  A relation may be thought of as a set of rows.

 

  A relation may alternately be though of as a set of         columns.

 

  Each row of the relation may be given an identifier.

 

  Each column typically is called by its column name or column header or attribute name.

 

 

 


 

FORMAL DEFINITIONS

 

  A Relation may be defined in multiple ways.

  The Schema of a Relation:

    R (A1, A2, .....An)

Relation R is defined over attributes A1, A2, .....An

 

  For Example -

      CUSTOMER (Cust-id, Cust-name, Address, Phone#)

 

Here, CUSTOMER is a relation defined over the four attributes Cust-id, Cust-name, Address, Phone#,  each of which has a domain or a set of valid values.  For example, the domain of Cust-id is 6 digit numbers.

  A tuple is an ordered set of values

  Each value is derived from an appropriate domain.

  Each row in the CUSTOMER table may be called as a tuple in the table and would consist of four values.

<632895, "John Smith", "101 Main St. Atlanta, GA  30332", "(404) 894-2000"> is a triple belonging to the CUSTOMER relation.

  A relation may be regarded as a set of tuples (rows).

  Columns in a table are also called as attributes of the relation.

 


 

FORMAL DEFINITIONS (contd.)

 

  The relation is formed over the cartesian product of the sets; each set has values from a domain; that domain is used in a specific role which is conveyed by the attribute name.

 

  For example, attribute Cust-name is defined over the domain of strings of 25 characters.  The role these strings play in the CUSTOMER relation is that of the name of customers.

 

  Formally,

    Given R(A1, A2, .........., An)

      r(R) subset-of dom (A1) X dom (A2) X ....X dom(An)

 

R:  schema of the relation

r of R:  a specific "value" or population of R.

 

  R is also called the intension of a relation

   r is also called the extension of a relation

 

Let S1 = {0,1}

Let  S2 =  {a,b,c}

 

Let R subset-of S1 X S2

 

for example: r(R) = {<0.a> , <0,b> , <1,c> }

DEFINITION SUMMARY

 

Informal Terms                           Formal Terms

 

Table                                           Relation

Column                                       Attribute/Domain

Row                                             Tuple

Values in a column                      Domain

Table Definition                          Schema of Relation

Populated Table                          Extension

 

 

Notes:

 

Whereas languages like SQL use the informal terms of TABLE (e.g. CREATE TABLE), COLUMN (e.g. SYSCOLUMN variable), the relational database textbooks present the model and operations on it using the formal terms.



 

2 Characteristics of Relations

 

Ordering of tuples in a relation r(R): The tuples are not  considered to be ordered, even though they appear to be in the tabular form.

 

Ordering of attributes in a relation schema R (and of values within each tuple): We will consider the attributes in R(A1, A2, ..., An) and the values in t=<v1, v2, ..., vn> to be ordered .

(However, a more general alternative definition  of relation does not require this ordering).

 

Values in a tuple: All values are considered atomic  (indivisible). A special null value is used to represent values that are unknown or inapplicable to certain tuples.

 

Notation:

-  We refer to component values of a tuple t by t[Ai] = vi (the value of attribute Ai for tuple t).

   Similarly, t[Au, Av, ..., Aw] refers to the subtuple of t containing the values of attributes Au, Av, ..., Aw, respectively.


 

 


 

3 Relational Integrity Constraints

 

Constraints are conditions  that must hold on all  valid relation instances. There are three main types of constraints:

Key constraints, entity integrity constraints, and referential integrity constraints

 

3.1 Key Constraints

 

Superkey of R: A set of attributes SK of R such that no two tuples in any valid relation instance r(R)  will have the same value for SK.  That is, for any distinct tuples t1 and t2 in r(R), t1[SK] <> t2[SK].

Key of R: A "minimal" superkey; that is, a superkey K such that removal of any attribute from K results in a set of attributes that is not a superkey.

Example: The CAR relation schema:

CAR(State, Reg#, SerialNo, Make, Model, Year)

has two keys Key1 = {State, Reg#}, Key2 = {SerialNo}, which are also superkeys. {SerialNo, Make} is a superkey but not  a key.

 

If a relation has several  candidate keys, one is chosen arbitrarily to be the primary key. The primary key attributes are underlined.


 

 


 

3.2 Entity Integrity

 

Relational Database Schema: A set S of relation schemas that belong to the same database. S is the name  of the database.

S = {R1, R2, ..., Rn}

 

Entity Integrity: The primary key attributes  PK of each relation schema R in S cannot have null values in any tuple of r(R). This is because primary key values are used to identify  the individual tuples.

t[PK] <> null for any tuple t in r(R)

 

Note: Other attributes of R may be similarly constrained  to disallow null values, even though they are not members of the primary key.


 

3.3 Referential Integrity

 

A constraint involving two  relations (the previous constraints involve a single  relation).

 

Used to specify a relationship  among tuples in two relations: the referencing relation and the referenced relation.

 

Tuples in the referencing relation  R1 have attributes FK (called foreign key attributes) that reference the primary key attributes PK of the referenced relation  R2. A tuple t1 in R1 is said to reference a tuple t2 in R2 if t1[FK] = t2[PK].

 

A referential integrity constraint can be displayed in a relational database schema as a directed arc from R1.FK to R2.


 

 


 

 


 

 


 

4 Update Operations on Relations

 

-  INSERT a tuple.

-  DELETE a tuple.

-  MODIFY a tuple.

 

-  Integrity constraints should not be violated by the update operations.

 

-  Several update operations may have to be grouped together.

 

-  Updates may propagate  to cause other updates automatically. This may be necessary to maintain integrity constraints.

 

-  In case of integrity violation, several actions can be taken:

   - cancel the operation that causes the violation (REJECT optiom)

   - perform the operation but inform the user of the violation

   - trigger additional updates so the violation is corrected (CASCADE option, SET NULL option)

   - execute a user-specified error-correction routine


 

5 The Relational Algebra

 

-  Operations to manipulate relations.

-  Used to specify retrieval requests (queries).

-  Query result is in the form of a relation.

 

Relational Operations:

5.1    SELECT s   and PROJECT P  operations.

5.2    Set operations: These include UNION  U, INTERSECTION  | |, DIFFERENCE  -, CARTESIAN PRODUCT  X.

5.3    JOIN operations  X.

5.4    Other relational operations: DIVISION, OUTER JOIN, AGGREGATE FUNCTIONS.


 

5.1 SELECT s   and PROJECT P

 

SELECT operation (denoted bys ):

 

-  Selects the tuples (rows) from a relation R that satisfy a certain selection condition  c

-  Form of the operation: s c(R)

 

-  The condition c is an arbitrary Boolean expression on the attributes of R

 

-  Resulting relation has the same attributes  as R

 

-  Resulting relation includes each tuple in r(R) whose attribute values satisfy the condition c

 

Examples:

              s DNO=4(EMPLOYEE)

          s SALARY>30000(EMPLOYEE)

s(DNO=4 AND SALARY>25000) OR DNO=5(EMPLOYEE)


 

PROJECT operation (denoted byP  ):

 

-  Keeps only certain attributes (columns) from a relation R specified in an attribute list  L

 

-  Form of operation: P L(R)

 

-  Resulting relation has only those attributes of R specified in L

 

Example:  P FNAME,LNAME,SALARY(EMPLOYEE)

 

-  The PROJECT operation eliminates duplicate tuples  in the resulting relation so that it remains a mathematical set (no duplicate elements)

Example:  P SEX,SALARY(EMPLOYEE)

If several male employees have salary 30000, only a single tuple <M, 30000> is kept in the resulting relation.

Duplicate tuples are eliminated by the P  operation.


 

 

 


 

Sequences of operations:

 

-  Several operations can be combined to form a relational algebra expression  (query)

Example: Retrieve the names and salaries of employees who work in department 4:

 P FNAME,LNAME,SALARY (s DNO=4(EMPLOYEE) )

 

-  Alternatively, we specify explicit intermediate relations for each step:

      DEPT4_EMPS <-s DNO=4(EMPLOYEE)

   R <-P FNAME,LNAME,SALARY(DEPT4_EMPS)

 

-  Attributes can optionally be renamed  in the resulting left-hand-side relation (this may be required for some operations that will be presented later):

    DEPT4_EMPS <-s DNO=4(EMPLOYEE)

 R(FIRSTNAME,LASTNAME,SALARY) <-

            P FNAME,LNAME,SALARY(DEPT4_EMPS)


 

 


 

 5.2 Set Operations

 

-  Binary operations from mathematical set theory:

   UNION: R1 U R2,

   INTERSECTION: R1 | | R2,

   SET DIFFERENCE: R1  - R2,

   CARTESIAN PRODUCT: R1  X R2.

 

-  For U, | |, -, the operand relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) must have the same number of attributes, and the domains of corresponding attributes must be compatible; that is, dom(Ai)=dom(Bi) for i=1, 2, ..., n. This condition is called union compatibility.

 

-  The resulting relation for U, | |, or -  has the same attribute names as the first  operand relation R1 (by convention).


 

 

 


 

CARTESIAN PRODUCT

R(A1, A2, ..., Am, B1, B2, ..., Bn) <-

          R1(A1, A2, ..., Am) X R2 (B1, B2, ..., Bn)

 

-  A tuple t exists in R for each combination of tuples t1 from R1 and t2 from R2 such that:

  t[A1, A2, ..., Am]=t1 and t[B1, B2, ..., Bn]=t2

 

- If R1 has n1 tuples and R2 has n2 tuples, then R will have n1*n2 tuples.

 

-  CARTESIAN PRODUCT is a meaningless operation  on its own. It can combine related tuples  from two relations if followed by the appropriate SELECT operation .

 

Example: Combine each DEPARTMENT tuple with the EMPLOYEE tuple of the manager.

DEP_EMP <-DEPARTMENT X EMPLOYEE

DEPT_MANAGER <-s MGRSSN=SSN(DEP_EMP)


 

 

 


 

 5.3 JOIN Operations

 

THETA JOIN: Similar to a CARTESIAN PRODUCT followed by a SELECT. The condition c is called a join condition.

R(A1, A2, ..., Am, B1, B2, ..., Bn) <-

      R1(A1, A2, ..., Am)  X c  R2 (B1, B2, ..., Bn)

 

EQUIJOIN: The join condition c includes one or more equality comparisons  involving attributes from R1 and R2. That is, c is of the form:

(Ai=Bj) AND ... AND (Ah=Bk); 1<i,h<m, 1<j,k<n

 

In the above EQUIJOIN operation:

Ai, ..., Ah are called the join attributes of R1

Bj, ..., Bk are called the join attributes of R2

 

Example of using EQUIJOIN:

Retrieve each DEPARTMENT's name and its manager's name:

T <-DEPARTMENT XMGRSSN=SSN EMPLOYEE

RESULT <-P DNAME,FNAME,LNAME(T)


 

NATURAL JOIN (*):

In an EQUIJOIN R <-  R1 X c R2, the join attribute of R2 appear redundantly  in the result relation R. In a NATURAL JOIN, the redundant join attributes  of R2 are eliminated  from R. The equality condition is implied  and need not be specified.

R <- R1 *(join attributes of R1),(join attributes of R2) R2

Example: Retrieve each EMPLOYEE's name and the name of the DEPARTMENT he/she works for:

T<- EMPLOYEE *(DNO),(DNUMBER) DEPARTMENT

RESULT <-P FNAME,LNAME,DNAME(T)

 

If the join attributes have the same names  in both relations, they need not be specified  and we can write R <- R1 * R2.

Example: Retrieve each EMPLOYEE's name and the name of his/her SUPERVISOR:

SUPERVISOR(SUPERSSN,SFN,SLN)<-

                  P