Notes
Slide Show
Outline
1
 
2
Chapter 2
  • Database System Concepts and Architecture
3
Outline
  • Data Models and Their Categories
  • History of Data Models
  • Schemas, Instances, and States
  • Three-Schema Architecture
  • Data Independence
  • DBMS Languages and Interfaces
  • Database System Utilities and Tools
  • Centralized and Client-Server Architectures
  • Classification of DBMSs
4
Data Models
  • Data Model:
    • A set of concepts to describe the structure of a database, the operations for manipulating these structures, and certain constraints that the database should obey.
  • Data Model Structure and Constraints:
    • Constructs are used to define the database structure
    • Constructs typically include elements (and their data types) as well as groups of elements (e.g. entity, record, table), and relationships among such groups
    • Constraints specify some restrictions on valid data; these constraints must be enforced at all times
5
Data Models (continued)
  • Data Model Operations:
    • These operations are used for specifying database retrievals and updates by referring to the constructs of the data model.
    • Operations on the data model may include basic model operations (e.g. generic insert, delete, update) and user-defined operations (e.g. compute_student_gpa, update_inventory)
6
Categories of Data Models
  • Conceptual (high-level, semantic) data models:
    • Provide concepts that are close to the way many users perceive data.
      • (Also called entity-based or object-based data models.)
  • Physical (low-level, internal) data models:
    • Provide concepts that describe details of how data is stored in the computer. These are usually specified in an ad-hoc manner through DBMS design and administration manuals
  • Implementation (representational) data models:
    • Provide concepts that fall between the above two, used by many commercial DBMS implementations (e.g. relational data models used in many commercial systems).
7
Schemas versus Instances
  • Database Schema:
    • The description of a database.
    • Includes descriptions of the database structure, data types, and the constraints on the database.
  • Schema Diagram:
    • An illustrative display of (most aspects of) a database schema.
  • Schema Construct:
    • A component of the schema or an object within the schema, e.g., STUDENT, COURSE.
8
Schemas versus Instances
  • Database State:
    • The actual data stored in a database at a particular moment in time. This includes the collection of all the data in the database.
    • Also called database instance (or occurrence or snapshot).
      • The term instance  is also applied to individual database components, e.g. record instance, table instance, entity instance
9
Database Schema
vs. Database State
  • Database State:
    • Refers to the content of a database at a moment in time.
  • Initial Database State:
    • Refers to the database state when it is initially loaded into the system.
  • Valid State:
    • A state that satisfies the structure and constraints of the database.
10
Database Schema
vs. Database State (continued)
  • Distinction
    • The database schema changes very infrequently.
    • The database state changes every time the database is updated.


  • Schema is also called intension.
  • State is also called extension.
11
Example of a Database Schema
12
Example of a database state
13
Three-Schema Architecture
  • Proposed to support DBMS characteristics of:
    • Program-data independence.
    • Support of multiple views of the data.
  • Not explicitly used in commercial DBMS products, but has been useful in explaining database system organization
14
Three-Schema Architecture
  • Defines DBMS schemas at three levels:
    • Internal schema at the internal level to describe physical storage structures and access paths (e.g indexes).
      • Typically uses a physical data model.
    • Conceptual schema at the conceptual level to describe the structure and constraints for the whole database for a community of users.
      • Uses a conceptual or an implementation data model.
    • External schemas at the external level to describe the various user views.
      • Usually uses the same data model as the conceptual schema.
15
The three-schema architecture
16
Three-Schema Architecture
  • Mappings among schema levels are needed to transform requests and data.
    • Programs refer to an external schema, and are mapped by the DBMS to the internal schema for execution.
    • Data extracted from the internal DBMS level is reformatted to match the user’s external view (e.g. formatting the results of an SQL query for display in a Web page)
17
Data Independence
  • Logical Data Independence:
    • The capacity to change the conceptual schema without having to change the external schemas and their associated application programs.
  • Physical Data Independence:
    • The capacity to change the internal schema without having to change the conceptual schema.
    • For example, the internal schema may be changed when certain file structures are reorganized or new indexes are created to improve database performance
18
Data Independence (continued)
  • When a schema at a lower level is changed, only the mappings between this schema and higher-level schemas need to be changed in a DBMS that fully supports data independence.
  • The higher-level schemas themselves are unchanged.
    • Hence, the application programs need not be changed since they refer to the external schemas.
19
DBMS Languages
  • Data Definition Language (DDL)
  • Data Manipulation Language (DML)
    • High-Level or Non-procedural Languages: These include the relational language SQL
      • May be used in a standalone way or may be embedded in a programming language
    • Low Level or Procedural Languages:
      • These must be embedded in a programming language


20
DBMS Languages
  • Data Definition Language (DDL):
    • Used by the DBA and database designers to specify the conceptual schema of a database.
    • In many DBMSs, the DDL is also used to define internal and external schemas (views).
    • In some DBMSs, separate storage definition language (SDL) and view definition language (VDL) are used to define internal and external schemas.
      • SDL is typically realized via DBMS commands provided to the DBA and database designers


21
DBMS Languages
  • Data Manipulation Language (DML):
    • Used to specify database retrievals and updates
    • DML commands (data sublanguage) can be embedded in a general-purpose programming language (host language), such as COBOL, C, C++, or Java.
      • A library of functions can also be provided to access the DBMS from a programming language
    • Alternatively, stand-alone DML commands can be applied directly (called a query language).
22
Types of DML
  • High Level or Non-procedural Language:
    • For example, the SQL relational language
    • Are “set”-oriented and specify what data to retrieve rather than how to retrieve it.
    • Also called declarative languages.
  • Low Level or Procedural Language:
    • Retrieve data one record-at-a-time;
    • Constructs such as looping are needed to retrieve multiple records, along with positioning pointers.
23
DBMS Interfaces
  • Stand-alone query language interfaces
    • Example: Entering SQL queries at the DBMS interactive SQL interface (e.g. SQL*Plus in ORACLE)
  • Programmer interfaces for embedding DML in programming languages
  • User-friendly interfaces
    • Menu-based, forms-based, graphics-based, etc.
24
DBMS Programming Language Interfaces
  • Programmer interfaces for embedding DML in a programming languages:
    • Embedded Approach: e.g embedded SQL (for C, C++, etc.), SQLJ (for Java)
    • Procedure Call Approach: e.g. JDBC for Java, ODBC for other programming languages
    • Database Programming Language Approach: e.g. ORACLE has PL/SQL, a programming language based on SQL; language incorporates SQL and its data types as integral components

25
User-Friendly DBMS Interfaces

    • Menu-based, popular for browsing on the web
    • Forms-based, designed for naïve users
    • Graphics-based
      • (Point and Click, Drag and Drop, etc.)
    • Natural language: requests in written English
    • Combinations of the above:
      • For example, both menus and forms used extensively in Web database interfaces
26
Other DBMS Interfaces
    • Speech as Input and Output
    • Web Browser as an interface
    • Parametric interfaces, e.g., bank tellers using function keys.
    • Interfaces for the DBA:
      • Creating user accounts, granting authorizations
      • Setting system parameters
      • Changing schemas or access paths
27
Database System Utilities
  • To perform certain functions such as:
    • Loading data stored in files into a database. Includes data conversion tools.
    • Backing up the database periodically on tape.
    • Reorganizing database file structures.
    • Report generation utilities.
    • Performance monitoring utilities.
    • Other functions, such as sorting, user monitoring, data compression, etc.
28
Other Tools
  • Data dictionary / repository:
    • Used to store schema descriptions and other information such as design decisions, application program descriptions, user information, usage standards, etc.
    • Active data dictionary is accessed by DBMS software and users/DBA.
    • Passive data dictionary is accessed by users/DBA only.
29
Other Tools
  • Application Development Environments and CASE (computer-aided software engineering) tools:
  • Examples:
    • PowerBuilder (Sybase)
    • JBuilder (Borland)
    • JDeveloper 10G (Oracle)
30
Typical DBMS Component Modules
31
Centralized and
Client-Server DBMS Architectures
  • Centralized DBMS:
    • Combines everything into single system including- DBMS software, hardware, application programs, and user interface processing software.
    • User can still connect through a remote terminal – however, all processing is done at centralized site.
32
A Physical Centralized Architecture
33
Basic 2-tier Client-Server Architectures
  • Specialized Servers with Specialized functions
    • Print server
    • File server
    • DBMS server
    • Web server
    • Email server
  • Clients can access the specialized servers as needed
34
Logical two-tier client server architecture
35
Clients
  • Provide appropriate interfaces through a client software module to access and utilize the various server resources.
  • Clients may be diskless machines or PCs or Workstations with disks with only the client software installed.
  • Connected to the servers via some form of a network.
    • (LAN: local area network, wireless network, etc.)
36
DBMS Server
  • Provides database query and transaction services to the clients
  • Relational DBMS servers are often called SQL servers, query servers, or transaction servers
  • Applications running on clients utilize an Application Program Interface (API) to access server databases via standard interface such as:
    • ODBC: Open Database Connectivity standard
    • JDBC: for Java programming access
  • Client and server must install appropriate client module and server module software for ODBC or JDBC
  • See Chapter 9
37
Two Tier Client-Server Architecture
  • A client program may connect to several DBMSs, sometimes called the data sources.
  • In general, data sources can be files or other non-DBMS software that manages data.
  • Other variations of clients are possible: e.g., in some object DBMSs, more functionality is transferred to clients including data dictionary functions, optimization and recovery across multiple servers, etc.
38
Three Tier Client-Server Architecture
  • Common for Web applications
  • Intermediate Layer called Application Server or Web Server:
    • Stores the web connectivity software and the business logic part of the application used to access the corresponding data from the database server
    • Acts like a conduit for sending partially processed data between the database server and the client.
  • Three-tier Architecture Can Enhance Security:
    • Database server only accessible via middle tier
    • Clients cannot directly access database server
39
Three-tier client-server architecture
40
Classification of DBMSs
  • Based on the data model used
    • Traditional: Relational, Network, Hierarchical.
    • Emerging: Object-oriented, Object-relational.
  • Other classifications
    • Single-user (typically used with personal computers)
      vs. multi-user (most DBMSs).
    • Centralized (uses a single computer with one database)
      vs. distributed (uses multiple computers, multiple databases)
41
Variations of Distributed DBMSs (DDBMSs)
  • Homogeneous DDBMS
  • Heterogeneous DDBMS
  • Federated or Multidatabase Systems
  • Distributed Database Systems have now come to be known as client-server based database systems because:
    • They do not support a totally distributed environment, but rather a set of database servers supporting a set of clients.

42
Cost considerations for DBMSs
  • Cost Range: from free open-source systems to configurations costing millions of dollars
  • Examples of free relational DBMSs: MySQL, PostgreSQL, others
  • Commercial DBMS offer additional specialized modules, e.g. time-series module, spatial data module, document module, XML module
    • These offer additional specialized functionality when purchased separately
    • Sometimes called cartridges (e.g., in Oracle) or blades
  • Different licensing options: site license, maximum number of concurrent users (seat license), single user, etc.
43
History of Data Models
  • Network Model
  • Hierarchical Model
  • Relational Model
  • Object-oriented Data Models
  • Object-Relational Models
44
History of Data Models
  • Network Model:
    • The first network DBMS was implemented by Honeywell in 1964-65 (IDS System).
    • Adopted heavily due to the support by CODASYL (Conference on Data Systems Languages) (CODASYL - DBTG report of 1971).
    • Later implemented in a large variety of systems - IDMS (Cullinet - now Computer Associates), DMS 1100 (Unisys), IMAGE (H.P. (Hewlett-Packard)), VAX -DBMS (Digital Equipment Corp., next COMPAQ, now H.P.).
45
Example of Network Model Schema
46
Network Model
  • Advantages:
    • Network Model is able to model complex relationships and represents semantics of add/delete on the relationships.
    • Can handle most situations for modeling using record types and relationship types.
    • Language is navigational; uses constructs like FIND, FIND member, FIND owner, FIND NEXT within set, GET, etc.
      • Programmers can do optimal navigation through the database.
47
Network Model
  • Disadvantages:
    • Navigational and procedural nature of processing
    • Database contains a complex array of pointers that thread through a set of records.
      • Little scope for automated “query optimization”
48
History of Data Models
  • Hierarchical Data Model:
    • Initially implemented in a joint effort by IBM and North American Rockwell around 1965. Resulted in the IMS family of systems.
    • IBM’s IMS product had (and still has) a very large customer base worldwide
    • Hierarchical model was formalized based on the IMS system
    • Other systems based on this model: System 2k (SAS inc.)
49
Hierarchical Model
  • Advantages:
    • Simple to construct and operate
    • Corresponds to a number of natural hierarchically organized domains, e.g., organization (“org”) chart
    • Language is simple:
      • Uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN PARENT, etc.
  • Disadvantages:
    • Navigational and procedural nature of processing
    • Database is visualized as a linear arrangement of records
    • Little scope for "query optimization"
50
History of Data Models
  • Relational Model:
    • Proposed in 1970 by E.F. Codd (IBM), first commercial system in 1981-82.
    • Now in several commercial products (e.g. DB2, ORACLE, MS SQL Server, SYBASE, INFORMIX).
    • Several free open source implementations, e.g. MySQL, PostgreSQL
    • Currently most dominant for developing database applications.
    • SQL relational standards: SQL-89 (SQL1), SQL-92 (SQL2), SQL-99, SQL3, …
    • Chapters 5 through 11 describe this model in detail
51
History of Data Models
  • Object-oriented Data Models:
    • Several models have been proposed for implementing in a database system.
    • One set comprises models of persistent O-O Programming Languages such as C++ (e.g., in OBJECTSTORE or VERSANT), and Smalltalk (e.g., in GEMSTONE).
    • Additionally, systems like O2, ORION (at MCC - then ITASCA), IRIS (at H.P.- used in Open OODB).
    • Object Database Standard: ODMG-93, ODMG-version 2.0, ODMG-version 3.0.
    • Chapters 20 and 21 describe this model.
52
History of Data Models
  • Object-Relational Models:
    • Most Recent Trend. Started with Informix Universal Server.
    • Relational systems incorporate concepts from object databases leading to object-relational.
    • Exemplified in the latest versions of Oracle-10i, DB2, and SQL Server and other DBMSs.
    • Standards included in SQL-99 and expected to be enhanced in future SQL standards.
    • Chapter 22 describes this model.
53
Summary
  • Data Models and Their Categories
  • History of Data Models
  • Schemas, Instances, and States
  • Three-Schema Architecture
  • Data Independence
  • DBMS Languages and Interfaces
  • Database System Utilities and Tools
  • Centralized and Client-Server Architectures
  • Classification of DBMSs