Organizing Data and Information

2/18/00


Click here to view all slides


Table of Contents

Data

Database (definition 1)
A collection of data organized to meet users’ needs
Database Management System (DBMS)
A group of programs that manipulate the database and provide an interface between the database and the user of the database or other application programs

The Hierarchy of Data (Figure 5.1)The Hierarchy of Data These slides show the data hierarchy: the smallest unit of data is the character (a bit isn't considered data, it's just a building block). A collection of related characters is a field. A collection of related fields is a record. A collection of related records is a file. A collection of related files is a database (defn 2).

Data Entities, Attributes, and KeysMake sure you know these definitions, especially primary key.

Keys and Attributes (Figure 5.2) Picture to illustrate above.

The Traditional Approach (Figure 5.3) Basically, when companies first got computers, they had a team of programmers write billing systems, and another team write a purchasing system. The billing system could compute total revenues, and the purchasing system could compute total Cost of Goods Sold (COGS), but the systems were separate, so product managers had to manually compute profit on each product for which they were responsible. Originally, this was done before DBMS had been developed.

Drawbacks to the Traditional Approach Know for exam. Data redundance, lack of data integrity, program data dependence. Example, at St. Johns, employee applications use traditional approach, so ADDRESS for Prof. Michael Wolfe is stored 3 times. One is in Texas (where I lived when I applied for a job); one is in Bellerose (where I had a temporary place with a landlady who was the title role from Die Fledermaus); one is my current address in Howard Beach. I get mail sent to all 3 addresses, depending on who's sending it. If the program has to be changed, all data files must be changed; if new data must be tracked, all programs using the data files must be re-written.

The Database Approach (Figure 5.4) Note that just buying a database does not gurantee the IT group will actually use the database approach. St. Johns has all 3 apps in the DBMS Oracle, but they are still not integrated. This is the most common situation today: Data have been moved into a DBMS, but apps are still not integrated.

Advantages to the Database Approach (Table 5.1) There are a bunch of them, and they'll all be on the exam. Please read.

Disadvantages to the Database Approach (Table 5.2) Doesn't make a lot of sense. Book lists increased security as advantage and increased vulnerability as disadvantage of DBMS. Actually, DBMS security is usually much better than programmers home-grown code, but, if there is a hole, all data in database is compromised. With traditional approach, generally only data in one application is lost if the application fails. Biggest disadvantage is cost. DBMS is slower, requires more expensive CPU; but biggest cost is that DBMS requires much more training for programmers.

Planned Data Redundancy May list same data more than once for safety or for enhanced performance.

Data Modeling and Database ModelsKnow definitions: data model, enterprise data model, ER diagram.

Hierarchical Database Model (Figure 5.6) Oldest, most efficient model. Know: tree, single parent per child.

A Network Data Model (Figure 5.7)Second model developed. Allows multiple parents per child. Example: categories of transactions. In hierarchical, all transactions must be in single category. Auto taxes must be either auto expense or tax, can't be both. In Network, can put in both categores, and will come up in both totals, but won't be counted twice in total expenditures.

Both hierarchical and network databases are called navigational. Very efficient for routine (MIS) type information processing. Can't do ad hoc (DSS) processing.

A Relational Data Model (Figure 5.8) Slower than navigational. Data stored in tables. Requires much more training for programmers to use effectively (but less training to use badly.) If normalized, allows arbitrary ad hoc queries. If developed by untrained programmers, and not normalized, only pre-specified queries will be possible, just like navigational dbms.

Relational Database Terms Know select, project, join. Select picks certain row(s) from table--Select records for customer number 2301.

Project--selects certain columns. Show only name and amount due, omit rest of data.

Join - join two tables togethere --i.e., have address in one table, amount due in another, join to send bill.

Linking Data Tables to Answer an Inquiry (Figure 5.9)

Building and Modifying a Relational Database (Figure 5.10) Picture of someone using Access.

The Use of Schemas and Subschemas (Figure 5.11)

Schema
A description of the entire database.
Subschema
A file that contains a description of a subset of the database and identifies which users can perform modifications on the data items in that subset.

Using a Data Definition Language (DDL) (Figure 5.12) Database has 3 languages (SQL includes all 3) First have to set up database (DDL). E.g., state that database will include products, quantities, suppliers, costs.

Data Dictionary (Figure 5.13) Key to database. Try to get definitions of all items in database to allow linking, ad hoc queries.

Data Dictionary Features Long list. Know them all.

Logical and Physical Access Paths (Figure 5.14) User (and programmers) don't need to know how dbms actually gets data (physical path). Just ask for data element by name (as per data dictionary).

Manipulating Data 2nd database language, data manipulation language (DML). Actually input (and change) suppliers, products, costs, etc. specified in DDL.

Structured Query Language (Figure 5.15) Picture of sample SQL statement. Again, SQL includes DDL, DML, and QL (Query Language). The Query Language (QL) allows retrieval of information from database. In early databases, dbms designers used 3 teams for DDL, DML and QL, so all were very different.

Database Output (Figure 5.16) Picture of an Access report. Look at Northwind reports.

Popular Database Management Systems for End Users Out of data, except for Access. One of my MBA students uses a dbms written in dBase IV, but then they access the data with Access. Also know selection criterial (long list, but mostly common sense, features, cost, vendor reputation, etc.)

The Use of Distributed Databases (Figure 5.17)Distributed Database A database in which the actual data may be spread across several smaller databases connected via telecommunications devices.

Elements of a Data Warehouse (Figure 5.18)

Data Warehouse
A relational database management system designed specifically to support management decision making. Current evolution of Decision Support Systems (DSSs).
Data Mart
A subset of a data warehouse for small and medium-size businesses or departments within larger companies.

How to Design a Customer Data Warehouse (Table 5.3) 5 steps. Know.

Common Data Mining Applications (Table 5.4) Text says data mining must be fully automated. Data warehouse system finds all pattens without user involvement. Common but not universal definition.

On-Line Analytical Processing (OLAP) Allows user to search data warehouse to search for patterns. Sometimes used interchangeably with data mining.

OLTP, OLAP, and Data Warehousing (Table 5.5) There are questions about differences in 3 concepts. OLTP is the basic on-line transaction processing, goes back 20+ years. OLAP is trying to study on-line data. Just now catching on. OLAP slowed down OLTP, stopped systems from selling to customers.Possible because storage, processors much more powerful today.

Open Database Connectivity (ODBC) (Figure 5.19) Picture is wrong. ODBC is a client server technique that allows applications to access mainframe databases, Oracle, Informix, DB/2, SQL Server, etc.

Object-Relational Database Management Systems (ORDBMS)

Object-Relational Database Management System (ORDBMS)
A DBMS capable of manipulating audio, video, and graphical data. Also, inheritance.
Hypertext
Users can search and manipulate alphanumeric data in an unstructured way. Term coined by Nelson in 1964. Concept developed by Bush in 1945. Click on concept to view links. Originally, text only.
Hypermedia
Allows businesses to search and manipulate multimedia forms of data. Hypertext concept extended to include audio and video.
Spatial Data Technology
Use of an object-relational database to store and access data according to the location it describes and to permit spatial queries and analysis.