Table of Contents
The Hierarchy of Data (Figure 5.1)The Hierarchy of DataThese slides show the data hierarchy: the smallest unit of data is the character (a bit isn't considered data, it's just a building block). A collection of related characters is a field. A collection of related fields is a record. A collection of related records is a file. A collection of related files is a database (defn 2). Data Entities, Attributes, and KeysMake sure you know these definitions, especially primary key. Keys and Attributes (Figure 5.2) Picture to illustrate above. The Traditional Approach (Figure 5.3) Basically, when companies first got computers, they had a team of programmers write billing systems, and another team write a purchasing system. The billing system could compute total revenues, and the purchasing system could compute total Cost of Goods Sold (COGS), but the systems were separate, so product managers had to manually compute profit on each product for which they were responsible. Originally, this was done before DBMS had been developed. Drawbacks to the Traditional Approach Know for exam. Data redundance, lack of data integrity, program data dependence. Example, at St. Johns, employee applications use traditional approach, so ADDRESS for Prof. Michael Wolfe is stored 3 times. One is in Texas (where I lived when I applied for a job); one is in Bellerose (where I had a temporary place with a landlady who was the title role from Die Fledermaus); one is my current address in Howard Beach. I get mail sent to all 3 addresses, depending on who's sending it. If the program has to be changed, all data files must be changed; if new data must be tracked, all programs using the data files must be re-written. The Database Approach (Figure 5.4) Note that just buying a database does not gurantee the IT group will actually use the database approach. St. Johns has all 3 apps in the DBMS Oracle, but they are still not integrated. This is the most common situation today: Data have been moved into a DBMS, but apps are still not integrated. Advantages to the Database Approach (Table 5.1) There are a bunch of them, and they'll all be on the exam. Please read. Disadvantages to the Database Approach (Table 5.2) Doesn't make a lot of sense. Book lists increased security as advantage and increased vulnerability as disadvantage of DBMS. Actually, DBMS security is usually much better than programmers home-grown code, but, if there is a hole, all data in database is compromised. With traditional approach, generally only data in one application is lost if the application fails. Biggest disadvantage is cost. DBMS is slower, requires more expensive CPU; but biggest cost is that DBMS requires much more training for programmers. Planned Data Redundancy May list same data more than once for safety or for enhanced performance. Data Modeling and Database ModelsKnow definitions: data model, enterprise data model, ER diagram. Hierarchical Database Model (Figure 5.6) Oldest, most efficient model. Know: tree, single parent per child. A Network Data Model (Figure 5.7)Second model developed. Allows multiple parents per child. Example: categories of transactions. In hierarchical, all transactions must be in single category. Auto taxes must be either auto expense or tax, can't be both. In Network, can put in both categores, and will come up in both totals, but won't be counted twice in total expenditures. Both hierarchical and network databases are called navigational. Very efficient for routine (MIS) type information processing. Can't do ad hoc (DSS) processing.A Relational Data Model (Figure 5.8) Slower than navigational. Data stored in tables. Requires much more training for programmers to use effectively (but less training to use badly.) If normalized, allows arbitrary ad hoc queries. If developed by untrained programmers, and not normalized, only pre-specified queries will be possible, just like navigational dbms. Relational Database Terms Know select, project, join. Select picks certain row(s) from table--Select records for customer number 2301. Project--selects certain columns. Show only name and amount due, omit rest of data. Join - join two tables togethere --i.e., have address in one table, amount due in another, join to send bill. Linking Data Tables to Answer an Inquiry (Figure 5.9) Building and Modifying a Relational Database (Figure 5.10) Picture of someone using Access. The Use of Schemas and Subschemas (Figure 5.11)
Using a Data Definition Language (DDL) (Figure 5.12) Database has 3 languages (SQL includes all 3) First have to set up database (DDL). E.g., state that database will include products, quantities, suppliers, costs. Data Dictionary (Figure 5.13) Key to database. Try to get definitions of all items in database to allow linking, ad hoc queries. Data Dictionary Features Long list. Know them all. Logical and Physical Access Paths (Figure 5.14) User (and programmers) don't need to know how dbms actually gets data (physical path). Just ask for data element by name (as per data dictionary). Manipulating Data 2nd database language, data manipulation language (DML). Actually input (and change) suppliers, products, costs, etc. specified in DDL. Structured Query Language (Figure 5.15) Picture of sample SQL statement. Again, SQL includes DDL, DML, and QL (Query Language). The Query Language (QL) allows retrieval of information from database. In early databases, dbms designers used 3 teams for DDL, DML and QL, so all were very different. Database Output (Figure 5.16) Picture of an Access report. Look at Northwind reports. Popular Database Management Systems for End Users Out of data, except for Access. One of my MBA students uses a dbms written in dBase IV, but then they access the data with Access. Also know selection criterial (long list, but mostly common sense, features, cost, vendor reputation, etc.) The Use of Distributed Databases (Figure 5.17)Distributed Database A database in which the actual data may be spread across several smaller databases connected via telecommunications devices. Elements of a Data Warehouse (Figure 5.18)
How to Design a Customer Data Warehouse (Table 5.3) 5 steps. Know. Common Data Mining Applications (Table 5.4) Text says data mining must be fully automated. Data warehouse system finds all pattens without user involvement. Common but not universal definition. On-Line Analytical Processing (OLAP) Allows user to search data warehouse to search for patterns. Sometimes used interchangeably with data mining. OLTP, OLAP, and Data Warehousing (Table 5.5) There are questions about differences in 3 concepts. OLTP is the basic on-line transaction processing, goes back 20+ years. OLAP is trying to study on-line data. Just now catching on. OLAP slowed down OLTP, stopped systems from selling to customers.Possible because storage, processors much more powerful today. Open Database Connectivity (ODBC) (Figure 5.19) Picture is wrong. ODBC is a client server technique that allows applications to access mainframe databases, Oracle, Informix, DB/2, SQL Server, etc. Object-Relational Database Management Systems (ORDBMS)
|