The Data Dictionary
An automated dictionary is one of the most powerful tools available to analysts for documenting the information gathered from their activities. The dictionary can hold these analytical findings and can also be used to develop cross-references and correlations between data, information and business data and process modeling items.
This chapter discusses the functions of a dictionary, how a dictionary is constructed, and the roles of the database administrator and the data administrator in the analytical process.
One of the primary products of the analytical process is documentation. Some large projects produce large amounts of documentation, some of it running into hundreds of pages. Although development methodologies provide some structure to the documentation, even the best methodology and the most meticulous analyst cannot organize, index, and cross-index the information sufficiently to make it entirely useful. There is always the need to see the information differently from the way in which it is presented. Even with the best organization and indexing methods, hardcopy documentation (printed on paper) is difficult to revise and still more difficult to keep up to date.
In order to overcome these and other purely mechanical problems involved with producing and maintaining proper documentation, many firms rely on automated dictionaries. These data processing system products are specifically designed to hold, maintain, and organize analytical information; they come equipped with flexible facilities for producing a wide variety of reports on the dictionary contents.
A Data dictionary is an automated tool for collecting and organizing the detailed information about system components. Data dictionaries maintain facilities to document data elements, records, programs, systems, files, users, and other system components. A dictionary will also have facilities to cross-reference all system components to each other.
Why a Dictionary?
Although automated dictionaries are usually discussed within the context of database administration, data administration, or database in general, it is equally appropriate to discuss them in terms of the process of systems analysis. The reasons for this become obvious when one looks at what they are designed to do and what they are designed to contain.
These dictionaries, usually called data dictionaries or data dictionary/directories, are in reality application systems which have been designed to manage documentation. Although the majority of the documentation within them pertains to data, most of them also contain provisions for documenting the other components of the systems environment: systems, users, reports, forms, functions, processes, etc.
Most people do not look at them as systems but rather as automated tools, usually implemented under a DBMS, which are designed to contain system documentation and to facilitate management of data. Dictionary systems and their files, unlike most systems, do not contain data, but rather they contain "data about data." This "data about data," usually called "metadata," describes and defines the components of the data processing environment and allows for each type of component to be related to every other type of component, thus creating a powerful research, analytical, and cross-reference tool.
Because of their automated nature, and normal ease of updating and maintenance, dictionaries are the primary documentation tool for most well-run data processing organizations. In some cases the dictionary is also the repository for the control and definition information which drives the DBMS. In these cases the DBMS software is constantly interrogating the dictionary for information. Because any changes made to the dictionary are immediately reflected in the operations of the DBMS , and because no change can be made to the DBMS environment except through the dictionary, these "active" dictionaries are an integral part of the operational software of the DBMS product and help to ensure consistency of data definition and usage.
The opposite of an "active" dictionary is a "passive" dictionary. These products provide the same documentation capabilities as their active counterparts but are completely separated from the DBMS product. Because of these differences between the two types of dictionary products, active dictionaries are normally provided by the DBMS vendor, whereas passive dictionaries may be provided by the DBMS vendor or by an independent product vendor.
What Is Data Administration?
Within most companies where a DBMS product is installed, there is a functional unit or organization which is usually part of the development support environment and which is assigned responsibility for data management and for updating and maintaining the dictionary. These organizations usually have the additional responsibility for the functions of "data analysis" and "logical database design."
The name given to this part of the organization is Data Administration. Generically, data administration is that organization which is assigned responsibility for gathering information about corporate data; for verifying the definition, source, and usage of that data; and for preserving that information for the company and making it available in a timely way to all company personnel who may need it.
Normally data administration is not responsible for the capture, verification, processing, storage, retrieval, or dissemination of company data, only for the data about data, the definitions, descriptions, and so on. Within the data administration purview would also be the responsibility for determining firm wide consensus on coding structures, naming conventions, or naming standards.
Data administrators work with analysts to define both the data elements required for the business function and the structure of that data. Data administrators assist analysts in the process of data analysis and in documenting that analysis into the dictionary.
Since all information processing systems require the acquisition and manipulation of data, most analysis is devoted to identifying that data, its sources and uses, and the processes applied to it. Because of the critical role data plays in most firms, data administration has also become very important, and data administrators routinely play an important role in assisting the analysts with their documentation. In effect data administration has become the documentation center of most firms. In many cases the data administration function has also been assigned the responsibility for developing the system methodologies and for ensuring that they are followed.
One final activity of the data administrators with the aid of the dictionary is to coordinate the integration of the data models and data files of the firm. That is to aid in bringing a variety of separate parts (models, files and tables) into a harmonious whole. This is necessary since in most organizations the models and systems are developed by different development teams for many different reasons. The data administration organization with its dictionary aids in coordinating the merging of those parts and in developing standardized forms (normally developed by consensus) of common elements.
Because most firms have some form of automated dictionary, it is appropriate to discuss how they are designed and thus what they are capable of containing. Most data dictionaries are composed of a number of relational database tables, which store the documentation for application systems and their definition components. A simplified conceptual model of automated dictionary documentation categories and relationships is shown in Figure 8.1. Data dictionary are usually designed such that each table or set of tables contains data about some aspect of the systems environment. Each table is thus tailored to document a specific information system or business model component. These components form the bulk of the firm's documentation needs for its automated and business systems.
Part of the power of the dictionary is derived from its ability to relate various pieces of documentation to other pieces of documentation.
Although most dictionary systems are capable of documenting each major component, not all firms insist that each part be used.
With the rise in use of CASE tools, and the rise in the use of process, data and other types of models as analysis and development tools, dictionaries have been expanded beyond their original scope to include the wide variety of items contained in those models
Information Systems Components
Data elements are among most detailed piece of information systems documentation within the dictionary. Data elements, (referred to as fields when they are contained within files and records, and columns or data items when contained within rows of relational tables) are the lowest unit of meaningful information within the business, and by extension, within the data processing environment (Figure 8.2). Data elements have characteristics called properties: size, shape, format, value range, content validity, and location. In addition data elements have a business meaning. Data elements can be natural, coded, or derived.
When dealing with coded elements the analyst must first identify which elements are coded, then determine if the complete code list exists, verify that list, and determine if the coding structure needs to be expanded.
Net profit = gross sales - (expense of sales + overhead expense)
Here the analyst must determine which data elements are derived, document the formula for derivation, document the functions of that formula, and determine if the formula needs to be modified or if it can be simplified or redefined.
Data elements are related within the dictionary to all other Information system components and Business System Components
If the data element is the most detailed documentation item within the dictionary, then the next most detailed items are records (or table rows). Data records come in a variety of forms. In essence any aggregate of data elements can be considered a record and any aggregate of columns or data items can be considered a row. A file may contain many different kinds of data records. A table may contain only one type of rows, all identically constructed. A record can also be a single line of a report, a single form in a folder, a single folder in a file.
Data records may contain as few as one data element and there is no practical upper limit in terms of maximum data elements. Documentation of records consists of descriptions of the record, its identification by name, type, use, source, distribution, frequency of creation, etc. As with data elements, within the dictionary records are related to all other Information system components and Business System Components
Date files or data bases
Just as many different types of data elements are aggregated to records, tables and forms, so too, many different types of data records, tables and forms are aggregated to data files or data bases. Data files (or data bases) are in fact collections of like or records (or tables). Data files normally contain all occurrences of data records on a particular subject, such as all employees, all payroll records, all accounts, all securities, etc. Similarly, a database will contain all tables on a subject
Data files (or data bases) may contain as few as one record (or table), and there is no practical upper limit in terms of maximum records or tables. These ranges apply both to the types of records and to the numbers of occurrences of each type. Data files, like their component records, are named, described, and identified as to source, use, ownership, frequency of use, storage medium, location, etc. Data files are related within the dictionary to all other Information system components and Business System Components.
Screens (also known as panels, forms mor pages) may be treated in the dictionary either as separate items, records or files depending the approach taken by the Data Administration organization and depending on the capabilities of the dictionary. A screen is a collection of items normally consisting of paired labels and data fields used for data entry, data display or both. On some cases either the label or the data field may be omitted. Data entered to the screen or displayed on the screen may come from the firm’s database or files or it may be system generated.
Screens may contain as few as one data element or many. Dictionary entries representing the screen consist of an entry for the screen itself naming the screen, describing its contents and use, who may use it, when and why it is used, where the date comes from and where it goes. In addition to the standard data element documentation associated with the each field on the screen the dictionary normally includes the screen label text, screen coordinate information (where on the screen the field is located) and display editing associated with the data, any editing associated with the field that occurs on initial entry, and where the field is stored after entry and validation.
If a screen was developed as part of a set of screens for entering or displaying data , or both, the first screen of the set (or display sequence) should document the entire sequence, or screen hierarchy from first to last, and any on-screen capabilities to move between screens, forward and back, move to first screen, move to last screen, exit, or other options.
In some cases screens have passwords associated with them for either data entry or display. Screens may also have non-displayed or hidden fields which are used by the programs which manipulate them but which are not visible to the user, These passwords and other hidden fields must also be documented and associated with the screen. If a screen was generated by a specific tool, or used components from a specific Graphic User Interface (GUI) tool kit, the tool kit and the components used must be identified, documented and associated with the screen.
Screens are related within the dictionary to all other Information system components and Business System Components.
Graphic User Interface (GUI) is the term given to that set of items and facilities which provide the user with a graphic means for manipulating screen data rather than being limited to character based commands. Graphic User Interface tool kits are provided by many different vendors and contain a variety of components including (but are not limited to) tools for creating and manipulating:
Graphic User Interface tool kits may also provide facilities for using a mouse to locate and manipulate on screen data and activate program components.
Reports or forms may be treated in the dictionary either as separate items, records or files depending the approach taken by the Data Administration organization and depending on the capabilities of the dictionary. A report is a collection of lines each containing data items and or data field labels or other. Data displayed on a report may come from the firm’s database or files or it may be system generated.
A report may contain as few as one data element or many. Dictionary entries representing the report consist of an entry for the report itself naming the report, describing its contents and use, who may use it, when and why it is used, where the date comes from and who receives copies of the report it. In addition to the standard data element documentation associated with the each field on the report the dictionary normally includes the row and column heading label text and display editing or formatting associated with the data.
Reports are related within the dictionary to all other Information system components and Business System Components.
Business Systems Components
Data elements, data records, and data files do not exist in a vacuum. They must be associated with procedures which define how they are to be used, who uses them, how they are to be processed, when they are processed, where the data comes from and where it goes, how long it is to be retained, how it is to be verified and validated, and how to handle exceptions to normal processing rules.
A procedure then is a formalized method of accomplishing a given task or set of tasks. It sets down, in a step-by-step fashion, that which must be accomplished, when it must be accomplished, how it must be accomplished, and by whom it must be accomplished. Procedures detail methods of handling inputs and generating outputs, or simply methods for analyzing file and record contents and making decisions based upon that analysis.
Automated procedures, programs, and program fragments (called modules) provide the detailed step-by-step instructions which direct the computer's central processors to take the appropriate actions on the input data, and to manipulate that data to produce the desired outputs. Procedures, both manual and automated, contain the rules and guidelines for accomplishing a given task or set of tasks. Procedures are related within the dictionary to all other Information system components and Business System Components
A system is a collection of procedures, automated, manual, or both, which has been assembled and organized to accomplish the tasks, activities, and processes associated with a user function or subfunction.
Systems are documented as to what processes, procedures and activities are encompassed by them, Who is responsible for their performance, who initiates them, who supplies the inputs, who receives the outputs, what files are associated with them, what their purpose is, and what functional areas they support. Systems are related within the dictionary to all other Information system components and Business System Components
A user is usually someone outside the data processing area for whom a project is undertaken. Users have primary corporate responsibility for the accomplishment of functions, processes, and business activities. Users supply analysts with the background, and general and detailed knowledge about the tasks which need to be performed and about the business problems which need to be solved.
Users are documented in terms of their functional responsibility, the systems that service them, their reporting structure (to other users), the files which they are responsible for, the transactions they generate or process, and the reports they generate or use. Users are related within the dictionary to all other Information system components and Business System Components
Data and Process Model Components
Entities are developed as components of data models. Within the data models each entities represent one of the following: a person, a place, a thing, or an event. In case tools developed using CASE tools entities may also represent tables and or files. Entity documentation within the dictionary normally consists of its name and description or definition, an narrative describing why this entity is important to the firm, what role it plays within the firm, and a general discussion of the kinds of data that the firm must collect about this entity, and why.
Additional documentation can consist of size and growth rates, and narratives which discuss the business rules that apply for determining membership in the entity group.
Entities are related within the dictionary to all other Business Data and Process Model components, to all Business System components, to many Information System components.
Attributes are developed as a component of data models. Like data elements attributes are among the most detailed piece of documentation within the dictionary. Attributes are the lowest unit of meaningful information within the business data. Like data elements, attributes have characteristics called properties: size, shape, format, value range, content validity, and location. In addition each attribute has a business meaning, and like data elements, attributes can be natural, coded, or derived.
Net profit = gross sales - (expense of sales + overhead expense)
Attributes are related within the dictionary to all other Business Data and Process Model components, to all Business System components, to many Information System components.
Relationships are components of business data models. Relationships connect entities within a data model. Each relationship must connect a pair of entities within a model. The dictionary documentation must indicate which entities this relationship connects.
Relationships can be viewed from either of the associated entities. The names assigned to the relationship within a model are usually different depending upon which entity is viewing the relationship (e.g. record contains data element and data element is contained within record. Since not all CASE tools and thus not all models support these bi-directional names in ma y cases only one name is available.
Relationships are assigned names which reflect business rules if the firm. The description or definition of the relationship should include a discussion of the appropriate business rules associated with this relationship, its source, and relevant citations (formal sources) if available (e.g. Policies and Procedures Manual, Human Resources Manual, etc.
Relationships are related within the dictionary to all other Business Data and Process Model components, to all Business System components, to many Information System components.
The production of data and process models have become a standard part of most systems analysis projects. Since the components of the models are usually documented within the dictionary, it is highly recommended that the model itself be documented as well. Model documentation consists of the name and description of the model, the name and version of the CASE tool that produced it, the location of the model files, the date the model was produced and the name of the organization responsible for producing and maintaining it.
Models are related within the dictionary to all other Business Data and Process Model components, to all Business System components.
A project is a set of tasks or activities, governed by a fixed schedule , with a fixed set of products established and staffed to accomplish a specific goal. Normally dictionaries document the systems analysis and system development projects that generate and their products. Any Business Process Reengineering projects and their products may also be documented in the dictionary.
Project documentation consists of the name and description of the project, the date the project was initiated, its expected completion date, the location of the project plan, documentation on any relevant milestones or critical dates associated with the project, and the name of the project manager, the names and titles of other critical project personnel, and the organization(s) responsible for project staffing, monitoring and review, and funding.
Projects are related within the dictionary to all other Business Data and Process Model components, to all Business System components.
A function is a series of related activities, involving one or more entities, performed for the direct or indirect purpose of fulfilling one or more missions or objectives of the firm, generating revenue for the firm, servicing the customers of the firm, producing the products and services of the firm, or managing, administering, monitoring, recording, or reporting on the activities, states, or conditions of the entities of the firm.
Function documentation consists of the name and description of the function, the name of the organization responsible for producing and maintaining it. Unless separately documented in the dictionary function narratives should contain discussions of the functions associated missions and objectives, a description of the customers it serves, and the activities which it performs
Functions are related within the dictionary to all other Business Data and Process Model components, to all Business System components.
A process is a sequence of related activities, or it may be a sequence of related tasks which make up an activity. These activities or tasks are usually interdependent, and there is a well-defined flow from one activity to another or from one task to another.
Process documentation consists of the name and description of the process, the name of the organization responsible for it. Unless separately documented in the dictionary process narratives should contain discussions of the activities associated with it.
Processes are related within the dictionary to all other Business Data and Process Model components, to all Business System components.
Other dictionary contents
Dictionaries may also be used to document activities, and tasks independently of the functions, processes and user areas themselves. Other items and categories of information within these automated dictionaries can be the organizational units and structure of the firm itself, and the physical resources of the firm as they relate to information processing. In some cases the dictionaries have been used to contain the standards, guidelines, and procedures which govern the development process itself, as well as information relating to the types and locations of documentation not contained in the dictionary.
How Does a Dictionary Function?
Automated dictionaries are fundamentally application systems. In this case they are applications which assist the development unit itself. As with any other system, especially purchased ones, they may have to be modified by the specific firm for its own use.
A dictionary consists of a collection of files designed to store dictionary data and a collection of programs, batch, on-line, or both, which perform the tasks of data input, data manipulation, and data presentation. A dictionary may have programs and procedures associated with it which were written for the specific installation.
Generally speaking the process of dictionary use begins with the gathering of information about some aspect of data or other category of information within the dictionary. This is written down, verified, and codified to suit the particular dictionary being used. (See Figure 8.2 for an example of a dictionary element data collection form.)
After the data are gathered, they are either given to the data administrator for entry or may be entered by the analyst or the user, assuming he or she has sufficient training in dictionary use.
Once the primary entry is in the dictionary the analyst or data administrator begins the process of editing the information and associating that entry with every other known entry to which the new entry is related. As new entries or new information about old entries are gathered, they are also entered into the dictionary and related in a similar manner.
How Can the Dictionary Assist the Analysis Function?
Progress reporting and documentation
The data administration staff produces periodic reports for the analyst listing all entries associated with the analyst or the user to date and their cross reference information. These reports can be produced in any order, and from any vantage point within the dictionary configuration. That is by element, by record, by file, by procedure, by system, by user, or by any attribute of any of the above. These reports may be all-inclusive or may be restricted to some subset of the data entries.
Because of the wide variety of entries which can be stored in the dictionary and because of its ability to store narrative definition on each entry, the dictionary can act as a primary documentation tool for the analyst. The analyst may also use the on-line capabilities of the dictionary to check on any reference.
Because the dictionary contains not only data entered by the analyst doing the research but all data entered by other analysts, and about all systems, the analyst can use either the on-line or batch-reporting capabilities of the dictionary to browse its contents. Using this capability the analyst can determine if any of the entries about to be made are already entered, in which case only the additional information need be added.
In some cases the analyst may be searching for some particular data to service a user, and the dictionary can be used to determine if it is already being captured, and if so, by whom, and when, and where.
Conversely if the analyst is seeking to determine the impact of a proposed change, the dictionary can be used to find all known users and uses of the item to be changed.
Dictionaries, Encyclopedias and Repositories.
In order to discuss dictionaries, encyclopedias and repositories we need to develop a comparative set of definitions. The definition of a dictionary below is different form that presented above to facilitate the discussion. The dictionary  defines each of these works in the following manner:
A dictionary is a reference book containing words usually alphabetically arranged along with information about their forms, pronunciations, functions, etymologies, meanings and syntactical and idiomatic uses.
A reference book listing alphabetically terms or names important to a particular subject or activity along with discussions of their meanings and applications.
An encyclopedia is a work that contains information on all branches of knowledge or treats comprehensively a particular branch of knowledge, usually in articles arranged alphabetically, often by subject
A repository is a place, room or container where something is deposited or stored. Something that contains or stores something non-material such as knowledge.
Before the advent of CASE tools documentation about systems components, such as functions, data, records, files, users, programs and the like were stored in textual form in complex files labeled by their vendors as Data Dictionary/Directory systems. These systems stores definition s and descriptions of all these components in as much detail as the particular installation wanted. The minimum stored was a definition, a standard name and some physical descriptors, such as location, size, frequency of occurrence, etc. Most dictionaries were mainframe resident, text based and resident on a database management system. That is they relied on a database management system to manage their files.
Because they were DBMS based, many vendors provided the capability to extend the contents of their products. They provided a basic set of entries and a basic set of attributes for each entry and provided facilities for the user to create their own entry types and to assign attributes to those entry types. Thus the user could customize his dictionary and populate it with a wide variety of entries. Dictionaries provided reporting capabilities and some ability to extract information from their files and generate files to export data to other products. For the most part though this export capability was limited to the generation of various code segments for use in system development, and code blocks for use in generating new databases.
Given the above we can develop a more specific definition for a Data Dictionary:
A data dictionary is an automated tool for collecting and organizing the detailed information about system components. Data dictionaries maintain facilities to document data elements, records, programs, systems, files, users, and other system components. A dictionary will also have facilities to cross-reference all system components to each other. A data dictionary may be independent of or tightly associated with a DBMS product.
CASE tool vendors required some method of storing the information they captured during the model building process. These model objects included such things as entities, relationships, attributes, processes, functions, data stores, data flows, record and screen layouts, pseudo-code text, and other design information items. To distinguish the place where they stored their information form the data dictionaries already in use by the organization, they called these storage places encyclopedias.
Encyclopedias differed from data dictionaries in several ways. First the only way to populate an encyclopedia was through the CASE tool during the model building process. Most encyclopedias contained only that information which the CASE vendor deemed necessary to document and maintain the models as well as coordinate information to help draw the models on the screens. Because they were an integral part of the CASE tools, encyclopedias tend to be rigidly controlled by the vendors, and although some had fairly flexible reporting facilities, almost none provided the user with the ability to modify or extend the contents.
Anencyclopedia is an integral part of a CASE product and is designed by the CASE tool vendor specifically to collect and organize the detailed information about the data and process model system components developed using the CASE tool components. CASE encyclopedias maintain facilities to document attributes, entities and relationships, functions and processes, screens and reports, data flows, data stores, missions, goals, objectives, critical success factors, users and organization structures, and other data and process model components.
An encyclopedia will have facilities to cross-reference all components of its data and process models. A specific vendor’s encyclopedia cannot normally operate independently of its associated CASE tool.
Encyclopedias were tightly linked to the CASE tools. Dictionaries were DBMS linked and were limited to text based information. Industry representatives, pundits and vendors looked for a method to store the wide variety of development and production information generated by an active development organization, including information or reports, screens, files, DBMS information, meta-data, documents and possibly even data files. Two separate groups developed specifications for such a product
The first specification was developed by the European Computer Manufacturers Association (ECMA) and was know as the Portable Common Tool Environment (PCTE) specification. The PCTE specification has been supported by the National Institute for Standards and Technology (NIST). The PCTE is:
The second specification was developed by a Committee of the American National Standards Institute (ANSI) and is know as the Information Resource Dictionary System (IRDS). IRDS is also supported by the National Institute for Standards and Technology (NIST). The IRDS is:
PCTE-based repository can incorporate an IRDS repository with in it, but not the reverse.
CASE translators, integrators and busses
IRDS-based repositories are sold with optional translators, (also know as CASE integrators, or CASE busses) which provide the capability to extract data from one or more CASE tool encyclopedias and to load that data into the repository itself. These facilities are constrained by the availability of data from the CASE tools themselves (many of which have proprietary file (encyclopedias or otherwise). These differences, vendor secrecy, and lack of recognized file transfer standards cause the following problems:
A repository is independent of both DBMS and CASE products and designed to collect and organize all analysis and design information regardless of source. Repositories can collect information about the components of the data and process model components developed using the CASE tools, as well as that information collected about non-model Information systems components such as data elements, records, programs, systems, files, and users. Repositories maintain facilities to document attributes, entities and relationships, functions and processes, screens and reports, data flows, data stores, missions, goals, objectives, critical success factors, users and organization structures, and other data and process model components.
Webster’s II New Riverside University Dictionary, Riverside Publishing Company, Boston, MA 1984
A Professional's Guide to Systems Analysis, Second Edition
Written by Martin E. Modell
Copyright © 2007 Martin E. Modell
All rights reserved. Printed in the United States of America. Except as permitted under United States Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a data base or retrieval system, without the prior written permission of the author.