Developing the Data Model

Contact Martin Modell   Table of Contents

Within the context of data framework development, there are two key components. The first, the data model, is discussed in this chapter; the other, the data dictionary, is discussed in the section on tools.

The data model must represent all data components of the new system. Although many methodologies restrict the data model to files, a complete data model must include not only the data files (both private and common data resource) but also the documents which bring raw data into the firm, the documents which transport data between various areas of the firm and the documents which take data from the firm. These documents, although their contents are usually also included in the data files, are distinct entities, and must be accounted for in the design process.

The data model for the framework level of the design does not, and usually cannot, have very much detail associated with it. This design is developed during the general business design portion of the design phase. It includes an identification of the major private data and common data resource files and a brief description of what each file is expected to contain. It also includes an identification of each of the documents, forms and reports which are expected to be a component of the system and what they are expected to contain. The model itself contains each of these data entities and their expected or desired relationships to each other. The relationships may be expressed in any terms which represent how the individual components relate to each other, or in some cases how they create a relationship between two other entities within the model.

In most cases the incoming document entities are processed and stored in file entities. The relationships between source document and stored data file must be represented in the data framework model. The data framework model should not contain any processing-specific information, nor any explicit processing logic. However, it will indicate where documents must be processed prior to being filed, or where files are processed to produce documents. It should not indicate any processing which does not introduce new data, documents or files into the system.
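The requirement that source documents be related to the files which store them can be checked mechanically once the model is written down. The following is a minimal Python sketch and not part of the original method; the document and file names, the `stored_in` list and the function `incoming_without_file` are all hypothetical and chosen only for illustration.

```python
# Hypothetical data entities for a small order-handling system.
# Each document is tagged by the direction in which it carries data.
documents = {
    "Order Form": "incoming",          # brings raw data into the firm
    "Picking Slip": "internal",        # moves data between areas of the firm
    "Invoice": "outgoing",             # takes data from the firm
    "Credit Application": "incoming",
}
files = {"Order File", "Customer File"}

# Relationships between a source document and a file that stores its data.
stored_in = [
    ("Order Form", "Order File"),
    ("Order Form", "Customer File"),
]


def incoming_without_file(documents, files, stored_in):
    """Return incoming documents that have no relationship to any known file."""
    filed = {doc for doc, file in stored_in if file in files}
    return sorted(
        name
        for name, direction in documents.items()
        if direction == "incoming" and name not in filed
    )


print(incoming_without_file(documents, files, stored_in))
```

Because incoming documents are stored in files only "in most cases," a non-empty result is a prompt for review rather than an error.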

One of the primary modeling techniques for data framework development is the Entity-Relationship model.

The development of the Entity-Relationship model is a multilevel process in which each level produces a clearer and better-defined view of the proposed environment. The first two of the four Entity-Relationship model levels (figure 12-1) compose the data framework. These are the enterprise level model and the entity-relationship level model. The remaining two models (the entity-relationship-attribute level model and the data element level model) can only be developed when more is known about the specific procedural processing which is to occur within the system. The complete modeling effort results in a series of leveled environmental descriptions, along with a diagrammatic representation of each level. These diagrammatic representations, the ER models, are descriptions of the entities, or real components, of the business environment and how they relate to each other.

On a regular basis the firm deals with customers, products, employees, places, orders, shipments, etc. The firm collects data about these things, these entities, and stores that data in files. With few exceptions, it is the common data resource files of the firm which hold the descriptions of those entities in the form of collections of data items.

The ER models are not data structure models, but data descriptions or representations of the entities of interest to the firm. These data descriptions also correspond to the data contents of the files and thus serve well as a mechanism for developing our data framework. Although at their most detailed level they contain and identify data elements, they are not data processing models. They are business models, and as such they model business environments and depict business components.

Entity-Relationship models consist of representations of the various levels and parts of the organization, from the strategic to the operational levels. Each of these leveled models represents the entities and relationships from the perspective of that level, and within a level the Entity-Relationship models represent the perspective of one or more particular users at that level.

Basic Entity-Relationship Notation

Although there are numerous variations of the Entity-Relationship Approach model notation, the three basic notational components of the Entity-Relationship model are symbols representing an entity, a relationship between two entities, and the attributes, or descriptors, of either entities or relationships.

These symbols, as shown in figure 12-1, are:

  1. Rectangles - Each unique entity type or entity subtype is represented by a rectangle which contains the name of that unique entity type or entity subtype.
  2. Diamonds - Each relationship which exists between any two different entities or between two occurrences of the same entity is represented by a diamond which contains the name of that relationship.
  3. Circles - Each unique attribute of either an entity or a relationship is represented by a circle which contains the name of that attribute.
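The three notational components can also be written down as simple data structures. The following Python sketch is illustrative only; the class names `Entity`, `Relationship` and `Attribute`, the method `is_recursive`, and the sample entities are assumptions made for this example and do not come from any particular ER tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribute:
    """A circle: a named descriptor of an entity or a relationship."""
    name: str


@dataclass
class Entity:
    """A rectangle: a named entity type or entity subtype."""
    name: str
    attributes: list[Attribute] = field(default_factory=list)


@dataclass
class Relationship:
    """A diamond: a named relationship between two entities, or between
    two occurrences of the same entity."""
    name: str
    left: Entity
    right: Entity
    attributes: list[Attribute] = field(default_factory=list)

    def is_recursive(self) -> bool:
        # True when both ends of the diamond attach to the same rectangle.
        return self.left is self.right


# Hypothetical sample model.
customer = Entity("Customer", [Attribute("name")])
order = Entity("Order")
employee = Entity("Employee")

places = Relationship("places", customer, order, [Attribute("date placed")])
manages = Relationship("manages", employee, employee)

print(places.is_recursive(), manages.is_recursive())
```

Keeping attributes on both entities and relationships mirrors the notation, in which a circle may describe either a rectangle or a diamond.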

Leveled model development

Entity-Relationship models have been applied to individual business units, and even to individual business functions. The full ER approach model addresses the whole organization, and each of its parts in a top down manner. Only by using this top down leveled approach can a complete and accurate business perspective be attained. The approach develops the models in pyramid fashion, beginning with the senior management level and proceeding downward. This corresponds most closely to the manner in which most firms view themselves.

For the data framework, we will present the models in the sequence in which they are most easily developed, a sequence which corresponds to the upper levels of the organization: Strategic and Managerial.

As seen in figure 12-2, the Entity Relationship Approach produces a different type of diagram or set of diagrams for each of these organizational levels:

  1. The Enterprise level
  2. The Entity-Relationship level

Because it is based upon a top down approach, the models at each level represent a decomposition, or expansion of detail, of the immediately preceding level. The number of diagrams at each level is dependent upon the number of entities and relationships involved, and on the complexity of those entities and relationships. There is no requirement to maintain the diagrams on a single chart, or to break them down into many smaller charts.

Aside from the enterprise level model, which should be a single and by definition firmwide chart, the lower level charts may be developed against any perspective. A version of one such chart is illustrated in figure 12-3. These perspectives may be firmwide, by function, by business unit, or by product line. Because they are developed as an aid to understanding the data framework of the business, the diagrams at each level can be combined or split in any manner which aids comprehension, but above all they should be drawn so that they are easy to follow and meaningful to the designer and the user.

Enterprise Level

The Enterprise level consists of identifying the major entities of the firm. Although an entity is usually represented as a single instance, at this and all succeeding levels each entity in fact represents a set of people, places or things.

A set is a group of things of the same kind that belong together and are so used, or a group of persons sharing a common interest, performing a common function, or who are described in a highly similar, if not identical manner.

At the Enterprise level, the definition of the entity is very general and represents all people, places or things which relate to the firm in the same general manner, or which are viewed by the firm in the same manner. These entities at the enterprise level are the foundations of the common data resource files, and will appear essentially the same in any design which is developed for any system within the firm.

The definition of the entity is as general as is possible, so as to include all current and potential members of the set, but specific enough to retain its meaning within the firm. For instance, a Customer may be defined as "any person, or organization, which buys, rents, leases, or otherwise acquires product or service from the company." Product might be "any physical thing or service which the company provides to its customers in the course of conducting its business (not necessarily for a price)."

The Enterprise model consists of the identification and definition of the major entities of the firm and an indication as to whether or not a relationship exists between them. At this level, there is no differentiation between the various subtypes of any given entity, nor the number or types of relationships between any two entities. The definition of each entity should however include the names of all known subtypes of the entity, or role variations for the entity (i.e. Salesperson, Executive, part-time, full-time, etc.). Normally the only document entities which are included at this level are those which are critical to the firm, such as the order.

Basic Entities of the firm

Every firm, large or small, deals with many different types of entities in the course of conducting its business. Although the names of the various entities will vary from firm to firm, at the most general level they can be grouped into four major categories: People, Places, Physical Things (such as a document, a product, a machine, etc.) and Logical or Legal Things (such as a corporation or a business unit). Using these four major categories, we can identify some of the most commonly occurring entities regardless of the type of business a company does.

People Entities fall into three major classes: the people who make up the firm's workforce, the people who are its customers or clients, and the people who supply it with raw materials, products, parts, and financial or other services.

Place Entities also fall into three major classes: the places where its services are offered, or its products are made, stored and/or sold; the places where its workforce is located; and the places where its customers reside, or are otherwise located.

Physical Thing Entities include: the actual products of the firm, its physical assets (buildings, land, furniture, machinery or other equipment, inventory, supplies, etc.), its financial assets (money, securities, leases, contracts, bank accounts, loans, notes, credit lines, etc.), and the documents, memoranda, accounts, contracts, orders, invoices, statements, checks, vouchers, reports and files which record its business transactions and activities.

Logical or Legal Thing Entities include: the services offered by the firm, the firms who are its customers or clients, the firms or people who supply it with raw materials, products, parts, and financial or other services, the markets within which the firm operates, the governmental and regulatory units under whose jurisdiction the firm operates, and the organizational units into which the firm's workforce is grouped for business, functional, and reporting purposes.

Within any given firm it can be expected that most, if not all, of the above entities will be represented. What they are called, how they are defined and described, how they are subtyped, and more importantly, what the firm needs to know about them, depends upon the specific business of the firm, its culture, and the business rules, policies, mission statements, charters and procedures which govern what it does and how it operates.

It should be remembered however that the business entities being identified and defined at the enterprise level relate to the firm as a whole. As such these entities and relationships may be numerous, complex, and may lack the precision of definition which lower level models require.

The Enterprise model depicts entities at the set level. At this level the entities have the widest possible definition and scope, while still maintaining the general physical and role characteristics of the individual entities which comprise them. These entity sets are treated as if there were no variations in type and as if each of their component entities were defined in a similar manner and behaved in a similar manner.

At the enterprise level:

Enterprise level model

The Enterprise Level model is created in two steps. In step one, each entity is selected, named, identified and defined. It is helpful, although not mandatory, to select a primary, or core, entity to begin the model. In most cases this core entity will be either the Customer, Product, Order or Employee, or all four, since normally these are the most important entities to the firm. Entities are represented as a rectangle with a name in it. The symbols for the selected entity(s) are placed in the center of the page.

All entities directly related to the core entity(s) are placed, one at a time, in a ring around those at the core. These entities will be called secondary entities only for purposes of describing the model creation process. Each entity symbol should have the entity name within it. This is the entity's primary name (the name of all entities in the set) or its role name (the name of the entities in the subset being depicted).

Step two is to connect each pair of related entities with a single line between them. Each pair of entities may be related in multiple distinct ways, or in only one way. The line between each related pair does not distinguish between the various types of relationships and thus contains no name or other information. A specific entity may be related to multiple other entities; however, each entity must be related to at least one other entity within the model.

Any two entities which have at least one distinct, identifiable relationship between them are connected by a single line. No entity should appear on the model which is unrelated to at least one other entity. There is no differentiation between the number or types of relationships between any two entities. Like the entities themselves, the relationship is a binary condition: it exists or it does not.
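The two steps described above amount to building a simple undirected graph: named, defined entities joined by unnamed lines. The Python sketch below is one possible way to record such a model and to check the rule that no entity may stand unrelated. The class `EnterpriseModel`, its methods, and the sample entities and definitions are hypothetical and serve only as an illustration.

```python
class EnterpriseModel:
    """Entities at the set level, joined by unnamed, undirected lines."""

    def __init__(self):
        self.entities = {}   # entity name -> definition
        self.lines = set()   # each line is a frozenset of two entity names

    def add_entity(self, name, definition):
        # Step one: select, name and define the entity.
        self.entities[name] = definition

    def relate(self, a, b):
        # Step two: a single line per related pair. The line carries no
        # name, and adding the same pair twice still yields one line.
        for name in (a, b):
            if name not in self.entities:
                raise KeyError(f"undefined entity: {name}")
        self.lines.add(frozenset((a, b)))

    def unrelated_entities(self):
        """Entities that break the rule of being related to at least one other."""
        related = set().union(*self.lines)
        return sorted(set(self.entities) - related)


# Hypothetical example with abbreviated definitions.
model = EnterpriseModel()
model.add_entity("Customer", "any person or organization which acquires product or service")
model.add_entity("Order", "a request by a customer for product or service")
model.add_entity("Product", "any physical thing or service the company provides")
model.add_entity("Employee", "a member of the firm's workforce")
model.add_entity("Supplier", "a person or firm which supplies the company")

model.relate("Customer", "Order")
model.relate("Order", "Product")
model.relate("Employee", "Order")
model.relate("Order", "Customer")   # same pair again: still a single line

print(len(model.lines), model.unrelated_entities())
```

Storing each line as an unordered pair reflects the binary nature of the relationship at this level: it either exists or it does not, with no direction, name or count.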

An enterprise level model will usually contain from ten to thirty entities and should fit on a single sheet of paper. The number of entities which appear will depend upon the complexity of the corporation, the type of business, and the degree to which differentiation between various subtypes is important to the senior management (or strategic) levels of the firm.

Another determinant of the number of entities is the way in which the entities have been defined. A model which contains the general entity Employee will be less complex than one which contains more specific entities such as sales-, production- and back office-employee. Again, a model which defines the general entity organizational unit will be less complex than one which defines sectors, groups, divisions, or subsidiaries.

There are no rules governing how the entities are defined at this or any other level, except to say that the definitions should be consistent with respect to their level of abstraction (that is, do not define one entity as "all employees," and at the same level, differentiate between all different types of products, or customers).

These definitions should also make sense within the context of the firm, and they should be as specific or as general as is necessary to make the diagram clear and readable.

Entity-Relationship Level

The Entity-Relationship level is an expansion of the work performed at the enterprise level. At the entity-relationship level the entities which were previously identified only at the set level can now be brought into sharper focus. It is at the Entity-Relationship level that differing types or subsets of entities are recognized.

As with the enterprise level, the entities are in reality groups or sets of people, places or things; however, these groups may be much smaller and much narrower in definition than those at the enterprise level.

For instance, it may be relevant to a company to differentiate between types of customers, such as between institutional and retail in the brokerage industry, between subscription and mail-order in the publishing industry, between different types of products, such as spare parts and finished items, or between elemental parts and subassemblies.

In some cases the various entities may be differentiated by the role that they play with respect to the firm. For instance the design may distinguish between executive managers, middle managers, clerical workers, professionals, and sales personnel, between full time and part-time workers, between salaried and hourly workers, or between union and nonunion workers.

Another set of distinctions might be by function, such as between production, sales, engineering and back office. In each of these cases, although the people in each category are all employees, they are treated differently by the firm or they play different roles with respect to the business of the firm.

As each distinct subset of each entity set is identified it should be named. Each of the entities (and subset entities) identified at this level should be related to at least one other entity, and may be related to many other entities. Each subset entity should relate to one or more subset entities of the entity sets to which its parent set is related. In addition, at this level the specific relationships which exist between each pair of entities are identified and named.

All models below the Enterprise level business model are more detailed and may use any of a number of distinct entity subtypes or subsets of the universal entity set in place of the universal entity set. Here, each subset is given a name corresponding to either the entity subtypes which populate it or to the role which the subset member entities play within the firm.

These names are usually something other than the entity name assigned to the universal set. These subsets are usually created to represent the various roles which the more global entity plays.

In some cases the subset name is different from the role name and may represent the title by which the members or principal members are known within the firm. In these cases, both the role and title names by which that entity is known should be stated.

At the managerial level:

Entity-Relationship level model

The creation of the Entity-Relationship level model is a two step process. In step one, the major entities represented at the set level on the enterprise level model are differentiated into their meaningful subsets (components, subtypes, subclasses, etc.). It is here that the various types of customers are differentiated, as well as the various types of products, employees, accounts, etc. As each entity set is differentiated, its subsets are named and defined. Once differentiated, each subset is treated from then on as a complete entity.

Step two names and defines each of the distinct, identifiable relationships between each pair of entities which were determined to be related at the enterprise level. Relationships are represented as a diamond with the relationship name within it. If an entity at the enterprise level has been differentiated into subset entities at the Entity-Relationship level, it is possible that not all subsets will relate to other entities or entity subsets in the same manner as the enterprise level entities.

For each pair of related entities, each distinct relationship is identified, defined and represented on the model using a relationship symbol (Figure 12-4). The relationship symbol is connected on each side to one of the pair of entities which participate in the relationship, and should contain the name of the relationship between these two entities.

The above procedure should be repeated until each distinct relationship between the pair has been named, identified and placed on the model. It is possible for multiple pairs of subsets of each enterprise level pair to have the same named relationship.

When completed, at least one named and defined relationship should replace the line between each two related entities at the enterprise level or between some subset of each of those entities. No entities (or subsets of entities) should be related at this level which were not shown to be related at the enterprise level. Each distinct pair of related entities may have one or more named relationships between them. Except for recursive relationships (those where individual instances of the entity are related to other entities of the same type), each relationship should be between two entities of different types.
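The two completion conditions stated above (every enterprise level line is replaced by at least one named relationship, and nothing is related here that was not related there) lend themselves to a simple cross-check. The Python sketch below is one hedged way to express it. The function names, the `subsets` mapping from subset entity to parent entity, and the sample data are hypothetical. Treating a relationship between two subsets of the same parent as recursive, and therefore exempt from the enterprise level check, is an assumption of this sketch.

```python
def parent_of(entity, subsets):
    """Map a subset entity back to its enterprise level entity set."""
    return subsets.get(entity, entity)


def check_er_level(enterprise_lines, subsets, relationships):
    """Compare named relationships against the enterprise level lines.

    Returns (names of relationships with no enterprise level line,
             enterprise level lines with no named relationship).
    """
    covered = set()
    not_in_enterprise = []
    for name, a, b in relationships:
        pair = frozenset((parent_of(a, subsets), parent_of(b, subsets)))
        if len(pair) == 1:
            # Both ends belong to the same entity set: treated as recursive.
            continue
        if pair in enterprise_lines:
            covered.add(pair)
        else:
            not_in_enterprise.append(name)
    return not_in_enterprise, enterprise_lines - covered


# Hypothetical example.
enterprise_lines = {
    frozenset(("Customer", "Order")),
    frozenset(("Order", "Product")),
}
subsets = {
    "Retail Customer": "Customer",
    "Institutional Customer": "Customer",
    "Spare Part": "Product",
}
relationships = [
    ("places", "Retail Customer", "Order"),
    ("places", "Institutional Customer", "Order"),   # same name, another pair
    ("supplies", "Supplier", "Product"),             # never related above
]

extra, unrefined = check_er_level(enterprise_lines, subsets, relationships)
print(extra, unrefined)
```

In this example the `supplies` relationship is reported because Supplier and Product were never connected at the enterprise level, and the Order and Product line is reported because no named relationship has yet replaced it.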

Additional Rules for ER Model Creation

Regardless of the level being addressed, the following rules apply to the construction of an Entity-Relationship diagram:

  1. Entities:

    1. Each rectangle must represent a single entity, a homogeneous group of entities, or one subset or subtype of the entity.
    2. When developing detailed models, each identified global entity should be decomposed into its component subsets.
    3. The mode of decomposition is dependent upon the characteristics of the component entities and the requirements of the firm for information about those entities.
    4. Regardless of the mode of decomposition, care should be taken to ensure that all entity subtypes can be related back to their base global entity. This may be accomplished by special notation or by including the name of the base entity within the entity subtype name.
    5. Each unique type of document should be included.
    6. Where documents include data that must be validated against preexistent reference files, code lists, spreadsheets or other financial tables, etc., the referenced data items should be treated as if they were entities and included in the model along with their relationships to the document.

  2. Relationships:

    1. For the sake of clarity and understandability, relationship diamonds are drawn between, and must be connected to, no fewer than one and no more than two entity rectangles.
    2. A diamond may be connected back to the same entity, in which case it represents a recursive relationship between unique occurrences of the same entity.
    3. Each diamond must represent a single relationship which is known to exist between the two connected entities and is of interest to the firm.
    4. For each line which connects the diamond to a rectangle, a notation should be made, at the point where that line joins the rectangle, as to whether the two entities being related have a one to one, one to many, many to one, or many to many relationship. If the relationships connected to an entity rectangle do not all have the potential to apply equally to each and every entity occurrence defined to it, then the definition of the entity being used must be changed, and a new entity or entity set, or a new entity subset or subsets, must be created until this condition is satisfied.
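Two of the relationship rules above are mechanical enough to check automatically: the number of entity rectangles attached to a diamond, and the presence of a cardinality notation. The Python sketch below illustrates such a check. The function `check_diamond`, the plain-text cardinality labels and the sample relationships are assumptions made for this example; an actual diagramming notation would mark cardinality on the connecting lines themselves.

```python
# The four cardinalities named in the rules, written out as plain text.
CARDINALITIES = {"one to one", "one to many", "many to one", "many to many"}


def check_diamond(name, connected_entities, cardinality):
    """Return a list of rule violations for one relationship diamond."""
    problems = []
    distinct = set(connected_entities)
    # A diamond connects to one rectangle (recursive) or to two rectangles.
    if not 1 <= len(distinct) <= 2:
        problems.append(
            f"{name}: connected to {len(distinct)} entities, expected 1 or 2"
        )
    if cardinality not in CARDINALITIES:
        problems.append(f"{name}: missing or unknown cardinality {cardinality!r}")
    return problems


# Hypothetical examples.
print(check_diamond("places", ["Customer", "Order"], "one to many"))
print(check_diamond("manages", ["Employee", "Employee"], "one to many"))
print(check_diamond("ships", ["Customer", "Order", "Warehouse"], "many to many"))
print(check_diamond("bills", ["Customer", "Invoice"], None))
```

The remaining rules, such as whether a relationship is of interest to the firm or applies equally to every occurrence of an entity, depend on business knowledge and are left to the designer.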

The above discussion assumed that one and only one enterprise model will be created for the firm as a whole. Since most projects are for specific user areas, it is desirable to create different models for each user area using the enterprise model as a guide to ensure that integration at a higher level can be achieved if desired.

Just as an entity can be viewed from many different perspectives, and may seem to be different from each perspective, so too can Entity-Relationship models differ across the various perspectives of the firm. Each area of the firm defines the entities of the firm in different ways, and relates to them in different ways.

Even at the data framework level, Entity-Relationship models need not contain every entity of the firm. The various models need only contain the entities of interest to the particular area for which the system is being designed. However, in order to serve as a framework, these models should include all of the document entities to be used within the system, the entity sources for those documents, and the relationships between both types of entities.

These Entity-Relationship models are business models, rather than data processing models. That is, they reflect the business environment from a data perspective, and depict how the various data files and documents tie together irrespective of how they are used within or by the procedural processing activities of the user. The types of entities and relationships selected for inclusion in each model, and the definitions of those entities, all combine to describe the environment and the nature of the business itself.


Data Directed Systems Design - A Professional's Guide
Written by Martin E. Modell
Copyright © 2007 Martin E. Modell
All rights reserved. Printed in the United States of America. Except as permitted under United States Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a data base or retrieval system, without the prior written permission of the author.