The Entity-Relationship Approach

CHAPTER SYNOPSIS

The entity-relationship approach to analysis is a relatively new technique--one which is not widely understood or used. By focusing on the business and data entities of the firm, their relationships to each other, and the attributes which are needed to describe the entities and qualify their relationships, the analyst focuses on the real-world people, places, and things of the business--things and relationships which can be readily observed and documented.

This chapter discusses the theory and concepts behind the entity-relationship approach, and how it can aid in focusing the analysis and in identifying entities, relationships, and attributes.  The various levels of entity-relationship analysis are also discussed.

The Entity-Relationship Approach

The entity-relationship approach consists of an analytical method and a modeling technique.  The entity-relationship (ER) approach was first described by Dr. Peter Chen in a paper which appeared in 1976 in the first issue of the ACM publication Transactions on Database Systems.  It is now recognized as one of the most important tools in the data administration tool kit.  It is incorporated, in various forms, into all major CASE tools, and is supported by all dictionaries, repositories, and encyclopedias.  Properly understood and used, the ER approach can greatly reduce the time needed during the analysis phase of the development cycle and at the same time greatly increase both the accuracy and completeness of that analysis.  With only a few exceptions, all popular development methodologies use the ER approach to model data.

Most popular analytical approaches, or methodologies, focus either on the processes being performed or on data elements presumed to be needed by the user.  Some concentrate on trying to fit lists of data elements into one of the data structure models which can be implemented by a DBMS, others on designing reports, screens, and files, and still others on following trails of transactions through the various processing stages.  From these processes, flows, data elements, and/or outputs, they attempt to re-create the real world.  Many attempt to re-create the processes from the desired results.

Each method approaches the data problem differently, and their results, many times, resemble the results of the blind men who examined the elephant: the one examining the tail thought it felt like a rope, the one examining the sides thought it felt like a wall, the one examining its legs thought it felt like a tree, and the one examining its trunk thought it felt like a snake.  In a sense they were all right, but, at the same time, none was right.

In the business environment, examining only transactions, processes, outputs, or data flows, or even a combination of all four, produces a picture which is correct as far as it goes, but which is not a true or complete picture of the environment.  The business environment is also populated by people using things, and both people and things are located in places.  Any business description must not only include these people, places, and things, but it must also start with them.  These people, places, and things are called entities.  These entities interact with each other in various ways, and those interactions are called relationships.

People, either individually or in groups, work with things or provide services for other people.  Since both the people and the things are real (they physically exist), they must be described and they must be located somewhere (in some place).  Additionally, relationships which exist between people and things, people and places, things and places, different types of things, different types of places, and different types of people themselves must be described.

These entities may be well defined, in that the firm may know a great deal about them, or they may be vaguely defined, in that the firm may know very little about them.  In some cases, such as with either prospective customers or employees, the firm may only know or suspect that they exist, but not who or where they are.

These entities may exist in large homogeneous groups where all members are capable of being described in the same manner, or they may be fragmented into many different subtypes, each with descriptions which are either slightly different, or in some cases radically different, from other members of the same group.

Just as entities are real, so are the relationships that exist between them.  And as with entities, the relationships may be well defined, in that the firm may know a great deal about them, or vaguely defined, in that the firm may know very little about them, perhaps only knowing or suspecting that they exist.

The power of the ER approach lies in its ability to focus on describing entities of the real world of the business and the relationships between them.  By describing real world entities through the identification and assignment of attributes to them and their relationships, the analyst is describing how and why the business operates.

Although the business itself may change, sometimes dramatically, these types of changes occur much less frequently than changes in the routine processes and activities.  Regardless of how the business changes, the entities of the business rarely change.  What may change, however, is the firm's perception of which attributes of those entities are currently of interest.  Some relationships between these business entities may also change, but even these relationship changes occur infrequently.  Thus by understanding and properly describing these entities and the relationships between them, the analyst can form a very stable foundation for understanding and analyzing the business itself, and for properly recording the results of, or changes caused by, the processes of the business.

As with any analytical method, the effectiveness of the ER approach is limited, or constrained, by three factors, all of which have to do with the analyst's understanding of the business environment.  These closely related factors are (1) entity identification, (2) entity definition, and (3) business context.

Entity identification consists of recognizing the various entities, determining why they are of interest to the firm, and naming them.  The identification process must specify the entity at the precise level of specificity which ensures that it is not so general as to be meaningless, and yet not so specific that it fragments into too many subsets.  For example, people as an entity would be too general since it includes both customers and employees, among others.  On the other hand, full-time employees and part-time employees would be too specific since both are employees and "full-time" and "part-time" are attributes of employee.

Entity definition consists of identifying which attributes of the identified entities are needed by the firm and why those attributes are of interest. For example, is the firm interested in the attribute "hobbies" or "clothing sizes" for the employees? If the firm deals in sporting goods, the answer to the former might be yes.  If the firm provides uniforms for its employees, the answer to the latter might also be yes.

Business context involves identifying and defining the relationships which exist between the identified and defined entities, and their relative importance to the firm as a whole and to each of its specific parts.   Business context also involves identifying and defining the use or role of each entity within the firm.   An entity's appearance, role, or use in one firm may be entirely different in another firm; yet the entity itself is the same.

Just as an entity may have different roles or uses between firms, so also, each part of the firm may have a different perspective on the business, and, consequently a different perspective on the entities of the firm. This perspective does not change the fact of the entity's existence, only the attributes and relationships of those entities which are of interest to individual portions of the firm and their role or use in that firm.

The specific definitions of these entities and their relationships with other entities within the firm are relevant only within the context of that firm and are totally dependent upon the attributes of the entities which are of interest to the firm.  An entity within one firm may be only an attribute of an entity within another firm, and vice versa.

The importance of identification, definition, and context can be seen when one looks at the formal definitions of the three key elements which form the heart of the ER approach.  These definitions form the basis for both the data analysis method and the data modeling technique of the entity-relationship approach.

A definition

An entity is defined as a person, place, or thing which (a) is of interest to the corporation, (b) is capable of being described in real terms, and (c) is relevant within the context of the specific environment of the firm.

A definition

An attribute is any aspect, quality, characteristic, or descriptor of either an entity or a relationship.  An attribute must also be (a) of interest to the corporation, (b) describable in real terms, and (c) relevant within the context of the specific environment of the firm.

An attribute must be definable in terms of words or numbers.   That is, the attribute must have one or more data elements associated with it, one of which may be the name of the entity or relationship.  An attribute may describe what the entity looks like, where it is located, how old it is, how much it weighs, etc.  It may describe why a relationship exists, how long it has existed, how long it will exist, or under what conditions it exists.

A definition

A relationship is any association, linkage, or connection between the entities of interest to the corporation.  These relationships must also be (a) of interest to the corporation, (b) describable in real terms, and (c) relevant within the context of the specific environment of the firm.

It is important to note at this point that relationships exist only between entities, not between attributes of entities.  To illustrate,

The entity "person" could be anyone.

When the attributes name, age, and sex are added, we can distinguish men from women, adults from children, and one person from another.

When the relationships are added, we know whether we are talking about a group of unrelated people, a family, or a corporation.
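The three definitions above can be sketched in code.  The following is only an illustrative sketch in Python, not part of the ER approach itself; the class names, field names, and sample values (Anna, Ben, "since") are hypothetical.  It shows that an entity without attributes cannot be told apart from any other, and that a relationship joins entities, not attributes.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """A person, place, or thing of interest to the firm."""
    kind: str                                       # e.g. "person"
    attributes: dict = field(default_factory=dict)  # descriptors of the entity

@dataclass
class Relationship:
    """A named association between two entities (never between attributes)."""
    name: str
    source: Entity
    target: Entity
    attributes: dict = field(default_factory=dict)  # e.g. how long it has existed

# Without attributes, all we know is that two persons exist.
anna = Entity("person")
ben = Entity("person")

# Attributes let us distinguish one person from another (hypothetical values).
anna.attributes.update(name="Anna", age=41, sex="F")
ben.attributes.update(name="Ben", age=9, sex="M")

# A relationship tells us these people are a family, not an unrelated group.
parent_of = Relationship("is parent of", anna, ben, {"since": 2015})
```

Note that the relationship carries its own attributes, in line with the definition of an attribute as a descriptor of either an entity or a relationship.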

Entities and their attributes

To describe an entity, we must describe it in terms of its attributes and its relationships with other entities.   An entity description consists of a series of statements which complete a phrase such as "the entity is...," "the entity has...," "the entity contains...," or "the entity does ...."

Each attribute relates to the entity in hierarchic terms, that is, all attributes of the entity are fully dependent upon the entity itself because individually and together they are the entity.  The question can still be asked, however, "How can we begin to identify these entities?" Is, for example, the entity identified as customer (representing all customers), or is it the specific type of customer (such as mail order or retail), or is it a single customer? The answer is that it can be all of these, none of these, or more than these.

The specific identification of the entity has meaning only within the context of that firm.  However, most businesses can be described using a fairly restricted set of generic entity types such as customer, product, machine, employee, location, organizational unit, etc.

An entity is whatever the business defines it to be, and that definition must make sense within the context of the firm.  Thus, an entity in one firm may be a subset of entities included in the entity definition of another firm, or may be the global definition of the entity used within another firm.  These differences in identification can be illustrated by the following example.

An illustration:

A town planning board, with responsibility for community planning and zoning, would describe that community in terms of its buildings, and further define those buildings into residences, offices, stores, warehouses, and factories.

It might be interested in which people or firms occupy or own those buildings, but for their purposes that information would be an attribute of the building, just as the size of the building, the number of floors, the number of windows and doors, and the cost of the building are attributes.

On the other hand, the local tax assessor, doing a census or community directory, would be interested in the people who live and work in the community and the firms located there.  The assessor would be interested in the names of the people, their incomes, length of residence, amount of taxes paid, and where they live or are located (the buildings) within the community.  In this case the buildings become attributes of the people.

Neither the buildings nor the people have changed.  They both still exist.  The perspective, however, has changed; the things of interest about those buildings and people have changed.

The town council would need to know all the information about both the people and the buildings, along with information about roads, utilities, etc.   In this case, both the buildings and the people become entities in their own right, along with the relationships between them (who lives or works where, who owns what, and so on).
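The three perspectives in this illustration can be sketched as three views built from the same underlying facts.  This is a minimal sketch in Python; the sample addresses, names, and incomes are invented, and the function names are hypothetical.

```python
# Shared facts about the community (hypothetical sample data).
occupancy = [
    {"building": "12 Main St", "floors": 2, "person": "A. Smith", "income": 52000},
    {"building": "12 Main St", "floors": 2, "person": "B. Jones", "income": 61000},
    {"building": "40 Oak Ave", "floors": 1, "person": "C. Lee", "income": 48000},
]

def planning_board_view(facts):
    """Building is the entity; its occupants are just one of its attributes."""
    view = {}
    for f in facts:
        b = view.setdefault(f["building"], {"floors": f["floors"], "occupants": []})
        b["occupants"].append(f["person"])
    return view

def assessor_view(facts):
    """Person is the entity; the building becomes an attribute (a location)."""
    return {f["person"]: {"income": f["income"], "located_at": f["building"]}
            for f in facts}

def council_view(facts):
    """Both are entities in their own right, joined by an explicit relationship."""
    buildings = {f["building"]: {"floors": f["floors"]} for f in facts}
    people = {f["person"]: {"income": f["income"]} for f in facts}
    occupies = [(f["person"], f["building"]) for f in facts]
    return buildings, people, occupies
```

The facts never change between the three functions; only the choice of what is treated as an entity and what is treated as an attribute changes.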

This need for both attributes and relationships is consistent with the accepted dictionary definition of an entity: "the fact of existence; being.  The existence of something considered apart from its properties." Thus although the entity exists, its true form and role are only apparent after its attributes are added.

Without attributes all that is known about the entity is that it exists.   The distinction between an entity and its attributes, and the relationship between an entity and its attributes, is so important that the ER diagram distinguishes between an entity and its attributes by using different symbols for each.

The entity-relationship model

The entity-relationship model represents a conceptual view of the world; as such it is independent of any DBMS or data processing considerations.  It is a creature not of data processing but of the business environment.

Although we speak of an entity as if it were singular, in reality it is that set of persons, places, or things which have a common name, a common definition, and a common set of descriptors (properties or attributes).

The entity representation in the model, while it may represent a single instance, usually represents numerous people, places, or things which have a common name and common descriptors and thus can be treated as a set.   These entities interact (relate) with other entities.   These interactions form a complex set of relationships.

An entity, although it exists physically, only has physical substance when it is described in terms of what it looks like, where it is, what it does, and how it relates to other entities.  Each component of that description is a property or an attribute of the entity.   The sum of the properties is the entity.

These entities are physically real and their real properties can be described; these people perform actions, using and transforming both things and information (which is contained on things as data).  The common characteristic of all entities is that we can describe them, and we use words and numbers for that description.   Collectively these words and numbers are data; individually they are data elements.

The fact that entities, especially in the data processing environment, are described by data, does not make them data objects nor is every collection of data elements an entity.

Some writers have suggested that data entities are built from collections of data elements in the same manner that a car is built from a collection of parts.  In fact, an ER model can be complete and meaningful with no traditional data elements at all.   The parts of a car were specifically chosen because each contributes something to the overall design of the vehicle.   Any number of different sets of parts could be assembled and would result in a car, but a specific car can only be built from a specific set of parts.

A car is a thing.  It is a subtype of the larger group of things called vehicles, and part of another subtype called self-powered vehicles which transport people and things.   Just as there are many different types of vehicles, not all of which are cars (some may be boats, planes, or trains), so too there are many different types of entities.

A final type of attribute needs to be discussed: attributes which do not describe the thing itself, but what it does, how it is used, or why it is used.   Those things that an entity does are called activities; collectively they are called processes.   The attributes which describe these entity activities are called processing-related attributes.

The processes, or activities, of the business are in reality the actions that people take with respect to things, places, or other people.  These actions usually result in some change in the physical appearance, state, or condition of one or more entities, or sometimes in the creation of a new entity.

An illustration

The physical characteristics of the car, its size, weight, year, make and model, color, and parts list represent the car itself.   Whatever happens to it, so long as it remains a car, these characteristics (except for possibly color) will never change.   Whether it is owned by anyone or not, new or used, in good repair or falling apart, driven 1 mile or 100,000 miles does not change the fact of its existence.  However, the fact that the car exists is meaningless unless we put it in a context which tells us why the firm is interested in it.

A new and used car dealer, or a company fleet manager, might want to know other things about it, such as ownership, use and usage, options and accessories, etc.   They might also want to know such information as:

  1. How many miles it has been driven
  2. How much gas it uses
  3. How many times it was serviced
  4. How many times it was in an accident
  5. How many different people have driven it
  6. What it cost new
  7. What it costs now
  8. How much it costs to maintain

These latter attributes are really process attributes.   They are part of the description of the car, but these attributes tell us what was done to or with the car, not about the car as a thing.

An auto parts dealer might be interested in the parts of the car themselves, both new and used.  In this case the year, make, model, and color of the car all become attributes of the part, along with its usage characteristics (if it is not a new part); its cost, size, shape, and weight; and how many are used in a specific year, make, and model.

A specific part could be elemental such as a bolt, tire rim, windshield, etc., or it could be a complex subassembly such as a transmission, radio, motor, etc.   It could fit one year, make, and model of car, or any car.  By combining several of these parts into a subassembly, we have in effect created a new "part."

All entities and most relationships have these types of process attributes associated with them.  Process attributes are variable in that their values change frequently, and these changes usually involve the participation of some other entity.   Thus, since they relate what one entity did to another, or where or how many of one entity are contained in another entity, they are normally descriptive of the relationship between the two, rather than descriptive of one entity or the other, although obviously they could be.
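The distinction between attributes of the car as a thing and process attributes can be sketched as follows.  This is an assumed illustration in Python: the Trip record, the driver names, and the make and model are hypothetical.  The stable attributes sit on the car itself, while the process attributes are derived from occurrences of a relationship between a person and the car.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Car:
    # Attributes of the car as a thing; these rarely change.
    make: str
    model: str
    year: int

@dataclass(frozen=True)
class Trip:
    # One occurrence of the "drives" relationship between a person and a car.
    driver: str
    car: Car
    miles: float

car = Car("Acme", "Roadster", 1990)  # hypothetical make and model
trips = [Trip("pat", car, 120.0), Trip("lee", car, 35.5), Trip("pat", car, 10.0)]

# Process attributes describe what was done with the car, so they are
# computed from the relationship occurrences rather than stored on the car.
miles_driven = sum(t.miles for t in trips if t.car == car)
distinct_drivers = len({t.driver for t in trips if t.car == car})
```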

The process of identification, definition, and contextual placement of the entities is vital to any understanding of the business and to any effort directed at either application development or file design.  Processes like data normalization (a much-discussed concept) cannot be meaningful unless we know what those entities are, what the difference is between an entity and an attribute of an entity, and further what relationships exist between those entities.

This process of identification, definition, and contextual placement is greatly assisted by the creation of entity-relationship diagrams as one proceeds from analytical level to analytical level.

Entity-Relationship Analysis

Entity-relationship analysis is a multilevel process where each level produces a clearer and better defined view of the environment.   The work products of each level consist of a series of environmental definitions along with a diagrammatic representation of that level.

Enterprise level

The first, or enterprise, level consists of identifying the major entities of the firm.  Although an entity is usually represented as a single instance, at this level each entity represents a whole class of people, places, or things.   Here the definition is very general and represents all people, places, or things which relate to the firm in the same general manner, or which are viewed by the firm in the same manner.

Potential entities at this level might be employee, customer, organizational unit, order, or product.  Each of these major entities should be related to at least one other major entity.   The definition of the entity at this level does not distinguish between entity types, entity roles, or entity activities.  There is no differentiation between the various subcategories of the entity nor are any other distinctions made.

The definition is made as general as possible without losing the concept itself.  For instance, a customer may be defined as "any person or organization which buys, rents, leases, or otherwise acquires product or services from the company." A product is "any physical thing or service which the company provides to its customers in the course of conducting its business (not necessarily for a price)."

Every entity should be related to at least one other entity.   There is no differentiation between the number or types of relationships between any two entities.   Like the entities themselves, it is a binary condition: it exists or it does not.
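An enterprise-level model of this kind can be sketched as a set of entity names and a set of unordered pairs.  The sketch below is in Python; the entity names come from the text, but the particular pairs chosen and the helper name unrelated are assumptions made for illustration.

```python
entities = {"employee", "customer", "organizational unit", "order", "product"}

# At this level a relationship either exists or it does not, so an
# unordered pair of entity names is enough to record it.
relationships = {
    frozenset({"customer", "order"}),
    frozenset({"order", "product"}),
    frozenset({"employee", "organizational unit"}),
    frozenset({"employee", "order"}),
}

def unrelated(entities, relationships):
    """Return the entities that take part in no relationship at all."""
    related = set().union(*relationships) if relationships else set()
    return entities - related
```

An empty result from unrelated means every entity is related to at least one other; any name it returns marks an entity that needs either a relationship or removal from the model.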

The enterprise-level analysis has the broadest scope and the most general definitions of all the analytical levels.  It is intended to serve as a broad brush view of the corporation and as a road map for further analyses.  As the analysis proceeds to each successive level this model may be modified, with entities or relationships being added (the more usual case) or deleted.

An enterprise level analysis will usually identify from 10 to 30 entities.   The number of entities which appear will depend upon the complexity of the corporation and the type of business.

Another determinant is the way in which the entities are defined.   A definition of the general entity "employee" will be less complex than one which defines more specific entities such as sales, production, and back office.   Again, a definition of the "organizational unit" as a general entity will be less complex than one which defines sectors, groups, divisions, or subsidiaries.

There are no rules governing how the entities are defined at this or any other level, except to say that the definitions should be consistent with respect to their level of abstraction (that is, do not define one entity as "all employees" and at the same level differentiate between all different types of products or customers).  These definitions should also make sense within the context of the firm, and they should be as specific or as general as is necessary to make the diagram clear and readable.

Entity-relationship level

The second, or entity-relationship, level is an expansion of the enterprise level.  At this level the entities which were previously identified only at the class level can now be brought into sharper focus.  This level recognizes differing types of entities.

As with the previous level, the entities are in reality groups or sets of people, places, or things; however these groups may be much smaller and much narrower in definition than they were at the enterprise level.   For instance, it may be relevant to a company to differentiate between types of customers, such as between institutional and retail in the brokerage industry; between subscription and mail-order in the publishing industry; between different types of products, such as spare parts and finished items, or between elemental parts and sub-assemblies.

In some cases the various entities may be differentiated by the role they play with respect to the firm.  For instance the analysis might make distinctions among executive managers, middle managers, clerical workers, professionals, and sales personnel, between full-time and part-time workers, between salaried and hourly workers, or between union and nonunion workers.

Distinctions can be made with respect to functions, e.g., among production, sales, engineering, and back office personnel.  In each of these cases, although the people in each category are all employees, they are treated differently by the firm or they play different roles with respect to the business of the firm.

As each distinct subclassification of entity is identified, it should be named.   As with the enterprise level, each of the entities identified at this level should be related to at least one other entity, and may be related to many other entities.   At this level each specific relationship which exists between each pair of entities is identified and named.

Except for recursive relationships (those where individual instances of the entity are related to other entities of the same type), each relationship should be between two entities of different types.  In addition each relationship must have a name that is descriptive of the particular relationship between the two entities being joined.
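The naming rule and the notion of a recursive relationship can be sketched as a simple check.  This is a hypothetical illustration in Python; the Relationship tuple, the sample relationships, and the problems helper are not taken from any tool or notation.

```python
from typing import NamedTuple

class Relationship(NamedTuple):
    name: str
    left: str    # entity type on one side
    right: str   # entity type on the other side

    @property
    def recursive(self) -> bool:
        # A recursive relationship joins an entity type to itself.
        return self.left == self.right

relationships = [
    Relationship("places", "retail customer", "order"),
    Relationship("manages", "employee", "employee"),   # recursive
    Relationship("", "part", "sub-assembly"),          # unnamed: should be flagged
]

def problems(rels):
    """List relationships that lack a descriptive name."""
    return [f"unnamed relationship between {r.left} and {r.right}"
            for r in rels if not r.name.strip()]
```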

Entity-relationship-attribute level

The third, or entity-relationship-attribute, level is similar to the second-level diagram with the exception that attributes are identified for both the entities and the relationships.   Each attribute is named, and that name is indicative of the type of information which that attribute represents.   For a person entity these attribute names might be family information, residence information, physical description, hobbies, clothing sizes, etc.

These attribute names for an employee entity might be very similar to the section headings on an employment application, or the section headings on the permanent employee record form.  For a customer, they might be very similar to the section headings on a form for opening a new account or a customer record.

Data element level

The final, or data element, level is the one most familiar to data processing specialists.  This level consists of identifying and defining the specific data elements which are needed to describe each attribute of each entity and each relationship.   Data elements are assigned only to attributes.  Since we know what the entity represents and we know what the attribute represents, the addition of elements is normally a relatively straightforward process.
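The step from named attributes to data elements can be sketched as a mapping from each attribute of an entity to its elements.  The sketch is in Python; the attribute names follow the text, while the data element names and the helper function are assumed for illustration.

```python
# Attributes of the employee entity, each with its assigned data elements.
employee = {
    "residence information": ["street", "city", "postal_code"],
    "physical description": ["height_cm", "eye_color"],
    "clothing sizes": [],   # attribute named, elements not yet assigned
}

def attributes_without_elements(entity):
    """Return attribute names that still have no data elements."""
    return [name for name, elements in entity.items() if not elements]
```

Because elements hang only from attributes, a check like this shows at a glance which parts of the third-level model still need to be carried down to the data element level.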

Classification Analysis

Categorization is a Fundamental Process

Categorizing or classifying things is a fundamental process of human existence.  The world we live in, business or personal, real or conceptual, is composed of a myriad of things.  Some of these things have very real differences between them, others are somewhat similar, and still others are highly similar to each other.  The differences or similarities between many of these things are sometimes more artificial than real.  Distinctions are made between groups of things because it is clearer to do so than it is to refer to ungrouped things.  One reason for making distinctions between things is to put them into groups which are easily manageable or understandable.

In almost every case some common characteristic of these things is used to make those distinctions.  Sometimes, several characteristics are required in order to make those distinctions.  Because things in the real world have many characteristics, any set of characteristics they have in common can be used to make these distinctions, or to group the things.  For purposes of illustration only, let us take one of the largest groups of things which we deal with: people.

Obviously the world is full of people and it would be impossible to deal with or discuss people in general in any meaningful manner.  There are just too many different kinds of people.  There are only a few things that you can say about people in general without excluding some of them.

Once you start adding obvious physical characteristics such as age, sex, race, height, weight, color of hair, color of eyes, etc., you start to place people into groups which are smaller than the whole (people).  The more characteristics you use, the smaller the number of members in each group, and the more different combinations of characteristics you can use to make up each group.  Once you start using characteristics (or values of characteristics) to group things, you are categorizing or classifying them.
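The effect of adding characteristics can be sketched with a small grouping function.  This is a minimal sketch in Python; the sample people, the characteristic names age_band and eye_color, and the classify helper are hypothetical.

```python
from collections import defaultdict

people = [
    {"name": "Ana", "age_band": "adult", "eye_color": "brown"},
    {"name": "Bo", "age_band": "adult", "eye_color": "blue"},
    {"name": "Cy", "age_band": "child", "eye_color": "brown"},
    {"name": "Di", "age_band": "adult", "eye_color": "brown"},
]

def classify(population, *characteristics):
    """Group members by the values of the chosen characteristics."""
    groups = defaultdict(list)
    for member in population:
        key = tuple(member[c] for c in characteristics)
        groups[key].append(member["name"])
    return dict(groups)

by_age = classify(people, "age_band")                       # 2 groups
by_age_and_eyes = classify(people, "age_band", "eye_color")  # 3 smaller groups
```

With one characteristic the largest group has three members; with two characteristics there are more groups and the largest has only two.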

However, using additional characteristics to take a large group and divide it into smaller groups is only part of how classification works.  Characteristics can also be used to help define vague or complex ideas.  For instance, ideas such as quality and utility have long eluded definition.  In many companies the concept of a customer, or in some cases a product, is equally elusive.

 

A definition

To classify is to organize or arrange according to class or category.

A definition

A class is a set, group, collection or configuration containing members having or believed to have at least one attribute or characteristic in common.

A major process of data modeling is to determine how data required by the firm to describe each entity of interest to the firm must be grouped for optimal efficiency, accessibility and usefulness.

Classification techniques are an invaluable method for assisting the data modeler in constructing those groupings.  Classification techniques can also assist the data modeler in determining the dependency relationships between those data groups, and conversely in determining which data groups are independent of each other.  Classification is also the preferred, and the most accurate, method for handling that most difficult of data model chores, the modeling of roles.

Classification of Entities

An entity is a fact of being.  Everything that exists in reality, or in perception, is an entity.  Because entities are devoid of attributes it is not possible to classify or group them.  However, we can state that since everything is an entity, and everything consists of persons, places, things, concepts, or events, then those things are entities by definition.

Although the definition of an entity for data modeling purposes divides entities into five groups, even these groups are too general to work with.

For the purposes of developing a business system design, and in particular the data models that are an integral part of those designs, each of these five classes must be divided into two more restrictive classes, one class containing those people, places, things, concepts, and events our firm is interested in and one containing those in which it is not interested.

Normally, data models do not include entities the firm is not interested in, and thus they are discarded after identification.

The remaining members (figure 9-1) of each of the five now restricted classes can be further classified into two still more restrictive classes, those the firm must collect data about and those it does not have to collect data about.  In this case, however, many models do not discard those entities about which the firm does not collect data.  In those cases they are used for consistency purposes and to provide context for the remainder of the model.

We can see from this discussion that the term entity (the highest   grouping in the data model) already represents three levels of  categorization or grouping before we begin.

Returning to the people illustration, we now have a group called people, more specifically people the firm is interested in (for whatever reason) and more specifically people the firm is interested in and about whom the firm must collect data or maintain records (for whatever reason).

This is still a fairly large group, because we are interested in different groups of people for different reasons (figure 9-2), and each of those different reasons usually dictates that we need to collect specific kinds of data about each group. However, since they are all part of a larger group called people, they must obviously have certain characteristics in common.

Just as we group entities into people, places, things, etc., because entity was too large and too general to handle in a meaningful way, so too we categorize each of those groups into smaller groups, i.e. kinds of people, for ease of handling. These grouping categories may be based upon what the entities are, what they look like, what they do, what purpose they serve, how they are used, etc.

Once the classification scheme is known, at various times during the   design process, the designer can use each of these categorizations or  classifications for purposes of discussion, analysis or usage,   recognizing that however they decide to group them for a particular  purpose, the base population remains the same.

A given population may also be grouped or categorized for various uses   by the values of characteristics selected for that purpose.

When producing a real world business model, entities may be concurrently grouped by what they do, by the purpose they serve, by how they are used, sometimes by what they look like, and sometimes by what they are. It is this ability, and need, to group things concurrently in more than one way that distinguishes classification-based data models.

It is easier to view these smaller groups of the larger population by   group name than it is by naming the individuals which comprise the  group.   Group names are used because although each member of the group  is different and uniquely identifiable, the group's members are  similarly described, act the same way, or are used for the same   purpose.  The group names tend to reflect these actions or usages.    The group name is in many cases identical to the characteristic used  to distinguish the members of the group.

The more general the statement of purpose, the description of the   actions or usage, or the characteristic, the more members the group  will contain.

Conversely, as these statements of purpose or descriptions of action or usage become more and more restrictive, the group becomes narrower and the number of potential members smaller.

Similarly, in the data model, as the definition of the characteristics   becomes more and more general the group which can be included under  that definition becomes larger.  The more specific the characteristic  definition or the more extensive the list of characteristics, the   smaller the group that can be constructed.
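A small sketch may make the relationship between the length of the characteristic list and the size of the group concrete. The people and characteristics below are hypothetical; the only point is that each added characteristic can shrink the group but never enlarge it.

```python
# Hypothetical people, each described by a set of characteristics.
people = {
    "Ann":   {"employee", "full-time", "manager"},
    "Bob":   {"employee", "full-time"},
    "Carla": {"employee", "part-time"},
    "Dev":   {"customer"},
}

def group(required):
    """Return the names of everyone who has every required characteristic."""
    return sorted(name for name, traits in people.items() if required <= traits)

# The more extensive the list of characteristics, the smaller the group.
broad = group({"employee"})
narrower = group({"employee", "full-time"})
narrowest = group({"employee", "full-time", "manager"})
```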

Types, Subtypes and Groups

In many data modeling texts the terms type and subtype are used. A type is a group. There are broadly defined groups and narrowly defined groups. If a type is a broad group, then a subtype is a narrow group within the broad group. However, both broad and narrow groups, groups and subgroups, types and subtypes are all still groups.

Each group, broad or narrow, large or small, has some number of   entities which have a set of characteristics in common.   The  characteristics may be very general or inclusive, or very specific or  exclusive, or some combination of both.

Each entity of any given group may have many characteristics but only   share some in common with other members of the group.  The number of   potential groups which can be formed is determined by the number of  identifiable characteristics, the number of characteristics selected,  and the number of meaningful combinations of characteristics for each  number selected.
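As a rough illustration of how quickly the number of potential groups grows, the sketch below counts the combinations available when selecting k characteristics out of n. The four characteristics are hypothetical, and the counts are only an upper bound: deciding which combinations are meaningful remains a business judgment that no formula can supply.

```python
from itertools import combinations
from math import comb

# Hypothetical identifiable characteristics of a population.
characteristics = ["employee", "full-time", "manager", "union member"]
n = len(characteristics)

# Upper bound on the groups that can be formed from exactly k characteristics.
counts = {k: comb(n, k) for k in range(1, n + 1)}

# Upper bound over every number of characteristics selected (2**n - 1).
total = sum(counts.values())

# The candidate groups for one particular k can also be listed explicitly.
pairs = list(combinations(characteristics, 2))
```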

Entity Families

 In the real world entity model and in the data entity model, we  attempt to use the most general, yet most meaningful, classification  or categorization possible.  These broad classifications of entities  are called families.   A real world entity family (figure 9-3)   represents a general class or group consisting of all members who  share some minimum set of characteristics in common.  A data entity  family represents a general class of highly intra-related data about:

  1. All the members of the family,
  2. Each characteristic which distinguishes each group of members from each other group of members at each level of categorization,
  3. The progression from general to specific, of the characteristic sequence or string for each of the lowest level groups (those with the most characteristics needed to distinguish them from all their siblings and cousins).
  4. The attributes about the members of the group, over and above the characteristics used to form the group (remembering the inheritance of characteristics, and attributes).
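One way to picture the progression from general to specific, and the inheritance of characteristics and attributes, is a small class hierarchy. This is a sketch under assumptions: the family (person), its groups, and the attribute names are all hypothetical examples.

```python
class Person:
    """The family: the broadest group, with the minimum shared characteristics."""
    characteristics = ("person",)
    attributes = ("name", "address")

class Employee(Person):
    """A group within the family: adds one distinguishing characteristic."""
    characteristics = Person.characteristics + ("employee",)
    attributes = Person.attributes + ("hire_date",)

class Manager(Employee):
    """A lowest-level group: carries the full characteristic string."""
    characteristics = Employee.characteristics + ("manager",)
    attributes = Employee.attributes + ("budget_authority",)

# The characteristic string runs from general to specific.
manager_string = " > ".join(Manager.characteristics)
```

Each lower-level group inherits everything defined above it, so only the distinguishing characteristic and the additional attributes need to be stated at each level.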

In developing entity models, we seek to identify the broad real world entity groups (or families) that populate the internal and external environment of the business (figure 9-4). Some of the entities will concern us and some will not. They all have one thing in common: they were ultimately derived from the statements of mission, goals, objectives, etc., which were used to define the strategic and tactical direction of the company and its business processing rules and determinants.

Real world entity reference and usage is within the context of the business and its concomitant actions. For representational reasons, ease and clarity of definition, and ease of handling and discussion, they are segregated into subsets or groups, but all are nothing more than some aspect of the whole, and therefore unified, entity.

Active versus Passive Entities

Data entities can additionally be classified in one of two ways, as either active or passive. Active data entities are the data entities which change over time, which do things or cause things to be done.

The other category of data entities is of interest mainly because its members describe and/or relate entities. These are passive data entities. These data entities are usually fixed in data content, come about full blown, or exist more conceptually than in reality. Some examples of these are job requisitions, job or other related skills, education, locations, organizational units, sales territories, sales offices, and job descriptions, to name a few. These passive entities have static data content, and no meaningful life cycle of their own.

These data entities are describable, but in narrative terms, or as lists of other items, rather than as physical things. They are carriers of a concept or an idea. Again, there is overlap between active and passive entities, mostly dependent upon viewpoint. There are no hard and fast rules, but it is important to recognize their existence. It is important in the data modeling portion of systems design to identify both active (non-static) data entities and passive (static) ones. From a business systems design standpoint they behave differently and are used differently.

Process Control Entities

 There is one final class or family of data entity which appears in   many data models.  The family has no name, and thus can be called by  any name.  We will call it the process control family.  Its members  are used to remember sequences of data events and to guide processes,  and later data events, within the organization.

If we scanned all of the documentation collected and generated from the analysis phase, we would probably have a myriad of entity groups, and probably only a few entity families, and no definitive way to tell which is which. Entity class, entity set and entity family are interchangeable terms within this context.

An entity family consists of all individual and groups of member entities which behave the same way in our organization. All entities which have the same role in the organization or which relate to the environment in the same way usually belong to the same family. Thus the entity family customer contains all entities who fill the role of customers, that is, who order, receive, and/or pay for the products or services of the company. This family may also include past customers, present customers, and potential customers, and these may be both active and passive.

Entity Definition

 For each entity (family or otherwise) identified by the design team, a   definition must also be created.   These definitions will provide   valuable insight into where the entity belongs, and what level of  generalization it represents.  For an entity family, the definition  should be broad or general enough to include all members of the   family.  We must describe the role that the members of this family   play.  We must define as completely as possible, who and what are the   members of this family, or more succinctly, what is the universe of  this family.  The definition should also permit the determination of  who are and are not members of the family.  This is usually stated in  the form of tests or characteristics.  The definitions of groups  within a family will always be more restrictive than the family.

Entity Family versus Entity Group Reference

 All entity references throughout the system design will either be to a   family as a whole, some group of members (figure 9-5) within some  family or to some individual member.  All relationships are expressed  in terms of an entity relating to another entity.  In the business  system design models, we assume that because entities relate to the   environment, either explicitly or implicitly, they relate to each   other as well.   These relationships may be strong or weak, active or   passive, and in some cases may be of no interest to the company.

Relationships can be viewed in two ways, entity family to entity   family (inter-family), and entity to entity or entity group to entity  group (intra-family).   These relationships, both inter- and   intra-family are another manner in which the classification scheme may  be represented. To illustrate:

A person may be on the faculty of, be a student of, may be an alumnus   of, a trustee of, and a contributor to an educational institution.   Each relationship represents a separate, distinct, noteworthy, and   more importantly definable characteristic.

Likewise, a person may be a depositor of, a lender to and a borrower from, a mortgagor of, etc., a bank.

Each of these ideas may be represented by either a characteristic used to form an entity group, or by a relationship. These relationships may be direct or through some intermediary entity, such as an account entity.

Thus each group (figure 9-6) is defined in terms of its relationship   to some member of another family, rather than through shared  characteristics.  In this representation however the characteristic is  transferred to the relationship.  The model's descriptions  must  explain as fully as possible:

  1. Each of these intra- and inter-family relationships,
  2. The conditions under which each relationship exists,
  3. The group of entities within each entity family which participates in that relationship.

Since the majority of activity within an organization can be expressed   in terms of the relationships between entity and entity, or between  entity and company (also an entity, by the way), we can expect that there will be a large number of intra-family relationships in the  structure and that relationships between entities both intra- and  inter-family will be multiple, conditional, and complex.
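The idea that a group can be defined through a relationship, rather than through a shared characteristic, can be sketched as follows. This is a hypothetical illustration only: the `Account` record stands in for the intermediary entity mentioned above, and the names and roles are invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Account:
    holder: str   # a member of the person family
    bank: str     # a member of the bank family
    role: str     # the characteristic, now carried by the relationship

# Hypothetical relationship occurrences.
accounts = [
    Account("Pat", "First Bank", "depositor"),
    Account("Pat", "First Bank", "borrower"),
    Account("Lee", "First Bank", "depositor"),
]

def roles_of(person, bank):
    """All roles one person plays with respect to one bank."""
    return sorted({a.role for a in accounts
                   if a.holder == person and a.bank == bank})

def group(role, bank):
    """The entity group defined by a relationship instead of a characteristic."""
    return sorted({a.holder for a in accounts
                   if a.role == role and a.bank == bank})
```

Here the group "depositors of First Bank" is never stored as a property of the person; it is derived entirely from the relationship occurrences, and the same person can appear in several such groups at once.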

Distinguishing between Entities (Entity Roles)

Entity groups, at the family level and below, are primarily developed from the role which the members of each group play in the organization. In some instances, however, an entity can play multiple roles. For instance, a company can be both a supplier and a customer. While these roles are distinct, they are not mutually exclusive. A bank's customer can be a borrower for a car, a depositor, and a mortgagor. Again, these roles are not mutually exclusive. The following general recommendations for entity class, or entity family, identification also govern whether similar entities can, or should, be merged into a single family:

  1. If the roles are mutually exclusive, i.e. if the entities can play one role in the organization and no other, define separate entity families. The attributes needed about each role will probably be different, with little in common between roles. There should be no duplication of individual members in another family.
  2. If the roles are distinct, but not mutually exclusive, merge the entities into a single entity family.

An illustration:

  1. School rules state that no faculty member may be a student of the school. Although similar in most respects and relationships, faculty and students are mutually exclusive entities and are defined as separate families.
  2. The above school restriction does not apply. The faculty entity and student entity may be merged.

A third alternative is also possible. In this case, most entities play one role exclusively, but some can play both. This can be handled in the following manner.

Each different role entity is defined into a different family. For each entity that is a member of both families, either:

  1. Define it completely in one family and create a "skeleton" or pseudo-entity in the other. This pseudo-entity would be related to the real entity. The pseudo-entity would have all of the characteristics of the second role which are not defined for the first role.
  2. Define the entity member fully in both entity families, with a relationship and/or indicator in each which notes that the other exists and must be kept synchronized.
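The first option, the skeleton or pseudo-entity, might be sketched as below. This is a minimal sketch under assumptions: a faculty member who is also a student is defined fully in the faculty family, and the class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FacultyMember:
    """The real entity, fully defined in the faculty family."""
    person_id: int
    name: str
    address: str
    department: str

@dataclass
class StudentSkeleton:
    """Pseudo-entity in the student family for someone defined as faculty."""
    faculty_ref: FacultyMember   # relationship back to the real entity
    enrolled_program: str        # second-role characteristic only

    @property
    def name(self):
        # Shared data is read through the relationship, never copied.
        return self.faculty_ref.name

rivera = FacultyMember(1, "Dr. Rivera", "12 Elm St", "History")
rivera_as_student = StudentSkeleton(rivera, "Evening MBA")
```

Because the skeleton holds only the characteristics that the first role lacks, there is nothing to keep synchronized. The second option trades that simplicity for two complete definitions and an explicit synchronization obligation.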

Entities are assigned to a class, or family, according to the role   they play in the company environment.   Care must be exercised to   restrict the definition of those roles.   All entities in the family or   class play the same role within the organization.  The entities within   each family or class are different, in specific, just as people are  different in specifics, but alike in their general nature and   description.

As with the real world, each entity is unique in that it has its own distinct set of physical attributes (descriptors), operational attributes and relationships. Thus the assumption cannot be made that all data elements within a given attribute of the entity family will be present or active for any given member group of the entity family. Overall, however, those elements are needed to describe the entities in the family. Since the description of the family can only be in terms of its family members, the design team must assume that any given entity family member may have any, and thus all, possible attributes and relationships.

Just as entities are treated as families, so too the entity attributes are developed for the family. For instance, the demographic data for a doctor and a teacher are different (figure 9-7). These in turn are different from the demographics of a school, which in turn are different from those of a hospital. However, all of the above have some demographic data which we want to record.

The classification of attributes into descriptive, operational or   relational is vague at best.   The categorization of any given   attribute might easily change depending upon usage.  Generally   speaking data that is more stable and infrequently changed is  descriptive, data that is more volatile is operational, and data which   is connective describes relationships.   The categorization does not   affect the usage or structure per se, but assists in the process of  identification, segmentation, partitioning and combination.   We try to  combine entities with like data characteristics.   Each characteristic   of data should be such that it is the only place we have to reference  to obtain data on that aspect of the entity.  If all data  characteristics are such that they could participate in the entity  equation, then they are properly placed.   If the characteristic of  data can appear in the function equation of more than one entity, then  it should be isolated.

 

Exception Attributes

 There is a special category of data characteristics which constitute   an exception to this general rule.   Generally speaking, they can be  termed transient exception data.

An illustration:

The customer family members contain standard ship-to data and instructions. On a given order the customer can request a special, nonstandard ship-to address or instructions. This special data overrides the standard data and is used only for this one order. Because the value of that attribute is dependent on a specific order, it must be recorded as order operational data as well. As each order is processed, the order override must be tested for contents and, in the absence of override data, the customer standard may be used.

The definitions of each characteristic describe their function, or role: descriptive or operational. The definitions answer the questions: What role does the characteristic play in describing the entity, its actions or our actions against it? What data would we expect to find in this characteristic, if we assume that this, and only this, characteristic described that aspect of the entity? What is the definition of this aspect of the entity? Why is that particular data characteristic needed?
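The override test in this illustration can be sketched as follows. The class and field names are hypothetical; the sketch assumes the override is stored on the order and left empty when the customer makes no special request.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Customer:
    name: str
    standard_ship_to: str                   # descriptive data on the customer

@dataclass
class Order:
    order_id: int
    customer: Customer
    ship_to_override: Optional[str] = None  # transient exception data

def effective_ship_to(order):
    """Use the order's override when present, else the customer standard."""
    if order.ship_to_override:
        return order.ship_to_override
    return order.customer.standard_ship_to

acme = Customer("Acme", "1 Main St")
routine = Order(1, acme)
special = Order(2, acme, ship_to_override="Dock 7, 99 Harbor Rd")
```

The override lives only on the one order; the customer's standard data is never changed by it.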

Data Acquisition and Retention

Acquisition and retention of the data must also be addressed.   If a  characteristic is needed for a given entity, where and how is that   characteristic acquired?

The narratives for operational characteristics must address these questions: What is the minimum level of data necessary to support the function? How does the function relate to the entity?

If data characteristics relate to multiple functions, or multiple data events, what are the identifiers necessary to distinguish a characteristic from its siblings? If this is a repeating characteristic of data, what identifiers do we need, real or internal, to distinguish its twins, its multiple parts? How many multiples may there be?

The design team should at this point be capable of creating a   structure or schematic which gives pictorial representation to the  entity families, groups of entities within each family, and the  various facets of classifications of those groups within each family.

They have separated out all of the common data and created separate characteristics which reflect data which can be used commonly to describe both the family in general, and each group within the family. For these families they should have defined relationships or structures such that they can relate both the family members to each other, and members of different families to each other.

Some of these new characteristics may themselves be described in terms   of other descriptive characteristics, which are common to still larger  groups.   It is a general characteristic of many of these  families of entities that their description is more narrative than  elemental.  That is, they are conceptual in nature and can only be  described by narrative.

Describing Process Control Entities

The final family of entities is one which we termed the process control entity family. Within this family are members that are kinds of events, or time sequenced things.

Within any functioning system, with randomly occurring events, there is a need to "remember" the order in which things happened, or to store the results of actions such that later actions can be taken against them. There is a need to record the results of decisions, or actions which determine which process is to be taken. There is a need to remember the results of tests which, once made, govern future actions and which would be tedious to make again. In some cases randomly occurring events must be processed at some later date in a certain order. Our process control entity serves this function of "remembering" time or ordering time. In essence this entity remembers sequences of occurrences by recording lists of trigger identifiers, or entity identifiers.

An illustration:

A process consideration calls for randomly received  orders to be arranged in a priority sequence based upon a set of  variables and for orders to be filled, or attempted to be filled,   based upon FIFO (first in, first out) sequence, by priority.  A member   would be established for each priority and order identifiers recorded  as occurrences of that member according to arrival time and priority.    When orders are filled, the member occurrences are accessed and  identifiers selected to identify the appropriate orders.
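A minimal sketch of this illustration follows. It assumes three priority levels with lower numbers filled first, and the order identifiers are invented; neither assumption comes from the text.

```python
from collections import deque

# One member per priority; lower number means higher priority (an assumption).
members = {1: deque(), 2: deque(), 3: deque()}

def record(order_id, priority):
    """Record an order identifier as an occurrence of its priority member."""
    members[priority].append(order_id)

def next_order():
    """Select the oldest identifier from the highest non-empty priority."""
    for priority in sorted(members):
        if members[priority]:
            return members[priority].popleft()
    return None

# Orders arrive at random, in this sequence.
record("A100", 2)
record("A101", 1)
record("A102", 2)
record("A103", 1)

filled = [next_order() for _ in range(4)]
```

Only identifiers are held in the members; the orders themselves stay in the order family and are located through those identifiers when the time comes to fill them.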

To further illustrate, as orders are processed against inventory, they can be filled and ready to ship, partially filled and partially back-ordered, or completely back-ordered. As this determination is made, order identifier occurrences are created and used to populate the appropriate members. As changes occur or as conditions change, occurrences can be transferred from member to member. In this manner, action decisions can be passed simply and easily through the system. Appropriate functions can be performed simply by examining the relevant member and taking the action indicated. Sequential and conditional activities can be taken or not depending upon whether there are any occurrences of a particular member.
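The status members in this second illustration might look like the sketch below. The member names, quantities and order identifiers are hypothetical, and inventory is reduced to a single available quantity for simplicity.

```python
# One member per order status; each holds order identifier occurrences.
members = {
    "ready_to_ship": [],
    "partially_back_ordered": [],
    "back_ordered": [],
}

def classify(order_id, ordered_qty, available_qty):
    """Populate the appropriate member as the determination is made."""
    if available_qty >= ordered_qty:
        status = "ready_to_ship"
    elif available_qty > 0:
        status = "partially_back_ordered"
    else:
        status = "back_ordered"
    members[status].append(order_id)
    return status

def transfer(order_id, source, target):
    """Move an occurrence from one member to another as conditions change."""
    members[source].remove(order_id)
    members[target].append(order_id)

classify("B1", ordered_qty=5, available_qty=10)
classify("B2", ordered_qty=5, available_qty=2)
classify("B3", ordered_qty=5, available_qty=0)

# New stock arrives and B3 can now be shipped in full.
transfer("B3", "back_ordered", "ready_to_ship")

# A conditional activity runs only if the relevant member has occurrences.
shipping_needed = bool(members["ready_to_ship"])
```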

This conceptual family of members can be used for system and   operational status checks, work allocation, process control and  procedural control.


A Professional's Guide to Systems Analysis, Second Edition
Written by Martin E. Modell
Copyright © 2007 Martin E. Modell
All rights reserved. Printed in the United States of America. Except as permitted under United States Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a data base or retrieval system, without the prior written permission of the author.