An Entity is defined as a person, place, or thing which a) is of interest to the corporation, b) is capable of being described in real terms, and c) is relevant within the context of the specific environment of the firm.
A more precise definition, within the context of the classification-based data model, would be:
For our discussion of entity-relationship-attribute models, we must use a different definition:
Figure 16-1 How do we describe entities?
The representation of the entity in the data model includes all of the characteristics and attributes of the entity, the actual data elements which must be present to fully describe each characteristic and attribute, and a representation of how that data must be grouped, organized and structured. Although the terms characteristic and attribute are sometimes used interchangeably, attribute is the more general term, and characteristics are special-use attributes.
The definitions of these three terms (attribute, characteristic and data element) follow.
An attribute must also be:
An attribute must be capable of being defined in terms of words or numbers. That is, the attribute must have one or more data elements associated with it. An attribute of an entity (figure 16-2) might be its name or its relationship to another entity. It may describe what the entity looks like, where it is located, how old it is, how much it weighs, etc. An attribute may describe why a relationship exists, how long it has existed, how long it will exist, or under what conditions it exists.
Figure 16-2 Definition of an attribute
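The rule that every attribute must have one or more data elements associated with it can be checked mechanically once entities, attributes and data elements have been recorded. The following is a minimal Python sketch under stated assumptions: the class names DataElement, Attribute and Entity, and the sample customer attributes, are hypothetical illustrations, not part of the author's notation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataElement:
    name: str


@dataclass
class Attribute:
    name: str
    elements: List[DataElement] = field(default_factory=list)

    def is_definable(self) -> bool:
        # An attribute must be definable by one or more data elements.
        return len(self.elements) >= 1


@dataclass
class Entity:
    name: str
    attributes: List[Attribute] = field(default_factory=list)

    def undefinable_attributes(self) -> List[str]:
        # Names of attributes that do not yet have any data elements.
        return [a.name for a in self.attributes if not a.is_definable()]


# Hypothetical example: a customer entity with three attributes.
customer = Entity("Customer", [
    Attribute("Name", [DataElement("first_name"), DataElement("last_name")]),
    Attribute("Location", [DataElement("street"), DataElement("city")]),
    Attribute("Credit Standing"),  # no data elements identified yet
])

print(customer.undefinable_attributes())  # ['Credit Standing']
```

An attribute flagged this way is not yet a valid attribute in the sense used here; either its data elements must be identified or the attribute must be reconsidered.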
An attribute (figure 16-3):
Figure 16-3 What is an Attribute
A data element:
The ability to classify the membership of a set of objects (or in this case entities) or to distinguish sets of objects from each other is directly dependent upon the ability to collect data about each member, by which membership in the various classification groups can be assigned.
For example, the classification of companies by type of ownership (figure 16-4) is dependent upon having data about each company which identifies its type of ownership.
The ability to maintain records about an object is dependent upon the ability to identify what must be kept in those records and to subsequently collect that data. The ability to retrieve that data, especially if it is extensive, is dependent upon the ability to classify, organize or categorize that data into meaningful groupings, such that each item of data has a meaningful, logical place close to data which is closely related to it, either by existence dependence or by meaning dependence (the data elements all relate to the same idea, concept, aspect or action of the entity).
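To make the dependency concrete: a company can be placed in an ownership group only if its ownership type was actually collected. The Python sketch below assumes a hypothetical data element named ownership_type and invented sample companies; it groups the classifiable records and sets aside those that cannot be classified.

```python
from collections import defaultdict


def classify_by_ownership(companies):
    """Group company names by their recorded ownership type.

    Companies with no recorded ownership type cannot be assigned
    to any group and are returned separately.
    """
    groups = defaultdict(list)
    unclassifiable = []
    for company in companies:
        kind = company.get("ownership_type")
        if kind is None:
            unclassifiable.append(company["name"])
        else:
            groups[kind].append(company["name"])
    return dict(groups), unclassifiable


# Hypothetical sample records; "ownership_type" is an assumed data element.
companies = [
    {"name": "Acme Corp", "ownership_type": "public"},
    {"name": "Baker & Sons", "ownership_type": "partnership"},
    {"name": "Cole Ltd", "ownership_type": "private"},
    {"name": "Delta Inc"},  # ownership data was never collected
]

groups, unclassifiable = classify_by_ownership(companies)
print(groups)          # {'public': ['Acme Corp'], 'partnership': ['Baker & Sons'], 'private': ['Cole Ltd']}
print(unclassifiable)  # ['Delta Inc']
```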
In the real world model, for purposes of clarity, we include all entities regardless of their purpose and use. The relationships are the real world relationships and describe how the entities relate and interact with each other.
The data model, by contrast, distinguishes between entities we are interested in because we collect data about them and entities we are interested in because they are carriers of data, such as the forms and documents we use to collect data about other entities. The real world model describes the relationships between entities by stating that one or more occurrences of this kind of entity must (or may) be related to one or more occurrences of that kind of entity. The data model describes the data we need to know about those relationships, and how we can determine which actual occurrences of one kind of entity are related to which actual occurrences of another kind of entity, and under what conditions or subject to what qualifications.
Normally in the data model we ignore document entities which function purely as carriers of data about other real entities. We instead concentrate on the data they carry and on what entities that data describes.
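The contrast between the two models can be illustrated in a few lines. A real world statement such as "each order must be related to exactly one customer" speaks about kinds of entities; the data model needs the identifiers that say which actual order is related to which actual customer. This Python sketch is illustrative only: the order and customer identifiers and the function names are hypothetical, and the "exactly one" rule is an assumed example.

```python
from collections import Counter


def cardinality_violations(order_ids, links, min_count=1, max_count=1):
    """Return order ids whose number of linked customers is outside [min_count, max_count].

    links is a list of (order_id, customer_id) pairs, one per actual
    relationship occurrence recorded in the data model.
    """
    counts = Counter(order_id for order_id, _customer_id in links)
    return [o for o in order_ids if not (min_count <= counts[o] <= max_count)]


def customers_of(order_id, links):
    """Which actual customer occurrences is this order related to?"""
    return sorted(c for o, c in links if o == order_id)


order_ids = ["O1", "O2", "O3"]
links = [("O1", "C1"), ("O2", "C1"), ("O2", "C2")]

print(customers_of("O1", links))                 # ['C1']
print(cardinality_violations(order_ids, links))  # ['O2', 'O3']
```

Here O2 violates the real world rule by having two customers, and O3 by having none; only the occurrence-level data can reveal either problem.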
One of the primary techniques for data modeling, both in data model development and in procedural development, is the Entity-Relationship model. The first two of the four Entity-Relationship model levels are used to develop the real world data model. These are the enterprise level model and the entity-relationship level model, which were described in earlier chapters. These models can be used at the real world level since they can be developed deductively and from empirical evidence.
The remaining two models (the entity-relationship-attribute level model and the data element level model) are developed inductively at the detail design level, and depend on the full description of the entity family classification structure and a complete set of user views for their completion. The user views are necessary because it is only from them that we can develop the composite data model, which is the entity family model, and only by examining the complete set can we be sure we have identified every necessary characteristic and its data content, and every relationship required by every task for data access. All user views are necessary because it is only by examining them that we can determine whether the data we identified deductively (future data needs not currently present, and future relationship needs not currently captured) can be captured and maintained on a practical basis.
Since data model development at the entity-relationship-attribute level is an inductive process that depends upon determining the exhaustive set of user views and the exhaustive classification structure of each entity family, it must be the last step in the data model development process.
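The inductive step can be pictured as merging every user view into one composite model and then checking the deductively identified needs against it. The Python sketch below is a simplification under stated assumptions: each user view is reduced to a mapping from entity name to attribute names, and the view names, entities and attributes are all hypothetical. It only reports deduced attributes that no current view captures; judging whether they can be captured practically remains an analyst's decision.

```python
from typing import Dict, List, Set

View = Dict[str, Set[str]]  # entity name -> attribute names used in one user view


def composite_model(views: Dict[str, View]) -> View:
    """Merge every user view into one composite entity/attribute model."""
    model: View = {}
    for view in views.values():
        for entity, attributes in view.items():
            model.setdefault(entity, set()).update(attributes)
    return model


def not_in_any_view(deduced: View, model: View) -> Dict[str, List[str]]:
    """Deductively identified attributes that no current user view captures."""
    gaps = {}
    for entity, attributes in deduced.items():
        missing = attributes - model.get(entity, set())
        if missing:
            gaps[entity] = sorted(missing)
    return gaps


# Hypothetical user views.
user_views = {
    "Order Entry": {"Customer": {"Identification", "Billing Address"},
                    "Order": {"Identification", "Order Lines"}},
    "Credit": {"Customer": {"Identification", "Credit Standing"}},
    "Shipping": {"Customer": {"Identification", "Shipping Address"},
                 "Order": {"Identification", "Shipment"}},
}
model = composite_model(user_views)
deduced = {"Customer": {"Credit Standing", "Marketing Preferences"}}
print(not_in_any_view(deduced, model))  # {'Customer': ['Marketing Preferences']}
```

Dropping any one view from user_views can change the result, which is the reason the text insists on the complete set.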
The development of the Entity-Relationship model is a multilevel process where each level produces a clearer and more well-defined view of the proposed environment. The complete modeling effort results in a series of leveled environmental descriptions, along with a diagrammatic representation of each level. These diagrammatic representations, the ER models, are descriptions of the entities, or real components, of the business environment and how they relate to each other.
On a regular basis the firm deals with customers, products, employees, places, orders, shipments, etc. The firm collects data about these things, these entities, and stores that data in files. With few exceptions, it is the common data resource files of the firm which hold the descriptions of those entities in the form of collections of data items.
The ER models are not data structure models, but data descriptions of the entities of interest to the firm. These data descriptions also correspond to the data contents of the files and thus serve well as a mechanism for developing our data model. And although at their most detailed level they contain and identify data elements, they are not data processing models. They are business models, and as such they model business environments and depict business components.
Entity-Relationship models consist of representations of the various levels and parts of the organization, from the strategic to the operational levels. Each of these leveled models represents the entities and relationships from the perspective of that level, and within a level the Entity-Relationship models represent the perspective of one or more particular users at that level.
Although there are numerous variations of the Entity-Relationship Approach model notation, the three basic notational components of the Entity-Relationship model consist of symbols representing an entity, a relationship between two entities, and the attributes, or descriptors of either entities or relationships.
These symbols are:
The real world model depicts real world families of entities. At this level the entities have the widest possible definition and scope, while still maintaining the general physical and role characteristics of the individual entities which comprise them. These entity sets are treated as if there were no variations in type and as if each of their component entities were defined in a similar manner and behaved in a similar manner.
Just as we use the general term vendor to represent each (and every) kind of vendor and employee to represent each (and every) kind of employee, so too we use an even more general term to represent all the various kinds of people, places, things, concepts and events of interest to the firm. Depending upon the model (since a different definition is used for each model, real world and data), the term entity thus represents either any one of the real world things in the real world model, or any one of the generalized collections of data we gather, record, and maintain in the data model.
At the real world level:
In the classification model:
The analysis at the third, or Entity-Relationship-Attribute, level combines the work of the real world level with that of the classification model and the data event maps by adding characteristic groupings determined inductively into a classification structure determined deductively, and by adding characteristics and/or attributes to both the entities and the relationships. A characteristic or attribute is represented by a circle attached directly to the entity or the relationship which it describes. The circle contains the name of the attribute. Attributes might be: identification information, residence information, physical description, inventory status, packaging information, hobbies, clothing sizes, etc.
The attribute names for an employee entity might be very similar to the section or item headings on an employment application, or on the permanent employee record form. For a customer, they might be very similar to the section headings on a new account opening form, or on the customer record form.
For an entity, each attribute represents some grouping of data which is necessary, from a business perspective, to describe a physical or logical characteristic of the entity, or to describe some activity of the entity. For a relationship, each attribute represents some grouping of data which is necessary, from a business perspective, to describe, qualify or maintain the named relationship between two entities.
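The attribute circles attached to entity and relationship symbols can be mirrored in a small data structure. This Python sketch assumes hypothetical class names (EntityNode, RelationshipNode) and borrows attribute names in the spirit of employment application headings; it is a textual stand-in for the diagram, not a prescribed notation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EntityNode:
    name: str
    attributes: List[str] = field(default_factory=list)  # circles attached to the entity


@dataclass
class RelationshipNode:
    name: str
    left: EntityNode
    right: EntityNode
    attributes: List[str] = field(default_factory=list)  # circles attached to the relationship


def describe(rel: RelationshipNode) -> List[str]:
    """Render a relationship and the attributes attached to each symbol as text."""
    lines = [f"{rel.left.name} {rel.name} {rel.right.name}"]
    for entity in (rel.left, rel.right):
        lines.append(f"  {entity.name}: " + ", ".join(entity.attributes))
    lines.append(f"  [{rel.name}]: " + ", ".join(rel.attributes))
    return lines


# Hypothetical attribute names, modelled on section headings of an
# employment application and a department record.
employee = EntityNode("Employee", ["Identification", "Residence", "Education", "Work History"])
department = EntityNode("Department", ["Identification", "Location", "Budget"])
assigned = RelationshipNode("is assigned to", employee, department,
                            ["Assignment Period", "Assignment Conditions"])

print("\n".join(describe(assigned)))
```

Keeping relationship attributes on the relationship object, rather than on either entity, reflects the rule that they describe the connection itself.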
The Entity-Relationship-Attribute model is an expansion of the Entity-Relationship model. Until this point the models have only identified the entities and relationships by name and context. For a given entity or relationship little is known about them other than their name, the obvious fact of their existence, and the fact that the firm is interested in them.
At the Entity-Relationship-Attribute level, entities and relationships are described in terms of their attributes, or characteristics. In other words, beyond knowing that the entity exists, we must also know what the entity looks like, how it is identified, and what it does. These descriptors or characteristics are called attributes. An attribute is thus any distinct aspect of the entity or relationship that is necessary to describe the entity or to qualify the relationship. The full description of an entity or relationship consists of the full set of attributes which describe that entity or relationship.
For an entity attribute to be significant, it must relate directly to the entity, be completely dependent on the entity for its existence and meaning, and be definable in terms of one or more data elements. It is immaterial whether an attribute has one or more data elements, as long as the attribute applies to all instances of the entity being represented. Seen another way, an attribute is some distinct category of mutually related data, the sum of which describes something of interest about the entity. The identifiers (unique or otherwise) of an entity (figure 16-5) are a special form of attribute.
Figure 8-3 Entity Attributes
Entity attributes (Figure 16-6) represent:
For a relationship attribute (Figure 16-7) to be significant, it must relate directly to the relationship, be completely dependent on the relationship for its existence and meaning, and be definable in terms of one or more data elements. It is immaterial whether an attribute has one or more data elements, as long as the attribute applies to all instances of the entity or relationship being represented.
Figure 16-7 Relationship Attributes
Seen another way, an attribute is some distinct category of mutually related data, the sum of which describes something of interest, or some qualifier about the relationship between two entities. A relationship attribute must be dependent upon the connection between both entities and should be incapable of existence in the absence of that relationship. The minimum attributes of a relationship are the necessary identifiers of each entity of the related pair.
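A relationship occurrence therefore carries, at minimum, the identifiers of both related entities, plus any qualifiers that exist only because the relationship exists. The Python sketch below assumes a hypothetical employee-to-department assignment whose qualifier is an assignment period; the class and field names are illustrative, not drawn from the source.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Assignment:
    # Minimum relationship attributes: the identifiers of both related entities.
    employee_id: str
    department_id: str
    # Qualifying attributes that exist only because the relationship exists.
    start: date
    end: Optional[date] = None  # None means the assignment is still in effect

    def in_effect_on(self, day: date) -> bool:
        return self.start <= day and (self.end is None or day <= self.end)


def employees_in(department_id: str, day: date, assignments: List[Assignment]) -> List[str]:
    """Employees related to the department under the qualifying condition on a given day."""
    return sorted(a.employee_id for a in assignments
                  if a.department_id == department_id and a.in_effect_on(day))


assignments = [
    Assignment("E1", "D10", date(2020, 1, 1), date(2021, 6, 30)),
    Assignment("E1", "D20", date(2021, 7, 1)),
    Assignment("E2", "D10", date(2019, 3, 1)),
]
print(employees_in("D10", date(2021, 1, 15), assignments))  # ['E1', 'E2']
print(employees_in("D10", date(2022, 1, 15), assignments))  # ['E2']
```

The start and end dates describe neither the employee nor the department on their own, which is what makes them relationship attributes.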
Relationship attributes (Figure 16-8) represent some descriptor or qualifier of the relationship such as:
It is possible for the same named attribute to be used to describe many different entities and relationships. Identifier attributes in particular describe both the entities and the relationships between them.
Entity-Relationship-Attribute Level Model
The creation of an Entity-Relationship-Attribute model is a multiple-step process, and this level produces the most detailed model. Step one extracts each entity from the Entity-Relationship model and places it at the top of a separate page. Each distinct relationship between each pair of related entities is also extracted from the Entity-Relationship model and placed at the top of a separate page.
Step two identifies, names and defines each attribute of each entity. Each attribute, represented by an attribute symbol, is drawn below the entity symbol and is connected to the entity by a single line (Figure 16-9). As each attribute symbol is drawn, the attribute name should be placed within it. Although it is not a requirement, as each attribute is identified and named it is helpful to annotate it with a discrete number, or "n" (denoting some unknown number more than one), to indicate how many occurrences of this attribute (Figure 16-10) would be necessary to describe the entity.
Step three identifies, names and defines each attribute of each relationship. The attributes of a relationship are those categories of data which are necessary to qualify the relationship, describe when and under what conditions it occurs, and any other information which relates only to the connection between the entities and not to either entity independently. The relationship attributes should include all attributes necessary to clearly and completely identify any qualifications of that particular relationship between the two entities and the conditions under which the relationship exists (Figure 16-11).
As each attribute is identified and named, it is drawn below the relationship symbol and connected to the relationship by a single line. As with the attributes of entities, it is helpful to annotate the attribute with a discrete number or with "n" to identify the number of occurrences of this particular attribute which are necessary to fully describe or qualify the relationship.
As the attributes of each relationship are modeled, those attributes of each entity of the related pair which are of interest within the context of the relationship should be extracted from the attributed entity model and added to the entity symbols of the relationship model (Figure 16-12).
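Step one amounts to starting one working page per entity and one per distinct relationship. In the Python sketch below a "page" is just a dictionary with an empty attribute list to be filled in later steps; the page structure and the sample Customer/Order/Product model are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Relationship = Tuple[str, str, str]  # (entity, relationship name, entity)


def step_one(entities: List[str], relationships: List[Relationship]) -> List[Dict]:
    """Start one page per entity and one page per distinct relationship."""
    # The empty attribute lists are filled in during steps two and three.
    pages = [{"entity": name, "attributes": []} for name in entities]
    # dict.fromkeys keeps the first occurrence of each relationship, in order,
    # so a relationship drawn twice on the diagram still gets only one page.
    for rel in dict.fromkeys(relationships):
        pages.append({"relationship": rel, "attributes": []})
    return pages


entities = ["Customer", "Order", "Product"]
relationships = [
    ("Customer", "places", "Order"),
    ("Customer", "cancels", "Order"),  # a second, distinct relationship for the same pair
    ("Order", "contains", "Product"),
]
pages = step_one(entities, relationships)
print(len(pages))  # 6
```

Note that two different relationships between the same pair of entities each get their own page, because each is a distinct relationship.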
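The occurrence annotation (a discrete number, or "n" for an unknown number greater than one) is easy to record alongside each attribute name. This Python sketch uses hypothetical employee attributes and a hypothetical helper that lists the attributes needing more than one occurrence.

```python
from typing import Dict, List, Union

Occurs = Union[int, str]  # a discrete count, or "n" for an unknown number greater than one


def is_repeating(occurs: Occurs) -> bool:
    """True when more than one occurrence is needed to describe the entity."""
    if occurs == "n":
        return True
    if isinstance(occurs, int) and occurs >= 1:
        return occurs > 1
    raise ValueError(f"invalid occurrence annotation: {occurs!r}")


def repeating_attributes(annotated: Dict[str, Occurs]) -> List[str]:
    return sorted(name for name, occurs in annotated.items() if is_repeating(occurs))


# Hypothetical annotations for an employee entity.
employee = {"Identification": 1, "Residence": 2, "Education": "n", "Physical Description": 1}
print(repeating_attributes(employee))  # ['Education', 'Residence']
```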
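The extraction of relevant entity attributes into a relationship model can be sketched as a filter over the attributed entity models. In this Python illustration the function name relationship_page, the "purchases" relationship and all attribute names are hypothetical; the function refuses to extract an attribute the entity model does not contain.

```python
from typing import Dict, List


def relationship_page(name: str, left: str, right: str,
                      rel_attributes: List[str],
                      entity_models: Dict[str, List[str]],
                      relevant: Dict[str, List[str]]) -> dict:
    """Build the relationship model for one related pair.

    entity_models holds each entity's attributes from its attributed entity model;
    relevant names the entity attributes of interest within this relationship.
    """
    page = {"relationship": name, "attributes": list(rel_attributes)}
    for entity in (left, right):
        wanted = relevant.get(entity, [])
        unknown = [a for a in wanted if a not in entity_models[entity]]
        if unknown:
            raise KeyError(f"{entity} has no attribute(s) {unknown}")
        # Keep the entity model's own ordering of the extracted attributes.
        page[entity] = [a for a in entity_models[entity] if a in wanted]
    return page


entity_models = {
    "Customer": ["Identification", "Billing Address", "Credit Standing", "Hobbies"],
    "Product": ["Identification", "Packaging", "Inventory Status"],
}
page = relationship_page(
    "purchases", "Customer", "Product",
    ["Purchase Terms", "Purchase Date"],
    entity_models,
    {"Customer": ["Credit Standing", "Identification"],
     "Product": ["Identification", "Inventory Status"]},
)
print(page["Customer"])  # ['Identification', 'Credit Standing']
```

Attributes such as Hobbies stay on the entity page because they are of no interest within this particular relationship.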
In data processing terms, and in a very general sense, the attributes within this model can be considered to be the identification and definition of the record types (or record groupings) which will ultimately contain the data elements. It must be noted that each attribute at either the entity or relationship level represents a mutually exclusive and mutually independent category of data. However, an attribute may or may not represent an actual record type.
In the logical data structure models, created later from these Entity-Relationship-Attribute models, attributes may be combined to form more general records, they may be kept separately, or, in some extreme cases, an attribute may be split into many records because of its complexity. The names of the entities are the names of the logical data aggregates (or structures) of the environment.
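The three possibilities (combine, keep separate, split) can be seen in a simple attribute-to-record mapping. The record names and the Employee attributes in this Python sketch are hypothetical; the helpers only check that every attribute lands in some record and show where a split attribute ends up.

```python
from typing import Dict, List

# Hypothetical logical record layout for an Employee entity.
record_layout: Dict[str, List[str]] = {
    "EMP_MASTER": ["Identification", "Physical Description"],  # two attributes combined
    "EMP_ADDRESS": ["Residence"],                              # kept as its own record
    "EMP_JOB": ["Work History"],                               # one complex attribute...
    "EMP_JOB_DETAIL": ["Work History"],                        # ...split across two records
}
attributes = ["Identification", "Physical Description", "Residence", "Work History"]


def unplaced(attrs: List[str], layout: Dict[str, List[str]]) -> List[str]:
    """Attributes that no record type carries."""
    placed = {a for group in layout.values() for a in group}
    return [a for a in attrs if a not in placed]


def records_for(attr: str, layout: Dict[str, List[str]]) -> List[str]:
    """Record types that carry some part of the given attribute."""
    return sorted(name for name, group in layout.items() if attr in group)


print(unplaced(attributes, record_layout))         # []
print(records_for("Work History", record_layout))  # ['EMP_JOB', 'EMP_JOB_DETAIL']
```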
A fourth, or data element, level may be added when the models are developed in conjunction with the data processing systems development projects. This is the level which is most familiar to data processing specialists and consists of identifying and defining the specific data elements which are needed to describe each attribute of each entity and each relationship. Data elements are assigned only to attributes. In a sense, data elements are the attributes of the attributes.
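At the data element level, data elements hang only under attributes, and those attributes in turn belong to an entity or a relationship. The Python sketch below flattens such a model into (owner, attribute, data element) rows; the model contents and element names are hypothetical. It also shows an identifier data element appearing under both an entity and a relationship.

```python
from typing import Dict, List, Tuple, Union

Owner = Union[str, Tuple[str, str, str]]  # an entity name, or (entity, relationship, entity)

# Hypothetical data element level model: data elements appear only under attributes.
model: Dict[Owner, Dict[str, List[str]]] = {
    "Customer": {
        "Identification": ["customer_number", "customer_name"],
        "Billing Address": ["street", "city", "postal_code"],
    },
    ("Customer", "places", "Order"): {
        "Placement": ["customer_number", "order_number", "order_date"],
    },
}


def data_element_list(m):
    """Flatten the model into (owner, attribute, data element) rows."""
    return [(owner, attribute, element)
            for owner, attributes in m.items()
            for attribute, elements in attributes.items()
            for element in elements]


def owners_of(element, m):
    """Every entity or relationship whose attributes use the named data element."""
    return [owner for owner, _attribute, e in data_element_list(m) if e == element]


rows = data_element_list(model)
print(len(rows))                            # 8
print(owners_of("customer_number", model))  # ['Customer', ('Customer', 'places', 'Order')]
```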
Additional Rules for ER Model Creation
Regardless of the level being addressed, the following rules apply to the construction of an Entity-Relationship diagram:
Under these conditions, each occurrence of the attribute symbol should have the same name, and some notation which indicates that it is identical in format to attributes which appear elsewhere.
If all attributes and all relationships connected to the entity rectangle do not have the potential to apply equally to each and every entity occurrence defined to it, then the definition of the entity being used must be changed, and a new entity or entity set, or a new entity subset or subsets, must be created until this condition is satisfied.
The above discussion assumed that one and only one model would be created at each level, for the firm as a whole. Since most projects are for specific user areas, however, it may be desirable to create different models for each user area.
Just as an entity can be viewed from many different perspectives, and may seem to be different from each perspective, so too Entity-Relationship and Entity-Relationship-Attribute models can be different from the various perspectives of the firm. Each area of the firm defines the entities of the firm in different ways, and relates to them in different ways.
Not every Entity-Relationship model need contain every entity of the firm. The various models need only contain the entities of interest to the particular area being modeled. To illustrate:
This type of model might reflect all the processing stations through which a particular document must travel, or the work stations through which a manufactured part must pass. A process model does not reflect what processing is done, or even how that processing is done, but rather a station where processing of a particular type is done. That processing could be complex or simple.
The various Entity-Relationship Approach models are business models, rather than data processing models. That is, they reflect business environments, not methods of processing. The types of entities and relationships selected to be included in each model, the definitions of those entities, and the attributes used to describe those entities and relationships all combine to describe the environment and the nature of the business itself.
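One way to spot a violation of this rule is to group entity occurrences by the set of attributes that actually apply to them; more than one group signals that the single entity definition needs to be revised. The Python sketch below uses hypothetical "Vehicle" occurrences. It only detects the condition and shows the candidate groupings; naming the subsets, or deciding that a new entity is needed instead, remains a modeling judgment.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set


def group_by_applicable_attributes(occurrences: Dict[str, Set[str]]) -> Dict[FrozenSet[str], List[str]]:
    """Group entity occurrences that share exactly the same applicable attributes."""
    groups = defaultdict(list)
    for occurrence_id, attributes in occurrences.items():
        groups[frozenset(attributes)].append(occurrence_id)
    return dict(groups)


# Hypothetical "Vehicle" occurrences and the attributes that apply to each.
vehicles = {
    "V1": {"Identification", "Engine", "Wheels"},
    "V2": {"Identification", "Engine", "Wheels"},
    "V3": {"Identification", "Engine", "Hull"},
}
groups = group_by_applicable_attributes(vehicles)
common = set.intersection(*vehicles.values())  # attributes valid for every occurrence

print(len(groups) > 1)  # True: the single "Vehicle" definition must be revised
print(sorted(common))   # ['Engine', 'Identification']
for attrs, members in groups.items():
    print(sorted(attrs - common), members)
# ['Wheels'] ['V1', 'V2']
# ['Hull'] ['V3']
```

The common attributes stay with the general entity, while each remaining group suggests a candidate subset that carries its own distinguishing attributes.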
Data Analysis, Data Modeling and Classification
Written by Martin E. Modell
Copyright © 2007 Martin E. Modell
All rights reserved. Printed in the United States of America. Except as permitted under United States Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a data base or retrieval system, without the prior written permission of the author.