Data Analysis and the Systems Design Phase
The design phase of the System Development Life Cycle is sometimes called the proposed environment analysis phase. Systems analysis is the decomposition of the current environment into its component pieces for examination and study. Design is the development of a plan for building a larger, harmonious unit from component parts. Design is also the business systems repair phase. That is, during design, all of the problems (or as many of them as can be addressed given available resources, time, and technology) are corrected. Although this book makes a distinction between analysis and design, it should be obvious that some analysis occurs during the design phase as well.
The systems analysis phase looks at the current environment and dissects it to identify existing problems. During the design phase, additional problems arise that also require analysis. Requirements that were not identified during the preceding phases are uncovered, and these requirements must be examined for their impact and to determine whether, and if so how, they should be addressed.
In some cases, as the design proceeds, assumptions made during earlier phases must be reexamined and corrected. In other cases, the design team finds that it must extend its scope further than originally assumed because of previously unidentified relationships among processes, data, or both.
Most design techniques use a top-down approach, performing successive decompositions of the desired and expected final product to determine its components and structure.
From this decomposition the designer can determine where and how the existing components must fit, which components must be changed and how those components must be changed to fit the new design, what new components must be built, and what old components must be discarded. The final plan shows why and how all the component parts must fit together.
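The bookkeeping behind this step can be illustrated with a small Python sketch that compares an inventory of existing components with a proposed design and tags each component as keep, change, build, or discard. Everything here is hypothetical: the component names, the idea of describing each component with a single specification string, and the function `tag_components` are assumptions made for illustration, not a method taken from the text.

```python
from typing import Dict


def tag_components(current: Dict[str, str], proposed: Dict[str, str]) -> Dict[str, str]:
    """Tag every component as keep, change, build, or discard.

    Hypothetical representation: each component is identified by name and
    described by a single specification string.
    """
    tags: Dict[str, str] = {}
    for name, spec in current.items():
        if name not in proposed:
            tags[name] = "discard"   # old component with no place in the new design
        elif proposed[name] == spec:
            tags[name] = "keep"      # fits the new design as it is
        else:
            tags[name] = "change"    # must be modified to fit the new design
    for name in proposed:
        if name not in current:
            tags[name] = "build"     # new component the design requires
    return tags


# Hypothetical example inventories.
current = {"order entry": "batch", "billing": "monthly", "telex log": "manual"}
proposed = {"order entry": "online", "billing": "monthly", "customer portal": "web"}
print(tag_components(current, proposed))
```

In practice, a "change" tag would carry a description of the required modification rather than a single word.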
However, decomposition, the breaking down of a whole into its parts, is also the way analysis works. Thus design can be considered a special form of analysis: analysis of the future or proposed environment. Many of the analyst's tools and techniques are used by the designer as well.
The difference between analysis and design is that the analyst uses these tools to describe what is, whereas the designer describes what will be. The analyst works with components as they are today; the designer works with many of the same components, most of which, however, carry little tags indicating how they must be changed.
Because of its goals, system design works best in a top-down manner. Once completed, however, that design is implemented from the bottom up. Each component is changed individually, and the changed components are assembled with new components into successively larger and more complex subassemblies until the final product is complete.
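The contrast between top-down design and bottom-up implementation amounts to two traversal orders over the same decomposition tree. The following Python sketch, using a hypothetical payroll decomposition and hypothetical function names, lists components in design order (each whole before its parts) and in build order (each part before the assembly that contains it).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    name: str
    parts: List["Component"] = field(default_factory=list)


def design_order(c: Component) -> List[str]:
    """Top-down (pre-order): a whole is decomposed before its parts are examined."""
    order = [c.name]
    for part in c.parts:
        order.extend(design_order(part))
    return order


def build_order(c: Component) -> List[str]:
    """Bottom-up (post-order): every part is built before the assembly containing it."""
    order: List[str] = []
    for part in c.parts:
        order.extend(build_order(part))
    order.append(c.name)
    return order


# Hypothetical decomposition of a payroll system.
system = Component("payroll system", [
    Component("time capture", [Component("timecard edit")]),
    Component("pay calculation", [Component("gross pay"), Component("deductions")]),
])
print(design_order(system))
# ['payroll system', 'time capture', 'timecard edit', 'pay calculation', 'gross pay', 'deductions']
print(build_order(system))
# ['timecard edit', 'time capture', 'gross pay', 'deductions', 'pay calculation', 'payroll system']
```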
During the development of a new system, systems designers, along with analysts and user liaisons, work with business data flow, function and process models and narratives to develop the concepts and ideas, the context, structure, framework and detail plans for the new procedural system. They are the architects, the planners, and the controlling force of the design process. They specify design, structure, usage and materials. They translate business context, functional requirements, process, activity and user task requirements, along with business data and information requirements, into meaningful design.
Their relationship to the application development groups is one of architect/consultant to contractor, and their relationship to user management is one of architect to employer. They can design, suggest, modify and implement, but the end product, in the final analysis, must meet the needs of its occupants, not vice versa.
When the design is successful, all parts work in harmony, each piece performing its required function.
The data analysts, an integral part of the design team, work with the systems analysts' documentation of existing data and data files, along with the documentation of as yet unfulfilled user requirements for new data, to determine:
Some of these items are technical issues, and it is these that illustrate the principal difference between systems analysis and data analysis. The systems designer may (some say must) first develop a procedural system that satisfies the requirements of the business, making technical (automation) decisions only after all business procedural issues have been resolved. Data analysts, by contrast, must always consider the automation implications of data storage and access when they develop their design recommendations. This is true whether a process-directed or a data-directed design approach is taken.
Many of the recent developments in, and much of the research on, data arose from the laboratories of vendors of automated data processing equipment, Data Base Management Systems, and Data Management Systems software, and from researchers who concentrated on the information sciences. Still other advances in our understanding of data came from research in semantics and data modeling.
Almost all system development life cycles (figure 5-1) recognize and allow for design as a multilevel undertaking, just as they allow for the analysis of the existing environment to proceed at multiple levels. As with the analysis phase, although there can be many iterations of design, for practical purposes we usually restrict ourselves to three (figure 5-2).
The first is usually identified as the general or business environmental design level, and it focuses on the firm-wide functions, processes and data models. Some have called this enterprise design. This level of design has the widest scope in terms of the number of functions covered, and it looks at the highest levels of the corporation. This level of design corresponds to the strategic level of the management pyramid (figure 5-3). It establishes the general context within which the individual functional and system frameworks can be developed.
This design level also establishes the standards which will be followed in the design process, the methodology which will be employed, the technology which may (or must) be used, and the financial and other resource constraints which will affect the design team.
Within this level of design, corporate-wide data models are developed. These models incorporate data perspectives from all sectors of the firm and all levels of the firm, and most include both internal and external data. The principal focus at this level is the identification and definition of data subjects. A data subject is some set of things, persons, places or concepts deemed of critical relevance to the strategic and senior level managers of the firm. In some cases this identification is restricted to highly limited sets of numbers, ratios, or other indicators which management feels determine the relative health, stability, and prosperity of the firm. These indicators are sometimes called critical success factors.
This level deals mostly with concepts, relationships, assignment of responsibility, and setting priorities. It is at this level that management determines overall strategy and direction. The models and products of this level document policy, goals, objectives and allocation of resources. This level also determines schedules and mileposts for all additional work.
The second design level is identified as detail business or client environmental design, and it focuses on the functions, subfunctions, processes and data of the individual client or user functional areas. This design level is narrower in scope than an enterprise level design, and may be limited to a single high level function, such as human resources, finance, operations, or marketing. This level of design corresponds to the managerial or administrative level of the management pyramid. It defines the specific functional and system framework for the design which is being developed.
Within this level, the conceptual data models and the corporate definitions of the high level data concepts and data subjects are decomposed (analyzed) and refined, and more detail is added. Additional information is incorporated and multiple perspectives are examined.
The single corporate model is usually decomposed into multiple function-specific models (in some cases these models may cross or span functional lines). Here again there are differences between the system designer and the data analyst. The system designers must retain focus on the satisfaction of the requirements of the functional user. Theirs is a procedural focus, and procedures (whether derived from the data or process perspectives) are specific to a user. The data analyst must retain the corporate focus and continually examine specific user data requirements within the context of the corporation as a whole. The corporate data model and the corporate data framework dictate for the data analyst how to define, organize and store data. The data analyst must continually look for impact on other users and for additional perspectives when making data design decisions.
Data is one of the primary mechanisms for integration of procedural systems. Data is a corporate asset, and all data ultimately belongs to all corporate areas. The data analyst is the advocate for those areas that might be affected by specific functional-area system designs but that are not direct participants in the systems design process. Data models and documentation at this level are more detailed than those of the higher level, but not as specific as those required by the designers of the specific user processing systems.
The third design level is the most detailed, and can be considered the application level, which addresses the design of specific user processing systems. This level has the narrowest scope and usually focuses within a single user functional area. At this level of design individual tasks are addressed. This design level corresponds to the operational level of the management pyramid. It defines and describes the specific procedures which will be developed and the resources, in terms of data, information, personnel and materials, which are needed by each procedure.
This level focuses on the data items required for processing, and the data items which require processing. That is, this level focuses on those procedures which import new data into the firm, and on the existing data which must be retrieved from the firm's files to correctly edit, validate and process this new data. This level also focuses on those procedural systems which maintain or update existing data files with internally generated data (data which results from internal activities and which must be added to imported or external data). All systems, whether they bring new data into the firm or process internally generated data, must also retrieve and present data for information, monitoring and control, or analysis purposes.
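To make the edit-and-validate step concrete, here is a minimal Python sketch, assuming a hypothetical customer master file held in memory as a dictionary and a hypothetical incoming order record. The names `customer_master` and `validate_order` are illustrative only. The point it shows is that validating newly imported data usually requires retrieving data the firm already holds.

```python
from typing import Dict, List

# Hypothetical existing file: customer number -> credit limit on file.
customer_master: Dict[str, float] = {"C100": 5000.0, "C200": 1000.0}


def validate_order(order: Dict[str, object], master: Dict[str, float]) -> List[str]:
    """Edit and validate one incoming order; return a list of error messages."""
    errors: List[str] = []
    customer = order.get("customer")
    amount = order.get("amount")

    # Edit check that needs only the new data itself.
    amount_ok = isinstance(amount, (int, float)) and amount > 0
    if not amount_ok:
        errors.append("amount must be a positive number")

    # Validation checks that need data retrieved from the firm's existing files.
    if customer not in master:
        errors.append(f"unknown customer {customer!r}")
    elif amount_ok and amount > master[customer]:
        errors.append("amount exceeds credit limit on file")
    return errors


print(validate_order({"customer": "C100", "amount": 250.0}, customer_master))
print(validate_order({"customer": "C200", "amount": 1500.0}, customer_master))
print(validate_order({"customer": "C999", "amount": -5}, customer_master))
```

The first order passes, the second exceeds the credit limit on file, and the third fails both the edit check and the customer lookup.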
This level must also resolve the issues of data conversion and of where, how, and by whom the data required by the new system should be gathered. It must also resolve the issues related to the procedures for changing data which already exists. These issues include:
The data models and documentation at this level contain details of file and record organization, data item definition and interpretation, data responsibility and data use.
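As an illustration of the kind of detail this documentation carries, the following Python sketch models a single data item definition covering storage, definition and interpretation, responsibility, and use. The `DataItemDefinition` class, its field names, and the sample values are assumptions chosen for illustration; actual data dictionaries vary widely in content and format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataItemDefinition:
    """One application-level data dictionary entry (hypothetical layout)."""
    name: str
    file: str                 # file or record in which the item is stored
    record_format: str        # physical representation within the record
    definition: str           # business meaning of the item
    interpretation: str       # how stored values are to be read
    responsible_area: str     # area that creates and maintains the item
    used_by: List[str] = field(default_factory=list)  # applications that use it


# Hypothetical sample entry.
hire_date = DataItemDefinition(
    name="employee hire date",
    file="employee master",
    record_format="date, YYYY-MM-DD",
    definition="Date on which the employee first started work with the firm",
    interpretation="Calendar date; blank if the employee has not yet started",
    responsible_area="human resources",
    used_by=["payroll", "benefits", "pension"],
)
print(hire_date.responsible_area, hire_date.used_by)
```

Recording `used_by` alongside `responsible_area` is one way to keep visible the point that data created by one area is used across the firm.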
The application or detail design phase uses the results from phases one and two to devise and design a proposed environment for the user. This proposed design presents the user with a revised function model, a revised process model and a revised data model. It is this phase which produces the final products of the analysis and design process.
Regardless of the level being addressed (the general business level, the detailed business level, or the application level) or the methodology employed, each design level should include the following activities:
In addition to the above, which are predominantly process-related items, the following data analysis activities (figure 5-4) must also be included:
One of the terms which is used extensively in systems development projects is "integration." Just what is integration, and what do we mean when we say a system is integrated? How do we design an integrated system?
It would seem then that by definition all systems are integrated since a system is a harmonious whole as well. It would seem further that the process of system design is also the process of integration.
But we have said that integration is the bringing together of all parts. In a system, what parts are we talking about? What do we bring them together for? How do we bring them together? And finally, why must we bring them together at all? Why do we need to design systems?
Another question, equally important to our discussion, is: when is a system fully integrated? At what point is a system complete?
The answer to these questions requires the introduction of a few more concepts.
Goals of system design
There are various goals which are common across most system design projects. These are:
What is an integrated system?
All systems by definition are integrated. But to paraphrase one of the animals in George Orwell's Animal Farm, all systems are integrated, but some systems are more integrated than others.
Systemization began with early efforts to apply a concept known as methods and procedures analysis to business activities. These methods and procedures analysts examined the tasks performed in offices and factories and developed improved (that is, more efficient and more effective) ways of accomplishing them. As new and improved procedures were developed for more and more of these tasks, they were collected into larger procedure sets to reduce redundant activities and to make each activity more unified with the whole. Activities were separated and rearranged to accomplish the tasks in a faster, more efficient, and more controlled manner.
As more and more procedures were collected under one control mechanism, and as more and more procedures were developed in a mutually supportive, interrelated and interdependent manner, the systems became larger and larger and more and more integrated.
The larger the system became, the more integrated the system became. The more integrated a system became, the wider its span of control, and the wider the base of company functions it supported. The drive to integration has led some to design systems which attempt to incorporate all procedures which appear to be even remotely connected under a single control umbrella.
To accomplish this integration, to design these super large systems, various mechanisms have been developed to assist the system designers in their tasks. These mechanisms have not concentrated on the procedural level, nor on the activity or even the process level; instead they have attempted, with varying degrees of success, to focus on the framework level.
The most completely integrated systems are those which attempt to look at the corporate environment from the highest or broadest perspectives, from a top-down, cross-functional, or cross business unit perspective.
Completely integrated systems recognize the interdependency of user areas and try to address as many of these interrelated, interdependent areas as is feasible. These integrated systems are usually oriented along common functional lines, common data requirements or along some other business thread which can serve to tie all of the separate processes together in a unified manner.
Integrated systems are usually developed using a top-down design, since it is easier to determine overall requirements that way, and because integrated systems development makes it necessary to understand, at a very high corporate level, the interdependencies and interrelated nature of the various applications which must be hooked together.
The most completely integrated systems are very difficult to design and generally require more time to implement than if the separate parts were developed in a less integrated manner.
Since integrated systems cross functional, and thus user, boundaries, many user areas must be involved in the design and implementation phases. A multi-user environment is much more difficult to work with, since although the system will be integrated, the users normally are not. Each user brings his or her own perspective to the environment, problems and requirements, and these differing perspectives may often conflict with each other. The designer must resolve these conflicts during the design process, or during the later review and approval cycles. In addition to the conflicting perspectives, there are normally conflicting system design goals, and, more importantly, conflicting time frames as well.
There are no hard and fast guidelines which distinguish candidate systems for integration. Systems which appear to be totally stand-alone, that is, systems which may appear to be complete in and of themselves, can be integrated with other systems if the correct integration mechanism is found. Neither size nor complexity is a distinguishing characteristic.
The level of integration which a firm can introduce into its systems is dependent upon the following issues:
As businesses change, and as the need for system changes arises more and more frequently, firms are taking the opportunity to expand the scope, control capabilities, and level of integration of those systems. This desire to move these systems further up into the corporation, further out on the growth and maturity curves, and to incorporate more and more information capability into them makes it increasingly necessary, indeed imperative, to redesign the framework of these systems: the manner in which the individual procedural components operate separately and interdependently, and the manner and type of resources which are used in the performance of those procedural operations.
Thus we see that we cannot repair, or in many cases even renovate, these systems; we must completely redesign them from all perspectives. Because we are effecting change, we must work with the existing environment, we must include those items of change which have been identified during the analysis as needing to be included, and we must include all of the items which we can anticipate will be needed in the future.
Development versus maintenance and enhancement
All data processing projects can be assigned to one of three major categories: development, enhancement, and maintenance. Although the phases involved in each type of project are approximately the same, they vary in degree of difficulty and scope. Even among projects within the same category, there may be large differences in scope and difficulty. Generally, however, each of these types of projects requires some degree of analysis, design, coding, testing and implementation.
Applications maintenance and enhancement projects differ from development projects in one substantial way: the analysis and design personnel assigned to these projects must also assess the impact of the proposed change on an existing system.
These maintenance or enhancement projects usually leave large parts of the base system intact. The remaining parts are either modified, or "hooks" are added to connect the additional code. The requests for maintenance or enhancement changes normally originate with the user, although they may originate with the development team itself. There are numerous reasons for these system modification requests, among them changes to the business environment, user-requested additional capability, correction of erroneous processing, and user-requested refinements, or cosmetic changes, to the existing system.
Those changes which originate from alterations to the business environment are the most difficult to implement, followed closely by those which add new capability. The implementation difficulties arise because these types of changes require not only new analysis of the user area but also re-analysis of the original system design to determine where, and how, the changes can, and should, be made. Maintenance or enhancement which is necessitated by correction of erroneous processing, user-desired refinements, or other cosmetic changes usually requires little in the way of new analysis.
The analysis and redesign efforts required by business environment changes and additions of new functionality can be almost as extensive as those required for the original system development.
The most difficult aspects of any change to an application system are those which are directed either toward changes in the underlying system design and the resulting processing logic, or toward changing the structure and contents of the data base.
When the maintenance or enhancement project is directed at a system in a database environment, and the database must be changed, the analysis must cover not only the application in question but also any other applications which use the same data base, and in particular the same data records or data elements. In some cases, the immediate enhancement or maintenance project will require data which should logically be captured by a different, unrelated application. The "chain reaction" or "cascading" of changes can increase the scope and impact of the initial request by orders of magnitude. The greater the integration or interdependence, the greater the potential impact of any change.
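The chain reaction described above can be sketched as a breadth-first search over shared data elements. This Python example assumes a hypothetical inventory recording which data elements each application uses, and it deliberately over-approximates: every element used by an affected application is treated as potentially affected in turn. A real impact analysis would prune that propagation using knowledge of the actual change.

```python
from collections import deque
from typing import Dict, Set

# Hypothetical inventory: the data elements each application reads or writes.
uses: Dict[str, Set[str]] = {
    "order entry": {"customer.address", "order.amount"},
    "billing": {"order.amount", "invoice.total"},
    "collections": {"invoice.total"},
    "marketing": {"customer.segment"},
}


def impacted_applications(changed: Set[str], uses: Dict[str, Set[str]]) -> Set[str]:
    """Return every application reached by cascading from the changed elements.

    Assumption for illustration: once an application is affected, all of the
    elements it uses are treated as candidates for change as well.
    """
    impacted: Set[str] = set()
    seen_elements: Set[str] = set(changed)
    queue = deque(changed)
    while queue:
        element = queue.popleft()
        for app, elements in uses.items():
            if element in elements and app not in impacted:
                impacted.add(app)
                # Propagate through elements this application also uses.
                for other in elements - seen_elements:
                    seen_elements.add(other)
                    queue.append(other)
    return impacted


print(sorted(impacted_applications({"customer.address"}, uses)))
```

With this sample inventory, changing `customer.address` reaches billing and collections through order entry's use of `order.amount`, while marketing is untouched.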
We can see from the above discussion that it is not enough to analyze and understand the individual procedural changes; we must also understand the impact of the changes on the system, the direction in which system design must be taken, and the reasons why those directions are valid.
We must understand the various ways in which frameworks for these new systems can be developed, and the mechanisms available for increasing the level of systems integration.
Intangible Nature of Design Components
One final point needs restating here. Of the three components of system design, only two are observable and based upon physical reality, and of those two only one is consistently real: the individual procedures. While some of the resources (the people and the money) are physical, the data is conceptual: hard to visualize, hard to identify, and hard to describe.
Because we are building a plan, in advance of implementation, a set of specifications which we want to be able to verify and validate, we must rely heavily on narratives and even more heavily on diagrams and pictures which describe that which we wish to put in place. These diagrams and narratives form models of the proposed environment, an environment which does not and will not exist until we are sure of what we wish to develop.
These models provide us with something to look at in lieu of reality. They provide us with a mechanism for understanding, testing and validating. There are many different types of models, many different types of diagrammatic notation. We should remember as we use them however that there are limitations to notation, and limitations to any model. No model can fully represent reality or even one aspect of reality. The more conceptual the component to be modeled, the more difficult it will be to model it.
Therefore, while we must stress clarity of notation, we must also remember to separate modeling notation from model content. If we need to change the notation of the modeling technique we are using to improve content and clarity of meaning, then we should be free to do so. If mixing and matching different notational styles improves clarity of content, we are free to do that as well. Most notational styles exist to assist the designers, not to restrict them.
We must also remember that models are usually built to allow us to isolate certain components within the design for examination and study, and that they also allow us to separate structure from detail. Certain types of models only allow certain levels of detail, and certain types of models will only portray certain aspects of the design. Models show us how the pieces fit together, and what the final structure will look like when all the pieces are in place. They allow us to make changes easily and quickly without impacting the real world.
One last system design concept before we move on to other topics. One term which is used very frequently, and very inconsistently is the term "logical." We have logical data models, logical systems, logical designs, logical this and logical that. Logical has become one of those overused terms which have lost their meaning through inconsistent usage.
The term logical is used extensively to qualify certain aspects and components of the design process. In order to ensure that clarity is maintained, and in keeping with the theme of this book, which is that the definition of concepts is crucial to all analysis and especially to data analysis, the following definition will be used for logical, along with the general context within which the term will be used.
A logical system or logical design is one which is built upon consistency, and based upon a consistent line of reasoning. A system which is based upon a framework where all of the elements have a reasoned relationship between them, and all fit together based upon a common set of reasoning, can be considered to be a logical system.
Logic is built upon rules which determine underlying fact and cause and which provide the basis for drawing reasoned conclusions. All logical models must be explainable and consistent. They must be based upon facts and reality, easily discernible, and easily provable. All cause should have effect, and all effect should have cause. All components of the model must have a reason for existence.
Thus we can see that logical systems, integrated systems, even the concept of a system itself are all based upon the need and desire to develop a framework within which business activity can be performed in a harmonious, interrelated, interdependent manner, where all components are brought together according to a unified line of reasoning, according to a common set of rules, or because of some unified thread.
Data Analysis, Data Modeling and Classification
Written by Martin E. Modell
Copyright © 2007 Martin E. Modell
All rights reserved. Printed in the United States of America. Except as permitted under United States Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a data base or retrieval system, without the prior written permission of the author.