Importance of Data Modeling in Data Architecture
Here, let’s see why data modeling is important in the implementation of data architecture for an enterprise application, be it a CRM application or a Trading application. Since, the data is an important asset in an enterprise, it is very important to understand the data usage and access across applications within the enterprise. To achieve this, it is mandatory to maintain data models throughout the life span of applications by following appropriate Data Model Development Methodology. However, the data model development cannot happen in isolation without looking at the other data models already in-place within the enterprise. The fragmented and siloed way of data model development can lead to redundant, inconsistent and overlapping models. Hence, it is very important to identify and catalog all data models across various projects and see any of them can be reused for the current requirement.
Next comes the data modeling methodology, we may adopt “top-down” or “bottom-up” or “re-engineering”. Let’s see various enterprise scenarios to apply each methodology.
The top-down modeling methodology is recommended while building data model for the entire enterprise as per the general information requirements, without considering business requirements. This methodology ensures all the data elements are captured without any gaps and reuse the part of model as per data requirements for an application.
The bottom-up modeling methodology is recommended while building data model for a new application module by as per the defined business/functional requirements.
Finally, the re-engineering methodology works in a scenario where an existing application is to be enhanced as per new business/functional requirements.
Next comes the data model development life cycle to be followed. First, it starts with building a Conceptual Data Model (CDM), followed by a Logical Data Model (LDM) and finally ends with the implementation of Physical Data Model (PDM). Of course, the entire process happens in an iterative manner.
The Conceptual Data Model (CDM) is developed as a first step in case of a new development project that represents only business entities at high level without specifying attributes. The CDM is mainly intended to explain the model to non-technical senior management people.
The Logical Data Model (LDM) is developed as a next step that identifies Entities, Attributes along with the keys (PK, FK,) and its relationships (1:1, 1: M, M: M). The entities and attributes will have business friendly names and M: M relationships are not resolved in LDM. The intended audience for this model is Business Analysts and Application Users to make them understand the data model and get the sign-off.
Finally, the Physical Data Model (PDM) is developed containing database specific implementation of database objects (tables, indexes, constraints). In PDM, the entity and attribute names are shortened by applying glossary of abbreviations and naming conventions and the M: M relationships are resolved. The intended audience for this model is application developers and database administrators.
Next comes the modeling tools used in practice. The most popular data modeling tools being used by IT organizations are: CA’s Erwin Data Modeler, Embarcadero Technologies’ ER/Studio, Sybase’s Power Designer Studio and Quest Software’s Toad Data Modeler.
So far, we have seen how the data model development can happen under various scenarios. However, sometimes we encounter with a situation where we need to review an existing data model. So, it is very important to have a review checklist providing guidelines for critiquing a data model. The sample review checklist could be as follows.
Data modeling notation used. Btw, it is advisable to use Information Engineering (IE) notation out of available notations because of its simplicity
Check whether naming conversions and glossary are defined for the attributes used in PDM
Resolution of M:M relationships in PDM using associative entities
Usage of Generalization Hierarchies (super-types & sub-types) to reduce the usage of redundant attributes
Before I close, let me discuss a few best practices to be followed while developing a data model.
Division of entire model into various subject areas that improves manageability of model
Provide enough documentation for every entity, attribute and the rationale behind the relationship created that makes the model more meaningful to the intended audience.
Make sure to have alternate keys (natural keys) also defined while defining surrogate keys
Leverage Generic Structures as far as possible: use abstraction to increase the types of information a design can accommodate using generic concepts. For example, an entity by name “Party” can be defined that accommodates an Employee or a Customer or an Organization providing service etc.
Avoid circular and redundant relationships
Avoid empty entities (entities with no attributes) which are generally created as placeholders for next phase.
Appreciate your comments.