Modeling Relationships

Exclusive Relationships Usage

  • Use arc construct to reflect that either relationship is valid but not both
  • Always draw an arc to enclose a sungle entity
    • can be either parent or child
  • Each relationship within an arc should have the same degree and optionaility or else it would be an illogical model
  • Relationships can be only in one arc or elsethe model would be illogical
  • No limits to the number of relationships contained within an arc
    • but more than three relationships is unusual
  • Relationship names within an arc are often the same but can be different
  • Arcs that span relationships that do not belong in the arc
    • indicate included relationships with a circle on the intersection of the included relationships

Reading a Diagram With Arcs

  • Validate a a relationship in an arc by reading it both ways
  • Optionality must match all relationships in an arc
  • States of existance must make sense
    • cannot have mandatory existance in one relationship while maintaining optional existance in the other while both are diagrammed as exclusive relations under the same arc
  • Cannot have unrelated parallel entity relationships having no exclusive meaning to each other while diagrammed as exclusive relations under the same arc
  • Cannot have overlapping exclusivity contradict each other

Exclusivity Comparison

  • Exclusive entities and relationshsips can both be in the same model
    • sometimes supertypes and/or arcs can be equally valid

Historical Data

  • Historical data is stored for business query needs
  • Three different reasons to maintain historical data
    • legal requirements
    • to identify when data changed
    • to identify when relationships changed

Modeling Historical Data

  • Holding historical data implies a many to many relationship
  • To resolve this M:M relationship use an intersect entity to hold the historical data
    • Use an appropriate attribute in the intersect entity as part of a unique identifier
    • Main.DennisCoonich - 29 Aug 2005

Characteristics of a Data Warehouse

  • Exist to store and manage large volumes of historical data
  • Cannot buy off the shelf - built from scratch and integrated with various products
  • Very large central database that uses a standard format
    • data is gathered from many systems
    • accessed by end users
  • Decision support users with budget control need to access and analyse data to look for trends
  • Data is available in the originating system but is not in a standard format and meant only for the originating system
  • Centralizes, standardizes, and simplifies data and access
    • centralized single source of historical data integrated from all areas of the business and can be accesses by subject area
  • Bill Inmon and Ted Codd conceived of Data Warehousing in the 1970s
    • Bill Inmon: "A Data Warehouse is a subject-oriented, integrated, time-variant, non-volitilecollection of datain support of management decisions."
  • Many different variations of the definition of a Data Warehouse exists
  • Data Warehousing extracts select data from desparate systems and oragnizes them by subject area
  • Data is rearely if ever updated or deleted from a Data Warehouse
    • Historical data is added on a regular basis - AKA the refresh cycle - data volumes can grow very quickly
  • When considering data to be included in a Data Warehouse, you should first consider how it will be extracted from the source system
    • then consider how the extracted data will be transformed - AKA scrubbed/cleansing
    • transformation usually takes place on the operational server at the staging area or it may occur at the Data Warehouse server
    • finally consider how it will be transported from the operational machine to the Data Warehouse machine
  • After deterining the requirements for Data Warehouse you create the model
    • usually the database is relational but the model is different
    • most common scheme is the star schema
  • Star schema characteristics
    • contains a central fact table
    • surrounded by dimension tables that contain data by which the business is measured such as Supplieers, Contacts, and Customers
    • a time dimension is always present to facilitate analysis
  • Data Warehouse also holds summary data like totals and averages
    • this is different that notmal rules of Relational Data Modeling

Fan Traps

  • Connection traps happen when a model is misinterpreted
    • term trap is used because the misinterpretation occurs because something appears to be possible to infer something from the model's relationships
  • A Fan Trap is a complex relationship between three or more entitiesthat form a ring
    • sometimes called triangulation or the ring of death
    • are not obvious
    • must understand the model thoroughly to identify trap
    • certain relationships and/or data become impossible to ascertain when modeled in a fan trap

Resolving Fan Traps

  • Think of the relationship as a three-way relatinoship instead of a two-way relationship
    • the intersect entity must be an intersect of three entities instead of two entities

Chasm Traps

  • Connection traps happen when a model is misinterpreted
    • term trap is used because the misinterpretation occurs because something appears to be possible to infer something from the model's relationships
  • Is formed when anentity is relied upon to perform an inappropriate connecting task
    • usually because a relatinoship is optional and temporary
  • When an entity with two parents exists, but that entity is not the resolution of a Many to Many Relationship, there is a potential for an Chasm Trap. The cild entity is not an intersect entity and cannot be used as one

Resolving a Chasm Trap

  • Identify the missing M:M Relationship
    • create an appropriate intersect entity
    • create a unique UID the the intersect entity
  • Can be difficult to identify
    • evey child entity of two parents may be a Chasm Trap
    • if unsure, create intersect entity and then analyze its attributes and UID
    • compare with original child entity, if different a Chasm Trap exists, otherwise a Chasm Trap probably does not exist

Convergence

  • Is the process of simplifying a model without losing the depth of understanding required for the business purpose of the model. Opposite of divergence
  • Initial creation of an ERM models as much as possible while convergence simplifies and makes the model generic for wider application
  • Shed irrelevant details but do not compromise the business rules
    • look for overlapping or similar entities
  • One to One Relationships of similar entities may be indicative of one entity over time, these can be converged
  • Convergence often leads to subtyping
  • Very easy to oversimplify

Divergence

  • Process of creating further depth to model without adding unnecessary detail and complexity. The opposite of convergence
  • Process of identifying if an entity or group of entities need further breakdown
  • Common attributes across multiply entities can be coalesced into a new or seperate entity
    • coalese items only of value and tha make sense
  • Extreme divergence makes a model unusable therefore balance is needed between detail and usability

Definition of Transfereable Relationships

  • Moving the relationship from one instance of the parent entity to another instance of the same parent entity
  • The technique of ERM makes relationships transferrable by default, otherwise you must take deliberate action to specify otherwise

Modeling Non-Transferable Relationships

  • A diamond symbolused on an ERM is used to denote non-transferability
    • placed on the child end of the relationship to indicate that the child instance cannot be transferred to another instance of the same parent entity

Relationship Transferability

  • Before making a relationship non-transerable, verify with stake holders that your understanding is correct
  • Unique identifiers that change value can cause the record and it's related records that depend on it to need updates, deletions, or re-creations
    • changing UID values is in-effect transferability even if un-intended
    • verify how the stake holders want these types of relationships to exist. sometimes this mean making this relationship non-transferable
    • to allow tranferability then you may need to select a different and more appropriate UID
    • Main.DennisCoonich - 30 Aug 2005

Principles of Normalization

Introduction to Normalization

  • A pure relational-theory technique
    • its purpose is to examine individual items of data and place them in data groups
    • each group must have a primary identifier and a set of dependant attribute values
    • gives a thorough understanding of the data
    • helps identify the best data group to put the data into
    • helps identify redundant or duplicate data for removal
  • Can build an ERM based on Normalization or can apply Normalization to an existing ERM
  • Normalization is genrally a self auditing technique.
    • later rules help identify incorrect earlier rules

Terminology

  • Following definitions based on common use rather than academic definitions of pure relational theory
  • Normal Form = is a stage through which data passes towards the creation of a set of relational database tables. Data passes through many Normal Forms.
  • Data Group = is a convenient generic name
    • within each Normal Form stage, data is divided into data groups according to the rules being applied.
    • the rules of Normilzation can be applied to data groups or to entities and tables
  • Repeating Data Group = is a group of data in which each item may have more than one value for a given value of the Unique Identifier (UID)
  • Dependancy = when a piece of data is dependanrt on another piece of data
    • in a fully Normalized data group, each data item must be dependant on the full UID
  • Determinant = item of data on which another item of data is dependant
    • UID should always be determinants
    • in a fully Normalized data group, only UIDs should be determinants, not data items

Rules of Normalization

  • Normalzation conssists of mathematical rules you apply to groups of data
    • the first step is create a data group from raw data and select a UID
      • this is known as Zero Normal Form (0NF)
    • the second step, ensure the data is in FNF
      • remove the repeating groups.
      • implies creating a new data group containing the repeated items and select a new UID for it.
    • the third step, ensure data is in SNF
      • remove any data that is dependant on only part of the UID of the data group
      • create a new group with the removed data using the determinant as the UID
    • the fourth step, ensure data is in TNF
      • remove inter-data dependencies
      • create a new group with the removed data using the determinant as the UID
    • test for Boyce Codd Normal Form (BCNF)
      • remove inter-UID dependencies
      • AKA Three and a Half Normal Form, because it is an expansion of the TNF rule
      • Data is completely in TNF only when you have applied this rule as well
    • after applying the four Normalzation rules, must test the data to ensure it is in TNF
      • test that all of the data items depend on the UID, all of the UID, and nothing except the UID
    • after testing the groups, optimize them by combining any data groups that have identical UIDs
      • optimization may create inter-data dependencies, therefore you must reapply the tests to ensure data is fully in TNF
    • Main.DennisCoonich - 02 Sep 2005

Create a UNF Data Group

  • Before normalizing data, you must find the raw data. This include individual items of data that have not been grouped.
    • Group the raw data together to create an UnNormalized Form data group by making a list of the data items from a single source.
    • Select a UID for this group to make a 0NF data group

Creating UnNormalized Data Groups

  • Collect data from all known sources
  • Perform Normalization from a single source on that set and then repeat on next source.
  • Do not attempt to Normalize an entire system at once.
  • List every piece of data from the source document.
    • Record the data items in lower case to differentiate grops names recorded in uppercase.
  • Do not invent names for data items, ask stakeholders for the appropriate name.
  • Individual examination of each individual data item is the strength of Normalization.
  • Names should be self explanatory and unambiguous.
  • Identify the type of data and describe the type clearly.
  • Select UID after naming all the data items.
    • four factors to consider for the UID selection.
    • 1st, UID must be unique
    • 2nd, number of data items needed to construct UID. Use as few as possible.
    • 3rd, select a UID that is non-textual whenever possible.
      • In otherwords use company number instead of company name.
      • Do not invent data items, use only items in source document.
    • 4th, if there should be more than one data itemthat is valid as a UID, then select the best candidate.
    • optionally, select name for data group.
    • select name for UID.

First Normal Form Rule

  • Create a First Normal Form from a Zero Normal Form
  • Move the data in 0NF to FNF by applying the FNF Rule, remove repeating groups
  • The 0NF rule implies that you shoulld examine each data item and remove any data that can have morethat one value for the Unique Identifier (UID)
  • Implementing this FNF rule
    • create a new data group with the items that repeat
    • select a UId for the new data group
    • create a relationship with the original group in which the new group is a child of the first group

Create a FNF Data Group

  • Create FNF data groups from 0NF data group The relationship with the parentnd properly select a UID for any new FNF data groups
  • FNF rule requires that you extract from data group any data item that can have more than one value for a single value of the UID. This data is known as repeating data
  • Remove repeating items and place them in a data group of their own
    • must preserve association between the two groups
    • create a relationship between them in which the new repeating group is a child of the original group
  • Identify a new UID for the new data group
    • the relationship with the parent data group is also always used within the UID of the repeating data group
    • give the repeating data group a name, written in uppercase, to use as a reference (not mandatory but useful - select a name the reflects the UID)
  • Nested repeating data group is a data group that repeats within another group of repeated data
    • always re-analyse repeating groups to discover any nested groups
    • not uncommon, and may be found in reports
    • nested groups of repeating data represents a hierarchy
    • apply the FNF rule until repeating groups are no longer found
  • When data items within a data group cannot have more than one value for a single value of the UID and each group has an unambiguous UID, the data is in First Normal Form (FNF)

Second Normal Form Rule

  • You can move data that is in FNF into SNF by applying the SNF rule
  • This rule states: Remove part-Unique identifier (UID) Dependencies
  • Implies that you examine each data item and remove any data that depends on only part of the UID, not the entire UID
  • You apply the SNF rule by:
    • creating a new data group with the items that are dependent on a part of the UID
    • use the determinant part of the UID from the original data group as the UID for the new data group
    • create a relationship with the original group in which the original group is a child of the new group

Create a SNF Data Group

  • Create a SNF data group from a FNF data group
  • The SNF rule requires:
    • extract from data groups any data item that is dependent on a component of the UID and not the whole UID. These data items form part-UID dependencies
  • The first step when applying the SNF rule is to examine each data item and check if it is possible to find a single value for that item by using only a part of the UID
  • Remove the dependent data items with their determinants and place them in a new group
  • Must preserve the association between the two data groups
  • Create a relationship between them in which the new group is the parent of the original group
  • Make the determinant from the original data group the UId for the new data group
  • Designate the relationship with the new group as part of the UID of the original group
  • Give the new data group a name written in uppercase, to use as a referene (optional but useful)* When no more data items depend on part of the UID, the data is in SNF
  • Data groups with on one component are already in SNF

Third Normal Form Rule

  • Create TNF from SNF
  • Move data from SNF into TNF by applying the TNF rule, remote Inter-Data-Dependencies
  • Examine each data item and remove any data that depends on another item of data and not on the Unique Identifier
  • Remove the dependent data items and also the determinant
  • Create a new data group with the items
  • Make the determinant the UID of the new data group
  • Create a relationship with the original group where the original group is a child of the new group

Create TNF Data Groups

  • Extract data items that depend on another data item and not the UID. These data items form inter-data dependencies
  • Examine data items to determine if it is possible to find a single value for that item buy using a different data item as the determinant instead of the UID
  • Remove inter-data dependencies into a group of their own
  • Preserve the relationship between the two groups of data in which the new group is the parent of the original group
  • Give the new data group a UID using the determinant from the original group as the UID for the new group
  • Name the new group in uppercase. Select a name that reflects the UID
  • When there is no inter-data dependencies then the group is in TNF

BCNF (Boyce Codd Normal Form)

  • Data is completely in TNF only when the BCNF rule Remove inter-UID-Dependencies is applied
  • Extract any component that is dependent on another UID component
  • If the dependant UID component is a single component, you can remove the component from the UID by removing the UID bar from the relationship line
  • If the dependant UID component is a data item, remove the #
  • It is unnecessary to test for BCNF if a data group has a UID with only one component
  • If the component you removed from the UID was a relationship, you should re-examine the UID for the data group.
  • If the new UID is identical to the UID of the parent data group, merge the two groups

Create BCNF Data Groups

  • Extract from the UID of the data groups any component that is dependent on another UID component. These extracted UID components form inter-UID dependencies
  • Examine each UID component and determine if it is possible to find a single value for that component by using a single value for a different component

Test for Third Normal Form

  • After applying the Normalization rules for each of the Normal Forms, apply some simple test to ensure data is fully normalized
    • examine each data item and verify if it depends solely upon the UID.
      • if ant data item does not depend completely on the UID, then that data group is not fully in TNF. Investigate and reapply the relevant Normal Form rule
    • verify that each data item has only one possible value for a given value of UID
      • if ant data item does not depend completely on the UID, then that data group is not fully in TNF. Investigate and reapply the relevant Normal Form rule
    • check if the entire UID is necessary for finding a data item and whether or not each data item can be found using a single value for the UID
    • the Normalization tests can be summarized as follows: Does each data item depend on the UID, the whole UID and nothing but the UID?

Optimization

  • The Normalization techniques is performed on the individual source documments and this often produces identical UIDs from different documents
  • Merges data when UID is identical across data groups
  • Rarely necessary at the document level but usually as a result of Boyce Codd Normal Form violations
  • Transitive dependencies exist when a data item depends on the UID within the data group, but the UID itself depends on a UID in another group
  • Merge data groups the exhibit Transitive dependencies

Optimization Problems

  • Three type of problems exist
    • Similar data names may hide dissimilarities
    • Inter-data dependencies may be introduced
    • Inappropriate data naming may result in the loss of data distinction
  • When merging data groups, revisit the meaning of similarly named data items from each group and clarify/verify the name meanings
  • May introduce inter-data dependencies and reverting the data group state to a Normal Form
  • A data name that does not fully represent the source document may mean that data groups are incorrectly merged.
  • When data groups are incorrectly merged, return them to their previous groups and reconsider the names of the data item(s), change names as needed to clearify

Retest for Third Normal Form

  • Retest groups after merging
  • Ensures that the optimzations are reasonable
  • Merging data groups sometimes introduces an inter-data dependency
  • The Normalization tests can be summarized as follows: Does each data item depend on the UID, the whole UID and nothing but the UID?
Version 1.15 last modified by Administrator on 20/09/2005 at 20:23

Comments 0

No comments for this document

Attachments 0

No attachments for this document
 
KnowledgeBase
Books
SystemWerks
OracleDeveloper
Notes
Liferay
CheatSheets
Apache
MyNotes
MultiMedia
Minix
Integrators
OpenSource
SunAppServer
Languages
HowTo
VCS
SwiftShopper
SugarCRM
Skype
IBM
PocketPC
SSH
Speech
Windows
Software Development
BusinessModels
Oracle
MySQL
Databases
Support
NACHA
Networking
Amiga
Virtualization
WebsiteDesign
Photography
Robotics
Cryptography
LAMP
LinuxBIOS
Management
Security
Bizarre
Apple
OpenOffice
Audio
USBKeys
OnlineStorage
Games
Aeronautics
Military
Employment
Emulation
Freelancing
Video
Electronics

Creator: Administrator on 2005/09/15 23:33
This wiki is licensed under a Creative Commons license
1.1-milestone-3.3909