TY - JOUR
T1 - A grain preservation translation algorithm
T2 - From ER diagram to multidimensional model
AU - Chen, Yen Ting
AU - Hsu, Ping Yu
N1 - Funding Information:
The authors acknowledge the financial support provided by the National Science Council of Taiwan, through Project No. NSC93-2416-H-008-010.
PY - 2007/9/15
Y1 - 2007/9/15
N2 - Many IT practitioners and researchers advocate that data models of data warehouses should incorporate the sources of their data in order to achieve maximum efficiency. As the source data are probably derived from systems designed with ER diagrams, a great deal of research has been devoted to the design of methodologies for building multidimensional models based on source ER diagrams. However, to the best of our knowledge, no algorithm has been proposed that can systematically translate an entire ER diagram into a multidimensional model with hierarchical snowflake structures. In this paper, we propose an algorithm that achieves the above goal because it incorporates two features, namely, grain preservation and the minimal distance from each dimension table to the fact table. The grain preservation feature guarantees that the translated multidimensional model will maintain cohesive granularity among the entities. Meanwhile, the minimal distance feature guarantees that if an entity can be connected to the fact table in the multidimensional model by more than one path, the path with the smallest number of hops will always be chosen. The first feature is derived by translating ambiguous relationships between entities into weighting factors stored in bridge tables and enhancing fact tables with unique primary keys. The second feature results from including a revised shortest path algorithm in the translating algorithm, with the distance being calculated as the number of relationships required between entities. A prototype system based on the methodology is also developed, and snapshots of the screens used for the system's execution are presented.
AB - Many IT practitioners and researchers advocate that data models of data warehouses should incorporate the sources of their data in order to achieve maximum efficiency. As the source data are probably derived from systems designed with ER diagrams, a great deal of research has been devoted to the design of methodologies for building multidimensional models based on source ER diagrams. However, to the best of our knowledge, no algorithm has been proposed that can systematically translate an entire ER diagram into a multidimensional model with hierarchical snowflake structures. In this paper, we propose an algorithm that achieves the above goal because it incorporates two features, namely, grain preservation and the minimal distance from each dimension table to the fact table. The grain preservation feature guarantees that the translated multidimensional model will maintain cohesive granularity among the entities. Meanwhile, the minimal distance feature guarantees that if an entity can be connected to the fact table in the multidimensional model by more than one path, the path with the smallest number of hops will always be chosen. The first feature is derived by translating ambiguous relationships between entities into weighting factors stored in bridge tables and enhancing fact tables with unique primary keys. The second feature results from including a revised shortest path algorithm in the translating algorithm, with the distance being calculated as the number of relationships required between entities. A prototype system based on the methodology is also developed, and snapshots of the screens used for the system's execution are presented.
KW - Data warehouse
KW - Entity relationship diagram
KW - Grain preservation
KW - Multidimensional models
KW - Star schema
UR - http://www.scopus.com/inward/record.url?scp=34250732358&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2007.03.017
DO - 10.1016/j.ins.2007.03.017
M3 - Journal article
AN - SCOPUS:34250732358
SN - 0020-0255
VL - 177
SP - 3679
EP - 3695
JO - Information Sciences
JF - Information Sciences
IS - 18
ER -