Use of Standards and Metadata in the Design of Adaptable Clinical Data Repositories

Friday, September 1st, 2006
Chris Peck

Department of Information Systems and Operations Management
The University of Auckland
Auckland
New Zealand


David Sundaram

Department of Information Systems and Operations Management
The University of Auckland
Auckland
New Zealand

Abstract
Data is a fundamental part of all computer systems, yet the alignment of data models with higher level business models has been poorly addressed. Our approach uses a high level conceptual model that is mirrored as a core model in the data layer, metadata representations and an accommodative data model. Model management is used to maintain linkage between high level semantic models and the implementable data models. This approach complements current efforts towards the facilitation of data sharing, interoperability and rapid development, including model driven architectures, domain specific language systems development and service oriented architectures. All of these approaches fail to address the evolution of the data layer, thus omitting a feature that is fundamentally important in an evolvable and adaptable system. Use of this framework is demonstrated with the HL7 reference information model.



Introduction

Computer information systems in a health care enterprise support a large number of administrative processes that assist with the clinical work that is the mainstay of health care. These, often independent, systems serve very specific critical functions such as ward management, admissions and discharge, billing, capitation and supplies ordering. With such a variety of independent systems within an organisation, a cross-enterprise view of all its data is difficult to obtain. There are many business reasons for achieving a unified information view, including improved data quality, the ability to develop composite applications, and reduced developmental effort for new system processes.[1]

The development of such a unified view is being facilitated by a number of technical initiatives and product innovations. Service oriented architectures and the increasing use of standards such as SOAP (Simple Object Access Protocol) are contributing factors. Information systems and business networks can support composite business processes by linking their monolithic systems together through a common service layer and an exchange infrastructure which allows these services to be accessed.[2] These approaches all operate on the assumption that the relationship between data in heterogeneous systems is a singular one.[3] In reality, however, the relationships between data from heterogeneous systems require mapping at the semantic level before data from these sources can be utilised by the business or application logic.[4] Semantic mapping addresses the issue of information impedance mismatch between varied concepts but does not address the data requirements that will naturally come with new concepts.[5]

While large business systems vendors such as SAP and Oracle offer complete cross-enterprise data solutions, these are not an option for many businesses for whom wholesale replacement of existing systems would be too disruptive. Therefore, many companies choose to integrate newer systems with legacy systems. Often, when this happens, data are duplicated and data quality can become an issue. A shared-data service allows the integration of disparate data sources across an enterprise and allows enterprises to migrate incrementally towards the goal of consolidated data.[6]

All of these approaches address the higher level conceptual models while ignoring the lower level persistence or data layer. This paper discusses a method for translating, integrating and supporting higher level business conceptual models with the persistent data layer in order to realise flexible and sustainable application infrastructures. A key requirement in achieving this is an adaptable data repository. We have focused our research on adaptable clinical data repositories. The clinical setting offers the advantages of richness of the data and complexity of information requirements.

In this paper, we look first at the functional requirements of clinical data repositories and then we explore how we could make them adaptable. We conclude with an illustration of these ideas applied to the HL7 Reference Information Model (RIM) as one example of a prototype for building an accommodative and adaptable clinical data repository.


Clinical Data Repositories
In a clinical system, persistent data may be stored in a database called a clinical data repository (CDR). CDRs contain administrative and clinical information from across a health care organisation or practice. Consolidating clinical data across a health care organisation allows its systems to share data which in turn enables the creation of an integrated view of an individual patient, a prerequisite for the creation of an electronic health record (EHR).[7] Hippisley-Cox et al[8] note that in the primary health care (PHC) setting, EHRs are more legible and more understandable than their paper alternatives. Paper-based records found in PHC settings are often illegible, unwieldy, or inaccessible, all of which can adversely affect patient care.[9] Studies indicate that EHRs compare favourably with paper-based records in terms of direct benefits for patient care, and have the secondary benefit of being immediately useful for patient registration, repeat prescribing, population health studies and other research.[8]

A functional CDR requires both an internal model which is able to represent health care information from within the enterprise and the ability to evolve as the conceptualisation of this information changes with time.[10-12] Even when thorough analysis is performed at the time of system design, new requirements often force changes to the system that were not or could not have been anticipated, forcing application and model revision. In addition to the business reasons that might drive model evolution, models must often be modified for technological reasons. For example, the limitations of a particular relational database management system might call for a denormalisation (inverse of normalisation) in order to avoid excessive, performance-limiting table joins during operation.[13] In situations where an entity may have many possible attributes but few actual attributes, modelling the complete set of attributes as specific entities or fields will result in a sparsely populated database which utilises excessive disk space (cost) and suffers from increased query times (performance).[14]

Adaptable CDR
Our approach to the development of a CDR, as set out in Figure 1, addresses these challenges through:

1. A high level conceptual model that is mirrored as the core model of the data layer.
2. Semantics and conceptual model details which are defined through metadata representations.
3. A data layer core model that can accommodate new concepts directly or in optimised peripheral models.
4. Model management that maintains linkage between the conceptual model and optimised data layer models.

A conceptual model supported by metadata offers an approach to creating a model that can represent a large number of concepts with a relatively simple structure. Often in this approach, only the metadata need be modified when new concepts are introduced. The conceptual model is supported at the implementation level by its mirror, the core schema in the data layer. Since operational and technological constraints exist at the data layer, a model management layer is put in place to allow this model to undergo sustainable and reversible evolution. This paper discusses these concepts and presents an application scenario that utilises the HL7 RIM as the conceptual model.



The Conceptual Model
Domain specific languages, model driven architectures and software factories (as defined by Greenfield[15]) are all attempts to encapsulate business concepts and knowledge into computable artefacts and toolsets.[15] The ability of developers to work directly with these business concepts increases both software production efficiency and alignment with the business model. All of these approaches require the development of a conceptual model of the domain. In order to be sufficiently extensible, the conceptual model should represent business concepts at a sufficient level of abstraction. At the conceptual level, model abstraction allows for greater reuse, but requires an expert to create the model, something we have not taken into consideration in this preliminary work.[16]


Metadata Defined Semantics
High level abstraction is coupled with very explicit vocabulary and ontologies in order to effectively represent distinct concepts. In the repository, external tables serve as the data dictionaries which define the semantics. This can be practically utilised in the health care domain since there already exist a large number of standardised vocabularies such as SNOMED and LOINC. Application behaviour can be guided and data integrity preserved by utilising the large number of freely available ontologies that reflect the underlying semantics in these controlled vocabularies.[17,18]
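The role of external dictionary tables can be illustrated with a minimal sketch, assuming an in-memory SQLite database. The table names (vocabulary, concept, observation), the vocabulary name and the codes are hypothetical placeholders and are not taken from SNOMED, LOINC or the repository described here. The sketch shows that a new concept is introduced by adding a dictionary row rather than by changing the schema, and that a foreign key to the dictionary rejects data that refers to an undefined concept.

```python
import sqlite3

# Hypothetical dictionary tables; names and columns are illustrative only.
SCHEMA = """
CREATE TABLE vocabulary (
    vocab_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY,
    vocab_id   INTEGER NOT NULL REFERENCES vocabulary(vocab_id),
    code       TEXT NOT NULL,
    display    TEXT NOT NULL,
    UNIQUE (vocab_id, code)
);
CREATE TABLE observation (
    obs_id     INTEGER PRIMARY KEY,
    concept_id INTEGER NOT NULL REFERENCES concept(concept_id),
    value      TEXT
);
"""


def open_repository():
    """Create an in-memory repository with foreign key checks enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_concept(conn, vocab, code, display):
    """Register a concept in the dictionary; no schema change is required."""
    conn.execute("INSERT OR IGNORE INTO vocabulary(name) VALUES (?)", (vocab,))
    vocab_id = conn.execute(
        "SELECT vocab_id FROM vocabulary WHERE name = ?", (vocab,)
    ).fetchone()[0]
    cur = conn.execute(
        "INSERT INTO concept(vocab_id, code, display) VALUES (?, ?, ?)",
        (vocab_id, code, display),
    )
    return cur.lastrowid


def record_observation(conn, concept_id, value):
    """Store a value whose meaning is defined by the referenced concept."""
    conn.execute(
        "INSERT INTO observation(concept_id, value) VALUES (?, ?)",
        (concept_id, value),
    )
```

In this sketch, calling record_observation with a concept_id that is absent from the dictionary raises sqlite3.IntegrityError, which is one simple way the metadata can preserve data integrity.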

The data layer
Database schema design choices are made on the basis of whether the system is to be used for transaction processing (OLTP) or for reporting, analysis, or decision support (OLAP).[19] Schemas for the former tend to be highly normalised, while those for the latter tend to be highly aggregated.[20] Post-relational systems like the Intersystems Cache database allow for both standard SQL and object access to data, but are no more flexible at the persistent data level than a standard database.[21]

The process of normalisation provides a high degree of control over the integrity of the data, and for this reason is preferred for transactional systems. However, when new concepts must be introduced into a system, highly normalised schemas can be difficult to modify to support the desired concept. In health care, rapidly changing processes and terminology make inflexible designs undesirable. To support flexibility, a highly abstract schema should be used as the core schema model for the repository.[22,23] Base semantics are conveyed through these models. More detailed model semantics are defined through relational linkages with metadata dictionaries.

Entity–attribute–value (EAV) tables provide a way to effectively represent in a relational system an entity that has many potential attributes but only a few actual attributes.[14] Queries to EAV tables can be slower than queries to the same information in a fully normalised design. It has been shown that optimisation of queries can overcome some of this limitation.[24] We have extended the basic EAV design by adding a recursive element to the EAV table itself (refer to Figure 2). This design allows the use of complex and composite datatypes as attributes.
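One way to picture an EAV table with a recursive element is sketched below, again assuming SQLite. The table layout and column names (row_id, entity_id, attribute, value, parent_id) are assumptions made for illustration and are not the schema shown in Figure 2. A row with no value and one or more child rows stands for a composite attribute, and its components point back to it through parent_id.

```python
import sqlite3


def open_eav():
    """Create a hypothetical EAV table whose rows may reference a parent row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        """
        CREATE TABLE eav (
            row_id    INTEGER PRIMARY KEY,
            entity_id INTEGER NOT NULL,
            attribute TEXT NOT NULL,
            value     TEXT,
            parent_id INTEGER REFERENCES eav(row_id)
        )
        """
    )
    return conn


def put(conn, entity_id, attribute, value=None, parent_id=None):
    """Insert one attribute row and return its row_id for use as a parent."""
    cur = conn.execute(
        "INSERT INTO eav(entity_id, attribute, value, parent_id) VALUES (?, ?, ?, ?)",
        (entity_id, attribute, value, parent_id),
    )
    return cur.lastrowid


def read(conn, entity_id, parent_id=None):
    """Rebuild an entity as a nested dict by following parent_id recursively."""
    rows = conn.execute(
        "SELECT row_id, attribute, value FROM eav "
        "WHERE entity_id = ? AND parent_id IS ? ORDER BY row_id",
        (entity_id, parent_id),
    ).fetchall()
    result = {}
    for row_id, attribute, value in rows:
        if value is None:
            # No scalar value: treat the row as a composite and read its parts.
            result[attribute] = read(conn, entity_id, row_id)
        else:
            result[attribute] = value
    return result
```

For example, a blood pressure attribute can be stored as one parent row with systolic and diastolic child rows, and read returns it as a nested dictionary. Only the attributes actually recorded for an entity occupy rows, which is the property that makes EAV attractive for sparse data.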


Model Management
Being a mirror of the conceptual model, the core schema model can presumably represent on a data level all concepts in the conceptual model, although not always in a form that has been optimised for data access. If the new concept is simply inserted into the model, the model schema is said to have accommodated the concept. However, if a new schema is created to support the concept in an optimised manner, then the core schema has evolved to provide for the new concept. Both the core schema and the evolved schema can be directly mapped back to the conceptual model. However, it is important that the schemas be mappable between themselves, since application operations will occur on this level.

Bernstein[25] has outlined the framework that the model management layer uses to record changes to models. This is done through a series of defined operators.

Match: takes two models as input and returns a mapping between them.

Compose: takes a mapping between models A and B and a mapping between models B and C, and returns a mapping between A and C.

Diff: takes a model A and a mapping between A and some model B, and returns the sub-model of A that does not participate in the mapping.

ModelGen: takes a model A, and returns a new model B based on A (typically in a different data model than A’s) and a mapping between A and B.

Merge: takes two models A and B and a mapping between them, and returns the union C of A and B along with mappings between C and A, and C and B.

(Bernstein 2003)
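To make the operators more concrete, the following sketch treats a model as a set of element names and a mapping as a set of pairs. This is a deliberate simplification and an assumption of the sketch: in practice, models carry structure and types, and Match relies on far richer heuristics than the case-insensitive name comparison used here. ModelGen is omitted because it depends on the target data model.

```python
def match(model_a, model_b):
    """Naive Match: pair elements whose names agree when case is ignored."""
    by_name = {b.lower(): b for b in model_b}
    return {(a, by_name[a.lower()]) for a in model_a if a.lower() in by_name}


def compose(map_ab, map_bc):
    """Compose: chain an A-B mapping with a B-C mapping into an A-C mapping."""
    return {(a, c) for a, b in map_ab for b2, c in map_bc if b == b2}


def diff(model_a, map_ab):
    """Diff: the part of model A that does not participate in the mapping."""
    mapped = {a for a, _ in map_ab}
    return set(model_a) - mapped


def merge(model_a, model_b, map_ab):
    """Merge: union C of A and B plus mappings C-A and C-B.

    Elements of B that are mapped to an element of A are represented in C
    under A's name (a choice made for this sketch).
    """
    rename = {b: a for a, b in map_ab}
    model_c = set(model_a) | {rename.get(b, b) for b in model_b}
    map_ca = {(a, a) for a in model_a}
    map_cb = {(rename.get(b, b), b) for b in model_b}
    return model_c, map_ca, map_cb
```

With a core model {"Act", "Entity", "Role"} and an optimised model {"entity", "role", "Patient"}, match pairs Entity and Role with their lower-case counterparts, and diff on the core model reports that Act has no counterpart in the optimised model.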

Changes made in the core schema are tracked and maintained within this model management layer. This allows reconstruction of the standards-based information, despite a different internal structure. The use of metadata to define semantics allows the conceptual model to accommodate new concepts without structurally changing its form. This is important since the conceptual model is mirrored at the data layer. However, it is unrealistic to expect that a conceptual model would never need to evolve. In the case of the evolution of either the conceptual model or the core data model (which are mirrors of each other), tracked information within the model management layer would be used to support the change to these, as well as the optimised models within the data layer if the change were applicable to those.

We have presented a framework of components that would allow a system or network to adapt to change. Used together, the four components of this architecture enable the evolution of the data persistence layer as the business requirements change and in turn as the conceptual model evolves.


Application Scenario: HL7 Reference Information Model
To illustrate our ideas, it is useful to outline how these concepts might play out in a well-known health care conceptual model, namely the HL7 RIM. The HL7 RIM is a class model produced and supported by HL7.[26] The RIM is used to derive the HL7 Version 3 message patterns which are being utilised in several large-scale health care initiatives. All objects within the HL7 space are defined by this model. The RIM is composed of the following six “backbone classes” which represent high level abstractions of concepts found within health care:

1. Act: which represents the actions that are executed and must be documented as health care is managed and provided.

2. Participation: which expresses the context for an act in terms of who performed it, for whom it was done, where it was done, etc.

3. Entity: which represents the physical things and beings that are of interest to, and take part in, health care.

4. Role: which establishes the roles that entities play as they participate in healthcare acts.

5. ActRelationship: which represents the binding of one act to another, such as the relationship between an order for an observation and the observation event as it occurs.

6. RoleLink: which represents relationships between individual roles.

(HL7 Version 3 Ballot)[27]

The abstract nature of these classes requires the use of both the classes’ attributes and external vocabularies in order to differentiate instances of the same class, eg, distinguishing one Act class from another.
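The six backbone classes and the role of coded attributes can be sketched as plain data classes. The field names (class_code, type_code, code) and the code strings used with them are simplified, illustrative assumptions and should not be read as the normative RIM attribute set.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """A physical thing or being, e.g. a person or an organisation."""
    entity_id: int
    class_code: str


@dataclass
class Role:
    """A role played by an entity, e.g. patient or provider."""
    role_id: int
    class_code: str
    player: Entity


@dataclass
class Act:
    """An action that is documented; `code` points into an external vocabulary."""
    act_id: int
    class_code: str
    code: str


@dataclass
class Participation:
    """Context for an act: which role took part, and in what capacity."""
    type_code: str
    act: Act
    role: Role


@dataclass
class ActRelationship:
    """Binds one act to another, e.g. an order to its resulting observation."""
    type_code: str
    source: Act
    target: Act


@dataclass
class RoleLink:
    """A relationship between two roles."""
    type_code: str
    source: Role
    target: Role


def acts_with_code(acts, code):
    """Select acts by vocabulary code; the class alone cannot tell them apart."""
    return [act for act in acts if act.code == code]
```

Two observations would both be instances of Act with the same class_code; in this sketch only the vocabulary code distinguishes, say, a heart rate from a blood pressure measurement.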

Figure 2: Base V3 HL7 RIM entity relationship model for MSSQL server 2000

While the HL7 RIM is capable of representing a large number of health care constructs, it does not readily translate into an implementable relational data model. While it is theoretically possible to use the RIM core model as a CDR (see Figure 2), doing so would likely result in excessive table joins and undesirable performance for even the simplest queries. Put another way, the core model can accommodate new concepts, but it is likely to be a better option to evolve the core model into an optimised model that can effectively serve the operational requirements of the concept. If the requirement were for a patient registry, then this optimised model might resemble the model in Figure 3. This model focuses on those core classes that might be utilised to create a patient data representation. By avoiding the use of the whole model for a particular application, additional or required efficiencies may be realised. If the goal is really semantic interoperability and evolvable systems, the implementations need to both support and be able to evolve with the conceptual models.
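The step from a generic core model to an optimised model, with the mapping retained so that the step can be reversed, might look like the following sketch. It assumes SQLite, a single generic core_attribute table standing in for the core model, and a hypothetical patient table as the optimised model; none of these names come from the RIM or from Figure 3. The mapping dictionary plays the part of the information a model management layer would track.

```python
import sqlite3

# Hypothetical mapping: core attribute name -> column in the optimised table.
# It is a trusted constant, so its values are interpolated into SQL below.
PATIENT_MAPPING = {"family_name": "surname", "birth_date": "dob"}


def build_core(conn):
    """Create a generic table standing in for the core model."""
    conn.execute(
        "CREATE TABLE core_attribute (entity_id INTEGER, attribute TEXT, value TEXT)"
    )


def evolve_patient_table(conn, mapping):
    """Create the optimised table with one column per mapped core attribute."""
    columns = ", ".join(f"{col} TEXT" for col in mapping.values())
    conn.execute(f"CREATE TABLE patient (entity_id INTEGER PRIMARY KEY, {columns})")
    entity_ids = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT entity_id FROM core_attribute ORDER BY entity_id"
        )
    ]
    for entity_id in entity_ids:
        values = dict(
            conn.execute(
                "SELECT attribute, value FROM core_attribute WHERE entity_id = ?",
                (entity_id,),
            ).fetchall()
        )
        row = [entity_id] + [values.get(attr) for attr in mapping]
        placeholders = ", ".join("?" for _ in row)
        conn.execute(f"INSERT INTO patient VALUES ({placeholders})", row)


def to_core(conn, mapping):
    """Reverse the evolution: rebuild core rows from the optimised table."""
    attrs = list(mapping)
    cols = ", ".join(mapping.values())
    rows = []
    for rec in conn.execute(
        f"SELECT entity_id, {cols} FROM patient ORDER BY entity_id"
    ).fetchall():
        for attr, value in zip(attrs, rec[1:]):
            if value is not None:
                rows.append((rec[0], attr, value))
    return rows
```

A patient lookup against the optimised table needs no joins, while to_core shows that the same mapping allows the core representation to be reconstructed, which is the property that keeps the evolution reversible.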

 


Conclusion
We have presented an outline for managing the data layer in a flexible and semantically rich system. Data is a fundamental part of all systems, yet methods to manage extensible data models have been poorly addressed. This approach complements current efforts towards the facilitation of data sharing, interoperability and rapid development, including model driven architectures, domain specific language systems development and service oriented architectures. All of these approaches fail to address the evolution of the data layer, thus omitting a feature that is fundamentally important in an evolvable and adaptable system. By unifying the business model and the data model we provide a means by which changes to the business model can be more easily realised at the implementation level. Our method also encourages the adoption of potentially useful conceptual models such as the HL7 RIM, the use of which has tended to be avoided because of implementation difficulties.

References

  1. Gendron MS, D’Onofrio MJ. Data quality in the healthcare industry. Data Quality J 2001; 7 (1).
  2. SAP AG. Components & tools of SAP NetWeaver: SAP composite application framework. 2006. http://www.sap.com/platform/netweaver/components/caf/index.epx. Accessed 7 June, 2006.
  3. Oracle Corp. Semantic data integration for the enterprise white paper. March 2006. http://www.oracle.com/technology/tech/semantic_technologies/pdf/semantic_grid_wp_0603.pdf  Accessed 10 June 2006.
  4. Pantero Corp. Scaling SOA through EAI Enhancement and Use of Model based Standards http://www.pantero.com/Downloads/Details.asp?DownloadID=15&DownloadTypeID=2. Accessed June 2006.
  5. Schadow G, Russler DC, McDonald CJ. Conceptual alignment of electronic health record data with guideline and workflow knowledge. Int J Med Inform 2001; 2001(64):259–74.
  6. Niranjan V, Sriram A, Krishnendu K. Shared data services: an architectural approach. IEEE International Conference on Web Services; 2005: 690.
  7. Sittig DF, Pappas J, Rubalcaba P. Building and using a clinical data repository. 1999. http://www.informatics-review.com/thoughts/cdr.html. Accessed June 2006.
  8. Hippisley-Cox J, Pringle M, Cater R, Wynn A, Hammersley V, Coupland C et al. The electronic patient record in primary care – regression or progression? A cross sectional study. Br Med J 2003 (28 June); 326:1439–43.
  9. Bates DW, Ebell M, Gotlieb E, Zapp J, Mullins HC. A proposal for electronic medical records in U.S. primary care. J Am Med Inform Assoc 2003 Jan-Feb; 10(1):1–10.
  10. Pittas N, Jones AC, Gray WA. Evolution support in large-scale interoperable systems: a metadata driven approach. Proceedings of the 12th Australasian Database Conference 2001: 161–8.
  11. Nah FF-H, Lau JL-S, Kuang J. Critical success factors for ERP implementation. Bus Process Manage J 2001; 7(3):285 - 296.
  12. Kajal TC, Jing J, Elke AR. SERF: schema evolution through an extensible, re-usable and flexible framework. Proceedings of the 7th international conference on information and knowledge management. Bethesda, Maryland. United States: ACM Press; 1998.
  13. Pascal F. Practical issues in database management: a reference for the thinking practitioner. Boston, Mass: Addison-Wesley; 2000.
  14. Nadkarni PM, Brandt C. Data extraction and ad hoc query of an entity–attribute–value database. J Am Med Inform Assoc 1998; 5(6):511–27.
  15. Greenfield J, Short K. Software factories: assembling applications with patterns, models, frameworks, and tools. Indianapolis, IN: Wiley Publishing Inc; 2004.
  16. Verelst J. The influence of the level of abstraction on the evolvability of conceptual models of information systems. Empirical Software Engineering 2005; 10(4):467–94.
  17. Heard S, Beale T, Freriks G, Mori AR, Pishev O. Templates and archetypes: how do we know what we are talking about? Version 1.2: OpenEHR; February 2003. http://www.openehr.org/downloads/archetypes/templates_and_archetypes.pdf. Accessed June 2006.
  18. NLM. UMLS Semantic Network – Fact Sheet. 2003. http://www.nlm.nih.gov/pubs/factsheets/umlssemn.html. Accessed June 2006.
  19. Codd EF, Salley CT, Codd SB. Providing OLAP (online analytical processing) to user-analysts: An IT mandate (Technical Report); 1993. http://dev.hyperion.com/resource_library/white_papers/providing_olap_to_user_analysts.pdf#search=%22Providing%20OLAP%20(online%20analytical%20processing)%20to%20user-analysts%3A%20An%20IT%20mandate%22. Accessed June 2006.
  20. Chaudhuri S, Dayal U. An overview of data warehousing and OLAP technology. SIGMOD Record 1998; 26(1):65–74.
  21. Intersystems. CACHÉ Multidimensional Database. 2006. http://www.intersystems.com/. Accessed June 2006.
  22. Beale T, Heard S, Kalra D, Lloyd D. The open EHR Reference Model, Revision: 4.3.2. 2003. http://www.openehr.org. Accessed June 2006.
  23. Beeler G, Case J, Curry J, Hueber A, Mckenzie L, Schadow G et al. HL7 reference information model version:02-10 (6/30/2005). http://www.hl7.org/library/data-model/RIM/modelpage_mem.htm. Accessed June 2006.
  24. Chen RS, Nadkarni P, Marenco L, Levin F, Erdos J et al. Exploring performance issues for a clinical database organized using an entity-attribute-value representation. J Am Med Inform Assoc 2000; 7(5):475–87.
  25. Bernstein PA. Applying model management to classical meta data problems. Proceedings of the Conference on Innovative Data Systems Research (CIDR) 2003.
  26. HL7.a. HL7 Home Page, http://www.hl7.org. Accessed May 2005.
  27. HL7.b. Version 3 Ballot. Last Published Version: 07/31/2005. http://www.hl7.org/. Accessed 21 July 2005.