The Standards Review Council (SRC) recently reviewed the SDTM conformance rules ("Rules") produced by the SDTMV. After having painstakingly combed through the SDTM v1.4 and SDTMIG v3.2, the team identified 400+ rule candidates. At the time of this blog post, the SRC is working with the sub-team to address some reviewer comments before making the package available for Public Review. As you can preview here, the construct is not very different from those published by the FDA SDTM Validation Rules and OpenCDISC Community: Rules have identifier, context, rule description in some pre-specified lexicons, condition, and citation of the rule's source.
As a Metadata Curator, I need to ask myself what the Rules mean to SHARE, as metadata. The text and description are, by definition, not metadata. Extra steps are needed to tease out the metadata. I thought to first illustrate a typical rule construct, or a model, shown here:
Furthermore, I formulated these objectives to help me devise solutions (my philosophy to innovate: first understand the what's before bother with the how's):
Additionally, I self-imposed some scoping limitations, i.e., a list of "won't do's" to keep implementation simple so this can be completed within a reasonable amount of time:
Having done some research along with inputs from volunteers and peers, two choices were available. They are both open standards and fit my objectives:
At first, I found HL7 GELLO fascinating, supporting a huge range of medical and healthcare data. After all, it is designed to be a clinical decision support system. That said, having required to understand HL7 RIM and specialized toolset, it will be very difficult to find a sustainable workforce to develop and maintain using the GELLO framework.
A little bit more research revealed GELLO is in fact created based on OMG OCL. Here are a few characteristics that resonate with me and my Objectives:
This diagram nicely depicts the information architecture we use and how the CDISC product family stack up in terms of overall model framework.
Those said and illustrated, OMG OCL represents a no-brainer choice to me. UML, hence OCL, is the next logic step to further with (and, complete) the architectural blueprint.
I have only recently begun studying the OCL specifications to solidify my thinking. I hope the little work I attempted helps demonstrate this proposal. Below is a subset of the SDTM Findings class drawn using Enterprise Architect:
I added a couple of OCL to --TESTCD:
Their OCL expressions are as follows:
Imagine we will be able to run test data through the whole series of OCL as an exercise to validate the correctness of the constraints. This will enable us to run example data to test their validity prior to including them in Implementation Guides or User Guides. As a matter of fact, they are not a far-fetched ideas. This Youtube video posted by a third party modeling tool, called MagicDraw, adequately demonstrates the power of test automation using OCL functionality. At 6:00, the video shows how easy it is to validate an OCL using some XML data: prepare an XML file guaranteed to trigger a constraint violation, run it against the rules in a compiled Java code and the auto-generated schema file. Pretty nifty.
The vision of this proposal:
In conclusion, SHARE influences a certain discipline and conduct toward the standards development process. Engineering SDTM with an UML model and refitting validation rules using OCL are not only logical, but essential to lead the industry with technical innovation. Furthermore, this will address a lot of model and implementation ambiguities currently exist. Lastly, I'd like to make a call for volunteers to further implement this proposal. Perhaps, a proof of concept project to create a testbed to apply model constraints and rules metadata toward submission data validation and other uses.