To generate evidence, we need to understand the types of data that are important and available within the patient’s “real world”. Generating useful disease-specific data requires an understanding of the relevant information surrounding the patient and their illness, so that you gain a well-rounded view of the cause of the disease, its characteristics and its treatment options. To fully harness the potential of your evidence generation platform (which will often take the form of a patient or clinical registry), you must make sure you are collecting the data needed to answer the questions of interest. Collecting as much of the necessary data as possible will enable you to answer the most pressing questions in your disease-area of interest. Failing to collect the right information will leave you with a data set reminiscent of Swiss cheese: full of holes, either because the right questions were not asked or because the data needed to answer important questions were never captured. This is where a data dictionary becomes vital to the eventual success of your evidence generation goals.
What Actually Is a Data Dictionary?
Data dictionaries define the key data variables that are to be collected within an evidence generation platform, along with these variables’ “meta-data”. The meta-data for each key variable describes what constitutes an acceptable value for that variable’s field when real-world data are collected. For example, if the results of lab tests are being collected, the meta-data for those lab tests would indicate that lab values must be numeric and would specify a valid range for each type of test. Implementing meta-data in an evidence generation platform is crucial for data validation at the time of data entry. When a value is entered that does not satisfy the meta-data requirements (e.g., a lab value that is out of range, or text where a numeric value is expected), it can be detected and rejected before it enters the platform. This means that, within each collected patient journey, there is confidence that the right data are being captured and that those data are clinically valid.
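As a rough illustration of the lab-test example above, the sketch below shows how meta-data could drive validation at the time of data entry. It is a minimal sketch under stated assumptions, not a description of any particular platform: the `LabVariable` structure, the `validate_entry` function, the variable names and the numeric ranges are all hypothetical, and the ranges are placeholders rather than clinical guidance.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LabVariable:
    """Meta-data for one numeric lab variable (hypothetical structure)."""
    name: str
    unit: str
    minimum: float
    maximum: float


# Hypothetical entries; the ranges are placeholders, not clinical guidance.
DATA_DICTIONARY = {
    "hemoglobin": LabVariable("hemoglobin", "g/dL", 0.0, 25.0),
    "platelet_count": LabVariable("platelet_count", "10^9/L", 0.0, 2000.0),
}


def validate_entry(variable: str, raw_value: str) -> float:
    """Parse a submitted value, raising ValueError if it breaks the meta-data."""
    meta = DATA_DICTIONARY.get(variable)
    if meta is None:
        raise ValueError(f"{variable!r} is not defined in the data dictionary")
    try:
        value = float(raw_value)
    except ValueError:
        raise ValueError(f"{variable}: expected a number, got {raw_value!r}") from None
    # Reject NaN/infinity as well as values outside the permitted range.
    if not math.isfinite(value) or not meta.minimum <= value <= meta.maximum:
        raise ValueError(
            f"{variable}: {raw_value!r} is outside {meta.minimum}-{meta.maximum} {meta.unit}"
        )
    return value
```

With these placeholder ranges, `validate_entry("hemoglobin", "13.5")` returns `13.5`, while submitting `"high"` (text instead of a number) or `"250"` (out of range) raises a `ValueError` that a data entry form could show to the user.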
To produce a high-value data dictionary, a rigorous editorial process should be applied to gain a comprehensive understanding of the relevant factors from both the disease and patient perspectives, and to identify the relevant and necessary clinical and patient endpoints. Data dictionaries are especially important for registries with global reach, where the data and terminology used must be harmonized and compatible with the needs of regulatory agencies, healthcare providers, researchers, and pharmaceutical companies. The entire process of building a corpus of knowledge that the community accepts as relevant and useful starts with the data dictionary. A data dictionary represents what is truly important from each stakeholder’s perspective. When your platform is built on the foundations of a strong, well-thought-out data dictionary that supports identifying and aggregating complete patient journeys, from diagnosis to medical interventions to follow-ups to remission or death, your organization can begin to answer the questions most important to you in your disease-area of interest.
How Is a Data Dictionary Used?
Once the variables are defined, the data dictionary is deployed in an electronic format, such as an electronic case report form (eCRF) or another easy-to-use, intuitive interface for data capture. The meta-data contained in the data dictionary allow the electronic system to restrict the type of information that can be submitted for each field. This enables data validation at the time of data entry, so malformed or nonsensical data cannot be entered by accident. The forms created from the data dictionary ensure that users enter the vital information in a standardized way, which supports the harmonization of high-quality data. Implementing the data dictionary in an electronic data capture system can also simplify the data input process, which can increase completion rates, benefit patients whose motor skills may be affected by their condition, and support interactions in languages other than English.
How We Create a Data Dictionary
At Pulse Infoframe, we have a rigorous process for creating a data dictionary, where we elicit feedback from various stakeholders, including patients, Key Opinion Leaders (KOLs), medical professionals, and relevant committees.
The basis of our process is the following:
- We schedule a series of qualitative interviews with KOLs and Principal Investigators to understand their data requirements and to identify relevant endpoints for any research objectives.
- Our clinical epidemiologists will conduct a systematic scientific review. This is especially important where there may be several research initiatives being carried out for the same disease. A comprehensive review of requirements will ensure the necessary data is available to support existing, planned and future studies.
- To ensure that the evidence generated from the platform answers the relevant research questions, we implement a governance structure that draws on feedback from scientific, patient and industry collaborators and other key investigators. This group determines a minimal data set to be collected across the entire platform, as well as location-specific data elements that account for differences in the variables captured by site, region or country.
- Once the data elements are finalized, our team of experts maps them to standardized vocabularies recognized by regulatory agencies. This allows you to easily submit regulatory-standard data to the FDA or EMA.
- Recognizing that not all data come from a single source, we also identify additional data sources that can be added to a platform to supplement the data that are being collected. These data can come from publicly available sources or private repositories and can include lab reports, various -omics data, electronic medical records, etc.
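Two of the steps above, combining a platform-wide minimal data set with location-specific elements and mapping elements to standardized vocabularies, can be sketched in a few lines. This is an illustrative sketch only: the element names, the site keys, the `EXAMPLE-VOCAB` vocabulary and its placeholder codes are all hypothetical and do not refer to any real terminology or to our internal tooling.

```python
# All element names, site keys and vocabulary codes are hypothetical placeholders.
CORE_ELEMENTS = {
    "date_of_diagnosis": {"type": "date"},
    "hemoglobin": {"type": "numeric", "unit": "g/dL"},
}

# Extra elements collected only at particular sites, regions or countries.
SITE_ELEMENTS = {
    "site_a": {"regional_health_id": {"type": "text"}},
    "site_b": {"insurance_category": {"type": "text"}},
}

# Element name -> (vocabulary, code). Real mappings would use a recognized
# standard vocabulary; these values are placeholders.
VOCABULARY_MAP = {
    "date_of_diagnosis": ("EXAMPLE-VOCAB", "PLACEHOLDER-001"),
    "hemoglobin": ("EXAMPLE-VOCAB", "PLACEHOLDER-002"),
}


def elements_for_site(site: str) -> dict:
    """Return the minimal data set plus any elements specific to this site."""
    extra = SITE_ELEMENTS.get(site, {})
    overlap = CORE_ELEMENTS.keys() & extra.keys()
    if overlap:
        # A site must not silently redefine a platform-wide element.
        raise ValueError(f"site {site!r} redefines core elements: {sorted(overlap)}")
    return {**CORE_ELEMENTS, **extra}


def unmapped_elements(elements: dict) -> list:
    """List elements that still lack a standardized vocabulary mapping."""
    return sorted(name for name in elements if name not in VOCABULARY_MAP)
```

Here `unmapped_elements(elements_for_site("site_a"))` returns `["regional_health_id"]`, flagging the one site-specific element that still needs a mapping before the dictionary is finalized.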
It is important to remember that a data dictionary is a living document that may be updated throughout the lifecycle of evidence generation. Like medical advances, data do not stand still.
The data dictionary should be reviewed and approved by all parties. All major revisions, defined as any changes that affect data collection, require sponsor review and sign-off, as well as institutional review board (IRB) approval, prior to implementation.
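One way to make the "major revision" rule operational is to compare two versions of the dictionary automatically. The sketch below is a hedged illustration: it assumes, purely for the example, that a change affects data collection when a variable is added or removed, or when its type, unit or permitted range changes, while edits to descriptive fields such as a label do not. The function names and field names are hypothetical.

```python
# Assumption for this sketch: only these fields affect what is collected.
COLLECTION_FIELDS = ("type", "unit", "minimum", "maximum")


def collection_view(entry: dict) -> dict:
    """Keep only the fields assumed to affect data collection."""
    return {field: entry.get(field) for field in COLLECTION_FIELDS}


def classify_revision(old: dict, new: dict) -> dict:
    """Compare two dictionary versions and flag whether the revision is major."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(
        name
        for name in old.keys() & new.keys()
        if collection_view(old[name]) != collection_view(new[name])
    )
    return {
        "added": added,
        "removed": removed,
        "changed": changed,
        # Major revisions would then go to sponsor and IRB review.
        "major": bool(added or removed or changed),
    }
```

Under these assumptions, rewording a variable's label yields `"major": False`, whereas narrowing its permitted range or adding a new variable yields `"major": True`.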
Looking for Inspiration?
If you are looking for a starting point for the information your data dictionary should include, the National Health Service (NHS) in the United Kingdom has published its data dictionary online, and it is freely available.
Should you require more advice, or if you are wondering if you even need your own data dictionary, we can help – for more information, email contact@pulseinfoframe.com.