AI and Workers’ Compensation Data Collection Issues

Ricky Ricardo … “Lucy, we have a problem.”

Artificial intelligence relies on data to do its job. Some of the known problems with the use or management of AI are the lack of quality data and the embedded biases in the existing data. 

Timely and accurate data collection using a common data glossary would allow employers, insurance companies, claims administrators, and vendors to perform better internal analysis of their staff productivity, compare claims results, compare benefit provision, determine underwriting results, and determine the success and expense of their vendors. 

Standardized data will also allow jurisdictions to compare laws, rules, and regulations and determine the best practices for oversight and management of the system. 

If all systems captured essentially the same data using the same data definitions, it would be easier for companies to comply with the various jurisdictional (DOI) reporting requirements. 

Currently, the workers’ compensation insurance industry has: 

  • No comprehensive data dictionary or data glossary of data elements that are available, or that can or should be captured. 
  • A limited consensus concerning the definitions of the data elements that should be captured. (The rating bureaus and Medicare set-asides have created a de facto consensus in their specific areas of influence.) 
  • Years of unstructured and unverified data in its underwriting, claims, financial, and RMIS systems. 
  • Data that reflects the biases of medical providers and examiners in providing medical care and handling claims. 
  • No single oversight body that sets data definition or collection standards for the industry, though the Rating Bureaus, the Medicare system, WCRI, CWCI, NAIC, and the IAIABC occasionally perform this function within their various areas of influence. 
  • No established secure protocol to transmit, receive, or share the collected data with all who need or can use the information. (The Federal Government has, however, set standards for medical records transmission: HIPAA and NCPDP – ANSI X12.) 
  • No centralized repository of workers’ compensation data. (This may not be a terrible thing.) There are competing and cooperative locations where data is captured or stored. 
  • Over 3,000 different medical practice management systems in use by medical providers. These practice management systems are primarily focused on group health and have little data and few processes designed for workers’ compensation. 
  • No data glossary or shared data definitions, which impairs consistent compliance with the patchwork of privacy laws that apply to workers’ compensation. 
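
To make the idea of a shared data dictionary concrete, the following is a minimal sketch of what one could look like in code. Every name in it (the `DataElement` fields, the two example elements, and the local field names for `system_a` and `system_b`) is hypothetical and chosen only for illustration; it does not reflect any existing industry standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """One entry in a shared data dictionary (all fields are illustrative)."""
    name: str         # common element name
    definition: str   # agreed plain-language definition
    data_type: str    # e.g. "date", "decimal", "code"


# Hypothetical shared dictionary with two elements.
DATA_DICTIONARY = {
    "date_of_injury": DataElement(
        "date_of_injury", "Date the injury occurred.", "date"),
    "indemnity_paid": DataElement(
        "indemnity_paid", "Total indemnity paid to date.", "decimal"),
}

# Hypothetical local field names used by two different claims systems.
FIELD_MAPS = {
    "system_a": {"DOI": "date_of_injury", "IndemPd": "indemnity_paid"},
    "system_b": {"injury_dt": "date_of_injury", "ind_total": "indemnity_paid"},
}


def to_common(system: str, record: dict) -> dict:
    """Rename a system's local fields to the shared element names.

    Fields with no mapping are dropped in this sketch; a real process
    would more likely log them for review.
    """
    mapping = FIELD_MAPS[system]
    return {mapping[k]: v for k, v in record.items() if k in mapping}
```

Once two systems map to the same element names, their records can be compared directly, which is the practical benefit the list above says is missing today.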

Barriers to standardizing data collection:

  • Though the workers’ compensation industry may benefit from standardization of data definitions and data collection, there are significant administrative and economic barriers to making this happen. 
  • Collecting data is expensive. Therefore, any changes must be justified by the benefits produced. 
  • The projected benefits may not directly affect or apply to those who must incur the costs to make the changes. 
  • The cost of any changes to the definitions and to data collection will fall primarily on the employer, the medical provider, and the insurance and TPA community. 
  • Legacy systems do not include data elements that are necessary for the AI process to be maximized. 
  • Replacement, changes, or modifications of existing data elements in the current systems can be expensive (particularly on older legacy systems, which are technologically more difficult to modify). 
  • Even minimal changes in data definitions within the companies can result in an expensive need for changes to the claims metrics reports as well as the claims financial reports. 
  • Unless carefully managed, any changes in existing data elements in claims systems can result in changes to the actuarial outcomes. 
  • Data integrity can be problematic if the front-line professional is responsible for input. This is especially true for data elements that are not immediately germane to the front-line professional’s work. 
  • Data integrity can be increased through internal verification programs; however, the “tighter” the system, the more difficult it is for front-line users to do their work. 
  • Data gaps are always created when there is any system migration. 
  • Changing data definitions or implementing new data fields significantly impacts legacy claims reports. 
  • There are always problems with the integrity and integration of vendor data into payer systems. 
  • There is no standard database software or database design to facilitate usable and valuable reports and analysis. 
  • TPAs, insurance companies, and self-administered claims shops have different operational and financial needs. Therefore, their data and reporting needs are not the same. 
  • The use of forms, formats, and systems that are licensed proprietary property of various organizations can be problematic. Examples are AMA, NCPDP, and ODG. 
  • It may take fifteen years for a standardized data dictionary or data glossary to be accepted and fully implemented because upgrades to claims and underwriting systems are not done annually. 
  • Any standardization process involving achieving consensus between the States or State agencies may be problematic. 
  • For some data elements, when the data is collected or when the event takes place can be as important as the event itself. (When an indemnity payment was made can be as valuable as the amount of the payment.) 
  • Pure research does not always require that every transaction be collected and reported. A truly random sample of statistically significant size is enough for most research. 
  • Even when claims systems are upgraded, the integrity (and comprehensive collection) of the data can still be problematic. 
  • Even if there are standards for data definitions, unless the claims systems are designed to verify the accuracy of the data fields (e.g., date of injury must be after date of birth), the result may be inconsistent, incomplete, or inaccurate data. 
  • Claims systems should also provide immediate feedback to the front-line users concerning missing data elements or misuse of the data elements; otherwise, the data accuracy will be compromised. 
  • Many organizations do not hold their claims staff accountable for data accuracy. Most examiners do not consider accurate data input to be an essential function of their job. Examiners want to “adjust files and settle claims” and “do not consider themselves to be data input clerks.” The more input of data elements demanded by the claims system, the more difficult it is to retain staff. Some temporary workers refuse to work on claims systems that they perceive as inefficient and menial. 
  • If the claims systems have too many data elements for the examiners to use, if the data fields are poorly understood, or if the data fields are poorly defined, the quality of the data deteriorates. 
  • Some of the newer systems which use PDF images or JPG images instead of data elements may have fewer verifiable or extractable data elements. 
  • Data can also be in the form of “free-form file notes,” which word detection software can mine. 
  • As insurance companies have migrated from claims systems to newer versions or new claims systems, or as self-insured companies have changed their TPAs, each conversion of data can impair the integrity and accuracy of the legacy data. 
  • If the claims systems rely on data imports to pre-populate fields, the imported data may not be adequate or accurate. 
  • If the claims systems are too tight in their design and confirmation process, the examiners may be unable to adjust files. 
  • System design should limit unnecessary activity by limiting collection to data that will be useful. 
  • Automated collection of data should be encouraged, as should direct input from medical providers, investigators, HR systems, bill review systems, ISO, NCCI, and payroll systems. 
  • Most legacy systems do not have the data fields and were not designed to allow integrated indemnity benefit administration for LTD, STD, and WC. 
  • There is a significant difference between raw data and the information needed to make decisions. Unless the organization has the ability to put the data into the hands of the decision-makers in a manner that will result in changed behavior, the exercise is wasted. 
  • Most organizations in the workers’ compensation industry underestimate the importance of analytics and understaff their analytics positions. 
  • Data needs and data definitions may change as technology and workers’ compensation systems change. 
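
Two of the points above, verifying data fields at entry and giving front-line users immediate feedback, can be illustrated with a short sketch. The field names and the list of required elements are assumptions made for this example, not a description of any actual claims system.

```python
from datetime import date

# Hypothetical required elements; a real list would come from the
# organization's own data dictionary.
REQUIRED_FIELDS = ("claim_number", "date_of_birth", "date_of_injury")


def validate_claim(record: dict) -> list[str]:
    """Return problems to show the examiner; an empty list means it passed."""
    problems = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"Missing required element: {field}")
    dob = record.get("date_of_birth")
    doi = record.get("date_of_injury")
    # Cross-field rule from the text: date of injury must be after date of birth.
    if isinstance(dob, date) and isinstance(doi, date) and doi <= dob:
        problems.append("date_of_injury must be after date_of_birth")
    return problems
```

Returning a list of messages rather than rejecting the record outright is one way to balance accuracy against the concern that an overly tight system keeps examiners from adjusting files.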

Proposal 

It would be great to have a single data glossary and a single data dictionary for the nation to use, but there is no easy solution to this problem. No one person has the skills, the financial support, or the administrative and legal independence and authority to do the job. 

Some incremental solutions are possible:

1. Begin the research or analysis to determine what kind of data is needed for AI to do its job. 

2. Define the system problems and challenges that currently cannot be analyzed or solved because we do not have the necessary data. 

3. Identify all the data elements which are not collected but are needed for AI and/or the analysis of system problems. 

4. Find, collect, and publicly list all the data glossary discrepancies between States, between self-insured and insured programs, and between group health and workers’ compensation. 

5. Use and celebrate those areas where there are already common definitions. Do what can be done to expand these lists. 

6. Engage with companies and organizations that have made their living taking in data and putting it into databases for analysis and use (claims system designers such as Guidewire, as well as CWCI, WCRI, and the IAIABC). 

7. From the above, create a “wish list” of data elements and present it to those bodies that have influence over the system (NAIC, IAIABC, the Rating Bureaus, NASI, NCPDP, CURES, Medicare). 
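
Steps 4 and 5 above (listing glossary discrepancies and identifying definitions that are already common) could start with something as simple as the sketch below. The glossary contents are invented for illustration and do not represent any State's actual definitions; the function name `find_discrepancies` is likewise hypothetical.

```python
def find_discrepancies(glossaries: dict) -> dict:
    """Compare glossaries keyed by source and report elements that differ.

    Returns {element: {source: definition or None}} for every element whose
    definition is missing somewhere or is not identical across all sources.
    """
    all_elements = set()
    for terms in glossaries.values():
        all_elements.update(terms)
    report = {}
    for element in sorted(all_elements):
        by_source = {src: terms.get(element) for src, terms in glossaries.items()}
        if len(set(by_source.values())) > 1:
            report[element] = by_source
    return report


# Hypothetical glossaries; the wording is invented for illustration only.
glossaries = {
    "state_a": {"date_of_injury": "Date the injury occurred.",
                "lost_time_claim": "Claim with more than 3 days off work."},
    "state_b": {"date_of_injury": "Date the injury occurred.",
                "lost_time_claim": "Claim with more than 7 days off work."},
}

# Elements defined everywhere and identically: the list to "use and celebrate".
common = sorted(set.intersection(*(set(g) for g in glossaries.values()))
                - set(find_discrepancies(glossaries)))
```

An exact text comparison like this only flags candidates. Deciding whether two differently worded definitions mean the same thing would still require human review.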

Even with the problems and expenses, if the industry does not take steps to standardize the definitions, it will weaken its ability to optimize AI, optimize benefit provision, and reduce administrative system costs.