Wednesday, April 28, 2010

Top Ten Characteristics of a Software Safety Culture

This is a follow-on to my previous post regarding the safety issues of EHRs. But first, one comment and point of emphasis before I dive into the Top Ten details: tightly integrated, monolithic EHR solutions that rely on a single data model are much less prone to safety risk scenarios than loosely integrated, best-of-breed HL7-based systems. That's my personal observation; no lengthy study is required to prove it to me. There are, of course, other drawbacks to purchasing a monolithic EHR from a single vendor, but from a safety standpoint, they are a better choice than buying disparate pieces and trying to glue them together with HL7. The unfortunate downside to HL7 is its fragility: errors occur easily and frequently, leading to mismatched patient records and to delayed or lost delivery of vital results. I'm not a fan of message-oriented architectures (MOA), and never have been, but especially not in healthcare, where HL7 has made it insidiously easy to hang onto old-school, unreliable MOA designs.

Now on to the Top Ten…

My information systems and software development background started in the U.S. Air Force, an environment in which software safety (and security) is an embedded part of the culture. Military weapons control and targeting systems, flight control systems, and intelligence data processing systems are all developed with an inherent sense of safety. That background predisposed me toward an interest in the subject, especially as I watched software and firmware play a larger and larger role in areas of our lives that were once controlled by levers, gears, and links. I take a gloomy sense of self-satisfaction in knowing that I predicted, and tried on numerous occasions to prevent, the software system failures that affect many of our computerized voting systems and now seem to affect our automotive control systems as well. I find myself in yet another Groundhog Day moment with this topic and EHRs.

As described by Debra Herrmann in "Software Safety and Reliability", software is safe if (the points of emphasis are mine):

1. It has features and procedures which ensure that it performs predictably under normal and abnormal conditions

2. The likelihood of an undesirable event occurring in the execution of that software is minimized

3. If an undesirable event does occur, the consequences are controlled and contained

Below are the Top Ten Characteristics of a Safe Software Environment at the CIO level, based upon my 20+ years of watching and studying this topic. There are numerous lower-level programming techniques that I may cover later, but these are the top characteristics that a healthcare CIO can and should directly lead. If you were a software safety auditor for ONCHIT, CCHIT, HHS, or the FDA, you could use this checklist as a preliminary assessment of an EHR vendor or of a healthcare organization undergoing an EHR implementation.

1. Does the organization have an internal committee to whom software safety concerns and events can be escalated and reported?

2. Is there an Information Systems Safety Event Response process documented and well-known in the culture?

3. Is there a classification and metrics system in use for categorizing and measuring the events and risks associated with software safety? For example:

Type 1: Catastrophic. Patient life is in grave danger. The probability for humans to recognize and intervene to mitigate this event is very low or non-existent. Intervention is required within seconds to prevent the loss of life or major body function.

Type 2: Severe. Patient health is in immediate danger. The probability for humans to recognize and intervene to mitigate this event is low, but possible. Intervention is required within minutes to prevent serious injury or degradation of patient health that could lead to the loss of life or major body function.

Type 3: Moderate. Patient health is at risk. However, it is probable that humans will recognize and intervene to mitigate this event. Intervention is required within hours or a few days to prevent a moderate degradation in patient health.

Type 4: Minor. Patient health is minimally at risk. The probability for humans to recognize and intervene to mitigate this event is high. Corrective action should occur within days or weeks to avoid any degradation in patient health.
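The four-type scheme above can be expressed directly in code, which is useful if you want event-reporting tools to classify consistently. This is a sketch: the numeric thresholds simply mirror the prose (seconds / minutes / hours-to-days / days-to-weeks) and would need to be set by your own safety committee.

```python
# Sketch of the four-type safety event classification described above.
# Thresholds are illustrative translations of "seconds", "minutes",
# "hours or a few days", and "days or weeks".

from enum import IntEnum

class Severity(IntEnum):
    CATASTROPHIC = 1  # intervention required within seconds
    SEVERE = 2        # within minutes
    MODERATE = 3      # within hours or a few days
    MINOR = 4         # within days or weeks

def classify(seconds_to_intervene: float) -> Severity:
    """Map the available human-intervention window to an event severity type."""
    if seconds_to_intervene < 60:
        return Severity.CATASTROPHIC
    if seconds_to_intervene < 3600:
        return Severity.SEVERE
    if seconds_to_intervene < 3 * 86400:  # roughly "a few days"
        return Severity.MODERATE
    return Severity.MINOR

print(classify(30))         # Severity.CATASTROPHIC
print(classify(7 * 86400))  # Severity.MINOR
```

The point is not the specific cutoffs but that severity is keyed to one measurable thing: how much time humans have to recognize the event and intervene.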

4. Are software applications, functions, objects, and system interfaces classified and tested according to their potential impact on safety risk scenarios? Such as:

a. Data which appears timely and accurate, but is not
b. Data that is not posted—incomplete record
c. Valid data that is accidentally deleted
d. Data posted to the wrong patient record
e. Errors in computerized protocols and decision support tools and reports
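Each scenario in that list should map to explicit tests and runtime guards. As one sketch, here is scenario (d), data posted to the wrong patient record, treated as an invariant the software must enforce. The `post_result` function and the record shapes are hypothetical; the design point is that an identifier mismatch must fail loudly rather than post silently.

```python
# Sketch of guarding against safety scenario (d): a result must never be
# attached to a record whose patient ID does not match. Data shapes and
# the post_result function are illustrative.

def post_result(record: dict, result: dict) -> None:
    """Attach a lab result to a patient record, refusing on ID mismatch."""
    if record["patient_id"] != result["patient_id"]:
        raise ValueError("patient ID mismatch: refusing to post result")
    record.setdefault("results", []).append(result)

record = {"patient_id": "12345", "results": []}
post_result(record, {"patient_id": "12345", "test": "K+", "value": 4.1})
assert len(record["results"]) == 1

# The unsafe path must raise, never return success:
try:
    post_result(record, {"patient_id": "99999", "test": "K+", "value": 6.8})
    print("UNSAFE: mismatched result was posted")
except ValueError as err:
    print("blocked:", err)
```

Scenarios (a), (b), (c), and (e) deserve the same treatment: name the hazard, write the guard, and write the test that proves the guard holds.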

5. Is safety-critical software subjected to independent verification and validation (IV&V) rather than being tested only by its own developers?

6. Is there a process of certification for programmers who are authorized to work on safety-critical software?

7. Are fault tree analysis and failure modes and effects analysis (FMEA) embedded elements of the software development and software safety culture?
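For readers unfamiliar with FMEA, its core arithmetic is simple enough to sketch: each failure mode is scored for severity, occurrence, and detectability, and the product (the risk priority number, or RPN) ranks where mitigation effort should go first. The failure modes and scores below are invented for illustration; real scores come from your clinical and engineering staff.

```python
# Minimal FMEA sketch: rank failure modes by risk priority number (RPN).
# RPN = severity x occurrence x detectability, each scored 1-10 (10 = worst).
# The example failure modes and scores are illustrative only.

failure_modes = [
    # (description, severity, occurrence, detectability)
    ("result posted to wrong patient record", 9, 3, 7),
    ("interface drops a result message",      8, 4, 8),
    ("stale data displayed as current",       7, 5, 6),
]

def rpn(severity: int, occurrence: int, detectability: int) -> int:
    return severity * occurrence * detectability

ranked = sorted(failure_modes, key=lambda m: rpn(*m[1:]), reverse=True)
for desc, s, o, d in ranked:
    print(f"RPN {rpn(s, o, d):3d}: {desc}")
```

Note what the ranking surfaces: the dropped message (RPN 256) outranks the wrong-patient posting (RPN 189) despite its lower severity, because it happens more often and is harder to detect. That counterintuitive result is exactly why the analysis is worth embedding in the culture.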

8. Is scenario-based software testing utilized?

9. Is abnormal-conditions and/or inverse-requirements testing performed, i.e., verifying not only what the software should do but what it must never do?
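A sketch of what inverse-requirements testing looks like in practice: rather than confirming correct output for correct input, you feed the software implausible input and assert that it refuses loudly. The weight-based dosing function and its plausibility bounds are hypothetical.

```python
# Sketch of abnormal-conditions testing: assert what the software must
# never do. The dosing function and its bounds are illustrative.

def dose_mg(weight_kg: float, mg_per_kg: float) -> float:
    """Compute a weight-based dose, rejecting physiologically absurd input."""
    if not (0 < weight_kg < 400):
        raise ValueError(f"implausible weight: {weight_kg} kg")
    if not (0 < mg_per_kg <= 100):
        raise ValueError(f"implausible dose rate: {mg_per_kg} mg/kg")
    return weight_kg * mg_per_kg

# Normal condition:
assert dose_mg(70, 1.5) == 105.0

# Abnormal conditions: each must raise, never return a number.
for bad_input in [(-70, 1.5), (0, 1.5), (70, 0), (5000, 1.5)]:
    try:
        dose_mg(*bad_input)
        print("UNSAFE: accepted", bad_input)
    except ValueError:
        print("rejected", bad_input)
```

The inverse requirement here is "the system shall never emit a dose for an impossible patient," and the test exercises that requirement directly.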

10. Is a more risk-averse software change control and configuration management process in place for safety-critical software?

In sharing these Top Ten Characteristics, drawn from my own experience, I am attempting to contribute to the solution, not just complain about the problem. If you have any thoughts, suggestions, or personal lessons, I'd enjoy hearing from you.

