1
Challenges of Evaluation in Biomedical Informatics
This chapter develops, in a general and informal way, the many topics that are explored in more detail in later chapters of this book. It gives a first def- inition of evaluation, describes why evaluation is needed, and notes some of the problems of evaluation in biomedical informatics that distinguish it from evaluation in other areas. In addition, it lists some of the many types of information systems and resources, the questions that can be asked about them, and the various perspectives of those concerned.
First Definitions
Most people understand the term “evaluation” to mean measuring or describing something, usually to answer questions or help make decisions.
Whether we are choosing a holiday destination or a web browser, we eval- uate the options and how well they fit key objectives or personal prefer- ences. The form of the evaluation differs widely according to what is being evaluated and how important the decision is. Therefore, in the case of holiday destinations, we may ask our friend which Hawaiian island she prefers, and why, before searching for more specific information about hotels. For a web browser, we may focus on more technical details, such as the time to open a web site or its compatibility with our disability. Thus, the term “evaluation” describes a range of data-collection activities designed to answer questions ranging from the casual “What does my friend think of Maui?” to the more focused “Is web browser A quicker than web browser B on my computer?”
In biomedical informatics, we study the collection, processing, and com- munication of information related to health care, research, and education.
We build “information resources,” usually consisting of computer hardware and software, to facilitate these activities. For healthcare applications, such information resources include systems to collect, store, and communicate data about specific patients (e.g., clinical workstations and electronic patient records) and systems to assemble, store, and reason using medical knowl-
1
edge (e.g., medical knowledge bases and decision-support systems). For basic and clinical research applications, information resources include genomic and proteomic databases, tools to analyze these data and correlate genotypes with phenotypes, and methods for identifying patients eligible for clinical trials.* In health education, there are Internet-based programs for distance learning and virtual reality simulations for learning medical procedures. There is clearly a range of biomedical information and com- munication resources to evaluate.
To further complicate the picture, each information resource has many aspects that can be evaluated. The technically minded might focus on inher- ent characteristics, asking such questions as: “How many columns are there per database table?” or “How many probability calculations per second can this resource sustain?” Clinicians, researchers, and students might ask more pragmatic questions, such as: “Is the information in this resource completely up-to-date?” or “How much time must I invest in becoming proficient with this resource, and will it do anything to help me personally?” Those with a broader perspective might wish to understand the impact of these resources on organization or management, asking questions such as: “How well does this electronic patient record support clinical audit?” or “Will sophisticated educational simulations change the role of the faculty in teaching?” Thus, evaluation methods in biomedical informatics must address not only a range of different types of information resources, but also a range of ques- tions about them, from the technical characteristics of specific systems to their effects on people and organizations.
In this book, we do not exhaustively describe how each possible evalua- tion method can be used to answer each kind of question about each kind of information resource. Instead, we describe the range of techniques avail- able and focus on those that seem most useful in biomedical informatics.
We introduce, in detail, methods, techniques, study designs, and analysis methods that apply across a range of evaluation problems. The methods introduced in this volume are applicable to the full range of information resources directed at health care, biomedical research, and education. The examples we introduce and discuss will come primarily from clinical and educational application. While there is a great deal of interest in a rapidly developing set of information resources to support bioinformatics and clin- ical research, there is relatively little experience with the evaluation of these tools.
In the language of software engineering, our focus is much more on soft- ware validation than software verification. Validation means checking that the “right” information resource was built, which involves both determin- ing that the original specification was right and that the resource is per-
* In this volume, the term “bioinformatics” will be used to refer to the use of infor-
mation resources in support of biological research. In this parlance, bioinformatics
is a subset of the broader domain of biomedical informatics.
forming to specification. By contrast, verification means checking whether the resource was built to specification. As we introduce evaluation methods in detail, we will distinguish the study of software functions from the study of their impact or effects on users and the wider world. Although software verification is important, this volume will only summarize some of the rel- evant principles in Chapter 3 and refer the reader to general computer science and software-engineering texts.
Reasons for Performing Evaluations
Like any complex, time-consuming activity, evaluation can serve multiple purposes. There are at least five major reasons why we evaluate biomedical information resources.
11. Promotional: To encourage the use of information resources in bio- medicine, we must be able to reassure clinicians, patients, researchers, and educators that these resources are safe and bring benefit to persons, groups, and institutions through improved cost-effectiveness or safety, or perhaps by making activities possible that were not possible before.
2. Scholarly: If we believe that biomedical informatics exists as a disci- pline or scientific field, ongoing examination of the structure, function, and impact of biomedical information resources must be a primary method for identifying and confirming the principles that lie at the foundation of the field.
2In addition, some developers of information resources carry out evaluations from different perspectives out of basic curiosity in order to see if the resources are able to perform functions that were not in the original specifications.
3. Pragmatic: Without evaluating the resources they create, developers can never know which techniques or methods are more effective, or why certain approaches failed. Equally, other developers are not able to learn from previous mistakes and may reinvent a square wheel.
4. Ethical: Before using an information resource, clinicians, researchers, educators, and administrators must be satisfied that it is functional and be able to justify its use in preference to alternative information resources and the many other innovations that compete for the same budget.
5. Medicolegal: To reduce the risk of liability, developers of an informa-
tion resource should obtain accurate information to allow them to label it
correctly
3and assure users that it is safe and effective. Users need evalua-
tion results to enable them to exercise their professional judgment before
using these resources, thus helping the law regard each user as a “learned
intermediary.” An information resource that treats the users merely as
automatons, without allowing them to exercise their skills and judgment,
risks being judged by the strict laws of product liability instead of the more
lenient principles applied to provision of professional services.
4Every evaluation study should be motivated by one or more of these factors; otherwise, it risks being what has been called a “triple blind study,”
in which neither evaluators, participants, nor readers of the report can fathom why it was done.
†Awareness of the major reason for conducting an evaluation often helps frame the major questions to be addressed and avoids any disappointment that may result if the focus of the study is mis- directed. We return in Chapter 3 to the problem of deciding how much to emphasize each of the many questions that arise in most evaluation studies.
Who Is Involved in Evaluation and Why?
We have already mentioned the range of perspectives in biomedical infor- matics, from the technical to the organizational. With specific regard to the clinical domain, Figure 1.1 shows some of the actors involved in paying for (solid arrows) and regulating (shaded arrows) the healthcare process. Any of these actors may be affected by a biomedical information resource, and each may have a unique view of what constitutes benefit. More specifically, in a typical clinical information resource project, the key “stakeholders” are the developer, the user, the patients whose management may be affected, and the person responsible for purchasing and maintaining the information resource. Each of these individuals or groups may have different questions
†