
From Data to Action: Integrating Program Evaluation and Program Improvement

Thomas J. Chapel, M.A., M.B.A., and Kim Seechuk, M.P.H.

Behavioral Interventions for Prevention and Control of Sexually Transmitted Diseases. Aral SO, Douglas JM Jr, eds. Lipshutz JA, assoc ed. New York: Springer Science + Business Media, LLC; 2007.

While program evaluation is widely recognized as a core function of public health, differences in definition of “good evaluation practice” often lead to evaluations that are time consuming and expensive, and, most importantly, produce findings that are not employed for program improvement. This chapter offers simple, systematic guidelines to maximize the likelihood that the time and effort to evaluate will be translated into program improvement. The goal that findings be used for program improvement is fundamental to the discipline of program evaluation. An old adage says it best: “Research seeks to prove; evaluation seeks to improve.” And evaluators have responded with a variety of approaches/frameworks whose central premise is “utilization-focused” evaluation—that no evaluation is good unless its results are used (1,2). This chapter emphasizes how the early steps of a good evaluation process can build the conceptual clarity about the program that is needed to choose the right evaluation focus. It reinforces these points with case-specific advice for those doing STD interventions.

Programs can be “pushed” to do evaluation by external mandates from funders or authorizers or they can be “pulled” to do evaluation by an internally felt need to examine and improve the program. STD programs are likely no different. State and local STD programs are pushed to evaluate by a mix of evaluation mandates in cooperative agreements or foundation mandates—which in turn reflect demands on foundations by their boards or on funding agencies like the Centers for Disease Control and Prevention (CDC) by the Office of Management and Budget and the Government Performance and Results Act (GPRA) and Performance Assessment and Rating Tool (PART) processes.*

Using the STD world as an example, CDC’s Division of STD Prevention (DSTDP) now explicitly lists program evaluation as an essential activity within the Comprehensive STD Prevention Systems (CSPS) framework, and recent DSTDP Performance Measures Guidance (3) commits CDC’s efforts to measuring performance and aligning with goals. This CDC emphasis is translated into pressure on states to evaluate; the Program Operations Guidelines require that programs monitor progress toward achievement of goals and objectives (4).

* See the following for more discussion of the relationship of program evaluation to the Government Performance and Results Act (GPRA): http://www.gao.gov/new.items/gpra/gpra.htm, and to the Performance Assessment and Rating Tool (PART): http://www.whitehouse.gov/omb/part/

While external mandates such as these can be effective in motivating evaluation, it is preferable that programs be “pulled” by the internally felt need to evaluate, even when it is not required. And, indeed, more and more STD programs see the need for good evaluation as problems become more complex, efforts emphasize behavioral interventions with hard-to-reach audiences, and programs must deal with the complexities of communities and institutional structures. Community-wide surveillance measures tell only part of the story, and determining whether program efforts are effective—and why or why not—means delving into the innards of program efforts, understanding the sequence of milestones and markers for success, and unraveling the relationships between activities and outcomes. STD programs might be evaluated for the following reasons:

to help prioritize activities and guide resource allocation;

to inform funders of the program whether their contributions are being used effectively;

to inform community members and stakeholders of the project’s value;

to provide information that can be useful in the design or improvement of similar projects.

Framework for Program Evaluation in Public Health: The CDC Example

CDC’s Framework for Program Evaluation in Public Health (5) is a six-step approach to evaluation whose core assumption is that use of findings is most likely when the evaluation focus and design match the purpose and the potential use and user of the specific evaluation situation. CDC’s framework intentionally employs broad definitions of both “evaluation”—“examination of merit, worth, significance of an object” (6)—and “program”—“any set of intentional, interrelated activities that aim for a common outcome” (5)—so that practitioners at all levels would see program evaluation as something they needed and had the capacity to undertake.

The CDC framework includes six steps (Figure 1): 1) engage stakeholders; 2) describe the program; 3) focus the evaluation and its design; 4) gather credible evidence; 5) justify conclusions; and 6) use findings and share lessons learned.

The rationale underlying these steps is as follows: No evaluation is good simply because the methods and analysis are valid and reliable; it is good because the results are used. Getting use means paying attention to creating a “market” before you create the “product”—the evaluation itself. The evaluation focus is key to developing this market by ensuring the evaluation includes questions that are relevant, salient, and useful to those who will use the findings. Determining the right focus requires identifying key stakeholders (those besides the program who care about our efforts and their success) and understanding the program in all its complexity.

The steps are sequenced in a way that reinforces the idea that planning, performance measurement, and evaluation are integrated in a continuous quality improvement loop:

Planning—What do we do?


Performance measurement—How are we doing?

Evaluation—Why are we doing well or poorly?

Planning—What should we do?

A set of four evaluation standards (7) complements the six steps. They help broaden or constrain our thinking at any step by asking: 1) Who will use the information and how (utility)? 2) What resources are available for evaluation (feasibility)? 3) What must be done to be proper and ethical (propriety)? 4) What approaches will produce the most accurate results, given the intended use (accuracy)?

The remainder of this chapter presents key insights at each step of the framework and illustrates them with some cross-cutting STD examples.

Applying Key Insights from the Framework

Engaging Stakeholders

Turning evaluation results into program improvement is often not under the control of evaluators or even of program staff members. Hence, programs that are committed to “use” of evaluation findings must pay attention to engaging “stakeholders,” the array of people and organizations with vested interests in the program and its results.

Figure 1 Evaluation framework. The six steps (engage stakeholders; describe the program; focus the evaluation design; gather credible evidence; justify conclusions; ensure use and share lessons learned) surround the four standards (utility, feasibility, propriety, accuracy).


Stakeholders for the typical public health program fall into three overlapping categories: 1) those involved in program operations; 2) those affected by the program; and 3) those who will use evaluation results (who may be part of the first two groups). For a state or local STD program, these categories might comprise the following stakeholders:

STD program management

STD program staff

Other public health partners: family planning, laboratory, epidemiologists, etc.

STD program clients

Federal, state, and local funders of the program

Private providers

Community- or faith-based organizations that serve affected communities

Schools

Departments of corrections or jails

Businesses that cater to the target community (e.g., gay baths or bars)


Community members at large

Professional organizations (local chapter of AMA, NCSD)

HIV care providers

HIV community planning groups

These categories are broad; if the program desires use of evaluation findings, then within these three broad groups, the most important stakeholders are those who 1) will enhance the credibility of the evaluation or results, 2) will implement the evaluation’s recommendations for program improvement, and 3) will help with or are responsible for the continued authorization or funding, or some combination of these.
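To make this screening concrete, here is a minimal sketch in Python (not part of the original chapter; the stakeholder entries and the true/false flags are invented for illustration) of how a program might tabulate which stakeholders meet the three criteria above.

```python
# Illustrative sketch: flag stakeholders against the three engagement criteria
# named in the text (credibility, implementation, funding/authorization).
# The entries below are hypothetical examples, not a prescribed list.

STAKEHOLDERS = {
    "Private providers":          {"credibility": True,  "implementation": True,  "funding": False},
    "CDC/DSTDP (funder)":         {"credibility": True,  "implementation": False, "funding": True},
    "Gay bath/bar owners":        {"credibility": False, "implementation": True,  "funding": False},
    "Community members at large": {"credibility": True,  "implementation": False, "funding": False},
}

def must_engage(flags: dict) -> bool:
    """A stakeholder is 'key' if they meet at least one of the three criteria."""
    return any(flags.values())

for name, flags in STAKEHOLDERS.items():
    reasons = [criterion for criterion, met in flags.items() if met]
    print(f"{name}: engage -> {must_engage(flags)} ({', '.join(reasons) or 'none'})")
```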

Following are two STD case examples. Note who the key stakeholders are and the differences in the parts of the program of most interest to them:

In a large metropolitan area, increases in infectious syphilis have been concentrated in the men who have sex with men (MSM) community. Most syphilis cases in MSMs are diagnosed by private providers rather than the STD clinic, and most MSMs with syphilis have reported frequenting particular gay baths or bars in the metro area. As the STD program thinks about design and evaluation of prevention efforts, it might consider the following:

Physicians’ concerns about patients’ confidentiality and the Health Insurance Portability and Accountability Act (HIPAA) may deter physicians from working with the health department, influence which types of data collection are acceptable, or both.

Business members may be concerned about prevention activities in their venues hurting business, but they may also desire to help protect their customers from STDs or from co-infection with HIV.

A rural community is experiencing high chlamydia (CT) rates in adolescents. As the STD program thinks about design and evaluation of prevention efforts, it might consider the following:

Faith-based organizations may not participate unless prevention emphasizes safe-sex messages that include abstinence.


Parents may fear that prevention activities will teach and induce their children to experiment with risky behaviors.

Schools may fear the reaction of parents and the community to sex and drug education in the schools, or they may resist the disruption of the curriculum and the demands on teacher time.

The community-at-large may fear that the evaluation will spread bad publicity about the community, thus hurting investment.

Knowing these needs, fears, and preferences of stakeholders early in the evaluation helps in a number of ways. If known early enough, this information can inform the design of the intervention and not just the evaluation. But even after the intervention is underway, the stakeholder information reminds us of outcomes that must be measured in the evaluation in order to keep these necessary stakeholders engaged in the process. For example, in the first case, business owners as citizens and potential members of the targeted community want to decrease STD rates in their community, but to keep them engaged in our interventions and evaluation as business owners, we must be attentive to the impact on their business, since lack of cooperation from them will undermine prevention. Likewise, in the second case, since we must have schools, faith-based organizations, and parents engaged for CT prevention to be effective, then from the start we must be attentive to the outcomes that matter to them and we must include them in the evaluation. Note that including the stakeholders’ needs and priorities in the evaluation does not ensure the answers they want, but only that the evaluation will include the questions that are most relevant and salient to them.

“Engaging stakeholders” sounds more complicated than it is. We may identify a host of stakeholders but conclude that only a few are essential for credibility, implementation, or continuation of the program. And most stakeholders may not want to be involved in every step of the evaluation. Determining the needs, opinions, and preferences of the few who must be engaged need not require extensive data collection; qualitative and simple methods are often enough.

Describing the Program

Before jumping into evaluation or planning, we want clarity on the following aspects of our program:

Need for the program: The big public health problem on which the program hopes to make some impact.

Target groups: Those people or organizations—other than the program and its staff—who need to change in some way to achieve the intended impact.

Outcomes: The way(s) in which they need to change.

Activities: The actions of the program and its staff that are intended to cause the target groups to change.

Inputs: The necessary resources to mount the activities effectively, such as staff members, funds, and legal authority.

In the syphilis case, the public health need is to contain the sudden surge of syphilis cases in men; widening male-to-female case ratios as well as patient-identified risk indicate that this surge is concentrated in MSM. Target groups in this case include, among others, MSMs, the private providers who are diagnosing cases in the MSM community, and some of the businesses the targeted men frequent.

We need interventions that will produce the following outcomes: MSMs will reduce risky behavior and adopt protective behavior; private providers will consistently report cases of syphilis to the health department (HD) to assure adequate treatment, and offer partner services, counseling, and follow-up; and businesses will participate in communication campaigns or screening events to encourage safe behaviors by their customers. Some key activities will include 1) conducting provider visits to inform providers about reporting regulations, services provided, etc.; 2) conducting grand rounds for targeted providers; and 3) conducting outreach to selected businesses and developing materials for distribution. Key inputs would include, among others, sufficiently trained staff and time to conduct visits and follow up on reports.

By contrast, in our rural community, the need is to identify the best venues (e.g., high schools) for screening so that any increase in CT in adolescents is detected early, and to prevent new cases that, if left untreated, might cause infertility. The target groups include the adolescents at risk and, just as importantly, the schools that will host the screening, and the parents and others who can influence the school and the adolescents’ behavior. The outcomes we need to cause in these groups are as follows: Adolescents need to agree to screening, adopt protective behaviors and avoid risky behaviors, and, if positive, complete treatment and partner counseling. Schools need to agree to sponsor screening on site and during school hours, and parents and community influencers need to endorse and encourage participation of adolescents in screening, or at least not publicly oppose these efforts. The activities to move these target group outcomes include outreach to schools and community organizations, campaigns with parents and adolescents, screening clinics that are set up in the schools, and referral and follow-up for counseling and treatment. Key inputs for this intervention include trained staff members and time, an inventory of appropriate materials, and existing relationships with the schools and community organizations.

These components of any program are implemented against a backdrop that includes:

Stage of development: How long the program has been underway.

Context: The trends and forces in the larger environment that may affect the program’s success or failure, such as history, demographics, competition, economics, and technology.

This backdrop will influence how or whether the intervention can be implemented, and, later on, which parts of it are suitable for evaluation. Both programs in our case examples are just getting underway; we would not yet expect significant progress in public health outcomes. Some important components of the context might include changing demographics of affected communities, which makes engaging and building trust with community organizations and neighborhoods more complicated; competition of STD programs for public health resources, especially with newly felt emergency preparedness needs; new technology—urine-based tests, self-administered risk assessments—that might make interventions simpler to implement; and the political or legislative climate of the community regarding such things as minor consent or condom distribution.

Logic models are a common way of depicting visually the relationship among some or all of these elements of the program description, focusing especially on showing the relationship between a program’s activities and intended outcomes. There are many ways to construct logic models, but most start by generating a list of activities—actions taken by the program and its staff—and outcomes—ways in which people or organizations other than the program need to change. The next step is to try to depict any logical sequencing within the list of activities and outcomes; for example, changes in knowledge, attitude, and belief (KAB) usually would precede behavior change, and formulation of materials would logically precede distribution of them. The resulting four-to-six column table may be all that is needed to lend clarity to discussions about the program and its evaluation. But, more often, final logic models will add columns for inputs and outputs. Or the content of the original four-to-six column table may be converted into a “flow chart” format that adds arrows that connect activities to their intended outcomes, or early outcomes to the later ones they are intended to influence.
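For programs that prefer to keep the model in an editable form rather than a drawing, the following is one possible sketch in Python (illustrative only; the element fields and the sample entries, loosely based on the syphilis case, are assumptions rather than the chapter's prescribed format). It captures the same idea: a list of activities and outcomes, with each outcome recording its stage and what it depends on.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    name: str
    kind: str                      # "input", "activity", or "outcome"
    stage: str = ""                # for outcomes: "short", "mid", or "long" term
    depends_on: list = field(default_factory=list)   # upstream activities/outcomes
    stakeholder_priority: bool = False               # the "asterisked" outcomes

# Hypothetical fragment of the syphilis logic model
model = [
    Element("Outreach to and education for providers", "activity"),
    Element("Providers report all cases promptly", "outcome", "short",
            ["Outreach to and education for providers"]),
    Element("Positive patients and partners complete treatment", "outcome", "mid",
            ["Providers report all cases promptly"]),
    Element("Patient confidentiality not compromised", "outcome", "mid",
            ["Providers report all cases promptly"], stakeholder_priority=True),
    Element("Prevalence and incidence of syphilis reduced", "outcome", "long",
            ["Positive patients and partners complete treatment"]),
]

# Print the activity -> outcome chains implied by the dependencies
for e in model:
    for upstream in e.depends_on:
        print(f"{upstream}  ->  {e.name}  [{e.stage or e.kind}]")
```

Keeping the model as data like this makes it easy to regenerate the table or flow-chart views described above as the program evolves.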

Below, in Tables 1 and 2, the narrative program descriptions for our two case examples have been converted into simple logic models that depict the activities and outcomes. In listing the activities and outcomes, we have been sure to include outcomes that were identified as important to stakeholders in the discussions conducted earlier (these outcomes are marked with asterisks). When an evaluation focus is chosen in the next step, these serve as reminders of important outcomes that may need to be included in the evaluation.

These logic models are “snapshots” of the program and will change over time as evaluation, research, and daily experience show what is working and what is not. Also, these models could be made more detailed or less detailed, depending on the purpose for which they were drawn. In general, the dictum “less is more” is good advice. Keep the model simple; construct a macro-level (i.e., “global”) model as the starting point, and use it as a template to “zoom in” for more detail on specific aspects of the program.

Table 1 Logic model: Preventing syphilis in MSMs in a large metro area.

Inputs (If we have in place ...): Trained staff for provider and business outreach and health communication; funds; treatment and service capacity; relevant, supportive governmental regulation.

Activities (And, if we do ...): Outreach to and education for providers; development of provider information and campaign materials; outreach to and information for business owners.

Short-term outcomes (Then ...): Private providers will report all cases promptly and counsel at-risk patients; patients will agree to HD treatment, counseling, and partner services; businesses will display campaign information and will allow on-site screening as needed.

Mid-term outcomes (Then ...): Positive patients and partners will complete treatment; patients and partners will adopt protective behaviors and avoid risky behaviors; customers of targeted businesses will adopt protective behaviors and avoid risky behaviors; cooperating businesses and practices not adversely affected**; patient confidentiality not compromised**.

Long-term outcomes (Then ...): Disease transmission will be interrupted earlier to prevent further spread; prevalence and incidence of syphilis are reduced.

(** Outcomes identified as important to stakeholders.)


Focusing the Evaluation and Its Design

While the evaluation plan for a program may include indicators and data sources for every activity and outcome, the evaluation focus step identifies the specific parts of the whole program that need to be part of this evaluation this time. This focus will change over time as the purpose, use, and user of evaluation findings evolve. As noted, being attentive to changes in the purpose, use, and user of evaluation findings over time ensures that the evaluation “product” has a ready “market.”

Over the life of a program, all of the following types of questions are likely to be asked of a program:

Implementation/process: Have the activities been implemented as intended?

Effectiveness/outcome: Have the outcomes occurred as hoped?

Efficiency: What level of resources was necessary to mount the activities and outputs?

Cost-effectiveness: What level of resources was necessary to produce a change in any outcome or all outcomes?

Causal attribution: Were any observed changes in outcomes due to our program and its efforts as distinguished from other factors?

Two of the four evaluation standards are used to determine which parts of the program need be part of the current evaluation focus. The “utility” standard asks:

What is the purpose of the evaluation?

Who will use the evaluation results?

How will they use the evaluation results?

Were key stakeholder needs identified in Step 1 that must be addressed to keep the stakeholders engaged?

Table 2 Logic model: Identifying and preventing CT in adolescents in a rural community.

Inputs (If we have in place ...): Staff; funds; inventory of prevention education materials; space, supplies, etc. to conduct testing; relationships with schools, parents, and community organizations.

Activities (And, if we do ...): Outreach to schools and community organizations; information campaigns with parents and students; screening clinics in the schools; referral and follow-up for treatment and counseling.

Short-term outcomes (Then ...): Schools accept and sponsor clinics on site and during school hours; parents and community influencers encourage the screening campaign; positive students seek and complete treatment; students adopt protective behaviors** and avoid risky behaviors**.

Mid-term outcomes (Then ...): Students are screened; early identification of CT is enhanced; reputation of town not adversely affected**; school day not adversely affected**.

Long-term outcomes (Then ...): Prevalence and incidence of CT are reduced; prevalence and incidence of infertility are reduced.

(** Outcomes identified as important to stakeholders.)

The “feasibility” standard acts as a “reality check” to ensure that “useful” questions are realistic ones, based on:

Stage of development of the program: Is it too early in the program’s life to expect the specific program component of interest to have occurred?

Program intensity: Is the program intensive or strong enough to produce the outcome of interest?

Resources for measurement: Do easy-to-access data sources exist to collect information on the program component of interest, or are there resources to devise them?

By applying the utility and feasibility standards, the program can identify which components—i.e., what parts of the logic model—need to be part of the current evaluation. These components are converted into specific evaluation questions—i.e., implementation, efficiency, effectiveness, and causal attribution.
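As a rough illustration of how the utility and feasibility screens narrow the focus, the sketch below (Python; the candidate questions and the screening flags are invented for this example) keeps only the components that someone will actually use and that are realistic to measure at the program's current stage.

```python
# Hypothetical candidate evaluation questions for a new program, each tagged
# with a utility screen (someone will use the answer) and feasibility screens
# (stage of development, resources for measurement).
candidates = [
    {"question": "Were provider visits conducted as planned?",         "type": "process",
     "useful_to_user": True,  "too_early": False, "data_exist": True},
    {"question": "Did providers begin reporting cases promptly?",      "type": "effectiveness",
     "useful_to_user": True,  "too_early": False, "data_exist": True},
    {"question": "Did syphilis incidence fall because of the program?", "type": "causal attribution",
     "useful_to_user": True,  "too_early": True,  "data_exist": False},
]

def in_focus(c: dict) -> bool:
    # Utility: someone will use the answer.  Feasibility: not premature, data obtainable.
    return c["useful_to_user"] and not c["too_early"] and c["data_exist"]

for c in candidates:
    verdict = "include" if in_focus(c) else "defer"
    print(f"{verdict:7s} | {c['type']:20s} | {c['question']}")
```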

Because our two case examples are new programs, an early purpose and user of evaluation may be the program itself, which wants to determine whether it could implement the many components of the program as intended. This evaluation would include mainly activities and inputs in the focus; indeed, at this early stage no outcomes may be included at all. Did the outreach to the various target audiences happen, and did it happen as extensively as was desired? Were campaign materials developed and screening clinics established? Were the necessary numbers and types of staff members available to mount this program as intended? Implementation questions such as these are called “process evaluation.” Among other benefits, when outcomes are not achieved, good process evaluation helps determine whether the program was not the right intervention or whether it was a good intervention but poorly or inadequately implemented.

Both programs are addressing high rates of STD, either a sudden upsurge in incidence (MSM syphilis) or high prevalence rates found in other venues with the same population (adolescents), and the interventions themselves are not without controversy. An early evaluation purpose may be to sustain the support of the community for the intervention; the department may still be the chief user of findings, but also may need to show somewhat reluctant partners that the effort is paying off. In this scenario, partners may care little about inputs or activities, but they need proof that early outcomes are occurring, such as providers’ reporting, schools agreeing to host clinics, and parents and community organizations endorsing the prevention efforts. The partners may want to see proof of such mid-term outcomes as positive patients being identified and referred to treatment, and patients reporting the adoption of protective behaviors. They also may want proof that their fears or special needs are being addressed: Was business adversely affected by the prevention activities on site? Was the curriculum or teacher workday significantly hurt by the clinics? Was the intervention not too intrusive to their practice, and was only appropriate information shared? Did the reputation of the town suffer?

Over time, certainly funders and authorizers will want some evidence that the program and these interventions are meeting accountability standards. In this scenario, the department is again the user of the evaluation findings, but the purpose is to prove to funders that their money has been spent well or is making progress on intended public health outcomes. Such an evaluation might include some measures of efficiency of activities: Is this a good use of limited resources, or are there other activities that are less resource-intensive that would achieve similar or acceptable results? This focus will almost certainly include long-term outcomes such as prevalence and incidence. It may include showing that reductions in prevalence and incidence are due to the efforts of the program, although demonstrating this conclusively is very hard in field settings and may require special research studies.

By understanding the needs and preferences of key stakeholders and the full complexity of the program before proceeding to evaluation, the early steps of the CDC evaluation framework ensure that the evaluation includes questions that are most important and relevant to those who can make program improvements.

Design choice, as with all elements of the evaluation focus step, will vary with purpose, use, and user, and with the time, resources, and expertise that can be brought to bear. In general, research and evaluation studies employ one or more of the following three designs:

Experimental designs;

Quasi-experimental designs;

Observational designs.

When the program is being asked not only whether outcomes have occurred but also whether those outcomes are attributable to the program and its efforts, and there is need to make this “causal attribution” case with a high degree of certainty, then research studies using experimental or quasi-experimental designs may be appropriate. Components and options for these studies are discussed elsewhere (8). But the emphasis of this chapter is on choosing the best design for more customary program evaluation studies where nonexperimental designs are either the only feasible choice or even a better choice than experimental designs. As the World Health Organization (WHO) has noted, “the use of randomized control trials to evaluate health promotion initiatives is, in most cases, inappropriate, misleading, and unnecessarily expensive” (9).

Some of the obstacles in implementing experimental or quasi-experimental designs are illustrated by our two cases. One can imagine choosing a comparison or control community for the syphilis intervention or assigning some but not all schools to in-school CT screening. However, even if the time, resources, and expertise were available to implement this design, it is fraught with potential problems. A comparison community would need to be one also experiencing an upsurge in syphilis in MSMs, and it is unlikely that that community would do nothing to address the problem while it awaited the outcome of our intervention. Also, efforts in our community might spill over to the comparison community unless it were located far away, in which case cultural and geographic factors might make it a poor choice for comparison. If CT screening is implemented in some but not all schools, the adolescent grapevine is sure to spread word of the intervention, leading either to resistance in the schools with the intervention or ethical questions about withholding it from the other schools.

While good design can address these problems, nonexperimental designs are more practical and may work adequately for the purpose at hand. If the community has good information on syphilis and CT rates, and, better yet, can disaggregate the data for demographic or geographic groups, then changes in rates after implementation of the intervention offer some initial “proof” that efforts are working. While circumstantial, this evidence can be supplemented by targeted surveys or other information to bolster the case. For example, do patients remember campaign messages? Do patients attribute their decision to be screened or to adopt protective behaviors to elements of the intervention? Do parents and organizations attribute their cooperation to outreach efforts of the health department?
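A minimal sketch of this kind of circumstantial evidence follows (Python; the case counts, population denominators, and group labels are fabricated for illustration). It disaggregates reported cases by group and compares rates per 100,000 before and after the intervention.

```python
# Hypothetical surveillance counts (cases) and population denominators,
# disaggregated by a demographic/geographic group of interest.
pre  = {"MSM, metro core": (120, 40_000), "Other male": (60, 500_000)}
post = {"MSM, metro core": (80, 40_000),  "Other male": (58, 500_000)}

def rate_per_100k(cases: int, population: int) -> float:
    return 100_000 * cases / population

for group in pre:
    r0 = rate_per_100k(*pre[group])
    r1 = rate_per_100k(*post[group])
    change = 100 * (r1 - r0) / r0
    print(f"{group:18s}  pre {r0:7.1f}  post {r1:7.1f}  change {change:+.1f}%")
```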

The early steps of the framework are not the end of the story, but ensure that the remaining steps—selecting indicators and data sources, analyzing and reporting the data—are informed by clarity and consensus on what the program is and what is most important to evaluate. This ensures that the time and energy spent on data collection and analysis result in use of the findings.

Gathering Credible Evidence and Justifying Conclusions

Program components are often expressed in global or abstract terms. Indicators are specific, observable, and measurable statements that help define exactly what we mean. Indicators are needed for both the outcomes and the activities in the logic model. Outcome indicators provide clearer definitions of our global statement and help guide the selection of data collection methods and the content of data collection instruments. For example, “Positive students complete treatment” and “Parents and community organizations encourage screening campaign” are two outcomes in the CT logic model. The treatment indicator might specify the type of medical treatment, duration, or adherence to the regimen. Likewise, the parent and community indicator might include specific behaviors that indicate encouragement.

Indicators for program activities—usually called “process” indicators—provide specificity on what constitutes “good implementation” of the activities—not just “outreach,” but “good outreach” or “enough outreach with the right organizations.”

If the logic model listed “outputs,” then some of the work has been done, since, as noted earlier, outputs are tangible, countable ways of documenting that the activities took place. For outreach, outputs might include the number of visits made to a certain mix of community organizations, or the number of memoranda of agreement signed. These serve well as process indicators.

Similarly, a program with a strategic plan may already have developed process or outcome objectives. If the objectives were written to be specific, measurable, action-oriented, realistic, and time-bound (so-called “SMART” objectives), then they may serve as indicators as well.
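The bookkeeping this implies can be very simple. The sketch below (Python; the indicator names, counts, and targets are illustrative assumptions, not DSTDP measures) pairs counts drawn from activity logs with targets taken from SMART-style objectives and reports whether each target has been met.

```python
# Hypothetical process indicators derived from outputs, each paired with a
# target drawn from a SMART-style objective (specific, measurable, time-bound).
indicators = [
    {"indicator": "Provider visits completed by Q4",           "count": 42, "target": 50},
    {"indicator": "Memoranda of agreement signed with CBOs",   "count": 6,  "target": 5},
    {"indicator": "Schools hosting on-site screening clinics", "count": 3,  "target": 4},
]

for ind in indicators:
    pct = 100 * ind["count"] / ind["target"]
    status = "met" if ind["count"] >= ind["target"] else "not yet met"
    print(f"{ind['indicator']:45s} {ind['count']:3d}/{ind['target']:<3d} ({pct:5.1f}%) {status}")
```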

Programs can sometimes use indicators developed by others. Some large CDC programs have developed indicator inventories that are tied to major activities and outcomes for the program. An advantage of these indicator inventories is that they may have been pre-tested for “relevance” and accuracy, define the best data sources for collecting the indicator, and include many potential indicators for each activity or outcome, ensuring that at least one will be appropriate for your program; and, because many programs are using the same indicator(s), you can compare performance across programs or even construct a national summary of performance.† For example, the Division of STD Prevention has a performance indicator that measures the timely treatment of women with chlamydia at certain family planning sites. This indicator may be very useful to the Office of Population Affairs (OPA), which oversees Title X family planning clinics. Conversely, measures used by OPA may be of use to DSTDP as it looks at STD prevention and care in family planning clinics.

† While such an indicator inventory does not currently exist for DSTDP’s grant program, the 12 new performance measures discussed earlier (CDC, NCHSTP, DSTDP, 2005) may play a similar role.

In selecting data collection methods and sources for indicators, the primary decision is whether existing data sources—secondary data collection—are available or whether new data must be collected—primary data collection. As was the case in choosing an evaluation focus, the program must balance “utility” (how useful the information is) against “feasibility” (how hard or expensive it will be to collect). Often, programs have limited funds for evaluation, and unless a particular outcome is of widespread interest and requires very accurate data, they will rely as much as possible on existing data sources. For STD programs, several secondary sources might exist, such as laboratory or provider reports of reportable diseases, interview records for patients with syphilis, HIV, or other STDs, and laboratory reports of positive and negative tests completed by provider. However, these secondary data sources must be appropriate to the indicators. Some surveillance systems have the advantages of uniform definitions and the ability to compare across jurisdictions, but do not allow for adding questions to the survey or disaggregation of data at the level of geography needed to examine the performance of the intervention.

Primary data collection methods fall into several broad categories:

Surveys, including personal or telephone interviews and instruments completed in person or received through the mail or e-mail;

Group discussions or focus groups;

Observation;

Document review, such as medical records, but also diaries, logs, minutes of meetings, etc.

These methods may yield quantitative or qualitative data, or both, and, where evaluation questions are abstract or data quality is poor, programs are often advised to use multiple methods. The following checklist—based on the four evaluation standards—can reduce the data collection options to a manageable number (a simple scoring sketch follows the checklist):

Utility:

Purpose and use of data collection—Do you seek a “point in time” determination of a behavior, or to examine the range and variety of experiences, or to tell an in-depth story?

Users of data collection—Will some methods make the data more credible with skeptics or with key users than others would?

Feasibility:

Resources available—Which methods can you afford?

Time—How long until the results are needed?

Frequency—How often do you need the data?

Your background—Are you trained in the method, or will you need help from an outside consultant?



Propriety:

Characteristics of the respondents—Will issues such as literacy or language make some methods preferable to others?

Degree of intrusion upon the program or participants—Will the data collection method disrupt the program or be seen as intrusive by participants?

Other ethical issues—Are there issues of confidentiality or safety of the respondent in seeking answers to questions on this issue?

Accuracy:

Nature of the issue—Is it about a behavior that is observable?

Sensitivity of the issue—How open and honest will respondents be in responding to the questions on this issue?

Respondent knowledge—Is it something the respondent is likely to know?
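One way to apply the checklist is to score each candidate method against the four standards and compare totals. The sketch below (Python; the candidate methods and the 1-5 scores are invented for illustration) shows the structured comparison; the point is the side-by-side judgment, not the particular numbers.

```python
# Hypothetical 1-5 scores for candidate methods against the four standards.
methods = {
    "Self-administered student risk survey": {"utility": 4, "feasibility": 4, "propriety": 2, "accuracy": 3},
    "Clinic document review":                {"utility": 3, "feasibility": 5, "propriety": 4, "accuracy": 4},
    "Provider telephone interviews":         {"utility": 4, "feasibility": 2, "propriety": 4, "accuracy": 3},
}

def total(scores: dict) -> int:
    return sum(scores.values())

# Rank candidate methods by total score, highest first
for name, scores in sorted(methods.items(), key=lambda kv: total(kv[1]), reverse=True):
    detail = ", ".join(f"{k} {v}" for k, v in scores.items())
    print(f"{total(scores):2d}  {name}  ({detail})")
```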

In our two cases, surveillance data are a likely source to measure changes in the long-term outcomes of incidence and prevalence, so long as those data can be disaggregated for just the communities with the interventions. By contrast, the short-term and mid-term outcomes will need to rely on primary data collection: surveys, interviews, or document reviews. The final methods chosen will depend on factors cited earlier, such as time, money, and credibility with stakeholders.

In our MSM syphilis scenario, data collection sources might include surveillance data to measure the number and promptness of case reports; patient or partner interviews regarding self-reported risky behaviors; reports of businesses that participate in outreach; numbers of condoms requested for distribution at these venues; number of clients accepting screening; or risk surveys with customers of the businesses. Physician outcomes might be measured with physician surveys. But if there were doubts about the reliability of self-reported physician data or if heavy physician schedules would likely lead to low response rates, then direct observation or reports (logs) of the numbers of physicians who contact the HD or allow provider visits to be conducted might be a better choice.

With respect to CT screening of adolescents in schools, data sources would include logs or reports of the number of schools in a district that agree to participate; the number of students screened; the number of parental consents received; and prevalence data from the site. For monitoring changes in students’ risky behavior, risk questionnaires would be a good choice. But, if parents were hesitant about such risk questionnaires or were likely to demand to see the data, the threat to confidentiality would affect how honestly students would answer or would reduce the students’ participation. Hence, some other source might be a better choice.

We talk of “justifying conclusions” and not “analyzing data” in order to emphasize that the evidence does not stand on its own, but is judged and interpreted through the prism of (potentially different) values that each stakeholder brings to the evaluation. Fortunately, the identification of any significant differences in values and standards was a core part of Step 1 and, as a result, the evaluation design should already reflect their priority outcomes and preferences for credible data collection. In this step, those values and priorities are used to interpret the evidence and judge the success of the program.

In our two cases, for example, while all parties might agree that a 50% reduction in syphilis rates is a significant achievement, if the bar or bath owners experienced a decrease in business or reputation, they may not see a 50% reduction as worth the loss to their livelihood. Others, such as advocacy organizations, may want to know which 50% experienced the reduction. Was the reduction across the board or was it confined to some income and ethnic groups but not to others? In the CT example, overall reduction in risky behavior is likely to be applauded by all stakeholders, but, as noted, faith-based organizations or parents may want to see increases in abstinence as distinguished from condom use as a safe-sex behavior, or they might see increases in condom use as a bad thing.

Ensuring Use and Sharing Lessons Learned

Because the evaluation has been based on the six steps of the CDC evaluation framework, most of the seeds for ensuring use were sown earlier and are ready for harvest at this step. Key actions are obvious ones and include making recommendations and ensuring the recommendations are acted upon.

Making Recommendations

Remember, the underlying rationale of the framework is that using this approach is more likely to lead to use of findings. That is, if we choose questions of interest to stakeholders and measure activities and outcomes in a way that is both useful and feasible, then the findings will be used for program improvement. How might this play out in our two cases?

For the urban syphilis case, achieving our public health outcomes requires engagement of private providers, patients, and local businesses. And our evaluation focus included some outcomes of potential interest to them—“patient confidentiality is not compromised” and “businesses and practices are not adversely affected.” Thus, our evaluation findings will not only determine whether we met outcomes on adoption of protective behaviors and reduction in disease transmission; they will also determine, even if we did not, what factors we might address in the next cycle to achieve our outcomes. We might find that providers did their part, but that we did not convince businesses that there would be no adverse effect. As a result, we reached only those who presented for care. Bad as that sounds, these findings guide our action in the next round. Of all the elements of this intervention, we would put time and attention in the next round to reassuring business owners and gaining their support.

Findings would be used in a similar way in our rural CT case. By including in our focus a range of intermediate outcomes and not just the ultimate public health outcome we seek, the evaluation can direct strategic action in the next round. Our findings may show that positive students completed treatment, but that students as a whole did not adopt protective behaviors. Why? Other findings show that parents and community influencers opposed the content of the screening campaign. These findings tell us that in the next cycle of program activity we may need to do some or all of the following: convince parents to accept our messages; change the messages even if they may be less powerful; or find some way to get information to students outside the school setting.

In both examples, the findings provide guidance because we have been careful to include an array of activities and outcomes from the logic model in the evaluation focus and because we made most of those selections based on the explicit purpose, user, and intended use of the findings. The result was relevant findings that could inform action.


Following are some examples from real STD projects where applying steps in the CDC evaluation framework led to findings that could guide program improvement:

Because a state chlamydia screening program included partners in its evaluation focus and evidence-gathering decisions, all parties agreed in advance to the screening criteria, agreed on data to be collected, monitored data, and shared it periodically. When it appeared that several clinics were not adhering to the criteria, partners were able and willing to work with the clinics to help them adhere to the criteria, thus better using limited resources to screen populations most at risk.

A syphilis elimination project employed a community partnership approach but wanted to ensure that the task force component was being implemented as intended. By including good process measures in their evaluation plan, they were able to identify several gaps between intention and reality. The findings were used to add activities such as an open house to recruit members from a target population, and helped the task force create an action plan with goals, objectives, and timelines to assure they stayed on track. Likewise, because they had a comprehensive logic model for their education and screening event component and had included stakeholders in choosing where to focus their education and screening evaluation, the findings were able to guide them in revising materials to include more information on STDs other than syphilis, increase the emphasis on risk-reduction messages, modify the content of the brochure so that the community found it less offensive, target locations of screening events based on prevalence and incidence data, and add HIV testing to the screening events.

Acting on Recommendations

Because the evaluation focus and data collection were decided on in conjunction with key stakeholders, the remaining steps to maximize use are as follows:

Preparation: Giving early warning about themes and results to key evaluation audiences to prevent “blind-siding” them;

Feedback: Allowing for review and response to early versions of results to encourage buy-in and utility and to get a better sense of the best format and emphasis;

Follow-up;

Dissemination: Sharing the results and the lessons learned from evaluation.

Of these, dissemination has been most enhanced by the work done in earlier steps. The market for the evaluation was created earlier; dissemination decisions are simple ones of working from audience to format. And much about this is known from the stakeholder engagement step, where we should have asked what messages and delivery method would be of most value to them.

In the syphilis example, health care providers are likely to be interested in receiving information in a relatively brief, nonintrusive way (routine faxes or mailings), or receiving the information in a format such as a “report card” that shows the number of cases they reported and the timeliness of their reports compared to the mean of others. Likewise, we know that business owners want to know how their patrons react to the health communications campaign, and perhaps to understand the changing risky behaviors of their clients so that they may serve them better. But, as with providers, they probably want this in a simple, straightforward format.


In the second example, we may have determined in stakeholder engagement that students will be interested in disease rates and risky behaviors reported by their peers and will use these to establish social norms. Here, group venues such as a school assembly or student health fair might be best. School administrators will be interested in rates and risky behaviors reported so that they might begin to address the issues in the classroom regarding STD information, but are likely to prefer private channels so that the school reputation is not adversely affected.

Summary: The “Payoff”

The CDC framework and similar “utilization-focused” approaches arose from the observation that most evaluations did not lead to program improvement. By thinking about use and user from the start, these approaches aim to ensure that evaluation time and effort make a difference. Still, six steps are a lot, and a dogmatic approach to this or any framework can lead to wasted energy. The evaluation standards serve as a reminder that all evaluations are case-specific. Where the program is the only stakeholder and the intended outcome is clear and easy to measure, then we can zip through the early steps. But clear and easy evaluations are the exception, and paying at least cursory attention to engaging stakeholders and understanding the program will yield insights that ensure evaluation is focused on the parts of the program that matter most and result in findings that are used to improve the program.

References

1. Goodman RM. Principles and tools for evaluating community-based prevention and health promotion programs. Journal of Public Health Management and Practice. 1998;4:37–47.

2. Patton MQ. Utilization-Focused Evaluation: The New Century Text, 3rd Ed. Thousand Oaks, CA: Sage; 1997.

3. Centers for Disease Control and Prevention, National Center for HIV, STD, and TB Prevention, Division of STD Prevention. Measures Companion Guidance: Comprehensive STD Prevention Systems, Prevention of STD-Related Infertility, and Syphilis Elimination Program Announcement. June 2005.

4. Centers for Disease Control and Prevention, National Center for HIV, STD, and TB Prevention, Division of STD Prevention. Program Operations Guidelines for STD Prevention. Program Evaluation. January 2001.

5. Centers for Disease Control and Prevention. Framework for program evaluation in public health. MMWR. 1999;48:1–40.

6. Scriven M. Minimalist theory of evaluation: the least theory that practice requires. American Journal of Evaluation. 1998;19:57–70.

7. Joint Committee on Standards for Educational Evaluation. The Program Evaluation Standards: How to Assess Evaluations of Educational Programs, 2nd Ed. Thousand Oaks, CA: Sage Publications; 1994.

8. Campbell DT, Stanley J. Experimental and Quasi-experimental Designs for Research. New York: Houghton Mifflin; 2005.

9. WHO European Working Group on Health Promotion Evaluation. Health Promotion Evaluation: Recommendations to Policy-Makers. Report of the WHO European Working Group on Health Promotion Evaluation. Copenhagen, Denmark: World Health Organization, Regional Office for Europe; 1998.
