7.5 Evaluation methodology

The evaluation design must detail a step-by-step plan of work that specifies the methods the evaluation will use to collect the information needed to address the evaluation criteria and answer the evaluation questions, analyse the data, interpret the findings and report the results.

Evaluation methods should be selected for their rigour in producing empirically based evidence to address the evaluation criteria and respond to the evaluation questions. The evaluation inception report should contain an evaluation matrix that displays, for each of the evaluation criteria, the questions and sub-questions that the evaluation will answer and, for each question, the data that will be collected to inform that question and the methods that will be used to collect those data (see Box 40). In addition, the inception report should make explicit the underlying theory or assumptions about how each data element will contribute to understanding the development results (attribution, contribution, process, implementation and so forth) and the rationale for the data collection, analysis and reporting methodologies selected.

Box 40. Questions for evaluators
The commissioning office should, at a minimum, ensure that the evaluation methods detailed in the evaluators’ inception report respond to each of the following questions:

  • What evidence is needed to address the evaluation questions?
  • What data collection method(s) will be used to address the evaluation criteria and questions? Why were these methods selected? Are allocated resources sufficient?
  • Who will collect the data?
  • What is the framework for sampling? What is the rationale for the framework?
  • How will programme participants and other stakeholders be involved?
  • What data management systems will be used? That is, what are the planned logistics, including the procedures, timing, and physical infrastructure that will be used for gathering and handling data?
  • How will the information collected be analysed and the findings interpreted and reported?
  • What methodological issues need to be considered to ensure quality?

 
Data collection methods 

The data to be collected and the methods for collecting the data will be determined by: the evidence needed to address the evaluation questions; the analyses that will be used to translate the data into meaningful findings in response to the evaluation questions; and judgements about what data are feasible to collect given constraints of time and resources. UNDP evaluations draw heavily on data (performance indicators) generated through monitoring during the programme or project implementation cycle. Performance indicators are a simple and reliable means to document changes in development conditions (outcomes), production, or delivery of products and services (outputs) connected to a development initiative (see Chapter 2).

Performance indicators are useful but have limitations. Indicators only indicate; they do not explain, and they are unlikely to cover the full range of questions the evaluation seeks to address. For example, indicators provide a measure of what progress has been made, but they do not explain why that progress was made or what factors contributed to it. UNDP evaluations generally make use of a mix of other data sources, collected through multiple methods, to give meaning to what performance indicators tell us about the initiative.
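
As an illustration only, the following minimal Python sketch shows how monitored indicator values might be expressed as progress toward a target. The indicator names and values are hypothetical and are not drawn from any UNDP programme.

    # Hypothetical indicators: (name, baseline, target, latest actual value)
    indicators = [
        ("households with access to safe water (%)", 40.0, 70.0, 58.0),
        ("district offices reporting on time (count)", 5, 20, 12),
    ]

    for name, baseline, target, actual in indicators:
        # progress expressed as a share of the baseline-to-target distance
        progress = (actual - baseline) / (target - baseline) * 100
        print(f"{name}: {progress:.0f}% of the distance from baseline to target")

Such a calculation shows what progress has been made; the reasons for that progress still have to come from the other data sources discussed below.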

Primary data consists of information evaluators observe or collect directly from stakeholders about their first-hand experience with the initiative. These data generally comprise the reported or observed values, beliefs, attitudes, opinions, behaviours, motivations and knowledge of stakeholders, typically obtained through questionnaires, surveys, interviews, focus groups, key informants, expert panels, direct observation and case studies. These methods allow for more in-depth exploration and yield information that can facilitate deeper understanding of observed changes in outcomes and outputs (both intended and unintended) and of the factors that contributed to them, by filling in the operational context for outputs and outcomes.

Secondary data is primary data that was collected, compiled and published by someone else.  Secondary data can take many forms but usually consists of documentary evidence that has direct relevance for the purposes of the evaluation. Sources of documentary evidence include: local, regional or national demographic data; nationally and internationally published reports; social, health and economic indicators; project or programme plans; monitoring reports; previous reviews, evaluations and other records; country strategic plans; and research reports that may have relevance for the evaluation. Documentary evidence is particularly useful when the project or programme lacks baseline indicators and targets for assessing progress toward outputs and outcome measures. Although not a preferred method, secondary data can be used to help recreate baseline data and targets. Secondary information complements and supplements data collected by primary methods but does not replace collecting data from primary sources.

Given the nature and context of UNDP evaluations at the decentralized level, including limitations of time and resources, evaluators are often likely to use a mix of methods, including performance indicators, supplemented by relevant documentary evidence from secondary sources and qualitative data collected by a variety of means.

Table 28 presents brief descriptions of data collection methods that are most commonly applied in evaluations in UNDP for both project and outcome evaluations. 

Table 28. Summary of common data collection methods used in UNDP evaluations53

Method: Monitoring and Evaluation Systems
Description: Uses performance indicators to measure progress, particularly actual results against expected results.
Advantages: Can be a reliable, cost-efficient, objective method to assess progress of outputs and outcomes.
Challenges: Dependent upon viable monitoring and evaluation systems that have established baseline indicators and targets and have collected reliable data in relation to targets over time, as well as data relating to outcome indicators.

Method: Extant Reports and Documents
Description: Existing documentation, including quantitative and descriptive information about the initiative, its outputs and outcomes, such as documentation from capacity development activities, donor reports and other documentary evidence.
Advantages: Cost efficient.
Challenges: Documentary evidence can be difficult to code and analyse in response to questions. It can be difficult to verify the reliability and validity of the data.

Method: Questionnaires
Description: Provide a standardized approach to obtaining information on a wide range of topics from a large number or diversity of stakeholders (usually employing sampling techniques), covering their attitudes, beliefs, opinions, perceptions, level of satisfaction, etc. concerning the operations, inputs, outputs and contextual factors of a UNDP initiative.
Advantages: Good for gathering descriptive data on a wide range of topics quickly at relatively low cost. Easy to analyse. Gives anonymity to respondents.
Challenges: Self-reporting may lead to biased reporting. Data may provide a general picture but may lack depth. May not provide adequate information on context. Subject to sampling bias.

Method: Interviews
Description: Solicit person-to-person responses to pre-determined questions designed to obtain in-depth information about a person's impressions or experiences, or to learn more about their answers to questionnaires or surveys.
Advantages: Facilitate fuller coverage, range and depth of information on a topic.
Challenges: Can be time-consuming. Can be difficult to analyse. Can be costly. Potential for the interviewer to bias the respondent's answers.

Method: On-Site Observation
Description: Entails use of a detailed observation form to record accurate information on-site about how a programme operates (ongoing activities, processes, discussions, social interactions and observable results as directly observed during the course of an initiative).
Advantages: Allows the evaluator to see the operations of a programme as they are occurring and to adapt to events as they occur.
Challenges: Can be difficult to categorize or interpret observed behaviours. Can be expensive. Subject to (site) selection bias.

Method: Group Interviews
Description: A small group (6 to 8 people) is interviewed together to explore in-depth stakeholder opinions, similar or divergent points of view, or judgements about a development initiative or policy, as well as information about their behaviours, understanding and perceptions of an initiative, or to collect information about tangible and non-tangible changes resulting from an initiative.
Advantages: Quick, reliable way to obtain common impressions from diverse stakeholders. Efficient way to obtain a high degree of range and depth of information in a short time.
Challenges: Responses can be hard to analyse. Requires a trained facilitator. May be difficult to schedule.

Method: Key Informants
Description: Qualitative in-depth interviews, often one-on-one, with a wide range of stakeholders who have first-hand knowledge about the initiative's operations and context. These community experts can provide particular knowledge and understanding of problems and recommend solutions.
Advantages: Can provide insight on the nature of problems and give recommendations for solutions. Can provide different perspectives on a single issue or on several issues.
Challenges: Subject to sampling bias. Requires some means to verify or corroborate the information.

Method: Expert Panels
Description: A peer review, or reference group, composed of external experts who provide input on technical or other substantive topics covered by the evaluation.
Advantages: Adds credibility. Can serve as an added (expert) source of information that provides greater depth. Can verify or substantiate information and results in the topic area.
Challenges: Cost of consultancy and related expenses, if any. Impartiality must be ensured and conflicts of interest avoided.

Method: Case Studies
Description: Involve comprehensive examination through cross-comparison of cases to obtain in-depth information, with the goal of fully understanding the operational dynamics, activities, outputs, outcomes and interactions of a development project or programme.
Advantages: Useful for fully exploring the factors that contribute to outputs and outcomes.
Challenges: Require considerable time and resources not usually available for commissioned evaluations. Can be difficult to analyse.

Commissioning offices need to ensure that the methods and the instruments (questions, surveys, protocols, checklists) used to collect or record data are consistent with quality standards of validity and reliability,54 culturally sensitive and appropriate for the populations concerned, and valid and appropriate for the types of information sought and the evaluation questions being answered. In conflict-affected settings, factors such as security concerns, lack of infrastructure, limited access to people with information, and sensitivities and ethical considerations in working with vulnerable people should be considered in determining appropriate data collection methods.

Issues of data quality

UNDP commissioning offices must ensure that the evaluation collects data that relates to evaluation purposes and employs data collection methodologies and procedures that are methodologically rigorous and defensible and produces empirically verified evidence that is valid, reliable and credible.

Reliability and validity are important aspects of quality in an evaluation. Reliability refers to consistency of measurement; for example, ensuring that a particular data collection instrument, such as a questionnaire, will elicit the same or similar response if administered under similar conditions. Validity refers to accuracy in measurement; for example, ensuring that a particular data collection instrument actually measures what it was intended to measure. It also refers to the extent to which inferences or conclusions drawn from data are reasonable and justifiable. Credibility concerns the extent to which the evaluation evidence and the results are perceived to be valid, reliable and impartial by the stakeholders, particularly the users of evaluation results. There are three broad strategies to improve reliability and validity that a good evaluation should address:

  • Improve the quality of sampling
  • Improve the quality of data gathering
  • Use mixed methods of data collection and build in strategies (for example, triangulation or multiple sources of data) to verify or cross-check data using several pieces of evidence rather than relying on only one

Improve sampling quality

UNDP evaluations often gather evidence from a sample of people or locations. If this sample is not representative of the population, wrong conclusions can be drawn about that population. For example, if a group interview includes only those from the city who can readily access the venue, the concerns and experiences of those in outlying areas may not be adequately documented. The sample must be selected on the basis of a rationale or purpose that is directly related to the evaluation purposes and that is intended to ensure accuracy in the interpretation of findings and the usefulness of evaluation results. Commissioning offices should ensure that the evaluation design makes clear the characteristics of the sample, how it will be selected, the rationale for the selection, and the limitations of the sample for interpreting evaluation results. If a sample is not used, the rationale for not sampling and the implications for the evaluation should be discussed.
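
As an illustration only, the following minimal Python sketch shows one way a stratified random selection could be drawn so that respondents from outlying areas are represented alongside urban respondents. The sampling frame, strata and sample sizes are hypothetical assumptions; the actual design should follow the rationale agreed in the evaluation design.

    import random

    random.seed(1)  # fixed seed so the documented selection can be reproduced

    # hypothetical sampling frame: participant ID -> district type
    frame = {f"P{i:03d}": ("urban" if i % 3 else "outlying") for i in range(1, 91)}

    # group the frame into strata by district type
    strata = {}
    for pid, district in frame.items():
        strata.setdefault(district, []).append(pid)

    # deliberately equal sample sizes so outlying areas are not crowded out
    sample_sizes = {"urban": 12, "outlying": 12}

    sample = {d: random.sample(ids, sample_sizes[d]) for d, ids in strata.items()}
    for district, ids in sample.items():
        print(district, ":", len(ids), "selected from", len(strata[district]))
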

Ensure consistency of data gathering

Whether using questionnaires, interview schedules, observation protocols or other data gathering tools, the evaluation team should test the data collection tools and make sure they gather evidence that is both accurate and consistent. Some ways of addressing this would be:

  • Train data collectors in using observation protocols to ensure they record observations in the same way as each other (a simple agreement check is sketched after this list)
  • Check the meaning of key words used in questionnaires and interview schedules, especially if they have been translated, to make sure respondents understand exactly what is being asked
  • Consider how the characteristics of interviewers (especially age, gender and whether they are known to the informants) might improve or reduce the accuracy of the information provided
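
As an illustration only, the following minimal Python sketch shows a simple way to check whether two trained observers apply an observation protocol consistently, using percentage agreement and Cohen's kappa. The category labels and codings are hypothetical.

    from collections import Counter

    # hypothetical codings of the same eight observation episodes by two observers
    observer_a = ["active", "active", "passive", "active", "passive", "active", "passive", "active"]
    observer_b = ["active", "passive", "passive", "active", "passive", "active", "active", "active"]

    n = len(observer_a)
    observed = sum(a == b for a, b in zip(observer_a, observer_b)) / n

    # agreement expected by chance, from each observer's marginal frequencies
    freq_a, freq_b = Counter(observer_a), Counter(observer_b)
    expected = sum(freq_a[c] / n * freq_b[c] / n for c in set(observer_a) | set(observer_b))

    kappa = (observed - expected) / (1 - expected)
    print(f"observed agreement {observed:.2f}, Cohen's kappa {kappa:.2f}")

Low agreement at the testing stage signals that the protocol or the training needs to be revised before data collection proceeds.
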
‘Triangulate’ data to verify accuracy: Use multiple data sources

Good evaluation evidence is both consistent and accurate. Building in strategies to verify data will enhance the reliability and ensure valid results.

  • Use a mix of methods to collect data rather than relying on one source or one piece of evidence. For example, triangulate the evidence from one source (such as the group interview) with other evidence about the experiences of those in rural areas (this might be documentary evidence from reports or key informant interviews with people who are credible and well informed about the situation); a minimal cross-check of this kind is sketched below.
  • Use experts to review and validate evidence.
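
As an illustration only, the following minimal Python sketch shows a simple cross-check of one indicator value reported by several sources, flagging values that diverge from the median by more than a chosen tolerance. The source names, values and 10 per cent threshold are hypothetical assumptions.

    # hypothetical values of the same outcome indicator from three sources
    sources = {
        "monitoring system": 62.0,
        "national survey report": 58.5,
        "key informant estimate": 75.0,
    }

    reference = sorted(sources.values())[len(sources) // 2]  # median as reference point
    tolerance = 0.10  # flag values more than 10% away from the reference

    for name, value in sources.items():
        flag = "check further" if abs(value - reference) / reference > tolerance else "consistent"
        print(f"{name}: {value} -> {flag}")

Flagged discrepancies are not errors in themselves; they indicate where additional evidence or expert review is needed before the value is relied upon.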

The challenge for UNDP evaluations is to employ rigorous evaluation design methods that will produce useful information based on credible evidence that is defensible in the face of challenges to the accuracy of the evidence and the validity of the inferences made about the evidence.

Ethical considerations

Evaluations should be designed and conducted to respect and protect the rights and welfare of people and the communities of which they are members, in accordance with the UN Universal Declaration of Human Rights55 and other human rights conventions. Evaluators should respect the dignity and diversity of evaluation participants when planning, carrying out and reporting on evaluations, in part by using evaluation instruments appropriate to the cultural setting. Further, prospective evaluation participants should be treated as autonomous, be given the time and information to decide whether or not they wish to participate, and be able to make an independent decision without any pressure. Evaluation managers and evaluators should be aware of the implications of conducting evaluations in conflict zones. In particular, evaluators should know that the way they act, including the explicit and implicit messages they transmit, may affect the situation and expose those with whom the evaluators interact to greater risks.56 When evaluators need to interview vulnerable groups, they should make sure interviewees are aware of the potential implications of their participation in the evaluation exercise and that they are given sufficient information on which to base a decision about their participation. All evaluators commissioned by UNDP programme units should agree to and sign the Code of Conduct for Evaluators in the UN System.57 For more information on ethics in evaluation, please refer to the ‘UNEG Ethical Guidelines for Evaluation’.58

Box 41. Human rights and gender equality perspective in evaluation design
Evaluations in UNDP are guided by the principles of human rights and gender equality. This has implications for evaluation design and conduct, and requires a shared understanding of these principles and explicit attention on the part of evaluators, evaluation managers and evaluation stakeholders. For example, in collecting data, evaluators need to ensure that women and disadvantaged groups are adequately represented. In order to make excluded or disadvantaged groups visible, data should be disaggregated by gender, age, disability, ethnicity, caste, wealth and other relevant differences where possible.

Further, data should be analysed whenever possible through multiple lenses, including sex, socio-economic grouping, ethnicity and disability. Marginalized groups are often subject to multiple forms of discrimination, and it is important to understand how these different forms of discrimination intersect to deny rights holders their rights.
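
As an illustration only, the following minimal Python sketch shows how survey records might be disaggregated by sex and location so that differences between groups remain visible in the analysis. The field names, values and scores are hypothetical.

    from collections import defaultdict

    # hypothetical survey records with a satisfaction score per respondent
    records = [
        {"sex": "female", "location": "rural", "satisfaction": 3},
        {"sex": "female", "location": "urban", "satisfaction": 4},
        {"sex": "male", "location": "rural", "satisfaction": 4},
        {"sex": "male", "location": "urban", "satisfaction": 5},
        {"sex": "female", "location": "rural", "satisfaction": 2},
    ]

    # group responses by the intersection of sex and location
    groups = defaultdict(list)
    for r in records:
        groups[(r["sex"], r["location"])].append(r["satisfaction"])

    for (sex, location), scores in sorted(groups.items()):
        print(f"{sex}, {location}: mean satisfaction {sum(scores) / len(scores):.1f} (n={len(scores)})")
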

 
Analysis and synthesis of data

Data collection involves administering questionnaires, conducting interviews, observing programme operations, and reviewing or entering data from existing data sources. Data analysis is a systematic process that involves organizing and classifying the information collected, tabulating it, summarizing it, and comparing the results with other appropriate information to extract useful information that responds to the evaluation questions and fulfils the purposes of the evaluation. It is the process of deciphering facts from a body of evidence by systematically coding and collating the data collected, ensuring their accuracy, conducting any statistical analyses, and translating the data into usable formats or units of analysis related to each evaluation question.

Data analysis seeks to detect patterns in evidence, either by isolating important findings (analysis) or by combining sources of information to reach a larger understanding (synthesis). Mixed method evaluations require the separate analysis of each element of evidence and a synthesis of all sources in order to examine patterns of agreement, convergence or complexity.

Analysis plan

Data analysis and synthesis must proceed from an analysis plan that should be built into the evaluation design and work plan detailed in the inception report. The analysis plan is an essential evaluation tool that maps how the information collected will be organized, classified, inter-related, compared and displayed relative to the evaluation questions, including what will be done to integrate multiple sources, especially those that provide data in narrative form, and any statistical methods that will be used to integrate or present the data (calculations, sums, percentages, cost-analysis and so forth). Possible challenges and limitations of the data analysis should be described. The analysis plan should be written in conjunction with selecting data collection methods and instruments rather than afterward.
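
As an illustration only, the following minimal Python sketch shows one possible way to record an analysis plan so that each evaluation question is explicitly linked to its data sources and intended analysis. The questions, sources and methods shown are hypothetical and would be replaced by those in the evaluation matrix.

    # hypothetical analysis plan: each question mapped to its data sources and analysis
    analysis_plan = [
        {
            "question": "To what extent did the initiative improve access to services?",
            "data_sources": ["monitoring indicators", "household questionnaire"],
            "analysis": "compare indicator values against baseline and target; tabulate survey responses by district and sex",
        },
        {
            "question": "What factors helped or hindered progress?",
            "data_sources": ["key informant interviews", "group interviews", "project reports"],
            "analysis": "code interview notes thematically and triangulate with documentary evidence",
        },
    ]

    for item in analysis_plan:
        print(item["question"])
        print("  sources:", ", ".join(item["data_sources"]))
        print("  analysis:", item["analysis"])
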

Interpreting the findings

Interpreting findings is the process of giving meaning to the evaluation findings derived from the analysis. It draws on the summation and synthesis of information derived from facts, statements, opinions and documents, and turns findings from the data into judgements about development results (conclusions). On the basis of those conclusions, recommendations for future actions will be made. Interpretation is the effort of figuring out what the findings mean: making sense of the evidence gathered in an evaluation and its practical applications toward development effectiveness.

Drawing conclusions

A conclusion is a reasoned judgement based on a synthesis of empirical findings or factual statements corresponding to specific circumstances. Conclusions are not findings; they are interpretations that give meaning to the findings. Conclusions are considered valid and credible when they are directly linked to the evidence and can be justified on the basis of appropriate methods of analysis and synthesis to summarize findings. Conclusions should:

  • Consider alternative ways to compare results (for example, compared with programme objectives, a comparison group, national norms, past performance or needs)
  • Generate alternative explanations for findings and indicate why these explanations should be discounted
  • Form the basis for recommending actions or decisions that are consistent with the conclusions
  • Be limited to situations, time periods, persons, contexts and purposes for which the findings are applicable59
Making recommendations

Recommendations are evidence-based proposals for action aimed at evaluation users. Recommendations should be based on conclusions. However, forming recommendations is a distinct element of evaluation that requires information beyond what is necessary to form conclusions. Developing recommendations involves weighing effective alternatives, policy, funding priorities and so forth within a broader context. It requires in-depth contextual knowledge, particularly about the organizational context within which policy and programmatic decisions will be made and the political, social and economic context in which the initiative will operate.

Recommendations should be formulated in a way that will facilitate the development of a management response (see Chapter 6 and Annex 6 on Management Response System). Recommendations must be realistic and reflect an understanding of the commissioning organization and potential constraints to follow-up. Each recommendation should clearly identify its target group and stipulate the recommended action and rationale.

Lessons learned

The lessons learned from an evaluation comprise the new knowledge gained from the particular circumstances (initiative, context, outcomes and even evaluation methods) that is applicable to and useful in other similar contexts. Frequently, lessons highlight strengths or weaknesses in preparation, design and implementation that affect performance, outcome and impact.