7.5 Evaluation methodology
The evaluation design must detail a step-by-step plan of work specifying the methods the evaluation will use to collect the information needed to address the evaluation criteria and answer the evaluation questions, to analyse the data, to interpret the findings and to report the results.
Evaluation methods should be selected for their rigour in producing empirically based evidence to address the evaluation criteria and respond to the evaluation questions. The evaluation inception report should contain an evaluation matrix that displays, for each of the evaluation criteria, the questions and sub-questions that the evaluation will answer, and for each question, the data that will be collected to inform that question and the methods that will be used to collect that data (see Box 40). In addition, the inception report should make explicit the underlying theory or assumptions about how each data element will contribute to understanding the development results (attribution, contribution, process, implementation and so forth) and the rationale for the data collection, analysis and reporting methodologies selected.
Box 40. Questions for evaluators
The commissioning office should, at a minimum, ensure that the evaluation methods detailed in the evaluators’ inception report respond to each of the following questions:
The data to be collected and the methods for collecting the data will be determined by: the evidence needed to address the evaluation questions; the analyses that will be used to translate the data into meaningful findings in response to the evaluation questions; and judgements about what data are feasible to collect given constraints of time and resources. UNDP evaluations draw heavily on data (performance indicators) generated through monitoring during the programme or project implementation cycle. Performance indicators are a simple and reliable means to document changes in development conditions (outcomes), production, or delivery of products and services (outputs) connected to a development initiative (see Chapter 2).
Performance indicators are useful but have limitations. Indicators only indicate; they do not explain. Indicators will not likely address the full range of questions the evaluation seeks to address. For example, indicators provide a measure of what progress has been made. They do not explain why that progress was made or what factors contributed to the progress. UNDP evaluations generally make use of a mix of other data sources, collected through multiple methods, to give meaning to what performance indicators tell us about the initiative.
Primary data consists of information evaluators observe or collect directly from stakeholders about their first-hand experience with the initiative. These data generally consist of the reported or observed values, beliefs, attitudes, opinions, behaviours, motivations and knowledge of stakeholders, generally obtained through questionnaires, surveys, interviews, focus groups, key informants, expert panels, direct observation and case studies. These methods allow for more in-depth exploration and yield information that can facilitate deeper understanding of observed changes in outcomes and outputs (both intended and unintended) and the factors that contributed to them, by filling in the operational context for outputs and outcomes.
Secondary data is primary data that was collected, compiled and published by someone else. Secondary data can take many forms but usually consists of documentary evidence that has direct relevance for the purposes of the evaluation. Sources of documentary evidence include: local, regional or national demographic data; nationally and internationally published reports; social, health and economic indicators; project or programme plans; monitoring reports; previous reviews, evaluations and other records; country strategic plans; and research reports that may have relevance for the evaluation. Documentary evidence is particularly useful when the project or programme lacks baseline indicators and targets for assessing progress toward outputs and outcome measures. Although not a preferred method, secondary data can be used to help recreate baseline data and targets. Secondary information complements and supplements data collected by primary methods but does not replace collecting data from primary sources.
Given the nature and context of UNDP evaluations at the decentralized level, including limitations of time and resources, evaluators are often likely to use a mix of methods, including performance indicators, supplemented by relevant documentary evidence from secondary sources, and qualitative data collected by a variety of means.
Table 28 presents brief descriptions of data collection methods that are most commonly applied in evaluations in UNDP for both project and outcome evaluations.
Table 28. Summary of common data collection methods used in UNDP evaluations53
Commissioning offices need to ensure that the methods and the instruments (questions, surveys, protocols, checklists) used to collect or record data are: consistent with quality standards of validity and reliability,54 culturally sensitive and appropriate for the populations concerned, and valid and appropriate for the types of information sought and the evaluation questions being answered. In conflict-affected settings, factors such as security concerns, lack of infrastructure, limited access to people with information and sensitivities and ethical considerations in terms of working with vulnerable people should be considered in determining appropriate data collection methods.
Issues of data quality
UNDP commissioning offices must ensure that the evaluation collects data that relate to the evaluation purposes, employs data collection methodologies and procedures that are methodologically rigorous and defensible, and produces empirically verified evidence that is valid, reliable and credible.

Reliability and validity are important aspects of quality in an evaluation. Reliability refers to consistency of measurement: for example, ensuring that a particular data collection instrument, such as a questionnaire, will elicit the same or similar response if administered under similar conditions. Validity refers to accuracy in measurement: for example, ensuring that a particular data collection instrument actually measures what it was intended to measure. It also refers to the extent to which inferences or conclusions drawn from the data are reasonable and justifiable. Credibility concerns the extent to which the evaluation evidence and results are perceived to be valid, reliable and impartial by the stakeholders, particularly the users of evaluation results. There are three broad strategies to improve reliability and validity that a good evaluation should address:
Improve sampling quality
UNDP evaluations often gather evidence from a sample of people or locations. If this sample is unrepresentative of a portion of the population, then wrong conclusions can be drawn about the population. For example, if a group interview only includes those from the city who can readily access the venue, the concerns and experiences of those in outlying areas may not be adequately documented. The sample must be selected on the basis of a rationale or purpose that is directly related to the evaluation purposes and is intended to ensure accuracy in the interpretation of findings and usefulness of evaluation results. Commissioning offices should ensure that the evaluation design makes clear the characteristics of the sample, how it will be selected, the rationale for the selection, and the limitations of the sample for interpreting evaluation results. If a sample is not used, the rationale for not sampling and the implications for the evaluation should be discussed.
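To illustrate, proportional stratified sampling is one common way to keep a sample representative of, for example, urban and outlying districts. The sketch below is a minimal illustration in Python; the district names and population sizes are hypothetical.

```python
import random

# Hypothetical sampling frame: respondent IDs grouped by district (stratum).
frame = {
    "capital_city": [f"cc_{i}" for i in range(600)],
    "outlying_north": [f"on_{i}" for i in range(250)],
    "outlying_south": [f"os_{i}" for i in range(150)],
}

def stratified_sample(frame, total_n, seed=1):
    """Draw a sample whose district shares mirror the population's shares."""
    rng = random.Random(seed)
    population = sum(len(members) for members in frame.values())
    sample = []
    for members in frame.values():
        # Proportional allocation: each stratum's sample size reflects
        # its share of the population.
        n = round(total_n * len(members) / population)
        sample.extend(rng.sample(members, n))
    return sample

sample = stratified_sample(frame, total_n=100)
```

Proportional allocation keeps each stratum's share of the sample equal to its share of the population, so respondents from outlying areas are not crowded out by those who are simply easier to reach.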
Ensure consistency of data gathering
Whether using questionnaires, interview schedules, observation protocols or other data gathering tools, the evaluation team should test the data collection tools and make sure they gather evidence that is both accurate and consistent. Some ways of addressing this would be:
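One simple consistency check is a test-retest comparison: administer the same instrument twice to the same respondents under similar conditions and verify that responses correlate strongly. The sketch below illustrates the idea with hypothetical scores; it is an illustration, not a prescribed procedure.

```python
import math

# Hypothetical Likert-scale scores from the same ten respondents, with the
# questionnaire administered twice under similar conditions.
first = [4, 3, 5, 2, 4, 5, 3, 4, 2, 5]
second = [4, 3, 4, 2, 5, 5, 3, 4, 2, 5]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# A high test-retest correlation suggests the instrument elicits consistent
# responses; a low one signals a reliability problem to investigate.
r = pearson(first, second)
```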
‘Triangulate’ data to verify accuracy: Use multiple data sources
Good evaluation evidence is both consistent and accurate. Building in strategies to verify data will enhance reliability and help ensure valid results.
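As a minimal illustration of triangulation, an evaluation can compare the same indicator across independent sources and flag disagreement beyond a chosen tolerance. The figures, source names and tolerance below are hypothetical.

```python
# Hypothetical values of the same indicator reported by three sources.
sources = {
    "monitoring_reports": 412,
    "household_survey": 398,
    "district_records": 405,
}

def triangulate(sources, tolerance=0.05):
    """Check whether independent sources agree within a relative tolerance.

    Returns (agreement flag, mean of the reported values).
    """
    values = list(sources.values())
    mean = sum(values) / len(values)
    spread = (max(values) - min(values)) / mean
    return spread <= tolerance, mean

agree, best_estimate = triangulate(sources)
```

Where sources disagree beyond the tolerance, the discrepancy itself becomes evidence: it points the evaluators toward a data quality problem to explain before drawing conclusions.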
The challenge for UNDP evaluations is to employ rigorous evaluation design methods that will produce useful information based on credible evidence that is defensible in the face of challenges to the accuracy of the evidence and the validity of the inferences made about the evidence.
Evaluations should be designed and conducted to respect and protect the rights and welfare of people and the communities of which they are members, in accordance with the UN Universal Declaration of Human Rights55 and other human rights conventions. Evaluators should respect the dignity and diversity of evaluation participants when planning, carrying out and reporting on evaluations, in part by using evaluation instruments appropriate to the cultural setting. Further, prospective evaluation participants should be treated as autonomous, be given the time and information to decide whether or not they wish to participate, and be able to make an independent decision without any pressure.

Evaluation managers and evaluators should be aware of the implications of conducting evaluations in conflict zones. In particular, evaluators should know that the way they act, including the explicit and implicit messages they transmit, may affect the situation and expose those with whom they interact to greater risks.56 When evaluators need to interview vulnerable groups, they should make sure interviewees are aware of the potential implications of their participation in the evaluation exercise and are given sufficient information to make a decision about their participation.

All evaluators commissioned by UNDP programme units should agree to and sign the Code of Conduct for Evaluators in the UN System.57 For more information on ethics in evaluation, please refer to the ‘UNEG Ethical Guidelines for Evaluation’.58
Box 41. Human rights and gender equality perspective in evaluation design
Evaluations in UNDP are guided by the principles of human rights and gender equality. This has implications for evaluation design and conduct, and requires shared understanding of these principles and explicit attention on the part of evaluators, evaluation managers and evaluation stakeholders. For example, in collecting data, evaluators need to ensure that women and disadvantaged groups are adequately represented. In order to make excluded or disadvantaged groups visible, data should be disaggregated by gender, age, disability, ethnicity, caste, wealth and other relevant differences where possible.
Further, data should be analysed whenever possible through multiple lenses, including sex, socio-economic grouping, ethnicity and disability. Marginalized groups are often subject to multiple forms of discrimination, and it is important to understand how these different forms of discrimination intersect to deny rights holders their rights.
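The kind of disaggregation described above can be sketched as a simple tabulation. The records, field names and groupings below are hypothetical; the point is that combining lenses (here, sex and ethnicity together) can reveal gaps that a single lens hides.

```python
from collections import defaultdict

# Hypothetical survey records; all field names and values are illustrative.
records = [
    {"sex": "F", "ethnicity": "A", "received_service": True},
    {"sex": "F", "ethnicity": "B", "received_service": False},
    {"sex": "M", "ethnicity": "A", "received_service": True},
    {"sex": "M", "ethnicity": "B", "received_service": True},
    {"sex": "F", "ethnicity": "B", "received_service": False},
]

def disaggregate(records, *keys):
    """Tabulate service coverage for each combination of the given keys."""
    groups = defaultdict(lambda: [0, 0])  # [covered, total] per group
    for r in records:
        group = tuple(r[k] for k in keys)
        groups[group][0] += r["received_service"]  # True counts as 1
        groups[group][1] += 1
    return {g: covered / total for g, (covered, total) in groups.items()}

# Single lens, then intersecting lenses.
by_sex = disaggregate(records, "sex")
by_sex_eth = disaggregate(records, "sex", "ethnicity")
```

In this hypothetical data, coverage for women overall is one third, but for women in group B it is zero: a gap that is invisible when disaggregating by sex alone.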
Analysis and synthesis of data
Data collection involves administering questionnaires, conducting interviews, observing programme operations, and reviewing or entering data from existing data sources. Data analysis is a systematic process that involves organizing and classifying the information collected, tabulating it, summarizing it, and comparing the results with other appropriate information to extract useful information that responds to the evaluation questions and fulfils the purposes of the evaluation. It is the process of deciphering facts from a body of evidence by systematically coding and collating the data collected, ensuring its accuracy, conducting any statistical analyses, and translating the data into usable formats or units of analysis related to each evaluation question.
Data analysis seeks to detect patterns in evidence, either by isolating important findings (analysis) or by combining sources of information to reach a larger understanding (synthesis). Mixed method evaluations require the separate analysis of each element of evidence and a synthesis of all sources in order to examine patterns of agreement, convergence or complexity.
Data analysis and synthesis must proceed from an analysis plan that should be built into the evaluation design and work plan detailed in the inception report. The analysis plan is an essential evaluation tool that maps how the information collected will be organized, classified, inter-related, compared and displayed relative to the evaluation questions, including what will be done to integrate multiple sources, especially those that provide data in narrative form, and any statistical methods that will be used to integrate or present the data (calculations, sums, percentages, cost-analysis and so forth). Possible challenges and limitations of the data analysis should be described. The analysis plan should be written in conjunction with selecting data collection methods and instruments rather than afterward.
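An analysis plan of the kind described above can be kept as a simple structured mapping from each evaluation question to its data sources, collection methods and analysis methods, which also makes gaps easy to detect. The sketch below is illustrative only; all questions, sources and methods are hypothetical.

```python
# Illustrative fragment of an analysis plan. Each row maps one evaluation
# question to the data that will inform it and the analyses planned for it.
analysis_plan = [
    {
        "question": "To what extent did outputs reach women in rural areas?",
        "data_sources": ["monitoring indicators", "household survey"],
        "collection_methods": ["document review", "structured survey"],
        "analysis_methods": ["disaggregated percentages", "trend comparison"],
    },
    {
        "question": "What factors explain progress against the outcome?",
        "data_sources": ["key informant interviews", "project reports"],
        "collection_methods": ["semi-structured interviews", "document review"],
        "analysis_methods": ["thematic coding", "contribution analysis"],
    },
]

def gaps(plan):
    """Return the questions that lack a data source or an analysis method."""
    return [row["question"] for row in plan
            if not (row["data_sources"] and row["analysis_methods"])]

incomplete = gaps(analysis_plan)
```

Writing the plan in this form alongside the evaluation matrix makes it straightforward to confirm, before fieldwork begins, that every question has both a data source and a planned analysis.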
Interpreting the findings
Interpreting findings is the process of giving meaning to the evaluation findings derived from the analysis. It draws on the summation and synthesis of information derived from facts, statements, opinions and documents, and turns the findings into judgements about development results (conclusions). On the basis of those conclusions, recommendations for future actions will be made. Interpretation is the effort of figuring out what the findings mean: making sense of the evidence gathered in an evaluation and its practical applications toward development effectiveness.
A conclusion is a reasoned judgement based on a synthesis of empirical findings or factual statements corresponding to specific circumstances. Conclusions are not findings; they are interpretations that give meaning to the findings. Conclusions are considered valid and credible when they are directly linked to the evidence and can be justified on the basis of appropriate methods of analysis and synthesis to summarize findings. Conclusions should:
Recommendations are evidence-based proposals for action aimed at evaluation users. Recommendations should be based on conclusions. However, forming recommendations is a distinct element of evaluation that requires information beyond what is necessary to form conclusions. Developing recommendations involves weighing effective alternatives, policy, funding priorities and so forth within a broader context. It requires in-depth contextual knowledge, particularly about the organizational context within which policy and programmatic decisions will be made and the political, social and economic context in which the initiative will operate.
The lessons learned from an evaluation comprise the new knowledge gained from the particular circumstance (initiative, context, outcomes and even evaluation methods) that is applicable to and useful in other similar contexts. Frequently, lessons highlight strengths or weaknesses in preparation, design and implementation that affect performance, outcome and impact.
55. United Nations, ‘Universal Declaration of Human Rights’. Available at: http://www.un.org/en/documents/udhr/.