AI-Powered Evaluation Report Framework Mapping and Synthesis
What is Evaluatr?
Evaluatr is an AI-powered system that automates mapping evaluation reports against structured frameworks while maintaining interpretability and human oversight. Initially developed for IOM (International Organization for Migration) evaluation reports and the Strategic Results Framework (SRF), it transforms a traditionally manual, time-intensive process into an efficient, transparent workflow.
The system maps evaluation reports against hierarchical frameworks like the SRF (objectives, enablers, cross-cutting priorities, outcomes, outputs, indicators) and connects to broader frameworks like the Sustainable Development Goals (SDGs) for interoperability.
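To make the hierarchical structure concrete, here is a minimal sketch of how such a framework could be represented as a recursive data type. This is illustrative only (the names, codes, and labels are assumptions, not Evaluatr's actual internal representation):

```python
from dataclasses import dataclass, field

# Illustrative sketch: a framework node carries a code and label plus nested
# children, so a hierarchy like objective -> outcome -> output -> indicator
# is one recursive type. Codes and labels below are made up.
@dataclass
class FrameworkNode:
    code: str
    label: str
    children: list["FrameworkNode"] = field(default_factory=list)

    def walk(self):
        """Yield this node and all descendants, depth-first."""
        yield self
        for child in self.children:
            yield from child.walk()

srf = FrameworkNode("1", "Objective 1", [
    FrameworkNode("1.1", "Outcome 1.1", [
        FrameworkNode("1.1.1", "Output 1.1.1"),
    ]),
])

print([n.code for n in srf.walk()])  # ['1', '1.1', '1.1.1']
```

Mapping a report then amounts to attaching report IDs to nodes anywhere in this tree, which is what enables aggregation upward (e.g. to SDG level) for interoperability.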
Beyond automation, Evaluatr prioritizes interpretability and human-AI collaboration: evaluators can understand the mapping process, audit AI decisions, perform error analysis, and build training datasets over time, ensuring the system aligns with organizational needs through an actionable, transparent, and auditable methodology.
The Challenge We Solve
IOM evaluators possess deep expertise in mapping evaluation reports against frameworks like the Strategic Results Framework (SRF), but face significant operational challenges when processing reports that often exceed 150 pages of diverse content across multiple projects and contexts.
The core challenges are:
Time-intensive process: Hundreds of staff-hours required per comprehensive mapping exercise
Individual consistency: Even expert evaluators may categorize the same content differently across sessions
Cross-evaluator consistency: Different evaluators may interpret and map identical content to different framework outputs
Scale vs. thoroughness: Growing volume of evaluation reports creates pressure to choose between speed and comprehensive analysis
Understanding Evaluation Mapping in the UN Context
UN evaluation work encompasses several interconnected domains:
Quality Check: Assessing evidence quality and methodological rigor in evaluation reports
Mapping/Tagging: Identifying which standardized framework themes are central to each report
Impact Evaluation: Measuring program effectiveness using RCTs, quasi-experimental designs, etc.
Synthesis: Aggregating findings across reports on specific themes/regions to generate insights
Mapping/tagging is a foundational step that identifies which themes from established evaluation frameworks (like IOM’s Strategic Results Framework or the UN Global Compact for Migration) are central to each report. These frameworks provide agreed-upon nomenclature covering all relevant themes, ensuring common terminology across stakeholders and enabling interoperability for UN-wide aggregation and communication.
Rather than extracting evidence for specific themes, mapping creates a curated index enabling evaluators to retrieve the most relevant reports for subsequent synthesis work, maximizing both recall (finding all relevant reports) and precision (avoiding irrelevant ones).
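The "curated index" idea can be sketched as a simple inverted index from framework theme to report IDs. The tag codes and report IDs below are hypothetical, purely for illustration:

```python
from collections import defaultdict

# Hypothetical tagging output: report ID -> set of framework theme codes.
tagged_reports = {
    "rpt-001": {"srf:1.1", "gcm:obj-7"},
    "rpt-002": {"srf:1.1", "srf:2.3"},
    "rpt-003": {"gcm:obj-7"},
}

# Invert it: theme code -> set of report IDs, for fast retrieval.
index = defaultdict(set)
for report_id, themes in tagged_reports.items():
    for theme in themes:
        index[theme].add(report_id)

# Retrieve every report central to a given theme for synthesis work:
print(sorted(index["srf:1.1"]))  # ['rpt-001', 'rpt-002']
```

Because each report is tagged against the full agreed nomenclature, a synthesis exercise on any theme starts from this index rather than from full-text search.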
Note
Throughout this documentation, we use “mapping” and “tagging” interchangeably.
Key Features
1. Document Preparation Pipeline ✅ Available
Repository Processing: Read and preprocess IOM evaluation report repositories with standardized outputs
Automated Downloads: Batch download of evaluation documents from diverse sources
OCR Processing: Convert scanned PDFs to searchable text using Optical Character Recognition (OCR) technology
Content Enrichment: Fix OCR-corrupted headings and enrich documents with AI-generated image descriptions for high-quality input data
2. AI-Assisted Framework Mapping ✅ Available
Multi-Stage Pipeline: Three-stage mapping process that progressively narrows from broad themes (SRF Enablers, Cross-cutting Priorities, GCM objectives) to specific SRF outputs. Each stage enriches context for the next; for example, knowing a report is cross-cutting in nature helps accurately map specific SRF outputs
Cost Optimization: Leverages LLM prompt caching to minimize token usage and API costs during repeated analysis
Installation

```sh
# Clone the repository
git clone https://github.com/franckalbinet/evaluatr.git
cd evaluatr

# Install in development mode
pip install -e .

# Make changes in nbs/ directory, then compile:
nbdev_prepare
```
Note
This project uses nbdev for literate programming - see the Development section for more details.
Environment Configuration
Create a .env file in your project root with your API keys:
Note: Evaluatr uses lisette, LiteLLM and DSPy for LLM interactions, giving you the flexibility to use any compatible language model provider.
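A minimal `.env` might look like the following. The variable names are assumptions based on common provider conventions supported by LiteLLM; use whichever keys your chosen provider expects:

```sh
# .env — API keys for your chosen LLM provider(s).
# Variable names are illustrative and depend on the provider you configure.
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
```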
Quick Start
Reading & standardizing evaluations repository
For IOM evaluators working with the official evaluation repository, download the most recent evaluations from evaluation.iom.int/evaluation-search-pdf as a .csv file, then preprocess/standardize it:
```python
from evaluatr.readers import IOMRepoReader

fname = 'files/test/evaluation-search-export-11_13_2025--18_09_44.csv'
reader = IOMRepoReader(fname)
evals = reader()
evals[0]
```
EVALUATION OF IOM’S MIGRATION DATA STRATEGY
Year: 2025 | Organization: IOM | Countries: Worldwide
Documents: 2 available ID:9992310969aa2f428bc8aba29f865cf3
To find a particular evaluation by title or URL:
```python
from evaluatr.readers import find_eval

title = 'Evaluation of IOM Accountability to Affected Populations'
find_eval(evals, title, by='title')
```
Evaluation of IOM Accountability to Affected Populations
Year: 2025 | Organization: IOM | Countries: Worldwide
Documents: 4 available ID:6c3c2cf3fa479112967612b0baddab72
--stages: Comma-separated stages to run (default: 1,2,3)
Stage 1: SRF Enablers & Cross-cutting Priorities
Stage 2: GCM Objectives
Stage 3: SRF Outputs
--force-refresh: Force refresh specific stages (comma-separated: sections,stage1,stage2,stage3)
Examples:
```sh
# Run all stages
evl_tag example-report

# Run specific stages only
evl_tag example-report --stages 1,2

# Force refresh certain stages
evl_tag example-report --force-refresh stage1,stage3

# Combined options
evl_tag example-report --stages 2,3 --force-refresh sections
```
Output: Results stored in ~/.evaluatr/traces/ with complete audit trails of AI decisions.