Mappr

Scale up evaluation report mapping against evaluation frameworks using agentic workflows

Warning

This notebook is a work in progress.

Manually mapping evaluation reports against IOM’s Strategic Results Framework (SRF) is time-consuming and resource-intensive with ~150 outputs to analyze. Additionally, the mapping process needs transparent and human-readable traces of LLM decision flows that both reflect natural reasoning patterns and allow human evaluators to audit the mapping logic.

A three-stage async pipeline leveraging Global Compact for Migration (GCM) UN General Assembly Resolution objectives as SRF Outputs pruning mechanism:

Stage 1: SRF Enablers & Cross-cutting Analysis

Async parallel analysis of Enablers (7 categories) and Cross-cutting Priorities (4 categories) using shared semaphore for rate limiting
Purpose: Identify if report is primarily meta-evaluation/transversal in nature
Fast processing: ~11 items total with concurrent execution, provides context for subsequent stages

Stage 2: Informed GCM Analysis

Rate-limited parallel processing of GCM Objectives (23 items) informed by Stage 1 results
Condensed representations: UN General Assembly Resolution formulation simplified for retrieval efficiency
Concurrent theme analysis with API quota management

Stage 3: Targeted SRF Analysis

SRF Filtering: Use GCM results + gcm_srf_lut lookup table to prune ~150 SRF outputs to ~20-50 relevant ones
Deep parallel analysis: Full hierarchy context (objective → outcome → output → indicators)
Async batch processing: Final targeted analysis of pruned SRF outputs with retry logic and error handling

Exported source

from pathlib import Path
from functools import reduce
from toolslm.md_hier import *
from rich import print
import json
from fastcore.all import *
from enum import Enum
import logging
import uuid
from datetime import datetime
from typing import List, Callable
import dspy
from asyncio import Semaphore, gather, sleep
import time
from collections import defaultdict
import copy

from evaluatr.frameworks import (EvalData, 
                                 IOMEvalData, 
                                 FrameworkInfo, 
                                 Framework,
                                 FrameworkCat,
                                 find_srf_output_by_id)

#from evaluatr.db_traces import TraceDB, Trace
from fastlite import database

Exported source

from dotenv import load_dotenv
import os

load_dotenv()
GEMINI_API_KEY = os.getenv('GEMINI_API_KEY')

Exported source

cfg = AttrDict({
    'lm': 'gemini/gemini-2.0-flash',
    'api_key': GEMINI_API_KEY,
    'max_tokens': 8192,
    'track_usage': False,
    'call_delay': 0.1, # in seconds
    'semaphore': 30,
    'dirs': AttrDict({
        'data': '.evaluatr',
        'trace': 'traces'
    }),
    'verbosity': 1,
    'cache': AttrDict({
        'is_active': False,
        'delay': 0.05 # threshold in seconds below which we consider the response is cached
    }),
    'max_iter': 5
})

Exported source

traces_dir = Path.home() / cfg.dirs.data / cfg.dirs.trace
traces_dir.mkdir(parents=True, exist_ok=True)

class Trace:
    id: int
    timestamp: str
    event: str
    report_id: str
    stage: str
    framework: str
    framework_category: str
    framework_theme_id: str
    data: dict

class TraceDB:
    def __init__(self, db_path=None):
        if db_path is None:
            # db_path = Path.home() / cfg.dirs.data / "traces.db"
            db_path = Path.home() / ".evaluatr/data/traces.db"
            
        self.db = database(db_path)
        self.traces = self.db.create(Trace, pk='id', transform=True)

db_traces = TraceDB()

The Kernel crashed while executing code in the current cell or a previous cell. 

Please review the code in the cell(s) to identify a possible cause of the failure. 

Click <a href='https://aka.ms/vscodeJupyterKernelCrash'>here</a> for more info. 

View Jupyter <a href='command:jupyter.viewOutput'>log</a> for further details.

Exported source

lm = dspy.LM(cfg.lm, api_key=cfg.api_key, cache=cfg.cache.is_active)
dspy.configure(lm=lm)

doc = Path("../_data/md_library/49d2fba781b6a7c0d94577479636ee6f/abridged_evaluation_report_final_olta_ndoja_pdf/enriched")
pages = doc.ls(file_exts=".md").sorted(key=lambda p: int(p.stem.split('_')[1]))
report = '\n\n---\n\n'.join(page.read_text() for page in pages)
print(report[:1000])

# PPMi .... page 1

**Final Evaluation of the EU-IOM Joint Initiative for migrant protection and reintegration in the horn of Africa**

Final Evaluation Report, 17 March 2023

!(img-0.jpeg)

**EU-IOM** Joint Initiative for Migrant Protection and Reintegration

Project funded by the European Union
Project implemented by IOM

---

This Final Evaluation Report was commissioned by IOM and developed by the evaluation team of PPMI Group, including:
Loes van der Graaf, Rimantas Dumcius, Radvilė Bankauskaitė, Anna Kiss-Pal and Laura Daukšaitė, as well as by 
external expert Anthony Roger Plant. The evaluation team is grateful to all IOM staff and stakeholders to the 
JI-HoA for their time taken to participate in interviews. The team is especially grateful to the returnees, 
migrants, and community members who participated in Focus Group Discussions.

This publication was funded by the European Union. Its contents are the sole responsibility of PPMI Group and do 
not necessarily reflect

Hierarchical report navigation

Thanks to toolslm.md_hier and a clean markdown structure of a report markdown, we can create a nested dictionary of section, subsection, … as follows:

hdgs = create_heading_dict(report); hdgs

{'PPMi .... page 1': {},
 'CONTENTS .... page 3': {},
 '1. Introduction .... page 4': {},
 '2. Background of the JI-HoA .... page 5': {'2.1. Context and design of the JI-HoA .... page 5': {},
  '2.2. External factors affecting the implementation of the JI .... page 7': {}},
 '3. Methodology .... page 8': {},
 '4. Findings .... page 10': {'4.1. Relevance .... page 10': {'4.1.1. Relevance of programme activities for migrants, returnees, and communities .... page 10': {}},
  'Overall performance score for relevance: $3.9 / 5$ <br> Robustness score for the evidence: $4.5 / 5$': {'4.1.1.1 Needs of migrants .... page 10': {},
   '4.1.1.2 Needs of returnees .... page 10': {},
   '4.1.1.3 Needs of community members .... page 12': {},
   "4.1.2. Programme's relevance to the needs of stakeholders .... page 12": {'4.1.2.1 Needs of governments .... page 12': {},
    '4.1.2.2 Needs of other stakeholders .... page 13': {}},
   '4.2. Coherence .... page 13': {"4.2.1. The JI-HoA's alignment with the objectives and standards of IOM, and objectives of the EU .... page 14": {},
    '4.2.2. Alignment with other initiatives .... page 14': {}}}},
 '4.3. Effectiveness .... page 16': {'4.3.1. Specific Objective 1: Partner countries and relevant stakeholders developed or strengthened evidence-based return and reintegration procedures .... page 16': {'4.3.1.1 Achievement of outputs and results .... page 16': {}},
  'Data availability .... page 16': {'4.3.1.2 Achievement of Specific Objective 1 .... page 17': {'4.3.1.2 Achievement of Specific Objective 1 .... page 17': {}}}},
 '4.3.2. Specific Objective 2: Safe, humane, dignified voluntary return processes are enhanced along main migration routes .... page 18': {'4.3.2.1 Achievement of outputs and results .... page 19': {},
  'Outreach and awareness .... page 19': {},
  'Assistance to stranded migrants .... page 19': {'4.3.2.2 Achievement of the Objective .... page 20': {},
   '4.3.3. Specific Objective 3: Returnees are sustainably integrated in host communities, and host communities are better able to create living standards that address drivers of migration. .... page 20': {'4.3.3.1 Achievement of outputs and results .... page 20': {}}},
  'Individual and community-based reintegration .... page 20': {'4.3.3.2 Achievement of Specific Objective 3 .... page 21': {}}},
 '4.3.3.2 Achievement of Specific Objective 3': {'Overall achievement of reintegration .... page 22': {'4.3.4. Functioning of the Integrated Approach .... page 23': {'4.3.4. Functioning of the Integrated Approach .... page 23': {}}}},
 '4.4. Efficiency .... page 24': {'4.4.3. Did the programme receive sufficient resources to achieve its objectives? .... page 24': {},
  '4.4.2. Cost-effectiveness and efficiency of the programme .... page 25': {}},
 '4.5. Sustainability .... page 26': {'Overall performance score for sustainability: $2.5 / 5$ <br> Robustness score for the evidence: $4 / 5$ .... page 26': {}},
 '5. Conclusions and Recommendations .... page 27': {'5.1. Conclusions .... page 27': {},
  '5.2. Recommendations .... page 28': {'5.2.1. Increase attention on building partnerships with service providers who can function without (significant) funding channelled by IOM. .... page 29': {},
   '5.2.2. Explore opportunities to extend the scope of support provided to returnees, with a focus on longer-term integration. .... page 30': {}}}}

	Type	Details
hdgs	dict	The nested dictionary structure
target_section	str	The section name to find
Returns	list	The nested key path for the given section name

	Type	Details
hdgs	dict	The nested dictionary structure
keys_list	list	The list of keys to navigate through
Returns	str	The content of the section

	Type	Details
theme	EvalData	The theme object
Returns	str	The formatted theme string

	Type	Details
theme	EvalData	The theme object
Returns	str	The formatted theme string

Hierarchical report navigation

find_section_path

get_content_tool

Formatters

format_enabler_theme

format_crosscutting_theme

format_gcm_theme

format_srf_output

Signatures

Overview

Exploration

Assessment

Phase

TraceContext

Synthesis

Reasoning & Acting (ReAct)

setup_logger

setup_trace_logging

ThemeAnalyzer

ThemeAnalyzer.aforward

ThemeAnalyzer.get_overview

ThemeAnalyzer.explore_iteratively

ThemeAnalyzer.make_exploration_decision

ThemeAnalyzer.should_stop_exploring

ThemeAnalyzer.process_section

ThemeAnalyzer.synthesize_findings

Single theme

Multiple themes in parallel

Pipeline Orchestrator

PipelineResults

PipelineResults.__call__

PipelineOrchestrator

PipelineOrchestrator.run_stage1

get_stage1_covered_context

PipelineOrchestrator.run_stage2

get_filtered_srf_output_ids

get_combined_context

PipelineOrchestrator.run_stage3

PipelineResults.call