from pydantic import BaseModel

class CoreSectionsOutput(BaseModel):
    "Identify the core sections of the report"
    section_paths: list[list[str]]  # Each path lists heading keys from the root down
    reasoning: str  # The LLM's explanation of its selection

This module provides tools for automatically identifying and extracting core sections from evaluation reports. When working with large reports (50-200+ pages), we need to focus on key sections, such as executive summaries, introductions, conclusions, and recommendations, to support tagging and mapping exercises against evaluation frameworks (e.g., SRF, GCM).
Focusing on these sections helps:
The approach uses an LLM to parse a report’s table of contents, identify which sections contain substantive thematic content, and extract just those sections for further processing.
For instance, given a markdown report:
img page_16.md page_23.md page_30.md page_38.md page_45.md
page_1.md page_17.md page_24.md page_31.md page_39.md page_46.md
page_10.md page_18.md page_25.md page_32.md page_4.md page_47.md
page_11.md page_19.md page_26.md page_33.md page_40.md page_5.md
page_12.md page_2.md page_27.md page_34.md page_41.md page_6.md
page_13.md page_20.md page_28.md page_35.md page_42.md page_7.md
page_14.md page_21.md page_29.md page_36.md page_43.md page_8.md
page_15.md page_22.md page_3.md page_37.md page_44.md page_9.md
sample_md = """# Report Title ... page 1
## Executive Summary ... page 1
This is a summary of key findings.
## 1. Introduction ... page 2
Background information here.
### 1.1 Objectives ... page 2
The objectives are...
## 2. Findings ... page 3
Detailed findings.
## 3. Conclusions ... page 5
Main conclusions.
## 4. Recommendations ... page 6
Key recommendations."""
{'Final External Evaluation ... page 1': {'Evaluation scope ... page 3': {},
'Evaluation criteria ... page 3': {},
'Evaluation questions ... page 3': {},
'Evaluation methodology ... page 5': {},
'Ethics, norms and standards for evaluation ... page 5': {},
'Hired Evaluator must abide with the following. ... page 6': {},
'Evaluation deliverables ... page 6': {},
'Specifications of roles ... page 6': {},
'Time schedule ... page 7': {},
'Qualifications of the Evaluator ... page 8': {},
'Submission of application ... page 8': {},
'REGIONAL INTERVIEWS ... page 14': {},
'INTRODUCTION ... page 14': {},
'Coherence ... page 14': {},
'Effectiveness ... page 14': {},
'Impact ... page 16': {},
'Sustainability ... page 16': {},
'Closure ... page 16': {},
'NATIONAL INTERVIEWS ... page 16': {},
'INTRODUCTION ... page 16': {},
'Coherence ... page 17': {},
'Effectiveness ... page 17': {},
'Efficiency ... page 17': {'3.2.2. External alignment with efforts and organizations outside IOM ... page 20': {},
'3.3. Effectiveness ... page 21': {},
'3.3.1. Specific Objective 1: National and regional authorities in the field of migration governance are aware of, and act in accordance with, international and regional frameworks for migration governance and human rights standards. ... page 21': {'3.3.2. Specific Objective 2: The quality of national and cross-border cooperation on trafficking and smuggling cases between law enforcement, judicial and other state and nonstate actors, in coordination with existing regional initiatives and in accordance with international obligations and standards, is increased ... page 26': {},
'3.3.3. Specific Objective 3: Protection services for Victims of Trafficking and of vulnerable migrants are improved at local, national, and regional levels ... page 28': {},
"3.3.4. Factors influencing the programme's effectiveness ... page 32": {'Security challenges ... page 32': {},
'Political context and changing government priorities ... page 32': {},
'Procurement challenges ... page 33': {}},
'3.3.5. Integration of cross-cutting themes ... page 35': {}},
'3.4. Efficiency ... page 35': {},
"3.4.1. Financial efficiency of IOM's BMM programme ... page 35": {'3.4.2. Efficiency of coordination and reporting ... page 37': {}},
'3.5. IMPACT ... page 39': {},
'3.6. Sustainability ... page 40': {}},
'4. CONCLUSIONS ... page 42': {},
'5. RECOMMENDATIONS ... page 44': {}}}
['# Final External Evaluation ... page 1',
'## Evaluation scope ... page 3',
'## Evaluation criteria ... page 3',
'## Evaluation questions ... page 3',
'## Evaluation methodology ... page 5',
'## Ethics, norms and standards for evaluation ... page 5',
'## Hired Evaluator must abide with the following. ... page 6',
'## Evaluation deliverables ... page 6',
'## Specifications of roles ... page 6',
'## Time schedule ... page 7',
'## Qualifications of the Evaluator ... page 8',
'## Submission of application ... page 8',
'## REGIONAL INTERVIEWS ... page 14',
'## INTRODUCTION ... page 14',
'## Coherence ... page 14',
'## Effectiveness ... page 14',
'## Impact ... page 16',
'## Sustainability ... page 16',
'## Closure ... page 16',
'## NATIONAL INTERVIEWS ... page 16',
'## INTRODUCTION ... page 16',
'## Coherence ... page 17',
'## Effectiveness ... page 17',
'## Efficiency ... page 17',
'#### 3.2.2. External alignment with efforts and organizations outside IOM ... page 20',
'### 3.3. Effectiveness ... page 21',
'### 3.3.1. Specific Objective 1: National and regional authorities in the field of migration governance are aware of, and act in accordance with, international and regional frameworks for migration governance and human rights standards. ... page 21',
'#### 3.3.2. Specific Objective 2: The quality of national and cross-border cooperation on trafficking and smuggling cases between law enforcement, judicial and other state and nonstate actors, in coordination with existing regional initiatives and in accordance with international obligations and standards, is increased ... page 26',
'#### 3.3.3. Specific Objective 3: Protection services for Victims of Trafficking and of vulnerable migrants are improved at local, national, and regional levels ... page 28',
"#### 3.3.4. Factors influencing the programme's effectiveness ... page 32",
'##### Security challenges ... page 32',
'##### Political context and changing government priorities ... page 32',
'##### Procurement challenges ... page 33',
'#### 3.3.5. Integration of cross-cutting themes ... page 35',
'### 3.4. Efficiency ... page 35',
"### 3.4.1. Financial efficiency of IOM's BMM programme ... page 35",
'#### 3.4.2. Efficiency of coordination and reporting ... page 37',
'### 3.5. IMPACT ... page 39',
'### 3.6. Sustainability ... page 40',
'## 4. CONCLUSIONS ... page 42',
'## 5. RECOMMENDATIONS ... page 44']
Reports have a hierarchical structure (sections, subsections, etc.). We represent this as a nested dictionary using create_heading_dict from toolslm.md_hier. To extract text from a specific section, we navigate this hierarchy using a path of keys, one key per heading level.
Navigate through nested heading levels and return the text content
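The navigation step can be sketched as follows. This is a minimal illustration rather than the module's implementation: the function name `get_text_by_path` and the toy node layout (each node a dict with `text` and `children` keys) are assumptions made for the example, and the structure returned by create_heading_dict may store section text differently.

```python
def get_text_by_path(tree: dict, path: list[str]) -> str:
    "Follow `path` one heading at a time and return the text of the last node."
    if not path:
        raise ValueError("path must contain at least one heading")
    children = tree
    node = None
    for key in path:
        if key not in children:
            raise KeyError(f"Heading not found: {key!r}")
        node = children[key]
        children = node["children"]
    return node["text"]

# Hypothetical toy tree (layout assumed for illustration only).
toy_tree = {
    "Report Title": {
        "text": "# Report Title",
        "children": {
            "1. Introduction": {
                "text": "## 1. Introduction\nBackground information here.",
                "children": {
                    "1.1 Objectives": {
                        "text": "### 1.1 Objectives\nThe objectives are...",
                        "children": {},
                    }
                },
            }
        },
    }
}

objectives = get_text_by_path(
    toy_tree, ["Report Title", "1. Introduction", "1.1 Objectives"]
)
```

Raising a KeyError on an unknown heading, instead of returning an empty string, makes a mistyped or hallucinated path visible to the caller.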
When the LLM identifies core sections, it might select both a parent section and its children (e.g., “Introduction” and “Introduction > Objectives”). To avoid duplicate content, we filter out any paths that are children of other selected paths.
Remove paths that are children of other paths in the list
[['Report Title ... page 1', '1. Introduction ... page 2'],
['Report Title ... page 1', '3. Conclusions ... page 5']]
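One way to implement this filter is a prefix check on the path lists. The sketch below is an assumption about how it could be written, not the module's code, and the name `remove_nested_paths` is hypothetical.

```python
def remove_nested_paths(paths: list[list[str]]) -> list[list[str]]:
    "Drop every path that has another path in the list as a proper prefix."
    def is_descendant(path: list[str], other: list[str]) -> bool:
        # `path` is a descendant when it is longer and starts with all of `other`
        return len(path) > len(other) and path[:len(other)] == other
    return [p for p in paths if not any(is_descendant(p, o) for o in paths)]

selected = [
    ["Report Title ... page 1", "1. Introduction ... page 2"],
    ["Report Title ... page 1", "1. Introduction ... page 2", "1.1 Objectives ... page 2"],
    ["Report Title ... page 1", "3. Conclusions ... page 5"],
]
filtered = remove_nested_paths(selected)
```

Comparing whole list elements, rather than joined strings, avoids false matches between headings that merely share a textual prefix. Note that if the root heading itself is selected, every other path is dropped.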
Rather than using rigid pattern matching, we use an LLM to intelligently identify core sections. This handles multilingual reports, varied naming conventions, and unusual structures. The LLM receives the table of contents as a nested dictionary and returns paths to the most relevant sections.
def identify_core_sections(
    hdgs:dict, # Nested dictionary of report headings from `create_heading_dict`
    sp:str=None, # System prompt for section identification
    response_format:type=CoreSectionsOutput, # Pydantic model for structured output
    model:str='claude-sonnet-4-5', # LLM model to use for identification
)->dict: # Dictionary with 'section_paths' and 'reasoning' keys
    "Use LLM to identify core sections (exec summary, intro, conclusions, recommendations) from ToC"
{'section_paths': [['Report Title ... page 1', 'Executive Summary ... page 1'],
['Report Title ... page 1',
'1. Introduction ... page 2',
'1.1 Objectives ... page 2'],
['Report Title ... page 1', '3. Conclusions ... page 5'],
['Report Title ... page 1', '4. Recommendations ... page 6']],
'reasoning': "Selected four core sections totaling approximately 6 pages that capture the report's essential themes: Executive Summary (page 1) provides the overview and key findings; Introduction/Objectives (pages 2-3) establishes the evaluation purpose and questions; Conclusions (pages 5-6) synthesizes findings; and Recommendations (page 6+) presents actionable insights. These sections represent where authors explicitly articulate what is important and core to the evaluation, avoiding the Findings section which likely contains supporting detail rather than thematic synthesis."}
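Because the model returns paths as lists of heading strings, it can be useful to check that each returned path actually exists in the table of contents before extracting text. The sketch below shows one such check; it is an optional safeguard, not something the source describes. `stub_llm` is a hypothetical stand-in that returns a canned JSON reply, and `path_exists` is an illustrative helper name.

```python
import json

def path_exists(hdgs: dict, path: list[str]) -> bool:
    "Return True if `path` can be followed key by key through the nested dict."
    node = hdgs
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return True

def stub_llm(prompt: str) -> str:
    "Hypothetical stand-in for the model call; returns a canned JSON reply."
    return json.dumps({
        "section_paths": [
            ["Report Title ... page 1", "Executive Summary ... page 1"],
            ["Report Title ... page 1", "9. Annex ... page 99"],  # not in the ToC
        ],
        "reasoning": "canned reply for illustration",
    })

hdgs = {"Report Title ... page 1": {
    "Executive Summary ... page 1": {},
    "1. Introduction ... page 2": {"1.1 Objectives ... page 2": {}},
}}

# The ToC is serialized into the prompt; the reply is parsed back into a dict.
prompt = "Identify the core sections in this table of contents:\n" + json.dumps(hdgs, indent=2)
reply = json.loads(stub_llm(prompt))
valid_paths = [p for p in reply["section_paths"] if path_exists(hdgs, p)]
```

Here the second path is discarded because its last heading does not appear in the ToC.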
The main entry point combines all the pieces: parse the report structure, identify core sections, remove nested duplicates, and extract the text.
def extract_sections(
    md:str, # Markdown text of full report
    sp:str=None, # System prompt for section identification
    response_format:type=CoreSectionsOutput, # Pydantic model for structured output
    model:str='claude-sonnet-4-5', # LLM model to use for identification
)->str: # Concatenated text of all core sections
    "Extract and concatenate core sections (exec summary, intro, conclusions, recommendations) from report markdown"
The LLM sometimes gets confused when given the nested dict of headings. Flattening it into a list of paths might make the approach more robust.
def flatten_paths(hdgs:dict, # Nested dictionary of headings
                  prefix:list=None # Path prefix for recursion
                 ) -> list[list[str]]: # List of all paths through the heading tree
    "Flatten nested heading dict into list of paths"
    prefix = prefix or [] # Start from an empty path on the top-level call
    paths = []
    for k, v in hdgs.items():
        current_path = prefix + [k]
        paths.append(current_path)
        if v: # If there are children
            paths.extend(flatten_paths(v, current_path))
    return paths
{'Report Title ... page 1': {'Executive Summary ... page 1': {},
'1. Introduction ... page 2': {'1.1 Objectives ... page 2': {}},
'2. Findings ... page 3': {},
'3. Conclusions ... page 5': {},
'4. Recommendations ... page 6': {}}}
(0, ['Report Title ... page 1'])
(1, ['Report Title ... page 1', 'Executive Summary ... page 1'])
(2, ['Report Title ... page 1', '1. Introduction ... page 2'])
(3, ['Report Title ... page 1', '1. Introduction ... page 2', '1.1 Objectives ... page 2'])
(4, ['Report Title ... page 1', '2. Findings ... page 3'])
(5, ['Report Title ... page 1', '3. Conclusions ... page 5'])
(6, ['Report Title ... page 1', '4. Recommendations ... page 6'])