OCR evaluation reports

For testing purposes, let’s load evaluations:

fname_json = '../_data/output/evaluations.json'
evals = load_evals(fname_json)
print(evals[0])
{
    'id': '1a57974ab89d7280988aa6b706147ce1',
    'docs': [
        {
            'Document Subtype': 'Evaluation report',
            'File URL': 'https://evaluation.iom.int/sites/g/files/tmzbdl151/files/docs/resources/Internal%20Evaluation_NG20P0516_MAY_2023_FINAL_Abderrahim%20EL%20MOULAT.pdf',
            'File description': 'Evaluation Report'
        },
        {
            'Document Subtype': 'Evaluation brief',
            'File URL': 'https://evaluation.iom.int/sites/g/files/tmzbdl151/files/docs/resources/RR0163_Evaluation%20Brief_MAY_%202023_Abderrahim%20EL%20MOULAT.pdf',
            'File description': 'Evaluation Brief'
        }
    ],
    'meta': {
        'Title': 'EX-POST EVALUATION OF THE PROJECT:  NIGERIA: STRENGTHENING REINTEGRATION FOR RETURNEES (SRARP)  - PHASE II',
        'Year': 2023,
        'Author': 'Abderrahim El Moulat',
        'Best Practicesor Lessons Learnt': 'Yes',
        'Date of Publication': '2023-05-10',
        'Donor': 'Government of Germany',
        'Evaluation Brief': 'Yes',
        'Evaluation Commissioner': 'Donor, IOM',
        'Evaluation Coverage': 'Country',
        'Evaluation Period From Date': 'nan',
        'Evaluation Period To Date': 'NaT',
        'Executive Summary': 'Yes',
        'External Version of the Report': 'No',
        'Languages': 'English',
        'Migration Thematic Areas': 'Assistance to vulnerable migrants, Migrant training and integration (including community cohesion), Migration health (assessment, travel, health promotion, crisis-affected), Return and AVRR',
        'Name of Project(s) Being Evaluated': nan,
        'Number of Pages Excluding annexes': nan,
        'Other Documents Included': nan,
        'Project Code': 'RR.0163',
        'Countries Covered': ['Nigeria'],
        'Regions Covered': 'RO Dakar',
        'Relevant Crosscutting Themes': 'Gender, Rights-based approach',
        'Report Published': 'Yes',
        'Terms of Reference': 'No',
        'Type of Evaluation Scope': 'Programme/Project',
        'Type of Evaluation Timing': 'Ex-post (after the end of the project/programme)',
        'Type of Evaluator': 'Internal',
        'Level of Evaluation': 'Decentralized',
        'Document Subtype': 'Evaluation report, Evaluation brief',
        'File URL': 'https://evaluation.iom.int/sites/g/files/tmzbdl151/files/docs/resources/Internal%20Evaluation_NG20P0516_MAY_2023_FINAL_Abderrahim%20EL%20MOULAT.pdf,   https://evaluation.iom.int/sites/g/files/tmzbdl151/files/docs/resources/RR0163_Evaluation%20Brief_MAY_%202023_Abderrahim%20EL%20MOULAT.pdf',
        'File description': 'Evaluation Report , Evaluation Brief',
        'Management response': 'No',
        'Date added': 'Fri, 07/07/2023 - 15:35',
        'Metaevaluation': '2020-24',
        'exclude': nan,
        'reason': nan
    }
}

To find a particular evaluation by name (e.g. “Final Evaluation of the EU-IOM Joint …”), we can simply:

title = 'Final Evaluation of the EU-IOM Joint Initiative for migrant protection and reintegration in the Horn of Africa'
results = [o for o in evals.filter(lambda x: title.lower() in x['meta']['Title'].lower())]; results
[{'id': '49d2fba781b6a7c0d94577479636ee6f',
  'docs': [{'Document Subtype': 'Evaluation report',
    'File URL': 'https://evaluation.iom.int/sites/g/files/tmzbdl151/files/docs/resources/Abridged%20Evaluation%20Report_%20Final_Olta%20NDOJA.pdf',
    'File description': 'Evaluation Report'},
   {'Document Subtype': 'Evaluation brief',
    'File URL': 'https://evaluation.iom.int/sites/g/files/tmzbdl151/files/docs/resources/Evaluation%20Learning%20Brief_Final_Olta%20NDOJA.pdf',
    'File description': 'Evaluation Brief'},
   {'Document Subtype': 'Annexes',
    'File URL': 'https://evaluation.iom.int/sites/g/files/tmzbdl151/files/docs/resources/Final%20Evaluation%20Report%20Final_Olta%20NDOJA.pdf',
    'File description': 'Abridged Report'},
   {'Document Subtype': 'Management response',
    'File URL': 'https://evaluation.iom.int/sites/g/files/tmzbdl151/files/docs/resources/HoA%20EU%20JI%20Final%20Eval%20-%20Management%20Response%20Matrix%20-%20Final.pdf',
    'File description': 'Management Response'}],
  'meta': {'Title': 'Final Evaluation of the EU-IOM Joint Initiative for migrant protection and reintegration in the horn of Africa',
   'Year': 2023,
   'Author': 'PPMI Group',
   'Best Practicesor Lessons Learnt': 'Yes',
   'Date of Publication': '2023-03-17',
   'Donor': 'European Union',
   'Evaluation Brief': 'Yes',
   'Evaluation Commissioner': 'IOM',
   'Evaluation Coverage': 'Regional',
   'Evaluation Period From Date': 'nan',
   'Evaluation Period To Date': 'NaT',
   'Executive Summary': 'Yes',
   'External Version of the Report': 'No',
   'Languages': 'English',
   'Migration Thematic Areas': 'Assistance to vulnerable migrants, Return and AVRR',
   'Name of Project(s) Being Evaluated': nan,
   'Number of Pages Excluding annexes': nan,
   'Other Documents Included': nan,
   'Project Code': 'RT.1354',
   'Countries Covered': ['Djibouti', 'Ethiopia', 'Somalia', 'South Sudan'],
   'Regions Covered': 'RO Nairobi',
   'Relevant Crosscutting Themes': 'Accountability to affected populations, Environment, Gender, Mainstreaming protection into crisis response, Principled humanitarian action, Rights-based approach',
   'Report Published': 'Yes',
   'Terms of Reference': 'No',
   'Type of Evaluation Scope': 'Programme/Project',
   'Type of Evaluation Timing': 'Final (at the end of the project/programme)',
   'Type of Evaluator': 'External',
   'Level of Evaluation': 'Decentralized',
   'Document Subtype': 'Evaluation report, Evaluation brief, Annexes, Management response',
   'File URL': 'https://evaluation.iom.int/sites/g/files/tmzbdl151/files/docs/resources/Abridged%20Evaluation%20Report_%20Final_Olta%20NDOJA.pdf,   https://evaluation.iom.int/sites/g/files/tmzbdl151/files/docs/resources/Evaluation%20Learning%20Brief_Final_Olta%20NDOJA.pdf,   https://evaluation.iom.int/sites/g/files/tmzbdl151/files/docs/resources/Final%20Evaluation%20Report%20Final_Olta%20NDOJA.pdf,   https://evaluation.iom.int/sites/g/files/tmzbdl151/files/docs/resources/HoA%20EU%20JI%20Final%20Eval%20-%20Management%20Response%20Matrix%20-%20Final.pdf',
   'File description': 'Evaluation Report , Evaluation Brief, Abridged Report, Management Response',
   'Management response': 'No',
   'Date added': 'Fri, 05/26/2023 - 12:38',
   'Metaevaluation': '2020-24',
   'exclude': nan,
   'reason': nan}}]

Utils

Given an evaluation id and the PDF file name of one of its supporting documents, we’d like to check its subtype:


source

get_doc_subtype

 get_doc_subtype (id:str, fname:str, evals)

Get Document Subtype for a given file in the evaluation dataset

Type Details
id str ID of the evaluation
fname str Name of the file
evals Evaluations data
Returns str Document Subtype
Exported source
def get_doc_subtype(
    id:str, # ID of the evaluation
    fname:str, # Name of the file
    evals # Evaluations data
    )->str: # Document Subtype
    "Get Document Subtype for a given file in the evaluation dataset"
    eval_data = L(evals).filter(lambda x: x['id']==id)
    if not eval_data: return None
    
    docs = L(eval_data[0]['docs'])
    matches = docs.filter(lambda x: Path(x['File URL']).name==fname)
    return matches[0]['Document Subtype'] if matches else None

Now, based on the PDFs downloaded to path_pdf, we can get the subtype of each document:

path_pdf = Path('../_data/pdf_library')
id = '49d2fba781b6a7c0d94577479636ee6f'
for o in path_pdf.ls().filter(lambda x: x.name == id)[0].ls():
    print(f'Name: {o.name}\nSubtype: {get_doc_subtype(id, o.name, evals)}')
Name: Evaluation%20Learning%20Brief_Final_Olta%20NDOJA.pdf
Subtype: Evaluation brief
Name: HoA%20EU%20JI%20Final%20Eval%20-%20Management%20Response%20Matrix%20-%20Final.pdf
Subtype: Management response
Name: Abridged%20Evaluation%20Report_%20Final_Olta%20NDOJA.pdf
Subtype: Evaluation report
Name: Final%20Evaluation%20Report%20Final_Olta%20NDOJA.pdf
Subtype: Annexes

source

clean_pdf_name

 clean_pdf_name (pdf_name:str)

Clean PDF name to create folder-friendly string. Removes special characters and spaces, replaces with underscores.

Exported source
# Note: filenames will be cleaned upstream in a future version
def clean_pdf_name(pdf_name: str) -> str:
    """
    Clean PDF name to create folder-friendly string.
    Removes special characters and spaces, replaces with underscores.
    """
    # Decode URL-encoded characters (e.g. %20 -> space)
    pdf_name = urllib.parse.unquote(pdf_name)
    
    # Replace spaces and special characters with underscores
    # Replace any character that is not a word character (\w), whitespace (\s), or hyphen (-) with underscore
    cleaned = re.sub(r'[^\w\s-]', '_', pdf_name)
    
    # Replace any sequence of hyphens or whitespace with a single underscore
    cleaned = re.sub(r'[-\s]+', '_', cleaned)
    
    # Replace multiple consecutive underscores with a single underscore
    cleaned = re.sub(r'_+', '_', cleaned)
    cleaned = cleaned.strip('_')  # Remove leading/trailing underscores
    
    return cleaned.lower()
clean_pdf_name("Final%20Evaluation%20Report%20Final_Olta%20NDOJA.pdf")
'final_evaluation_report_final_olta_ndoja_pdf'

Batch processing


source

setup_output_dirs

 setup_output_dirs (md_library_path='../_data/md_library')

Set up the output directory structure for markdown files

Exported source
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('batch_ocr.log'),
        logging.StreamHandler()  # Also print to console
    ]
)
Exported source
def setup_output_dirs(md_library_path="../_data/md_library"):
    "Set up the output directory structure for markdown files"
    md_output_dir = Path(md_library_path)
    mkdir(md_output_dir, parents=True, exist_ok=True, overwrite=False)
    return md_output_dir
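To sanity-check the directory setup without fastcore, a pathlib-only stand-in behaves the same way (`setup_output_dirs_sketch` is a hypothetical name; with `parents=True, exist_ok=True` the behaviour matches the `mkdir` call above):

```python
import tempfile
from pathlib import Path

def setup_output_dirs_sketch(md_library_path="../_data/md_library"):
    "pathlib-only equivalent: create the directory tree if missing, keep it if present"
    md_output_dir = Path(md_library_path)
    md_output_dir.mkdir(parents=True, exist_ok=True)
    return md_output_dir

with tempfile.TemporaryDirectory() as tmp:
    out = setup_output_dirs_sketch(Path(tmp) / "md_library")
    print(out.is_dir())  # True
```

Calling it twice is a no-op, so it is safe to run at the start of every batch.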

source

get_pdfs_and_dir

 get_pdfs_and_dir (report_path:pathlib.Path, md_output_dir:pathlib.Path)

Get PDFs from report directory and create output directory

Type Details
report_path Path Path to the report directory
md_output_dir Path Path to the output directory
Returns tuple
Exported source
def get_pdfs_and_dir(
    report_path:Path, # Path to the report directory
    md_output_dir:Path # Path to the output directory
    ) -> tuple[list[Path], str]:
    "Get PDFs from report directory and create output directory"
    pdfs = report_path.ls(file_exts='.pdf')
    eval_report_path = report_path.name
    mkdir(md_output_dir / eval_report_path, parents=True, exist_ok=True, overwrite=False)
    return pdfs, eval_report_path

Example usage for the “Final Evaluation of the EU-IOM Joint Initiative …” evaluation:

report_id_test = '49d2fba781b6a7c0d94577479636ee6f'
md_output_dir = setup_output_dirs()
reports = [p for p in path_pdf.ls() if p.name == report_id_test]
pdfs, eval_report_path = get_pdfs_and_dir(reports[0], md_output_dir)
print(f'Reports: {reports}')
print(f'pdfs[0]: {pdfs[0]}\nEvaluation report path: {eval_report_path}')
Reports: [Path('../_data/pdf_library/49d2fba781b6a7c0d94577479636ee6f')]
pdfs[0]: ../_data/pdf_library/49d2fba781b6a7c0d94577479636ee6f/Evaluation%20Learning%20Brief_Final_Olta%20NDOJA.pdf
Evaluation report path: 49d2fba781b6a7c0d94577479636ee6f

source

save_page_images

 save_page_images (page, dest_folder:pathlib.Path)

Save all images from a page to destination folder as PNG

Exported source
def save_page_images(page, dest_folder: Path):
    "Save all images from a page to destination folder as PNG"
    images = page.images if hasattr(page, 'images') else page.get('images', [])
    
    for img in images:
        # Results may arrive as response objects or plain dicts. Note that
        # getattr evaluates its default eagerly, so `getattr(img, 'id', img.get('id'))`
        # would raise on objects; guard with hasattr instead.
        img_data = img.image_base64 if hasattr(img, 'image_base64') else img.get('image_base64')
        img_id = img.id if hasattr(img, 'id') else img.get('id')
        
        if img_data and img_id:
            img_bytes = base64.b64decode(img_data.split(',')[1])
            pil_img = Image.open(BytesIO(img_bytes))
            output_path = dest_folder / img_id
            pil_img.save(output_path)
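To exercise the dict-shaped path without a real OCR response, a stripped-down variant that writes the decoded bytes directly (skipping the PIL round-trip) can run against a fabricated page; the payload and the `img-0.png` id below are made up for the demo:

```python
import base64, tempfile
from pathlib import Path

def save_page_images_sketch(page, dest_folder: Path):
    "Same traversal as save_page_images, but writes raw decoded bytes"
    for img in page.get('images', []):
        img_data, img_id = img.get('image_base64'), img.get('id')
        if img_data and img_id:
            # Strip the 'data:image/png;base64,' prefix, then decode
            (dest_folder / img_id).write_bytes(base64.b64decode(img_data.split(',')[1]))

page = {'images': [{'id': 'img-0.png',
                    'image_base64': 'data:image/png;base64,'
                                    + base64.b64encode(b'fake-png-bytes').decode()}]}

with tempfile.TemporaryDirectory() as tmp:
    save_page_images_sketch(page, Path(tmp))
    print((Path(tmp) / 'img-0.png').read_bytes())  # b'fake-png-bytes'
```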

source

process_batch_results

 process_batch_results (results, md_output_dir)

Process batch OCR results and save to appropriate folders

Exported source
def process_batch_results(results, md_output_dir):
    "Process batch OCR results and save to appropriate folders"
    for result in results:
        try:
            # Parse custom_id to get eval_id and pdf_name
            eval_id, pdf_name = result['custom_id'].split('_', 1)
            
            # Get OCR response
            ocr_response = result['response']['body']
            
            # Create folder structure
            pdf_clean_name = clean_pdf_name(pdf_name)
            pdf_dir = md_output_dir / eval_id / pdf_clean_name
            pdf_dir.mkdir(parents=True, exist_ok=True)
            
            # Save each page markdown
            for page in ocr_response['pages']:
                page_num = page['index'] + 1
                page_path = pdf_dir / f"page_{page_num}.md"
                page_path.write_text(page['markdown'])
            
            # Save images if they exist
            img_dir = pdf_dir / 'img'
            for page in ocr_response['pages']:
                if page.get('images'):
                    img_dir.mkdir(parents=True, exist_ok=True)
                    save_page_images(page, img_dir)
            
            logging.info(f"Saved {len(ocr_response['pages'])} pages for {pdf_clean_name}")
            
        except Exception as e:
            logging.error(f"Error processing result {result.get('custom_id', 'unknown')}: {e}")
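The step most worth checking here is the `custom_id` round-trip: because evaluation ids are hex strings containing no underscores, `split('_', 1)` recovers the id and the full original PDF name even when that name itself contains underscores:

```python
# custom_id is built as f"{eval_report_path}_{pdf_path.name}"
custom_id = '49d2fba781b6a7c0d94577479636ee6f' + '_' + 'Final%20Evaluation%20Report%20Final_Olta%20NDOJA.pdf'

# maxsplit=1 splits only on the first underscore (the separator we added)
eval_id, pdf_name = custom_id.split('_', 1)
print(eval_id)   # 49d2fba781b6a7c0d94577479636ee6f
print(pdf_name)  # Final%20Evaluation%20Report%20Final_Olta%20NDOJA.pdf
```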

source

create_batch_ocr_job

 create_batch_ocr_job (pdf_paths:List[pathlib.Path], eval_report_path:str,
                       model:str='mistral-ocr-latest',
                       include_images:bool=True, api_key:str=None)

Create batch entries for PDFs from one evaluation report

Exported source
def create_batch_ocr_job(
    pdf_paths: List[Path],
    eval_report_path: str,
    model: str = "mistral-ocr-latest",
    include_images: bool = True,
    api_key: str = mistral_api_key
):
    "Create batch entries for PDFs from one evaluation report"
    cli = Mistral(api_key=api_key)
    
    batch_entries = []
    for pdf_path in pdf_paths:
        uploaded_pdf = cli.files.upload(
            file={
                "file_name": pdf_path.stem,
                "content": pdf_path.read_bytes(),
            },
            purpose="ocr"
        )
        
        signed_url = cli.files.get_signed_url(file_id=uploaded_pdf.id)
        entry = {
            "custom_id": f"{eval_report_path}_{pdf_path.name}",
            "body": {
                "document": {
                    "type": "document_url",
                    "document_url": signed_url.url,
                },
                "include_image_base64": include_images
            }
        }
        batch_entries.append(entry)
        logging.info(f"Added {pdf_path.name} to batch for {eval_report_path}")
        
    return batch_entries, cli

source

submit_and_monitor_batch_job

 submit_and_monitor_batch_job (batch_entries, eval_report_path, cli)

Submit batch job and monitor until completion

Exported source
def submit_and_monitor_batch_job(batch_entries, eval_report_path, cli):
    "Submit batch job and monitor until completion"
    with tempfile.NamedTemporaryFile(mode='w', suffix='.jsonl', delete=True) as temp_file:
        # Write batch entries to temp file
        for entry in batch_entries:
            temp_file.write(json.dumps(entry) + '\n')
        temp_file.flush()
        
        # Upload and create job
        batch_data = cli.files.upload(
            file={"file_name": f"batch_{eval_report_path}.jsonl", 
                  "content": open(temp_file.name, "rb")},
            purpose="batch"
        )
        
        created_job = cli.batch.jobs.create(
            input_files=[batch_data.id],
            model="mistral-ocr-latest",
            endpoint="/v1/ocr"
        )
        
        logging.info(f"Batch job created for {eval_report_path}: {created_job.id}")
        
        # Monitor completion
        while True:
            job = cli.batch.jobs.get(job_id=created_job.id)
            logging.info(f"Job status: {job.status} - {job.succeeded_requests}/{job.total_requests} completed")
            
            if job.status not in ["QUEUED", "RUNNING"]:
                break
            time.sleep(10)
        
        return job
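The loop above polls at a fixed 10-second interval. For long-running jobs, a capped exponential backoff keeps early checks responsive while easing API pressure later; a sketch of such a delay schedule (`backoff_delays` is a hypothetical helper, not part of the Mistral SDK):

```python
import itertools

def backoff_delays(base=2.0, cap=60.0):
    "Yield 2, 4, 8, ... seconds between polls, capped at `cap`"
    for n in itertools.count():
        yield min(base * 2 ** n, cap)

print(list(itertools.islice(backoff_delays(), 7)))  # [2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

Inside the monitoring loop, `time.sleep(10)` would become `time.sleep(next(delays))` with `delays = backoff_delays()` created before the loop.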

source

download_and_parse_results

 download_and_parse_results (job, cli)

Download and parse batch job results

Exported source
def download_and_parse_results(job, cli):
    "Download and parse batch job results"
    response = cli.files.download(file_id=job.output_file)
    content = response.read().decode('utf-8')
    
    results = []
    for line in content.strip().split('\n'):
        if line:
            results.append(json.loads(line))
    
    logging.info(f"Downloaded and parsed {len(results)} OCR results")
    return results
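Since the batch output file is JSON Lines, the parsing step reduces to one `json.loads` per non-empty line. A self-contained check on a fabricated two-record payload:

```python
import json

# Two records separated by a blank line, mimicking a JSONL download
content = '\n'.join([
    json.dumps({'custom_id': 'a_report.pdf'}),
    '',  # blank lines are skipped by the `if line` guard
    json.dumps({'custom_id': 'a_brief.pdf'}),
])

results = [json.loads(line) for line in content.strip().split('\n') if line]
print(len(results), results[0]['custom_id'])  # 2 a_report.pdf
```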

source

process_single_evaluation_batch

 process_single_evaluation_batch (report:pathlib.Path,
                                  md_output_dir:pathlib.Path)

Process one evaluation report using batch OCR

Exported source
def process_single_evaluation_batch(report: Path, md_output_dir: Path):
    "Process one evaluation report using batch OCR"
    logging.info(f"Processing evaluation: {report.name}")
    pdfs, eval_report_path = get_pdfs_and_dir(report, md_output_dir)
    
    if not pdfs:
        logging.warning(f"No PDFs found in {eval_report_path}")
        return
    
    batch_entries, cli = create_batch_ocr_job(pdfs, eval_report_path)
    
    job = submit_and_monitor_batch_job(batch_entries, eval_report_path, cli)
    
    if job and job.status == "SUCCESS":
        results = download_and_parse_results(job, cli)
        process_batch_results(results, md_output_dir)
        logging.info(f"Completed processing evaluation: {eval_report_path}")
    else:
        logging.error(f"Job failed for {eval_report_path}")

source

process_all_reports_batch

 process_all_reports_batch (reports:list[pathlib.Path],
                            md_library_path='../_data/md_library')

Process evaluation reports using batch OCR

Exported source
def process_all_reports_batch(
    reports: list[Path],
    md_library_path="../_data/md_library"
):
    "Process evaluation reports using batch OCR"
    logging.info(f"Starting batch OCR processing for {len(reports)} reports")
    md_output_dir = setup_output_dirs(md_library_path)
    
    for report in tqdm(reports, desc="Processing reports"):
        process_single_evaluation_batch(report, md_output_dir)
    
    logging.info("Batch OCR processing completed for all reports")
process_all_reports_batch(reports, md_library_path="../_data/md_library")
2025-08-06 17:38:39,461 - INFO - Starting batch OCR processing for 1 reports
Processing reports:   0%|          | 0/1 [00:00<?, ?it/s]2025-08-06 17:38:39,464 - INFO - Processing evaluation: 49d2fba781b6a7c0d94577479636ee6f
2025-08-06 17:38:42,165 - INFO - HTTP Request: POST https://api.mistral.ai/v1/files "HTTP/1.1 200 OK"
2025-08-06 17:38:42,521 - INFO - HTTP Request: GET https://api.mistral.ai/v1/files/20ea2096-b022-492c-84de-4130e24c6188/url?expiry=24 "HTTP/1.1 200 OK"
2025-08-06 17:38:42,522 - INFO - Added Evaluation%20Learning%20Brief_Final_Olta%20NDOJA.pdf to batch for 49d2fba781b6a7c0d94577479636ee6f
2025-08-06 17:38:44,401 - INFO - HTTP Request: POST https://api.mistral.ai/v1/files "HTTP/1.1 200 OK"
2025-08-06 17:38:44,756 - INFO - HTTP Request: GET https://api.mistral.ai/v1/files/67423130-6d16-4f97-960e-35f1e85aa91d/url?expiry=24 "HTTP/1.1 200 OK"
2025-08-06 17:38:44,760 - INFO - Added HoA%20EU%20JI%20Final%20Eval%20-%20Management%20Response%20Matrix%20-%20Final.pdf to batch for 49d2fba781b6a7c0d94577479636ee6f
2025-08-06 17:38:52,790 - INFO - HTTP Request: POST https://api.mistral.ai/v1/files "HTTP/1.1 200 OK"
2025-08-06 17:38:52,986 - INFO - HTTP Request: GET https://api.mistral.ai/v1/files/b3d152f4-a906-4bb3-89e9-b9bb67ee995e/url?expiry=24 "HTTP/1.1 200 OK"
2025-08-06 17:38:52,988 - INFO - Added Abridged%20Evaluation%20Report_%20Final_Olta%20NDOJA.pdf to batch for 49d2fba781b6a7c0d94577479636ee6f
2025-08-06 17:39:18,422 - INFO - HTTP Request: POST https://api.mistral.ai/v1/files "HTTP/1.1 200 OK"
2025-08-06 17:39:18,611 - INFO - HTTP Request: GET https://api.mistral.ai/v1/files/6a41e07f-0e95-4408-81c7-b608d4c6e0c6/url?expiry=24 "HTTP/1.1 200 OK"
2025-08-06 17:39:18,615 - INFO - Added Final%20Evaluation%20Report%20Final_Olta%20NDOJA.pdf to batch for 49d2fba781b6a7c0d94577479636ee6f
2025-08-06 17:39:19,210 - INFO - HTTP Request: POST https://api.mistral.ai/v1/files "HTTP/1.1 200 OK"
2025-08-06 17:39:19,899 - INFO - HTTP Request: POST https://api.mistral.ai/v1/batch/jobs "HTTP/1.1 200 OK"
2025-08-06 17:39:19,905 - INFO - Batch job created for 49d2fba781b6a7c0d94577479636ee6f: 427954fa-adde-448d-ba8e-c7fcd4bd9c40
2025-08-06 17:39:20,110 - INFO - HTTP Request: GET https://api.mistral.ai/v1/batch/jobs/427954fa-adde-448d-ba8e-c7fcd4bd9c40 "HTTP/1.1 200 OK"
2025-08-06 17:39:20,115 - INFO - Job status: QUEUED - 0/4 completed
2025-08-06 17:39:30,467 - INFO - HTTP Request: GET https://api.mistral.ai/v1/batch/jobs/427954fa-adde-448d-ba8e-c7fcd4bd9c40 "HTTP/1.1 200 OK"
2025-08-06 17:39:30,471 - INFO - Job status: RUNNING - 0/4 completed
2025-08-06 17:39:41,172 - INFO - HTTP Request: GET https://api.mistral.ai/v1/batch/jobs/427954fa-adde-448d-ba8e-c7fcd4bd9c40 "HTTP/1.1 200 OK"
2025-08-06 17:39:41,177 - INFO - Job status: RUNNING - 0/4 completed
2025-08-06 17:39:51,469 - INFO - HTTP Request: GET https://api.mistral.ai/v1/batch/jobs/427954fa-adde-448d-ba8e-c7fcd4bd9c40 "HTTP/1.1 200 OK"
2025-08-06 17:39:51,474 - INFO - Job status: RUNNING - 0/4 completed
2025-08-06 17:40:01,881 - INFO - HTTP Request: GET https://api.mistral.ai/v1/batch/jobs/427954fa-adde-448d-ba8e-c7fcd4bd9c40 "HTTP/1.1 200 OK"
2025-08-06 17:40:01,883 - INFO - Job status: SUCCESS - 4/4 completed
2025-08-06 17:40:02,110 - INFO - HTTP Request: GET https://api.mistral.ai/v1/files/0a684deb-2579-43c4-8fd0-477683465972/content "HTTP/1.1 200 OK"
2025-08-06 17:40:03,154 - INFO - Downloaded and parsed 4 OCR results
2025-08-06 17:40:03,156 - INFO - Saved 2 pages for evaluation_learning_brief_final_olta_ndoja_pdf
2025-08-06 17:40:03,159 - INFO - Saved 8 pages for hoa_eu_ji_final_eval_management_response_matrix_final_pdf
2025-08-06 17:40:03,200 - INFO - Saved 31 pages for abridged_evaluation_report_final_olta_ndoja_pdf
2025-08-06 17:40:03,267 - INFO - Saved 142 pages for final_evaluation_report_final_olta_ndoja_pdf
2025-08-06 17:40:03,268 - INFO - Completed processing evaluation: 49d2fba781b6a7c0d94577479636ee6f
Processing reports: 100%|██████████| 1/1 [01:23<00:00, 83.80s/it]
2025-08-06 17:40:03,269 - INFO - Batch OCR processing completed for all reports