Building an AI Assistant for Investigative Journalists

Global Investigative Journalism Conference 2025

Presenters:

Reinaldo Chaves (Abraji) | Rune Ytreberg (iTromso Datajournalism Lab)

Session Details: Sunday, November 23, 2025 | 4:30pm - 5:45pm | KLCC Level 3 - Room 302

What Are AI Agents? Beyond Simple Chatbots

💬

Standard Chatbots

  • Respond to single questions
  • Provide one-off answers
  • Start fresh each conversation
  • Reactive, not proactive
🤖

AI Agents

  • Remember previous interactions
  • Follow multi-step instructions
  • Execute complex workflows
  • Make contextual decisions
  • Learn from your patterns and preferences
👥

Your AI Colleague

  • Acts as a persistent research partner
  • Maintains project knowledge over time
  • Adapts to your investigation's needs
  • Suggests next steps autonomously
  • Works according to your editorial standards

Think of an AI agent as a specialized intern who never sleeps, never forgets, and can read thousands of pages in seconds, but who always needs your editorial judgment and fact-checking oversight.

Why AI Agents Matter for Investigative Journalism

⚡

Speed at Scale

Process hundreds of documents in minutes instead of months. Identify key passages across massive datasets. Free reporters to focus on actual reporting rather than reading.

🔍

Pattern Recognition

Spot connections humans might miss. Cross-reference names, dates, and entities across disparate sources. Flag inconsistencies and anomalies automatically.

💡

Hypothesis Generation

Suggest investigative angles based on data patterns. Propose follow-up questions. Generate leads for further reporting. Help overcome reporter's block.

Ethical Guardrails and Limitations

⚠️ Hallucination Risk

AI agents can confidently generate false information. They may invent quotes, fabricate sources, or create plausible-sounding but incorrect facts.

Never publish AI output without verification

⚖️ Bias Amplification

Training data reflects societal biases around race, gender, geography, and ideology. Agents may perpetuate stereotypes or favor certain perspectives.

Always apply critical editorial judgment

🔒 Data Privacy Concerns

Uploading sensitive documents to AI platforms may expose confidential sources or unpublished information, especially with free tools.

Understand your platform's data retention and training policies before using it

🚫 Over-Reliance Danger

AI can't replace shoe-leather reporting, human source cultivation, or editorial intuition and creativity. Agents are tools to augment investigation, not substitute for it.

Maintain skepticism and verify everything

Ethical Principle

Use AI agents to accelerate research and identify leads, but anchor every published claim in verified, human-confirmed evidence. Your byline means you're accountable—not the AI.

Pre-Flight Checklist for Tool Safety

1. Who owns my data?

Have I read the Terms of Service? Does the platform claim any rights to my prompts or uploaded data?

2. Is my data used for training?

This is the most critical question. Most free tools use your data to train their models. Is there a clear, explicit toggle to opt out of data training? Am I using a paid "Pro," "Enterprise," or API-based plan that contractually guarantees my data will not be used for training?

3. Where is my data stored?

Is it processed on-device (rare) or on a server? If on a server, in what legal jurisdiction? Is it subject to a government that may target journalists?

4. What is my verification workflow?

How will I fact-check every claim, name, and date generated by the AI against the original source documents before using them in reporting?
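
One part of such a workflow can be automated as a first pass: if the agent was asked to return a page number and a direct quote for each claim, a short script can check whether each quote really appears on the cited page. The sketch below is an illustration in Python, not a tool from the session. The function names, the `pages` dictionary, and the sample text are all hypothetical, and it assumes the document's text has already been extracted page by page. A "found" result only confirms that the words exist in the source; it does not confirm that the claim built on them is true.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, straighten curly quotes, and collapse whitespace so that
    trivial formatting differences do not cause false mismatches."""
    for curly, straight in (("\u201c", '"'), ("\u201d", '"'),
                            ("\u2018", "'"), ("\u2019", "'")):
        text = text.replace(curly, straight)
    return re.sub(r"\s+", " ", text).strip().lower()


def check_citations(pages: dict, claims: list) -> list:
    """Return one result per claim. 'found' is True only if the quote
    appears verbatim (after normalization) on the cited page."""
    results = []
    for claim in claims:
        page_text = normalize(pages.get(claim["page"], ""))
        quote = normalize(claim["quote"])
        results.append({**claim, "found": bool(quote) and quote in page_text})
    return results


# Hypothetical sample data: page number -> extracted text.
pages = {
    1: "The contract was awarded to Acme Ltd.\nwithout a public tender.",
    2: "The mayor approved the payment on 3 March 2021.",
}
# Hypothetical AI output: each claim carries a quote and a page number.
claims = [
    {"quote": "awarded to Acme Ltd. without a public tender", "page": 1},
    {"quote": "The mayor rejected the payment", "page": 2},
]

results = check_citations(pages, claims)
for r in results:
    status = "found" if r["found"] else "NOT FOUND, check manually"
    print(f"p.{r['page']}: {status}: {r['quote']}")
```

Quotes flagged as not found are the first candidates for hallucination, but every claim still needs a human read against the original document.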

When to Use AI Agents (and When Not To)

✓ Best Suited For

  • Document summarization: Extracting key points from lengthy court filings, reports, or transcripts
  • Entity extraction: Pulling names, organizations, dates from unstructured text
  • Data normalization: Standardizing inconsistent formats in datasets
  • Cross-referencing: Finding connections across multiple documents
  • Hypothesis generation: Brainstorming investigative angles
  • Template creation: Drafting FOIA requests, interview guides, or data collection forms
  • Monitoring: Setting up alerts for new information on ongoing investigations

⊗ Not Appropriate For

  • Final fact-checking: AI cannot verify truth—only humans can
  • Sensitive source protection: Never upload confidential whistleblower materials, especially to free tools
  • Legal analysis: Requires qualified attorney review, not AI interpretation
  • Direct quotes: AI-generated quotes are fabrications, not journalism
  • Complex statistical analysis: Use proper data science tools instead
  • Editorial decision-making: Human judgment on what to publish and how

The general rule: Use AI agents for research acceleration and pattern identification, but keep humans firmly in control of verification, analysis, and editorial choices.

Prompt Engineering for Journalists: Getting Better Results

Vague prompts yield vague results. A journalistic prompt must be structured like an editorial assignment.

1. Define the Role

Start with: "You are an investigative journalism research assistant specializing in [topic]." This sets context and behavioral expectations.

2. Specify the Task

Be explicit: "Summarize this document, highlighting conflicts of interest, financial transactions, and timeline inconsistencies."

3. Provide Context

Share background: "I'm investigating municipal corruption. Focus on relationships between contractors and city officials." Context improves relevance.

4. Request Format

Specify output: "Provide findings as a bulleted list with direct quotes and page numbers." Structured formats are easier to verify and use.

5. Demand Citations

Always add: "Include page numbers and specific quotes for every claim." This makes fact-checking possible and catches hallucinations. If necessary, instruct the model to only consult sources of information that you specify.

6. Include Mandatory Rules

Use instructions such as: "Always cite the sources of information, with links, and do not invent information. Before giving your final answer, critically reflect on your own answer in terms of accuracy, completeness, and logical flow. Revise your answer based on this internal critique."
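
The six steps above can also be kept as a reusable template, so that every prompt carries the same role, citation demand, and rules. The Python sketch below is one possible way to do that; `build_prompt` and its parameter names are hypothetical and not part of any platform's API. The resulting text can be pasted into a Gem or Custom GPT instruction field.

```python
def build_prompt(topic, task, context, output_format, extra_rules=()):
    """Assemble a prompt from the six parts: role, task, context,
    format, citations, and mandatory rules."""
    sections = [
        # 1. Role
        "You are an investigative journalism research assistant "
        f"specializing in {topic}.",
        # 2. Task
        f"Task: {task}",
        # 3. Context
        f"Context: {context}",
        # 4. Format
        f"Format: {output_format}",
        # 5. Citations
        "Citations: Include page numbers and specific quotes for every claim.",
    ]
    # 6. Mandatory rules, always starting with the no-invention rule.
    rules = ["Do not invent information.", *extra_rules]
    sections.append("Rules:\n" + "\n".join(f"- {rule}" for rule in rules))
    return "\n\n".join(sections)


prompt = build_prompt(
    topic="municipal procurement",  # hypothetical topic
    task=("Summarize this document, highlighting conflicts of interest, "
          "financial transactions, and timeline inconsistencies."),
    context=("I'm investigating municipal corruption. Focus on relationships "
             "between contractors and city officials."),
    output_format=("Provide findings as a bulleted list with direct quotes "
                   "and page numbers."),
    extra_rules=["Before giving your final answer, critically reflect on its "
                 "accuracy, completeness, and logical flow."],
)
print(prompt)
```

Keeping the template in one place makes it easier to compare results when only the task or context changes between runs.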

Golden Rule

Never trust, always verify. Cross-check every AI-generated fact, name, date, and quote against source documents before using in reporting.

No-Code Platforms: Building Your AI Assistant

Google Gemini Gems

Best for: Quick custom agents with Google Workspace integration. Upload documents directly. Strong at summarization and entity extraction.

Setup time: 5-10 minutes per agent

OpenAI Custom GPTs

Best for: More sophisticated workflows and longer conversations. Includes access to GPT-5 reasoning. You can also use ChatGPT Agent Mode. The new Agent Builder offers visual workflow creation, but it is more complex and requires a paid subscription.

Setup time: 10-20 minutes per agent

Alternative Tools

Options include:

Consider: cost, data privacy policies, and integration needs. Generally, these tools have free or trial options, but weigh the privacy risks and processing-capacity limits of the free tiers.

All platforms allow journalists to create specialized agents without writing code. Simply describe your needs in plain language, provide examples, and set behavioral guidelines.

Note: Hugging Face used to offer an open-source agent creator, Hugging Chat Assistants. It was shut down in July, and the service now offers only a chatbot with several open-source models.

Real Investigative Use Cases

1. Lawsuit Summarization

"Extract plaintiff allegations, defendant responses, key evidence cited, and timeline of events from this 300-page civil complaint. Include page numbers"

2. Entity Matrix Building

"Identify all individuals and organizations mentioned in these board meeting minutes. Create a relationship map showing connections, roles, and financial ties"
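
Once an agent has returned a list of entities per document, a relationship map can start from something as simple as counting which entities appear together. The Python sketch below illustrates that idea under stated assumptions: the `doc_entities` structure, the document labels, and the names are all hypothetical, and co-occurrence in the same document is only a lead, not evidence of a connection.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(doc_entities):
    """Count how many documents each pair of entities shares."""
    pairs = Counter()
    for entities in doc_entities.values():
        # sorted(set(...)) removes duplicates and gives each pair one order.
        for a, b in combinations(sorted(set(entities)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical entities extracted from two sets of meeting minutes.
doc_entities = {
    "minutes-jan": ["A. Reyes", "Delta Builders", "City Council"],
    "minutes-feb": ["A. Reyes", "Delta Builders", "A. Reyes"],
}

pairs = cooccurrence(doc_entities)
for (a, b), count in pairs.most_common():
    print(f"{a} <-> {b}: appears together in {count} document(s)")
```

Pairs that recur across many documents are candidates for closer reading; roles and financial ties still have to be confirmed from the documents themselves.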

3. FOIA/LAI Request Drafting

"Generate a public records request for emails and meeting calendars related to [specific project], using legally precise language to avoid rejections"

4. Company Name Reconciliation

"Standardize these company names across datasets: identify which variations (LLC, Inc, Corp, misspellings) refer to the same entity"
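
A deterministic script is a useful cross-check on whatever groupings the agent proposes for this task. The Python sketch below, which uses only the standard library, strips common legal suffixes to group exact matches and then uses `difflib.SequenceMatcher` to suggest likely misspellings for human review. The suffix list, the 0.85 threshold, and the company names are assumptions chosen for illustration and should be tuned to your data.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher

# Assumed list of legal suffixes; extend it for your jurisdiction.
LEGAL_SUFFIXES = {"llc", "inc", "corp", "corporation", "incorporated",
                  "ltd", "co", "company"}


def normalize_name(name):
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


def group_exact(names):
    """Group raw names that share the same normalized form."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_name(name)].append(name)
    return dict(groups)


def near_matches(keys, threshold=0.85):
    """Suggest pairs of normalized names that are similar but not equal.
    These are candidates for review, not confirmed matches."""
    keys = sorted(keys)
    suggestions = []
    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            score = SequenceMatcher(None, a, b).ratio()
            if score >= threshold:
                suggestions.append((a, b, round(score, 2)))
    return suggestions


# Hypothetical company names.
names = ["Acme Holdings LLC", "ACME Holdings, Inc.",
         "Acme Holdngs Corp", "Beta Logistics Ltd."]

groups = group_exact(names)
candidates = near_matches(groups.keys())
print(groups)
print(candidates)
```

Similar names can belong to unrelated companies, so a suggested match should be confirmed against registration numbers or addresses before datasets are merged.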

5. Dataset Normalization

"Clean this campaign finance data: standardize date formats, fix address inconsistencies, and flag entries missing required fields"
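
Part of this cleaning can be done, or double-checked, with a short script whose behavior is fully predictable. The Python sketch below standardizes dates to ISO format and flags missing required fields; it does not attempt address cleanup. The column names, the list of accepted date formats, and the sample rows are hypothetical, and the day-first reading of dates like 05/03/2021 is an assumption you should confirm against your source.

```python
from datetime import datetime

# Assumed input formats, tried in order. Day-first is an assumption:
# confirm how your source writes dates before trusting the result.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y")
REQUIRED_FIELDS = ("donor", "amount", "date")  # hypothetical column names


def parse_date(raw):
    """Return an ISO date string (YYYY-MM-DD), or None if no format fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_rows(rows):
    """Add 'date_iso' and 'issues' to each row; original values are kept."""
    cleaned = []
    for row in rows:
        issues = [f"missing {field}" for field in REQUIRED_FIELDS
                  if not (row.get(field) or "").strip()]
        raw_date = (row.get("date") or "").strip()
        date_iso = parse_date(raw_date) if raw_date else None
        if raw_date and date_iso is None:
            issues.append("unparsed date")
        cleaned.append({**row, "date_iso": date_iso, "issues": issues})
    return cleaned


# Hypothetical campaign finance rows.
rows = [
    {"donor": "Maria Silva", "amount": "500", "date": "05/03/2021"},
    {"donor": "", "amount": "1200", "date": "2021-03-07"},
    {"donor": "J. Souza", "amount": "300", "date": "March 2021"},
]

cleaned = clean_rows(rows)
for row in cleaned:
    print(row["date_iso"], row["issues"])
```

Keeping the original value next to the cleaned one lets a reporter trace every change back to the raw record.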

6. Interview Guide Creation

"Based on these documents, generate 15 interview questions for the former CFO, focusing on the 2021 financial irregularities we've identified"

7. Monitoring Routines

"Alert me when new court filings appear for [case name] or when these 10 key individuals appear in public records databases"
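
The matching step behind such an alert can be sketched in a few lines. The Python example below scans already-downloaded text for names on a watchlist, ignoring case, accents, and extra spacing. How the filings or gazette pages are fetched is outside its scope, and the document identifiers, names, and sample text are hypothetical.

```python
import re
import unicodedata


def fold(text):
    """Lowercase, strip accents, and collapse whitespace so that
    'JOÃO   PEREIRA' and 'João Pereira' compare as equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed
                         if not unicodedata.combining(c))
    return re.sub(r"\s+", " ", no_accents).strip().lower()


def find_mentions(documents, watchlist):
    """Return (document id, name) for every watchlist name that appears
    as a whole phrase in a document."""
    hits = []
    for doc_id, text in documents.items():
        folded = fold(text)
        for name in watchlist:
            pattern = r"\b" + re.escape(fold(name)) + r"\b"
            if re.search(pattern, folded):
                hits.append((doc_id, name))
    return hits


# Hypothetical gazette excerpts, keyed by a made-up identifier.
documents = {
    "gazette-001": "Contrato firmado com JOÃO   PEREIRA para obras.",
    "gazette-002": "Nomeação de Ana Costa para o cargo.",
}
watchlist = ["João Pereira", "Carlos Lima"]

hits = find_mentions(documents, watchlist)
for doc_id, name in hits:
    print(f"ALERT: {name} mentioned in {doc_id}")
```

Exact-phrase matching misses initials, middle names, and reordered names, and it cannot tell apart two people with the same name, so every alert is a prompt to read the document, not a finding.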

Three AI Agent Examples

Gemini Gem - Fact-Checking Specialist

  • Analyzes documents for verification
  • Creates timelines and identifies contradictions
  • Generates structured fact-check reports

Custom GPT - OSINT Geolocation Investigator

  • Extracts photo metadata and analyzes visual clues
  • Suggests locations with confidence levels
  • Creates lead sheets for follow-up reporting

ChatGPT Agent - Brazilian Government Monitor

  • Scans Brazil's official gazette daily
  • Focuses on no-bid contracts and political connections
  • Produces investigation-ready reports

Your Next Steps: From Learning to Doing

1. Create your first Gem/GPT

Start with a simple document summarizer. Use a non-sensitive public document for practice.

2. Test with real data

Upload a sample court filing or public report. Compare the AI summary against your own manual reading.

3. Refine your prompts

Adjust instructions based on what worked and what didn't. Save successful prompts as templates.

4. Build an entity extractor

Create an agent that pulls names, organizations, and dates from text. Verify accuracy carefully.

5. Start small, scale gradually

Use AI for research acceleration, not as a primary source. Build trust through verification.

Remember

AI agents are powerful tools for investigative journalism, but they're tools, not replacements for reporters. Your skepticism, editorial judgment, and commitment to verification remain the foundation of trustworthy journalism.
