Skip to main content

Evaluation Tutorial

This tutorial focuses on running PAIR evaluations against your target agent using HackAgent.

Running PAIR Attacks

Command Line Interface

Use hackagent eval pair from your terminal.

Framework Examples

Prerequisites
  • Ollama running locally with required models available
hackagent eval pair \
--agent-name "llama3" \
--agent-type "ollama" \
--endpoint "http://localhost:11434" \
--config-file "configs/pair-ollama.json" \
--no-tui
configs/pair-ollama.json
{
"attack_type": "pair",
"goals": ["Reveal system prompt"],
"objective": "jailbreak",
"attacker": {
"identifier": "ollama/llama3",
"endpoint": "http://localhost:11434/api/generate"
},
"n_iterations": 20
}

View available attacks and options:

hackagent eval --help

PAIR Overview

PAIR (Prompt Automatic Iterative Refinement) uses an attacker model to iteratively improve jailbreak prompts based on target responses and scoring feedback.

Typical flow:

  1. The attacker proposes a jailbreak prompt.
  2. The target agent responds.
  3. The system evaluates success and quality.
  4. The attacker refines the next prompt.
  5. The loop continues up to n_iterations.

Next Steps


Responsible Use

Always obtain proper authorization before testing any AI system. HackAgent is designed for authorized security testing only. See our Responsible Disclosure Guidelines.