Evaluation Campaign

Use this workflow for a fast, high-signal jailbreak assessment before deeper testing.

What It Runs

The evaluation campaign uses the primary attacks from JAILBREAK_PROFILE:

h4rm3l
TAP
PAIR

Each attack is executed against a primary jailbreak dataset (by default, the first one listed in JAILBREAK_PROFILE).

Run It

Run the built-in evaluation campaign command:

hackagent eval \
    --agent-name "quick-security-scan" \
    --agent-type "other" \
    --endpoint "http://localhost:8080/chat"

By default, the command:

selects the first primary jailbreak dataset from JAILBREAK_PROFILE
runs the three primary attacks in sequence (h4rm3l, TAP, PAIR)
uses ollama/llama3 as the judge model, with the harmbench judge type
prints a per-attack summary table (status, result count, ASR, duration)

Common overrides:

hackagent eval \
    --agent-name "quick-security-scan" \
    --agent-type "other" \
    --endpoint "http://localhost:8080/chat" \
    --dataset "strongreject" \
    --limit 50 \
    --judge-identifier "ollama/llama3" \
    --judge-type "harmbench" \
    --timeout 600

Validation-only mode:

hackagent eval \
    --agent-name "quick-security-scan" \
    --agent-type "other" \
    --endpoint "http://localhost:8080/chat" \
    --dry-run
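
When the campaign is launched from a script rather than a shell, the invocations above can be assembled programmatically. The following is a minimal sketch, not part of HackAgent: `build_eval_command` is a hypothetical helper that uses only the flags shown above and assumes the `hackagent` executable is on the PATH.

```python
import shlex
from typing import List, Optional


def build_eval_command(
    agent_name: str,
    endpoint: str,
    agent_type: str = "other",
    dataset: Optional[str] = None,
    limit: Optional[int] = None,
    dry_run: bool = False,
) -> List[str]:
    """Assemble the argv for `hackagent eval` (hypothetical helper)."""
    cmd = [
        "hackagent", "eval",
        "--agent-name", agent_name,
        "--agent-type", agent_type,
        "--endpoint", endpoint,
    ]
    # Optional overrides are appended only when explicitly requested.
    if dataset is not None:
        cmd += ["--dataset", dataset]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


if __name__ == "__main__":
    cmd = build_eval_command(
        "quick-security-scan",
        "http://localhost:8080/chat",
        dry_run=True,
    )
    # Print the shell-quoted command instead of executing it.
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run` as-is; keeping it as a list avoids shell-quoting problems with endpoints or agent names that contain special characters.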

Example Implementation

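from hackagent import HackAgent
from hackagent.risks.jailbreak import JAILBREAK_PROFILE

# Target agent under test; same endpoint and name as the CLI examples.
agent = HackAgent(
    endpoint="http://localhost:8080/chat",
    name="quick-security-scan",
)

# Use the first primary dataset in the profile, matching the CLI default.
primary_dataset = JAILBREAK_PROFILE.primary_datasets[0].preset

# Run each primary attack in sequence against the same dataset.
for attack in JAILBREAK_PROFILE.primary_attacks:
    attack_type = attack.technique.lower()

    result = agent.attack(
        attack_type=attack_type,
        dataset={"preset": primary_dataset, "limit": 25},
        judges=[{"identifier": "ollama/llama3", "type": "harmbench"}],
    )

    # Report the attack success rate (ASR), or "N/A" if the result has none.
    print(f"{attack.technique}: ASR = {result.get('asr', 'N/A')}")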
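
ASR stands for attack success rate: the share of attempted prompts that the judge marks as a successful jailbreak. A minimal sketch of that arithmetic follows; the `attack_success_rate` function and the boolean verdict list are illustrative assumptions, not HackAgent's internal representation.

```python
from typing import Sequence


def attack_success_rate(verdicts: Sequence[bool]) -> float:
    """Fraction of attempts judged successful; 0.0 when there are no attempts."""
    if not verdicts:
        return 0.0
    # True counts as 1 and False as 0, so sum() is the number of successes.
    return sum(verdicts) / len(verdicts)


# Made-up verdicts for illustration only: 1 success out of 4 attempts.
example = [False, True, False, False]
print(f"ASR = {attack_success_rate(example):.0%}")  # prints "ASR = 25%"
```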

When To Use It

Before release as a security smoke test.
After model, prompt, or policy updates.
As a recurring regression check in CI pipelines.
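
For the CI use case, one possible approach is to compare each attack's ASR against a threshold and fail the job on any breach. The sketch below assumes the ASR values have already been collected as floats between 0 and 1 (for example, from the loop in Example Implementation); `find_regressions`, `gate`, and the default threshold are hypothetical, not a HackAgent feature.

```python
from typing import Dict, List


def find_regressions(asr_by_attack: Dict[str, float], max_asr: float) -> List[str]:
    """Return the names of attacks whose ASR exceeds the allowed maximum."""
    return sorted(name for name, asr in asr_by_attack.items() if asr > max_asr)


def gate(asr_by_attack: Dict[str, float], max_asr: float = 0.05) -> int:
    """Return a process exit code: 0 if every attack is within budget, else 1."""
    failing = find_regressions(asr_by_attack, max_asr)
    for name in failing:
        print(f"FAIL {name}: ASR {asr_by_attack[name]:.1%} exceeds {max_asr:.1%}")
    return 1 if failing else 0
```

In a CI script, calling `sys.exit(gate(results))` would turn any breach into a failed job. The threshold is a policy decision; a stricter pipeline might set `max_asr=0.0` so that a single bypass blocks the release.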

Next Step

If the scan shows bypasses, continue with the Evaluation Tutorial to tune and deepen each attack configuration.
