Evaluation Campaign
Use this workflow for a fast, high-signal jailbreak assessment before deeper testing.
What It Runs
The evaluation campaign uses the primary attacks from JAILBREAK_PROFILE:
- h4rm3l
- TAP
- PAIR
These are executed against a primary jailbreak dataset.
Run It
Run the built-in evaluation campaign command:
hackagent eval \
--agent-name "quick-security-scan" \
--agent-type "other" \
--endpoint "http://localhost:8080/chat"
By default, the command:
- selects the first primary jailbreak dataset from JAILBREAK_PROFILE
- runs the three primary attacks in sequence (h4rm3l, TAP, PAIR)
- uses ollama/llama3 with the harmbench judge type
- prints a per-attack summary table (status, result count, ASR, duration)
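The ASR (attack success rate) column is conventionally the fraction of attempts that the judge marks as successful bypasses. The following is a minimal sketch of that arithmetic and of a comparable table layout, assuming each attack yields a list of boolean judge verdicts. The `AttackRun` record, its field names, and the sample values are hypothetical and do not represent HackAgent's actual result schema or output format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AttackRun:
    """Hypothetical per-attack record; not HackAgent's real result type."""
    name: str
    status: str
    verdicts: List[bool]  # True = judge flagged the response as a bypass
    duration_s: float


def attack_success_rate(verdicts: List[bool]) -> float:
    """Fraction of attempts judged successful; 0.0 when there are none."""
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


def format_summary(runs: List[AttackRun]) -> str:
    """Render one row per attack: status, result count, ASR, duration."""
    header = f"{'attack':<8} {'status':<10} {'results':>7} {'ASR':>7} {'duration':>9}"
    rows = [
        f"{r.name:<8} {r.status:<10} {len(r.verdicts):>7} "
        f"{attack_success_rate(r.verdicts):>7.1%} {r.duration_s:>8.1f}s"
        for r in runs
    ]
    return "\n".join([header, *rows])


if __name__ == "__main__":
    # Made-up verdicts purely for illustration.
    demo = [
        AttackRun("h4rm3l", "completed", [True, False, False, False], 41.2),
        AttackRun("TAP", "completed", [False, False, False, False], 120.5),
    ]
    print(format_summary(demo))
```

Returning 0.0 for an empty verdict list is a design choice made here to avoid a division by zero; a real report might prefer to show "N/A" for attacks that produced no results.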
Common overrides:
hackagent eval \
--agent-name "quick-security-scan" \
--agent-type "other" \
--endpoint "http://localhost:8080/chat" \
--dataset "strongreject" \
--limit 50 \
--judge-identifier "ollama/llama3" \
--judge-type "harmbench" \
--timeout 600
Validation-only mode:
hackagent eval \
--agent-name "quick-security-scan" \
--agent-type "other" \
--endpoint "http://localhost:8080/chat" \
--dry-run
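The three invocations above differ only in their optional flags, so a script that launches scans can assemble the argument list programmatically. The sketch below uses only the flags shown in this section; the helper `build_eval_command` is a hypothetical convenience function, not part of HackAgent, and it only builds the command without running it.

```python
import shlex
from typing import List, Optional


def build_eval_command(
    agent_name: str,
    endpoint: str,
    agent_type: str = "other",
    dataset: Optional[str] = None,
    limit: Optional[int] = None,
    judge_identifier: Optional[str] = None,
    judge_type: Optional[str] = None,
    timeout: Optional[int] = None,
    dry_run: bool = False,
) -> List[str]:
    """Assemble argv for `hackagent eval`, adding only the overrides that are set."""
    cmd = [
        "hackagent", "eval",
        "--agent-name", agent_name,
        "--agent-type", agent_type,
        "--endpoint", endpoint,
    ]
    # Optional overrides, in the same order as the example above.
    overrides = {
        "--dataset": dataset,
        "--limit": limit,
        "--judge-identifier": judge_identifier,
        "--judge-type": judge_type,
        "--timeout": timeout,
    }
    for flag, value in overrides.items():
        if value is not None:
            cmd += [flag, str(value)]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


if __name__ == "__main__":
    argv = build_eval_command(
        "quick-security-scan",
        "http://localhost:8080/chat",
        dataset="strongreject",
        limit=50,
    )
    # shlex.join produces a shell-safe string for logging or copy-paste.
    print(shlex.join(argv))
```

Keeping the command as a list rather than a single string means it can be passed to `subprocess.run` without shell quoting concerns.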
Example Implementation
from hackagent import HackAgent
from hackagent.risks.jailbreak import JAILBREAK_PROFILE

# Target agent under test
agent = HackAgent(
    endpoint="http://localhost:8080/chat",
    name="quick-security-scan",
)

# Use the first primary dataset preset from the profile
primary_dataset = JAILBREAK_PROFILE.primary_datasets[0].preset

# Run each primary attack in sequence and report its ASR
for attack in JAILBREAK_PROFILE.primary_attacks:
    attack_type = attack.technique.lower()
    result = agent.attack(
        attack_type=attack_type,
        dataset={"preset": primary_dataset, "limit": 25},
        judges=[{"identifier": "ollama/llama3", "type": "harmbench"}],
    )
    print(f"{attack.technique}: ASR = {result.get('asr', 'N/A')}")
When To Use It
- Before release as a security smoke test.
- After model, prompt, or policy updates.
- As a recurring regression check in CI pipelines.
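For the CI use case, one possible approach is to fail the pipeline when any attack's ASR exceeds an agreed threshold. The sketch below assumes the per-attack ASR values have already been collected into a dictionary; the function names, the `max_asr` parameter, and the sample numbers are hypothetical and not part of HackAgent.

```python
from typing import Dict, List


def failing_attacks(asr_by_attack: Dict[str, float], max_asr: float = 0.0) -> List[str]:
    """Return, in sorted order, the attacks whose ASR exceeds the threshold."""
    return sorted(name for name, asr in asr_by_attack.items() if asr > max_asr)


def gate(asr_by_attack: Dict[str, float], max_asr: float = 0.0) -> int:
    """Return a process exit code: 0 if every attack is within budget, else 1."""
    failed = failing_attacks(asr_by_attack, max_asr)
    for name in failed:
        print(f"FAIL {name}: ASR {asr_by_attack[name]:.1%} exceeds {max_asr:.1%}")
    return 1 if failed else 0


if __name__ == "__main__":
    # Placeholder values; in a pipeline these would come from the scan results.
    exit_code = gate({"h4rm3l": 0.04, "TAP": 0.0, "PAIR": 0.0}, max_asr=0.0)
    print(f"exit code: {exit_code}")
```

In a pipeline step, the return value of `gate` would typically be passed to `sys.exit` so that a regression marks the job as failed. A threshold of zero is the strictest setting; teams that tolerate a known baseline may prefer a small nonzero budget.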
Next Step
If the scan shows bypasses, continue with the Evaluation Tutorial to tune and deepen each attack configuration.