Skip to main content

Dataset Providers

HackAgent supports loading attack goals from external datasets, enabling standardized AI safety evaluations using 30+ benchmarks including AgentHarm, JailbreakBench, BeaverTails, SALAD-Bench, and more from leading research institutions.

Overview

Instead of manually specifying goals, use the dataset parameter to load goals from multiple sources:

  • 🎯 Presets — 30+ ready-to-use AI safety benchmarks (AgentHarm, JailbreakBench, BeaverTails, etc.)
  • 🤗 HuggingFace Hub — Any public or private dataset from HuggingFace
  • 📁 Local files — JSON, JSONL, CSV, or TXT files from your filesystem

Why Use Datasets?

Benefits
  • Standardized Testing — Use industry-standard AI safety benchmarks
  • Reproducibility — Consistent results across evaluations with seeds
  • Time Savings — Access 100K+ pre-written attack goals instantly
  • Research Alignment — Compare against published safety research
  • Comprehensive Coverage — Test across 14+ harm categories and use cases

Installation

python3 -m venv .venv
source .venv/bin/activate
pip install git+https://github.com/AISecurityLab/HackAgent.git

Quick Start

Easiest way to get started with industry-standard benchmarks:

from hackagent import HackAgent, AgentTypeEnum

agent = HackAgent(
name="my_agent",
endpoint="http://localhost:8000",
agent_type=AgentTypeEnum.GOOGLE_ADK
)

# Use AgentHarm benchmark with 50 random samples
attack_config = {
"attack_type": "baseline",
"dataset": {
"preset": "agentharm",
"limit": 50,
"shuffle": True,
"seed": 42, # For reproducibility
}
}

results = agent.hack(attack_config=attack_config)
Popular Presets
  • agentharm — AI agent safety (176+ tasks)
  • jailbreakbench — Curated jailbreaks (100 behaviors)
  • beavertails — Multi-category safety (330K+ samples)
  • simplesafetytests — Quick safety check (100 prompts)

See all 30+ presets →

2. Using HuggingFace

Load any dataset directly from HuggingFace Hub:

attack_config = {
"attack_type": "advprefix",
"dataset": {
"provider": "huggingface",
"path": "ai-safety-institute/AgentHarm",
"name": "harmful",
"goal_field": "prompt",
"split": "test_public",
"limit": 100,
"shuffle": True,
},
"generator": {
"identifier": "ollama/llama2-uncensored",
"endpoint": "http://localhost:11434/api/generate"
}
}

results = agent.hack(attack_config=attack_config)

3. Using Local Files

Load from your own dataset files:

attack_config = {
"attack_type": "pair",
"dataset": {
"provider": "file",
"path": "./my_custom_goals.json",
"goal_field": "objective",
"limit": 50,
}
}

results = agent.hack(attack_config=attack_config)

Common Dataset Options

All dataset providers support these parameters:

ParameterTypeDefaultDescription
limitintNoneMaximum number of goals to load
offsetint0Skip first N goals
shuffleboolFalseRandomize goal order
seedintNoneRandom seed for reproducibility
# Example with all options
dataset_config = {
"preset": "strongreject",
"limit": 100, # Load 100 goals
"offset": 50, # Skip first 50 (load goals 51-150)
"shuffle": True, # Randomize order
"seed": 42, # Reproducible shuffling
}
Shuffle + Offset Behavior

When both shuffle and offset are used, shuffling happens first, then offset is applied to the shuffled dataset.


Dataset Statistics

ProviderAvailable DatasetsTotal Samples
Presets30+ benchmarks500K+ goals
HuggingFaceUnlimitedCustom
Local FilesYour dataCustom

Next Steps