Datasets Tutorial
This quick-start tutorial covers only the basics you need to start using datasets in HackAgent. Presets are pre-configured benchmark datasets. They are the fastest way to run standardized evaluations.
Basic CLI Example
hackagent eval baseline \
  --agent-name "target_agent" \
  --agent-type "google-adk" \
  --endpoint "http://localhost:8000" \
  --config-file "configs/baseline-agentharm.json" \
  --no-tui
Basic SDK Example
from hackagent import HackAgent, AgentTypeEnum

agent = HackAgent(
    name="target_agent",
    endpoint="http://localhost:8000",
    agent_type=AgentTypeEnum.GOOGLE_ADK,
)

attack_config = {
    "attack_type": "baseline",
    "dataset": {
        "preset": "agentharm",
        "limit": 50,
        "shuffle": True,
        "seed": 42,
    },
}

results = agent.hack(attack_config=attack_config)
Popular Presets
| Preset | Description |
|---|---|
| agentharm | Harmful agentic tasks |
| jailbreakbench | Curated jailbreak behaviors |
| strongreject | Forbidden jailbreak prompts |
| beavertails | Multi-category safety evaluation |
| simplesafetytests | Fast safety sanity checks |
For the complete list, see Presets.
Dataset Options
These are the core options supported across dataset sources.
| Option | Type | Default | Purpose |
|---|---|---|---|
| limit | int | None | Maximum number of goals to load |
| offset | int | 0 | Skip the first N goals |
| shuffle | bool | False | Randomize goal order |
| seed | int | None | Make randomized selection reproducible |
Minimal Example
attack_config = {
    "attack_type": "baseline",
    "dataset": {
        "preset": "strongreject",
        "limit": 100,
        "offset": 0,
        "shuffle": True,
        "seed": 42,
    },
}
Basic Guidance
- Use limit to keep tests small while iterating.
- Use offset to evaluate different slices of large datasets.
- Use shuffle for broader sample diversity.
- Use seed when you need reproducible runs.
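As a rough sketch of the offset guidance, the helper below builds one attack_config per consecutive slice of a preset. The function name build_slice_configs and its parameters are hypothetical and not part of HackAgent; only the dictionary keys come from the examples on this page. It assumes that reusing one seed across slices keeps the shuffled order stable, so the slices do not overlap.

```python
def build_slice_configs(preset, slice_size, num_slices, seed=None):
    """Return one attack_config per consecutive slice (hypothetical helper)."""
    configs = []
    for i in range(num_slices):
        dataset = {
            "preset": preset,
            "limit": slice_size,        # goals per slice
            "offset": i * slice_size,   # skip the goals covered by earlier slices
        }
        if seed is not None:
            # Assumption: the same seed yields the same shuffled order on
            # every run, so different offsets select disjoint slices.
            dataset["shuffle"] = True
            dataset["seed"] = seed
        configs.append({"attack_type": "baseline", "dataset": dataset})
    return configs


configs = build_slice_configs("strongreject", slice_size=100, num_slices=3, seed=42)
offsets = [c["dataset"]["offset"] for c in configs]
```

Each resulting dictionary could then be passed to agent.hack(attack_config=...) in the same way as in the Basic SDK Example.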
Tip: If shuffle and offset are both set, shuffling happens first and offset is applied after.
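To make that ordering concrete, here is a minimal standalone sketch of how the four options could combine. It is an illustration only, not HackAgent's implementation: select_goals is a hypothetical function, and applying limit last (after offset) is an assumption, since this page only states that shuffling precedes offset.

```python
import random


def select_goals(goals, limit=None, offset=0, shuffle=False, seed=None):
    """Illustrative option ordering only; not HackAgent's actual code."""
    selected = list(goals)  # copy so the caller's list is not mutated
    if shuffle:
        # A dedicated Random instance makes the order depend only on the seed.
        random.Random(seed).shuffle(selected)
    selected = selected[offset:]      # offset is applied after shuffling
    if limit is not None:
        selected = selected[:limit]   # assumption: limit is applied last
    return selected


goals = [f"goal-{i}" for i in range(10)]
first = select_goals(goals, limit=3, offset=2, shuffle=True, seed=42)
second = select_goals(goals, limit=3, offset=2, shuffle=True, seed=42)
plain = select_goals(goals, limit=3, offset=2)
```

Under these assumptions, first and second contain the same three goals because the seed fixes the shuffled order, while plain is simply the third through fifth goals in their original order.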
Learn More
- Dataset Providers for the overview.
- Presets for the full benchmark catalog.
- HuggingFace Provider for external datasets.
- File Provider for local JSON/CSV/TXT inputs.
- Custom Providers for custom data sources.
- Troubleshooting for common dataset issues.