
Datasets Tutorial

This quick-start tutorial covers only the basics you need to start using datasets in HackAgent.

Presets are pre-configured benchmark datasets and the fastest way to run standardized evaluations.

Basic CLI Example

hackagent eval baseline \
  --agent-name "target_agent" \
  --agent-type "google-adk" \
  --endpoint "http://localhost:8000" \
  --config-file "configs/baseline-agentharm.json" \
  --no-tui

Basic SDK Example

from hackagent import HackAgent, AgentTypeEnum

agent = HackAgent(
    name="target_agent",
    endpoint="http://localhost:8000",
    agent_type=AgentTypeEnum.GOOGLE_ADK,
)

attack_config = {
    "attack_type": "baseline",
    "dataset": {
        "preset": "agentharm",
        "limit": 50,
        "shuffle": True,
        "seed": 42,
    },
}

results = agent.hack(attack_config=attack_config)
Preset             Description
agentharm          Harmful agentic tasks
jailbreakbench     Curated jailbreak behaviors
strongreject       Forbidden jailbreak prompts
beavertails        Multi-category safety evaluation
simplesafetytests  Fast safety sanity checks

For the complete list, see Presets.
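
To compare the same agent across several presets, one option is to generate a small config per preset. The sketch below only builds dictionaries in the shape shown in the SDK example; `PRESETS` and `preset_config` are hypothetical names introduced for illustration and are not part of HackAgent.

```python
# Preset names taken from the table above; the full list may be longer.
PRESETS = [
    "agentharm",
    "jailbreakbench",
    "strongreject",
    "beavertails",
    "simplesafetytests",
]


def preset_config(preset, limit=10, seed=42):
    """Hypothetical helper: build a small, reproducible baseline config."""
    return {
        "attack_type": "baseline",
        "dataset": {
            "preset": preset,
            "limit": limit,    # keep runs small while iterating
            "shuffle": True,
            "seed": seed,      # same seed for every preset
        },
    }


configs = {name: preset_config(name) for name in PRESETS}
```

Each value in `configs` could then be passed as `attack_config` to `agent.hack(...)`, as in the SDK example.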


Dataset Options

These are the core options supported across dataset sources.

Option   Type  Default  Purpose
limit    int   None     Maximum number of goals to load
offset   int   0        Skip the first N goals
shuffle  bool  False    Randomize goal order
seed     int   None     Make randomized selection reproducible

Minimal Example

attack_config = {
    "attack_type": "baseline",
    "dataset": {
        "preset": "strongreject",
        "limit": 100,
        "offset": 0,
        "shuffle": True,
        "seed": 42,
    },
}

Basic Guidance

  • Use limit to keep tests small while iterating.
  • Use offset to evaluate different slices of large datasets.
  • Use shuffle for broader sample diversity.
  • Use seed when you need reproducible runs.
Tip: If shuffle and offset are both set, shuffling happens first and offset is applied after.
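
The ordering in the tip can be pictured with a small stand-alone sketch. `select_goals` is a hypothetical stand-in written for this tutorial, not HackAgent's actual implementation. Applying `limit` after `offset` is also an assumption, since the text above only states the shuffle-then-offset order.

```python
import random


def select_goals(goals, limit=None, offset=0, shuffle=False, seed=None):
    """Hypothetical model of dataset selection: shuffle, then offset, then limit."""
    items = list(goals)
    if shuffle:
        # A dedicated Random instance leaves the global RNG state untouched.
        random.Random(seed).shuffle(items)
    items = items[offset:]        # skip the first N goals of the (shuffled) order
    if limit is not None:
        items = items[:limit]     # assumed to be applied last
    return items


goals = [f"goal-{i}" for i in range(10)]
first = select_goals(goals, limit=3, offset=2, shuffle=True, seed=42)
second = select_goals(goals, limit=3, offset=2, shuffle=True, seed=42)
print(first == second)  # True: same seed, same selection
```

Under this model, changing only `offset` while keeping `seed` fixed walks through different parts of the same shuffled order.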


Learn More