HuggingFace Provider
Load goals from any dataset on the Hugging Face Hub, including safety benchmarks, question-answering datasets, and custom evaluations.
Use the HuggingFace provider when you want to:
- Load datasets not covered by presets
- Use your own private HuggingFace datasets
- Access the latest versions of datasets
- Customize dataset configurations
Initial Setup (Authentication)
Public datasets usually load without authentication. For private or gated datasets, set a Hugging Face token first.
- Create a token on Hugging Face:
  - Go to https://huggingface.co/settings/tokens
  - Create a token with at least read permission
- Export it in your shell as HF_TOKEN:
  export HF_TOKEN="hf_xxx_your_token_here"
- (Optional) Persist it in your shell profile (~/.zshrc, ~/.bashrc) so it is available in new sessions.
Quick check:
python -c "from datasets import load_dataset; ds = load_dataset('ag_news', split='train[:1]'); print(len(ds))"
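If this succeeds, the datasets library is installed and can reach the Hub. Because ag_news is a public dataset, the check passes even without a token, so it does not confirm that HF_TOKEN is valid.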
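To confirm that the token itself is visible to Python, something like the following can help. It is a minimal sketch that only reads the HF_TOKEN environment variable; describe_token is a hypothetical helper, not part of datasets or huggingface_hub, and it does not contact the Hub or validate the token.

```python
import os


def describe_token(env=None):
    """Report whether HF_TOKEN is set, without printing the secret itself."""
    # Hypothetical helper for illustration; `env` defaults to the real environment.
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "")
    if not token:
        return "HF_TOKEN is not set"
    # Show only a short prefix so the full token never lands in logs.
    return f"HF_TOKEN is set ({token[:3]}..., {len(token)} characters)"


print(describe_token())
```

If this reports that the variable is not set in a new terminal, the export line is probably missing from your shell profile.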
Configuration Options
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| provider | string | Yes | - | Must be "huggingface" |
| path | string | Yes | - | Dataset path (e.g., "ai-safety-institute/AgentHarm") |
| goal_field | string | No | "input" | Field containing the goal text |
| split | string | No | "test" | Dataset split to use |
| name | string | No | - | Configuration name (for multi-config datasets) |
| fallback_fields | list | No | ["input", "prompt", "question", "text"] | Alternative fields if primary not found |
| trust_remote_code | bool | No | false | Trust remote code execution |
| limit | int | No | - | Maximum number of goals |
| shuffle | bool | No | false | Randomize goal selection |
| seed | int | No | - | Random seed for reproducibility |
Authentication is handled by the Hugging Face ecosystem (datasets / huggingface_hub).
This provider does not define a dedicated token field in the config, so use HF_TOKEN in the environment when required.
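The required options and defaults listed above can be sketched as a small validation step. This is an illustration only: normalize_dataset_config is a hypothetical helper, not the provider's actual code, and the defaults are simply copied from the table.

```python
# Defaults copied from the options table; the helper itself is hypothetical.
DEFAULTS = {
    "goal_field": "input",
    "split": "test",
    "fallback_fields": ["input", "prompt", "question", "text"],
    "trust_remote_code": False,
    "shuffle": False,
}
REQUIRED = ("provider", "path")


def normalize_dataset_config(config):
    """Check required options and fill in documented defaults."""
    missing = [key for key in REQUIRED if key not in config]
    if missing:
        raise ValueError(f"missing required option(s): {', '.join(missing)}")
    if config["provider"] != "huggingface":
        raise ValueError('provider must be "huggingface"')
    merged = {**DEFAULTS, **config}
    # Copy the list so callers cannot mutate the shared default.
    merged["fallback_fields"] = list(merged["fallback_fields"])
    return merged


print(normalize_dataset_config({"provider": "huggingface", "path": "your-org/your-dataset"}))
```

Options without a documented default (name, limit, seed) are left out of the merged config unless you set them.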
Basic Usage
attack_config = {
    "attack_type": "baseline",
    "dataset": {
        "provider": "huggingface",
        "path": "ai-safety-institute/AgentHarm",
        "goal_field": "prompt",
        "split": "test_public",
    }
}
Multi-Configuration Datasets
Some datasets have multiple configurations. Use name to select one:
attack_config = {
    "attack_type": "advprefix",
    "dataset": {
        "provider": "huggingface",
        "path": "ai-safety-institute/AgentHarm",
        "name": "harmful",  # Configuration name
        "goal_field": "prompt",
        "split": "test_public",
    }
}
Fallback Fields
When the primary goal_field doesn't exist, the provider tries fallback_fields in order:
attack_config = {
    "attack_type": "baseline",
    "dataset": {
        "provider": "huggingface",
        "path": "your-org/your-dataset",
        "goal_field": "objective",
        "fallback_fields": ["prompt", "instruction", "query"],  # Tried if "objective" not found
    }
}
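The lookup order described above can be pictured with a short sketch. It assumes each dataset record is a plain dict and that a field "exists" when its key is present; resolve_goal is a hypothetical name, not the provider's real function.

```python
def resolve_goal(record, goal_field="input",
                 fallback_fields=("input", "prompt", "question", "text")):
    """Return (field_name, value) for the first candidate field present in the record."""
    # Primary field first, then the fallbacks in the order given.
    for field in (goal_field, *fallback_fields):
        if field in record:
            return field, record[field]
    raise KeyError(f"none of the candidate fields found in record: {sorted(record)}")


# Made-up record for illustration: it has no "objective" field.
record = {"prompt": "Example goal text", "query": "Example query"}
print(resolve_goal(record, "objective", ["prompt", "instruction", "query"]))
```

Here "prompt" wins because it is the first fallback that exists, even though "query" is also present.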
Remote Code Execution
Some datasets require running remote code. Enable with caution:
attack_config = {
    "attack_type": "baseline",
    "dataset": {
        "provider": "huggingface",
        "path": "some-org/custom-dataset",
        "trust_remote_code": True,  # ⚠️ Security risk - only for trusted sources
    }
}
Set trust_remote_code to True only for datasets from sources you trust. It lets the dataset repository run arbitrary Python code on your machine, and that code could be malicious.
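One way to keep this decision explicit is to derive the flag from an allowlist of dataset owners instead of hard-coding True. The sketch below is an assumption about how you might organize this in your own code; TRUSTED_OWNERS and allow_remote_code are hypothetical names, and the provider itself does not perform this check.

```python
# Hypothetical allowlist of Hub owners whose dataset code you have reviewed.
TRUSTED_OWNERS = {"your-org"}


def allow_remote_code(path, trusted_owners=TRUSTED_OWNERS):
    """Return True only when the dataset path is 'owner/name' with a trusted owner."""
    owner, sep, _ = path.partition("/")
    return bool(sep) and owner in trusted_owners


dataset = {"provider": "huggingface", "path": "some-org/custom-dataset"}
# Remote code stays disabled unless the owner is on the allowlist.
dataset["trust_remote_code"] = allow_remote_code(dataset["path"])
print(dataset["trust_remote_code"])
```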
Practical Examples
Example 1: Testing with Different Splits
# `agent` is assumed to be an already-initialized agent instance (setup not shown here)

# Test on public split
public_results = agent.hack(attack_config={
    "attack_type": "baseline",
    "dataset": {
        "provider": "huggingface",
        "path": "ai-safety-institute/AgentHarm",
        "name": "harmful",
        "goal_field": "prompt",
        "split": "test_public",
        "limit": 50,
    }
})

# Compare with held-out split (if you have access)
held_out_results = agent.hack(attack_config={
    "attack_type": "baseline",
    "dataset": {
        "provider": "huggingface",
        "path": "ai-safety-institute/AgentHarm",
        "name": "harmful",
        "goal_field": "prompt",
        "split": "test_held_out",
        "limit": 50,
    }
})
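Since the two calls above differ only in the split, one option is to derive both configs from a shared base. This is a sketch: configs_for_splits is a hypothetical helper, and each resulting config would be passed to agent.hack(attack_config=...) as in the example above.

```python
import copy


def configs_for_splits(base_config, splits):
    """Return one deep-copied attack config per split name."""
    configs = {}
    for split in splits:
        config = copy.deepcopy(base_config)  # keep the base config untouched
        config["dataset"]["split"] = split
        configs[split] = config
    return configs


base = {
    "attack_type": "baseline",
    "dataset": {
        "provider": "huggingface",
        "path": "ai-safety-institute/AgentHarm",
        "name": "harmful",
        "goal_field": "prompt",
        "limit": 50,
    },
}

configs = configs_for_splits(base, ["test_public", "test_held_out"])
for split, config in configs.items():
    print(split, config["dataset"]["split"])
```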
Example 2: Loading Private Datasets
# Ensure HF_TOKEN is set in the environment before running this
attack_config = {
    "attack_type": "advprefix",
    "dataset": {
        "provider": "huggingface",
        "path": "your-org/private-safety-dataset",
        "goal_field": "attack_prompt",
        "split": "test",
    }
}
Example 3: Sampling Strategy
# Random sampling for diversity
random_sample = {
    "provider": "huggingface",
    "path": "PKU-Alignment/BeaverTails",
    "goal_field": "prompt",
    "split": "330k_test",
    "limit": 500,
    "shuffle": True,
    "seed": 42,
}

# Sequential sampling for systematic testing
sequential_sample = {
    "provider": "huggingface",
    "path": "PKU-Alignment/BeaverTails",
    "goal_field": "prompt",
    "split": "330k_test",
    "limit": 500,
    "offset": 0,  # Start from beginning
    "shuffle": False,
}
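To see why a fixed seed makes the random sample repeatable, here is a minimal sketch of the idea. It assumes shuffling happens before the limit is applied, which the options above do not state explicitly; sample_goals is a hypothetical stand-in, not the provider's implementation.

```python
import random


def sample_goals(goals, limit=None, shuffle=False, seed=None):
    """Optionally shuffle with a seeded RNG, then keep the first `limit` items."""
    selected = list(goals)  # copy so the input order is preserved for the caller
    if shuffle:
        # A dedicated Random instance gives the same order for the same seed.
        random.Random(seed).shuffle(selected)
    return selected if limit is None else selected[:limit]


goals = [f"goal-{i}" for i in range(10)]  # made-up data for illustration
first = sample_goals(goals, limit=3, shuffle=True, seed=42)
second = sample_goals(goals, limit=3, shuffle=True, seed=42)
print(first == second)
print(sample_goals(goals, limit=3))
```

With shuffle disabled the result is simply the first items in dataset order, which matches the sequential case.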
Programmatic Access
from hackagent.datasets import load_goals

goals = load_goals(
    provider="huggingface",
    path="ai-safety-institute/AgentHarm",
    name="harmful",
    goal_field="prompt",
    split="test_public",
    limit=50,
    shuffle=True,
    seed=42,
)

print(f"Loaded {len(goals)} goals")
print(goals[0])  # First goal
Finding the Right Field
To discover available fields in a dataset:
from datasets import load_dataset
ds = load_dataset("ai-safety-institute/AgentHarm", "harmful", split="test_public")
print(ds.features) # Shows all fields
print(ds[0]) # Shows first record
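Once you have a record such as ds[0] (a plain dict), a quick way to narrow down goal_field candidates is to list the fields that hold non-empty strings. The record below is made up for illustration, and text_fields is a hypothetical helper.

```python
def text_fields(record):
    """Return the names of fields whose values are non-empty strings."""
    return [
        name
        for name, value in record.items()
        if isinstance(value, str) and value.strip()
    ]


# Made-up record; in practice pass ds[0] from the snippet above.
record = {"id": 7, "prompt": "Example goal text", "category": "demo", "tags": ["a"]}
print(text_fields(record))
```

Any of the returned names can then be tried as goal_field, or listed in fallback_fields.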