HuggingFace Provider

Load goals from any dataset on the Hugging Face Hub, including safety benchmarks, question-answering datasets, and custom evaluations.

When to Use

Use the HuggingFace provider when you want to:

  • Load datasets not covered by presets
  • Use your own private HuggingFace datasets
  • Access the latest versions of datasets
  • Customize dataset configurations

Initial Setup (Authentication)

Public datasets usually work without authentication. For private or gated datasets, you should set a Hugging Face token first.

  1. Create a token on Hugging Face:
  2. Export it in your shell as HF_TOKEN:
     export HF_TOKEN="hf_xxx_your_token_here"
  3. (Optional) Persist it in your shell profile (~/.zshrc, ~/.bashrc) so it is available in new sessions.

Quick check:

python -c "from datasets import load_dataset; ds = load_dataset('ag_news', split='train[:1]'); print(len(ds))"

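If this succeeds, the datasets library is installed and can reach the Hub. Because ag_news is a public dataset, this check does not exercise your token; to confirm the token itself, load a private or gated dataset that you have access to.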
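The quick check above loads a public dataset, so it can pass even when HF_TOKEN is unset. A minimal local sketch for inspecting the variable before a run is shown below. The helper name hf_token_status is hypothetical, and the "hf_" prefix test is an assumption based on the token format shown in the export example; it does not contact the Hub or prove the token is valid.

```python
import os


def hf_token_status(env=None):
    """Classify the HF_TOKEN setting without contacting the Hub.

    Hypothetical helper: it only inspects the environment mapping.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        return "missing"
    # Assumption: user access tokens begin with "hf_", as in the export example.
    if not token.startswith("hf_"):
        return "unexpected-format"
    return "present"


print(f"HF_TOKEN status: {hf_token_status()}")
```

A result of "present" only means the variable is set and looks plausible; access to a specific private or gated dataset still depends on the token's permissions.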

Configuration Options

| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| provider | string | Yes | - | Must be "huggingface" |
| path | string | Yes | - | Dataset path (e.g., "ai-safety-institute/AgentHarm") |
| goal_field | string | No | "input" | Field containing the goal text |
| split | string | No | "test" | Dataset split to use |
| name | string | No | - | Configuration name (for multi-config datasets) |
| fallback_fields | list | No | ["input", "prompt", "question", "text"] | Alternative fields to try if the primary field is not found |
| trust_remote_code | bool | No | false | Allow execution of code shipped with the dataset repository |
| limit | int | No | - | Maximum number of goals |
| shuffle | bool | No | false | Randomize goal selection |
| seed | int | No | - | Random seed for reproducibility |

Authentication is handled by the Hugging Face ecosystem (datasets / huggingface_hub). This provider does not define a dedicated token field in the config, so use HF_TOKEN in the environment when required.
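To make the documented defaults concrete, the following sketch merges a partial dataset config with the default values listed under Configuration Options and checks the two required options. The names DATASET_DEFAULTS and with_defaults are hypothetical and are not part of the provider's API; the provider's own validation may differ.

```python
# Defaults copied from the Configuration Options table.
DATASET_DEFAULTS = {
    "goal_field": "input",
    "split": "test",
    "fallback_fields": ["input", "prompt", "question", "text"],
    "trust_remote_code": False,
    "shuffle": False,
}
REQUIRED_OPTIONS = ("provider", "path")


def with_defaults(dataset_cfg):
    """Return a new dict with documented defaults filled in (hypothetical helper)."""
    missing = [key for key in REQUIRED_OPTIONS if key not in dataset_cfg]
    if missing:
        raise ValueError(f"missing required option(s): {', '.join(missing)}")
    if dataset_cfg["provider"] != "huggingface":
        raise ValueError('provider must be "huggingface"')
    merged = {**DATASET_DEFAULTS, **dataset_cfg}
    # Copy the list so callers cannot mutate the shared default.
    merged["fallback_fields"] = list(merged["fallback_fields"])
    return merged


cfg = with_defaults({"provider": "huggingface", "path": "your-org/your-dataset"})
print(cfg["split"], cfg["goal_field"])
```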


Basic Usage

attack_config = {
    "attack_type": "baseline",
    "dataset": {
        "provider": "huggingface",
        "path": "ai-safety-institute/AgentHarm",
        "goal_field": "prompt",
        "split": "test_public",
    }
}

Multi-Configuration Datasets

Some datasets have multiple configurations. Use name to select one:

attack_config = {
    "attack_type": "advprefix",
    "dataset": {
        "provider": "huggingface",
        "path": "ai-safety-institute/AgentHarm",
        "name": "harmful",  # Configuration name
        "goal_field": "prompt",
        "split": "test_public",
    }
}

Fallback Fields

When the primary goal_field doesn't exist, the provider tries fallback_fields in order:

attack_config = {
    "attack_type": "baseline",
    "dataset": {
        "provider": "huggingface",
        "path": "your-org/your-dataset",
        "goal_field": "objective",
        "fallback_fields": ["prompt", "instruction", "query"],  # Tried if "objective" not found
    }
}
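The lookup order described above can be sketched as follows. This is an illustration of the documented behaviour (primary field first, then fallbacks in order), not the provider's actual implementation; resolve_goal is a hypothetical name, and raising KeyError when nothing matches is an assumption.

```python
def resolve_goal(record, goal_field="input",
                 fallback_fields=("input", "prompt", "question", "text")):
    """Return (field_name, value) for the first field present in the record."""
    for field in (goal_field, *fallback_fields):
        if field in record:
            return field, record[field]
    # Assumption: a record with none of the fields is treated as an error.
    raise KeyError(f"none of {[goal_field, *fallback_fields]} found in record")


record = {"instruction": "Summarize the report", "query": "unused"}
field, goal = resolve_goal(
    record,
    goal_field="objective",
    fallback_fields=["prompt", "instruction", "query"],
)
print(field, goal)
```

Here "objective" and "prompt" are absent, so the first matching fallback, "instruction", supplies the goal text.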

Remote Code Execution

Some datasets require running remote code. Enable with caution:

attack_config = {
    "attack_type": "baseline",
    "dataset": {
        "provider": "huggingface",
        "path": "some-org/custom-dataset",
        "trust_remote_code": True,  # ⚠️ Security risk - only for trusted sources
    }
}

Security Warning

Only set "trust_remote_code": True for datasets from sources you trust. It allows arbitrary Python code from the dataset repository to run on your machine, and that code could be malicious.
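One way to keep this setting from being enabled by accident is to derive it from an explicit allowlist of organizations. The sketch below is a hypothetical guard, not a feature of the provider: TRUSTED_ORGS and allow_remote_code are illustrative names, and "your-org" is a placeholder.

```python
# Hypothetical allowlist of Hub organizations whose dataset code you have reviewed.
TRUSTED_ORGS = {"your-org"}


def allow_remote_code(path, trusted_orgs=TRUSTED_ORGS):
    """Return True only when the dataset path belongs to an allowlisted org."""
    org, sep, _name = path.partition("/")
    # Paths without an "org/" prefix are never trusted in this sketch.
    return bool(sep) and org in trusted_orgs


dataset_cfg = {
    "provider": "huggingface",
    "path": "some-org/custom-dataset",
}
dataset_cfg["trust_remote_code"] = allow_remote_code(dataset_cfg["path"])
print(dataset_cfg["trust_remote_code"])
```

An allowlist only narrows who can publish the code that runs; it does not replace reviewing the dataset repository itself.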


Practical Examples

Example 1: Testing with Different Splits

# Test on public split
public_results = agent.hack(attack_config={
    "attack_type": "baseline",
    "dataset": {
        "provider": "huggingface",
        "path": "ai-safety-institute/AgentHarm",
        "name": "harmful",
        "goal_field": "prompt",
        "split": "test_public",
        "limit": 50,
    }
})

# Compare with held-out split (if you have access)
held_out_results = agent.hack(attack_config={
    "attack_type": "baseline",
    "dataset": {
        "provider": "huggingface",
        "path": "ai-safety-institute/AgentHarm",
        "name": "harmful",
        "goal_field": "prompt",
        "split": "test_held_out",
        "limit": 50,
    }
})
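Since the two configs above differ only in split, a small factory function can remove the duplication. This is a sketch; agentharm_config is a hypothetical helper name, and the agent.hack call is left commented out because it needs a configured agent.

```python
def agentharm_config(split, limit=50):
    """Build a baseline attack config for one AgentHarm split (hypothetical helper)."""
    return {
        "attack_type": "baseline",
        "dataset": {
            "provider": "huggingface",
            "path": "ai-safety-institute/AgentHarm",
            "name": "harmful",
            "goal_field": "prompt",
            "split": split,
            "limit": limit,
        },
    }


configs = {split: agentharm_config(split) for split in ("test_public", "test_held_out")}

# With an agent set up as in the example above:
# results = {split: agent.hack(attack_config=cfg) for split, cfg in configs.items()}
```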

Example 2: Loading Private Datasets

# Ensure HF_TOKEN is set in the environment before running this

attack_config = {
    "attack_type": "advprefix",
    "dataset": {
        "provider": "huggingface",
        "path": "your-org/private-safety-dataset",
        "goal_field": "attack_prompt",
        "split": "test",
    }
}

Example 3: Sampling Strategy

# Random sampling for diversity
random_sample = {
    "provider": "huggingface",
    "path": "PKU-Alignment/BeaverTails",
    "goal_field": "prompt",
    "split": "330k_test",
    "limit": 500,
    "shuffle": True,
    "seed": 42,
}

# Sequential sampling for systematic testing
# Note: "offset" is not listed under Configuration Options; confirm your version supports it.
sequential_sample = {
    "provider": "huggingface",
    "path": "PKU-Alignment/BeaverTails",
    "goal_field": "prompt",
    "split": "330k_test",
    "limit": 500,
    "offset": 0,  # Start from the beginning
    "shuffle": False,
}
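The difference between the two strategies can be illustrated with a small stand-alone sketch. It assumes that shuffling happens before limit is applied and that seed makes the order repeatable; the provider's actual ordering of these steps is not stated here, and sample_goals is a hypothetical function.

```python
import random


def sample_goals(goals, limit=None, shuffle=False, seed=None):
    """Select goals sequentially or at random (illustrative sketch only)."""
    items = list(goals)  # Copy so the caller's sequence is left untouched.
    if shuffle:
        # A dedicated Random instance keeps the global RNG state unchanged.
        random.Random(seed).shuffle(items)
    return items if limit is None else items[:limit]


goals = [f"goal-{i}" for i in range(10)]
sequential = sample_goals(goals, limit=3)
random_a = sample_goals(goals, limit=3, shuffle=True, seed=42)
random_b = sample_goals(goals, limit=3, shuffle=True, seed=42)
print(sequential)
print(random_a == random_b)
```

Under these assumptions, a fixed seed yields the same sample on every run, while sequential sampling always returns the first limit records.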

Programmatic Access

from hackagent.datasets import load_goals

goals = load_goals(
    provider="huggingface",
    path="ai-safety-institute/AgentHarm",
    name="harmful",
    goal_field="prompt",
    split="test_public",
    limit=50,
    shuffle=True,
    seed=42,
)

print(f"Loaded {len(goals)} goals")
print(goals[0]) # First goal

Finding the Right Field

To discover available fields in a dataset:

from datasets import load_dataset

ds = load_dataset("ai-safety-institute/AgentHarm", "harmful", split="test_public")
print(ds.features) # Shows all fields
print(ds[0]) # Shows first record
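Once you have a sample record (ds[0] returns a plain dict), a quick filter for non-empty string fields narrows down the candidates for goal_field. The helper below is a hypothetical convenience, and the sample record is made up for illustration; it is not real AgentHarm data.

```python
def candidate_goal_fields(record):
    """List the fields in a record whose values are non-empty strings."""
    return [
        name
        for name, value in record.items()
        if isinstance(value, str) and value.strip()
    ]


# Made-up record for illustration; with a real dataset, pass ds[0] instead.
sample_record = {"id": 7, "prompt": "Example goal text", "category": "demo", "notes": ""}
print(candidate_goal_fields(sample_record))
```

Short label-like fields such as a category will also appear in the result, so inspect the values before choosing one as goal_field.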