Dataset Troubleshooting
Use this page when dataset loading fails, returns zero goals, or behaves unexpectedly.
1. No Goals Loaded
Symptoms
- Error about missing goals / empty dataset
- Attack starts but immediately stops with no samples
Checks
- Confirm you provided either dataset or explicit goals in attack_config.
- If using presets, verify the preset name against the Presets page.
- Remove offset temporarily; it may skip all rows when combined with a small limit.
- Disable filters temporarily (filter_field, filter_value) to confirm the source contains matching rows.
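The offset check above can be illustrated with a minimal sketch. It assumes the loader skips offset rows first and then keeps up to limit rows as a plain slice, which may not match the actual implementation; select_rows is a hypothetical helper, not part of the library.

```python
def select_rows(rows, offset=0, limit=None):
    """Hypothetical model of offset/limit: skip `offset` rows, then keep up to `limit`."""
    end = None if limit is None else offset + limit
    return rows[offset:end]


rows = ["goal A", "goal B", "goal C"]

# An offset at or beyond the number of rows leaves nothing to sample.
print(select_rows(rows, offset=5, limit=2))  # []

# Removing the offset restores the goals.
print(select_rows(rows, offset=0, limit=2))  # ['goal A', 'goal B']
```

If the real loader applies filters before the offset, the effective row count can be much smaller than the raw dataset size, which makes an empty result more likely.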
2. HuggingFace Dataset Fails to Load
Symptoms
- Dataset not found
- Split/config errors
- Authentication errors for private repos
Checks
- Verify dataset.path is correct (owner/dataset_name).
- Confirm split exists for that dataset.
- If needed, set the correct config name in dataset.name.
- For private datasets, authenticate with huggingface-cli login first.
- Ensure goal_field exists in records.
Reference: HuggingFace Provider
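As a rough illustration of the options named in the checks, the fragment below groups them in one dataset block and adds a format-only check on the path. The text names dataset.path and dataset.name; placing split and goal_field under the same key is an assumption, all values are placeholders, and looks_like_repo_id is a hypothetical helper that does not contact the Hub.

```python
import re

# Assumed layout; every value below is a placeholder, not a real dataset.
dataset = {
    "path": "owner/dataset_name",  # repository id on the Hub
    "name": "default",             # config name, only if the dataset defines several
    "split": "train",              # must exist for this dataset
    "goal_field": "prompt",        # must be a key present in each record
}


def looks_like_repo_id(path):
    """Return True if path has the owner/dataset_name shape (format check only)."""
    return re.fullmatch(r"[\w.\-]+/[\w.\-]+", path) is not None


print(looks_like_repo_id(dataset["path"]))     # True
print(looks_like_repo_id("dataset_name_only"))  # False
```

A passing check only confirms the shape of the identifier. Whether the repository, split, and field actually exist still has to be verified against the dataset itself.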
3. Local File Provider Issues
Symptoms
- File not found
- Field extraction errors
- Empty goals from JSON/CSV
Checks
- Use an absolute path, or confirm that the relative path resolves from your current working directory.
- Verify file format is supported (.json, .jsonl, .csv, .txt).
- For JSON/CSV, ensure goal_field matches a real column/key.
- For TXT, use one goal per line and remove empty lines.
Reference: File Provider
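Before involving the provider at all, the file can be inspected with the standard library. The sketch below counts records that lack a non-empty goal field. rows_missing_field is a hypothetical helper, and it assumes that .json files contain a top-level list of objects, which may differ from what the provider accepts.

```python
import csv
import json
import tempfile
from pathlib import Path


def rows_missing_field(path, goal_field):
    """Return (missing, total) for records lacking a non-empty goal_field."""
    path = Path(path).resolve()  # relative paths resolve against the current working directory
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    suffix = path.suffix.lower()
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as f:
            records = list(csv.DictReader(f))
    elif suffix == ".jsonl":
        with path.open(encoding="utf-8") as f:
            records = [json.loads(line) for line in f if line.strip()]
    elif suffix == ".json":
        # Assumes a top-level list of objects.
        records = json.loads(path.read_text(encoding="utf-8"))
    else:
        raise ValueError(f"Unsupported format for this check: {suffix}")
    missing = sum(1 for record in records if not record.get(goal_field))
    return missing, len(records)


# Demo with a throwaway file: the second record uses the wrong key.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "goals.jsonl"
    sample.write_text('{"goal": "first"}\n{"text": "wrong key"}\n', encoding="utf-8")
    print(rows_missing_field(sample, "goal"))  # (1, 2)
```

A result where missing equals total usually points to a goal_field mismatch rather than an empty file.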
4. Reproducibility Problems
Symptoms
- Different results across runs with the same dataset source
Checks
- Set both shuffle: true and a fixed seed.
- Keep the same limit and offset across runs.
- Pin dataset version when possible (especially for frequently updated external datasets).
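Why a fixed seed matters can be shown with a small standard-library sketch. It models sampling as a seeded shuffle followed by offset and limit. That ordering is an assumption about the loader, and sample_goals is a hypothetical helper.

```python
import random


def sample_goals(goals, seed, limit, offset=0):
    """Hypothetical model: shuffle with a fixed seed, then apply offset and limit."""
    shuffled = list(goals)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    return shuffled[offset:offset + limit]


goals = [f"goal {i}" for i in range(20)]

run_1 = sample_goals(goals, seed=42, limit=5)
run_2 = sample_goals(goals, seed=42, limit=5)
print(run_1 == run_2)  # True
```

The same seed only reproduces the same sample if the underlying rows are identical, which is why pinning the dataset version matters for external sources.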
5. Slow Dataset Loading
Quick Fixes
- Reduce limit for local debugging.
- Use presets for fast startup before moving to large external datasets.
- Avoid heavy filtering in the first pass; validate with a small sample first.
6. Debug Checklist
Run this minimal pattern to isolate dataset issues:
    attack_config = {
        "attack_type": "baseline",
        "dataset": {
            "preset": "simplesafetytests",
            "limit": 10,
            "shuffle": False,
        },
    }
If this works, increase complexity gradually: add shuffle, then seed, then custom provider/path/filter options.
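One way to add options one at a time is to derive each step from the previous config, so that a failure points at the option just added. This is only a sketch: with_dataset_options is a hypothetical helper, the seed value is arbitrary, and running each config is left to your usual entry point.

```python
import copy

base = {
    "attack_type": "baseline",
    "dataset": {"preset": "simplesafetytests", "limit": 10, "shuffle": False},
}


def with_dataset_options(config, **options):
    """Return a copy of config with extra dataset options applied."""
    updated = copy.deepcopy(config)
    updated["dataset"].update(options)
    return updated


step_1 = with_dataset_options(base, shuffle=True)
step_2 = with_dataset_options(step_1, seed=42)  # 42 is an arbitrary example seed

# Run each config in order and stop at the first one that fails.
for index, config in enumerate([base, step_1, step_2]):
    print(index, config["dataset"])
```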