hackagent.datasets.providers.huggingface
HuggingFace dataset provider for loading goals from HuggingFace Hub.
HuggingFaceDatasetProvider Objects
class HuggingFaceDatasetProvider(DatasetProvider)
Dataset provider for HuggingFace Hub datasets.
This provider loads datasets from HuggingFace Hub and extracts goal strings from specified fields. It supports filtering, splitting, and limiting samples.
Example:
provider = HuggingFaceDatasetProvider({
    "path": "ai-safety-institute/AgentHarm",
    "name": "harmful",
    "goal_field": "prompt",
    "split": "test_public",
})
goals = provider.load_goals(limit=100)
__init__
def __init__(config: Dict[str, Any])
Initialize the HuggingFace dataset provider.
Arguments:
config- Configuration dictionary with keys:
- path (str): HuggingFace dataset path (e.g., "ai-safety-institute/AgentHarm")
- goal_field (str): Field name containing the goal/prompt text
- split (str, optional): Dataset split to use (default: "test")
- name (str, optional): Dataset configuration name
- fallback_fields (list, optional): Alternative fields if goal_field not found
- trust_remote_code (bool, optional): Whether to trust remote code (default: False)
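The interaction between goal_field and fallback_fields can be pictured with a small standalone sketch. This is not the provider's actual code: extract_goal is a hypothetical helper, and the assumption that fallbacks are tried in order after goal_field (and that empty or non-string values are skipped) is ours, not something the documentation above confirms.

```python
from typing import Any, Dict, Optional, Sequence


def extract_goal(
    record: Dict[str, Any],
    goal_field: str,
    fallback_fields: Sequence[str] = (),
) -> Optional[str]:
    """Hypothetical helper: return the first non-empty string found in
    goal_field, then in each fallback field in order; None if nothing fits."""
    for field in (goal_field, *fallback_fields):
        value = record.get(field)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None


# Made-up records standing in for dataset rows.
records = [
    {"prompt": "Summarize the report"},
    {"instruction": "Translate the memo"},  # no "prompt" key, fallback is used
    {"id": 3},                              # no usable field, row is skipped
]

goals = [
    goal
    for goal in (extract_goal(r, "prompt", ["instruction", "text"]) for r in records)
    if goal is not None
]
print(goals)
```

Under these assumptions, rows that lack the primary field still contribute a goal as long as one of the fallback fields holds text.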
load_goals
def load_goals(limit: Optional[int] = None,
               shuffle: bool = False,
               seed: Optional[int] = None,
               **kwargs) -> List[str]
Load goals from the HuggingFace dataset.
Arguments:
limit- Maximum number of goals to return.
shuffle- Whether to shuffle the dataset before selecting.
seed- Random seed for shuffling.
**kwargs- Additional arguments (unused).
Returns:
List of goal strings.
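The way limit, shuffle, and seed combine can be illustrated with a plain-Python sketch. It follows the order described above (shuffle first, then select), but select_goals is a hypothetical stand-in, not the provider's implementation, and the use of random.Random for seeding is an assumption.

```python
import random
from typing import List, Optional, Sequence


def select_goals(
    goals: Sequence[str],
    limit: Optional[int] = None,
    shuffle: bool = False,
    seed: Optional[int] = None,
) -> List[str]:
    """Hypothetical stand-in: optionally shuffle, then keep at most `limit` goals."""
    selected = list(goals)  # copy so the caller's sequence is left untouched
    if shuffle:
        # A dedicated Random instance keeps the result reproducible for a
        # given seed without touching the global random state.
        random.Random(seed).shuffle(selected)
    if limit is not None:
        selected = selected[:limit]
    return selected


all_goals = [f"goal-{i}" for i in range(10)]

first_three = select_goals(all_goals, limit=3)
sample_a = select_goals(all_goals, limit=3, shuffle=True, seed=42)
sample_b = select_goals(all_goals, limit=3, shuffle=True, seed=42)
```

In this sketch, passing the same seed yields the same subset on every call, while omitting shuffle simply returns the first limit goals in dataset order.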
get_metadata
def get_metadata() -> Dict[str, Any]
Return metadata about the loaded dataset.