hackagent.datasets.providers.huggingface

HuggingFace dataset provider for loading goals from HuggingFace Hub.

HuggingFaceDatasetProvider Objects

class HuggingFaceDatasetProvider(DatasetProvider)

Dataset provider for HuggingFace Hub datasets.

This provider loads datasets from HuggingFace Hub and extracts goal strings from specified fields. It supports filtering, splitting, and limiting samples.

Example:

provider = HuggingFaceDatasetProvider({
    "path": "ai-safety-institute/AgentHarm",
    "name": "harmful",
    "goal_field": "prompt",
    "split": "test_public",
})
goals = provider.load_goals(limit=100)

__init__

def __init__(config: Dict[str, Any])

Initialize the HuggingFace dataset provider.

Arguments:

  • config - Configuration dictionary with keys:
    • path (str): HuggingFace dataset path (e.g., "ai-safety-institute/AgentHarm")
    • goal_field (str): Field name containing the goal/prompt text
    • split (str, optional): Dataset split to use (default: "test")
    • name (str, optional): Dataset configuration name
    • fallback_fields (list, optional): Alternative field names to try if goal_field is not found
    • trust_remote_code (bool, optional): Whether to trust remote code (default: False)
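
The interaction between goal_field and fallback_fields can be pictured with a small standalone sketch. It is not the provider's actual implementation: extract_goal is a hypothetical helper, the records are made-up plain dictionaries, and the rule of skipping empty or non-string values is an assumption about how a missing field might be handled.

```python
from typing import Any, Dict, List, Optional


def extract_goal(
    record: Dict[str, Any],
    goal_field: str,
    fallback_fields: Optional[List[str]] = None,
) -> Optional[str]:
    """Hypothetical helper: return the first non-empty string found.

    Tries goal_field first, then each fallback field in order.
    Returns None when no candidate field holds a usable string.
    """
    for field in [goal_field, *(fallback_fields or [])]:
        value = record.get(field)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None


# Made-up records standing in for dataset rows (assumption, not real data).
records = [
    {"prompt": "Summarize the report"},
    {"instruction": "Translate the memo"},
    {"id": 3},
]

goals = [extract_goal(r, "prompt", ["instruction"]) for r in records]
print(goals)
```

In this sketch the second record has no "prompt" field, so the fallback "instruction" is used, and the third record yields None because neither field is present.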

load_goals

def load_goals(limit: Optional[int] = None,
               shuffle: bool = False,
               seed: Optional[int] = None,
               **kwargs) -> List[str]

Load goals from the HuggingFace dataset.

Arguments:

  • limit - Maximum number of goals to return.
  • shuffle - Whether to shuffle the dataset before selecting.
  • seed - Random seed for shuffling.
  • **kwargs - Additional arguments (unused).

Returns:

List of goal strings.
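
The way limit, shuffle, and seed combine can be illustrated with a minimal sketch on an in-memory list. This assumes shuffling happens before the limit is applied, as the description of shuffle suggests; select_goals is a hypothetical stand-in, not the provider's method, and the real provider may shuffle through the HuggingFace datasets library rather than Python's random module.

```python
import random
from typing import List, Optional


def select_goals(
    goals: List[str],
    limit: Optional[int] = None,
    shuffle: bool = False,
    seed: Optional[int] = None,
) -> List[str]:
    """Hypothetical stand-in: shuffle (optionally seeded), then truncate."""
    selected = list(goals)  # copy so the caller's list is left untouched
    if shuffle:
        # A dedicated Random instance keeps the global RNG state unchanged.
        random.Random(seed).shuffle(selected)
    if limit is not None:
        selected = selected[:limit]
    return selected


pool = [f"goal-{i}" for i in range(10)]

first = select_goals(pool, limit=3, shuffle=True, seed=42)
second = select_goals(pool, limit=3, shuffle=True, seed=42)
unshuffled = select_goals(pool, limit=3)
```

With the same seed, first and second contain the same three goals, which is the reason to pass seed when a reproducible sample is needed. Without shuffle, the first three items are returned in their original order.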

get_metadata

def get_metadata() -> Dict[str, Any]

Return metadata about the loaded dataset.