hackagent.attacks.base
BaseAttack Objects
class BaseAttack(abc.ABC)
Abstract base class for black-box attacks against language models.
This class provides the foundational interface and structure that all attack implementations must follow. It handles common initialization patterns and enforces a consistent API across different attack types.
Attributes:
config - A dictionary containing configuration settings for the attack.
__init__
def __init__(config: Dict[str, Any])
Initializes the attack with configuration parameters.
Arguments:
config - A dictionary containing configuration settings for the attack. Must include all required parameters for the specific attack type.
Raises:
TypeError - If config is not a dictionary.
ValueError - If required configuration parameters are missing or invalid.
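The constructor contract above can be illustrated with a minimal sketch. It does not import hackagent. SketchBaseAttack is a stand-in written for this example, and REQUIRED_KEYS, EchoAttack, and the max_attempts key are hypothetical names, so the real class may validate its configuration differently.

```python
import abc
from typing import Any, Dict, Tuple


class SketchBaseAttack(abc.ABC):
    """Stand-in mirroring the documented constructor contract (not library code)."""

    # Hypothetical hook: subclasses list the config keys they require.
    REQUIRED_KEYS: Tuple[str, ...] = ()

    def __init__(self, config: Dict[str, Any]) -> None:
        # Documented behavior: a non-dict config raises TypeError.
        if not isinstance(config, dict):
            raise TypeError(f"config must be a dict, got {type(config).__name__}")
        # Documented behavior: missing required parameters raise ValueError.
        missing = [key for key in self.REQUIRED_KEYS if key not in config]
        if missing:
            raise ValueError(f"missing required config keys: {missing}")
        self.config = config

    @abc.abstractmethod
    def run(self, **kwargs: Any) -> Any:
        """Execute the attack logic."""


class EchoAttack(SketchBaseAttack):
    """Hypothetical subclass that needs a single config key."""

    REQUIRED_KEYS = ("max_attempts",)

    def run(self, **kwargs: Any) -> Any:
        return {"attempts": self.config["max_attempts"]}


attack = EchoAttack({"max_attempts": 3})
```

Declaring the required keys as a class attribute is only one possible design. It keeps the validation in the base constructor, so each subclass states what it needs without repeating the checks.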
run
@abc.abstractmethod
def run(**kwargs: Any) -> Any
Executes the attack logic.
This abstract method must be implemented by all attack subclasses to define their specific attack methodology and execution flow.
Arguments:
**kwargs - Attack-specific arguments that vary by implementation. Common examples include:
- input_prompts: List of prompts to test
- goals: List of target goals for the attack
- dataset: Input dataset for evaluation
- target_model: The model to attack
Returns:
Attack-specific results. The format varies by implementation but typically includes:
- adversarial_examples: Generated adversarial inputs
- success_metrics: Attack success rates and statistics
- detailed_results: Comprehensive result data (e.g., pandas DataFrame)
- attack_report: Summary of attack performance
Raises:
NotImplementedError - If the method is not implemented by a subclass.
RuntimeError - If the attack execution fails due to configuration or runtime errors.
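As a rough illustration of how a subclass might implement run, the sketch below uses a stand-in base class rather than importing hackagent. SuffixAttack, toy_model, the suffix config key, and the "did not refuse" success criterion are all assumptions made for this example. The returned keys follow the typical result fields listed above.

```python
import abc
from typing import Any, Callable, Dict, List


class SketchBaseAttack(abc.ABC):
    """Stand-in for the documented interface; not the library's implementation."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config

    @abc.abstractmethod
    def run(self, **kwargs: Any) -> Any:
        raise NotImplementedError


class SuffixAttack(SketchBaseAttack):
    """Hypothetical attack: appends a fixed suffix to every prompt."""

    def run(self, **kwargs: Any) -> Dict[str, Any]:
        prompts: List[str] = kwargs.get("input_prompts", [])
        target_model: Callable[[str], str] = kwargs["target_model"]
        suffix = self.config.get("suffix", "")

        adversarial = [prompt + suffix for prompt in prompts]
        responses = [target_model(prompt) for prompt in adversarial]

        # Toy success criterion (an assumption): the model did not refuse.
        successes = [not reply.startswith("REFUSED") for reply in responses]
        rate = sum(successes) / len(successes) if successes else 0.0
        return {
            "adversarial_examples": adversarial,
            "success_metrics": {"success_rate": rate},
        }


def toy_model(prompt: str) -> str:
    """Toy target: refuses unless the prompt ends with '!!'."""
    return "OK" if prompt.endswith("!!") else "REFUSED"


results = SuffixAttack({"suffix": "!!"}).run(
    input_prompts=["a", "b"], target_model=toy_model
)
```

One point of standard Python behavior to keep in mind: when run is declared with abc.abstractmethod, instantiating the base class, or a subclass that does not override run, raises TypeError at construction time. A NotImplementedError can only surface if the base method body raises it and a subclass calls it explicitly, for example through super().run().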