hackagent.attacks.evaluator.evaluation_step
Base evaluation step for attack pipeline stages.
This module provides BaseEvaluationStep, the shared foundation for all
evaluation pipeline stages across attack techniques (AdvPrefix, FlipAttack, etc.).
It centralises the common logic that was previously duplicated:
- Multi-judge evaluation orchestration
- Judge type inference from model identifiers
- Agent type resolution (string / enum →
AgentTypeEnum) EvaluatorConfigconstruction from raw judge config dicts- Single evaluator instantiation and execution
- Result merging via lookup keys
(goal, prefix, completion) - Server sync via
sync_evaluation_to_server - Best-score computation across judge columns
- ASR logging
Subclasses only need to implement execute() and, optionally, override
configuration or data-transformation hooks.
Usage: from hackagent.attacks.evaluator.evaluation_step import BaseEvaluationStep
class MyEvaluation(BaseEvaluationStep):
def execute(self, input_data):
...
BaseEvaluationStep Objects
class BaseEvaluationStep()
Shared foundation for evaluation pipeline stages.
Provides multi-judge evaluation, result merging, server sync,
best-score computation, and ASR logging. Subclasses implement
execute() with technique-specific data transformation.
__init__
def __init__(config: Dict[str, Any], logger: logging.Logger,
client: AuthenticatedClient)
Extract common tracking context and dependencies.
Arguments:
config- Step configuration dictionary (may contain_run_id,_client,_trackerinternal keys).logger- Logger instance.client-AuthenticatedClientfor backend API calls.
infer_judge_type
@staticmethod
def infer_judge_type(identifier: Optional[str],
default: Optional[str] = None) -> Optional[str]
Infer judge evaluator type from a model identifier string.
Checks for known substrings (harmbench, nuanced,
jailbreak) and returns the matching type key, or default.
resolve_agent_type
def resolve_agent_type(agent_type_value: Any) -> AgentTypeEnum
Convert a string, enum, or None into an AgentTypeEnum.
compute_best_score
def compute_best_score(item: Dict[str, Any]) -> float
Return the best (max) binary score across all judge columns.
prepare_and_sync
def prepare_and_sync(evaluated_items: list, run_id: str)
Prepare evaluated items for backend sync:
- Add _run_id if missing
- Ensure result_id exists
- Build judge_keys
- Call _sync_to_server (only if not already synced by the attack)
get_statistics
def get_statistics() -> Dict[str, Any]
Return a copy of execution statistics.