hackagent.attacks.evaluator.evaluation_step

Base evaluation step for attack pipeline stages.

This module provides BaseEvaluationStep, the shared foundation for all evaluation pipeline stages across attack techniques (AdvPrefix, FlipAttack, etc.).

It centralises the common logic that was previously duplicated:

Multi-judge evaluation orchestration
Judge type inference from model identifiers
Agent type resolution (string / enum → AgentTypeEnum)
EvaluatorConfig construction from raw judge config dicts
Single evaluator instantiation and execution
Result merging via lookup keys (goal, prefix, completion)
Server sync via sync_evaluation_to_server
Best-score computation across judge columns
ASR logging
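Result merging via lookup keys can be pictured as follows. This is a minimal sketch, assuming each judge returns rows that carry the same goal, prefix and completion fields as the input rows. The helper name merge_judge_results and the column name eval_hb are hypothetical, not part of the hackagent API.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical sketch: match judge output rows back to the original rows
# using the (goal, prefix, completion) triple as the lookup key.
LookupKey = Tuple[str, str, str]


def _key(row: Dict[str, Any]) -> LookupKey:
    # Missing fields default to "" so rows without a prefix can still match.
    return (row.get("goal", ""), row.get("prefix", ""), row.get("completion", ""))


def merge_judge_results(
    rows: List[Dict[str, Any]],
    judge_rows: List[Dict[str, Any]],
    judge_columns: List[str],
) -> List[Dict[str, Any]]:
    """Copy judge_columns from judge_rows into the matching rows."""
    # If judge_rows contains duplicate keys, the last row wins.
    lookup = {_key(r): r for r in judge_rows}
    merged = []
    for row in rows:
        out = dict(row)  # shallow copy: the caller's rows are left unchanged
        match = lookup.get(_key(row))
        if match is not None:
            for col in judge_columns:
                if col in match:
                    out[col] = match[col]
        merged.append(out)
    return merged
```

Rows that no judge scored keep their original fields and simply lack the judge column, which leaves the decision of how to treat missing scores to later steps.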

Subclasses only need to implement execute() and, optionally, override configuration or data-transformation hooks.

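Usage:

from hackagent.attacks.evaluator.evaluation_step import BaseEvaluationStep

class MyEvaluation(BaseEvaluationStep):
    def execute(self, input_data):
        # Technique-specific data transformation goes here; the shared
        # helpers (judging, merging, sync, ASR logging) come from the base class.
        ...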

BaseEvaluationStep Objects

class BaseEvaluationStep()

Shared foundation for evaluation pipeline stages.

Provides multi-judge evaluation, result merging, server sync, best-score computation, and ASR logging. Subclasses implement execute() with technique-specific data transformation.

__init__

def __init__(config: Dict[str, Any], logger: logging.Logger,
             client: AuthenticatedClient)

Extract common tracking context and dependencies.

Arguments:

config - Step configuration dictionary (may contain _run_id, _client, _tracker internal keys).
logger - Logger instance.
client - AuthenticatedClient for backend API calls.
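The extraction of tracking context from the internal config keys might look like the sketch below. It assumes that absent keys become None and that an explicit client argument takes precedence over config["_client"]; the class name EvaluationStepSketch and the _statistics attribute are hypothetical.

```python
import logging
from typing import Any, Dict, Optional


class EvaluationStepSketch:
    """Hypothetical stand-in showing how internal config keys might be read."""

    def __init__(self, config: Dict[str, Any], logger: logging.Logger, client: Any) -> None:
        self.config = config
        self.logger = logger
        # Internal tracking keys named in the docs; absent keys become None.
        self.run_id: Optional[str] = config.get("_run_id")
        self.tracker: Any = config.get("_tracker")
        # Assumption: an explicit client wins; fall back to config["_client"].
        self.client = client if client is not None else config.get("_client")
        # Hypothetical counters later exposed through get_statistics().
        self._statistics: Dict[str, Any] = {}
```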

infer_judge_type

@staticmethod
def infer_judge_type(identifier: Optional[str],
                     default: Optional[str] = None) -> Optional[str]

Infer judge evaluator type from a model identifier string.

Checks the identifier for known substrings (harmbench, nuanced, jailbreak) and returns the matching judge type key, or default if none match.
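A minimal sketch of substring-based inference, assuming matching is case-insensitive and that the returned type key is the matched substring itself. Both are assumptions; the real keys may differ.

```python
from typing import Optional

# Substrings named in the docs, checked in this order (first match wins).
# Assumption: the returned key equals the substring.
_KNOWN_JUDGE_TYPES = ("harmbench", "nuanced", "jailbreak")


def infer_judge_type(identifier: Optional[str], default: Optional[str] = None) -> Optional[str]:
    if not identifier:
        return default
    lowered = identifier.lower()  # assumption: case-insensitive matching
    for key in _KNOWN_JUDGE_TYPES:
        if key in lowered:
            return key
    return default
```

Because the first match wins, an identifier that contains more than one known substring resolves to whichever appears earliest in the tuple.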

resolve_agent_type

def resolve_agent_type(agent_type_value: Any) -> AgentTypeEnum

Convert a string, enum, or None into an AgentTypeEnum.
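One way to normalise mixed inputs is sketched below. The stand-in AgentTypeEnum and its members (UNKNOWN, LITELLM, GOOGLE_ADK) are hypothetical, as is the choice to map None and unrecognised values to UNKNOWN.

```python
from enum import Enum
from typing import Any


class AgentTypeEnum(str, Enum):
    # Hypothetical members for illustration; the real enum may differ.
    UNKNOWN = "UNKNOWN"
    LITELLM = "LITELLM"
    GOOGLE_ADK = "GOOGLE_ADK"


def resolve_agent_type(agent_type_value: Any) -> AgentTypeEnum:
    # Check the target enum first: a str-mixin enum member is also a str.
    if isinstance(agent_type_value, AgentTypeEnum):
        return agent_type_value
    if agent_type_value is None:
        return AgentTypeEnum.UNKNOWN  # assumption: None maps to a fallback
    if isinstance(agent_type_value, Enum):
        agent_type_value = agent_type_value.value  # foreign enum: use its value
    if isinstance(agent_type_value, str):
        try:
            return AgentTypeEnum(agent_type_value.upper())
        except ValueError:
            return AgentTypeEnum.UNKNOWN
    return AgentTypeEnum.UNKNOWN
```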

compute_best_score

def compute_best_score(item: Dict[str, Any]) -> float

Return the best (max) binary score across all judge columns.
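The max-over-judges rule, together with the attack success rate (ASR) that the module logs, can be sketched as follows. The judge column names and the helper attack_success_rate are hypothetical; the sketch treats a best score of 1 as a successful attack.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical judge column names; the real ones depend on the configured judges.
DEFAULT_JUDGE_COLUMNS = ("eval_hb", "eval_nj", "eval_jb")


def compute_best_score(item: Dict[str, Any], judge_columns: Sequence[str] = DEFAULT_JUDGE_COLUMNS) -> float:
    scores = []
    for col in judge_columns:
        value = item.get(col)
        # None or non-numeric values (e.g. a judge that failed) are skipped.
        if isinstance(value, (int, float)):
            scores.append(float(value))
    return max(scores, default=0.0)


def attack_success_rate(items: List[Dict[str, Any]]) -> float:
    # Fraction of items that at least one judge marked as successful.
    if not items:
        return 0.0
    return sum(compute_best_score(i) >= 1.0 for i in items) / len(items)
```

Taking the maximum means an item counts as a success as soon as any single judge flags it, which is the most permissive way to combine judges.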

prepare_and_sync

def prepare_and_sync(evaluated_items: list, run_id: str)

Prepare evaluated items for backend sync:

Add _run_id if missing
Ensure result_id exists
Build judge_keys
Call _sync_to_server (only if not already synced by the attack)
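The four preparation steps can be sketched as a standalone function. The sync_fn callable (standing in for _sync_to_server), the already_synced flag, and the shape of judge_keys (here, the judge columns present in any item) are all hypothetical. Ensuring result_id exists is modelled as guaranteeing the key is present; the real step may look the ID up instead.

```python
from typing import Any, Callable, Dict, List

SyncFn = Callable[[List[Dict[str, Any]], List[str]], None]


def prepare_and_sync(
    evaluated_items: List[Dict[str, Any]],
    run_id: str,
    sync_fn: SyncFn,
    judge_columns: List[str],
    already_synced: bool = False,
) -> List[str]:
    for item in evaluated_items:
        item.setdefault("_run_id", run_id)  # 1. add _run_id only if missing
        item.setdefault("result_id", None)  # 2. assumption: ensure the key is present
    # 3. hypothetical judge_keys: judge columns present in at least one item
    judge_keys = [c for c in judge_columns if any(c in item for item in evaluated_items)]
    if not already_synced:  # 4. skip if the attack already synced
        sync_fn(evaluated_items, judge_keys)
    return judge_keys
```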

get_statistics

def get_statistics() -> Dict[str, Any]

Return a copy of execution statistics.
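Returning a copy keeps callers from mutating the step's internal counters. The sketch below uses a deep copy so nested counters are protected too; whether the real method copies shallowly or deeply is not stated, and the StatsHolder class and its record() method are hypothetical.

```python
import copy
from typing import Any, Dict


class StatsHolder:
    """Hypothetical minimal class illustrating a defensive statistics copy."""

    def __init__(self) -> None:
        self._statistics: Dict[str, Any] = {"evaluated": 0, "per_judge": {}}

    def record(self, judge: str) -> None:
        self._statistics["evaluated"] += 1
        per_judge = self._statistics["per_judge"]
        per_judge[judge] = per_judge.get(judge, 0) + 1

    def get_statistics(self) -> Dict[str, Any]:
        # Deep copy: edits to the returned dict, including nested ones,
        # never reach the internal state.
        return copy.deepcopy(self._statistics)
```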
