hackagent.attacks.evaluator.base

Base class for LLM-based judge evaluators.

This module provides the abstract base class BaseJudgeEvaluator and the AssertionResult dataclass used by all judge evaluator implementations.

The base class implements a template-method evaluate() that handles the full pipeline of preparing data, filtering short responses, sending to the judge model, and mapping results back. Subclasses only need to implement:

_get_request_data_for_row(row) — format the LLM prompt
_parse_response_content(content, index) — parse the LLM reply

It also implements a DSPy-inspired assert-and-retry loop for robust judge output parsing.

Usage: from hackagent.attacks.evaluator.base import ( BaseJudgeEvaluator, AssertionResult, )

AssertionResult Objects

@dataclass(frozen=True)
class AssertionResult()

Result of a judge output assertion check (DSPy-inspired).

In DSPy, an assertion validates that a module's output satisfies a typed contract. Here the contract is: "the judge must return a parseable verdict."

Attributes:

score - Parsed evaluation score (0 or 1).
explanation - Human-readable explanation of the verdict.
is_confident - True if the parser matched with high confidence (strategies 1-3). False if it fell back to the "Unknown" default — the signal that a retry is worthwhile.

BaseJudgeEvaluator Objects

class BaseJudgeEvaluator(ABC)

Abstract base class for LLM-based judge evaluators.

Provides a template-method evaluate() that handles the full pipeline of preparing data, filtering short responses, sending to the judge model, and mapping results back. Subclasses only need to implement:

_get_request_data_for_row(row) — format the LLM prompt
_parse_response_content(content, index) — parse the LLM reply

Class attributes for subclasses: eval_column (str): Column name for the evaluation score. explanation_column (str): Column name for the explanation. PROMPT (str): Prompt template for the judge. skip_length_filter (bool): If True, don't filter by response length.

init

def __init__(client: AuthenticatedClient,
             config: Any,
             run_id: Optional[str] = None,
             tracking_client: Optional[AuthenticatedClient] = None,
             tracker: Optional["Tracker"] = None)

Initialize the judge evaluator.

Arguments:

client - Authenticated client for API access.
config - EvaluatorConfig dataclass with model and eval settings.
run_id - Optional run ID for result tracking.
tracking_client - Optional dedicated tracking client.
tracker - Optional Tracker for per-goal result tracking.

prepare_responses

def prepare_responses(data: List[Dict[str, Any]]) -> List[Dict[str, Any]]

Prepare and standardize response data for evaluation processing.

evaluate

def evaluate(data: List[Dict[str, Any]]) -> List[Dict[str, Any]]

Evaluate responses using this judge. Template method.

Pipeline:

Prepare responses (standardize keys)
Add tracking indices
Split into filtered (short) and processable rows
Mark filtered rows with score 0
Process rows via judge LLM
Map results back by index
Clean up temporary indices

Subclasses control filtering via skip_length_filter.

AssertionResult Objects​

BaseJudgeEvaluator Objects​

__init__​

prepare_responses​

evaluate​

AssertionResult Objects

BaseJudgeEvaluator Objects

init

prepare_responses

evaluate