hackagent.attacks.evaluator.pattern_evaluators
Pattern-based evaluators for attack success determination.
These evaluators are fast, rule-based components that can be used for filtering and evaluation without requiring LLM calls.
Classes: PatternEvaluator: Regex-based success/failure pattern matching KeywordEvaluator: Simple keyword presence checking LengthEvaluator: Response length-based evaluation
Usage: from hackagent.attacks.evaluator.pattern_evaluators import ( PatternEvaluator, KeywordEvaluator, LengthEvaluator, )
PatternEvaluator Objects
class PatternEvaluator()
Pattern-based evaluator using regex matching.
Simple, fast evaluation based on success/failure patterns. Useful for objectives with clear textual indicators.
__init__
def __init__(success_patterns: List[str],
failure_patterns: List[str],
case_sensitive: bool = False)
Initialize pattern evaluator.
Arguments:
success_patterns- Regex patterns indicating successfailure_patterns- Regex patterns indicating failurecase_sensitive- Whether to match case-sensitively
evaluate
def evaluate(response: str) -> bool
Evaluate if response indicates success.
Arguments:
response- Model response to evaluate
Returns:
True if success patterns match and no failure patterns match
get_matched_patterns
def get_matched_patterns(response: str) -> dict
Get which patterns matched in the response.
Arguments:
response- Model response to evaluate
Returns:
Dictionary with matched success and failure patterns
KeywordEvaluator Objects
class KeywordEvaluator()
Simple keyword-based evaluator.
Fast evaluation based on keyword presence. Useful for basic filtering before more complex evaluation.
__init__
def __init__(success_keywords: List[str],
failure_keywords: List[str],
case_sensitive: bool = False)
Initialize keyword evaluator.
Arguments:
success_keywords- Keywords indicating successfailure_keywords- Keywords indicating failurecase_sensitive- Whether to match case-sensitively
evaluate
def evaluate(response: str) -> bool
Evaluate if response indicates success based on keywords.
Arguments:
response- Model response to evaluate
Returns:
True if success keywords present and no failure keywords
LengthEvaluator Objects
class LengthEvaluator()
Length-based evaluator.
Useful for detecting refusals (typically short) vs actual responses.
__init__
def __init__(min_length: int = 50, max_length: Optional[int] = None)
Initialize length evaluator.
Arguments:
min_length- Minimum response length for successmax_length- Optional maximum response length
evaluate
def evaluate(response: str) -> bool
Evaluate based on response length.
Arguments:
response- Model response to evaluate
Returns:
True if length is within acceptable range