AI Risks
HackAgent structures evaluation with three elements:
- Risk Profile: the set of one or more risk macro-categories (or micro-categories) to be tested.
- Evaluation Campaign: the executable evaluation plan (datasets, attacks, objective, metrics).
- Results: measured outcomes (for example ASR and judge score) used for tracking and comparison.
Security Workflow
Reference Risk Macro-Categories
The current taxonomy includes the following risk macro-categories:
- Cybersecurity
- Accountability & Transparency
- Explainability & Interpretability
- Fairness
- Validity, Accuracy & Robustness
- Data Governance
- Data Privacy
- People & Social Impact
- Safety
- Sustainability & Environmental Risk
- Maintainability & AI Infrastructure
- Third Party Management
Micro-Category Example
- Jailbreak is a risk micro-category under Cybersecurity.
What Is a Risk Profile
A risk profile is a set of one or more risk macro-categories (or micro-categories) that you want to test.
note
The defined categories are not intended to be mutually exclusive, nor to form an exhaustive partition of the risk space. Individual risks may span multiple categories, and overlaps are inherent to the taxonomy.
What Is Implemented Today
- Implemented Risk Macro-Category: Cybersecurity
- Implemented Risk Micro-Category: Jailbreak (under Cybersecurity)
- Current quick flow: Evaluation Campaign
Core Concepts
| Concept | Meaning |
|---|---|
| Risk Profile | Defines the scope of the assessment as one or more macro-categories (or micro-categories). |
| Evaluation Campaign | Operationalizes the risk profile into an executable plan: datasets, attacks, objective, and metrics. |
For the currently implemented configuration, see Jailbreak.
Documentation Guide
| Page | Description |
|---|---|
| Jailbreak | Cybersecurity risk profile + Jailbreak micro-category reference |
| Evaluation Campaign | Run the 3 primary jailbreak attacks in sequence |
Related Resources
- Evaluation Tutorial - Deep dive into PAIR attack configuration
- Attacks - Full catalog of available attack techniques
- CLI Reference - Command-line workflows for automated testing
- SDK Reference - Programmatic API documentation