HackAgent - AI Agent Security Testing Toolkit
The Open-Source AI Security Red-Team Toolkit

Discover vulnerabilities in your AI agents before attackers do.


What is HackAgent?

HackAgent is a comprehensive Python SDK and CLI designed to help security researchers, developers, and AI safety practitioners evaluate and strengthen the security of AI agents.

Figure: HackAgent CLI demo, an interactive TUI with real-time attack progress and visualizations.

As AI agents become more powerful and autonomous, they face unique security challenges that traditional testing tools can't address:

Threat            Description
Prompt Injection  Malicious inputs that hijack agent behavior
Jailbreaking      Bypassing safety guardrails and content filters
Goal Hijacking    Manipulating agents to pursue unintended objectives
Tool Misuse       Exploiting agent capabilities for unauthorized actions

HackAgent automates testing for these vulnerabilities using research-backed attack techniques, helping you identify and fix security issues before they're exploited in the real world.


Get Started Now

Quick Install
python3 -m venv .venv
source .venv/bin/activate
pip install git+https://github.com/AISecurityLab/HackAgent.git
Works locally out of the box, with optional cloud sync via HACKAGENT_API_KEY
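
The note above says cloud sync is optional and controlled by HACKAGENT_API_KEY. As a minimal sketch, the check below reports whether that variable is set in the current environment. The helper name `cloud_sync_enabled` is hypothetical and not part of HackAgent; how the toolkit itself reads the variable is an assumption.

```python
import os


def cloud_sync_enabled(environ=None):
    """Return True when HACKAGENT_API_KEY is set to a non-empty value.

    Hypothetical helper for illustration only; it inspects the environment
    and does not call into HackAgent.
    """
    env = os.environ if environ is None else environ
    return bool(env.get("HACKAGENT_API_KEY", "").strip())


if __name__ == "__main__":
    mode = "cloud sync requested" if cloud_sync_enabled() else "local only"
    print(f"HACKAGENT_API_KEY check: {mode}")
```

Running this inside the activated virtual environment before starting a test run is a quick way to confirm which mode the shell is configured for.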

Questions? Join our community discussions or email us at ais@ai4i.it


Architecture

HackAgent is built with a modular architecture that makes it easy to test any AI agent:

Inputs
  Goals: Custom Goals
  Datasets: AgentHarm, JailbreakBench, HarmBench, AdvBench, StrongREJECT, BeaverTails, SALAD-Bench, WMDP (Bio/Cyber/Chem), AIR-Bench, ToxicChat, HuggingFace Custom, File (JSON/CSV/JSONL/TXT)
HackAgent
  Attack Engine: AdvPrefix, AutoDAN-Turbo, PAIR, TAP, FlipAttack, BoN, h4rm3l, CipherChat, PAP, Baseline
  LLM Models: Attacker/Generator/Decorator, Summarizer (AutoDAN-Turbo only), Judge/Scorer, Category classifier
Your Agent
  Google ADK, OpenAI SDK, LiteLLM, LangChain, Ollama, vLLM
Output
  Results, Reports, Dashboard

Component      Description
Attack Engine  Orchestrates attacks using AdvPrefix, AutoDAN-Turbo, PAIR, TAP, FlipAttack, BoN, h4rm3l, CipherChat, PAP, and Baseline
Generator      LLM that creates adversarial prompts to test the target agent
Judge          LLM that evaluates whether attacks successfully bypassed safety measures
Target Agent   Your AI agent being tested (supports multiple frameworks)
Datasets       Pre-built benchmark presets plus custom HuggingFace/file datasets
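
The component roles above (Generator, Target Agent, Judge) can be sketched as a simple loop. This is an illustrative outline only, not HackAgent's API: `run_attack`, `AttackResult`, and the toy stand-in functions are hypothetical names, and the real attack techniques are far more involved than this retry loop.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AttackResult:
    goal: str
    prompt: str
    response: str
    success: bool


def run_attack(
    goals: List[str],
    generator: Callable[[str, int], str],  # builds a test prompt for a goal
    target: Callable[[str], str],          # the agent under test
    judge: Callable[[str, str], bool],     # decides whether the attempt succeeded
    max_attempts: int = 3,
) -> List[AttackResult]:
    """Hypothetical generator -> target -> judge loop (illustration only)."""
    results = []
    for goal in goals:
        for attempt in range(max_attempts):
            prompt = generator(goal, attempt)
            response = target(prompt)
            success = judge(goal, response)
            results.append(AttackResult(goal, prompt, response, success))
            if success:
                break  # stop retrying this goal once the judge flags a bypass
    return results


# Toy stand-ins so the sketch runs without any LLM or network access.
def toy_generator(goal: str, attempt: int) -> str:
    return f"[attempt {attempt}] {goal}"


def toy_target(prompt: str) -> str:
    # Pretend the agent refuses the first attempt and complies afterwards.
    return "I can't help with that." if "attempt 0" in prompt else "Sure: ..."


def toy_judge(goal: str, response: str) -> bool:
    return not response.startswith("I can't")


if __name__ == "__main__":
    for r in run_attack(["example goal"], toy_generator, toy_target, toy_judge):
        print(r.prompt, "->", "bypassed" if r.success else "refused")
```

Keeping the three roles as plain callables mirrors the table's separation of concerns: any one of them can be swapped (for example, a different judge) without touching the loop.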

Supported Frameworks

Google ADK, OpenAI SDK, LiteLLM, LangChain

Responsible Use

HackAgent is designed for authorized security testing only. Always obtain explicit permission before testing any AI system.

Do
• Test your own agents
• Conduct authorized pentesting
• Follow coordinated disclosure
• Share knowledge responsibly
Don't
• Test without permission
• Exploit vulnerabilities maliciously
• Violate terms of service
• Share exploits irresponsibly

Read our full Responsible Use Guidelines →