HackAgent

The Open-Source AI Security Red-Team Toolkit

Discover vulnerabilities in your AI agents before attackers do.

What is HackAgent?

HackAgent is a comprehensive Python SDK and CLI designed to help security researchers, developers, and AI safety practitioners evaluate and strengthen the security of AI agents.

Interactive TUI with real-time attack progress and beautiful visualizations

As AI agents become more powerful and autonomous, they face unique security challenges that traditional testing tools can't address:

Threat	Description
Prompt Injection	Malicious inputs that hijack agent behavior
Jailbreaking	Bypassing safety guardrails and content filters
Goal Hijacking	Manipulating agents to pursue unintended objectives
Tool Misuse	Exploiting agent capabilities for unauthorized actions

HackAgent automates testing for these vulnerabilities using research-backed attack techniques, helping you identify and fix security issues before they're exploited in the real world.

Get Started Now

Quick Install

python3 -m venv .venv
source .venv/bin/activate
pip install git+https://github.com/AISecurityLab/HackAgent.git

Works locally out of the box, with optional cloud sync via HACKAGENT_API_KEY

Try the Platform Quick Start Star on GitHub

Next Steps

Installation CLI Reference Attack Techniques Integrations

Questions? Join our community discussions or email us at ais@ai4i.it

Architecture

HackAgent is built with a modular architecture that makes it easy to test any AI agent:

Inputs

Goals

Custom Goals

Datasets

AgentHarmJailbreakBenchHarmBenchAdvBenchStrongREJECTBeaverTailsSALAD-BenchWMDP (Bio/Cyber/Chem)AIR-BenchToxicChatHuggingFace CustomFile (JSON/CSV/JSONL/TXT)

↓

HackAgent

Attack Engine

AdvPrefixAutoDAN-TurboPAIRTAPFlipAttackBoNh4rm3lCipherChatPAPBaseline

LLM Models

Attacker/Generator/DecoratorSummarizer (AutoDAN-Turbo only)Judge/ScorerCategory classifier

⇄

Your Agent

Google ADKOpenAI SDKLiteLLMLangChainOllamavLLM

↓

Output

ResultsReportsDashboard

Component	Description
Attack Engine	Orchestrates attacks using AdvPrefix, AutoDAN-Turbo, PAIR, TAP, FlipAttack, BoN, h4rm3l, CipherChat, PAP, and Baseline
Generator	LLM that creates adversarial prompts to test the target agent
Judge	LLM that evaluates whether attacks successfully bypassed safety measures
Target Agent	Your AI agent being tested (supports multiple frameworks)
Datasets	Pre-built benchmark presets plus custom HuggingFace/file datasets

Supported Frameworks

Responsible Use

HackAgent is designed for authorized security testing only. Always obtain explicit permission before testing any AI system.

✅ Do

• Test your own agents

• Conduct authorized pentesting

• Follow coordinated disclosure

• Share knowledge responsibly

❌ Don't

• Test without permission

• Exploit vulnerabilities maliciously

• Violate terms of service

• Share exploits irresponsibly

Read our full Responsible Use Guidelines →

What is HackAgent?​

Get Started Now​

Architecture​

Supported Frameworks​

Responsible Use​

What is HackAgent?

Get Started Now

Architecture

Supported Frameworks

Responsible Use