Awesome AI for Security 
A curated list of tools, papers, and datasets for applying AI to cybersecurity tasks. The list focuses primarily on modern AI technologies such as Large Language Models (LLMs), agents, and multi-modal systems, and on their applications in security operations.
Found this resource helpful for your security research? Consider adding a star ⭐ to support the ongoing curation effort.
Contents
- Models
- Datasets
- Benchmarks & Evaluation
- Publications
- Tools
- Security Agents
- Contribute
- License
Models
AI models specialized for security applications and scenarios.
Specialized Security Models
- Foundation-Sec-8B - Cisco's 8B-parameter security-specialized model built on Llama 3.1, outperforming the Llama 3.1 8B base by +3.25% on CTI-MCQA and +8.83% on CTI-RCM and rivaling 70B-class models with roughly a tenth of the parameters (see the loading sketch after this list).
- Llama-Primus-Base - Foundation model with cybersecurity-specific pretraining on proprietary corpus.
- Llama-Primus-Merged - Model combining cybersecurity pretraining with instruction fine-tuning.
- Llama-Primus-Reasoning - Reasoning-specialized model fine-tuned on o1-distilled reasoning traces to improve performance on security-certification-style questions.
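These models are published as ordinary Hugging Face checkpoints, so they load with the standard transformers API. Below is a minimal sketch; the fdtn-ai/Foundation-Sec-8B model ID and the bf16 memory estimate are assumptions, so check the model card (including whether a chat template applies) before relying on it.

```python
# Minimal sketch: loading a security-specialized checkpoint with Hugging Face
# transformers. The model ID is an assumption -- verify it on the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "fdtn-ai/Foundation-Sec-8B"  # assumed Hugging Face ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # roughly 16 GB of memory for an 8B model in bf16
    device_map="auto",
)

# Completion-style prompt; consult the model card for whether a chat template applies.
prompt = "CVE-2021-44228, also known as Log4Shell, is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Swapping in one of the Llama-Primus variants only changes the model ID; the rest of the loading code stays the same.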
Datasets
Resources designed for training and fine-tuning AI systems on security-related tasks.
Pre-Training Datasets
- Primus-FineWeb - Filtered cybersecurity corpus (2.57B tokens) derived from FineWeb using classifier-based selection.
Instruction Fine-Tuning (IFT) & Capability Datasets
- Primus-Reasoning - Cybersecurity reasoning tasks with o1-generated reasoning steps and reflection processes.
- Primus-Instruct - Expert-curated cybersecurity scenario instructions with GPT-4o-generated responses spanning diverse tasks (see the dataset-loading sketch after this list).
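The Primus datasets are distributed through the Hugging Face Hub, so they can be pulled with the datasets library. The sketch below is illustrative only: the trendmicro-ailab dataset IDs and split names are assumptions, so confirm them (and the licenses) on the dataset cards before wiring them into a training pipeline.

```python
# Minimal sketch: pulling the Primus corpora from the Hugging Face Hub.
# Dataset IDs and split names are assumptions -- confirm them on the Hub first.
from datasets import load_dataset

# Pre-training corpus (~2.57B tokens): stream it rather than downloading everything.
fineweb = load_dataset(
    "trendmicro-ailab/Primus-FineWeb",  # assumed dataset ID
    split="train",
    streaming=True,
)
for i, record in enumerate(fineweb):
    print(record.keys())  # inspect the schema of the first few records
    if i >= 2:
        break

# Instruction-tuning data is small enough to load eagerly.
instruct = load_dataset(
    "trendmicro-ailab/Primus-Instruct",  # assumed dataset ID
    split="train",
)
print(f"{len(instruct)} instruction examples")
```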
Benchmarks & Evaluation
This section covers frameworks and methodologies for evaluating AI systems within security contexts.
Vulnerability Assessment
- AutoPatchBench - Benchmark for evaluating automated repair of fuzzing-detected vulnerabilities.
- SecLLMHolmes - Automated framework for systematic LLM vulnerability detection evaluation across multiple dimensions.
Threat Intelligence
- CTI-Bench - Benchmark suite for evaluating LLMs on cyber threat intelligence tasks.
- SECURE - Practical cybersecurity scenario dataset focusing on extraction, understanding, and reasoning capabilities.
Offensive Security
- NYU CTF Bench - Repository of dockerized CTF challenges enabling automated LLM agent interaction across challenge categories.
General Security Knowledge
- CyberSecEval 4 - Comprehensive benchmark suite for assessing LLM cybersecurity vulnerabilities with multi-vendor evaluations.
- SecBench - Large-scale, multi-dimensional benchmark dataset distinguishing knowledge questions from reasoning questions.
- MMLU Computer Security - Standard benchmark with dedicated computer security evaluation subset for general LLMs.
- MMLU Security Studies - Security studies subset of the general MMLU benchmark, covering broader, policy-oriented security knowledge; both MMLU subsets can be scored with the evaluation sketch after this list.
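One common way to reproduce scores on the MMLU subsets above is EleutherAI's lm-evaluation-harness, which exposes each subset as its own task. The sketch below assumes the task names mmlu_computer_security and mmlu_security_studies and the simple_evaluate call shape of recent harness releases; both can shift between versions, so treat it as illustrative rather than definitive.

```python
# Minimal sketch: scoring a Hugging Face model on the MMLU security subsets with
# EleutherAI's lm-evaluation-harness (pip install lm-eval). Task names are
# assumptions -- list available tasks with `lm_eval --tasks list` to confirm.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Llama-3.1-8B,dtype=bfloat16",  # any local HF model works
    tasks=["mmlu_computer_security", "mmlu_security_studies"],  # assumed task names
    num_fewshot=5,        # MMLU is conventionally reported 5-shot
    batch_size="auto",
)
for task, metrics in results["results"].items():
    print(task, metrics)
```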
Publications
Academic and industry research on AI applications in security.
Models & Datasets
- Foundation-Sec Technical Report - Detailed methodology for domain-adaptation of Llama-3.1 for cybersecurity applications.
- Primus Paper - Open-source cybersecurity dataset collection addressing the shortage of public pretraining corpora for the domain.
Benchmarking & Evaluations
- SecBench Paper - Multi-dimensional, large-scale benchmark dataset for LLM cybersecurity evaluation.
- NYU CTF Bench Paper - Scalable benchmark for offensive security built around dockerized CTF challenges.
- SECURE Paper - Industry-focused benchmark targeting Industrial Control System security knowledge evaluation.
- CyberMetric Paper - RAG-generated, human-validated multiple-choice cybersecurity benchmark spanning diverse knowledge areas.
- SecLLMHolmes Paper - Comprehensive analysis revealing significant non-robustness in LLM vulnerability identification capabilities.
- LLM Offensive Security Benchmarking - Analysis of evaluation methodologies for LLM-driven offensive security tools with recommendations.
Other
Other collections and lists that may be of interest.
- OffsecML Playbook - Collection of offensive and adversarial machine learning techniques with practical demonstrations.
- MCP-Security-Checklist - Security checklist for MCP-based AI tools, maintained by SlowMist.
Tools
Software tools that implement AI for security applications.
Adversarial ML
- DeepFool - Simple yet accurate method for generating adversarial examples against deep neural networks.
- Counterfit - Microsoft's automation layer for assessing the security of ML systems across multiple attack vectors.
- Charcuterie - Collection of code execution techniques targeting ML libraries for security evaluation.
Security Testing
- garak - Security probing tool for LLM vulnerability assessment, with probes covering issues such as prompt injection, jailbreaks, and data leakage.
- Snaike-MLFlow - MLflow-focused red team toolsuite for attacking ML pipelines and infrastructure.
- MCP-Scan - Security scanning tool specifically designed for Model Context Protocol servers.
Learning Environments
- Malware Env for OpenAI Gym - OpenAI Gym reinforcement learning environment for learning to manipulate malware binaries to evade antivirus detection.
- Deep-pwning - Framework for assessing ML model robustness against adversarial attacks through systematic evaluation.
Security Agents
AI systems designed to perform security-related tasks with varying degrees of autonomy.
Autonomous Agents
- HackingBuddyGPT - Autonomous pentesting agent with corresponding benchmark dataset for standardized evaluation.
- Agentic Radar - Open-source CLI security scanner for agentic workflows with automated detection.
Red Team Agents
- HackGPT - LLM-powered tool designed specifically for offensive security and ethical hacking.
- agentic_security - LLM vulnerability scanner specializing in agentic systems and workflows.
Contribute
Contributions welcome! Read the contribution guidelines first.
Star History

License
CC0