CVE Triage Agent
A stateful multi-agent framework for investigating, triaging, and resolving software vulnerabilities.
The CVE Triage Agent ingests security advisories (PDF, HTML, plain text, or the NVD API), extracts structured vulnerability data using an LLM, stores it in PostgreSQL and a ChromaDB vector index, and runs a LangGraph agentic pipeline to produce prioritised triage reports. The system is designed for security teams that need to process a high volume of advisories and decide quickly what to act on. The entire stack runs locally via Docker Compose with Ollama, so no data leaves the machine.
Check out the code here -> https://github.com/peterstringer/cve-triage-agent
The agent pipeline takes 30–60 seconds per triage, depending on how many gather-assess loops are needed. It runs entirely on a local Llama 3.1 8B model via Ollama, with optional support for OpenAI and Anthropic backends.
A vulnerability triage report generated using a GitHub repository link.
The core of the system is a LangGraph state machine with five nodes: parse_query extracts CVE IDs, products, and vendors from the user's input; gather calls up to six tools (knowledge base search, NVD API lookup, CISA KEV exploit check, PostgreSQL advisory query, dependency scan via OSV.dev, and a deterministic scoring function); assess synthesises the findings and identifies information gaps; prioritise applies a weighted scoring formula; and report generates a human-readable summary. A conditional edge between assess and gather loops up to three times if gaps remain, so the agent autonomously decides when it has enough information.
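The control flow of that loop can be sketched in plain Python (a minimal sketch, not the actual LangGraph wiring; `triage_loop`, `gather`, and `assess` here are stand-ins for the graph nodes):

```python
MAX_LOOPS = 3  # matches the conditional edge's cap of three gather passes

def triage_loop(state, gather, assess):
    """Run gather, then assess; loop back to gather while gaps remain."""
    for _ in range(MAX_LOOPS):
        state = gather(state)        # call tools, collect evidence
        state, gaps = assess(state)  # synthesise findings, list gaps
        if not gaps:                 # conditional edge: enough information
            break
    return state
```

The loop terminates either when `assess` reports no remaining gaps or when the iteration cap is hit, so the agent can stop early but never runs unbounded.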
Priority scoring is deterministic and not LLM-generated:
```
priority = (cvss_normalised * 0.4)
         + (exploit_known   * 0.3)
         + (recency         * 0.2)
         + (attack_vector   * 0.1)
```
Exploit status comes from the CISA Known Exploited Vulnerabilities catalogue, cached for 24 hours. Recency decays linearly to zero over 365 days.
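As a sketch, the formula translates directly to a small function. The weights and the 365-day linear decay come from the description above; normalising CVSS by dividing by 10 and treating `attack_vector_score` as an already-computed 0–1 value are assumptions here:

```python
from datetime import datetime, timezone

def priority_score(cvss: float, exploit_known: bool,
                   published: datetime, attack_vector_score: float) -> float:
    """Deterministic priority in [0, 1]; no LLM involved."""
    cvss_normalised = cvss / 10.0                # assumed normalisation
    age_days = (datetime.now(timezone.utc) - published).days
    recency = max(0.0, 1.0 - age_days / 365.0)   # linear decay to zero over a year
    return (cvss_normalised * 0.4
            + float(exploit_known) * 0.3         # from the cached CISA KEV check
            + recency * 0.2
            + attack_vector_score * 0.1)
```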
Ingestion supports multiple paths. Uploaded files are parsed (PDF via LlamaIndex, HTML via BeautifulSoup, plain text directly), then sent to the LLM for structured extraction with up to 3 retries and a partial extraction fallback. The NVD client is an async generator that paginates through the NVD 2.0 API, splitting date ranges into 120-day windows to respect their API constraints, and streams progress to the frontend via SSE. Extracted vulnerabilities are chunked (512 tokens, 50 overlap), embedded with nomic-embed-text, and upserted into ChromaDB.
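The 120-day windowing reduces to a small generator (a sketch with a hypothetical helper name; the real client also paginates within each window and streams progress):

```python
from datetime import datetime, timedelta
from typing import Iterator

def date_windows(start: datetime, end: datetime,
                 max_days: int = 120) -> Iterator[tuple[datetime, datetime]]:
    """Split [start, end] into consecutive windows of at most max_days."""
    cursor = start
    while cursor < end:
        window_end = min(cursor + timedelta(days=max_days), end)
        yield cursor, window_end
        cursor = window_end  # next window starts where this one ended
```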
Search uses a hybrid approach: semantic similarity from ChromaDB is combined with severity, recency, and exploit status in a multi-factor re-ranking formula. The engine over-fetches 2x the requested limit to give the re-ranker enough headroom.
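The re-ranking step might look like the following; the weights here are illustrative placeholders rather than the project's actual values, and each factor is assumed to be pre-normalised to [0, 1]:

```python
def rerank(hits: list[dict], limit: int) -> list[dict]:
    """Re-rank over-fetched hits by a multi-factor score, then truncate."""
    def score(h: dict) -> float:
        # Illustrative weights -- the real formula's values are not shown here.
        return (h["semantic"] * 0.5      # ChromaDB similarity
                + h["severity"] * 0.2
                + h["recency"] * 0.2
                + float(h["exploited"]) * 0.1)
    return sorted(hits, key=score, reverse=True)[:limit]
```

The caller would fetch `2 * limit` candidates from the vector store before calling this, so a hit with middling similarity but a known exploit can still climb into the final results.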
The frontend streams triage progress over WebSocket, showing each node transition and tool call as it happens. NVD bulk fetches stream via SSE with live counts of fetched, ingested, and deduplicated CVEs.
The gather-assess loop is where the agent earns its keep. On the first pass the agent typically searches the knowledge base and queries the NVD. The assess node identifies what is missing, and the second pass targets those gaps specifically. Three iterations is sufficient; additional loops produce diminishing returns.
LLM extraction needs a fallback path. Llama 3.1 8B produces valid JSON most of the time, but occasionally returns partial or malformed output. Retrying up to 3 times catches most failures. When that is not enough, the partial extraction fallback validates each field individually against the Pydantic schema and keeps whatever parsed correctly. This recovers usable data from responses that would otherwise be discarded entirely.
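The fallback can be sketched without the full Pydantic schema; here plain callables stand in for the per-field validators (the field names and rules are hypothetical):

```python
# Hypothetical per-field checks standing in for the Pydantic schema.
VALIDATORS = {
    "cve_id": lambda v: isinstance(v, str) and v.startswith("CVE-"),
    "cvss": lambda v: isinstance(v, (int, float)) and 0.0 <= v <= 10.0,
    "summary": lambda v: isinstance(v, str) and v.strip() != "",
}

def partial_extract(raw: dict) -> dict:
    """Keep each field that validates on its own; drop the rest."""
    return {k: v for k, v in raw.items()
            if k in VALIDATORS and VALIDATORS[k](v)}
```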
The async-sync bridge is unavoidable with LangChain tools. LangChain's @tool decorator expects synchronous functions, but several tools need async database or HTTP access. The workaround detects whether an event loop is already running and dispatches the coroutine to a ThreadPoolExecutor with asyncio.run() to avoid blocking the agent loop.
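The bridge boils down to a few lines (a sketch of the pattern described above; `run_coro_sync` is a hypothetical name):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=4)

def run_coro_sync(coro):
    """Run a coroutine from synchronous code, whether or not a loop is running."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop in this thread: asyncio.run() is safe.
        return asyncio.run(coro)
    # A loop is already running; asyncio.run() here would raise.
    # Hand the coroutine to a worker thread with its own loop and block on it.
    return _executor.submit(asyncio.run, coro).result()
```

The `RuntimeError` branch covers plain synchronous callers; the executor path covers the case where a sync tool body is invoked from inside the agent's already-running event loop.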
Separating the scoring from the LLM was deliberate. Early versions had the LLM assign priority scores directly. The scores were inconsistent and difficult to explain. Moving to a deterministic formula with explicit weights made the output reproducible and auditable, while still letting the LLM handle the parts it is good at: parsing unstructured text and synthesising findings.
This project was built to demonstrate agentic AI applied to a practical cybersecurity workflow. The engineering focus is on the integration surface between LLMs and structured systems: getting reliable structured output from unreliable LLM responses, bridging async and sync execution contexts, and keeping deterministic scoring separate from generative reasoning.
Python 3.12, FastAPI, SQLAlchemy (async), LangGraph, LangChain, LlamaIndex, ChromaDB, Ollama, PostgreSQL, React 19, TypeScript, Vite 7, Tailwind CSS v4, Docker Compose (7 services).