Code Static Analysis with OpenRewrite and AI Skills
2026-04-19 / modified at 2026-04-19 / 1.9k words / 11 mins

Combining the structured automation of OpenRewrite with the reasoning capabilities of AI allows for a sophisticated approach to managing security risks in a codebase (OWASP/CWE categories such as XSS). Our dev team has found hundreds of exploitable vulnerabilities with a local qwen3.5-27b model.

The Challenges

The codebase is 1.2 million lines of Java. After a brief review, I found three kinds of security tech debt:

  • stale and vulnerable Maven dependencies, especially Spring Boot
  • boilerplate, non-auditable code such as MyBatis mappers, getters & setters
  • business-related security risks: information disclosure, privilege escalation, RCE, SQL injection

Security Remediations

As the tech debt is mainly in Java, I chose OpenRewrite (Apache-2.0 license) recipes for automation:

  • Dependency Upgrade: mitigation of known supply chain risks
    • SBOM centralization with daily smoke test enforcement; explicit versions are no longer allowed
    • Automated upgrades using the Spring Boot migration recipes from OpenRewrite
  • Boilerplate Cleanup: removing dead code, with reference detection, to reduce the shadow attack surface
    • IntelliJ’s Find Usages with manual deletion
    • OpenRewrite best practices, especially “Remove unused private methods”
    • Self-developed OpenRewrite recipes
      • Persistence Layer Cleanup: pruning unmapped MyBatis XML mapper methods
  • Security risk identification
    • Scan with known security tools
      • Static code checks: Qodana, SpotBugs (Find Security Bugs), Semgrep
      • Gitleaks: scan the codebase, databases, logs and key-value stores for leaked passwords, tokens & PII.
    • Scan with custom-developed OpenRewrite recipes designed for taint data flow analysis
      • Endpoint Extraction: automatically identifying and mapping API surface areas across thousands of controllers.
      • Taint Sink Extraction: collecting potentially vulnerable snippets for further AI evaluation.
        • All non-literal string concatenation: URLs, shell commands, directories, HTML tags…
        • weak & dangerous methods
          • template engines: Thymeleaf, Jinja…
          • insecure serialization: JSON (Jackson)/XML/SOAP…
          • injection: LDAP, OS commands, SQL, RCE, XSS, SSRF
  • AI-enhanced security audit with Skills (OWASP/CWE Top 10)
    • Secure design overview: using publicly available threat modeling skills
    • For complex security logic, using AI to read the OpenRewrite CSV outputs and identify vulnerable chains
      • Information protection: encryption, redaction, auditability, logging, configuration
      • Authn/Authz control: auth bypass, privilege escalation
      • Taint analysis: taint data flow analysis for input sanitization
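
The "no explicit versions" rule from the dependency bullet can be enforced by a small daily check. The following is a minimal sketch of such a check, using only the JDK XML parser rather than an OpenRewrite recipe. The class name `ExplicitVersionCheck` and the decision to exempt `dependencyManagement` and build plugins are assumptions made for illustration, not the setup described above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical smoke-test helper (not an OpenRewrite recipe). */
final class ExplicitVersionCheck {

    /** Returns "groupId:artifactId" for each project dependency that pins its own version. */
    static List<String> explicitVersions(String pomXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // A pom never needs a DOCTYPE; rejecting it also blocks XXE payloads.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(pomXml.getBytes(StandardCharsets.UTF_8)));

            List<String> offenders = new ArrayList<>();
            NodeList deps = doc.getElementsByTagName("dependency");
            for (int i = 0; i < deps.getLength(); i++) {
                Element dep = (Element) deps.item(i);
                // Assumption: versions stay legal in the central BOM section and on build plugins.
                if (hasAncestor(dep, "dependencyManagement") || hasAncestor(dep, "plugin")) {
                    continue;
                }
                if (childText(dep, "version") != null) {
                    offenders.add(childText(dep, "groupId") + ":" + childText(dep, "artifactId"));
                }
            }
            return offenders;
        } catch (Exception e) {
            throw new IllegalArgumentException("unreadable pom.xml", e);
        }
    }

    /** True when any ancestor element of the node has the given tag name. */
    private static boolean hasAncestor(Node node, String name) {
        for (Node p = node.getParentNode(); p != null; p = p.getParentNode()) {
            if (name.equals(p.getNodeName())) {
                return true;
            }
        }
        return false;
    }

    /** Text of the first direct child element with the given name, or null if absent. */
    private static String childText(Element parent, String name) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element && name.equals(child.getNodeName())) {
                return child.getTextContent().trim();
            }
        }
        return null;
    }
}
```

A smoke test would fail the build whenever `explicitVersions(...)` returns a non-empty list for any module's pom.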

Here are the full steps:

[Figure: pipeline flowchart]

  • Steps: Step 1: Cleanup & Broad SAST; Step 2: Context Engineering (The Core); Step 3: AI Skills Audit; Step 4: Remediation
  • Broad Scanning: Qodana / SpotBugs, GitLeaks
  • Custom OpenRewrite: Endpoint & API Mapper, Taint Sinks (LDAP, SQL, JSON), Custom Security Logic Scanners
  • AI Audit Skills: Threat Modeling, Authn/Authz Verification, Taint Flow & Sanitization
  • Other nodes: Git Repo (Java Source), SBOM Inventory, Dead Code / Boilerplate, Supply Chain Upgrades, OpenRewrite AST (LST), Structured Context CSV (Snippets + Metadata), AI Reachability Analysis, High-Fidelity Security Report
  • Other labels: OpenRewrite Debt Reduction, Pruned Code, Precision Snippets

Details on OpenRewrite

Limitations of pure AI SAST

By default, agentic tools like Claude use grep, ripgrep or a language server to approximate a semantic search over the codebase. When you ask an AI agent to “audit OWASP Top 10 on the codebase”, it may surface some risks, but with uncertainty.

  • Scope: with a simple grep, AI can’t distinguish dead, unrelated or test code from live code, so context and tokens are already spent by the time it has read the file. For humans the saying is “You can’t learn to swim without entering the water”; the same holds for AI, regardless of how advanced the model becomes.
  • Precision: grep-based semantic search can’t guarantee consistent output, nor testability.

In general, we face a context window dilemma: by the time the AI finally understands the code, only about half of the context window (256k) may be left for the security analysis.
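
To make the scope problem concrete, here is a small sketch of what a text-level search sees. It is not how any particular agent works internally; `GrepStyleScan` and its pattern are hypothetical and only mimic a `grep` for `.exec(`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a grep-style search over source text. */
final class GrepStyleScan {
    // Matches any ".exec(" call site, with no knowledge of types, comments or data flow.
    private static final Pattern EXEC_CALL = Pattern.compile("\\.exec\\s*\\(");

    /** Returns the 1-based numbers of all lines that textually contain an exec call. */
    static List<Integer> matchingLines(String source) {
        List<Integer> hits = new ArrayList<>();
        String[] lines = source.split("\n");
        for (int i = 0; i < lines.length; i++) {
            if (EXEC_CALL.matcher(lines[i]).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }
}
```

A hard-coded call, a commented-out call and a call with user-controlled input all come back as equal hits, so the model has to spend context reading each one before it can tell them apart.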

Why OpenRewrite

Unlike regex-based searching, OpenRewrite is a static analyzer built on a Java AST (the LST, Lossless Semantic Tree) that reads the full code into memory and provides a tree-traversal API over Java methods & variables. For instance, when auditing injections on a webhook controller:

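import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/v1/webhook")
class WebHookController {
    private final RuntimeService runtimeService = new RuntimeService();

    @GetMapping("/connectivity")
    void connectivity(String url) throws Exception {
        // in real code this could be a complex, nested call chain
        runtimeService.run(url);
    }
}

class RuntimeService {
    private static final String CURL_VERSION = "--version";

    void run(String url) throws Exception {
        // literal (hard-coded) concatenation: safe or low risk
        Runtime.getRuntime().exec("curl " + CURL_VERSION);
        // user-controlled concatenation: injectable RCE/SSRF
        Runtime.getRuntime().exec("curl " + url);
    }
}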

As we’re scanning a codebase of over a million lines, the mass of hard-coded false-positive snippets can become a critical speed & memory issue, so the recipe has some customizations:

  • constant resolving: using a visitor & scanner with a map accumulator to remove hard-coded fragments.
  • temporary ignore logic: excluding scheduled & WIP tasks once we have identified them.
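
The constant-resolving idea can be sketched as two passes over a toy expression tree: a scanner pass that accumulates constants into a map, and a visitor pass that drops every sink argument that folds to a constant. The node classes below (`Literal`, `Ident`, `Concat`, `ConstantField`) are hypothetical stand-ins and are not OpenRewrite types; a real recipe would walk the LST instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of constant resolving; the node types are NOT OpenRewrite classes. */
final class ConstantResolverSketch {

    interface Expr {}

    static final class Literal implements Expr {
        final String value;
        Literal(String value) { this.value = value; }
    }

    static final class Ident implements Expr {
        final String name;
        Ident(String name) { this.name = name; }
    }

    static final class Concat implements Expr {
        final Expr left;
        final Expr right;
        Concat(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    /** Stands in for a declaration such as: static final String CURL_VERSION = "--version"; */
    static final class ConstantField {
        final String name;
        final Expr initializer;
        ConstantField(String name, Expr initializer) {
            this.name = name;
            this.initializer = initializer;
        }
    }

    /** Pass 1 (scanner): accumulate constants whose initializer folds to a string. */
    static Map<String, String> scan(List<ConstantField> fields) {
        Map<String, String> constants = new HashMap<>();
        // Single pass: a constant can only reference constants declared before it.
        for (ConstantField field : fields) {
            String value = resolve(field.initializer, constants);
            if (value != null) {
                constants.put(field.name, value);
            }
        }
        return constants;
    }

    /** Returns the folded string, or null when any operand is not a known constant. */
    static String resolve(Expr expr, Map<String, String> constants) {
        if (expr instanceof Literal) {
            return ((Literal) expr).value;
        }
        if (expr instanceof Ident) {
            return constants.get(((Ident) expr).name);
        }
        if (expr instanceof Concat) {
            String left = resolve(((Concat) expr).left, constants);
            String right = resolve(((Concat) expr).right, constants);
            return (left == null || right == null) ? null : left + right;
        }
        return null;
    }

    /** Pass 2 (visitor): keep only the sink arguments that could not be folded. */
    static List<Expr> suspicious(List<Expr> sinkArgs, Map<String, String> constants) {
        List<Expr> result = new ArrayList<>();
        for (Expr arg : sinkArgs) {
            if (resolve(arg, constants) == null) {
                result.add(arg);
            }
        }
        return result;
    }
}
```

Applied to the webhook example, `"curl " + CURL_VERSION` folds to a constant and is dropped, while `"curl " + url` stays in the report because `url` is not in the accumulator.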

Another benefit is testability: OpenRewrite provides a mature JUnit-based test framework, and vulnerable examples serve as test cases.

  • Private rules: every time we catch a new risk in the code (some are rare, some entry-level, some even ridiculous), we put the pattern into the private recipe test cases. That way our organization shares a skill that can’t be found in public.
  • Human distillation from smarter models: we also use public security skills & smarter models to scan the codebase directly, analyse their prompts and scan results, then “steal” the risks we missed and put them into our recipes.
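
The private-rules habit amounts to a growing table of regression cases. Below is a minimal, framework-free sketch of that table. The `PrivateRuleCases` class, its two sample cases and the `Predicate<String>` detector are hypothetical; in a real setup the detector would be the recipe itself, exercised through OpenRewrite's JUnit-based harness.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical table of private detection cases, one per risk found in the field. */
final class PrivateRuleCases {

    static final class Case {
        final String name;
        final String snippet;
        final boolean shouldFlag;
        Case(String name, String snippet, boolean shouldFlag) {
            this.name = name;
            this.snippet = snippet;
            this.shouldFlag = shouldFlag;
        }
    }

    /** Sample corpus; real entries would be distilled from actual findings. */
    static List<Case> corpus() {
        return Arrays.asList(
                new Case("hard-coded curl flag",
                        "Runtime.getRuntime().exec(\"curl --version\")", false),
                new Case("user-controlled curl target",
                        "Runtime.getRuntime().exec(\"curl \" + url)", true));
    }

    /** Returns the names of cases where the detector disagrees with the expectation. */
    static List<String> failures(List<Case> cases, Predicate<String> detector) {
        List<String> failed = new ArrayList<>();
        for (Case c : cases) {
            if (detector.test(c.snippet) != c.shouldFlag) {
                failed.add(c.name);
            }
        }
        return failed;
    }
}
```

An empty result from `failures(...)` means the detector still agrees with every recorded case; each newly discovered pattern adds one more `Case` to the corpus.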

Context Engineering on Security

Positioning OpenRewrite as a “precision instrument” for scope and precision that feeds an AI-driven audit layer bridges the gap between Static Analysis (SAST) and Security Orchestration. It helps the AI understand the underlying scope, type information, and hierarchy, preventing “static” identity in complex systems.

  • Privacy and security: everything runs on local AI models, so no vulnerabilities are exposed to public AI services.
  • Good enough: OWASP/CWE categories are nearly covered by data flow reachability analysis.
  • Certainty: recipes are testable, auditable, enumerable, and reproducible.
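
As an illustration of the structured context handed to the model, here is a minimal sketch of one CSV row per taint-sink finding. The column set (`endpoint,file,line,sinkType,snippet`) and the class `SinkFindingCsv` are assumptions for this example; the actual columns produced by the recipes are not specified here.

```java
/** Hypothetical CSV writer for taint-sink findings; the column names are assumptions. */
final class SinkFindingCsv {

    static final String HEADER = "endpoint,file,line,sinkType,snippet";

    /** Quotes a field in the RFC 4180 style when it contains a comma, quote or line break. */
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        // Embedded quotes are doubled, then the whole field is wrapped in quotes.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Builds one CSV line for a finding; code snippets are escaped so they stay in one cell. */
    static String row(String endpoint, String file, int line, String sinkType, String snippet) {
        return String.join(",",
                escape(endpoint),
                escape(file),
                Integer.toString(line),
                escape(sinkType),
                escape(snippet));
    }
}
```

Keeping the endpoint and the snippet on the same row lets the model judge reachability from a single record instead of re-reading the source file.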

Limitations & Tradeoffs

The cost of discovering security vulnerabilities has dropped to nearly zero; however, there are limitations:

  • Just-in-Time Knowledge: the scope of security coverage is constrained by the author’s circle of competence
  • Compliance: the output may not be suitable as the final report for legal certifications like SOC2 or PCI DSS.

AI Polarization: Monkeys Now Have Missiles

As we have discussed previously, AI’s decompression and connectivity capabilities are best suited to pattern-matching roles, and static code analysis is a perfect example.

Before AI, the task took security experts months of review, or commercial software like Coverity. Now most of it can be handled by a local 27b AI model, as simply as translating a document.

Role shifting on AI-based security

As the security audit process becomes a “commodity” (cheap, fast, and everywhere), there are wider impacts:

  • source-available projects: may be high-risk or may already have been undermined; use them with caution
  • security vendors: must have unpublished high-end weapons, or proof of results such as brand, insurance and liability
  • developers: vibe-coding comes with security risks => you can’t outsource your security design

AI-driven selloff on cybersecurity companies

Security is a multi-dimensional structure, and SAST is not the “Anchor”. Instead, the contract may be:

  • Consolidation: runtime agent & firewall protection; behavior-based rather than code-based
  • Governance & Insurance: a legally liable contractor for customers.

Whatever vulnerabilities Claude has found with advanced models, the true fear was never CrowdStrike’s selloff.

Instead, in a world where everyone has a missile, the only survivor is the one who owns the radar. However, an anti-missile defense system requires multi-dimensional protection infrastructure, which relies heavily on the top proprietary cybersecurity vendors. Defense becomes an essential, non-negotiable expense, and we may pay a “security tax” in the future.