Code Static Analysis with OpenRewrite and AI Skills
2026-04-19 / modified at 2026-04-23 / 2k words / 12 mins

Combining the structured automation of OpenRewrite with the reasoning capabilities of AI allows for a sophisticated approach to managing codebase security risks (OWASP, CWE, XSS and the like). Our dev team has found hundreds of exploitable vulnerabilities with a local qwen3.5-27b model.

The Challenges

Our Java codebases total 1.2 million LOC. After a brief review, I found three kinds of security tech debt:

  • stale and vulnerable Maven dependencies, especially Spring Boot
  • boilerplate, non-auditable generated code such as MyBatis mappers and getters & setters
  • business-related security risks: information disclosure, privilege escalation, RCE, SQL injection

Security Remediations

As the tech debt is mainly in Java, I chose OpenRewrite (Apache-2.0 license) recipes for automation.

  • Dependency Upgrade: mitigating known supply chain risks
    • SBOM centralization with daily smoke test enforcement; explicit versions are no longer allowed
    • Automated upgrades with the Spring Boot migration recipes from OpenRewrite
  • Boilerplate Cleanup: removing dead code via reference detection to reduce the shadow attack surface
    • IntelliJ’s Find Usages with manual deletion
    • OpenRewrite best practices, especially “Remove unused private methods”
    • Self-developed OpenRewrite recipes
      • Persistence Layer Cleanup: pruning unmapped MyBatis XML mapper methods
  • Security risk identification
    • Scan with known security tools
      • Static code checks: Qodana, SpotBugs (Find Security Bugs), Semgrep
      • Gitleaks: scan the codebase, databases, logs and key-value stores for leaked passwords, tokens & PII
    • Scan with custom-developed OpenRewrite recipes that export CSV files for AI taint data flow analysis
      • Endpoint Extraction: identifying and mapping the API surface across thousands of Spring controllers
      • Taint Sink Extraction: collecting potentially vulnerable snippets
        • All non-literal string concatenation: URLs, shell commands, directories, HTML tags…
        • Weak & dangerous methods
          • template engines: Thymeleaf, Jinja…
          • insecure serialization: JSON (Jackson)/XML/SOAP…
          • injection: LDAP, OS commands, SQL, RCE, XSS, SSRF
  • AI-enhanced security audit with Skills (OWASP/CWE Top 10)
    • Secure design review: using publicly available threat modeling skills
    • For complex security logic, using AI to read the OpenRewrite CSV files and identify vulnerable chains
      • Information protection: encryption, redaction, auditability, logging, configuration
      • Authn/Authz control: auth bypass, privilege escalation
      • Taint analysis: taint data flow analysis for input sanitization
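
The CSV hand-off between the custom recipes and the AI audit can be pictured as one row per extracted snippet. The following is only a minimal sketch: the article does not state the real schema, so the column names (`endpoint`, `sourceFile`, `line`, `sinkType`, `snippet`) and the sample values are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SinkCsv {
    /** One extracted taint-sink candidate; all field names are hypothetical. */
    record SinkRow(String endpoint, String sourceFile, int line, String sinkType, String snippet) {
        String toCsv() {
            return Stream.of(endpoint, sourceFile, String.valueOf(line), sinkType, snippet)
                    .map(SinkCsv::escape)
                    .collect(Collectors.joining(","));
        }
    }

    /** RFC 4180 style quoting: wrap the field in quotes when needed and double embedded quotes. */
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    /** Renders a header line followed by one CSV line per row. */
    static String render(List<SinkRow> rows) {
        String header = "endpoint,sourceFile,line,sinkType,snippet";
        return header + "\n" + rows.stream().map(SinkRow::toCsv).collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Hypothetical row, modelled on a command-execution sink.
        SinkRow row = new SinkRow("GET /api/v1/webhook/connectivity", "RuntimeService.java", 8,
                "OS_COMMAND", "Runtime.getRuntime().exec(\"curl \" + url)");
        System.out.println(render(List.of(row)));
    }
}
```

Quoting matters here because code snippets routinely contain commas and double quotes; an unescaped snippet would shift columns and mislead whatever reads the file afterwards.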

Here are the full steps:

[Diagram: four-step workflow from the Git repo (Java source) to a high-fidelity security report]

  • Step 1: Cleanup & Broad SAST: OpenRewrite debt reduction (SBOM inventory, dead code / boilerplate, supply chain upgrades) and broad scanning (Qodana / SpotBugs, GitLeaks)
  • Step 2: Context Engineering (The Core): custom OpenRewrite recipes (endpoint & API mapper, taint sinks (LDAP, SQL, JSON), custom security logic scanners)
  • Step 3: AI Skills Audit: threat modeling, Authn/Authz verification, taint flow & sanitization
  • Step 4: Remediation
  • Other labels in the diagram: pruned code, precision snippets, OpenRewrite AST (LST), structured context CSV (snippets + metadata), AI reachability analysis

Details on OpenRewrite

Limitations of pure AI SAST

By default, agentic tools like Claude use grep, ripgrep or a language server to approximate semantic search over a codebase. When you ask an AI agent to “audit OWASP Top 10 on codebase”, it may surface some risks, but with uncertainty.

  • Scope: with a simple grep, AI can’t distinguish dead, unrelated or test code from live code until it has read it, and by then the context and tokens are already spent. For humans the saying is “you can’t learn to swim without entering the water”; the same holds for AI, at the cost of tokens.
  • Precision: grep-based search can’t guarantee deterministic output, nor is it testable.

In general, we face a context window dilemma: by the time the AI finally understands the code, only about half of the context window (256k) may be left for the security analysis.
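
A rough back-of-the-envelope calculation shows the scale of the dilemma. The sketch below assumes about 10 tokens per line of Java, which is a guess for illustration and not a measured figure; the codebase size and window size are the ones quoted in this article.

```java
public class ContextBudget {
    /** Number of context windows needed to read the whole codebase once. */
    static long windowsNeeded(long linesOfCode, long tokensPerLine, long contextWindowTokens) {
        long totalTokens = linesOfCode * tokensPerLine;
        // Ceiling division: a partially filled window still counts as one window.
        return (totalTokens + contextWindowTokens - 1) / contextWindowTokens;
    }

    public static void main(String[] args) {
        long loc = 1_200_000;     // codebase size from the article
        long tokensPerLine = 10;  // ASSUMPTION: rough average for Java source
        long window = 256_000;    // context window from the article (256k)
        System.out.println(windowsNeeded(loc, tokensPerLine, window));
    }
}
```

Under these assumptions a single read-through needs 47 full windows before any security reasoning starts, which is why narrowing the scope first pays off.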

Why OpenRewrite

Unlike regex-based searching, OpenRewrite is a static Java AST (LST) analyzer that reads the full code into memory and provides a tree-traversal API over Java methods & variables. For instance, when auditing injections on a webhook controller:

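    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestParam;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    @RequestMapping("/api/v1/webhook")
    class WebHookController {
        private final RuntimeService runtimeService = new RuntimeService();

        @GetMapping("/connectivity")
        void connectivity(@RequestParam String url) throws Exception {
            // in real code this could be a complex, nested call chain
            runtimeService.run(url);
        }
    }

    class RuntimeService {
        private static final String CURL_VERSION = "--version";

        void run(String url) throws Exception {
            // literal (hard-coded) concatenation: should be low risk
            Runtime.getRuntime().exec("curl " + CURL_VERSION);
            // intentionally vulnerable: user input reaches exec (injectable RCE/SSRF)
            Runtime.getRuntime().exec("curl " + url);
        }
    }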

As we’re scanning more than a million lines of code, the mass of hard-coded false positive snippets can become a critical speed & memory issue, so the recipe has some customizations:

  • constant resolving: using a visitor & scanner with a map accumulator to remove hard-coded fragments.
  • temporary ignore logic: excluding scheduled & WIP tasks once we have identified them.
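
The constant-resolving idea can be sketched as two passes: a scan that accumulates known constants in a map, then a visit that drops concatenations built only from literals and those constants. This is a simplified stand-in using plain Java collections, not the OpenRewrite API or the author's recipe; the `Operand` model and the method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConstantResolver {
    /** Simplified operand of a string concatenation: a literal or an identifier reference. */
    record Operand(String text, boolean literal) {
        static Operand lit(String s) { return new Operand(s, true); }
        static Operand ref(String name) { return new Operand(name, false); }
    }

    /** Pass 1 (scan): accumulator of constant name -> literal value. */
    private final Map<String, String> constants = new HashMap<>();

    void recordConstant(String name, String literalValue) {
        constants.put(name, literalValue);
    }

    /** Pass 2 (visit): hard-coded when every operand is a literal or a known constant. */
    boolean isHardCoded(List<Operand> concatenation) {
        return concatenation.stream()
                .allMatch(op -> op.literal() || constants.containsKey(op.text()));
    }

    public static void main(String[] args) {
        ConstantResolver resolver = new ConstantResolver();
        resolver.recordConstant("CURL_VERSION", "--version");

        List<Operand> safe = List.of(Operand.lit("curl "), Operand.ref("CURL_VERSION"));
        List<Operand> tainted = List.of(Operand.lit("curl "), Operand.ref("url"));

        System.out.println(resolver.isHardCoded(safe));    // candidate can be dropped from the report
        System.out.println(resolver.isHardCoded(tainted)); // candidate stays as a potential sink
    }
}
```

A real recipe would also have to key constants by their declaring class, since this sketch matches on the bare name only and would confuse two constants that share a name.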

One more thing is testability. OpenRewrite provides a mature JUnit-based test framework and vulnerable examples, so we can:

  • Have private rules: every time we catch a new risk in the code (some risks are rare, entry-level, even ridiculous), we put the pattern into the private recipe test cases => our organization shares a skill that can’t be found in public.
  • Human distillation from smarter models: we also run public security skills & smarter models directly against the codebase, analyse their prompts and scan results, then “steal” the missing risks and put them into our recipes.

Context Engineering on Security

Using OpenRewrite as a “precision instrument” for scope and precision that feeds an AI-driven audit layer bridges the gap between static analysis (SAST) and security orchestration. It helps the AI understand the underlying scope, type information, and hierarchy, preventing “static” identity in complex systems.

  • Privacy and security: everything runs on local AI models, so no vulnerabilities are exposed to public AI services.
  • Good enough: OWASP/CWE categories are nearly covered by data flow reachability analysis.
  • Certainty: recipes are testable, auditable, enumerable, and reproducible.
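
To make “reachability” concrete: the question is whether an extracted endpoint can reach an extracted sink through the call chain. In the workflow described here that reasoning is done by the AI over the CSV files; the sketch below is only a deterministic illustration of the same idea, a breadth-first search over a hypothetical call graph whose method names are modelled on the webhook example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Reachability {
    /** Breadth-first search over caller -> callees edges; true if any sink is reachable from the entry. */
    static boolean reaches(Map<String, List<String>> callGraph, String entry, Set<String> sinks) {
        Deque<String> queue = new ArrayDeque<>(List.of(entry));
        Set<String> seen = new HashSet<>(List.of(entry));
        while (!queue.isEmpty()) {
            String method = queue.poll();
            if (sinks.contains(method)) {
                return true;
            }
            for (String callee : callGraph.getOrDefault(method, List.of())) {
                if (seen.add(callee)) { // add() is false for already visited methods, so cycles terminate
                    queue.add(callee);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical edges: one endpoint that reaches exec, one job that does not.
        Map<String, List<String>> callGraph = Map.of(
                "WebHookController.connectivity", List.of("RuntimeService.run"),
                "RuntimeService.run", List.of("Runtime.exec"),
                "ReportJob.nightly", List.of("ReportService.render"));
        Set<String> sinks = Set.of("Runtime.exec");

        System.out.println(reaches(callGraph, "WebHookController.connectivity", sinks));
        System.out.println(reaches(callGraph, "ReportJob.nightly", sinks));
    }
}
```

Reachability alone says nothing about sanitization along the path, which is the part left to the taint analysis step.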

Limitations & Tradeoffs

The cost of discovering security vulnerabilities has dropped to nearly zero; however, there are limitations:

  • Just-in-Time Knowledge: the security scope is constrained by the author’s circle of competence in SAST, and that fixed scope also limits a frontier AI model’s ability to elaborate beyond it.
  • Scope: the tool can reduce the attack surface but does not eliminate all threats; those still need heavy proprietary armor: WAF, supply chain protection, daily backups.
  • Compliance: the output may not be suitable as the final report for certifications such as SOC2 or PCI DSS.

Costs of “vibe-coding” SAST

As a non-security engineer, I implemented everything in 4 months:

  • 1 week for Gitleaks & Semgrep rules: generate the rules with AI first, then patch them against the overwhelming false positive reports.
  • 2 months for OpenRewrite: vibe-coding with generated & real-world test cases => private cases are the “moat” of our SAST.
  • 2 months for the AI Skill: started from a well-documented Skill template => it requires strong writing skills and security experience, and I must rerun the scanner to check reproducibility.

Once the workflow is stable, we get a high ROI across our 1.2 million lines of code:

  • centralize risky methods: File & Path, RCE, redaction, encryption
  • apply business-related patches with code agents
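
As an illustration of centralizing a risky method, file path handling can be funnelled through a single helper that every caller must use. This is a minimal sketch under assumptions: the class and method names are hypothetical and it is not the author's actual code.

```java
import java.nio.file.Path;

public class SafePaths {
    /** Resolves untrusted input under baseDir and rejects anything that escapes it. */
    static Path resolveInside(Path baseDir, String untrusted) {
        Path base = baseDir.toAbsolutePath().normalize();
        // normalize() collapses "." and ".." segments before the containment check
        Path resolved = base.resolve(untrusted).normalize();
        if (!resolved.startsWith(base)) {
            throw new IllegalArgumentException("path escapes base directory: " + untrusted);
        }
        return resolved;
    }

    public static void main(String[] args) {
        Path base = Path.of("uploads"); // hypothetical upload directory
        System.out.println(resolveInside(base, "reports/2026.csv"));
        try {
            resolveInside(base, "../../etc/passwd");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With one choke point like this, a recipe only has to flag direct file access that bypasses the helper. The check is lexical, so symbolic links inside the base directory would need separate handling.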

AI Polarization: Monkeys Now Have Missiles

As discussed previously, AI’s decompression and connectivity capabilities are best suited to pattern-matching roles, and static code analysis (SAST) is a perfect example.

Before AI, this task took security experts months of review, or commercial software like Coverity. Now most of it can be handled by a local 27b AI model, as simply as translating a document.

Role shifting on AI-based security

As the security audit process becomes a “commodity” (cheap, fast, and everywhere), AI is shrinking the exploit timeline. Here are more impacts:

  • source-available projects: may be high risk or already compromised; use them with caution
  • security vendors: must own high-end weapons, or proof of results such as brand, insurance and liability
  • developers: vibe-coding comes with security risks => you can’t outsource your security design

AI-driven selloff of cybersecurity companies

Whatever vulnerabilities Claude has found with its advanced model, the real fear should not be the CrowdStrike selloff. Security is a multi-dimensional structure, and SAST is not the “moat”. Instead, contracts with cybersecurity firms should be about:

  • Consolidation: runtime agent & firewall protection; behavior-based rather than code-based
  • Governance & Insurance: legal responsibility and liability toward customers.

In a world where everyone has a missile, the only survivor is the one who owns the radar. An anti-missile defense system requires multi-dimensional protection infrastructure that relies heavily on the top proprietary cybersecurity vendors, so the defense service may become an essential, non-negotiable “security tax” in the future.