Code Static Analysis with OpenRewrite and AI Skills

2026-04-19 / modified at 2026-06-11 / 2k words / 12 mins

Combining the structured automation of OpenRewrite with the reasoning capabilities of AI allows for a sophisticated approach to managing codebase security risks (OWASP, CWE, XSS and the like). Our dev team has found hundreds of exploitable vulnerabilities with a local qwen3.5-27b model.

The Challenges

The codebase is 1.2 million lines of Java and frontend code. After a brief review, I found three kinds of security tech debt:

stale and vulnerable Maven dependencies, especially Spring Boot
boilerplate, non-auditable generated code such as MyBatis mappers, getters and setters
business-related security risks: information disclosure, privilege escalation, RCE, SQL injection

Security Remediations

As the tech debt is mainly in Java, I chose OpenRewrite (Apache-2.0 license) recipes for automation.

Dependency Upgrade: remediation of known supply chain risks
- SBOM centralization with daily smoke test enforcement; explicit versions are no longer allowed
- Automated upgrades with the Spring Boot migration recipes from OpenRewrite
Boilerplate Cleanup: removing dead code, with reference detection, to reduce the shadow attack surface
- IntelliJ’s Find Usages with manual deletion
- OpenRewrite best-practice recipes, especially “Remove unused private methods”
- Self-developed OpenRewrite recipes
  - Persistence Layer Cleanup: Pruning unmapped MyBatis XML mapper methods
Security risk identification
- Scan with known security tools
  - Static code check: Qodana, SpotBugs (Find Security Bugs), Semgrep
  - Gitleaks: scan the codebase, databases, logs and key-value stores for leaked passwords, tokens and PII.
- Scan with custom-developed OpenRewrite recipes that write CSV files for AI taint data flow analysis
  - Endpoint Extraction: Identifying and mapping API surface areas across thousands of Spring controllers.
  - Taint sink Extraction: collecting potentially vulnerable snippets.
    - All non-literal string concatenation: URLs, shells, directories, HTML Tags…
    - weak & dangerous methods
      - template engine: Thymeleaf, Jinja, String.format…
      - insecure serialization: JSON(Jackson)/XML/SOAP…
      - Injection: LDAP, OS Commands, SQL, RCE, XSS, SSRF
AI-Enhanced security audit with Skills (OWASP/CWE Top 10)
- secure design overview: using publicly available threat modeling skills
- For complex security logic, using AI to read the OpenRewrite CSV files and identify vulnerable chains
  - Information protection: encryption, redaction, auditability, logging, configuration
  - Authn/Authz control: auth bypass, privilege escalation, IDOR
  - Taint analysis: taint data flow analysis for input sanitization
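The CSV hand-off between the recipes and the AI audit can be sketched roughly as below. This is a minimal illustration only: the `SinkRow` class, its column names and the `OS_COMMAND` label are assumptions made for the example, not the actual schema of the private recipes. It shows one row per taint sink, quoted so that code snippets containing commas or quotes survive the export.

```java
/** Hypothetical row of a taint-sink CSV export; the columns are assumptions. */
final class SinkRow {
    final String endpoint;   // e.g. "GET /api/v1/webhook/connectivity"
    final String sourceFile; // file that contains the sink
    final int line;          // line number of the sink
    final String sinkKind;   // illustrative label such as "OS_COMMAND"
    final String snippet;    // the suspicious expression, verbatim

    SinkRow(String endpoint, String sourceFile, int line, String sinkKind, String snippet) {
        this.endpoint = endpoint;
        this.sourceFile = sourceFile;
        this.line = line;
        this.sinkKind = sinkKind;
        this.snippet = snippet;
    }

    static String header() {
        return "endpoint,source_file,line,sink_kind,snippet";
    }

    /** Renders the row as one CSV line, quoting every text field. */
    String toCsvLine() {
        return String.join(",",
                quote(endpoint), quote(sourceFile), Integer.toString(line),
                quote(sinkKind), quote(snippet));
    }

    // CSV quoting: wrap the field in double quotes and double any embedded quote.
    private static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }
}
```

Keeping the snippet verbatim in the last column lets the model read the sink without reopening the source file, which is the point of the export: the AI spends its context on reasoning about the chain, not on searching for it.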

Here are the full steps

Details on OpenRewrite

Limitations of AI-only SAST

By default, agentic tools like Claude use grep, ripgrep or a language server to implement semantic search over a codebase. When you ask an AI agent to “audit OWASP Top 10 on codebase”, it may find some risks, but always with uncertainty.

Scope: AI can’t distinguish dead, unrelated and test code from live code with a simple grep, so context and tokens are wasted until it has actually read the code. For humans, “you can’t learn to swim without entering the water”; the same holds for AI, at the cost of tokens.
Precision: grep-based semantic search can’t guarantee deterministic output, nor testability.

Generally, we face a context window dilemma: by the time the AI finally understands the code, only about half of the context window (256k) may be left for the security analysis.
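To make the precision point concrete, here is a small sketch of what a text search sees. The `GrepLikeScan` class and its regular expression are illustrative assumptions, not what any particular agent runs. The pattern flags every `exec(` call whose argument contains a `+`, and it has no way to know whether the concatenated name is a compile-time constant or request input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative grep-style scan: matches text, knows nothing about types or constants. */
final class GrepLikeScan {
    // Any ".exec(" call whose argument text contains a '+' concatenation.
    private static final Pattern EXEC_CONCAT = Pattern.compile("\\.exec\\(.*\\+.*\\)");

    /** Returns the trimmed source lines that the pattern flags. */
    static List<String> hits(List<String> sourceLines) {
        List<String> out = new ArrayList<>();
        for (String line : sourceLines) {
            if (EXEC_CONCAT.matcher(line).find()) {
                out.add(line.trim());
            }
        }
        return out;
    }
}
```

Fed the two lines `Runtime.getRuntime().exec("curl " + CURL_VERSION);` and `Runtime.getRuntime().exec("curl " + url);`, the scan reports both, so the reader (human or AI) still has to open the file and resolve each name by hand.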

Why OpenRewrite

Unlike regex-based searching, OpenRewrite is a static Java AST (LST, Lossless Semantic Tree) analyzer that reads the full code into memory and provides a tree-traversal API over Java methods and variables. For instance, when auditing injections on a webhook controller:

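import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/v1/webhook")
class WebHookController {
    private final RuntimeService runtimeService = new RuntimeService();

    @GetMapping("/connectivity")
    void connectivity(@RequestParam String url) throws Exception {
        // in real code this could be a complex, nested call chain
        runtimeService.run(url);
    }
}

class RuntimeService {
    private static final String CURL_VERSION = "--version";

    void run(String url) throws Exception {
        // concatenation of hard-coded literals only: should be low risk
        Runtime.getRuntime().exec("curl " + CURL_VERSION);
        // request input reaches the command line: injectable (RCE/SSRF)
        Runtime.getRuntime().exec("curl " + url);
    }
}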

As we’re scanning a codebase of over 1 million lines, the mass of hard-coded false-positive snippets can become a critical speed and memory issue, so the recipe has some customizations:

constant resolving: using a visitor and a scanner with a map accumulator to remove hard-coded fragments.
temporary ignore logic: excluding scheduled and WIP tasks once we have identified them.
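The constant-resolving idea can be modeled in plain Java as a two-pass process. This is a toy sketch under stated assumptions: `Operand` and `ConstantResolver` are hypothetical names, and the code deliberately does not use the OpenRewrite API; it only mirrors the scanner/visitor split with a map accumulator described above. The first pass collects constants into a map, the second pass reports only the operands that the map cannot resolve.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy operand of a string concatenation: a literal or a named reference. */
final class Operand {
    final String text;     // literal value, or the referenced name
    final boolean literal; // true for "curl ", false for CURL_VERSION or url

    private Operand(String text, boolean literal) {
        this.text = text;
        this.literal = literal;
    }

    static Operand lit(String value) { return new Operand(value, true); }

    static Operand ref(String name) { return new Operand(name, false); }
}

final class ConstantResolver {
    // Accumulator filled by the scanning pass: constant name -> value.
    private final Map<String, String> constants = new HashMap<>();

    /** Pass 1 (scanner): record a static final String constant. */
    void scanConstant(String name, String value) {
        constants.put(name, value);
    }

    /** Pass 2 (visitor): names in the concatenation that are not known constants. */
    List<String> unresolved(List<Operand> concat) {
        List<String> names = new ArrayList<>();
        for (Operand op : concat) {
            if (!op.literal && !constants.containsKey(op.text)) {
                names.add(op.text);
            }
        }
        return names;
    }

    /** A concatenation with nothing unresolved is hard-coded and can be dropped. */
    boolean isHardCoded(List<Operand> concat) {
        return unresolved(concat).isEmpty();
    }
}
```

After `scanConstant("CURL_VERSION", "--version")`, the concatenation `"curl " + CURL_VERSION` is classified as hard-coded and dropped from the report, while `"curl " + url` is kept with `url` as the unresolved operand. Two passes are needed because a constant may be declared after, or in a different file from, the place where it is used.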

One more thing is testability: OpenRewrite provides a mature JUnit-based test framework and vulnerable examples, so we can

Maintain internal rules: Every time we catch a new risk in the code (some risks are rare, entry-level, even ridiculous), we put the pattern into the private recipe test cases => our organization shares the skill, which can’t be found in public sources.
Human distillation from smarter models: We also use some public security skills and smarter models to scan the codebase directly, analyse their prompts and scan results, and contribute the findings back to our recipes.

Context Engineering on Security

Using OpenRewrite as a “precision instrument” for scope and precision that feeds an AI-driven audit layer bridges the gap between static analysis (SAST) and security orchestration. It helps the AI understand the underlying scope, type information, and hierarchy, preventing “static” identity in complex systems.

Privacy and security: running on local AI models, no vulnerability exposure to public AI services.
Good enough: OWASP/CWE are nearly covered with data flow reachability analysis.
Certainty: recipes are testable, auditable, enumerable, and reproducible.

Limitations & Tradeoffs

The cost of discovering security vulnerabilities has dropped to nearly zero; however, there are limitations:

Just-in-Time Knowledge: the security scope is constrained by the author’s circle of competence in SAST, which also limits how far a frontier AI model can elaborate beyond it.
Scope: the tool can reduce the attack surface, but it cannot eliminate the threats whose defense relies on heavy proprietary armor: WAF, supply chain protection, daily backups.
Liability engineering: it may not be suitable as the final report for compliance certifications like SOC2 or PCI DSS.

Costs of “vibe-coding” a SAST

As a non-security engineer, I implemented everything in 4 months:

1 week for Gitleaks & Semgrep rules: generate rules with AI first, then apply patches against the overwhelming false-positive reports.
2 months for OpenRewrite: vibe-coding with generated and real-world test cases => our private test cases are the “moat” of our SAST.
2 months for the AI Skill: picked from a well-documented Skill template => it requires strong writing skills and security experience, and I must rerun the scanner to check reproducibility.

Once we stabilized the workflow, we got a high ROI on our 1.2 million lines of code:

centralize risk methods: File & Path, RCE, redacting, encryption
apply business related patches with code agents
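As one illustration of centralizing a risky method, file path handling can be funneled through a single helper that every caller must use. This is a minimal sketch, not the team’s actual code: `SafePaths` and `resolveInside` are hypothetical names, and it assumes the threat is path traversal via user-supplied relative or absolute paths.

```java
import java.nio.file.Path;

/** Hypothetical central helper: all user-supplied paths are resolved here. */
final class SafePaths {
    private SafePaths() {
    }

    /**
     * Resolves userSupplied against baseDir and rejects results outside baseDir.
     * Note: symbolic links are not followed here; a stricter variant could
     * compare real paths with Path.toRealPath() once the file exists.
     */
    static Path resolveInside(Path baseDir, String userSupplied) {
        Path base = baseDir.toAbsolutePath().normalize();
        // normalize() collapses "a/../b"; an absolute input replaces the base entirely
        Path candidate = base.resolve(userSupplied).normalize();
        if (!candidate.startsWith(base)) {
            throw new IllegalArgumentException("path escapes base directory: " + userSupplied);
        }
        return candidate;
    }
}
```

With one entry point, a recipe only has to flag path construction that bypasses the helper, which keeps both the scan and the review small.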

AI Polarization: Monkeys Now Have Missiles

As discussed previously, AI-augmented decompression and connectivity capabilities are best suited to pattern-matching roles, and static code analysis (SAST) is the perfect example.

Before AI, the task took security experts months of review, or commercial software like Coverity. Now most of it can be handled by a local 27b AI model, as simply as translating a document.

Role shifting on AI-based security

As the security audit process becomes a “commodity” (cheap, fast, and everywhere), AI is shrinking the exploit timeline. Here are more impacts:

source-available projects: may be high-risk or already undermined; use them with caution
security vendors: must own high-end weapons, or proof of results such as trust, insurance and liability
developers: vibe-coding comes with security risks => you can’t outsource your security design

AI-driven selloff on cybersecurity companies

Whatever vulnerabilities Claude has found with an advanced model, the true fear is not CrowdStrike’s selloff. Security is a multi-dimensional structure, and SAST is not the “moat”. Instead, the contract with cybersecurity firms should be about

Consolidation: runtime agent & firewall protection; behavior based rather than code based
Governance & Insurance: the legal and liable responsibility for customers.

In a world where everyone has a missile, the only survivor is the one who owns the radar. An anti-missile defense system requires multi-dimensional protection infrastructure, which relies heavily on the top proprietary cybersecurity vendors, so the defense service may become an essential, non-negotiable “security tax” in the future.

AI SAST Security