By a Senior DevOps Engineer who thought nginx would be enough.
The Problem Nobody Is Talking About
AI trading agents are going live. Right now. Not in some distant future — today, LLMs are executing real trades through APIs like Alpaca, making decisions at machine speed, and accountable to exactly nobody.
I’ve spent years as a senior DevOps engineer. I know logs. I know pipelines. I know what “audit trail” actually means when a regulator is sitting across from you asking why your system lost $50,000 in forty seconds. The answer “the AI did it” will not fly.
So I started asking the question nobody else seemed to be asking: if an AI agent makes a catastrophic trade, can you prove — with cryptographic receipts — exactly what it intended vs. what the exchange actually executed?
The answer, for almost every AI trading system being built right now, is no.
That is the gap I set out to close. In one week. With a project I called Agent Provost.
Week One: What I Thought I Needed vs. What I Actually Needed
My first instinct was obvious. Any senior DevOps engineer’s first instinct is obvious: nginx. Reverse proxy, access logs, done. Point the LLM at a proxy, proxy passes to the MCP server, log everything, call it a day.
I got about thirty minutes into the design before I hit the wall.
nginx logs requests. It does not log bodies. Not really — not in a way that’s useful for financial compliance. You get a URI, a status code, bytes sent. What you don’t get is:
- The JSON-RPC payload the LLM sent: “buy 500 shares of TSLA”
- The HTTP POST that actually hit Alpaca’s API: {"symbol": "TSLA", "qty": 500, "side": "buy"}
- The exchange confirmation receipt: {"id": "abc123", "status": "filled", "filled_avg_price": "200.50"}
- The final response the LLM received confirming execution
That’s four steps. nginx, out of the box, gives you shadows of two of them. And even those are incomplete — request bodies require $request_body, which nginx only captures under specific buffering conditions, and response bodies aren’t logged at all without custom modules.
Then I discovered OpenResty.
The Real Tool: OpenResty + LuaJIT at the HTTP Layer
OpenResty is nginx with LuaJIT embedded directly into the request lifecycle. At every phase — request arrival, upstream proxying, response filtering, access logging — you have a fully programmable Lua runtime executing at C-native speeds.
This is not a plugin architecture. This is not middleware bolted on top. The Lua code runs inside the nginx worker process, with direct access to request and response buffers.
For financial audit logging, this is the difference between watching a trade happen through frosted glass vs. holding the actual signed receipt in your hands.
With OpenResty’s body_filter_by_lua_block, I could intercept every byte of every response body as it streamed through, buffer it, and stamp it into the access log as structured JSON. With access_by_lua_block, I could parse every incoming JSON-RPC payload and make real-time blocking decisions — what I called the circuit breaker.
-- From default.conf: THE CIRCUIT BREAKER (Hop 1, port 8000)
access_by_lua_block {
    ngx.req.read_body()  -- ensure the body is available (or set lua_need_request_body on)
    local req_body = ngx.req.get_body_data()
    if req_body then
        local cjson = require "cjson.safe"
        local parsed = cjson.decode(req_body)
        if parsed and parsed.params and parsed.params.arguments then
            local qty = tonumber(parsed.params.arguments.quantity)
                or tonumber(parsed.params.arguments.qty)
            if qty and qty > 100 then
                ngx.status = 403
                ngx.say('{"error": "PROVOST_INTERVENTION: Risk Limit Exceeded."}')
                return ngx.exit(403)
            end
        end
    end
}
That’s a hard kill switch. An LLM that tries to place an order for more than 100 shares gets a 403 before the request ever reaches the MCP server. No debate. No exception handling. No application-layer trust. The proxy is the policy enforcement point.
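That decision path is small enough to mirror in plain Python, which is also how you would unit-test its boundaries without a running nginx. The helper name and return shape below are mine, not the project's:

```python
import json

MAX_QTY = 100  # hard risk limit, same threshold as the Lua block


def circuit_breaker(raw_body):
    """Return (allowed, reason) for a JSON-RPC tool-call body.

    Mirrors the access_by_lua_block decision path: find a quantity under
    params.arguments as either 'quantity' or 'qty' and block anything over
    MAX_QTY. Unparseable or quantity-free bodies pass through, just as
    cjson.safe returning nil falls through in the Lua version.
    """
    try:
        parsed = json.loads(raw_body)
    except (TypeError, ValueError):
        return True, "unparseable body"
    if not isinstance(parsed, dict):
        return True, "not an object"
    params = parsed.get("params")
    args = params.get("arguments") if isinstance(params, dict) else None
    if not isinstance(args, dict):
        return True, "no arguments"
    try:
        qty = float(args.get("quantity", args.get("qty")))
    except (TypeError, ValueError):
        return True, "no numeric quantity"
    if qty > MAX_QTY:
        return False, "PROVOST_INTERVENTION: Risk Limit Exceeded."
    return True, "within limit"
```

The boundary cases described in the testing section later (qty=100 allowed, qty=101 blocked, field fallback, absent fields) drop straight out of this shape.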
The response body capture is equally direct:
-- THE LEDGER: buffer response body for the access log
body_filter_by_lua_block {
    local chunk = ngx.arg[1]
    local MAX_CAPTURE_BYTES = 65536
    local buffered = ngx.ctx.buffered or ""
    if #buffered < MAX_CAPTURE_BYTES and chunk and #chunk > 0 then
        local remaining = MAX_CAPTURE_BYTES - #buffered
        if #chunk > remaining then
            buffered = buffered .. string.sub(chunk, 1, remaining)
        else
            buffered = buffered .. chunk
        end
        ngx.ctx.buffered = buffered
    end
    if ngx.arg[2] then -- ngx.arg[2] is true on the final chunk
        ngx.var.resp_body = buffered
    end
}
ngx.arg[2] being true signals the final chunk in a streaming response. That detail, buried in the OpenResty reference docs rather than the first tutorial you’ll find, is what makes body capture work correctly for Server-Sent Events (SSE), which is exactly the transport the MCP protocol uses. Getting SSE body capture right meant understanding how chunked transfer encoding interacts with proxy_buffering off. That’s the kind of depth that normally costs you two days and a Slack thread.
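The accumulation logic is easier to reason about stripped of the nginx API. This Python mirror (class and method names are mine) shows the same three rules: append while under the cap, truncate the overflowing chunk, and snapshot on the final-chunk flag:

```python
MAX_CAPTURE_BYTES = 65536  # same 64 KB cap as the Lua body filter


class BodyBuffer:
    """Mirror of the body_filter_by_lua_block accumulation logic.

    feed() plays the role of one filter invocation: `chunk` is ngx.arg[1],
    `eof` is ngx.arg[2]. `final` is only set once the last chunk arrives,
    which is when the Lua code writes ngx.var.resp_body.
    """

    def __init__(self):
        self.buffered = b""
        self.final = None  # complete (possibly truncated) body, set at eof

    def feed(self, chunk, eof):
        if chunk and len(self.buffered) < MAX_CAPTURE_BYTES:
            remaining = MAX_CAPTURE_BYTES - len(self.buffered)
            self.buffered += chunk[:remaining]  # truncate the overflow
        if eof:
            self.final = self.buffered
```

Feeding it an oversized stream shows the cap and the eof snapshot working together, the behavior the Busted tests described later exercise chunk by chunk.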
The Architecture: Two-Hop MITM, Four-Step Ledger
The architecture that emerged is what I call a Two-Hop MITM Proxy. Two OpenResty server blocks, two log files, one ironclad audit trail.
LLM Agent
│
▼ (port 8088, external)
┌─────────────────────────────────┐
│ HOP 1: agent-provost:8000 │ ←── llm_to_alpaca_access.log
│ "Agentic Intent Layer" │
│ • Logs Step 1 (LLM request) │
│ • Runs circuit breaker │
│ • Logs Step 4 (LLM response) │
└─────────────────────────────────┘
│
▼ (Docker internal network: mcp_internal)
┌─────────────────────────────────┐
│ alpaca-mcp (port 8088) │
│ alpaca-mcp-server==2.0.0 │
│ PATCHED at startup by │
│ entrypoint.sh to route all │
│ API calls through Hop 2 │
└─────────────────────────────────┘
│
▼ (port 8081, internal-only exposure)
┌─────────────────────────────────┐
│ HOP 2: agent-provost:8081 │ ←── mcp_to_alpaca_access.log
│ "Financial Execution Layer" │
│ • Logs Step 2 (trade request) │
│ • Logs Step 3 (Alpaca receipt) │
└─────────────────────────────────┘
│
▼ (internet, egress-only network)
Alpaca Paper Trading API
The log format is structured JSON on every line, capturing both the full request body and full response body at every hop:
log_format json_full escape=json
    '{'
        '"time_local":"$time_local",'
        '"remote_addr":"$remote_addr",'
        '"request":"$request",'
        '"status":"$status",'
        '"body_bytes_sent":"$body_bytes_sent",'
        '"request_time":"$request_time",'
        '"upstream_response_time":"$upstream_response_time",'
        '"request_body":"$req_body",'
        '"resp_body":"$resp_body"'
    '}';
Every trade, every market query, every tool call — stamped, sequenced, tamper-evident. Four steps captured across two log files. That is the Holy Grail for financial AI governance:
- Step 1: What the LLM asked to do (JSON-RPC, from Hop 1 request log)
- Step 2: What HTTP payload actually hit the exchange API (REST, from Hop 2 request log)
- Step 3: What the exchange actually confirmed (REST response, from Hop 2 response log)
- Step 4: What the LLM was told happened (JSON-RPC response, from Hop 1 response log)
If those four receipts don’t agree with each other, you have found your bug — or your fraud.
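A reconciliation pass over the four receipts can be sketched like this; the function and field names are illustrative, not the exact log schema:

```python
def reconcile(intent, sent, confirmed, reported):
    """Compare the four ledger steps and return a list of mismatch labels.

    Each argument is a dict of already-normalized fields pulled from the
    corresponding log entry: Step 1 (intent), Step 2 (sent),
    Step 3 (confirmed), Step 4 (reported). An empty list means the four
    receipts agree; anything else is a bug, or fraud.
    """
    mismatches = []
    # What the LLM asked for vs. what actually hit the exchange
    for field in ("symbol", "qty", "side"):
        if intent.get(field) != sent.get(field):
            mismatches.append("step1 vs step2: " + field)
    # What the exchange confirmed vs. what the LLM was told
    for field in ("status", "filled_qty"):
        if confirmed.get(field) != reported.get(field):
            mismatches.append("step3 vs step4: " + field)
    return mismatches
```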
The network topology enforces the audit guarantee. Port 8081 (Hop 2) is only exposed inside the Docker network — it is never published to the host via a ports: mapping. The MCP server container sits on mcp_internal, an internal: true Docker network, so it cannot reach the internet directly. Every outbound call is forced through Hop 2. There’s no escape hatch.
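A compose file implementing that topology looks roughly like this. Service and network names follow the article; the exact file in the repo may differ, and the TRADE_API_URL override is the one the entrypoint patch reads:

```yaml
networks:
  mcp_internal:
    internal: true          # no route to the internet
  proxy_egress: {}          # outbound path, proxy only

services:
  agent-provost:
    image: openresty/openresty@sha256:162...   # digest-pinned (truncated as in the text)
    networks: [mcp_internal, proxy_egress]
    ports:
      - "8088:8000"          # Hop 1: the only externally published port
    # Hop 2 listens on 8081 but is never published to the host;
    # it is reachable only from containers on mcp_internal

  alpaca-mcp:
    build: ./alpaca-mcp
    environment:
      TRADE_API_URL: "http://agent-provost:8081"   # force routing through Hop 2
    networks: [mcp_internal]   # internal-only: no direct internet access
    expose:
      - "8088"
```

The key property is that alpaca-mcp joins only the internal network, so even a compromised MCP process has no path to Alpaca except through the audited proxy.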
The MCP Server Interception Problem (And a Surgical Fix)
Alpaca’s MCP server — alpaca-mcp-server==2.0.0 — hardcodes its trading API base URL internally. The _get_trading_base_url() function reads a hardcoded dict, with no environment variable override for the primary URL. My proxy was being bypassed for Hop 2.
The obvious fix is to fork the package. That means a maintained branch, dependency tracking, upstream drift — ongoing cost forever.
Instead, entrypoint.sh patches the package in-place at container startup, before the server process is exec’d:
python - "$SERVER_PY" <<'PYEOF'
import re, sys
path = sys.argv[1]
src = open(path).read()
pattern = r"def _get_trading_base_url\(\) -> str:\n(?:    .*\n){1,6}"
new_block = (
    "import os\n"
    "def _get_trading_base_url() -> str:\n"
    "    forced = os.environ.get(\"TRADE_API_URL\")\n"
    "    if forced:\n"
    "        return forced.rstrip(\"/\")\n"
    "    paper = os.environ.get(\"ALPACA_PAPER_TRADE\", \"true\").lower() in (\"true\", \"1\", \"yes\")\n"
    "    return TRADING_API_BASE_URLS[\"paper\" if paper else \"live\"]\n"
)
patched, count = re.subn(pattern, new_block, src, count=1)
if count != 1:
    sys.exit("patch target not found in " + path)
open(path, "w").write(patched)
PYEOF
exec alpaca-mcp-server --transport streamable-http --host 0.0.0.0 --port 8088
The container starts. Python patches the installed package. The MCP server launches with its internal routing rewritten to point at http://agent-provost:8081. The patch takes under a second. No fork. No maintained branch. The regex targets the exact function signature, and count=1 caps the rewrite at a single substitution, so a future structural change in the package can’t trigger a silent double-patch.
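The patch mechanics are easy to sanity-check in isolation. Here the same regex technique runs against a toy stand-in for the module source; the real function body ships with alpaca-mcp-server and will differ:

```python
import re

# Toy stand-in for the installed module source (illustrative, not the
# actual alpaca-mcp-server file).
src = (
    'TRADING_API_BASE_URLS = {"paper": "https://paper-api.alpaca.markets",\n'
    '                         "live": "https://api.alpaca.markets"}\n'
    "def _get_trading_base_url() -> str:\n"
    "    paper = True\n"
    '    return TRADING_API_BASE_URLS["paper" if paper else "live"]\n'
)

# Match the function signature plus up to six 4-space-indented body lines.
pattern = r"def _get_trading_base_url\(\) -> str:\n(?:    .*\n){1,6}"
new_block = (
    "import os\n"
    "def _get_trading_base_url() -> str:\n"
    '    forced = os.environ.get("TRADE_API_URL")\n'
    "    if forced:\n"
    '        return forced.rstrip("/")\n'
    '    return TRADING_API_BASE_URLS["paper"]\n'
)

# count=1 caps the rewrite at one substitution; count tells us it landed.
patched, count = re.subn(pattern, new_block, src, count=1)
assert count == 1, "patch target not found"
```

If the upstream package ever renames or reshapes the function, count comes back 0 and the container fails loudly at startup instead of trading through an unaudited path.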
That’s not a hack. That’s precision surgery.
Spec-Driven Vibe Coding: What It Actually Means in Practice
I need to be honest about how this got built, because the methodology is as important as the result.
I did not write most of this code in the traditional sense. What I did was articulate — in precise, domain-specific terms — exactly what financial governance required, what the audit trail needed to prove, and what the failure modes looked like. I described the four-step ledger before I wrote a line of Lua. I described the circuit breaker’s decision logic before access_by_lua_block existed in the config. I described what “two-hop proxy with body capture and SSE support” meant architecturally before a single server {} block was written.
Spec-driven vibe coding is what happens when a senior engineer with deep domain knowledge uses AI as an implementation accelerator — not as a replacement for thinking, but as a replacement for the hours of boilerplate, syntax lookup, and scaffolding that usually sit between the idea and the working code.
The results: in one week, this project has all of the following.
Infrastructure:
- Two-hop OpenResty proxy with LuaJIT body capture, circuit breaking, and SSE-aware response buffering
- Pinned Docker image digest (openresty/openresty@sha256:162...) — no mutable tags
- Non-root container user appuser (uid 10001) with scoped chown on site-packages only
- Docker HEALTHCHECK via Python socket probe
- Internal-only Docker network (mcp_internal: internal: true) isolating the MCP container from direct internet access
- Separate proxy_egress network for controlled outbound routing
Testing:
- Lua unit tests (Busted framework) for circuit breaker logic — tests every boundary: qty=100 allowed, qty=101 blocked, qty vs quantity field fallback, absent fields, nil inputs
- Lua unit tests for body filter buffering logic — tests chunk accumulation, 64KB cap enforcement, final-chunk signaling, nil chunk handling
- Shell integration tests (BATS framework) for entrypoint.sh patch behavior and verify_proxy_routing.sh
- Lua static analysis via luacheck in CI
CI/CD Pipeline (GitHub Actions):
- ShellCheck on all shell scripts
- hadolint on the Dockerfile (DL3013 clean — all pip packages version-pinned)
- luacheck Lua linting
- busted Lua unit tests
- BATS shell integration tests
- docker-compose config validation (catches schema errors before runtime)
- Checkov IaC scanning
- Trivy filesystem scan
- Trivy image scan on both the pulled OpenResty image and the built alpaca-mcp image, with --exit-code 1 on CRITICAL/HIGH
CVE remediation:
- CVE-2026-23949 (jaraco.context path traversal via malicious tar archives) — fixed same session
- CVE-2026-24049 (wheel privilege escalation via malicious wheel files) — fixed same session
In a traditional sprint structure, the security hardening alone — digest pinning, non-root users, CVE scanning pipelines, hadolint compliance — would be a multi-day effort across multiple tickets. The Lua testing setup (finding Busted, getting it running in CI with lua-cjson, writing tests that mirror production Lua logic without a running nginx instance) would be an afternoon of research. The OpenResty body capture architecture would be days of documentation diving.
It happened in hours. Because I knew what I needed. The AI helped me build it.
The CVE Cycle: Real-Time Security Hardening in a Single Session
One of the most instructive parts of this week was watching the CI pipeline catch real vulnerabilities and close them in the same working session — not in a future sprint.
It went like this:
Round 1 — hadolint DL3013: The first CI run after adding hadolint flagged that the pip install --upgrade call was installing unpinned packages (setuptools, wheel, jaraco.context). DL3013: “Pin versions in pip.” Fix: run the exact base image, query the installed versions, hardcode them. Five minutes. Committed.
Round 2 — Trivy CVEs: The next CI run built the image successfully, then Trivy scanned it and returned:
┌───────────────────────────┬────────────────┬──────────┬────────┬─────────────┬───────────────┐
│ Library │ Vulnerability │ Severity │ Status │ Installed │ Fixed Version │
├───────────────────────────┼────────────────┼──────────┼────────┼─────────────┼───────────────┤
│ jaraco.context (METADATA) │ CVE-2026-23949 │ HIGH │ fixed │ 5.3.0 │ 6.1.0 │
│ wheel (METADATA) │ CVE-2026-24049 │ HIGH │ fixed │ 0.45.1 │ 0.46.2 │
└───────────────────────────┴────────────────┴──────────┴────────┴─────────────┴───────────────┘
The catch: these weren’t the top-level packages we’d pinned. These were vendored copies inside setuptools/_vendor/. setuptools==79.0.1 was shipping its own bundled copies of jaraco.context==5.3.0 and wheel==0.45.1, and Trivy correctly identified them as vulnerable.
The fix: upgrade setuptools itself to 82.0.1, which vendors jaraco.context==6.1.0 and wheel==0.46.3. Verified locally by running the base image and grepping the _vendor/ METADATA files. Confirmed hadolint still clean. Committed:
fix(dockerfile): upgrade setuptools/wheel to patch CVE-2026-23949 and CVE-2026-24049
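The by-hand verification, grepping the _vendor/ METADATA files for bundled versions, is also easy to script. This sketch scans a directory in the setuptools _vendor layout; it is a local convenience, not what Trivy does internally:

```python
import pathlib


def vendored_versions(vendor_dir):
    """Scan a _vendor-style directory for bundled package versions.

    Reads each *.dist-info/METADATA file (the same files Trivy flags)
    and returns {package_name: version}.
    """
    found = {}
    for meta in pathlib.Path(vendor_dir).glob("*.dist-info/METADATA"):
        name = version = None
        for line in meta.read_text().splitlines():
            if line.startswith("Name:"):
                name = line.split(":", 1)[1].strip()
            elif line.startswith("Version:"):
                version = line.split(":", 1)[1].strip()
        if name and version:
            found[name] = version
    return found
```

Point it at site-packages/setuptools/_vendor inside the base image and you get the bundled jaraco.context and wheel versions in one call, before and after the upgrade.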
That is the loop. CI catches it. You fix it in the same session. You prove the fix with the same scanner. You push. Not a ticket. Not next sprint. Now.
This is what a mature security feedback loop looks like when the pipeline is fast enough to be part of your development flow rather than a gate you wait for.
What “A Week Old” Actually Means
When I say this project is a week old, I want to be precise about what that means.
It means I went from “I wonder if anyone is logging AI trading agent activity properly” to:
- Working two-hop MITM proxy with Lua-level body capture
- Circuit breaker enforcing trade size limits at the network layer
- Patched MCP server routing through the proxy without forking upstream
- Full test suite in two languages (Lua/Busted + shell/BATS)
- CI pipeline with eight quality and security gates
- Docker image passing Trivy clean with all CVEs remediated
- Live market data flowing through the system (DIA/SPY/QQQ via Alpaca paper trading)
In one week. As a single engineer.
That is not normal velocity. That used to take quarters for a platform team. It took a week because the cognitive work of architecture and domain judgment is still mine, and the implementation work is now accelerated by an order of magnitude.
Knowing that nginx wouldn’t cut it — that’s domain judgment.
Knowing that body capture needs body_filter_by_lua_block and that SSE requires proxy_buffering off — that’s research, guided by AI, compressed from days to an afternoon.
Knowing that the four-step ledger is the non-negotiable requirement for financial AI governance — that’s senior engineering experience.
Knowing that setuptools/_vendor/ could contain vulnerable sub-packages that Trivy would find and the top-level pin wouldn’t fix — that’s the kind of subtle security detail that bites teams when it’s not their day job.
Spec-driven vibe coding doesn’t replace that judgment. It multiplies it.
The Remaining Holy Grail: Hop 2 Validation
The four-step ledger is architecturally complete, but Hop 2 — the MCP-to-Alpaca leg — needs full end-to-end validation to confirm Steps 2 and 3 are appearing in mcp_to_alpaca_access.log. The network topology guarantees the traffic must flow through it. The logging config is in place. The next session validates it with live trades.
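A first-pass validation over that log can be as simple as checking that every entry carries both captured bodies. The field names match the json_full format; the helper itself is a sketch, not shipped tooling:

```python
import json


def ledger_lines(log_text):
    """Parse a json_full access log (one JSON object per line) and return
    only the entries whose request_body and resp_body were both captured,
    the sanity check to run against mcp_to_alpaca_access.log."""
    complete = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        if entry.get("request_body") and entry.get("resp_body"):
            complete.append(entry)
    return complete
```

If a live trade produces Hop 2 log lines that this filter drops, Steps 2 or 3 are not being captured and the ledger has a hole.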
After that: cryptographic signing of log entries. Because an audit trail that can be modified after the fact is not an audit trail — it’s a liability with a JSON format.
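One possible shape for that signing layer is an HMAC hash chain: each entry's tag covers the previous tag plus its own content, so any retroactive edit invalidates every tag after it. A design sketch under those assumptions, not the project's implementation:

```python
import hashlib
import hmac
import json


def chain_entries(entries, key):
    """Produce [{"entry": ..., "tag": ...}] where each tag is an HMAC over
    the previous tag plus the canonicalized entry. Editing any entry in
    place breaks its own tag and every tag downstream of it."""
    prev = b"genesis"
    out = []
    for entry in entries:
        payload = json.dumps(entry, sort_keys=True).encode()
        tag = hmac.new(key, prev + payload, hashlib.sha256).hexdigest()
        out.append({"entry": entry, "tag": tag})
        prev = tag.encode()
    return out


def verify_chain(chained, key):
    """Recompute every tag; return False on the first break in the chain."""
    prev = b"genesis"
    for rec in chained:
        payload = json.dumps(rec["entry"], sort_keys=True).encode()
        if hmac.new(key, prev + payload, hashlib.sha256).hexdigest() != rec["tag"]:
            return False
        prev = rec["tag"].encode()
    return True
```

The key would live outside the containers the LLM can touch; a stronger variant signs each tag with an asymmetric key so even the log writer cannot forge history.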
And after that: a compliance dashboard. Because logs that only auditors can read during a post-mortem aren’t governance — they’re archaeology.
For Anyone Building AI Agents That Touch Real Money
You need to answer four questions before you go live:
- What did the LLM ask to do?
- What did your code actually send to the exchange?
- What did the exchange actually confirm?
- What did the LLM get told happened?
If you can’t produce a tamper-evident, time-stamped receipt for all four — from a system that the LLM itself cannot modify — you are operating blind with someone else’s money.
Agent Provost is the answer to that question.
And it’s one week old.
The code is at github.com/CharmingSteve/agent-provost, including the first (and newest) tagged release. The project is actively under development. PRs and issues welcome.
