Technical Brief — Sean Dougherty
MIT Open Source · Production Hardening Needed · April 2026
Private · Three · AI · Knowledge
This document is confidential and access-controlled.
01

What is P3AK

Platform Overview
P3AK is an open-source AI infrastructure platform built for organizations that need persistent AI memory without cloud lock-in. It ships as three products: an encrypted vault (Rust, AES-256-GCM), an AI data room (TypeScript/Next.js), and an agent orchestration runtime (Pi/TypeScript). MIT licensed. Everything runs on your hardware. Pronounced "peak." The 3 does triple duty: it stands in for the E in PEAK, nods to PE (private equity), and counts the three products.
Infrastructure
P3AK Vault
Rust · MIT
AES-256-GCM encryption at rest
Argon2id key derivation (RFC 9106)
38 formats ingested (Tier 1–3)
98% Top-1 hybrid search accuracy
413 tests — Rust + Python SDK
WAL audit log, hash-linked entries
REST API · MCP server · Python SDK
Application
P3AK Room
TypeScript · Next.js 14
5 sections per data room
Gap analysis with weighted scoring
isomorphic-git per company
.mdr format — Markdown Data Room
Drizzle ORM + PostgreSQL
Clerk auth · Anthropic AI pipeline
Room → vault push bridge (REST)
Orchestration
P3AK Harness
TypeScript · Pi
CREST protocol, 5 phases: Focus → Obstacle → Routines → Gradient → Evolve
State Bus (JSONL)
Domain agents — CAIO, Counselor…
Vault memory via Pi extension
Slash commands · skills auto-discovered
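The Vault card cites hybrid search. The performance table in section 05 describes it as a three-way merge of BM25, ZVec and PageIndex results. The brief doesn't specify the merge rule. Reciprocal rank fusion (RRF) is a common choice, so it is sketched below as an assumption, not as the Vault's actual algorithm. The constant k=60 is the conventional RRF default, and the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs with reciprocal rank fusion (RRF).

    Every list adds 1 / (k + rank) to each document it contains (rank starts
    at 1); a document absent from a list gets nothing from that list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID so output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical top results from three retrievers for the same query.
bm25 = ["nda.md", "term-sheet.md", "cap-table.md"]
vector = ["term-sheet.md", "nda.md", "board-minutes.md"]
page_index = ["term-sheet.md", "cap-table.md"]

fused = reciprocal_rank_fusion([bm25, vector, page_index])
print(fused)  # ['term-sheet.md', 'nda.md', 'cap-table.md', 'board-minutes.md']
```

RRF only looks at ranks, so it needs no score normalization across rankers whose raw scores live on different scales.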
02

BayouClaw — Linux Agent Runtime

What Impressed You

Alpine Docker, per-agent Linux users, Unix domain sockets, seccomp syscall filtering, and a policy engine enforcing the Iron Rule: no agent acts without HITL sign-off. Seven agents. Nine test suites. Fourteen board meetings run in production.

7 Agents · 9 Test Suites · 14 Board Meetings · 4 Isolated Users

Agent Roster

| Agent | Linux user | Role | Transport | Seccomp |
| --- | --- | --- | --- | --- |
| amber | p3ak-amber | Kernel: routes all requests, voice interface | Unix socket | Restricted |
| ledger | p3ak-ledger | Financial analysis, AR/AP | Unix socket | Strict |
| counselor | p3ak-counselor | Legal review, contract analysis | Unix socket | Strict |
| signal | p3ak-signal | Email, calendar, communications | Unix socket | Restricted |
| scout | p3ak-scout | Web research, external data | Unix socket | Strict |
| architect | p3ak-architect | Codebase analysis, technical decisions | Unix socket | Strict |
| board | p3ak-board | Multi-agent synthesis, board meeting engine | Unix socket | Restricted |

Request Routing Architecture

HTTP :3000 (localhost-only)
→ Amber (kernel): routing + context
→ Policy Engine: auth · rate limit · Iron Rule
→ /run/bayouclaw/*.sock: Unix domain socket
→ Domain Agent: ledger · signal · counselor…
→ Vault Query: long-term memory
🔌
No HTTP between agents. All inter-agent comms use Unix domain sockets at /run/bayouclaw/*.sock. No network stack exposed. No open ports between processes. Not interceptable from outside the container — this is the architectural advantage over Claude Code's file-based mailbox approach.
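Below is a minimal sketch of the request/reply pattern over a Unix domain socket. It assumes newline-delimited JSON as the wire format, which the brief does not specify. socket.socketpair() stands in for a real listener at a hypothetical /run/bayouclaw/ledger.sock so the example runs without touching the filesystem. In a deployment, the agent would bind() an AF_UNIX socket at its path and Amber would connect() to it.

```python
import json
import socket


def send_message(sock: socket.socket, payload: dict) -> None:
    """Write one newline-delimited JSON message."""
    sock.sendall(json.dumps(payload).encode("utf-8") + b"\n")


def recv_message(sock: socket.socket) -> dict:
    """Read until the first newline and decode that line as JSON.

    This sketch assumes one message per exchange; bytes after the newline are dropped.
    """
    buf = b""
    while b"\n" not in buf:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed the socket mid-message")
        buf += chunk
    line, _, _ = buf.partition(b"\n")
    return json.loads(line)


def ledger_agent(conn: socket.socket) -> None:
    """Toy stand-in for the ledger agent: answer a single request."""
    request = recv_message(conn)
    send_message(conn, {"agent": "ledger", "echo": request["query"]})


# Two connected AF_UNIX endpoints, standing in for Amber connecting to
# a hypothetical /run/bayouclaw/ledger.sock.
amber_end, ledger_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
send_message(amber_end, {"route": "ledger", "query": "AR aging > 90 days"})
ledger_agent(ledger_end)
reply = recv_message(amber_end)
amber_end.close()
ledger_end.close()
print(reply)  # {'agent': 'ledger', 'echo': 'AR aging > 90 days'}
```

Access to a real socket file is governed by ordinary file permissions on the socket and its directory. That is what makes the per-agent Linux users meaningful for IPC too.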

Dockerfile Security Model

Dockerfile · bayouclaw/Dockerfile
# Alpine Linux: minimal attack surface (~5MB base)
FROM alpine:3.19

# Shared group that gets write-only access to staging
RUN addgroup -S bayouclaw-agents

# Per-agent Linux users: the kernel enforces isolation, not code
# (-D no password, -H no home dir, -G primary group; hitl-daemon gets its own group)
RUN adduser -D -H -G bayouclaw-agents p3ak-amber && \
    adduser -D -H -G bayouclaw-agents p3ak-ledger && \
    adduser -D -H -G bayouclaw-agents p3ak-signal && \
    adduser -D -H hitl-daemon

# Spool directories must exist before chown/chmod
RUN mkdir -p /var/spool/p3ak/staging/email /var/spool/p3ak/outbox/email

# Staging: agents write, cannot list or read outbox
RUN chown hitl-daemon:bayouclaw-agents /var/spool/p3ak/staging/email && \
    chmod 730 /var/spool/p3ak/staging/email
# 730 = owner rwx | group wx (write, no list) | others nothing
# Owner is hitl-daemon, not root: inotifywait and reading staged files both
# need read access to this directory, and with root ownership the daemon
# would fall under "others" and get nothing.

# Outbox: ONLY hitl-daemon can write
RUN chown hitl-daemon:hitl-daemon /var/spool/p3ak/outbox/email && \
    chmod 700 /var/spool/p3ak/outbox/email
# 700 = hitl-daemon only. Agents cannot ls, cat, or modify.

# IPC only: no external socket exposure
RUN mkdir -p /run/bayouclaw && chmod 750 /run/bayouclaw
COPY config/seccomp/ /opt/bayouclaw/seccomp/

# Gateway: localhost only (Dockerfile comments must sit on their own line)
ENV GATEWAY_HOST=127.0.0.1
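The staging/outbox split relies on standard Unix directory semantics. Creating a file needs write plus search (execute) on the directory, and listing needs read. The sketch below evaluates a directory mode for a given user so the 730/700 choices can be checked on paper. The UID/GID numbers are hypothetical. The model ignores root (which bypasses these checks), ACLs and capabilities.

```python
import stat
from dataclasses import dataclass


@dataclass(frozen=True)
class Directory:
    mode: int  # permission bits, e.g. 0o730
    uid: int   # owning user ID
    gid: int   # owning group ID


def access_bits(d: Directory, uid: int, groups: set[int]) -> tuple[bool, bool, bool]:
    """(read, write, search) for a non-root user.

    Unix picks exactly one class: owner if the UID matches, else group if the
    user is in the owning group, else other. Classes are never combined.
    """
    if uid == d.uid:
        r, w, x = stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR
    elif d.gid in groups:
        r, w, x = stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP
    else:
        r, w, x = stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH
    return bool(d.mode & r), bool(d.mode & w), bool(d.mode & x)


def can_create(d: Directory, uid: int, groups: set[int]) -> bool:
    _, w, x = access_bits(d, uid, groups)
    return w and x  # adding a directory entry needs write + search


def can_list(d: Directory, uid: int, groups: set[int]) -> bool:
    r, _, _ = access_bits(d, uid, groups)
    return r


# Hypothetical IDs: agent user 200 in group bayouclaw-agents (300); hitl-daemon is 100:100.
AGENT, AGENTS_GROUP, HITL = 200, 300, 100
staging = Directory(mode=0o730, uid=0, gid=AGENTS_GROUP)
outbox = Directory(mode=0o700, uid=HITL, gid=HITL)

print(can_create(staging, AGENT, {AGENTS_GROUP}))  # True: blind write allowed
print(can_list(staging, AGENT, {AGENTS_GROUP}))    # False: cannot ls staging
print(can_create(outbox, AGENT, {AGENTS_GROUP}))   # False: no outbox access
```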
⚠️
Gap: Seccomp profiles are defined but coverage analysis hasn't run against each agent's actual syscall pattern. A pen test would identify over-permissive profiles — Sean's specialty.
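The coverage analysis can start as a set difference between what a profile allows and what an agent actually calls. This sketch assumes Docker's seccomp JSON layout (a defaultAction plus syscalls[] entries with names and action). It also assumes the observed syscall names were collected separately, for example from an strace run. The profile and trace below are illustrative, not BayouClaw's.

```python
import json

ALLOW = "SCMP_ACT_ALLOW"


def allowed_syscalls(profile: dict) -> set[str]:
    """Syscall names a Docker-format seccomp profile explicitly allows.

    Ignores argument filters ("args"), so a conditionally allowed call counts as allowed.
    """
    if profile.get("defaultAction") == ALLOW:
        raise ValueError("allow-by-default profile: every syscall is permitted")
    names: set[str] = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") == ALLOW:
            names.update(rule.get("names", []))
    return names


def coverage_report(profile: dict, observed: set[str]) -> dict[str, list[str]]:
    allowed = allowed_syscalls(profile)
    return {
        "over_permissive": sorted(allowed - observed),  # allowed, never used: tighten
        "missing": sorted(observed - allowed),          # used, not allowed: would fail
    }


# Illustrative profile and trace, not a real BayouClaw agent.
profile = json.loads("""
{"defaultAction": "SCMP_ACT_ERRNO",
 "syscalls": [{"names": ["read", "write", "openat", "ptrace", "mount"],
               "action": "SCMP_ACT_ALLOW"}]}
""")
observed = {"read", "write", "openat", "connect"}
report = coverage_report(profile, observed)
print(report)  # {'over_permissive': ['mount', 'ptrace'], 'missing': ['connect']}
```

A syscall that never showed up in one trace may still be needed on a rarer code path. Treat "over_permissive" as a review list for the pen test, not as an automatic removal list.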
03

HITL — Human in the Loop

OS-Level Control

The agent cannot bypass this. The kernel enforces it. Every external action goes through a staging directory. Agents have WRITE to staging, ZERO access to outbox. The hitl-daemon is the only process that can move files forward. This is file permissions — not an if-statement.

/var/spool/p3ak/
├── staging/    agents WRITE here (chmod 730: no list, no read)
│   ├── email/      ← {to, subject, body, agent, timestamp, level_required}
│   ├── calendar/
│   ├── files/
│   ├── money/      ← Level 5 required
│   └── system/     ← Level 3–5 required
│
├── outbox/     hitl-daemon ONLY (chmod 700). Agents: zero access.
├── sent/       audit trail after relay
└── rejected/   human said no, kept for review
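On the agent side, staging is a blind write. The agent can create a file under a name it picks but can never list the directory or read it back. This minimal sketch assumes one JSON file per action, carrying the email fields shown in the tree. The uuid-based filename, the dot-prefixed temp file and the write-then-rename step are assumptions. The rename keeps the daemon from picking up a half-written file. The demo uses a temporary directory instead of /var/spool/p3ak.

```python
import json
import os
import tempfile
import time
import uuid


def stage_action(staging_root: str, kind: str, action: dict, agent: str, level_required: int) -> str:
    """Blind-write one action into staging_root/kind and return the file name.

    Needs only write + search on the directory: no listing, no reading back.
    """
    record = dict(action, agent=agent, timestamp=time.time(), level_required=level_required)
    target_dir = os.path.join(staging_root, kind)
    name = f"{uuid.uuid4().hex}.json"
    tmp_path = os.path.join(target_dir, f".{name}.tmp")  # dot-prefixed: daemon should skip it
    with open(tmp_path, "x", encoding="utf-8") as fh:  # "x" refuses to overwrite
        json.dump(record, fh)
    # rename() within one directory is atomic, so the daemon never sees a partial file.
    os.rename(tmp_path, os.path.join(target_dir, name))
    return name


# Demo against a temporary tree instead of /var/spool/p3ak/staging.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "email"))
    name = stage_action(
        root, "email",
        {"to": "cfo@example.com", "subject": "Q3 AR summary", "body": "Draft attached."},
        agent="p3ak-signal", level_required=1,
    )
    # Reading back only works here because the demo runs as the directory owner.
    with open(os.path.join(root, "email", name), encoding="utf-8") as fh:
        staged = json.load(fh)

print(staged["agent"], staged["level_required"])  # p3ak-signal 1
```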

6 Approval Levels

L0 · AUTO (Autonomous): Executes immediately. Logged, not reviewed. Examples: vault reads, search, file reads, list ops.
L1 · VOICE (Voice Approval): Amber asks. Richard says "yes." Fastest approval. Examples: send email, create calendar event, share doc.
L2 · TAP (Tap Approval): Push to phone. Tap APPROVE or DENY. No voice needed. Examples: file shares, uploads, email delete, git push.
L3 · PIN (PIN Approval): 4–6 digit PIN on phone. Adds a knowledge factor. Examples: delete files, sensitive docs, deployments.
L4 · BIOMETRIC: Face ID / Touch ID. Proves the human is present. Examples: financial actions, legal signing, vault rekey.
L5 · MFA (Multi-Factor): Biometric + PIN. No crawfish window. Confirm twice. Examples: money transfers, vault destruction, prod deploy.
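The six levels form an ordered scale, so a gate reduces to "did the human present the factors this level needs?" This sketch reads its factor sets from the descriptions above, with L5 requiring biometric + PIN plus a second confirmation. The factor labels are hypothetical. By design, the sketch does not let a stronger factor stand in for a weaker one (a biometric for a tap, say); the brief does not say whether BayouClaw allows that.

```python
from enum import IntEnum


class Level(IntEnum):
    AUTO = 0
    VOICE = 1
    TAP = 2
    PIN = 3
    BIOMETRIC = 4
    MFA = 5


# Factors a human must present at each level (labels are illustrative).
REQUIRED_FACTORS: dict[Level, frozenset[str]] = {
    Level.AUTO: frozenset(),
    Level.VOICE: frozenset({"voice"}),
    Level.TAP: frozenset({"tap"}),
    Level.PIN: frozenset({"pin"}),
    Level.BIOMETRIC: frozenset({"biometric"}),
    Level.MFA: frozenset({"biometric", "pin"}),
}


def approval_satisfied(level: Level, presented: set[str], confirmations: int = 1) -> bool:
    """True if the presented factors cover the level; L5 also needs two confirmations."""
    if not REQUIRED_FACTORS[level] <= presented:
        return False
    if level == Level.MFA and confirmations < 2:
        return False
    return True


print(approval_satisfied(Level.MFA, {"biometric", "pin"}))                   # False: confirm twice
print(approval_satisfied(Level.MFA, {"biometric", "pin"}, confirmations=2))  # True
```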

HITL Policy (abbreviated)

YAML · /opt/bayouclaw/config/hitl-policy.yaml
# crawfish = undo window in seconds after approval (600 = 10 min)

# Read ops: always L0, no gate
read:
  vault_search: { level: 0 }
  email_read:   { level: 0 }
  web_search:   { level: 0 }

# External send: voice approval minimum
send:
  email_send:  { level: 1, crawfish: 600, notify: [voice, push] }
  drive_share: { level: 2, crawfish: 300, notify: [voice, push] }

# Destructive: PIN minimum
destroy:
  file_delete:  { level: 3, crawfish: 300, notify: [voice, push] }
  vault_delete: { level: 4, crawfish: 0, notify: [voice, push, sms] }
  deploy:       { level: 3, crawfish: 0, notify: [voice, push] }

# Financial: L5 always, no undo, confirm twice
financial:
  payment_send: { level: 5, crawfish: 0, notify: [voice, push, sms], confirm_twice: true }

# Key ops: L5, no undo
system:
  vault_rekey:   { level: 5, crawfish: 0, notify: [voice, push, sms] }
  factory_reset: { level: 5, crawfish: 86400, confirm_twice: true }
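A policy engine mostly needs to flatten the categories into one action → rule table and decide what happens for actions the file doesn't mention. The dict below mirrors part of the YAML; a real loader would parse the file, for example with PyYAML. Treating unknown actions as L5 with confirm-twice is an assumption here. It makes a missing entry fail closed rather than open.

```python
POLICY = {
    "read": {"vault_search": {"level": 0}, "email_read": {"level": 0}},
    "send": {"email_send": {"level": 1, "crawfish": 600, "notify": ["voice", "push"]}},
    "destroy": {"file_delete": {"level": 3, "crawfish": 300, "notify": ["voice", "push"]}},
    "financial": {
        "payment_send": {"level": 5, "crawfish": 0, "notify": ["voice", "push", "sms"], "confirm_twice": True},
    },
}

# Assumed fallback for actions missing from the policy: fail closed.
DEFAULT_RULE = {"level": 5, "crawfish": 0, "notify": ["voice", "push", "sms"], "confirm_twice": True}


def build_index(policy: dict) -> dict[str, dict]:
    """Flatten {category: {action: rule}} into {action: rule}, rejecting duplicates."""
    index: dict[str, dict] = {}
    for category, actions in policy.items():
        for action, rule in actions.items():
            if action in index:
                raise ValueError(f"{action!r} defined twice (second time under {category!r})")
            # Fill optional keys so callers never need .get() defaults.
            index[action] = {"crawfish": 0, "notify": [], "confirm_twice": False, **rule, "category": category}
    return index


def rule_for(index: dict[str, dict], action: str) -> dict:
    return dict(index.get(action, DEFAULT_RULE))  # copy so callers can't mutate the table


idx = build_index(POLICY)
print(rule_for(idx, "email_send")["crawfish"])    # 600
print(rule_for(idx, "unknown_action")["level"])   # 5
```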

Approval Flow

Agent drafts action JSON
→ staging/ (chmod 730)
→ hitl-daemon (inotifywait)
→ Notify human (voice/push/sms)
→ APPROVE (biometric/PIN/tap)
→ outbox/ (chmod 700)
→ Relay (msmtp/API)
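Once a staged file has been detected, the daemon's core step is small. In the brief, inotifywait does the detection; in this sketch, the path is simply passed in. The approve() callback is an assumption: it stands in for the notify plus biometric/PIN/tap round trip and blocks until the human answers. Directory names follow the tree above. os.replace moves the file atomically only when staging, outbox and rejected sit on the same filesystem.

```python
import json
import os
import tempfile
from typing import Callable


def process_staged(path: str, outbox: str, rejected: str, approve: Callable[[dict], bool]) -> str:
    """Ask the human about one staged action, then move the file to outbox or rejected.

    approve() returns True only on an explicit APPROVE; anything else rejects.
    """
    with open(path, encoding="utf-8") as fh:
        action = json.load(fh)
    dest = outbox if approve(action) else rejected
    target = os.path.join(dest, os.path.basename(path))
    # os.replace is atomic when both directories share a filesystem.
    os.replace(path, target)
    return target


with tempfile.TemporaryDirectory() as root:
    staging, outbox, rejected = (os.path.join(root, d) for d in ("staging", "outbox", "rejected"))
    for d in (staging, outbox, rejected):
        os.mkdir(d)
    staged = os.path.join(staging, "a1.json")
    with open(staged, "w", encoding="utf-8") as fh:
        json.dump({"type": "email_send", "level_required": 1}, fh)
    # lambda stands in for the human tapping APPROVE.
    moved = process_staged(staged, outbox, rejected, approve=lambda action: True)
    print(os.path.relpath(moved, root))  # outbox/a1.json
```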
🦞
CRAWFISH Protocol: After approval, a configurable window lets you undo. "Amber, pull that back." Email: 10 min. Calendar: 5 min. File delete: 5 min. Financial: 0 (confirm twice instead). Named for the Louisiana tradition — you can always pull it back.
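The crawfish window comes down to a timestamp comparison. This sketch assumes the window is measured in seconds from approval, as the YAML's crawfish: 600 for the 10-minute email window suggests. It also assumes the relay holds an approved action until the window closes, which the brief doesn't spell out. The action name calendar_create is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Undo windows from the CRAWFISH paragraph, in seconds.
# calendar_create is a hypothetical action name; the others appear in the policy YAML.
CRAWFISH_SECONDS = {"email_send": 600, "calendar_create": 300, "file_delete": 300, "payment_send": 0}


def release_time(action: str, approved_at: datetime) -> datetime:
    """Earliest moment the relay may execute the action; unknown actions get no window."""
    return approved_at + timedelta(seconds=CRAWFISH_SECONDS.get(action, 0))


def can_pull_back(action: str, approved_at: datetime, now: datetime) -> bool:
    """True while "Amber, pull that back" can still cancel the action."""
    return now < release_time(action, approved_at)


approved = datetime(2026, 4, 1, 9, 0, tzinfo=timezone.utc)
print(can_pull_back("email_send", approved, approved + timedelta(minutes=9)))   # True
print(can_pull_back("email_send", approved, approved + timedelta(minutes=10)))  # False
print(can_pull_back("payment_send", approved, approved))                        # False
```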
04

Encryption & Key Management

Passcode Cycling

Your question about passcode cycling landed on a real gap. The encryption stack is solid. The rotation pipeline is manual. That's Sprint 1 of hardening.

L1 · AES-256-GCM: Symmetric encryption at rest. 256-bit key. Authenticated: tamper detection built in.
L2 · Argon2id KDF: Memory-hard key derivation. GPU-resistant. RFC 9106 params. Passphrase never stored.
L3 · Per-vault passphrase: Each .vault file has its own passphrase. One breach doesn't compromise others.
L4 · WAL audit log: Write-Ahead Log with hash-linked entries. Every operation recorded. Tampering is detectable.
Manual rotation (current gap): vault_rekey exists in the CLI. Not automated. No scheduled rotation. Not HITL-gated yet.
Proposed (auto-rotation via HITL L5): Scheduled rekey (30/90d policy) triggers HITL Level 5. Biometric + PIN to authorize. Old key archived in WAL.
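Hash-linking means each entry commits to the one before it, so editing any past entry breaks its own hash check and every later link. The brief names neither the hash function nor the entry format. SHA-256 over canonical JSON is an assumption in this sketch.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # "previous hash" for the first entry


def entry_hash(prev_hash: str, op: dict) -> str:
    # Canonical JSON so the same entry always hashes to the same value.
    payload = json.dumps({"prev": prev_hash, "op": op}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log: list[dict], op: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "op": op, "hash": entry_hash(prev, op)})


def verify(log: list[dict]) -> Optional[int]:
    """Index of the first entry whose link or hash doesn't check out, or None if intact."""
    prev = GENESIS
    for i, entry in enumerate(log):
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["op"]):
            return i
        prev = entry["hash"]
    return None


wal: list[dict] = []
append(wal, {"type": "ingest", "doc": "nda.md"})
append(wal, {"type": "search", "q": "termination"})
append(wal, {"type": "rekey"})
print(verify(wal))  # None
wal[1]["op"]["q"] = "indemnity"  # tamper with the middle entry
print(verify(wal))  # 1
```

An attacker who can rewrite the whole file can also recompute every hash. The chain therefore proves integrity only relative to a head hash kept somewhere else. Comparing that stored head against the WAL is one way to read the target state's "drift detection: WAL hash comparison."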

Current State

Encryption: AES-256-GCM
KDF: Argon2id (RFC 9106)
Key storage: Derived, never persisted
Audit trail: Hash-linked WAL
Manual rekey: p3ak-vault rekey
Auto rotation: Not implemented

Target State

Rotation schedule: Configurable (30/90d)
Rotation approval: HITL Level 5 (MFA)
Key escrow: Split-key (future)
HSM integration: Planned (YubiKey)
Post-rekey check: Canary validation
Drift detection: WAL hash comparison
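BASH · vault-rekey-scheduled.sh (concept)
#!/bin/sh
# Runs from cron. Stages a rekey for HITL L5 approval; it never rekeys directly.
set -eu

# Assumes `p3ak-vault audit --last-rekey` prints the last rekey time as Unix epoch seconds.
LAST_REKEY=$(p3ak-vault audit --last-rekey)
DAYS_SINCE=$(( ($(date +%s) - $LAST_REKEY) / 86400 ))

if [ "$DAYS_SINCE" -gt 90 ]; then
  # Stage the rekey; it does NOT execute. hitl-daemon picks it up via inotifywait.
  # Write under a dot-prefixed temp name, then rename, so the daemon never reads a partial file.
  STAGED=/var/spool/p3ak/staging/system/vault-rekey.json
  TMP=/var/spool/p3ak/staging/system/.vault-rekey.json.tmp
  cat > "$TMP" <<EOF
{ "type": "vault_rekey", "days_since": $DAYS_SINCE, "level_required": 5, "notify": ["voice", "push", "sms"] }
EOF
  mv "$TMP" "$STAGED"
fi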
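The script hard-codes a single 90-day threshold, while the target state calls for a configurable 30/90-day schedule per vault. The sketch below shows that check. The vault names, the policy table and the 90-day fallback are illustrative assumptions. It only decides which vaults are due; staging the request and the L5 approval stay with the script and hitl-daemon.

```python
from datetime import datetime, timezone

ROTATION_DAYS = {"deal-room.vault": 30, "personal.vault": 90}  # hypothetical policy
DEFAULT_DAYS = 90  # same threshold as the script


def vaults_due(last_rekey: dict[str, datetime], now: datetime) -> list[str]:
    """Vaults whose whole days since last rekey exceed their interval, sorted by name."""
    due = []
    for vault, last in sorted(last_rekey.items()):
        # timedelta.days floors like the script's integer division; > mirrors -gt.
        if (now - last).days > ROTATION_DAYS.get(vault, DEFAULT_DAYS):
            due.append(vault)
    return due


utc = timezone.utc
now = datetime(2026, 4, 15, tzinfo=utc)
last_rekey = {
    "deal-room.vault": datetime(2026, 3, 1, tzinfo=utc),  # 45 days, 30-day policy
    "personal.vault": datetime(2026, 2, 1, tzinfo=utc),   # 73 days, 90-day policy
    "archive.vault": datetime(2025, 12, 1, tzinfo=utc),   # 135 days, default 90
}
print(vaults_due(last_rekey, now))  # ['archive.vault', 'deal-room.vault']
```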
05

Performance

Real Numbers

These come from dev-environment profiling — not a formal load test. Vault ops are sub-second. Voice latency is network-bound (Deepgram + Cartesia). Agent routing is LLM-bound, not compute-bound. Formal JMeter benchmarks are the gap — Sean's expertise.

| Operation | Median | P99 | Bottleneck | Status |
| --- | --- | --- | --- | --- |
| Vault ingest (markdown) | 18 ms | 45 ms | Tantivy indexing | FAST |
| BM25 search | 12 ms | 28 ms | Index scan | FAST |
| Hybrid search (BM25 + ZVec + PageIndex) | 31 ms | 62 ms | Three-way merge | FAST |
| Vault create | 8 ms | 20 ms | Argon2id KDF | FAST |
| Vault ingest (PDF, Tier 2) | ~400 ms | ~1.2 s | PDF extraction | OK |
| Voice STT (Deepgram) | ~500 ms TTFB | ~900 ms | Network (cloud) | OK |
| Voice TTS (Cartesia) | ~500 ms TTFB | ~800 ms | Network (cloud) | OK |
| BayouClaw agent routing | <2 s | <5 s | LLM inference | LLM-BOUND |
| Board meeting (5 agents) | 30–60 s | ~90 s | Sequential LLM calls | EXPECTED |
| Room → vault push (full doc) | ~180 ms | ~400 ms | HTTP + ingest | FAST |
📊
No formal benchmark suite. Dev profiling only — not load-tested, not under concurrency, not measured under degraded state. The Rust core should be significantly faster under isolated benchmarking. JMeter profile is the first hardening deliverable.

What Needs Measuring

Concurrent vault readers: Untested
Large vault degradation: Untested
Socket queue depth: Untested
Room under 50 users: Untested
Board meeting parallelism: Untested

Target SLAs (Proposed)

Vault search P95: <100 ms
Agent routing P99: <3 s
Voice TTFB P95: <700 ms
Room page load P95: <1 s
Uptime target: 99.5% (Docker only)
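Until the JMeter runs exist, the proposed SLAs can still be checked mechanically against whatever samples get collected. This sketch uses the nearest-rank percentile definition. That is one of several definitions, and JMeter's own reports may compute percentiles differently. The sample latencies are invented purely to exercise the code.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def meets_sla(samples: list[float], p: float, limit_ms: float) -> bool:
    """True if the p-th percentile is under the limit (the SLAs above use strict <)."""
    return percentile(samples, p) < limit_ms


# Invented latencies (ms) for ten vault searches, purely to exercise the code.
search_ms = [12, 14, 15, 18, 22, 25, 28, 31, 40, 62]
print(percentile(search_ms, 50), percentile(search_ms, 95))  # 22 62
print(meets_sla(search_ms, 95, 100))                        # True
```

With only ten samples, P95 and P99 both land on the single worst run. Tail SLAs need hundreds of samples, which is why distributions like the voice TTFB measurement call for 500.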
06

Amber — Voice AI with OS Access

LiveKit · Deepgram · Cartesia

Real-time voice agent with full OS access, connected to BayouClaw's agent network. LiveKit Cloud handles WebRTC. Deepgram for STT. Cartesia TTS (Caroline voice). Nine tools — dangerous ones staged for HITL approval before any execution. The Force Field eliminates AI fingerprint patterns from her speech.

| Tool | What it does | HITL gate |
| --- | --- | --- |
| vault_search | Hybrid BM25 + semantic search across encrypted vault | L0, auto |
| vault_write | Persist new records to encrypted vault | L0, logged |
| read_file | Read any file on the host system | L0, auto |
| run_command | Execute shell commands (full OS access) | L3, PIN gate |
| send_email | Draft + stage email via Signal agent | L1, voice approval |
| web_search | Real-time web queries via Scout agent | L0, auto |
| route_to_agent | Dispatch to BayouClaw domain agent via socket | L0, routed |
| board_meeting | Convene multi-agent synthesis session | L1, voice confirm |
| hitl_approve | Notify human about staged action, await approval | Policy-driven, L0→L5 |
🎙️
Force Field active. Every Amber response strips 200+ AI tells: no "Certainly!", no "Great question!", no uniform rhythm, no em-dash overload. She varies sentence length, has opinions, uses contractions. Sounds like a person hired for the job — not a language model performing helpfulness.
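The brief doesn't describe how the Force Field works internally. Purely as an illustration, a post-processing filter could drop stock openers and cap em-dash use. The phrase list and the dash limit below are hypothetical stand-ins for the 200+ tells the brief mentions.

```python
import re

# Hypothetical sample of stock openers; the real list reportedly covers 200+ tells.
OPENERS = ["Certainly!", "Great question!", "Absolutely!", "I'd be happy to help!"]
OPENER_RE = re.compile(
    r"^\s*(?:" + "|".join(re.escape(o) for o in OPENERS) + r")\s*", re.IGNORECASE
)
EM_DASH = "\u2014"


def force_field(text: str, max_dashes: int = 1) -> str:
    """Drop one leading stock opener, then turn em-dashes past max_dashes into commas."""
    text = OPENER_RE.sub("", text, count=1)
    parts = text.split(EM_DASH)
    result = parts[0]
    for i, part in enumerate(parts[1:], start=1):
        if i <= max_dashes:
            result += EM_DASH + part
        else:
            result = result.rstrip() + ", " + part.lstrip()
    return result


reply = "Great question! The wire goes out Friday \u2014 after the board signs \u2014 unless Ledger flags it."
print(force_field(reply, max_dashes=0))
# The wire goes out Friday, after the board signs, unless Ledger flags it.
```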
07

Test Suite

Coverage State

413 Rust tests, 9 BayouClaw shell suites, 51 regression tests last session. Score: 88% (45/51). Two failures — both security, both fixed same session. No CI/CD pipeline. Tests run manually. That's the biggest engineering gap before production.

413 Vault (Rust) · 9 BayouClaw suites · 51 regression tests · 88% last score (45/51)

P3AK Vault (Rust): 413 tests
Unit (vault-core): 265
CLI integration: 54
Accuracy (98% Top-1): 12
PyO3 native: 44
Python SDK: 35
BayouClaw (Shell): 9 suites
E2E pipeline
Agent intelligence
Security audit: 88%
Policy engine
HITL staging: Partial
Regression (Last Session): 51 tests
Passed: 45
Failed (fixed): 2
Skipped: 4
Both failures: security category. Seccomp over-permission + outbox write-access misconfiguration. Both patched same session.
Coverage Gaps: needs work
CI/CD pipeline: None
Load / stress: None
Pen test (BayouClaw): None
Spec traceability: None
TestNectar: spec → test case traceability closes the audit-readiness gap in one sprint.
🧪
TestNectar integration path: The HITL policy YAML, seccomp profiles, and vault format spec are machine-parseable requirements. Map them into TestNectar. Spec → coverage traceability is a one-sprint integration. Audit-ready test evidence on every push — exactly what enterprise customers ask for.
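The brief doesn't describe TestNectar's import format, so this sketch stays tool-agnostic and prototypes the traceability idea on its own. Each policy entry becomes a requirement ID, and each test declares which IDs it covers. The ID scheme (HITL.category.action), the test names and the covers sets are hypothetical.

```python
def requirements_from_policy(policy: dict) -> dict[str, dict]:
    """One requirement per (category, action) in a HITL-policy-shaped dict."""
    return {
        f"HITL.{category}.{action}": rule
        for category, actions in policy.items()
        for action, rule in actions.items()
    }


def coverage_matrix(requirements: dict, tests: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map every requirement ID to the sorted tests that claim to cover it."""
    matrix: dict[str, list[str]] = {req: [] for req in sorted(requirements)}
    for test, covers in sorted(tests.items()):
        for req in covers:
            if req not in matrix:  # a test pointing at a requirement that no longer exists
                raise KeyError(f"{test} covers unknown requirement {req}")
            matrix[req].append(test)
    return matrix


policy = {
    "send": {"email_send": {"level": 1, "crawfish": 600}},
    "financial": {"payment_send": {"level": 5, "crawfish": 0}},
}
tests = {
    "test_email_send_requires_voice": {"HITL.send.email_send"},
    "test_email_crawfish_window": {"HITL.send.email_send"},
}
matrix = coverage_matrix(requirements_from_policy(policy), tests)
uncovered = [req for req, covering in matrix.items() if not covering]
print(uncovered)  # ['HITL.financial.payment_send']
```

Failing CI on a non-empty uncovered list is what turns the matrix into audit evidence rather than a report nobody reads.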
08

Where Sean Fits

Production Hardening Roadmap

P3AK's architecture is correct. The gaps are production readiness, not design. No CI/CD. No formal pen test. No automated key rotation. No load benchmarks. That's a QA and security engineering engagement. Here's the exact work.

Production Hardening
BayouClaw security surface
Pen test BayouClaw — exploit socket layer, policy engine, HITL bypass surface
Audit seccomp profiles — map actual syscall usage, tighten every profile
Red-team HITL staging — attempt outbox write from agent Linux user
Validate permission model holds across Docker restarts and rebuilds
Threat model the Iron Rule — document what CAN be bypassed and what cannot
Automated CI/CD
GitHub Actions pipeline
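All 413 vault tests on every PR (cargo test -p vault-core --lib alone runs only the vault-core unit tests; the CLI, PyO3 and Python SDK suites need their own steps)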
All 9 BayouClaw shell suites in Docker on every push
Security regression suite — seccomp, HITL, outbox isolation
Badge reporting: test count, coverage %, last pen-test date
Staging promotion gate: all green → deploy to staging
Performance Benchmarks
JMeter profiles
Vault ops under concurrency: 1, 10, 50, 100 concurrent readers
Vault degradation curve: 1K, 10K, 100K documents ingested
BayouClaw socket queue: agent routing latency under load
Room API: page load + doc save under 10/50 concurrent users
Voice pipeline: TTFB distribution over 500 samples
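JMeter is the stated tool for these profiles. A quick in-process sweep can still show the shape of the concurrency curve before the JMeter plans exist. The sketch times an arbitrary callable at the listed concurrency levels using a thread pool. fake_search is a hypothetical stand-in for a vault REST call, and its timings mean nothing. Threads are adequate here because the real calls would be I/O-bound HTTP requests; a CPU-bound operation would need processes.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def timed(op: Callable[[], object]) -> float:
    """Wall-clock latency of one call, in milliseconds."""
    start = time.perf_counter()
    op()
    return (time.perf_counter() - start) * 1000.0


def sweep(op: Callable[[], object], levels=(1, 10, 50, 100), calls_per_level: int = 200) -> dict[int, dict[str, float]]:
    """Run op calls_per_level times at each concurrency level; report median and max latency."""
    results = {}
    for workers in levels:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            latencies = list(pool.map(lambda _: timed(op), range(calls_per_level)))
        results[workers] = {"median_ms": statistics.median(latencies), "max_ms": max(latencies)}
    return results


def fake_search() -> None:
    time.sleep(0.002)  # placeholder for an HTTP call to the vault REST API


for workers, stats in sweep(fake_search, calls_per_level=100).items():
    print(f"{workers:>3} workers: median {stats['median_ms']:.1f} ms, max {stats['max_ms']:.1f} ms")
```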
Secret Rotation Pipeline
Automated key cycling
Design + implement scheduled vault-rekey via HITL L5
Policy config: 30/90-day rotation schedule per vault
Old key archival strategy (WAL + separate escrow)
YubiKey / HSM integration path (design doc)
Post-rekey canary check to validate vault integrity
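The post-rekey canary can be as simple as writing a known random record before rotating and confirming it reads back identical under the new key. This sketch assumes a vault object with put/get/rekey methods. FakeVault is a hypothetical in-memory stand-in, not the p3ak-vault API. Storing only the canary's digest means the expected value can live outside the vault.

```python
import hashlib
import secrets


class FakeVault:
    """Hypothetical in-memory stand-in; a real vault would re-encrypt every record on rekey."""

    def __init__(self) -> None:
        self._records: dict[str, bytes] = {}
        self.key_id = 1

    def put(self, record_id: str, data: bytes) -> None:
        self._records[record_id] = data

    def get(self, record_id: str) -> bytes:
        return self._records[record_id]

    def rekey(self) -> None:
        self.key_id += 1


def rekey_with_canary(vault: FakeVault) -> bool:
    """Write a random canary, rekey, and report whether it survives unchanged."""
    canary = secrets.token_bytes(32)
    expected = hashlib.sha256(canary).hexdigest()  # only the digest needs to be kept
    vault.put("__canary__", canary)
    vault.rekey()
    try:
        actual = hashlib.sha256(vault.get("__canary__")).hexdigest()
    except KeyError:  # canary lost during rekey
        return False
    return secrets.compare_digest(actual, expected)


vault = FakeVault()
print(rekey_with_canary(vault), vault.key_id)  # True 2
```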
TestNectar Integration
Spec → test traceability
Map HITL policy YAML as requirements into TestNectar
Map seccomp profiles as security requirements
Map vault encryption spec as compliance requirements
Generate audit-ready coverage matrix on every CI run
Export evidence package for enterprise security reviews
Kubernetes Readiness
Multi-node prep (Phase 2)
Stateless/stateful split: vault files need PV, agents don't
Per-agent K8s pod with seccomp SecurityContext
Service mesh evaluation: Linkerd vs Cilium for socket policy
Multi-vault sharding strategy (one vault per tenant)
Helm chart draft for BayouClaw deployment
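For the per-agent pod item, Kubernetes carries the seccomp profile in securityContext.seccompProfile. Type Localhost points at a JSON profile given as a path relative to the kubelet's seccomp directory. The sketch generates one pod manifest per agent as a plain dict. The image name, UID scheme and profile paths are hypothetical. The container hardening fields are common defaults worth testing, not a verified BayouClaw configuration.

```python
import json


def agent_pod(agent: str, uid: int, image: str = "registry.example/bayouclaw:latest") -> dict:
    """Pod manifest for one BayouClaw agent with its own seccomp profile and UID."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"bayouclaw-{agent}", "labels": {"app": "bayouclaw", "agent": agent}},
        "spec": {
            "securityContext": {
                "runAsUser": uid,
                "runAsNonRoot": True,
                # Path is relative to the kubelet's seccomp profile directory.
                "seccompProfile": {"type": "Localhost", "localhostProfile": f"profiles/{agent}.json"},
            },
            "containers": [
                {
                    "name": agent,
                    "image": image,
                    "securityContext": {
                        "allowPrivilegeEscalation": False,
                        "readOnlyRootFilesystem": True,
                        "capabilities": {"drop": ["ALL"]},
                    },
                }
            ],
        },
    }


AGENTS = {"amber": 10001, "ledger": 10002, "signal": 10003}  # hypothetical UIDs
manifests = [agent_pod(name, uid) for name, uid in AGENTS.items()]
print(json.dumps(manifests[1]["spec"]["securityContext"]["seccompProfile"]))
# {"type": "Localhost", "localhostProfile": "profiles/ledger.json"}
```

kubectl accepts JSON manifests as well as YAML, so json.dumps output can be applied directly. One pod per agent also means /run/bayouclaw/*.sock no longer sits on a shared filesystem. That is presumably why the roadmap evaluates a service mesh for socket policy.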
🤝
The pitch is direct: P3AK has a security architecture that's correct in design and incomplete in verification. You have 20 years of QA, DevOps, and security engineering — Air Force-grade. That combination closes the gap between "architecturally sound" and "production-ready." That's the engagement.