We don't guess.
We verify.
AI Architect builds agents that prove what they claim. 97.8% recall on LongMemEval, 5 fused retrieval signals, zero LLM-judges-LLM. Every output is traceable, every claim is checked — by deterministic algorithms, not by another model's opinion.
Four principles. No exceptions.
Every claim has a source.
If an agent states a fact, that fact ties back to a memory, a file, a commit, or a citation. No assertion lives without a trail.
Zero LLM-judges-LLM.
Verification is deterministic: graph analysis, semantic checks, atomic claim decomposition. We don't ask one model whether another model is right.
Compounding context.
Cortex applies neuroscience — spreading activation, dream cycles, microglial pruning — so agents remember what worked, not just what happened.
Built for regulated work.
Every PRD, PR, decision and reasoning step is logged and reviewable. Designed against the same bar as financial-systems software.
Watch an agent prove its work.
This is the actual verification report from a generated PRD. 64 atomic claims, decomposed and checked against six independent algorithms. The full audit trail lives next to the deliverable — not buried in a log file.
Cortex memory. Zetetic reasoning. Verified pipeline. One platform.
Cortex
Persistent memory for Claude Code.
LongMemEval
Zetetic Agents
Reasoning patterns. One epistemic standard.
97 + 19 specialists
Automatised Pipeline
Read-only codebase intelligence.
10 stages
PRD Spec Generator
Stateless reducer. Feature description → verified PRD.
9 pipeline steps
Hire us to build it. Or build it yourself with our tools.
Every component is open source, MIT-licensed, and shipping in production. The choice is whether you want the system handed to you — or the keys to do it yourself.
We build the
agent with you.
For operators who know AI should help — but don't want to spend six months stitching tutorials together. We design and ship the agent against your real infrastructure. Same engineering bar as the financial systems we build by day.
- Discovery — find the one workflow worth automating
- Built against your CRM, data, internal tools — not a sandbox
- Verification baked in: every action is auditable
- Hand-over with documentation & 30 days of post-launch support
Grab the templates.
Ship faster.
If you build with AI yourself, use the same components we use in production. Cortex, Zetetic Agents, Automatised Pipeline, and PRD Spec Generator — fully documented, MIT-licensed, no telemetry, no lock-in.
- Cortex — persistent memory with neuroscience-backed retrieval
- Zetetic Agents — 116 reasoning patterns, one epistemic standard
- Automatised Pipeline — read-only codebase intelligence (Rust MCP)
- PRD Spec Generator — 11 MCP tools, 9-step pipeline + 2-phase multi-judge verification, specialized panels by claim type
From "we should try AI" to a system you can audit. Four stages.
Every engagement runs the same protocol — same one our open-source pipeline runs internally. You see working software early, you see verification at every step, and you own everything when we're done.
Discovery & framing
We map the workflow where an agent will actually move the needle. No slide decks, no AI theater. Output: a one-page spec with success criteria you can measure against.
Build with verification
Agent designed against your real systems. Every component ships with tests, provenance and an audit log. You watch it work in /cortex-visualize as we go.
Hand-over
Deployed to your infra. Runbooks, dashboards, and a verification report on every shipped feature. Your team is trained on how to extend it — not dependent on us forever.
Compounding
Cortex memory means the agent gets smarter every week without retraining. We stay on call for the first 30 days; after that, you own a system you can audit, evolve, and keep running.
founder
I ship critical systems by day. I research how agents should think by night.
By day I build software in financial infrastructure, where "mostly works" never ships. Every system has to be tested, verified, auditable. Or it doesn't go live.
By night I apply that same bar to AI. I started AI Architect because I kept seeing the same anti-pattern: teams treating agents like demos, stacking prompts on prompts, asking another LLM whether the first one got it right, and wondering why nothing held up in production.
The work here is zetetic — every claim is investigated, never assumed. The tools are open source because the frontier should be shared. The consulting exists because some teams need the system built with them, not handed a repo and a prayer.
"An agent without memory isn't intelligent. An agent without verification isn't trustworthy. I'm only interested in building both."
The papers behind the numbers.
Read them. Help us publish.
Stage-Aware Context Assembly for Long-Context Memory Retrieval
A structured context-assembly architecture that recovers the geometric degradation of dense vector retrieval at the 10M-token scale. Two primitives: a priority-budgeted prompt decomposer with domain-aware condensers, and a two-phase stage-aware assembler with submodular coverage selection, Personalized PageRank entity-graph traversal, and schema-structured summary fallback.
On BEAM-10M, the assembler reaches 0.471 MRR — +33.4% over the flat baseline, with 8 of 10 memory abilities improving. Originally designed in September 2025 for Apple Intelligence's 4,096-token window — one month before the BEAM benchmark itself was published.
Thermodynamic Memory vs. Flat-Importance Stores: Why Long-Term Retrieval Collapses Without Decay
External memory for LLMs is dominated by flat-importance stores — vector indexes, BM25 corpora, long-context buffers — in which every item carries the same long-term retrieval prior. This design is asymptotically broken: as the corpus grows, top-k retrieval degenerates into near-arbitrary tie-breaking.
This paper formalises the collapse and describes Cortex, a memory architecture that maintains a non-flat priority distribution across N by coupling four mechanisms: continuously decaying heat (Ebbinghaus), a hierarchical predictive-coding write gate (Friston), consolidation cascades (Kandel, McClelland), and WRRF fusion with heat as tie-breaker. Result: LongMemEval R@10 = 98.4% (vs 78.4% paper-best), LoCoMo R@10 = 94.2%, BEAM Overall = 0.591 (vs 0.329 paper-best).
Forthcoming — preprint #3: HALO retrieval. Drafting in progress; will join the same endorsement queue once complete.
The science behind the system.
Cortex draws from 41 peer-reviewed papers across neuroscience, memory research and AI evaluation. A few of the load-bearing ones:
Before you
book a call.
/plugin marketplace add cdeust/agentic-ai && /plugin install memory@agentic-ai.distribution_suspicious flag catches confirmatory bias. NFR claims never receive PASS — only SPEC-COMPLETE or NEEDS-RUNTIME.UNSOURCED / MAGIC_NUMBER / TODO_NO_REF.Step 1 — add the marketplace once:
/plugin marketplace add cdeust/agentic-aiStep 2 — install any of the four:
/plugin install memory@agentic-ai — Cortex persistent memory (requires PostgreSQL + pgvector)/plugin install reasoning@agentic-ai — 97 reasoning patterns + 19 specialists/plugin install codebase@agentic-ai — codebase graph + semantic search (Rust toolchain required; builds on first install)/plugin install prd@agentic-ai — PRD pipeline with multi-judge verification (Node 20.x or 22.x)All four interoperate — memory remembers, reasoning reasons, codebase maps, prd adjudicates the spec.
Tell us what you want the agent to do.
We'll tell you if it can be verified.
A 30-minute call. No pitch deck, no commitment. If your problem doesn't fit what we do, we'll point you somewhere that does.