NSP is the protocol layer that makes autonomous AI agents reliable enough for production — enforcing structured belief management, confidence tracking, and cognitive gating before any action executes.
What NSP is: Cognitive infrastructure sitting between memory/retrieval and agent execution. Adds structured belief management and verification that no other production system provides.
What NSP is not: Not an LLM wrapper. Not an agent framework. Not a memory store. NSP occupies the uncontested cognitive state layer of the agentic AI stack.
Third-party project mempalace #565 independently converged on the same five cognitive primitives — convergent evolution from an unrelated starting point.
Every production agent can perceive, reason, remember, and act. None of them can verify they understand before they act.
Agents modify code they don't understand. Average prompt exceeds 20K tokens with no comprehension gate before destructive operations.
Prompt-engineered personality degrades as context fills. No mathematical consistency model — character breaks unpredictably.
Zero auditable reasoning trail. No structured record of what data was considered or what was uncertain — a compliance blocker.
No mechanism to detect and correct strategic errors. Failed approaches repeat indefinitely without cognitive oversight.
Reasoning models now >50% of usage. The market is no longer Q&A — it is agentic workflows that demand cognitive infrastructure.
Protocol-layer capture model: NSP sits beneath all verticals like Stripe under e-commerce. 1% protocol capture = $3.5B revenue opportunity by 2030.
Reasoning models >50% of usage. Tool invocation rising. Prompt lengths 4× longer. Agents need infrastructure that verifies they act correctly.
Every deployment decision involves "but can we trust it?" NSP provides auditable cognitive state management — a board-level concern in regulated industries.
EU AI Act enforced Aug 2025. FDA AI/ML guidance. Global regulations require AI decision auditability. NSP provides compliance infrastructure by design.
275,000+ lines of cognitive infrastructure. Not a wrapper — the full cognitive architecture: belief management, confidence calibration, CIA Layers 0–4, cross-domain knowledge graph, and a mathematical personality engine grounded in dynamical systems theory.
Same cognitive architecture validated across five distinct production domains — the platform thesis in action.
Hooks into AI coding assistants via PreToolUse gate. Enforces structured understanding before every code edit. Live cognitive state dashboard. Architecture-agnostic — plugs into any tool-use-capable LLM.
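The gate mechanics can be sketched as a hook command. Claude Code runs PreToolUse hook commands with the pending tool call as JSON on stdin; exit code 0 allows the call, and exit code 2 blocks it and feeds stderr back to the model. The gate logic, threshold, and score lookup below are illustrative assumptions, not NSP's actual implementation:

```python
"""Sketch of a PreToolUse gate (illustrative, not NSP's actual code).

A wrapper script would call decide(json.load(sys.stdin)), print the
message to stderr, and sys.exit(code): per Claude Code's hooks
convention, exit 0 allows the tool call and exit 2 blocks it.
"""

THRESHOLD = 0.7  # illustrative confidence floor for destructive edits

def comprehension_score(tool_input: dict) -> float:
    """Placeholder for a belief-state lookup keyed by the target file."""
    return 0.4  # pretend verified understanding of this file is low

def decide(payload: dict) -> tuple[int, str]:
    """Gate only destructive tools; everything else passes through."""
    if payload.get("tool_name") not in {"Edit", "Write"}:
        return 0, ""                       # fast path: no LLM call
    if comprehension_score(payload.get("tool_input", {})) >= THRESHOLD:
        return 0, ""                       # verified understanding on file
    return 2, "Blocked: verify understanding of this file before editing."
```

The key property is that read-only tools never pay the gate cost; only destructive operations against low-confidence files trigger verification.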
Mathematical personality engine via cusp-catastrophe dynamics. Zero LLM tokens per state update. 2,300+ tests. Sleep/dream cycle. Multi-character sessions with independent belief states.
Self-correcting research framework. Applied to Anthropic's public benchmark — reached 1,291 cycles (114.4× speedup), passing all 8 thresholds. Result reproduced across 6 random seeds.
Tracks what students actually understand vs. what they can repeat. Persistent knowledge graph per learner. Active misconception detection. Adapts based on verified mastery state.
Mathematical character evolution at game-tick speed. Viable for real-time simulation with hundreds of simultaneous characters. No LLM per tick — the same math-driven state layer is one step from a World Model SDK.
NSP does not compete with memory systems or agent frameworks — they are complementary. A production AI agent needs all three.
| Capability | Memory Systems (Mem0, Zep) | Agent Frameworks (LangChain) | NSP |
|---|---|---|---|
| Belief dynamics | — | — | Cusp catastrophe engine |
| Confidence tracking | — | — | Per-turn calibration |
| Action gating | — | — | PreToolUse verification |
| Personality consistency | — | — | Mathematical state machine |
| Auditability / belief trace | — | — | Full trace by design |
| Fact retrieval | Core strength | — | Not the focus |
| Tool orchestration | — | Core strength | Not the focus |
275,000+ lines across schemas, mathematical models, and lifecycle management. Requires rethinking architectural foundations to replicate.
Same belief dynamics validated across coding, roleplay, research, and education. Independently corroborated by mempalace #565 — convergent evolution is strong architectural validation.
Cusp catastrophe engine grounded in dynamical systems theory. Produces principled behaviour — noise tolerance, signal detection, threshold revision.
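The bistability claim can be made concrete with a minimal sketch (parameter names and values are illustrative; NSP's actual engine is not public). Gradient descent on the cusp potential V(x) = x⁴/4 + a·x²/2 + b·x gives dx/dt = −(x³ + a·x + b); with a < 0 the system has two belief wells, so weak contrary evidence (small |b|) leaves a settled belief in place, while evidence past the fold threshold |b| = √(−4a³/27) snaps it to the other well:

```python
# Illustrative cusp-catastrophe belief update: pure arithmetic per
# step, zero LLM tokens. Sign convention here: positive b is evidence
# against the positive-x belief.

def settle(x, a, b, dt=0.01, steps=20_000):
    """Gradient descent on V(x) = x**4/4 + a*x**2/2 + b*x."""
    for _ in range(steps):
        x -= dt * (x**3 + a * x + b)
    return x

a = -1.0                          # bistable regime: fold at |b| ~ 0.385
belief = settle(1.0, a, 0.0)      # committed belief in the positive well
nudged = settle(belief, a, 0.2)   # weak contrary evidence: belief holds
flipped = settle(nudged, a, 0.6)  # past the fold: belief snaps negative
```

This is where the noise-tolerance and threshold-revision behaviour comes from: small perturbations decay back to the same well, and only accumulated evidence past the fold forces revision.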
Every interaction produces structured cognitive data. Competitors without state management cannot generate this class of data — a permanent structural advantage.
Coding-first revenue architecture — the natural wedge where the Claude Code integration is already in production and the buyer has AI tooling budget.
| Tier | Annual Price | Target |
|---|---|---|
| Team | $30K/yr | 5–25 AI engineers |
| Enterprise | $78K/yr | 25–200 AI engineers |
| Enterprise+ | $180K+/yr | 200+ engineers / platform |
Priced as infrastructure, not tooling. Benchmarked against Datadog / Sentry observability — not GitHub Copilot per-seat — to protect margin and positioning.
| Phase | Timeline | Target ARR |
|---|---|---|
| Seed · Pilots | Year 1 | $0.5M |
| Growth · Platform | Year 2 | $3.5M |
| Scale · Multi-vertical | Year 3 | $12M |
| Platform · Ecosystem | Year 5+ | $35M+ |
75–85% gross margin target. State management runs on zero LLM tokens. Margin improves with scale.
Every NSP interaction produces structured cognitive data with precise schema — not logs or opaque vectors.
When and why AI beliefs change — calibration benchmarks for new models and improved gating thresholds.
Per-turn confidence vs. outcome records — feeds improved CIA trigger thresholds across all runtimes.
Which cognitive interventions succeeded or failed — drives CIA architecture improvements across all domains.
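A single record in this class of data might look like the following (all field names are hypothetical; NSP's actual schema is not public):

```python
# Hypothetical shape of one belief-transition record, the unit the
# flywheel joins against outcomes for calibration benchmarks.
record = {
    "turn": 42,
    "belief": "auth middleware validates JWT expiry",
    "confidence_before": 0.55,
    "confidence_after": 0.82,
    "trigger": "code_read",        # what caused the update
    "gate_decision": "allow",      # action taken at PreToolUse
    "outcome": "edit_succeeded",   # joined later for calibration
}
```

The point is that every field is structured and queryable, unlike free-text logs or opaque embedding vectors.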
Structural cost advantage at scale: NSP-Roleplay's personality and state engines run as pure computation — zero LLM tokens per state update. In consumer-scale companion mode, this is an order-of-magnitude cost advantage vs. competitors whose state maintenance consumes tokens per operation.
5 production runtimes. 5,142+ tests. CIA Layers 0–4. Cusp catastrophe engine validated. Cross-domain knowledge graph (27 nodes, 17 bridges). 114.4× speedup on Anthropic benchmark. Two technical papers published, Paper 3 before May 2026.
Companion Mode MVP. NSP-Coding enterprise plugin. 3–5 design partner pilots. Professional tier launch. SDK + documentation.
Healthcare AI audit layer. Education platform. Data flywheel first products. Third-party ecosystem via SDK. Asia-Pacific expansion.
World model integration. Industry standards push. Government / regulated sectors. Complementary memory system acquisitions.
Comparable companies at similar product maturity:
| Company | Category | Valuation | Revenue | Date |
|---|---|---|---|---|
| Cursor (Anysphere) | AI Coding | $29.3B | $1B ARR | Nov 2025 |
| Cognition AI (Devin) | AI Coding Agent | $14.5B | $73M ARR | Jun 2025 |
| Hippocratic AI | Healthcare AI Agents | $3.5B | Early revenue | Nov 2025 |
| Abridge | AI Clinical Docs | $5.3B | ~$100M ARR | Jun 2025 |
| Character.ai | AI Companions | $1B+ (Series A) | — | — |
Product maturity is comparable to Series A AI startups; the valuation discount reflects limited commercial traction, not the engineering evidence.
Five runtimes, 275K+ lines, 5,142+ tests, three-paper arc, 114.4× speedup on Anthropic's public benchmark, independently corroborated architecture. This is our anchor.
Category creation premium — if investors believe NSP establishes cognitive state management as standard infrastructure like observability or auth.
Two raise scenarios with clear capital allocation and a defined path to Series A.
18-month ARR target: $2M–4M. The number that unlocks a credible Series A at the platform valuation. $15M raise is sufficient if GTM stays disciplined around the coding beachhead.
In 12–18 months this will be an established category with competition.
NSP is defining a new infrastructure category. Early investors in Stripe, Twilio, and Databricks captured maximum value before the category was recognised. No established price anchors yet.
Agentic AI is becoming the default mode of LLM usage. The window to set the protocol standard is open right now — not in 18 months.
NSP's value increases super-linearly with domain coverage — each runtime makes all others more valuable.
The moat is the accumulated cognitive architecture backed by two published papers and a third-party benchmark anchor. Any single component can be reimplemented in a quarter; the full architecture cannot.
State engines run on pure computation — the inverse of LLM-wrapper competitors whose costs scale linearly with usage.
The cognitive state layer is currently uncontested by production infrastructure. NSP is the most mature implementation in that layer today.
Phase 1: Coding beachhead (M1–18) · Phase 2: Platform expansion (M19–36) · Series A modelled at M24
Team-based annual license. Benchmarked vs. observability tools, not Copilot per-seat.
Usage-based. NSP's zero-token advantage makes this strongly ROI-positive at scale.
Core computation is non-LLM. Low COGS. Margin improves with scale.
Core argument: The plan tries to monetise five verticals simultaneously. That is not a revenue model — it is a wish list. Every successful infrastructure company picked one wedge first. NSP's natural wedge is NSP-Coding: the Claude Code integration exists and is in production, the customer problem is urgent and has a clear dollar cost, and the buyer already has AI tooling budget.
Direct enterprise sales to AI engineering teams deploying AI coding agents. NSP-Coding as a standalone enterprise plugin for Claude Code plus an API layer for teams building custom agents.
| Month | Action | Revenue Signal |
|---|---|---|
| 0–3 | 3–5 design partner pilots, free with case-study obligation | $0 · high signal |
| 3–6 | Convert 2–3 pilots to paid, publish benchmark results | $150K–300K ARR |
| 6–12 | 8–12 enterprise accounts via referrals + enterprise BD outreach | $600K–1.2M ARR |
| 12–18 | Companion mode B2B beta with 2–3 licensing partners | $1.5M–3M ARR |
18-month ARR target: $2M–4M. This is the number that unlocks a credible Series A at the platform valuation the pitch claims.
License personality engine to companion app developers. API integration — low-touch, high-margin. Usage-based per active character session.
Enter through 2–3 large EdTech platforms as embedded feature, not standalone product. Per-MAU licensing fee.
Fixed-term contracts with pharma or materials science labs. Long sales cycles but very high ACV ($200K–500K/contract).
Package cognitive transition data as benchmarking API for LLM providers and enterprise AI teams.
| Year | ARR | Headcount | Annual Burn | Net Cash Position |
|---|---|---|---|---|
| Year 1 | $0.5M | 12 | $6.0M | $9.0M |
| Year 2 | $3.5M | 22 | $9.5M | → raise Series A |
| Year 3 | $12.0M | 38 | $14.0M | Series A funded |
The $15M raise is sufficient to reach Series A if GTM stays disciplined around the coding beachhead. The $30M raise is only necessary for a multi-vertical parallel push — a riskier posture at this stage.
$15M raise · Base case · Monthly view · All figures $K
Team-based annual license. Benchmarked vs. Datadog/Sentry observability tools, not GitHub Copilot per-seat.
Usage-based per active character session. NSP's zero-LLM-token advantage makes this strongly ROI-positive at scale.
Core computation is non-LLM → low COGS. State management consumes zero tokens; LLM inference only on extraction turns. Companion-scale deployments see margin improve most strongly with scale.
Core argument: The most important pricing decision is to avoid per-seat comparison to GitHub Copilot ($19/seat/mo). NSP is infrastructure, not a productivity tool. Price as infrastructure — team-based annual license benchmarked against observability tools like Datadog — to protect margin and positioning.
| Tier | Target | Price | Includes |
|---|---|---|---|
| Team | 5–25 AI engineers | $2,500/mo · $30K/yr | NSP-Coding hooks, dashboard, standard support, 90-day audit log |
| Enterprise | 25–200 AI engineers | $6,500/mo · $78K/yr | Team + custom schemas, SLA 99.5%, SSO, 2-year audit log, onboarding |
| Enterprise+ | 200+ engineers or platform | $15K+/mo · custom | Multi-tenant deploy, dedicated infra, compliance exports, custom integrations |
Team tier ($30K/yr) sits below the "needs procurement" threshold at most companies (~$25–50K), enabling direct engineering leader purchase. Enterprise benchmarked against Datadog, Sentry, incident.io — all reliability/observability tools with similar value framing.
| Tier | Monthly Min. | Per-Session | Target |
|---|---|---|---|
| Indie | $500 | $0.008 | <50K MAU |
| Growth | $2,000 | $0.005 | 50K–500K MAU |
| Scale | $8,000 | $0.002 | 500K+ MAU |
At 100K DAU × 5 sessions/day × $0.005/session, a licensee pays ~$2.5K/day, or ~$75K/mo. NSP's zero-token personality engine saves the licensee 10%+ in inference costs at this scale, a strong ROI.
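The arithmetic behind the example above, as a back-of-envelope check (30-day month assumed):

```python
# Growth-tier session revenue at the example scale.
dau = 100_000
sessions_per_day = 5
per_session = 0.005                # $/session, Growth tier
monthly = dau * sessions_per_day * per_session * 30
# ~ $75,000/mo, comfortably above the $2,000 Growth-tier monthly minimum
```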
| Contract | Price | Duration |
|---|---|---|
| Pilot | $50K | 3 months |
| Annual | $200K–500K | 12 months |
| Strategic | $500K+ | Multi-year |
| Model | Rate |
|---|---|
| Per-active-learner/month | $0.50–$1.50 |
| Platform integration fee (one-time) | $25K–75K |
| Annual minimum | $60K |
Removing open-core eliminates the primary developer adoption engine without replacing it. A trial mechanism is required to maintain bottom-up discovery.
Full NSP-Coding access, no credit card required, one workspace. Standard SaaS conversion motion; comparable infrastructure tools with genuine product value convert at 15–25%.
NSP-Coding for personal/non-commercial projects, capped at 500 gate invocations/day. Drives developer familiarity and word-of-mouth without cannibalising Team tier.
The NSP cognitive state layer is architecture-agnostic by construction and runs on any tool-use-capable LLM. The current distribution channel for NSP-Coding, however, is Claude Code's PreToolUse hook, which is Anthropic-specific. If Anthropic changes that hook API, restricts third-party access, or absorbs cognitive gating natively, the integration surface would need to re-target — the underlying protocol would not change, but the go-to-market for the coding wedge would slow.
| Priority | Action | Timeline |
|---|---|---|
| Immediate | Document the Claude-specific integration surface vs. the vendor-neutral protocol layer | Month 1 |
| Short-term | Ship integration adapters for OpenAI and Google tool-use APIs — no protocol rewrite required | Month 3–6 |
| Medium | Certify NSP-Coding on current-generation OpenAI and Google frontier models | Month 6–12 |
| Strategic | Formal API partnership with Anthropic — convert dependency into co-development lock-in | Ongoing |
Anthropic's extended thinking already gives Claude introspective capabilities. A natural extension is structured belief state output — which is exactly what NSP's extraction layer does. This risk is absent from the current plan entirely.
The cognitive gate adds latency before every PreToolUse event. Enterprise buyers will ask this question in the first sales call. No benchmarks are published.
| Component | Est. Latency |
|---|---|
| Belief state retrieval (YAML) | < 5ms |
| Confidence threshold check | < 1ms |
| Understanding extraction (LLM call) | 400ms – 1,200ms |
| Gate decision + state write-back | < 15ms |
| Total (gated turn) | ~420ms – 1,220ms |
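One standard mitigation is selective gating: only destructive tool calls pay the LLM-bound extraction cost, while everything else takes the in-memory fast path. A back-of-envelope model using the component figures from the table, with the 20% gated-call share as an assumption:

```python
# Expected per-tool-call overhead under selective gating. Component
# latencies are from the table above; the gated-call share of 20%
# is an illustrative assumption, not a measured figure.
fast_path_ms = 5 + 1 + 15              # retrieval + check + write-back
gated_lo, gated_hi = 420, 1_220        # full gate incl. LLM extraction
gated_share = 0.20

expected_lo = (1 - gated_share) * fast_path_ms + gated_share * gated_lo
expected_hi = (1 - gated_share) * fast_path_ms + gated_share * gated_hi
# expected overhead lands around 100-260 ms per call under these numbers
```

Publishing a measured version of this calculation would pre-empt the latency objection in the first sales call.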
The platform thesis rests on NSP's ability to carry the same cognitive machinery across coding, research, music, and companion domains. Two domains (coding + research) are shipped. The music domain is still in pilot, and formal generalisation proofs appear in Paper 3 before May 2026.
All published benchmarks to date use Claude. This is an evidence gap, not a capability gap — NSP's cognitive state layer is architecture-agnostic by design. Enterprise customers with existing OpenAI or Google contracts will nonetheless ask for a head-to-head benchmark on their stack before committing.
| Risk | Severity | Likelihood | Mitigation Complexity | Action Required |
|---|---|---|---|---|
| Distribution-channel concentration | High | Medium | Low | Integration adapters + Anthropic partnership |
| Foundation model native competition | High | Medium-High | Medium | API partnership + audit trail moat |
| Gate latency overhead | Medium-High | High (will be raised) | Low | Selective gating + benchmarks |
| Cross-domain generalisation proof | Medium | Low (nearly closed) | Low | Paper 3 + Anthropic benchmark |
| Multi-LLM benchmark evidence gap | Medium | Low (fixable) | Low | Run benchmarks, publish |