Caroline
A 4-billion-parameter air-gapped hybrid neuromorphic LLM. Runs fully on a single Jetson AGX Orin with concurrent real-time vision and on-device adaptation. Sends nothing.
LLM throughput
9.74 tok/s
FP16-native baseline
Vision pipeline
24.8 FPS
YOLOv8n · p95 36.7 ms
Air-gap egress
4/4 blocked
kernel-enforced
Adaptation
36 steps
full recovery · 576 B
Direct measurement · Jetson AGX Orin · Power mode MAXN · 2026-04-22 08:42:42 UTC
What it is
Caroline is the first model in the NeuratronLLM-Edge family. It runs an instruct LLM, a real-time computer-vision pipeline and a continual-learning loop on a single edge device — under a hard 60 W envelope, with kernel-enforced isolation, and with every number on this page traceable to a single forensic validation run.
Fully air-gapped
Inference, retrieval, tokenization and adaptation all execute on the device. The runtime is launched inside a Linux network namespace; 4 of 4 external egress probes are blocked by the kernel.
Hybrid neuromorphic
A proprietary multi-layer neuromorphic stack wraps the public base LLM. A sparse attention path prunes the dense pair count by an order of magnitude. Internal architecture withheld under NDA.
Adapts on-device
A bounded, integer-quantized update rule recovered 113.71% of the accuracy lost to a controlled distribution drift in 36 steps. Update footprint: 576 bytes — under the 1 KB target.
Baseline
Caroline wraps a publicly licensed, US-origin open-weights instruct base model so its public properties remain independently verifiable. The proprietary Neuramorphic stack lives on top. Base model identity, version and provenance are released under NDA.
Application surface
Field-service UI · offline build · streaming I/O
Agentic orchestration
Tool-calling · reasoning traces · local memory
Local retrieval
Vector store + reranker — fully on-device
Adaptation layer
Bounded sub-KB on-device update rule (NDA)
Sparse activation engine
Proprietary attention path (NDA)
Quantization map
Expert-aware allocation (NDA)
Hybrid neuromorphic kernel
Architecture withheld (NDA)
Validation host
Jetson AGX Orin · Linux netns · MAXN
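The retrieval layer's public claim, fully on-device vector search plus rerank, can be illustrated with a toy sketch. Class and function names are ours, and the brute-force scan stands in for whatever index the shipped store actually uses:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class LocalVectorStore:
    """Toy in-memory vector store; illustrative only, not the shipped component."""
    def __init__(self):
        self.items = []  # (doc_id, embedding) pairs

    def add(self, doc_id, embedding):
        self.items.append((doc_id, embedding))

    def search(self, query_embedding, k=3):
        """Brute-force nearest neighbours by cosine similarity."""
        scored = sorted(self.items,
                        key=lambda it: cosine(query_embedding, it[1]),
                        reverse=True)
        return [doc_id for doc_id, _ in scored[:k]]
```

Nothing here leaves the process: embeddings, index and scoring are all plain in-memory Python, which is the property the "fully on-device" card is asserting.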
Air-gap enforcement
The inference process is launched inside a Linux network namespace created with unshare -rn (CLONE_NEWUSER | CLONE_NEWNET). Inside that namespace, four representative external destinations are probed; all four egress attempts are rejected by the kernel. Local UNIX-socket IPC continues to work, confirming that the isolation is network-only and does not break local service wiring. No root privileges are required.
Status
ENFORCED
Egress attempts
4 of 4 blocked
Local IPC
OK · positive control
Setup latency
65.79 ms
Namespace
user + net
Requires root
No
| Target | Reached | Kernel response |
|---|---|---|
| 1.1.1.1:443 | NO | OSError [Errno 101] Network is unreachable |
| 8.8.8.8:53 | NO | OSError [Errno 101] Network is unreachable |
| huggingface.co:443 | NO | gaierror [Errno -3] Temporary failure in name resolution |
| api.openai.com:443 | NO | gaierror [Errno -3] Temporary failure in name resolution |
| local UNIX socket (positive control) | YES | — |
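The egress-probe pattern in the table can be sketched as a small Python script meant to run inside the namespace. The probe function, target list layout and helper names are illustrative reconstructions, not the validation harness itself:

```python
import socket

# The four destinations probed in the validation run.
TARGETS = [
    ("1.1.1.1", 443),
    ("8.8.8.8", 53),
    ("huggingface.co", 443),
    ("api.openai.com", 443),
]

def probe(host, port, timeout=1.0):
    """Attempt one TCP connection; return (reached, kernel_response)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.gaierror as e:
        # DNS resolution is dead inside the namespace.
        return False, f"gaierror [Errno {e.errno}] {e.strerror}"
    except OSError as e:
        # e.g. [Errno 101] Network is unreachable for direct-IP targets.
        return False, f"OSError [Errno {e.errno}] {e.strerror}"

def run_probes(targets=TARGETS):
    """Probe every target and collect (destination, reached, response) rows."""
    return [(f"{h}:{p}", *probe(h, p)) for h, p in targets]
```

Launched via unshare -rn, every probe should report reached=False; run outside the namespace, the same probes succeed, which doubles as a positive control for the harness itself.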
Live inference server
All throughput numbers come from the same REST endpoint that ships in the field. The validation run uses FP16-native execution (no quantization) for a clean, deterministic baseline. The proprietary mixed-precision stack is available, configurable and validated separately under NDA.
LLM mean throughput
9.74 tok/s
min 9.66 · max 9.78
Per-request latency p95
10.25 s
mean 8.44 s · greedy decode
Vision FPS
24.83 fps
YOLOv8n · 30 iterations
Vision latency p95
36.7 ms
p99 37.62 ms · 4.5 det/frame
GPU utilization
98.25%
concurrent LLM + vision
Resident GPU memory
LLM + vision co-resident
Load time
cold start · disk → GPU
Endpoint
http://127.0.0.1:8001 · LAN-only
FP32-safe norms
2 modules promoted · 64 FP16 tensors
Real-time vision
A public YOLOv8n detector runs concurrently with the LLM as a reference vision pipeline. Detection counts are real model outputs at the standard confidence threshold; no detection synthesis is performed.
YOLOv8n real inference at 24.8 FPS (p95 36.7 ms) on the Jetson Orin GPU, returning 4.5 detections per frame on real reference imagery — running concurrently with a 4 B-parameter LLM at full throughput.
The reference pipeline is intentionally a public model so that independent reviewers can re-run it against the published SHA-256 anchor.
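Re-running the anchor check requires nothing beyond a SHA-256 over the downloaded weights. A minimal sketch (function names are ours; the prefix is the truncated value from the chain-of-custody table):

```python
import hashlib

# Truncated public anchor for the YOLOv8n reference weights
# (full 64-char digest released under NDA).
ANCHOR_PREFIX = "f59b3d833e2ff32e"

def sha256_of(path, chunk=1 << 20):
    """Stream a file through SHA-256 in 1 MiB chunks and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while blk := f.read(chunk):
            h.update(blk)
    return h.hexdigest()

def matches_anchor(path, prefix=ANCHOR_PREFIX):
    """True when the file's digest starts with the published 16-char prefix."""
    return sha256_of(path).startswith(prefix)
```

A 16-character prefix narrows collisions to 2⁻⁶⁴, enough for a first-pass check; the full digest under NDA closes the gap for formal verification.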
On-device adaptation
A controlled distribution-drift experiment was run on a held-out classification task on the validation device. A bounded, low-footprint update rule (proprietary; details under NDA) was applied iteratively from the post-drift state. Below is what a deployment site would observe.
Baseline
67.2%
Post-drift
18.8%
Recovered
73.8%
Recovery vs baseline
113.71%
Steps to 90% baseline
from post-drift state
Steps to 95% baseline
full recovery threshold
Per-update footprint
576 B
≤ 1 KB target envelope
Total recovery steps
36 steps
bounded INT update rule
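The headline recovery number is a ratio over the accuracy lost to drift, not over the baseline itself. A quick recomputation from the published (rounded) card values, as a sanity check:

```python
def recovery_pct(baseline, post_drift, recovered):
    """Share of the accuracy lost to drift that the update rule won back, in percent."""
    return (recovered - post_drift) / (baseline - post_drift) * 100.0

# Published (rounded) figures: baseline 67.2 %, post-drift 18.8 %, recovered 73.8 %.
# Recomputing from these rounded values gives ~113.6 %; the reported 113.71 %
# presumably derives from the unrounded measurements.
rounded_estimate = recovery_pct(67.2, 18.8, 73.8)

FOOTPRINT_B, ENVELOPE_B = 576, 1024
within_envelope = FOOTPRINT_B <= ENVELOPE_B  # per-update footprint vs 1 KB target
```

A value above 100 % simply means the recovered accuracy ended up higher than the pre-drift baseline, which the cards above also show directly (73.8 % vs 67.2 %).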
Proprietary IP layers
The table below reports only PASS / FAIL validation status and a sanitized verdict for each proprietary IP layer. Layer-internal parameters (firing rates, state-space spectra, fusion mixture coefficients, sparse-attention thresholds, adaptation rule rank / bounds / learning rate / optimizer, and per-tensor quantization bit allocations) are deliberately withheld and released only under a mutual NDA covering the corresponding patent filings.
Kernel layer A
PASS · Sparse activation pattern validated within target operating regime; homeostatic control verified active. Internal mechanism, parameters and operating set-points withheld.
Kernel layer B
PASS · Continuous-time evolution numerically stable across the validated input range; output is finite and bounded. Initialization scheme and spectral parameters withheld.
Fusion core
PASS · Cross-modal fusion produces finite, well-conditioned outputs; mixture is non-degenerate and asymmetric as designed. Mixture coefficients and routing math withheld.
Sparse attention path
PASS · Proprietary attention path preserves numerical correctness while reducing the dense pair count by an order of magnitude. Pruning thresholds and top-k parameters withheld.
Adaptation rule
PASS · Bounded weight update enforced at every step; adaptation footprint kept within the target sub-KB envelope. Rank, bounds, learning rate, optimizer and update mechanism withheld.
Quantization map
AVAILABLE · Expert-aware mixed-precision allocation map is implemented and reproducible from configuration; this validation run executed in FP16-native mode for a deterministic baseline. Per-tensor bit allocations, protected substrings and routing rules withheld.
Energy & thermal envelope
Power is estimated from a labelled utilization model (tegrastats utilization × NVIDIA Orin TDP envelope) over an 8-second concurrent workload of 180 iterations and 0 errors. On this OS image the on-board INA3221 sysfs interface requires root; the reviewer should treat power figures as utilization-anchored estimates rather than direct rail measurements.
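A utilization-anchored estimate of the kind described can be sketched as below. The linear blend and its weights are our illustrative assumptions, not the validated tegrastats model:

```python
TDP_W = 60.0  # published platform power envelope (MAXN)

def estimated_power_w(gpu_util, cpu_util,
                      gpu_weight=0.8, cpu_weight=0.2, tdp_w=TDP_W):
    """Utilization-anchored power estimate, NOT a direct rail measurement.

    gpu_util / cpu_util are fractions in [0, 1]; the blending weights are
    illustrative assumptions, not the calibrated model used in validation.
    """
    blended = gpu_weight * gpu_util + cpu_weight * cpu_util
    return blended * tdp_w
```

Treat the output as a shape, not a calibration: the point of the paragraph above is precisely that, absent root access to the INA3221 rails, any such figure is an estimate anchored to utilization.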
Mean power
44.52 W
peak 45.08 W
Platform envelope
60 W
published TDP
GPU utilization
98.25%
mean across 8 s
CPU utilization
GPU-dominant load
Junction temp (mean)
40.92 °C
peak 42.06 °C
RAM in use
mean = peak (steady)
Sustained-load stability
Status
STABLE
soak verdict
Duration
continuous workload
Errors
0
24 requests served
Latency mean
warm decode
Latency p95
warm decode
Latency drift 1st ↔ 2nd half
thermal- and load-stable
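The drift card compares mean warm-decode latency across the two halves of the soak window. A minimal sketch of that computation (function name ours):

```python
def latency_drift_pct(samples_ms):
    """Percent change in mean latency between the first and second half of a run.

    Near-zero output indicates a thermally and load-stable soak; a positive
    value means the second half ran slower than the first.
    """
    half = len(samples_ms) // 2
    first = sum(samples_ms[:half]) / half
    second = sum(samples_ms[half:]) / (len(samples_ms) - half)
    return (second - first) / first * 100.0
```

Splitting at the midpoint is the simplest stability check; a production harness might also track a rolling percentile, but the halves comparison is what the card reports.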
Where Caroline matters
Sovereign data, hard latency budgets, regulated facilities, intermittent or hostile connectivity. Caroline is built for the operating contexts where every other LLM architecture quietly assumes a backbone that doesn't exist.
Sovereign defense & intelligence edge
Disconnected reasoning on field hardware. Air-gap is enforced by the kernel, not by policy. Adaptation runs on the device with bounded sub-KB updates.
Industrial vision + reasoning
A single edge HPC node runs a real-time detector and the reasoning LLM concurrently at full envelope. Acts as both the perceptual layer and the explanation layer for the line.
Regulated on-prem environments
Run a capable instruct LLM behind the firewall with verifiable network isolation and a chain-of-custody artefact set for compliance review. Sector-specific deployments shared under NDA.
Hard-latency edge control
Sub-second reasoning co-located with the asset, under a hard 60 W envelope, with no cellular round-trip. Target verticals released under NDA.
Deployment tiers
Chain of custody
All numbers are derived from a single forensic run of the validation pipeline. The artefacts below carry SHA-256 content hashes that any reviewer can re-compute. Full 64-character digests are released under NDA for independent verification.
| Artefact | Size | SHA-256 (first 16) |
|---|---|---|
| Forensic JSON (raw measurements) | 12.1 KB | 72efb203fec2784f… |
| Forensic Markdown (human-readable) | 18.5 KB | 7cb9d1c00a88f32e… |
| Vision model weights (YOLOv8n, reference) | 6.25 MB | f59b3d833e2ff32e… |
Validation timestamp (UTC)
2026-04-22 08:42:42
Validation host
ubuntu · Jetson AGX Orin
Document issued (UTC)
2026-04-22 09:09:46
Classification
CONFIDENTIAL · IP-safe · NDA
FAQ
What is Caroline?
Caroline is the first model in the NeuratronLLM-Edge family: a 4-billion-parameter-class air-gapped hybrid neuromorphic LLM that runs fully on a single NVIDIA Jetson AGX Orin, with an integrated real-time vision pipeline and an on-device adaptation loop. No data leaves the device.
How is the air gap enforced?
The inference process is launched inside a Linux network namespace (unshare -rn). Four representative external destinations (1.1.1.1:443, 8.8.8.8:53, huggingface.co:443, api.openai.com:443) are probed and 4 of 4 egress attempts are rejected by the kernel. Local UNIX-socket IPC remains functional as a positive control. No root privileges required.
How fast is it?
On Jetson AGX Orin at FP16-native baseline: 9.74 tokens/second mean (min 9.66, max 9.78), per-request latency mean 8437.4 ms and p95 10249.0 ms. The concurrent vision pipeline (YOLOv8n) sustains 24.83 FPS with p95 latency 36.7 ms.
Does the adaptation actually work on-device?
In a controlled drift experiment, baseline accuracy 67.2% dropped to 18.8% after drift, then recovered to 73.8% (113.71% of the accuracy lost) within 36 on-device steps. Per-update footprint is 576 bytes — under the 1 KB target.
What is the power and thermal envelope?
Under concurrent LLM + vision workload at 98.25% GPU utilization: estimated mean power 44.52 W, peak 45.08 W — inside the published 60 W platform envelope. Junction temperature mean 40.92 °C, peak 42.06 °C.
Is the base model itself proprietary?
No. Caroline wraps a publicly licensed, US-origin open-weights instruct base model so its public properties remain independently verifiable. The proprietary IP — hybrid neuromorphic kernel, sparse attention path, on-device adaptation rule and expert-aware quantization mapping — sits as a stack on top of the base model and is the subject of pending USPTO filings. Base model identity and version are released under NDA.
Who is Caroline for?
Sovereign defense and intelligence, regulated on-prem environments, industrial vision + reasoning and any operating context where data sovereignty, low latency and zero egress are non-negotiable. Vertical-specific deployments and references are released under NDA.
How are the numbers verified?
Every figure is derived from a single forensic run of the validation pipeline. Each artefact (forensic JSON, forensic Markdown, vision weights) carries a SHA-256 content hash that any reviewer can re-compute. Full digests are released under NDA.
Patent pending
The hybrid neuromorphic kernel, the on-device adaptation layer, the sparse attention path and the expert-aware quantization mapping that wrap the public base model are proprietary to Neuramorphic, Inc. and are protected by USPTO patent applications.
Implementation details, calibration parameters, adaptation dynamics and bit-allocation maps are confidential and shared under NDA. Contact legal@neuramorphic.ai.