NeuratronLLM-Edge 4B Caroline — Air-gapped hybrid neuromorphic LLM, validated on Jetson AGX Orin.
Generation 1 · Forensic-validated · Available under NDA

NeuratronLLMEdge4B

Caroline

A 4-billion-parameter air-gapped hybrid neuromorphic LLM. Runs fully on a single Jetson AGX Orin with concurrent real-time vision and on-device adaptation. Sends nothing.

LLM throughput

9.74 tok/s

FP16-native baseline

Vision pipeline

24.83 FPS

YOLOv8n · p95 36.7 ms

Air-gap egress

4/4 blocked

kernel-enforced

Adaptation

36 steps

full recovery · 576 B

Direct measurement · Jetson AGX Orin · Power mode MAXN · 2026-04-22 08:42:42 UTC

What it is

A foundation model engineered for places where the cloud isn't welcome.

Caroline is the first model in the NeuratronLLM-Edge family. It runs an instruct LLM, a real-time computer-vision pipeline and a continual-learning loop on a single edge device — under a hard 60 W envelope, with kernel-enforced isolation, and with every number on this page traceable to a single forensic validation run.

Fully air-gapped

Inference, retrieval, tokenization and adaptation all execute on the device. The runtime is launched inside a Linux network namespace; 4 of 4 external egress probes are blocked by the kernel.

Hybrid neuromorphic

A proprietary multi-layer neuromorphic stack wraps the public base LLM. A sparse attention path prunes the dense pair count by an order of magnitude. Internal architecture withheld under NDA.
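The internal path is withheld, but the arithmetic behind "an order of magnitude" is public: dense attention scores n × n query–key pairs, while a top-k path evaluates only n × k. A generic top-k masking sketch (illustrative only; this is not Caroline's mechanism, and the example sizes are assumptions):

```python
import heapq

def topk_mask(scores: list[list[float]], k: int) -> list[list[bool]]:
    """Keep only the k largest scores per query row; drop the rest.

    Dense attention evaluates n*n query-key pairs; a top-k path evaluates
    n*k, so k = n/10 prunes the pair count by an order of magnitude.
    """
    masks = []
    for row in scores:
        keep = set(heapq.nlargest(k, range(len(row)), key=row.__getitem__))
        masks.append([j in keep for j in range(len(row))])
    return masks

scores = [[0.1, 0.9, 0.3, 0.7], [0.6, 0.2, 0.8, 0.4]]
print(topk_mask(scores, 2))  # rows keep indices {1, 3} and {0, 2}
```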

Adapts on-device

A bounded, integer-quantized update rule recovered 113.71% of the accuracy lost to a controlled distribution drift in 36 steps. Update footprint: 576 bytes — under the 1 KB target.
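The update rule itself is under NDA; for intuition only, here is one generic shape a "bounded, integer-quantized, sub-KB" update can take (the bound, scale and vector size below are illustrative assumptions, not Caroline's parameters):

```python
def bounded_int8_update(delta: list[float], bound: float = 0.05,
                        scale: float = 0.05 / 127) -> list[int]:
    """Clip each weight delta to +/-bound, then quantize to int8 steps.

    A 576-byte budget fits, for example, a 576-element int8 update vector.
    bound and scale here are illustrative, not the NDA'd values.
    """
    out = []
    for d in delta:
        d = max(-bound, min(bound, d))   # enforce the hard bound
        q = round(d / scale)             # integer quantization
        out.append(max(-127, min(127, q)))
    return out

update = bounded_int8_update([0.2, -0.01, 0.0])
# 0.2 exceeds the bound and clips to the max int8 step; 0.0 maps to 0
```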

Baseline

Built on a public, US-origin base.

Caroline wraps a publicly licensed, US-origin open-weights instruct base model, so its public properties remain independently verifiable. The proprietary Neuramorphic stack lives on top. Base model identity, version and provenance are released under NDA.

Base model topology

Public class

Family: Open-weights instruct LLM
Origin: US-licensed
Class: ~4 B parameters
Attention: Grouped-query (GQA)
Context window: 4 K tokens
Position encoding: RoPE
Precision (validation): FP16-native
Loaded parameters: ~4.2 B
Weights on disk: ~8 GB
Detailed spec: Under NDA

Neuramorphic stack

Proprietary · IP-safe
L7 · Application surface: Field-service UI · offline build · streaming I/O

L6 · Agentic orchestration: Tool-calling · reasoning traces · local memory

L5 · Local retrieval: Vector store + reranker, fully on-device

L4 · Adaptation layer: Bounded sub-KB on-device update rule (NDA)

L3 · Sparse activation engine: Proprietary attention path (NDA)

L2 · Quantization map: Expert-aware allocation (NDA)

L1 · Hybrid neuromorphic kernel: Architecture withheld (NDA)

L0 · Validation host: Jetson AGX Orin · Linux netns · MAXN

Air-gap enforcement

Network-isolated by the kernel. Not by promise.

The inference process is launched inside a Linux network namespace created with unshare -rn (CLONE_NEWUSER | CLONE_NEWNET). Inside that namespace, four representative external destinations are probed; all four egress attempts are rejected by the kernel. Local UNIX-socket IPC continues to work, confirming that the isolation is network-only and does not break local service wiring. No root privileges are required.
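The probe logic is simple enough for any reviewer to reproduce. A minimal sketch, run under unshare -rn to replicate the blocked results (the target list mirrors the validation run):

```python
import socket

def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connection and report the kernel's verdict."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "REACHED"
    except OSError as e:
        return f"BLOCKED: {e}"

if __name__ == "__main__":
    # Representative destinations from the validation run
    for host, port in [("1.1.1.1", 443), ("8.8.8.8", 53),
                       ("huggingface.co", 443), ("api.openai.com", 443)]:
        print(f"{host}:{port} -> {probe(host, port)}")
```

Inside the namespace the raw-IP probes fail with "Network is unreachable" and the hostname probes fail at DNS resolution, matching the table below.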

Status

ENFORCED

Egress attempts

4 of 4 blocked

Local IPC

OK · positive control

Setup latency

65.79 ms

Namespace

user + net

Requires root

No

Target                                 Reached   Kernel response
1.1.1.1:443                            NO        OSError [Errno 101] Network is unreachable
8.8.8.8:53                             NO        OSError [Errno 101] Network is unreachable
huggingface.co:443                     NO        gaierror [Errno -3] Temporary failure in name resolution
api.openai.com:443                     NO        gaierror [Errno -3] Temporary failure in name resolution
local UNIX socket (positive control)   YES       —

Live inference server

Real numbers. Single Jetson AGX Orin 64 GB.

All throughput numbers come from the same REST endpoint that ships in the field. The validation run uses FP16-native execution (no quantization) for a clean, deterministic baseline. The proprietary mixed-precision stack is available, configurable and validated separately under NDA.

LLM mean throughput

9.74 tok/s

min 9.66 · max 9.78

Per-request latency p95

10.25 s

mean 8.44 s · greedy decode

Vision FPS

24.83 fps

YOLOv8n · 30 iterations

Vision latency p95

36.7 ms

p99 37.62 ms · 4.5 det/frame

GPU utilization

98.25%

concurrent LLM + vision

Resident GPU memory

—

LLM + vision co-resident

Load time

—

cold start · disk → GPU

Endpoint

http://127.0.0.1:8001

LAN-only

FP32-safe norms

—

2 modules promoted · 64 FP16 tensors

Real-time vision

Concurrent perception. Same GPU. Same envelope.

A public YOLOv8n detector runs concurrently with the LLM as a reference vision pipeline. Detection counts are real model outputs at the standard confidence threshold — no detection synthesis is performed.

Vision pipeline

Reference · public
Model: YOLOv8n
Parameters: 3,151,904
Weights size: 6.25 MB
SHA-256 (first 16): f59b3d833e2ff32e…
Iterations: 30
Latency p50 / p95 / p99: 35.66 / 36.7 / 37.62 ms
Mean FPS: 24.83
Detections / frame (mean): 4.5
Resident GPU memory: 28.3 MB · peak 169.9 MB

Verdict

Pass

YOLOv8n real inference at 24.8 FPS (p95 36.7 ms) on the Jetson Orin GPU, returning 4.5 detections per frame on real reference imagery — running concurrently with a 4 B-parameter LLM at full throughput.

The reference pipeline is intentionally a public model so that independent reviewers can re-run it against the published SHA-256 anchor.
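Re-computing the anchor needs nothing beyond the standard library. A sketch (the local filename is an assumption):

```python
import hashlib

def sha256_prefix(path: str, n_hex: int = 16) -> str:
    """Stream a file through SHA-256 and return the first n_hex hex characters."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()[:n_hex]

# e.g. sha256_prefix("yolov8n.pt") should match the published
# anchor f59b3d833e2ff32e… for the reference weights.
```

Streaming in 1 MB chunks keeps memory flat regardless of artefact size, so the same helper verifies the forensic JSON and Markdown artefacts listed in the chain-of-custody section.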

On-device adaptation

Recovers from drift. Without the round-trip.

A controlled distribution-drift experiment was run on a held-out classification task on the validation device. A bounded, low-footprint update rule (proprietary; details under NDA) was applied iteratively from the post-drift state. Below is what a deployment site would observe.

Baseline

67.2%

Post-drift

18.8%

Recovered

73.8%

Recovery vs baseline

113.71%

of accuracy lost to drift

Steps to 90% baseline

—

from post-drift state

Steps to 95% baseline

36 steps

full recovery threshold

Per-update footprint

576 B

≤ 1 KB target envelope

Total recovery steps

36 steps

bounded INT update rule

Status · FULLY-RECOVERED. The on-device adaptation rule recovered 113.71% of the accuracy lost to the controlled drift, reaching 95% of the pre-drift baseline within 36 on-device steps. Update footprint stayed within the sub-KB envelope. Adaptation rule architecture, rank, learning rate, optimizer and update bounds are withheld and available under NDA.
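The recovery percentage follows directly from the published accuracies; a quick sanity check:

```python
baseline, post_drift, recovered = 67.2, 18.8, 73.8  # published accuracies (%)

lost = baseline - post_drift        # 48.4 points lost to the drift
regained = recovered - post_drift   # 55.0 points regained on-device
recovery = 100 * regained / lost
print(f"{recovery:.1f}% of the lost accuracy recovered")
# agrees with the reported 113.71% up to rounding of the one-decimal inputs
```

Recovering more than 100% of the lost accuracy simply means the adapted model ended above its own pre-drift baseline.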

Proprietary IP layers

Validation status. Architecture withheld.

The table below reports only PASS / FAIL validation status and a sanitized verdict for each proprietary IP layer. Layer-internal parameters (firing rates, state-space spectra, fusion mixture coefficients, sparse-attention thresholds, adaptation rule rank / bounds / learning rate / optimizer, and per-tensor quantization bit allocations) are deliberately withheld and released only under a mutual NDA covering the corresponding patent filings.

Kernel layer A

PASS

Sparse activation pattern validated within target operating regime; homeostatic control verified active. Internal mechanism, parameters and operating set-points withheld.

Kernel layer B

PASS

Continuous-time evolution numerically stable across the validated input range; output is finite and bounded. Initialization scheme and spectral parameters withheld.

Fusion core

PASS

Cross-modal fusion produces finite, well-conditioned outputs; mixture is non-degenerate and asymmetric as designed. Mixture coefficients and routing math withheld.

Sparse attention path

PASS

Proprietary attention path preserves numerical correctness while reducing the dense pair count by an order of magnitude. Pruning thresholds and top-k parameters withheld.

Adaptation rule

PASS

Bounded weight update enforced at every step; adaptation footprint kept within the target sub-KB envelope. Rank, bounds, learning rate, optimizer and update mechanism withheld.

Quantization map

AVAILABLE

Expert-aware mixed-precision allocation map is implemented and reproducible from configuration; this validation run executed in FP16-native mode for a deterministic baseline. Per-tensor bit allocations, protected substrings and routing rules withheld.

Energy & thermal envelope

Inside the platform envelope. By a wide margin.

Power is estimated from a labelled utilization model (tegrastats utilization × NVIDIA Orin TDP envelope) over an 8-second concurrent workload of 180 iterations and 0 errors. On this OS image the on-board INA3221 sysfs interface requires root; the reviewer should treat power figures as utilization-anchored estimates rather than direct rail measurements.
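tegrastats reports utilization as plain text, so extracting the GPU figure and anchoring it to the TDP envelope looks roughly like this (the sample line and the linear model are illustrative; the validation run's labelled model is more refined, which is why its 44.52 W mean at 98.25% utilization sits well below this naive ceiling):

```python
import re

TDP_W = 60.0  # published platform envelope

def gpu_util_pct(tegrastats_line: str) -> float:
    """Pull the GR3D_FREQ (GPU) utilization percentage out of a tegrastats line."""
    m = re.search(r"GR3D_FREQ (\d+)%", tegrastats_line)
    return float(m.group(1)) if m else 0.0

def power_estimate_w(util_pct: float) -> float:
    """Naive utilization-anchored ceiling: scale the TDP by GPU utilization."""
    return TDP_W * util_pct / 100.0

line = "RAM 11234/62800MB GR3D_FREQ 98% tj@41.2C"  # abridged sample line
print(power_estimate_w(gpu_util_pct(line)))
```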

Mean power

44.52 W

peak 45.08 W

Platform envelope

60 W

published TDP

GPU utilization

98.25%

mean across 8 s

CPU utilization

—

GPU-dominant load

Junction temp (mean)

40.92 °C

peak 42.06 °C

RAM in use

—

mean = peak (steady)

Sustained-load stability

60 seconds. 24 requests. Zero errors.

Status

STABLE

soak verdict

Duration

60 s

continuous workload

Errors

0

24 requests served

Latency mean

—

warm decode

Latency p95

—

warm decode

Latency drift 1st ↔ 2nd half

—

thermal- and load-stable

Where Caroline matters

For environments where the cloud is the wrong answer.

Sovereign data, hard latency budgets, regulated facilities, intermittent or hostile connectivity. Caroline is built for the operating contexts where every other LLM architecture quietly assumes a backbone that doesn't exist.

Sovereign defense & intelligence edge

Disconnected reasoning on field hardware. Air-gap is enforced by the kernel, not by policy. Adaptation runs on the device with bounded sub-KB updates.

Disconnected ops · Sovereign · Bounded update

Industrial vision + reasoning

A single edge HPC node runs a real-time detector and the reasoning LLM concurrently at full envelope. Acts as both the perceptual layer and the explanation layer for the line.

Line-side · Vision + LLM · Concurrent

Regulated on-prem environments

Run a capable instruct LLM behind the firewall with verifiable network isolation and a chain-of-custody artefact set for compliance review. Sector-specific deployments shared under NDA.

On-prem · Compliance · Auditable

Hard-latency edge control

Sub-second reasoning co-located with the asset, under a hard 60 W envelope, with no cellular round-trip. Target verticals released under NDA.

Edge HPC · Low-latency · No round-trip

Deployment tiers

Where Caroline runs.

Edge HPC · Active
Jetson AGX Orin 64 GB
Air-gapped LLM + vision + agent + on-device adaptation
Edge
Jetson Orin Nano 8 GB
Offline assistant, vision triage (distilled profile)
Workstation
Discrete Ampere/Ada GPU
Pre-deployment soak and regression suite (internal)
Training
Internal cluster
Model factory — not customer-facing

Chain of custody

Every figure on this page is anchored.

All numbers are derived from a single forensic run of the validation pipeline. The artefacts below carry SHA-256 content hashes that any reviewer can re-compute. Full 64-character digests are released under NDA for independent verification.

Artefact                                    Size      SHA-256 (first 16)
Forensic JSON (raw measurements)            12.1 KB   72efb203fec2784f…
Forensic Markdown (human-readable)          18.5 KB   7cb9d1c00a88f32e…
Vision model weights (YOLOv8n, reference)   6.25 MB   f59b3d833e2ff32e…

Validation timestamp (UTC)

2026-04-22 08:42:42

Validation host

ubuntu · Jetson AGX Orin

Document issued (UTC)

2026-04-22 09:09:46

Classification

CONFIDENTIAL · IP-safe · NDA

FAQ

What reviewers ask first.

What is Caroline?

Caroline is the first model in the NeuratronLLM-Edge family: a 4-billion-parameter-class air-gapped hybrid neuromorphic LLM that runs fully on a single NVIDIA Jetson AGX Orin, with an integrated real-time vision pipeline and an on-device adaptation loop. No data leaves the device.

How is the air gap enforced?

The inference process is launched inside a Linux network namespace (unshare -rn). Four representative external destinations (1.1.1.1:443, 8.8.8.8:53, huggingface.co:443, api.openai.com:443) are probed, and 4 of 4 egress attempts are rejected by the kernel. Local UNIX-socket IPC remains functional as a positive control. No root privileges are required.

How fast is it?

On Jetson AGX Orin at the FP16-native baseline: 9.74 tokens/second mean (min 9.66, max 9.78), with per-request latency of 8437.4 ms mean and 10249.0 ms p95. The concurrent vision pipeline (YOLOv8n) sustains 24.83 FPS with a p95 latency of 36.7 ms.

How well does on-device adaptation work?

In a controlled drift experiment, baseline accuracy of 67.2% dropped to 18.8% after drift, then recovered to 73.8% (113.71% of the accuracy lost) within 36 on-device steps. Per-update footprint is 576 bytes — under the 1 KB target.

What does it draw under load?

Under a concurrent LLM + vision workload at 98.25% GPU utilization: estimated mean power 44.52 W, peak 45.08 W — inside the published 60 W platform envelope. Junction temperature: 40.92 °C mean, 42.06 °C peak.

Did Neuramorphic train the base model from scratch?

No. Caroline wraps a publicly licensed, US-origin open-weights instruct base model so its public properties remain independently verifiable. The proprietary IP — hybrid neuromorphic kernel, sparse attention path, on-device adaptation rule and expert-aware quantization mapping — sits as a stack on top of the base model and is the subject of pending USPTO filings. Base model identity and version are released under NDA.

Who is Caroline for?

Sovereign defense and intelligence, regulated on-prem environments, industrial vision + reasoning, and any operating context where data sovereignty, low latency and zero egress are non-negotiable. Vertical-specific deployments and references are released under NDA.

How are the figures verified?

Every figure is derived from a single forensic run of the validation pipeline. Each artefact (forensic JSON, forensic Markdown, vision weights) carries a SHA-256 content hash that any reviewer can re-compute. Full digests are released under NDA.

Patent pending

The hybrid neuromorphic kernel, the on-device adaptation layer, the sparse attention path and the expert-aware quantization mapping that wrap the public base model are proprietary to Neuramorphic, Inc. and are protected by USPTO patent applications.

Implementation details, calibration parameters, adaptation dynamics and bit-allocation maps are confidential and shared under NDA. Contact legal@neuramorphic.ai.