Caroline
A 4-billion-parameter air-gapped hybrid neuromorphic LLM. Runs fully on a single Jetson AGX Orin with concurrent real-time vision and on-device adaptation. Sends nothing.
LLM throughput
9.74 tok/s
FP16-native baseline
Vision pipeline
24.8 FPS
YOLOv8n · p95 36.7 ms
Air-gap egress
4/4 blocked
kernel-enforced
Adaptation
36 steps
full recovery · 576 B
Direct measurement · Jetson AGX Orin · Power mode MAXN · 2026-04-22 08:42:42 UTC
What it is
Caroline is the first model in the NeuratronLLM-Edge family. It runs an instruct LLM, a real-time computer-vision pipeline and a continual-learning loop on a single edge device — under a hard 60 W envelope, with kernel-enforced isolation, and with every number on this page traceable to a single forensic validation run.
Fully air-gapped
Inference, retrieval, tokenization and adaptation all execute on the device. The runtime is launched inside a Linux network namespace; 4 of 4 external egress probes are blocked by the kernel.
Hybrid neuromorphic
A proprietary multi-layer neuromorphic stack wraps the public base LLM. A sparse attention path prunes the dense pair count by an order of magnitude. Internal architecture withheld under NDA.
Adapts on-device
A bounded, integer-quantized update rule recovered 113.71% of the accuracy lost to a controlled distribution drift in 36 steps. Update footprint: 576 bytes — under the 1 KB target.
Baseline
Caroline wraps a publicly licensed, US-origin open-weights instruct base model so its public properties remain independently verifiable. The proprietary Neuramorphic stack lives on top. Base model identity, version and provenance are released under NDA.
Application surface
Field-service UI · offline build · streaming I/O
Agentic orchestration
Tool-calling · reasoning traces · local memory
Local retrieval
Vector store + reranker — fully on-device
Adaptation layer
Bounded sub-KB on-device update rule (NDA)
Sparse activation engine
Proprietary attention path (NDA)
Quantization map
Expert-aware allocation (NDA)
Hybrid neuromorphic kernel
Architecture withheld (NDA)
Validation host
Jetson AGX Orin · Linux netns · MAXN
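The retrieval layer's public claim, fully on-device vector search plus rerank, can be illustrated with a toy sketch. Class and function names are ours, and the brute-force scan stands in for whatever index the shipped store actually uses:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class LocalVectorStore:
    """Toy in-memory vector store; illustrative only, not the shipped component."""
    def __init__(self):
        self.items = []  # (doc_id, embedding) pairs

    def add(self, doc_id, embedding):
        self.items.append((doc_id, embedding))

    def search(self, query_embedding, k=3):
        """Brute-force nearest neighbours by cosine similarity."""
        scored = sorted(self.items,
                        key=lambda it: cosine(query_embedding, it[1]),
                        reverse=True)
        return [doc_id for doc_id, _ in scored[:k]]
```

Nothing here leaves the process: embeddings, index and scoring are all plain in-memory Python, which is the property the "fully on-device" card is asserting.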
Air-gap enforcement
The inference process is launched inside a Linux network namespace created with unshare -rn (CLONE_NEWUSER | CLONE_NEWNET). Inside that namespace, four representative external destinations are probed; all four egress attempts are rejected by the kernel. Local UNIX-socket IPC continues to work, confirming that the isolation is network-only and does not break local service wiring. No root privileges are required.
Status
ENFORCED
Egress attempts
4 of 4 blocked
Local IPC
OK · positive control
Setup latency
65.79 ms
Namespace
user + net
Requires root
No
| Target | Reached | Kernel response |
|---|---|---|
| 1.1.1.1:443 | NO | OSError [Errno 101] Network is unreachable |
| 8.8.8.8:53 | NO | OSError [Errno 101] Network is unreachable |
| huggingface.co:443 | NO | gaierror [Errno -3] Temporary failure in name resolution |
| api.openai.com:443 | NO | gaierror [Errno -3] Temporary failure in name resolution |
| local UNIX socket (positive control) | YES | — |
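The egress-probe pattern in the table can be sketched as a small Python script meant to run inside the namespace. The probe function, target list layout and helper names are illustrative reconstructions, not the validation harness itself:

```python
import socket

# The four destinations probed in the validation run.
TARGETS = [
    ("1.1.1.1", 443),
    ("8.8.8.8", 53),
    ("huggingface.co", 443),
    ("api.openai.com", 443),
]

def probe(host, port, timeout=1.0):
    """Attempt one TCP connection; return (reached, kernel_response)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.gaierror as e:
        # DNS resolution is dead inside the namespace.
        return False, f"gaierror [Errno {e.errno}] {e.strerror}"
    except OSError as e:
        # e.g. [Errno 101] Network is unreachable for direct-IP targets.
        return False, f"OSError [Errno {e.errno}] {e.strerror}"

def run_probes(targets=TARGETS):
    """Probe every target and collect (destination, reached, response) rows."""
    return [(f"{h}:{p}", *probe(h, p)) for h, p in targets]
```

Launched via unshare -rn, every probe should report reached=False; run outside the namespace, the same probes succeed, which doubles as a positive control for the harness itself.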
Live inference server
All throughput numbers come from the same REST endpoint that ships in the field. The validation run uses FP16-native execution (no quantization) for a clean, deterministic baseline. The proprietary mixed-precision stack is available, configurable and validated separately under NDA.
LLM mean throughput
9.74 tok/s
min 9.66 · max 9.78
Per-request latency p95
10.25 s
mean 8.44 s · greedy decode
Vision FPS
24.83 fps
YOLOv8n · 30 iterations
Vision latency p95
36.7 ms
p99 37.62 ms · 4.5 det/frame
GPU utilization
98.25%
concurrent LLM + vision
Resident GPU memory
LLM + vision co-resident
Load time
cold start · disk → GPU
Endpoint
http://127.0.0.1:8001 · LAN-only
FP32-safe norms
2 modules promoted · 64 FP16 tensors
Real-time vision
A public YOLOv8n detector runs concurrently with the LLM as a reference vision pipeline. Detection counts are real model outputs at the standard confidence threshold; no detection synthesis is performed.
YOLOv8n real inference at 24.8 FPS (p95 36.7 ms) on the Jetson Orin GPU, returning 4.5 detections per frame on real reference imagery — running concurrently with a 4 B-parameter LLM at full throughput.
The reference pipeline is intentionally a public model so that independent reviewers can re-run it against the published SHA-256 anchor.
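Re-running the anchor check requires nothing beyond a SHA-256 over the downloaded weights. A minimal sketch (function names are ours; the prefix is the truncated value from the chain-of-custody table):

```python
import hashlib

# Truncated public anchor for the YOLOv8n reference weights
# (full 64-char digest released under NDA).
ANCHOR_PREFIX = "f59b3d833e2ff32e"

def sha256_of(path, chunk=1 << 20):
    """Stream a file through SHA-256 in 1 MiB chunks and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while blk := f.read(chunk):
            h.update(blk)
    return h.hexdigest()

def matches_anchor(path, prefix=ANCHOR_PREFIX):
    """True when the file's digest starts with the published 16-char prefix."""
    return sha256_of(path).startswith(prefix)
```

A 16-character prefix narrows collisions to 2⁻⁶⁴, enough for a first-pass check; the full digest under NDA closes the gap for formal verification.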
On-device adaptation
A controlled distribution-drift experiment was run on a held-out classification task on the validation device. A bounded, low-footprint update rule (proprietary; details under NDA) was applied iteratively from the post-drift state. Below is what a deployment site would observe.
Baseline
67.2%
Post-drift
18.8%
Recovered
73.8%
Recovery vs baseline
113.71%
Steps to 90% baseline
from post-drift state
Steps to 95% baseline
full recovery threshold
Per-update footprint
576 B
≤ 1 KB target envelope
Total recovery steps
36 steps
bounded INT update rule
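The headline recovery number is a ratio over the accuracy lost to drift, not over the baseline itself. A quick recomputation from the published (rounded) card values, as a sanity check:

```python
def recovery_pct(baseline, post_drift, recovered):
    """Share of the accuracy lost to drift that the update rule won back, in percent."""
    return (recovered - post_drift) / (baseline - post_drift) * 100.0

# Published (rounded) figures: baseline 67.2 %, post-drift 18.8 %, recovered 73.8 %.
# Recomputing from these rounded values gives ~113.6 %; the reported 113.71 %
# presumably derives from the unrounded measurements.
rounded_estimate = recovery_pct(67.2, 18.8, 73.8)

FOOTPRINT_B, ENVELOPE_B = 576, 1024
within_envelope = FOOTPRINT_B <= ENVELOPE_B  # per-update footprint vs 1 KB target
```

A value above 100 % simply means the recovered accuracy ended up higher than the pre-drift baseline, which the cards above also show directly (73.8 % vs 67.2 %).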
Proprietary IP layers
The table below reports only PASS / FAIL validation status and a sanitized verdict for each proprietary IP layer. Layer-internal parameters (firing rates, state-space spectra, fusion mixture coefficients, sparse-attention thresholds, adaptation rule rank / bounds / learning rate / optimizer, and per-tensor quantization bit allocations) are deliberately withheld and released only under a mutual NDA covering the corresponding patent filings.
Kernel layer A
PASS · Sparse activation pattern validated within target operating regime; homeostatic control verified active. Internal mechanism, parameters and operating set-points withheld.
Kernel layer B
PASS · Continuous-time evolution numerically stable across the validated input range; output is finite and bounded. Initialization scheme and spectral parameters withheld.
Fusion core
PASS · Cross-modal fusion produces finite, well-conditioned outputs; mixture is non-degenerate and asymmetric as designed. Mixture coefficients and routing math withheld.
Sparse attention path
PASS · Proprietary attention path preserves numerical correctness while reducing the dense pair count by an order of magnitude. Pruning thresholds and top-k parameters withheld.
Adaptation rule
PASS · Bounded weight update enforced at every step; adaptation footprint kept within the target sub-KB envelope. Rank, bounds, learning rate, optimizer and update mechanism withheld.
Quantization map
AVAILABLE · Expert-aware mixed-precision allocation map is implemented and reproducible from configuration; this validation run executed in FP16-native mode for a deterministic baseline. Per-tensor bit allocations, protected substrings and routing rules withheld.
Energy & thermal envelope
Power is estimated from a labelled utilization model (tegrastats utilization × NVIDIA Orin TDP envelope) over an 8-second concurrent workload of 180 iterations and 0 errors. On this OS image the on-board INA3221 sysfs interface requires root; the reviewer should treat power figures as utilization-anchored estimates rather than direct rail measurements.
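A utilization-anchored estimate of the kind described can be sketched as below. The linear blend and its weights are our illustrative assumptions, not the validated tegrastats model:

```python
TDP_W = 60.0  # published platform power envelope (MAXN)

def estimated_power_w(gpu_util, cpu_util,
                      gpu_weight=0.8, cpu_weight=0.2, tdp_w=TDP_W):
    """Utilization-anchored power estimate, NOT a direct rail measurement.

    gpu_util / cpu_util are fractions in [0, 1]; the blending weights are
    illustrative assumptions, not the calibrated model used in validation.
    """
    blended = gpu_weight * gpu_util + cpu_weight * cpu_util
    return blended * tdp_w
```

Treat the output as a shape, not a calibration: the point of the paragraph above is precisely that, absent root access to the INA3221 rails, any such figure is an estimate anchored to utilization.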
Mean power
44.52 W
peak 45.08 W
Platform envelope
60 W
published TDP
GPU utilization
98.25%
mean across 8 s
CPU utilization
GPU-dominant load
Junction temp (mean)
40.92 °C
peak 42.06 °C
RAM in use
mean = peak (steady)
Sustained-load stability
Status
STABLE
soak verdict
Duration
continuous workload
Errors
0
24 requests served
Latency mean
warm decode
Latency p95
warm decode
Latency drift 1st ↔ 2nd half
thermal- and load-stable
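The drift card compares mean warm-decode latency across the two halves of the soak window. A minimal sketch of that computation (function name ours):

```python
def latency_drift_pct(samples_ms):
    """Percent change in mean latency between the first and second half of a run.

    Near-zero output indicates a thermally and load-stable soak; a positive
    value means the second half ran slower than the first.
    """
    half = len(samples_ms) // 2
    first = sum(samples_ms[:half]) / half
    second = sum(samples_ms[half:]) / (len(samples_ms) - half)
    return (second - first) / first * 100.0
```

Splitting at the midpoint is the simplest stability check; a production harness might also track a rolling percentile, but the halves comparison is what the card reports.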
Where Caroline matters
Sovereign data, hard latency budgets, regulated facilities, intermittent or hostile connectivity. Caroline is built for the operating contexts where every other LLM architecture quietly assumes a backbone that doesn't exist.
Sovereign defense & intelligence edge
Disconnected reasoning on field hardware. Air-gap is enforced by the kernel, not by policy. Adaptation runs on the device with bounded sub-KB updates.
Industrial vision + reasoning
A single edge HPC node runs a real-time detector and the reasoning LLM concurrently at full envelope. Acts as both the perceptual layer and the explanation layer for the line.
Regulated on-prem environments
Run a capable instruct LLM behind the firewall with verifiable network isolation and a chain-of-custody artefact set for compliance review. Sector-specific deployments shared under NDA.
Hard-latency edge control
Sub-second reasoning co-located with the asset, under a hard 60 W envelope, with no cellular round-trip. Target verticals released under NDA.
Deployment tiers
Chain of custody
All numbers are derived from a single forensic run of the validation pipeline. The artefacts below carry SHA-256 content hashes that any reviewer can re-compute. Full 64-character digests are released under NDA for independent verification.
| Artefact | Size | SHA-256 (first 16) |
|---|---|---|
| Forensic JSON (raw measurements) | 12.1 KB | 72efb203fec2784f… |
| Forensic Markdown (human-readable) | 18.5 KB | 7cb9d1c00a88f32e… |
| Vision model weights (YOLOv8n, reference) | 6.25 MB | f59b3d833e2ff32e… |
Validation timestamp (UTC)
2026-04-22 08:42:42
Validation host
ubuntu · Jetson AGX Orin
Document issued (UTC)
2026-04-22 09:09:46
Classification
CONFIDENTIAL · IP-safe · NDA
FAQ
What is Caroline?
Caroline is the first model in the NeuratronLLM-Edge family: a 4-billion-parameter-class air-gapped hybrid neuromorphic LLM that runs fully on a single NVIDIA Jetson AGX Orin, with an integrated real-time vision pipeline and an on-device adaptation loop. No data leaves the device.
How is the air gap enforced?
The inference process is launched inside a Linux network namespace (unshare -rn). Four representative external destinations (1.1.1.1:443, 8.8.8.8:53, huggingface.co:443, api.openai.com:443) are probed and 4 of 4 egress attempts are rejected by the kernel. Local UNIX-socket IPC remains functional as a positive control. No root privileges required.
How fast is it?
On Jetson AGX Orin at FP16-native baseline: 9.74 tokens/second mean (min 9.66, max 9.78), per-request latency mean 8437.4 ms and p95 10249.0 ms. The concurrent vision pipeline (YOLOv8n) sustains 24.83 FPS with p95 latency 36.7 ms.
Does the adaptation actually work on-device?
In a controlled drift experiment, baseline accuracy 67.2% dropped to 18.8% after drift, then recovered to 73.8% (113.71% of the accuracy lost) within 36 on-device steps. Per-update footprint is 576 bytes — under the 1 KB target.
What is the power and thermal envelope?
Under concurrent LLM + vision workload at 98.25% GPU utilization: estimated mean power 44.52 W, peak 45.08 W — inside the published 60 W platform envelope. Junction temperature mean 40.92 °C, peak 42.06 °C.
Is the base model itself proprietary?
No. Caroline wraps a publicly licensed, US-origin open-weights instruct base model so its public properties remain independently verifiable. The proprietary IP — hybrid neuromorphic kernel, sparse attention path, on-device adaptation rule and expert-aware quantization mapping — sits as a stack on top of the base model and is the subject of pending USPTO filings. Base model identity and version are released under NDA.
Who is Caroline for?
Sovereign defense and intelligence, regulated on-prem environments, industrial vision + reasoning and any operating context where data sovereignty, low latency and zero egress are non-negotiable. Vertical-specific deployments and references are released under NDA.
How are the numbers verified?
Every figure is derived from a single forensic run of the validation pipeline. Each artefact (forensic JSON, forensic Markdown, vision weights) carries a SHA-256 content hash that any reviewer can re-compute. Full digests are released under NDA.
Patent pending
The hybrid neuromorphic kernel, the on-device adaptation layer, the sparse attention path and the expert-aware quantization mapping that wrap the public base model are proprietary to Neuramorphic, Inc. and are protected by USPTO patent applications.
Implementation details, calibration parameters, adaptation dynamics and bit-allocation maps are confidential and shared under NDA. Contact legal@neuramorphic.ai.