Architecture Notes · Neuramorphic Research

Neuromorphic LLM on NVIDIA Jetson AGX Orin

By Peter Fulle

The NVIDIA Jetson AGX Orin is the smallest piece of silicon today on which a serious foundation model can run without the cloud. It exposes 64 GB of unified LPDDR5, a 2048-core Ampere GPU and 64 third-generation Tensor cores in a 60 W envelope. At Neuramorphic we use it as the reference target for Caroline (NeuratronLLM-Edge 4B) — a 4B-parameter neuromorphic LLM designed from day one for air-gapped, on-device inference.

Why a neuromorphic LLM on Jetson, not a vanilla transformer

A dense 4B transformer fits in Jetson's memory, but every generated token attends over the entire context, so the cost per token grows with context length. That spends the one resource edge silicon never has enough of: energy. The neuromorphic approach replaces that uniform compute pattern with three substitutions:

  • Sparse activation. Only the subset of neurons whose state crosses a threshold contributes to the output of a given step.
  • State-space backbone. A linear recurrence with fixed per-token cost replaces quadratic attention on the long-context path.
  • Event-driven inference. When the input stream contains no salient change, the model skips the update and does no work for that step.
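
The three substitutions above can be illustrated with a toy, pure-Python sketch. It is not Caroline's implementation: the diagonal recurrence, the threshold rule, the change detector and every name in it (`ssm_step`, `sparse_output`, `is_event`, `run`, `decay`, `threshold`, `eps`) are assumptions chosen only to make the ideas concrete.

```python
def ssm_step(state, x, decay):
    """Diagonal linear recurrence: h_t = a * h_{t-1} + (1 - a) * x_t.

    The state holds one value per channel, so cost and memory per token
    stay fixed no matter how many tokens came before.
    """
    return [a * h + (1.0 - a) * xi for a, h, xi in zip(decay, state, x)]


def sparse_output(state, threshold):
    """Pass through only channels whose magnitude crosses the threshold."""
    return [h if abs(h) >= threshold else 0.0 for h in state]


def is_event(x, last_x, eps):
    """Treat the input as an event if any channel moved by more than eps
    relative to the last input that was actually processed."""
    if last_x is None:
        return True
    return any(abs(xi - li) > eps for xi, li in zip(x, last_x))


def run(stream, decay, threshold=0.5, eps=1e-3):
    """Process a stream; return per-step outputs and the count of real updates."""
    state = [0.0] * len(decay)
    out = [0.0] * len(decay)
    last_x = None
    outputs, updates = [], 0
    for x in stream:
        if is_event(x, last_x, eps):
            state = ssm_step(state, x, decay)
            out = sparse_output(state, threshold)
            updates += 1
            last_x = x
        # On a non-event step the state is frozen and the previous output
        # is reused. This is a toy design choice, not a claim about Caroline.
        outputs.append(out)
    return outputs, updates


if __name__ == "__main__":
    # Three identical inputs followed by one change (hypothetical data).
    stream = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
    outputs, updates = run(stream, decay=[0.5, 0.5])
    print(f"{updates} updates for {len(stream)} inputs")
    print(outputs[-1])
```

In this sketch the two repeated inputs trigger no computation at all, and the final output has only one non-zero channel because the other has decayed below the threshold. Freezing the state on skipped steps is a simplification; a real event-driven model would need a principled rule for what "no salient change" means.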

Engineering trade-offs that actually matter on Orin

  • Memory budget. A 4B-parameter checkpoint at INT4 lands around 2.2 GB. The KV cache and activations dominate the remaining envelope, which is why the state-space backbone is not optional: it caps the recurrent state at a constant size per layer, independent of context length.
  • Throughput vs. latency. On Orin we measure Caroline at a sustained single-stream latency suitable for conversational use, with the device thermally stable in its 30 W power mode for unattended deployments.
  • Air-gapped operation. The whole model, tokenizer and runtime ship as a single artifact loaded by NeuraTensor. No network access is required, no telemetry leaves the device, and the same binary runs in environments where cloud LLMs are explicitly forbidden.
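
The memory-budget argument can be checked with back-of-the-envelope arithmetic. The sketch below is an estimate, not a measurement: the quantization group size, scale width, layer count, head dimensions and state size are hypothetical values picked for illustration and are not Caroline's actual configuration.

```python
GB = 1_000_000_000  # decimal gigabytes


def int4_weight_bytes(n_params, group_size=32, scale_bytes=2):
    """INT4 weights: half a byte per parameter plus one scale per group.

    group_size and scale_bytes are assumed values for a generic
    group-quantized layout.
    """
    packed = n_params // 2
    scales = (n_params // group_size) * scale_bytes
    return packed + scales


def kv_cache_bytes(context_len, n_layers=32, n_kv_heads=8, head_dim=128,
                   elem_bytes=2):
    """Attention KV cache: keys and values for every layer and every token.

    Grows linearly with context length. All dimensions are hypothetical.
    """
    per_token = 2 * n_layers * n_kv_heads * head_dim * elem_bytes
    return per_token * context_len


def ssm_state_bytes(n_layers=32, d_inner=4096, d_state=16, elem_bytes=2):
    """Recurrent state of a state-space backbone: constant per layer.

    Does not depend on context length. All dimensions are hypothetical.
    """
    return n_layers * d_inner * d_state * elem_bytes


if __name__ == "__main__":
    print(f"INT4 weights: {int4_weight_bytes(4_000_000_000) / GB:.2f} GB")
    for ctx in (4_096, 32_768, 131_072):
        print(f"KV cache at {ctx:>7} tokens: {kv_cache_bytes(ctx) / GB:.2f} GB")
    print(f"SSM state at any context: {ssm_state_bytes() / GB:.4f} GB")
```

Under these assumed values the weights come to 2.25 GB, close to the figure quoted above. A KV cache of this shape reaches about 4.3 GB at a 32,768-token context and keeps growing, while the recurrent state stays at roughly 4 MB regardless of how long the context is.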

What this unlocks

Running a real foundation model on a single Jetson AGX Orin moves the deployment story from “your data goes to a US hyperscaler” to “the model lives in the same room as the data”. That matters for defense, energy, healthcare, manufacturing and any sovereign-AI buyer who cannot — legally or operationally — call out to a third-party API.

For the underlying neuroscience and the architecture rationale, see Neuromorphic Computing Foundations. For a comparison between the SNN and SSM building blocks, see Spiking Neural Networks vs. State-Space Models.

Cite as: Fulle, P. (2026). Neuromorphic LLM on NVIDIA Jetson AGX Orin — Architecture Notes. Neuramorphic Research. https://neuramorphic.ai/research/neuromorphic-llm-on-jetson-agx-orin
