Running LLMs on Jetson: OpenClaw Benchmark

Running large language models (LLMs) at the edge is no longer just possible; it's becoming practical. NVIDIA's Jetson devices in particular hold a strong position in this space, offering high performance at low power consumption.
In this article, we’ll cover:
- My experience running models on Jetson devices with OpenClaw
- Performance of different backends (vLLM, Ollama, llama.cpp)
- The impact of quantization and model selection
- Tool calling capabilities of models
- And why the Jetson platform matters
Model Deployment with OpenClaw
OpenClaw simplifies deployment by letting models be spun up with a single command. It provides multi-backend support, so different inference engines can be used within the same unified framework, while its Jetson-compatible architecture ensures smooth operation on edge devices. It also makes benchmarking straightforward, enabling quick comparisons across models and backends.
In this setup, three different backends were used. vLLM stands out with its high throughput and server-side optimizations, Ollama offers ease of use and fast setup for rapid experimentation, and llama.cpp provides flexibility with its CPU/GPU hybrid design and strong support for quantization, making it well-suited for resource-constrained environments.
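A practical consequence of this multi-backend setup is that all three engines can serve an OpenAI-compatible HTTP API (vLLM's server, Ollama, and llama.cpp's server all support this), so benchmark client code can stay backend-agnostic. Below is a minimal sketch of the request payload and the throughput calculation; the model name is a placeholder, and the exact payload fields should be checked against the backend you actually run.

```python
# Sketch of a backend-agnostic benchmark client. The model name below is
# a placeholder, not a value taken from the benchmark table.

def chat_payload(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible /v1/chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": False,
    }

def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Decode throughput: generated tokens divided by wall-clock time."""
    return completion_tokens / elapsed_s

payload = chat_payload("some-31b-model", "Summarize Jetson AGX Thor in one line.")
# e.g. send with: requests.post("http://localhost:8000/v1/chat/completions", json=payload)
# then read usage["completion_tokens"] from the response and time the call.

print(round(tokens_per_second(256, 33.0), 1))  # 256 tokens in 33 s -> 7.8 tok/s
```

Timing the full request and dividing by the reported completion-token count is the simplest methodology; streaming clients can instead measure time-to-first-token and inter-token latency separately.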
Tool Calling and Model Selection
Why Is It Necessary?
OpenClaw’s architecture isn’t built solely on “text generation.” The model is expected to:
- Generate function calls
- Return structured output (JSON, etc.)
- Understand tool invocation formats
Therefore, models without tool calling support are severely limited in real-world OpenClaw usage: they cannot trigger tools, cannot integrate into agent pipelines, and in practice remain just a "chat model." This is a major functional disadvantage.
The goal of this benchmark was not only to measure raw speed, but also to evaluate models that can be used in real agent and tool-driven systems. For this reason, models with tool calling support, strong instruction-following capabilities, and the ability to generate structured output were preferred in model selection.
What Does Tool Calling Enable?
- API calling (e.g., weather, database)
- Computation or code execution
- File/system operations
- Building agent-based systems
In short: the LLM becomes not just a conversational system, but an action-taking system.
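To make the distinction concrete, here is a minimal sketch of the tool-calling loop: the model emits a structured JSON call, and a dispatcher maps it to a real function. The tool names and the JSON shape are illustrative (loosely following the common function-calling convention), not OpenClaw's actual wire format.

```python
import json

# Hypothetical tools; a real agent would wrap APIs, files, or code execution.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

def run_sql(query: str) -> str:
    return f"executed: {query}"

TOOLS = {"get_weather": get_weather, "run_sql": run_sql}

def dispatch(model_output: str) -> str:
    """Parse a JSON tool call emitted by the model and invoke the tool.

    Expected shape (illustrative): {"name": "...", "arguments": {...}}
    A model without tool-calling support emits free text instead, which
    fails to parse here -- exactly the limitation described above.
    """
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return "not a tool call: plain chat output"
    fn = TOOLS.get(call.get("name"))
    if fn is None:
        return f"unknown tool: {call.get('name')}"
    return fn(**call.get("arguments", {}))

print(dispatch('{"name": "get_weather", "arguments": {"city": "Ankara"}}'))
# -> Sunny in Ankara
print(dispatch("The weather in Ankara is probably sunny."))
# -> not a tool call: plain chat output
```

The second call shows why a chat-only model falls out of the pipeline: its free-text answer never reaches a tool, so the agent loop stalls at the parsing step.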
Benchmark Results
📊 All Results
| Backend | Device | Model | Params | Tok/s | Quant |
|---|---|---|---|---|---|
| vLLM | AGX Thor | gemma-4-31B-it | 31B | 2.61 | bf16 |
| vLLM | AGX Thor | Nemotron-30B | 30B | 7.76 | – |
| vLLM | AGX Thor | Qwen3.5-27B | 27B | 2.35 | – |
| Ollama | AGX Thor | Qwen3.5-35B-A3B | 35B | 38 | Q5_K_M |
| Ollama | AGX Thor | Qwen3 30B A3B | 30B | 34 | – |
| Ollama | AGX Orin | Falcon-H1 3B | 3B | 29 | FP16 |
| Ollama | AGX Orin | Qwen3.5 4B | 4B | 21 | – |
| Ollama | AGX Orin | Phi-3.5 MoE 42B | 42B | 8 | – |
| Ollama | Orin NX 8GB | Qwen3.5 9B | 9B | 14 | – |
| Ollama | Orin NX 8GB | Nemotron Nano 9B v2 | 9B | 14 | – |
| llama.cpp | DGX Spark | Gemma-4 31B bf16 | 31B | 3.7 | bf16 |
| llama.cpp | DGX Spark | Gemma-4 31B int8 | 31B | 6.5 | AWQ int8 |
| llama.cpp | DGX Spark | Gemma-4 31B int4 | 31B | 10.6 | AWQ int4 |
| llama.cpp | DGX Spark | Gemma-4 26B-A4B MoE | 26B | 23.7 | bf16 |
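The quantization rows are largely a memory-bandwidth story: a decode-bound model's weight footprint shrinks roughly with bits per parameter, and throughput scales accordingly. A back-of-the-envelope estimate (weights only; KV cache and runtime overhead are ignored, so real memory use is higher):

```python
# Approximate weight footprint for a 31B-parameter model at different
# precisions. Weights only; KV cache and activation memory are ignored.
def weight_gb(params_b: float, bits_per_param: float) -> float:
    bytes_total = params_b * 1e9 * bits_per_param / 8
    return bytes_total / 1e9  # decimal GB

for name, bits in [("bf16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name}: ~{weight_gb(31, bits):.0f} GB")
# bf16: ~62 GB, int8: ~31 GB, int4: ~16 GB
```

Each halving of precision halves the bytes the GPU must stream per generated token, which roughly tracks the tok/s progression in the table (3.7 → 6.5 → 10.6 for the 31B model), with some efficiency lost to dequantization overhead.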
Conclusion
- Running LLMs at the edge with Jetson is now realistic
- Tool calling support is the critical feature that elevates these systems to real agent architectures
- Right backend + quantization = maximum efficiency

