Rust Is Eating Python in Generative AI

Python is not going anywhere.

But it is being outrun.

Badly.

And by May 2026, the evidence is no longer a whisper.

It is a roar.

“Hold on, Thomas,” you might say.

Python runs PyTorch.

Python runs vLLM.

Python runs the entire Hugging Face ecosystem.

What exactly are you talking about?

Fair enough.

And I hear you.

But here is the distinction that matters:

Python is brilliant for AI research and prototyping.

For the servers, the gateways, the tokenizers, the agent runtimes, the inference engines that serve millions of real users every single day?

Rust is taking over.

And the data from Q1 2026 doesn’t just suggest it.

It screams it.

Read on!


The GIL Is Not a Feature, But a Prison.

Every Python developer who has ever pushed a model to production has eventually hit it.

The infamous Global Interpreter Lock.

The GIL prevents true parallel execution of Python bytecode across multiple CPU threads.

For CPU-bound AI tasks — token processing, attention computation, multi-agent orchestration — this is a hard architectural ceiling.

It doesn’t matter if your inference server has 64 cores.

A standard CPython process executes Python bytecode on only one of them at a time.

One!

Python 3.14 (October 2025) stabilised GIL-free builds and made them genuinely production-ready for the first time.

And Python 3.15 — currently in beta, targeting October 1, 2026 — goes further still:

It ships a stable ABI for free-threaded CPython (PEP 803), meaning C extension authors can finally compile once and support multiple free-threaded Python versions without rebuilding from scratch.

Progress.

Genuine, hard-won progress.

But here is where I have to be brutally honest with you.

Free-threaded builds in Python 3.15 still carry ~9% single-thread overhead on Linux x86-64 and ~6% on macOS ARM64 versus the GIL-enabled interpreter on the same version.

Every PyObject header in free-threaded Python is now 32 bytes instead of 16 to support safe concurrent access.

The garbage collector switched from generational to non-generational with stop-the-world pauses.

And the silent killer?

Importing any C extension without Py_mod_gil support re-enables the GIL for the whole process, with nothing more than a RuntimeWarning to tell you.

Your free-threaded Python becomes GIL-enabled the moment you import an unsupported package — and you might not even notice!

Making free-threaded the default has no PEP, no timeline, and a realistic arrival date of 2028–2029 at the earliest.

The GIL is removable.

The GIL is not removed.

Those are different things.

Rust has no GIL.

Rust’s ownership model guarantees thread safety at compile time — not at runtime, not with a lock, not with a caveat buried in a package’s setup.py.

The compiler simply refuses to compile safe Rust code that would cause a data race.

That is the number one reason why Rust is the future of programming.

This is truly fearless concurrency—and Rust was built for it from the ground up.

And for agentic AI systems — where you’re coordinating dozens of parallel tool calls, model requests, and streaming outputs simultaneously — fearless concurrency is not a luxury.

It’s mandatory.
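
Here is a minimal sketch of that pattern using only the standard library. The tool names and the `run_tool` function are hypothetical stand-ins for real tool or model calls. Each call runs on its own thread, results come back over a channel, and a shared counter tracks completions.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a tool or model call; a real agent would do I/O here.
fn run_tool(name: &str) -> String {
    format!("{name}: ok")
}

/// Runs every tool on its own thread and gathers the outputs through a channel.
fn run_tools_in_parallel(tools: &[&str]) -> (Vec<String>, usize) {
    let completed = AtomicUsize::new(0);
    let (tx, rx) = mpsc::channel();

    // Scoped threads may borrow local data; all of them are joined when the scope ends.
    thread::scope(|s| {
        for &tool in tools {
            let tx = tx.clone();
            let completed = &completed;
            s.spawn(move || {
                let output = run_tool(tool);
                completed.fetch_add(1, Ordering::Relaxed);
                tx.send(output).expect("receiver is still alive");
            });
        }
    });
    // Drop the original sender so the receiver knows no more results are coming.
    drop(tx);

    let mut results: Vec<String> = rx.iter().collect();
    results.sort(); // arrival order is nondeterministic, so sort for stable output
    (results, completed.load(Ordering::Relaxed))
}

fn main() {
    let (results, completed) = run_tools_in_parallel(&["search", "calculator", "weather"]);
    println!("{completed} tool calls finished: {results:?}");
}
```

Swap the `AtomicUsize` for a plain `usize` that every thread mutates, and the program stops compiling. That is the guarantee in action: the mistake is caught before the code ever runs.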


The Numbers Are Decisive

Rust programs run 10x–80x faster than equivalent Python programs in controlled benchmarks.

At the AI infrastructure layer, the gap is even more staggering.

Tokenization

Hugging Face’s tokenizers library is written in Rust.

The result? A 43x speed increase on the SQuAD 2.0 dataset.

1 GB of text processed in under 20 seconds on a standard server CPU.

That speedup compounds when tokenization runs hundreds of millions of times per day.

Data Pipelines

Polars—the Rust-native DataFrame library—processes 10 million rows in 0.89 seconds, versus 2.37 seconds for pandas.

The advantage ranges from 2.6x on aggregations to 4.6x on filtering 1 GB datasets.

Hours of saved compute time. Every day. In production.

AI Gateways

LangDB built their entire AI gateway in Rust and achieved 75ms latency at 15,000 simultaneous connections.

Python simply cannot touch that without some serious acrobatics.

Embeddings

Baseten’s June 2025 benchmark showed a 12x throughput improvement by shifting embedding workloads to a Rust + PyO3 backend.

At 2 million parallel inputs, Rust finished in 71 seconds.

Python took over 15 minutes.

Same hardware.

Same network.

Different language.
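
The multipliers quoted so far follow directly from the raw timings, and the arithmetic is easy to check. In this small sketch the `speedup` helper is my own name, and "over 15 minutes" is taken as exactly 900 seconds, so the embedding figure is a lower bound.

```rust
/// Speedup factor: how many times faster the new system is than the baseline.
fn speedup(baseline_secs: f64, new_secs: f64) -> f64 {
    baseline_secs / new_secs
}

fn main() {
    // Polars vs pandas on 10 million rows (timings quoted above).
    let polars = speedup(2.37, 0.89);
    // Baseten embeddings: "over 15 minutes" assumed to be exactly 900 s (lower bound).
    let embeddings = speedup(15.0 * 60.0, 71.0);

    println!("Polars vs pandas: {polars:.1}x"); // prints 2.7x
    println!("Rust vs Python embedding run: at least {embeddings:.1}x"); // prints 12.7x
}
```

Both results line up with the headline numbers: roughly 2.7x for the 10-million-row Polars run, and a little over 12x for the embedding workload.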

Agent Runtimes

Independent benchmarks of the emerging Rust agent frameworks—Rig, AutoAgents, and OpenFang—show 5x memory reduction and 25–44% latency improvements over Python equivalents.

Is nobody else talking about this?


Why Rust Can Beat Even C and C++

This is the part that stops people cold.

C++ has had decades of compiler optimizations.

It runs as close to bare metal as anything alive.

And Rust matches it—and in specific conditions, beats it.

How?

Rust’s zero-cost abstractions—iterators, closures, traits—compile to tight native loops with zero extra overhead.

Both Rust and C++ use LLVM and get similar raw optimization passes.

Parity so far.
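
To make "zero-cost" concrete, here is the same computation written twice: once as an iterator chain with closures, once as a hand-written loop. The function names are illustrative. In optimized builds the two typically compile to very similar machine code, though that is something to confirm by inspecting the generated assembly, not something this snippet proves by itself.

```rust
/// Sum of squares of the even values, written as an iterator chain with closures.
fn sum_even_squares_iter(values: &[u64]) -> u64 {
    values
        .iter()
        .filter(|&&v| v % 2 == 0)
        .map(|&v| v * v)
        .sum()
}

/// The same computation as a hand-written loop.
fn sum_even_squares_loop(values: &[u64]) -> u64 {
    let mut total = 0;
    for &v in values {
        if v % 2 == 0 {
            total += v * v;
        }
    }
    total
}

fn main() {
    let values: Vec<u64> = (1..=10).collect();
    // Both versions agree: 4 + 16 + 36 + 64 + 100.
    assert_eq!(sum_even_squares_iter(&values), sum_even_squares_loop(&values));
    println!("{}", sum_even_squares_iter(&values)); // prints 220
}
```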

But here’s the divergence.

In C++, memory safety is your problem.

Every pointer dereference, every delete, every shared mutable reference is a potential bug.

A potential CVE.

A potential performance regression from defensive sanitizers applied after the fact.

Rust enforces memory safety almost entirely at compile time, with bounds checks as the main runtime cost, while C++ relies on code review, static analysis, and runtime sanitizers.

Those sanitizers are not free.

They cost CPU.

They cost memory.

They cost latency.

In real-world HTTP server benchmarks, a Rust server handled 43% more requests per second while using 18% less memory than its C++ equivalent.

In 2026, a comparison of concurrency models showed Rust-based LLM inference systems achieving 45% lower latency and 60% fewer race conditions compared to C++ implementations.

And because Rust has no garbage collector, there are no GC pauses.

Ever.

The memory latency profile is deterministic.

For real-time AI inference, deterministic latency is worth more than average-case throughput.
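
A small sketch shows what "deterministic" means here. The `RequestBuffer` type is a hypothetical stand-in for per-request memory such as a prompt or output buffer. Its `Drop` implementation records the moment it is freed, and that moment is fixed by scope, not chosen by a collector.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a per-request buffer.
struct RequestBuffer {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for RequestBuffer {
    // Runs when the value goes out of scope, a point known at compile time.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("freed {}", self.name));
    }
}

/// Returns the order in which events happened while handling one request.
fn handle_request() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _prompt = RequestBuffer { name: "prompt", log: Rc::clone(&log) };
        let _output = RequestBuffer { name: "output", log: Rc::clone(&log) };
        log.borrow_mut().push("request handled".to_string());
    } // both buffers are freed here, in reverse declaration order
    let events = log.borrow().clone();
    events
}

fn main() {
    // Prints: request handled, freed output, freed prompt (one per line).
    for event in handle_request() {
        println!("{event}");
    }
}
```

The cleanup happens at the closing brace, every time, in the same order. No background collector decides when.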

In a multi-core, multi-GPU world built for generative AI, Rust is structurally better-positioned than C++ ever was.


Q1 2026: The GitHub Inflection Point

Something shifted in the first quarter of 2026.

And the GitHub star data captured it exactly.

In Q1 2026, a pattern emerged across GitHub’s fastest-growing AI agent repositories: they were written in Rust.

Seven significant Rust AI agent repos launched in under 60 days.

Where 2023–2024 Rust AI tools averaged 25 stars per day, the 2026 wave averages 404 stars per day.

A 16x increase.

Every single top repo in the wave: Rust.

This is the same language stratification that databases and web infrastructure went through a decade ago.

Python is the scripting layer.

Rust is the runtime layer.

The line is being drawn right now—in GitHub star data, in benchmark results, and in engineering team decisions at companies you use every day.


The Open-Source Rust AI Ecosystem Today (May 2026)

Candle, Burn, mistral.rs, Polars, PyO3, Rig, and Hipfire have all crossed production-readiness thresholds that they had not reached two years ago.

Here’s the full landscape as of today:

Candle (Hugging Face)

  • Minimalist Rust ML framework for serverless inference.
  • No 800MB LibTorch binaries.
  • Tiny builds.
  • Instant cold starts.
  • Hugging Face itself is betting on Rust for production inference paths.

mistral.rs

  • Blazing-fast LLM inference engine in pure Rust. GGUF quantization, speculative decoding, vision, audio, and full multimodal pipelines.
  • 6,300+ GitHub stars as of mid-2025 and growing aggressively.
  • Built on Candle, exposes both Rust and Python SDKs.

Burn (Tracel AI)

  • Next-generation deep learning framework in pure Rust.
  • Powered by CubeCL—which JIT-compiles Rust GPU kernels to CUDA, Metal, ROCm, Vulkan, and WebGPU from a single codebase.

Hipfire

  • Brand-new Rust-native AMD RDNA inference engine that beats llama.cpp on consumer AMD GPUs.
  • Delivers 59 tokens per second on Qwen3-8B from a consumer RX card.
  • Pure Rust, no ROCm headaches.
  • This is what a motivated open-source developer can ship when Rust is the foundation.

Rig

  • Modular LLM application framework in Rust.
  • Clean abstractions for building production-grade AI apps without touching Python.

Polars

  • The Rust-native DataFrame library systematically replacing pandas in high-throughput pipelines.
  • NVIDIA RAPIDS has integrated GPU acceleration directly into Polars.
  • It is now one of the fastest data processing tools in any language, in any ecosystem.

PyO3

  • The bridge that makes Rust and Python cooperate.
  • Rust for the compute.
  • Python for the interface.
  • This is exactly how Hugging Face ships tokenizers today – and how the entire industry is quietly migrating hot paths.

And according to the JetBrains State of Python 2025 report, Rust usage for Python binary extensions jumped from 27% to 33% in a single year.

The Python ecosystem is being rewritten in Rust.

One critical path at a time.


Rust-First GPU Programming: The Final Wall Has Fallen

GPU programming was Python’s last uncontested stronghold.

Not anymore.

Rust GPU compiles Rust directly to SPIR-V for Vulkan – any GPU, any vendor.

Rust CUDA compiles Rust to NVVM IR for NVIDIA hardware – rebooted and actively maintained in 2025.

CubeCL JIT-compiles portable Rust GPU kernels across CUDA, ROCm, and WGSL with automatic vectorization and hardware autotune built in.

And then – on May 9, 2026 – NVIDIA themselves shipped cuda-oxide 0.1.

The first official Rust-to-CUDA compiler from NVLabs.

It takes standard Rust code, annotated with #[kernel], and compiles it directly to PTX, NVIDIA’s GPU assembly format.

No DSLs.

No FFI glue.

No C++ device code.

Just Rust.

It achieved 868 TFLOPS on Blackwell B200 GPUs in early benchmarks.

This is NVIDIA Labs, the people who own the GPU market, shipping a Rust-first CUDA compiler as an official project.

If that doesn’t tell you which direction the wind is blowing, nothing will.

And the Rust adoption story has now gone far beyond AI:

Rust entered the Linux kernel permanently.

Android’s codebase is now 77% memory-safe code.

Windows has migrated 188,000+ lines from C++ to Rust.

The White House named Rust explicitly in its call for memory-safe languages.

CISA and the FBI followed.

My dear fellow developers, Rust is the future, being written today.


Where Python Still Wins, and That’s Perfectly Fine

Python still dominates research, training, experimentation, and model development.

PyTorch is not going away.

Hugging Face Transformers is not going away.

The winning architecture in 2026 is not a full rewrite of your Python code in Rust.

It’s smarter than that.

It’s a clean split:

  • Rust for the inference server, the gateway, the agent runtime, the tokenizer, the data pipeline.
  • Python for the research lab, the training loop, the experiment notebook.

If you are building Generative AI systems that need to serve real users—

If you care about latency, throughput, and infrastructure cost at scale—

If you are tired of the GIL, of GC pauses, of Python’s runtime overhead eating your cloud bill—

Rust is your answer.

The ecosystem is mature.

The benchmarks are decisive.

Governments are pushing for it.

NVIDIA is shipping Rust-first GPU compilers.

This is not hype.

This is the direction of AI infrastructure for the next decade.

And I say this from the bottom of my heart: the developers who master Rust right now,

while the ecosystem is still young enough to become a genuine domain expert in but mature enough to build real production systems with,

are going to be the engineers who define what AI infrastructure looks like in 2030.

Dream big.

Start today.

Watch this space.

All the very best!


References

  1. Rust Is Quietly Replacing Python in AI Infrastructure (Groundy, March 2026) https://groundy.com/articles/rust-quietly-replacing-python-ai/
  2. The Rust Shift: GitHub AI Agent Infrastructure Q1 2026 (OSS Insight, April 2026) https://ossinsight.io/blog/rust-ai-agent-infrastructure-2026
  3. Why Agentic AI Developers Are Moving from Python to Rust (Red Hat Developer, Sept 2025) https://developers.redhat.com/articles/2025/09/15/why-some-agentic-ai-developers-are-moving-code-python-rust
  4. Rust vs Python Benchmarks: 10x–80x (Ongraph) https://www.ongraph.com/rust-vs-python/
  5. Why We Built an AI Gateway in Rust (LangDB) https://blog.langdb.ai/why-we-built-an-ai-gateway-in-rust-a-performance-centric-decision
  6. 12x Higher Embedding Throughput with Rust (Baseten, June 2025) https://www.baseten.co/blog/your-client-code-matters-10x-higher-embedding-throughput-with-python-and-rust/
  7. Rust vs C++ Performance: Modern Take (The New Stack, Oct 2025) https://thenewstack.io/rust-vs-c-a-modern-take-on-performance-and-safety/
  8. Hipfire: Rust-Native AMD Inference Engine Beats llama.cpp (Startup Fortune, April 2026) https://startupfortune.com/hipfire-is-a-rust-native-amd-inference-engine-that-beats-llamacpp-on-consumer-gpus/
  9. NVIDIA CUDA-Oxide 0.1 Released — Rust to PTX Compiler (Phoronix, May 8, 2026) https://www.phoronix.com/news/NVIDIA-CUDA-Oxide-0.1
  10. Python vs Rust 2026: Governments Mandate Memory-Safe Languages (Tech Insider) https://tech-insider.org/python-vs-rust-2026/
