Blackwell-ready, pure Zig (0.15.2+) bindings to the NVIDIA CUDA Driver API.
Dynamic loading of libcuda.so, clean high-level wrappers, and graceful stubs for non-CUDA environments.
No static linking, no CUDA toolkit required at runtime.
Tested on Blackwell (sm_120) — ready for low-level GPU programming, kernel launching, and basic BLAS in Zig.
git clone https://github.com/Aiurion/zigcuda.git && cd zigcuda
zig build run
Example output:
=== ZigCUDA CLI Diagnostic Tool ===
INFO: cuInit succeeded
✓ CUDA Driver Initialized
✓ Device Count: 1
[GPU 0] NVIDIA RTX PRO 6000 Blackwell Workstation Edition
├─ Compute: 12.0
├─ SMs: 120
└─ VRAM: 95.59 GB
- Dynamic Driver Loading – Works on Linux native and WSL2, multiple symbol resolution paths
- Clean Zig API – Context, device, memory, streams, events, module loading, kernel launch
- Graceful Stubs – Compiles and runs basic checks without a GPU
- Zero External Dependencies – Only needs NVIDIA driver at runtime
- Test Coverage – 97 passing tests across core, bindings, and integrations
- Easy Library Usage – Single `@import("zigcuda")` with init/deinit pattern
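
To illustrate the dynamic loading and graceful stub behaviour listed above, here is a minimal sketch that uses only the Zig standard library and two real Driver API entry points (`cuInit`, `cuDeviceGetCount`). It is not ZigCUDA's actual implementation: the helper name `probeDeviceCount` and the error names are assumptions made for this example.

```zig
const std = @import("std");

// Signatures follow the C declarations in cuda.h; a CUresult of 0 is CUDA_SUCCESS.
const CuInitFn = *const fn (flags: c_uint) callconv(.c) c_int;
const CuDeviceGetCountFn = *const fn (count: *c_int) callconv(.c) c_int;

/// Hypothetical helper (not part of ZigCUDA): returns null when no driver
/// library can be opened, so callers can take a stub path instead of failing.
fn probeDeviceCount() !?c_int {
    var lib = std.DynLib.open("libcuda.so.1") catch return null;
    defer lib.close();

    const cuInit = lib.lookup(CuInitFn, "cuInit") orelse return error.SymbolNotFound;
    const cuDeviceGetCount = lib.lookup(CuDeviceGetCountFn, "cuDeviceGetCount") orelse
        return error.SymbolNotFound;

    if (cuInit(0) != 0) return error.CudaInitFailed;
    var count: c_int = 0;
    if (cuDeviceGetCount(&count) != 0) return error.CudaCallFailed;
    return count;
}

pub fn main() !void {
    if (try probeDeviceCount()) |count| {
        std.debug.print("CUDA devices: {d}\n", .{count});
    } else {
        std.debug.print("No CUDA driver found; continuing without a GPU\n", .{});
    }
}
```

On Linux, linking libc (for example `zig run probe.zig -lc`, where `probe.zig` is a placeholder file name) is likely needed so that the system `dlopen` resolves the driver's own shared-library dependencies.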
| Component | Status | Notes |
|---|---|---|
| Driver Loading | Complete | Dynamic + extensive fallbacks |
| Core API (memory, streams, contexts) | Complete | Full wrappers, async support |
| Kernel Launch | Complete | cuLaunchKernel + legacy fallback |
| cuBLAS Integration | Partial | Basic handle + common ops working |
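
The "fallbacks" noted for driver loading can be pictured as trying several library locations and several symbol names in order. The sketch below is an assumption about how such a scheme might look, not ZigCUDA's code: `lookupFirst` is a hypothetical helper, and the candidate paths (including the WSL2 location) are illustrative. `cuMemAlloc_v2` and `cuMemAlloc` are real exported Driver API symbols.

```zig
const std = @import("std");

/// Hypothetical helper: returns the first symbol in `names` that the library exports.
fn lookupFirst(lib: *std.DynLib, comptime T: type, names: []const [:0]const u8) ?T {
    for (names) |name| {
        if (lib.lookup(T, name)) |sym| return sym;
    }
    return null;
}

// cuda.h: CUresult cuMemAlloc(CUdeviceptr *dptr, size_t bytesize);
// CUdeviceptr is a 64-bit unsigned integer on 64-bit platforms.
const CuMemAllocFn = *const fn (dptr: *u64, bytesize: usize) callconv(.c) c_int;

pub fn main() !void {
    // Candidate locations tried in order (assumed list, adjust for your system).
    const paths = [_][]const u8{
        "libcuda.so.1",
        "libcuda.so",
        "/usr/lib/wsl/lib/libcuda.so.1",
    };

    var lib = for (paths) |path| {
        if (std.DynLib.open(path)) |opened| {
            break opened;
        } else |_| {}
    } else {
        std.debug.print("CUDA driver library not found\n", .{});
        return;
    };
    defer lib.close();

    // Prefer the versioned entry point, then fall back to the unversioned name.
    const mem_alloc = lookupFirst(&lib, CuMemAllocFn, &.{ "cuMemAlloc_v2", "cuMemAlloc" });
    std.debug.print("cuMemAlloc resolved: {}\n", .{mem_alloc != null});
}
```

Keeping the candidate names in a plain slice makes it cheap to add further fallbacks without touching the lookup logic.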
Add the dependency to build.zig.zon:
.dependencies = .{
    .zigcuda = .{
        .url = "git+https://github.com/Aiurion/zigcuda.git#v0.0.1",
        // Run `zig build` once; it reports the `.hash` value to add here
    },
},
Then import the module in build.zig:
const zigcuda_dep = b.dependency("zigcuda", .{
    .target = target,
    .optimize = optimize,
});
exe.root_module.addImport("zigcuda", zigcuda_dep.module("zigcuda"));
Basic device enumeration:
const std = @import("std");
const zigcuda = @import("zigcuda");

pub fn main() !void {
    try zigcuda.bindings.init();
    const device_count = try zigcuda.bindings.getDeviceCount();
    std.debug.print("Found {d} CUDA device(s)\n", .{device_count});
    for (0..@min(device_count, 3)) |i| {
        const props = try zigcuda.bindings.getDeviceProperties(@intCast(i));
        std.debug.print("Device {d}: {s}\n", .{
            i, @as([:0]const u8, @ptrCast(&props.name)),
        });
    }
}
Kernel launch example:
const std = @import("std");
const zigcuda = @import("zigcuda");

pub fn main() !void {
    try zigcuda.bindings.init();
    // Load compiled CUDA binary (.cubin file).
    // A string literal is const, so copy it into a mutable buffer first,
    // the same way the kernel name is handled below.
    var filename_buf = "my_kernel.cubin".*;
    const filename: [:0]zigcuda.bindings.@"c_char" = @ptrCast(&filename_buf);
    const module = try zigcuda.bindings.loadModule(filename);
    var kernel_name_buf = "my_kernel".*;
    const c_kernel_name: [:0]zigcuda.bindings.@"c_char" = @ptrCast(&kernel_name_buf);
    const kernel_func = try zigcuda.bindings.getFunctionFromModule(module, c_kernel_name);
    // Launch a 1x1x1 grid of 32 threads; every grid and block dimension must be at least 1
    const empty_params: []?*anyopaque = &.{};
    try zigcuda.bindings.launchKernel(kernel_func,
        1, // grid_dim_x
        1, // grid_dim_y
        1, // grid_dim_z (must be at least 1, never 0)
        32, // block_dim_x
        1, // block_dim_y
        1, // block_dim_z
        0, // shared_mem_bytes
        null, // stream
        empty_params // kernel parameters (this kernel takes none)
    );
    std.debug.print("Kernel launched successfully!\n", .{});
}
This IS:
- A solid CUDA Driver API wrapper for Zig
- Ready for writing and launching kernels, memory management, streams/events
- Usable today for low-level GPU work and experimentation
This is NOT:
- A full ML framework
- Complete high-level tensor ops
- Optimized inference engine
- v0.0.x – Core polish and further validation
zig build run test # Run full suite (97 tests)
zig build run # Diagnostic tool
Supported Platforms:
- Linux (x86_64) – Fully tested
- WSL2 – Working with dual-context handling
Open issues for bugs/features.
MIT (see LICENSE file)
ZigCUDA gives you real CUDA access in pure Zig with minimal overhead. The foundation is ready – start building GPU code today.