WebGPU & ParaLLEl-RDP
The moonshot: bitexact N64 rendering in the browser via WebGPU compute shaders.
The Accuracy Problem
Current browser N64 emulators all use HLE (High-Level Emulation) for graphics.
HLE works for ~90% of games using standard Nintendo microcodes. But:
- Games with custom microcodes (Rogue Squadron, World Driver Championship) break
- Texture filtering doesn't match real hardware
- Z-buffer and blending have subtle inaccuracies
- No way to support widescreen hacks properly
In an LLE setup, an RSP core executes the game's actual microcode and ParaLLEl-RDP consumes the resulting RDP command list, using GPU compute shaders to render exactly as real hardware would, weird edge cases included.
What Is ParaLLEl-RDP?
Created by Themaister, ParaLLEl-RDP is a Vulkan compute shader implementation of the N64's RDP (Reality Display Processor). It achieves bitexact accuracy with the Angrylion software renderer (the gold standard for N64 accuracy) while running at full speed on modern GPUs.
Used by:
- Ares (most accurate N64 emulator)
- gopher64 (Rust N64 emulator)
- simple64 (archived)
- RetroArch (ParaLLEl N64 core, native mode)
Why It Can't Run In Browsers Today
ParaLLEl-RDP requires Vulkan 1.1 with:
- Compute shaders
- Storage buffers (read/write)
- Subgroup operations
- 32-bit atomics on storage buffers
- Multiple compute dispatches per frame
Browsers don't expose Vulkan. They expose WebGL (no compute) or WebGPU (has compute!).
WebGPU: The Bridge
WebGPU provides compute shaders that could theoretically run ParaLLEl-RDP's workload:
WebGPU Compute Capabilities
| Vulkan Feature (ParaLLEl-RDP needs) | WebGPU Equivalent | Status |
|---|---|---|
| Compute shaders | @compute shaders in WGSL | Supported |
| Storage buffers (read/write) | var<storage, read_write> | Supported |
| 32-bit atomics | atomicAdd, atomicMax, etc. on atomic<u32> / atomic<i32> | Supported |
| Subgroup operations | subgroupBroadcast, etc. (optional subgroups feature) | Partial (Chrome 134+) |
| Push constants | Uniform buffer in a bind group (WebGPU has no push constants) | Supported |
| Multiple dispatches | computePass.dispatchWorkgroups() | Supported |
| Shared memory (workgroup) | var<workgroup> | Supported |
| Memory barriers | workgroupBarrier(), storageBarrier() (workgroup scope only) | Supported |
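Because subgroup support is optional, a port would probably have to probe the adapter before requesting a device, since requesting a feature the adapter does not report makes `requestDevice()` fail. The sketch below shows one way to do that. `pickOptionalFeatures`, `chooseShaderVariant`, and `FeatureSetLike` are hypothetical names introduced for illustration; only the `subgroups` feature name comes from the WebGPU spec.

```typescript
// Optional features this hypothetical port would like to use.
const WANTED_FEATURES: readonly string[] = ["subgroups"];

// Minimal structural stand-in for GPUAdapter.features, so the sketch
// does not depend on @webgpu/types.
interface FeatureSetLike {
  has(name: string): boolean;
}

/** Keep only the wanted features that the adapter actually reports. */
function pickOptionalFeatures(
  adapterFeatures: FeatureSetLike,
  wanted: readonly string[] = WANTED_FEATURES,
): string[] {
  return wanted.filter((name) => adapterFeatures.has(name));
}

type ShaderVariant = "subgroup" | "shared-memory";

/** Pick which build of the shaders to compile (assumed two-variant scheme). */
function chooseShaderVariant(enabled: readonly string[]): ShaderVariant {
  return enabled.indexOf("subgroups") !== -1 ? "subgroup" : "shared-memory";
}
```

In a browser this would be wired up roughly as `const requiredFeatures = pickOptionalFeatures(adapter.features)` followed by `adapter.requestDevice({ requiredFeatures })`, with the shader variant chosen from the same list.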
What's Missing
| Vulkan Feature | WebGPU Status | Workaround |
|---|---|---|
| Full subgroup ops | Partial | Emulate with shared memory |
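| 64-bit atomics | Not available | Split into 32-bit pairs (no longer atomic as a unit, so affected algorithms need restructuring) |
| Descriptor indexing | Not in the core spec | Pack resources into one large storage buffer or a texture array and index manually |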
| Timeline semaphores | No equivalent | Use mapAsync / onSubmittedWorkDone |
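Without timeline semaphores, frame pacing can be approximated by counting submissions and decrementing the count when `GPUQueue.onSubmittedWorkDone()` resolves. The following is a minimal sketch under that assumption. `FrameLimiter`, `trackFrame`, and `QueueLike` are hypothetical helpers, not part of ParaLLEl-RDP or WebGPU.

```typescript
// Structural stand-in for GPUQueue; only the one method we need.
interface QueueLike {
  onSubmittedWorkDone(): Promise<unknown>;
}

/** Counts frames that were submitted but have not finished on the GPU. */
class FrameLimiter {
  private inFlight = 0;
  private readonly maxInFlight: number;

  constructor(maxInFlight: number) {
    this.maxInFlight = maxInFlight;
  }

  get pending(): number {
    return this.inFlight;
  }

  canSubmit(): boolean {
    return this.inFlight < this.maxInFlight;
  }

  markSubmitted(): void {
    if (!this.canSubmit()) {
      throw new Error("too many frames in flight");
    }
    this.inFlight += 1;
  }

  markDone(): void {
    if (this.inFlight > 0) {
      this.inFlight -= 1;
    }
  }
}

/** Call right after queue.submit(); resolves once the GPU has caught up. */
function trackFrame(queue: QueueLike, limiter: FrameLimiter): Promise<void> {
  limiter.markSubmitted();
  return queue.onSubmittedWorkDone().then(() => limiter.markDone());
}
```

The emulator loop would check `limiter.canSubmit()` before encoding the next frame and skip or wait otherwise. This is coarser than a timeline semaphore, since it only signals when everything submitted so far has completed.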
The Porting Challenge
ParaLLEl-RDP consists of ~30 GLSL compute shaders. They'd need conversion to WGSL:
// ParaLLEl-RDP GLSL (Vulkan)
#version 450
layout(local_size_x = 64) in;
layout(set = 0, binding = 0) buffer RDRAM { uint data[]; };
layout(set = 0, binding = 1) buffer Framebuffer { uint pixels[]; };
layout(push_constant) uniform Params {
    uint num_primitives;
    uint fb_width;
};
void main() {
    uint idx = gl_GlobalInvocationID.x;
    if (idx >= num_primitives) return;
    // ... RDP rasterization logic ...
}
// Equivalent WGSL (WebGPU)
@group(0) @binding(0) var<storage, read_write> rdram: array<u32>;
@group(0) @binding(1) var<storage, read_write> framebuffer: array<u32>;
struct Params {
    num_primitives: u32,
    fb_width: u32,
}
@group(0) @binding(2) var<uniform> params: Params;
@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
    let idx = id.x;
    if (idx >= params.num_primitives) { return; }
    // ... RDP rasterization logic (same algorithm, different syntax) ...
}
The algorithm is the same — it's primarily a syntax translation with some API differences for resource binding and synchronization.
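The binding differences show up mostly on the host side. As a rough sketch of what driving the WGSL above might look like, the TypeScript below packs the `Params` struct into a uniform buffer payload (replacing the Vulkan push constants), describes the three bindings, and computes the dispatch size. The helper names (`packParams`, `workgroupCount`, `bindGroupLayoutEntries`) are made up for this example, and `SHADER_STAGE_COMPUTE` mirrors the numeric value of `GPUShaderStage.COMPUTE` so the snippet runs outside a browser.

```typescript
// Must match @workgroup_size(64) in the WGSL shader.
const WORKGROUP_SIZE = 64;

// Numeric value of GPUShaderStage.COMPUTE in the WebGPU spec.
const SHADER_STAGE_COMPUTE = 0x4;

/** Pack Params { num_primitives, fb_width } as two u32 values (8 bytes). */
function packParams(numPrimitives: number, fbWidth: number): Uint32Array {
  return new Uint32Array([numPrimitives, fbWidth]);
}

/** Number of workgroups needed so every primitive gets one invocation. */
function workgroupCount(numPrimitives: number, size: number = WORKGROUP_SIZE): number {
  return Math.ceil(numPrimitives / size);
}

// Entries for device.createBindGroupLayout({ entries }), one per
// @binding in the shader: RDRAM, framebuffer, then the params uniform.
const bindGroupLayoutEntries = [
  { binding: 0, visibility: SHADER_STAGE_COMPUTE, buffer: { type: "storage" } },
  { binding: 1, visibility: SHADER_STAGE_COMPUTE, buffer: { type: "storage" } },
  { binding: 2, visibility: SHADER_STAGE_COMPUTE, buffer: { type: "uniform" } },
];
```

Per frame, the host would call `device.queue.writeBuffer(paramsBuffer, 0, packParams(n, width))` and then `pass.dispatchWorkgroups(workgroupCount(n))`. The extra invocations in the last workgroup are discarded by the `idx >= params.num_primitives` check in the shader.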
Architecture With WebGPU
WebGPU Browser Support (2026)
| Browser | Status | Version |
|---|---|---|
| Chrome | Shipped | 113+ (May 2023) |
| Edge | Shipped | 113+ |
| Safari | Shipped | 26+ |
| Firefox | Partial | 141+ (Windows first; other platforms following) |
~75% global support with Chrome + Edge. Rising fast.
For browsers without WebGPU, we fall back to the HLE WebGL2 renderer (same as current emulators). Users with WebGPU get pixel-perfect accuracy.
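The fallback decision could be made once at startup. The sketch below assumes two renderer identifiers, `"lle-webgpu"` and `"hle-webgl2"`, which are labels invented for this example. `GpuLike` is a structural stand-in for `navigator.gpu`. Note that `requestAdapter()` can resolve to `null` even when `navigator.gpu` exists, so both cases fall back.

```typescript
type RendererKind = "lle-webgpu" | "hle-webgl2";

// Stand-in for the GPU object exposed as navigator.gpu.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

/** An adapter of null or undefined means WebGPU is unusable on this machine. */
function chooseRenderer(adapter: unknown): RendererKind {
  return adapter !== null && adapter !== undefined ? "lle-webgpu" : "hle-webgl2";
}

/** Probe WebGPU and fall back to the HLE renderer on any failure. */
async function detectRenderer(gpu: GpuLike | undefined): Promise<RendererKind> {
  if (!gpu) {
    return "hle-webgl2";
  }
  try {
    return chooseRenderer(await gpu.requestAdapter());
  } catch (_err) {
    return "hle-webgl2";
  }
}
```

In a browser the call would look like `detectRenderer(navigator.gpu)`. A fuller check would probably also verify the adapter's limits and features before committing to the LLE path.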
Why This Is Our Moat
If we port ParaLLEl-RDP to WebGPU, we become the only browser N64 emulator with hardware-accurate rendering. This:
- Enables games that HLE can't handle (custom microcodes)
- Provides pixel-perfect output matching real hardware
- Enables HD resolution upscaling (2x, 4x, 8x) with proper filtering
- Widescreen hacks and HD texture packs are traditionally HLE-renderer features; as far as we know upstream ParaLLEl-RDP offers neither, so both would be extra work on top of the port rather than something it provides for free
As far as we know, nobody else is attempting this. The technical barrier is high enough that it's unlikely to be replicated quickly.
Timeline Estimate
| Phase | Work | Timeframe |
|---|---|---|
| Research | Analyze ParaLLEl-RDP shader set | 2 weeks |
| Prototype | Port 3 core shaders to WGSL | 1 month |
| MVP | Basic triangle rasterization in WebGPU | 2 months |
| Feature complete | All RDP modes ported | 4-6 months |
| Optimization | Match native ParaLLEl-RDP speed | 2 months |
This is Phase 4 work — after SIMD, threading, and audio are solid. But it's the thing that makes this project legendary.
The wgpu Path (Alternative)
If we ever rewrite the core in Rust, we could use wgpu (Rust WebGPU implementation):
- Write shaders once → runs on Vulkan (desktop) AND WebGPU (browser)
- The same codebase compiles to both a native binary and a WASM module
- gopher64 could theoretically adopt this approach
This is the long-long-term path — a unified codebase that runs at native speed on desktop and near-native in browsers.