WebGPU & ParaLLEl-RDP

The moonshot: bitexact N64 rendering in the browser via WebGPU compute shaders.


The Accuracy Problem

Current browser N64 emulators all use HLE (High-Level Emulation) for graphics.

HLE works for ~90% of games using standard Nintendo microcodes. But:

  • Games with custom microcodes (Rogue Squadron, World Driver Championship) break
  • Texture filtering doesn't match real hardware
  • Z-buffer and blending have subtle inaccuracies
  • No way to support widescreen hacks properly

LLE (Low-Level Emulation) with ParaLLEl-RDP runs the actual RSP microcode and feeds the resulting RDP commands to GPU compute shaders, which render exactly as real hardware would, including all the weird edge cases.


What Is ParaLLEl-RDP?

Created by Themaister, ParaLLEl-RDP is a Vulkan compute shader implementation of the N64's RDP (Reality Display Processor). It achieves bitexact accuracy with the Angrylion software renderer (the gold standard for N64 accuracy) while running at full speed on modern GPUs.

Used by:

  • Ares (most accurate N64 emulator)
  • gopher64 (Rust N64 emulator)
  • simple64 (archived)
  • RetroArch (ParaLLEl N64 core, native mode)

Why It Can't Run In Browsers Today

ParaLLEl-RDP requires Vulkan 1.1 with:

  • Compute shaders
  • Storage buffers (read/write)
  • Subgroup operations
  • 32-bit atomics on storage buffers
  • Multiple compute dispatches per frame

Browsers don't expose Vulkan. They expose WebGL (no compute) or WebGPU (has compute!).


WebGPU: The Bridge

WebGPU provides compute shaders that could theoretically run ParaLLEl-RDP's workload:

WebGPU Compute Capabilities

| Vulkan Feature (ParaLLEl-RDP needs) | WebGPU Equivalent | Status |
| --- | --- | --- |
| Compute shaders | @compute shaders in WGSL | Supported |
| Storage buffers (read/write) | var<storage, read_write> | Supported |
| 32-bit atomics | atomicAdd, atomicMax, etc. | Supported |
| Subgroup operations | subgroupBroadcast, etc. | Partial (Chrome 133+) |
| Push constants | Uniform buffer in a bind group (different mechanism, same purpose) | Supported |
| Multiple dispatches | computePass.dispatchWorkgroups() | Supported |
| Shared memory (workgroup) | var<workgroup> | Supported |
| Memory barriers | workgroupBarrier(), storageBarrier() | Supported |
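
The table above can be turned into a startup check. The following is a minimal sketch, not a definitive requirement list: `AdapterLike`, `checkRdpCapabilities`, and the threshold values are hypothetical names and numbers chosen for illustration (the workgroup size of 64 comes from the shader example in this page; the storage buffer count is an assumption). The feature name `"subgroups"` and the two limit names are the ones WebGPU exposes on an adapter.

```typescript
// Minimal structural view of a WebGPU adapter (hypothetical type for this sketch).
interface LimitsLike {
  maxStorageBuffersPerShaderStage: number;
  maxComputeInvocationsPerWorkgroup: number;
}

interface AdapterLike {
  features: { has(name: string): boolean };
  limits: LimitsLike;
}

interface CapabilityReport {
  usable: boolean;       // all hard requirements are met
  hasSubgroups: boolean; // optional fast path; otherwise emulate with shared memory
  missing: string[];     // names of limits that are too low
}

// ASSUMPTION: illustrative thresholds, not values taken from ParaLLEl-RDP.
const REQUIRED_LIMITS: LimitsLike = {
  maxStorageBuffersPerShaderStage: 8,
  maxComputeInvocationsPerWorkgroup: 64,
};

function checkRdpCapabilities(adapter: AdapterLike): CapabilityReport {
  const missing: string[] = [];
  const names = Object.keys(REQUIRED_LIMITS) as (keyof LimitsLike)[];
  for (const name of names) {
    if (adapter.limits[name] < REQUIRED_LIMITS[name]) {
      missing.push(name);
    }
  }
  return {
    usable: missing.length === 0,
    hasSubgroups: adapter.features.has("subgroups"),
    missing,
  };
}
```

In a browser, the object returned by `navigator.gpu.requestAdapter()` has `features` and `limits` members of this shape, so it could be passed in directly once it is known to be non-null.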

What's Missing

| Vulkan Feature | WebGPU Status | Workaround |
| --- | --- | --- |
| Full subgroup ops | Partial | Emulate with shared memory |
| 64-bit atomics | Not available | Split into 32-bit pairs |
| Descriptor indexing | Limited | Use bind group arrays |
| Timeline semaphores | No equivalent | Use mapAsync / onSubmittedWorkDone |
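
To make the "split into 32-bit pairs" workaround concrete, here is a small sketch of the word-level arithmetic, written in TypeScript so it can be checked on the CPU. The `U64` type and the `u64Add` / `u64Max` helpers are hypothetical names for illustration. Note the limit of the sketch: it shows only how a 64-bit value maps onto two `u32` words. Two separate 32-bit atomic operations are not equivalent to one 64-bit atomic, so making the pair update atomically in a shader is a separate problem that this does not solve.

```typescript
// A 64-bit unsigned value stored as two unsigned 32-bit words (hypothetical layout).
interface U64 {
  lo: number; // low 32 bits, in [0, 2^32)
  hi: number; // high 32 bits, in [0, 2^32)
}

// 64-bit addition using only 32-bit words, wrapping on overflow.
function u64Add(a: U64, b: U64): U64 {
  const lo = (a.lo + b.lo) >>> 0;     // low word, modulo 2^32
  const carry = lo < a.lo ? 1 : 0;    // the low word wrapped if it got smaller
  const hi = (a.hi + b.hi + carry) >>> 0;
  return { lo, hi };
}

// 64-bit maximum: compare high words first, then low words.
function u64Max(a: U64, b: U64): U64 {
  if (a.hi !== b.hi) {
    return a.hi > b.hi ? a : b;
  }
  return a.lo >= b.lo ? a : b;
}
```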

The Porting Challenge

ParaLLEl-RDP consists of ~30 GLSL compute shaders. They'd need conversion to WGSL:

// ParaLLEl-RDP GLSL (Vulkan)
#version 450
layout(local_size_x = 64) in;

layout(set = 0, binding = 0) buffer RDRAM { uint data[]; };
layout(set = 0, binding = 1) buffer Framebuffer { uint pixels[]; };

layout(push_constant) uniform Params {
    uint num_primitives;
    uint fb_width;
};

void main() {
    uint idx = gl_GlobalInvocationID.x;
    if (idx >= num_primitives) return;
    // ... RDP rasterization logic ...
}

// Equivalent WGSL (WebGPU)
@group(0) @binding(0) var<storage, read_write> rdram: array<u32>;
@group(0) @binding(1) var<storage, read_write> framebuffer: array<u32>;

struct Params {
    num_primitives: u32,
    fb_width: u32,
}
@group(0) @binding(2) var<uniform> params: Params;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
    let idx = id.x;
    if (idx >= params.num_primitives) { return; }
    // ... RDP rasterization logic (same algorithm, different syntax) ...
}

The algorithm is the same — it's primarily a syntax translation with some API differences for resource binding and synchronization.
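
On the host side, the push constants from the GLSL version become data written into a uniform buffer, and the dispatch size has to be derived from the workgroup size. A minimal sketch of those two pieces follows, assuming the `Params` struct and `@workgroup_size(64)` from the WGSL above. `workgroupCount` and `packParams` are hypothetical helper names, and `paramsBuffer` and `pass` in the trailing comments stand for a `GPUBuffer` and a `GPUComputePassEncoder` created elsewhere.

```typescript
// Must match @workgroup_size(64) in the WGSL shader.
const WORKGROUP_SIZE = 64;

// Number of workgroups needed so that every primitive gets one invocation.
// The shader's bounds check discards the surplus invocations in the last group.
function workgroupCount(numPrimitives: number): number {
  return Math.ceil(numPrimitives / WORKGROUP_SIZE);
}

// Pack the Params struct (two u32 fields, 8 bytes) in declaration order.
function packParams(numPrimitives: number, fbWidth: number): Uint32Array {
  return new Uint32Array([numPrimitives, fbWidth]);
}

// Intended use with the WebGPU API (not executed here):
//   device.queue.writeBuffer(paramsBuffer, 0, packParams(count, 320));
//   pass.dispatchWorkgroups(workgroupCount(count));
```

Rounding up is what makes the `idx >= params.num_primitives` early return in the shader necessary: with 65 primitives, two workgroups are dispatched and 63 invocations in the second one have nothing to do.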


Architecture With WebGPU


WebGPU Browser Support (2026)

| Browser | Status | Version |
| --- | --- | --- |
| Chrome | Shipped | 113+ (April 2023) |
| Edge | Shipped | 113+ |
| Safari | Partial | 18+ |
| Firefox | In progress | Behind flag |

~75% global support with Chrome + Edge. Rising fast.

For browsers without WebGPU, we fall back to the HLE WebGL2 renderer (same as current emulators). Users with WebGPU get pixel-perfect accuracy.
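
The fallback rule can be sketched as a small decision function. This is an illustration under assumptions, not the project's actual startup code: `Renderer`, `Environment`, and `chooseRenderer` are hypothetical names. One detail worth encoding is that the presence of `navigator.gpu` alone is not enough, because `requestAdapter()` can still resolve to `null` on unsupported hardware.

```typescript
type Renderer = "lle-webgpu" | "hle-webgl2" | "none";

// Results of probing the browser (hypothetical shape for this sketch).
interface Environment {
  // navigator.gpu exists AND requestAdapter() resolved to a non-null adapter.
  hasWebGPUAdapter: boolean;
  // canvas.getContext("webgl2") returned a non-null context.
  hasWebGL2: boolean;
}

// Prefer the accurate LLE path, fall back to HLE, otherwise report no renderer.
function chooseRenderer(env: Environment): Renderer {
  if (env.hasWebGPUAdapter) {
    return "lle-webgpu";
  }
  if (env.hasWebGL2) {
    return "hle-webgl2";
  }
  return "none";
}
```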


Why This Is Our Moat

If we port ParaLLEl-RDP to WebGPU, we become the only browser N64 emulator with hardware-accurate rendering. This:

  1. Enables games that HLE can't handle (custom microcodes)
  2. Provides pixel-perfect output matching real hardware
  3. Enables HD resolution upscaling (2x, 4x, 8x) with proper filtering
  4. Enables widescreen hacks (ParaLLEl-RDP supports them natively)
  5. Enables HD texture pack support (load high-res textures, apply in real-time)

As far as we know, nobody else is attempting this. The technical barrier is high enough that it's unlikely to be replicated quickly.


Timeline Estimate

| Phase | Work | Timeframe |
| --- | --- | --- |
| Research | Analyze ParaLLEl-RDP shader set | 2 weeks |
| Prototype | Port 3 core shaders to WGSL | 1 month |
| MVP | Basic triangle rasterization in WebGPU | 2 months |
| Feature complete | All RDP modes ported | 4-6 months |
| Optimization | Match native ParaLLEl-RDP speed | 2 months |

This is Phase 4 work — after SIMD, threading, and audio are solid. But it's the thing that makes this project legendary.


The wgpu Path (Alternative)

If we ever rewrite the core in Rust, we could use wgpu (Rust WebGPU implementation):

  • Write shaders once → runs on Vulkan (desktop) AND WebGPU (browser)
  • The same codebase builds for both native and WASM targets
  • gopher64 could theoretically adopt this approach

This is the long-long-term path — a unified codebase that runs at native speed on desktop and near-native in browsers.