Rust + WASM Architecture
The reference architecture for a next-generation browser emulator — learned from mass-nes.
Why Study Rust Emulators?
While our Phase 1-3 approach uses the existing C/Emscripten pipeline (a fork of N64Wasm), the long-term ideal is a Rust-based architecture. This page documents what that looks like, based on the mass-nes reference implementation, the most architecturally complete browser emulator we have studied.
mass-nes: The Gold Standard
GitHub: nickmass/mass-nes | NES emulator in Rust targeting WASM
This project demonstrates every architectural pattern we want:
Key Stats
- WASM binary: 767.7KB (incredibly compact)
- JS glue: 66.2KB
- Threads: 3 (main, machine, gfx)
- Audio: AudioWorklet with WASM running inside the worklet
- Rendering: OffscreenCanvas + WebGL2 in a worker
- Frame pacing: Audio-driven (AudioWorklet requests samples → drives emulation speed)
The Workspace Pattern
mass-nes uses Cargo workspaces to cleanly separate concerns:
n64-wasm/                       (hypothetical Rust N64 emulator)
├── Cargo.toml                  (workspace root)
├── n64-core/                   (Pure emulation, no platform deps)
│   ├── src/
│   │   ├── cpu.rs              (MIPS R4300i)
│   │   ├── rsp/
│   │   │   ├── scalar.rs       (RSP scalar unit)
│   │   │   └── vector.rs       (RSP vector unit, SIMD)
│   │   ├── rdp.rs              (RDP renderer)
│   │   ├── memory.rs           (RDRAM, ROM, MMIO)
│   │   ├── audio.rs            (AI: sample generation)
│   │   └── video.rs            (VI: framebuffer output)
│   └── Cargo.toml              (deps: none or minimal)
│
├── n64-web/                    (WASM frontend, cdylib)
│   ├── src/
│   │   ├── lib.rs              (wasm_bindgen exports)
│   │   ├── app.rs              (winit event loop)
│   │   ├── gfx_worker.rs       (OffscreenCanvas rendering)
│   │   ├── emu_worker.rs       (Emulation in Web Worker)
│   │   └── audio.rs            (AudioWorklet integration)
│   └── Cargo.toml              (deps: wasm-bindgen, web-sys, winit)
│
├── n64-desktop/                (Native frontend)
│   ├── src/main.rs
│   └── Cargo.toml              (deps: winit, wgpu, cpal)
│
└── worklet/                    (AudioWorklet JS + WASM init)
    └── processor.js
The n64-core crate has zero platform dependencies: it only outputs raw framebuffers and audio samples. That keeps it testable and portable, and the same code runs in WASM and natively.
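To make the "no platform dependencies" boundary concrete, here is a minimal sketch of what the public surface of such a core crate might look like. Every name in it (`Core`, `Input`, `run_frame`, the 320x240 size, the stereo sample layout) is a hypothetical placeholder, not an API taken from mass-nes or any existing N64 core; the stub only fills its output buffers so that a frontend could be wired up and tested against it.

```rust
/// Hypothetical platform-neutral surface for an `n64-core` style crate.
/// Uses only `std` types: no web-sys, winit, wgpu, or cpal.
pub const WIDTH: usize = 320;
pub const HEIGHT: usize = 240;

/// Packed controller state, one bit per button (bit layout is an assumption).
#[derive(Clone, Copy, Default)]
pub struct Input(pub u32);

pub struct Core {
    frame_buffer: Vec<u16>,
    audio_out: Vec<i16>,
    frame_count: u64,
}

impl Core {
    pub fn new() -> Self {
        Core {
            frame_buffer: vec![0; WIDTH * HEIGHT],
            audio_out: Vec::new(),
            frame_count: 0,
        }
    }

    /// Advance one video frame. A real core would step the CPU, RSP and RDP
    /// here; this stub just writes placeholder data into both outputs.
    pub fn run_frame(&mut self, input: Input, samples_per_frame: usize) {
        self.frame_count += 1;
        let shade = (self.frame_count as u16) ^ (input.0 as u16);
        self.frame_buffer.fill(shade);
        self.audio_out.clear();
        // Interleaved stereo silence: two i16 values per sample frame.
        self.audio_out.resize(samples_per_frame * 2, 0);
    }

    /// Raw 16-bit pixels, row-major, for the frontend to upload or blit.
    pub fn frame_buffer(&self) -> &[u16] {
        &self.frame_buffer
    }

    /// Audio produced by the most recent `run_frame` call.
    pub fn audio_samples(&self) -> &[i16] {
        &self.audio_out
    }
}
```

Because the frontends only ever see slices of plain integers, the web and desktop crates can share this core unchanged and differ only in how they present those slices.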
Threading in Rust/WASM
Build Configuration
# .cargo/config.toml
[target.wasm32-unknown-unknown]
rustflags = [
    "-C", "target-feature=+atomics,+bulk-memory,+mutable-globals",
]

[unstable]
build-std = ["panic_abort", "std"]

# rust-toolchain.toml
[toolchain]
channel = "nightly"
components = ["rust-src"]  # build-std compiles std from source
targets = ["wasm32-unknown-unknown"]
Shared Memory
// Shared state between threads via SharedArrayBuffer
use std::sync::Arc;
use std::sync::atomic::{AtomicU32, Ordering};

struct SharedState {
    frame_buffer: Vec<u16>,   // 320*240 pixels
    audio_buffer: Vec<i16>,   // Ring buffer
    input_state: AtomicU32,   // Packed button state
    write_head: AtomicU32,    // Audio ring buffer position
    frame_ready: AtomicU32,   // Signal new frame
}

// Arc<SharedState> works across Web Workers because the
// underlying WASM linear memory is a SharedArrayBuffer.
// Note: writing to the Vec fields through an Arc needs interior
// mutability (atomics or a lock); the plain Vecs here are schematic.
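The `audio_buffer` and `write_head` pair above can be sketched as a single-producer, single-consumer ring buffer. This is an illustrative sketch only, not code from mass-nes: the `AudioRing` type, the added `read_head` counter, and the power-of-two capacity requirement are all assumptions. Samples are stored as `AtomicI16` so that both threads can touch the buffer through a shared reference without `unsafe`.

```rust
use std::sync::atomic::{AtomicI16, AtomicU32, Ordering};

/// Single-producer / single-consumer ring of audio samples (hypothetical).
/// Capacity must be a power of two so positions can be masked into indices.
pub struct AudioRing {
    samples: Vec<AtomicI16>,
    write_head: AtomicU32, // total samples written, wraps at u32::MAX
    read_head: AtomicU32,  // total samples read, wraps at u32::MAX
}

impl AudioRing {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two());
        AudioRing {
            samples: (0..capacity).map(|_| AtomicI16::new(0)).collect(),
            write_head: AtomicU32::new(0),
            read_head: AtomicU32::new(0),
        }
    }

    fn mask(&self) -> u32 {
        self.samples.len() as u32 - 1
    }

    /// Producer side (emulation thread). Returns how many samples were accepted.
    pub fn push(&self, input: &[i16]) -> usize {
        let w = self.write_head.load(Ordering::Relaxed);
        let r = self.read_head.load(Ordering::Acquire);
        let free = self.samples.len() - w.wrapping_sub(r) as usize;
        let n = input.len().min(free);
        for (i, &s) in input[..n].iter().enumerate() {
            let idx = (w.wrapping_add(i as u32) & self.mask()) as usize;
            self.samples[idx].store(s, Ordering::Relaxed);
        }
        // Release makes the sample stores visible before the new head is.
        self.write_head.store(w.wrapping_add(n as u32), Ordering::Release);
        n
    }

    /// Consumer side (audio thread). Returns how many samples were copied out.
    pub fn pop(&self, out: &mut [i16]) -> usize {
        let r = self.read_head.load(Ordering::Relaxed);
        let w = self.write_head.load(Ordering::Acquire);
        let avail = w.wrapping_sub(r) as usize;
        let n = out.len().min(avail);
        for (i, slot) in out[..n].iter_mut().enumerate() {
            let idx = (r.wrapping_add(i as u32) & self.mask()) as usize;
            *slot = self.samples[idx].load(Ordering::Relaxed);
        }
        // Release tells the producer these slots are free to overwrite.
        self.read_head.store(r.wrapping_add(n as u32), Ordering::Release);
        n
    }
}
```

Wrapped in an `Arc`, one such ring could be handed to both the emulation worker and the audio side. When the ring is full, `push` accepts fewer samples than offered instead of blocking, so the caller decides whether to drop or retry.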
Worker Spawning
use wasm_bindgen::JsValue;
use web_sys::{Worker, WorkerOptions, WorkerType};

fn spawn_worker(module: &JsValue, memory: &JsValue) -> Worker {
    let opts = WorkerOptions::new();
    opts.set_type(WorkerType::Module);
    let worker = Worker::new_with_options("./worker.js", &opts).unwrap();

    // Send the compiled WASM module and the shared memory to the worker
    let msg = js_sys::Array::new();
    msg.push(module);
    msg.push(memory);
    worker.post_message(&msg).unwrap();
    worker
}
Audio-Driven Frame Pacing
The most elegant pattern from mass-nes: the audio system drives emulation speed. The AudioWorklet requests samples, and producing those samples is what advances the emulator.
Why this is brilliant:
- Audio hardware has the most consistent clock in the system
- Emulation speed automatically matches audio output rate
- No manual frame timing needed (no requestAnimationFrame timing hacks)
- If GPU is slow, audio still plays smoothly (just frame drops)
- If CPU is fast, it doesn't run ahead and waste power
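The pacing idea in the list above can be sketched in a few lines. This is a simplified, single-threaded illustration, not how mass-nes structures its threads: the `Machine` type, the `fill_audio` method, and the 48 kHz / 60 Hz figures behind `SAMPLES_PER_FRAME` are assumptions for illustration. The only clock is the audio callback: the machine advances exactly as far as needed to fill the buffer it is handed, and video frames fall out as a side effect.

```rust
/// Assumed rates: 48_000 Hz audio / 60 Hz video = 800 samples per frame.
pub const SAMPLES_PER_FRAME: u32 = 800;

/// Hypothetical emulator stub whose only notion of time is samples produced.
pub struct Machine {
    sample_clock: u32,
    pub frames_completed: u32,
}

impl Machine {
    pub fn new() -> Self {
        Machine { sample_clock: 0, frames_completed: 0 }
    }

    /// Run the machine just long enough to produce one audio sample.
    fn step(&mut self) -> i16 {
        self.sample_clock += 1;
        if self.sample_clock % SAMPLES_PER_FRAME == 0 {
            // A real machine would publish the finished framebuffer here.
            self.frames_completed += 1;
        }
        0 // silence; a real core would return the mixed sample
    }

    /// Called once per audio callback. An AudioWorklet asks for 128 sample
    /// frames per render quantum, so the emulator never runs ahead of playback.
    pub fn fill_audio(&mut self, out: &mut [i16]) {
        for slot in out.iter_mut() {
            *slot = self.step();
        }
    }
}
```

Under these assumed rates, 375 callbacks of 128 samples add up to one second of audio (48,000 samples), over which the stub completes 60 frames without any timer or `requestAnimationFrame` bookkeeping.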
gopher64: The Rust N64 Reference
gopher64 is a mature N64 emulator written in Rust. Key architectural patterns worth adopting:
RSP Vector Unit (from gopher64)
// gopher64 uses x86 SSE4.1 intrinsics directly
use std::arch::x86_64::*;

fn rsp_vmulf(vs: __m128i, vt: __m128i) -> __m128i {
    unsafe {
        let lo = _mm_mullo_epi16(vs, vt);
        let hi = _mm_mulhi_epi16(vs, vt);
        let sign = _mm_srai_epi16(lo, 15);
        let result = _mm_sub_epi16(_mm_add_epi16(hi, hi), sign);
        result
    }
}
For WASM, this would use std::arch::wasm32 SIMD intrinsics:
#[cfg(target_arch = "wasm32")]
use std::arch::wasm32::*;

#[cfg(target_arch = "wasm32")]
fn rsp_vmulf(vs: v128, vt: v128) -> v128 {
    let lo = i16x8_mul(vs, vt); // Lower 16 bits of each product
    // ... accumulator logic with WASM SIMD ...
    result
}
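Before porting a SIMD routine to a new instruction set, it can help to have a portable scalar version to test against. The sketch below models, lane by lane, only the arithmetic of the SSE snippet above (`2 * hi - (lo >> 15)` with 16-bit wrapping). It makes no claim to match the rounding or saturation of the real RSP instruction, and the function names `vmulf_lane` and `vmulf` are hypothetical.

```rust
/// Scalar model of one 16-bit lane of the SSE snippet:
/// result = hi + hi - (lo >> 15), all in wrapping 16-bit arithmetic.
pub fn vmulf_lane(vs: i16, vt: i16) -> i16 {
    let product = vs as i32 * vt as i32;
    let lo = product as i16;         // low 16 bits, like _mm_mullo_epi16
    let hi = (product >> 16) as i16; // high 16 bits, like _mm_mulhi_epi16
    let sign = lo >> 15;             // arithmetic shift: 0 or -1
    hi.wrapping_add(hi).wrapping_sub(sign)
}

/// Apply the lane model across all eight lanes of a vector register.
pub fn vmulf(vs: [i16; 8], vt: [i16; 8]) -> [i16; 8] {
    let mut out = [0i16; 8];
    for i in 0..8 {
        out[i] = vmulf_lane(vs[i], vt[i]);
    }
    out
}
```

Read as Q15 fixed point, `0x4000 * 0x4000` (0.5 times 0.5) gives `0x2000` (0.25). The corner case `i16::MIN * i16::MIN` wraps around to `i16::MIN` in this model, which is exactly the kind of input a saturating implementation has to treat specially, so it makes a useful test vector when comparing the x86 and WASM paths.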
Why Not Rust Right Now?
For Phase 1-3, we use the C/Emscripten path because:
- N64Wasm already works: proven, MIT-licensed, just needs optimization
- Rewriting in Rust would take years — N64 is far more complex than NES
- The SSE2 → WASM SIMD path is nearly free: Emscripten can translate SSE intrinsics to WASM SIMD at compile time
- Risk is lower — we can ship something fast
When Rust Makes Sense
- Phase 5+ — if we want to build a truly custom emulator
- ParaLLEl-RDP in wgpu — Rust + wgpu targets both Vulkan and WebGPU
- Clean WASM SIMD: Rust's std::arch::wasm32 is more ergonomic than C intrinsics
- Memory safety: emulator memory handling has many subtle bugs in C
The Long-Term Vision
The Rust rewrite is the endgame — a single codebase that produces:
- A desktop app (via wgpu/Vulkan)
- A browser app (via wgpu/WebGPU + WASM)
- A mobile app (via wgpu/Metal/Vulkan)
All with the same accuracy, the same code, the same features.