
Rust + WASM Architecture

The reference architecture for a next-generation browser emulator — learned from mass-nes.


Why Study Rust Emulators?

While our Phase 1-3 approach uses the existing C/Emscripten pipeline (a fork of N64Wasm), the long-term ideal is a Rust-based architecture. This page documents what that looks like, based on the mass-nes reference implementation, one of the most sophisticated browser emulators we have found.


mass-nes: The Gold Standard

GitHub: nickmass/mass-nes | NES emulator in Rust targeting WASM

This project demonstrates every architectural pattern we want:

Key Stats

  • WASM binary: 767.7KB (very compact for a full emulator)
  • JS glue: 66.2KB
  • Threads: 3 (main, machine, gfx)
  • Audio: AudioWorklet with WASM running inside the worklet
  • Rendering: OffscreenCanvas + WebGL2 in a worker
  • Frame pacing: Audio-driven (AudioWorklet requests samples → drives emulation speed)

The Workspace Pattern

mass-nes uses a Cargo workspace to cleanly separate concerns. Applied to an N64 emulator, the same layout would look like this:

n64-wasm/ (hypothetical Rust N64 emulator)
├── Cargo.toml (workspace root)
├── n64-core/ (pure emulation, no platform deps)
│ ├── src/
│ │ ├── cpu.rs (MIPS R4300i)
│ │ ├── rsp/
│ │ │ ├── scalar.rs (RSP scalar unit)
│ │ │ └── vector.rs (RSP vector unit, SIMD)
│ │ ├── rdp.rs (RDP renderer)
│ │ ├── memory.rs (RDRAM, ROM, MMIO)
│ │ ├── audio.rs (AI: sample generation)
│ │ └── video.rs (VI: framebuffer output)
│ └── Cargo.toml (deps: none or minimal)
│
├── n64-web/ (WASM frontend, cdylib)
│ ├── src/
│ │ ├── lib.rs (wasm_bindgen exports)
│ │ ├── app.rs (winit event loop)
│ │ ├── gfx_worker.rs (OffscreenCanvas rendering)
│ │ ├── emu_worker.rs (emulation in a Web Worker)
│ │ └── audio.rs (AudioWorklet integration)
│ └── Cargo.toml (deps: wasm-bindgen, web-sys, winit)
│
├── n64-desktop/ (native frontend)
│ ├── src/main.rs
│ └── Cargo.toml (deps: winit, wgpu, cpal)
│
└── worklet/ (AudioWorklet JS + WASM init)
  └── processor.js

The n64-core crate has zero platform dependencies: it only outputs raw framebuffers and audio samples. That keeps it testable and portable, and the same code runs both in WASM and natively.
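Because the core has no platform dependencies, it can be exercised headlessly in an ordinary native test run. The sketch below shows one possible public surface for such a crate; `Machine`, `FrameOutput`, `run_frame`, and the assumed 48 kHz stereo / 60 fps audio rate are all hypothetical, and the frame body is a placeholder rather than real emulation.

```rust
// Hypothetical public surface for n64-core: plain Rust types only, with no
// web-sys, winit, or wgpu. Names and rates are illustrative assumptions.
pub const WIDTH: usize = 320;
pub const HEIGHT: usize = 240;
/// Assumed output rate: 48 kHz stereo at 60 frames per second.
pub const AUDIO_SAMPLES_PER_FRAME: usize = 48_000 / 60 * 2;

/// Borrowed view of what one emulated frame produced.
pub struct FrameOutput<'a> {
    pub pixels: &'a [u16], // WIDTH * HEIGHT pixels
    pub audio: &'a [i16],  // interleaved stereo samples
}

pub struct Machine {
    framebuffer: Vec<u16>,
    audio: Vec<i16>,
    frame_count: u64,
}

impl Machine {
    pub fn new() -> Self {
        Machine {
            framebuffer: vec![0; WIDTH * HEIGHT],
            audio: Vec::with_capacity(AUDIO_SAMPLES_PER_FRAME),
            frame_count: 0,
        }
    }

    /// Advance one video frame. This placeholder body only writes
    /// deterministic values; a real core would step the CPU, RSP, and RDP.
    pub fn run_frame(&mut self) -> FrameOutput<'_> {
        self.frame_count += 1;
        self.framebuffer.fill(self.frame_count as u16);
        self.audio.clear();
        self.audio.resize(AUDIO_SAMPLES_PER_FRAME, 0);
        FrameOutput { pixels: &self.framebuffer, audio: &self.audio }
    }
}
```

Each frontend then only adapts these slices: the desktop build could hand pixels to wgpu and samples to cpal, while the web build copies them into shared buffers for the render and audio threads.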


Threading in Rust/WASM

Build Configuration

# .cargo/config.toml
[target.wasm32-unknown-unknown]
rustflags = [
  "-C", "target-feature=+atomics,+bulk-memory,+mutable-globals",
]

[unstable]
build-std = ["panic_abort", "std"]

# rust-toolchain.toml
[toolchain]
channel = "nightly"
components = ["rust-src"]  # build-std rebuilds std from source
targets = ["wasm32-unknown-unknown"]
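A missing flag here fails quietly: the module still builds, but its memory is not shared. A crate can guard against that at compile time. This is a sketch under the assumption that it sits near the top of the web crate (for example n64-web's lib.rs); `threads_supported` is a hypothetical helper name.

```rust
// Fail fast if a wasm32 build is missing the atomics feature configured in
// .cargo/config.toml: without it, linear memory is not a SharedArrayBuffer
// and Web Workers cannot share the heap.
#[cfg(all(target_arch = "wasm32", not(target_feature = "atomics")))]
compile_error!("build with -C target-feature=+atomics,+bulk-memory,+mutable-globals");

/// True when this build can share linear memory between Web Workers.
/// Always false for native builds, which use OS threads instead.
pub fn threads_supported() -> bool {
    cfg!(all(target_arch = "wasm32", target_feature = "atomics"))
}
```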

Shared Memory

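// Shared state between threads. With +atomics, WASM linear memory is backed
// by a SharedArrayBuffer, so workers instantiated with the same memory share
// one heap, and an Arc<SharedState> pointer is valid in all of them.
use std::sync::atomic::AtomicU32;
use std::sync::Mutex;

struct SharedState {
    // A plain Vec cannot be mutated through an Arc, so the buffers need a
    // Mutex. Browsers forbid blocking waits on the main thread, so the main
    // thread should only use try_lock() on these.
    frame_buffer: Mutex<Vec<u16>>, // 320*240 pixels
    audio_buffer: Mutex<Vec<i16>>, // Ring buffer
    input_state: AtomicU32,        // Packed button state
    write_head: AtomicU32,         // Audio ring buffer position
    frame_ready: AtomicU32,        // Signal new frame
}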
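For the audio path, a lock is usually avoided entirely: the emulation worker is the only writer and the audio side is the only reader, so a single-producer/single-consumer ring built from atomics is enough. The sketch below is one such ring; `AudioRing`, its method names, and the use of `usize` heads (instead of the `AtomicU32` above) are assumptions for illustration, not mass-nes code.

```rust
use std::sync::atomic::{AtomicI16, AtomicUsize, Ordering};

/// Single-producer/single-consumer ring of i16 audio samples. The emulation
/// worker pushes, the audio side pops; neither side ever blocks. One slot is
/// always left empty so that "full" and "empty" can be told apart.
pub struct AudioRing {
    slots: Box<[AtomicI16]>,
    write_head: AtomicUsize, // next slot the producer writes
    read_head: AtomicUsize,  // next slot the consumer reads
}

impl AudioRing {
    pub fn new(slots: usize) -> Self {
        assert!(slots >= 2, "need at least two slots");
        AudioRing {
            slots: (0..slots).map(|_| AtomicI16::new(0)).collect(),
            write_head: AtomicUsize::new(0),
            read_head: AtomicUsize::new(0),
        }
    }

    /// Producer side: returns how many samples were written (fewer than
    /// `samples.len()` when the ring is full).
    pub fn push(&self, samples: &[i16]) -> usize {
        let n = self.slots.len();
        let mut w = self.write_head.load(Ordering::Relaxed);
        // Acquire: the consumer has finished reading every slot before `r`.
        let r = self.read_head.load(Ordering::Acquire);
        let mut written = 0;
        for &s in samples {
            let next = (w + 1) % n;
            if next == r {
                break; // full
            }
            self.slots[w].store(s, Ordering::Relaxed);
            w = next;
            written += 1;
        }
        self.write_head.store(w, Ordering::Release); // publish new samples
        written
    }

    /// Consumer side: fills `out` and returns how many samples were read.
    pub fn pop(&self, out: &mut [i16]) -> usize {
        let n = self.slots.len();
        let mut r = self.read_head.load(Ordering::Relaxed);
        // Acquire: every slot before `w` has been fully written.
        let w = self.write_head.load(Ordering::Acquire);
        let mut read = 0;
        for slot in out.iter_mut() {
            if r == w {
                break; // empty
            }
            *slot = self.slots[r].load(Ordering::Relaxed);
            r = (r + 1) % n;
            read += 1;
        }
        self.read_head.store(r, Ordering::Release); // hand slots back
        read
    }
}
```

The Release store on each head paired with the Acquire load on the other side is what guarantees that a sample is visible before its slot is reported as readable, and that a slot is not overwritten while still being read.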

Worker Spawning

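// Requires web-sys features: "Worker", "WorkerOptions", "WorkerType"
use wasm_bindgen::JsValue;
use web_sys::{Worker, WorkerOptions, WorkerType};

// Typically called as spawn_worker(&wasm_bindgen::module(), &wasm_bindgen::memory())
fn spawn_worker(module: &JsValue, memory: &JsValue) -> Result<Worker, JsValue> {
    let opts = WorkerOptions::new();
    opts.set_type(WorkerType::Module);

    let worker = Worker::new_with_options("./worker.js", &opts)?;

    // Post the compiled WebAssembly.Module and the shared WebAssembly.Memory
    // to the worker. The memory's SharedArrayBuffer is shared, not copied,
    // so the worker sees the same heap as the main thread.
    let msg = js_sys::Array::new();
    msg.push(module);
    msg.push(memory);
    worker.post_message(&msg)?;

    Ok(worker)
}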
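Once the worker has instantiated the module against the shared memory, it still needs to know what to run. Because the heap is shared, a common approach is to box a closure on the spawning thread and post only its address. The sketch below shows that handoff; `Work`, `work_into_ptr`, and `run_work_ptr` are hypothetical names, and in a real build the worker side would be a `#[wasm_bindgen]` export called from worker.js.

```rust
/// Hypothetical unit of work handed to a freshly spawned worker.
struct Work {
    func: Box<dyn FnOnce() + Send>,
}

/// Spawning side: box the closure and leak it. The returned address points
/// into linear memory, which every worker shares, so it can be posted to the
/// worker as a plain number (usize is 32 bits on wasm32).
fn work_into_ptr(f: impl FnOnce() + Send + 'static) -> usize {
    Box::into_raw(Box::new(Work { func: Box::new(f) })) as usize
}

/// Worker side: reclaim ownership of the Box and run the closure.
///
/// # Safety
/// `ptr` must come from `work_into_ptr` and be passed here exactly once.
unsafe fn run_work_ptr(ptr: usize) {
    let work: Work = unsafe { *Box::from_raw(ptr as *mut Work) };
    (work.func)();
}
```

The `Send` bound matters: the closure runs on another thread, so it may only capture data that is safe to move there, such as an `Arc<SharedState>`.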

Audio-Driven Frame Pacing

The most elegant pattern in mass-nes lets the audio system drive emulation speed: the AudioWorklet requests samples, and the emulator runs only as long as it takes to produce them.

Why this works well:

  • Audio hardware has the most consistent clock in the system
  • Emulation speed automatically matches the audio output rate
  • No manual frame timing is needed (no requestAnimationFrame timing hacks)
  • If the GPU is slow, audio still plays smoothly; only video frames are dropped
  • If the CPU is fast, emulation doesn't run ahead and waste power
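The loop at the heart of this pattern is small. The single-threaded sketch below leaves out the worker and ring-buffer plumbing; `Core`, `AudioPacer`, and `fill` are hypothetical names, and the usage assumes the AudioWorklet's 128-frame render quantum and a mono core that emits 800 samples per frame (48 kHz at 60 fps).

```rust
use std::collections::VecDeque;

/// Hypothetical core interface: run one video frame and append the audio
/// samples that frame produced.
trait Core {
    fn run_frame(&mut self, audio_out: &mut VecDeque<i16>);
}

/// Audio-driven pacing: the audio side asks for `out.len()` samples and the
/// emulator runs only as many whole frames as it takes to supply them.
struct AudioPacer<C: Core> {
    core: C,
    pending: VecDeque<i16>, // samples produced but not yet played
    frames_run: u64,
}

impl<C: Core> AudioPacer<C> {
    fn new(core: C) -> Self {
        AudioPacer { core, pending: VecDeque::new(), frames_run: 0 }
    }

    /// Called once per audio block requested by the output device.
    fn fill(&mut self, out: &mut [i16]) {
        while self.pending.len() < out.len() {
            let before = self.pending.len();
            self.core.run_frame(&mut self.pending);
            self.frames_run += 1;
            if self.pending.len() == before {
                break; // core produced no audio; don't spin forever
            }
        }
        for slot in out.iter_mut() {
            *slot = self.pending.pop_front().unwrap_or(0); // pad with silence
        }
    }
}
```

With 800 samples per frame and 128-sample requests, the first request runs one frame and the next five are served from the leftovers; the seventh request triggers the second frame. Emulation therefore advances exactly as fast as the audio device consumes samples.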

gopher64: The Rust N64 Reference

gopher64 is one of the most mature N64 emulators written in Rust. Key architectural patterns worth adopting:

RSP Vector Unit (from gopher64)

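// gopher64 uses x86 SSE4.1 intrinsics directly. This illustrative sketch of
// VMULF needs only SSE2 (baseline on x86_64); the accumulator write is omitted.
// Each result lane = clamp((vs * vt * 2 + 0x8000) >> 16).
use std::arch::x86_64::*;

fn rsp_vmulf(vs: __m128i, vt: __m128i) -> __m128i {
    unsafe {
        let lo = _mm_mullo_epi16(vs, vt); // low 16 bits of each product
        let hi = _mm_mulhi_epi16(vs, vt); // high 16 bits (signed)
        // (product * 2 + 0x8000) >> 16 == hi * 2 + bit15(lo) + bit14(lo)
        let bit15 = _mm_srli_epi16(lo, 15);
        let bit14 = _mm_and_si128(_mm_srli_epi16(lo, 14), _mm_set1_epi16(1));
        let result = _mm_add_epi16(_mm_add_epi16(hi, hi), _mm_add_epi16(bit15, bit14));
        // Only 0x8000 * 0x8000 overflows: the sum above wraps to 0x8000,
        // and XOR with the all-ones lane mask turns it into 0x7FFF.
        let min = _mm_set1_epi16(i16::MIN);
        let overflow = _mm_and_si128(_mm_cmpeq_epi16(vs, min), _mm_cmpeq_epi16(vt, min));
        _mm_xor_si128(result, overflow)
    }
}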

For WASM, this would use std::arch::wasm32 SIMD intrinsics:

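#[cfg(target_arch = "wasm32")]
use std::arch::wasm32::*;

// Same VMULF semantics; build with -C target-feature=+simd128.
#[cfg(target_arch = "wasm32")]
#[target_feature(enable = "simd128")]
fn rsp_vmulf(vs: v128, vt: v128) -> v128 {
    // Full 32-bit products for lanes 0-3 and lanes 4-7.
    let p_lo = i32x4_extmul_low_i16x8(vs, vt);
    let p_hi = i32x4_extmul_high_i16x8(vs, vt);
    // (p * 2 + 0x8000) >> 16 == (p + 0x4000) >> 15, which cannot overflow i32.
    let bias = i32x4_splat(0x4000);
    let r_lo = i32x4_shr(i32x4_add(p_lo, bias), 15);
    let r_hi = i32x4_shr(i32x4_add(p_hi, bias), 15);
    // Signed saturating narrow clamps the 0x8000 * 0x8000 case to 0x7FFF.
    i16x8_narrow_i32x4(r_lo, r_hi)
}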
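Both SIMD paths are easy to get subtly wrong (rounding bit, the single overflow case), so a plain scalar model is worth keeping next to them as a test oracle. The sketch below is such a model for the VMULF result lanes only; `vmulf_lane` and `vmulf_scalar` are hypothetical names, and the accumulator is again left out.

```rust
/// Scalar reference for one VMULF lane: clamp((vs * vt * 2 + 0x8000) >> 16).
/// i64 arithmetic means nothing can overflow before the final clamp.
fn vmulf_lane(vs: i16, vt: i16) -> i16 {
    let acc = (vs as i64 * vt as i64) * 2 + 0x8000;
    (acc >> 16).clamp(i16::MIN as i64, i16::MAX as i64) as i16
}

/// All eight lanes of a vector register, for comparing against SIMD output.
fn vmulf_scalar(vs: [i16; 8], vt: [i16; 8]) -> [i16; 8] {
    std::array::from_fn(|i| vmulf_lane(vs[i], vt[i]))
}
```

A test can feed random lane values to both the scalar model and the SSE2 or WASM SIMD version (after converting the `[i16; 8]` arrays to and from vector registers) and require identical output.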

Why Not Rust Right Now?

For Phase 1-3, we use the C/Emscripten path because:

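  1. N64Wasm already works: it is proven, MIT-licensed, and just needs optimization
  2. Rewriting in Rust would take years: N64 is far more complex than NES
  3. The SSE2 → WASM SIMD path is nearly free: Emscripten translates most SSE2 intrinsics to WASM SIMD when built with -msimd128 -msse2, though some have no direct equivalent and are emulated more slowly
  4. Risk is lower: we can ship something quickly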

When Rust Makes Sense

  • Phase 5+: if we want to build a truly custom emulator
  • Porting ParaLLEl-RDP to wgpu: Rust + wgpu targets both Vulkan and WebGPU
  • Clean WASM SIMD: Rust's std::arch::wasm32 is more ergonomic than C intrinsics
  • Memory safety: emulator memory handling in C is prone to subtle bugs

The Long-Term Vision

The Rust rewrite is the endgame — a single codebase that produces:

  • A desktop app (via wgpu/Vulkan)
  • A browser app (via wgpu/WebGPU + WASM)
  • A mobile app (via wgpu/Metal/Vulkan)

All with the same accuracy, the same code, the same features.