
Threading Model

Web Workers + SharedArrayBuffer + Atomics — the architecture that makes everything else possible.


Why Multi-Threading Changes Everything

The N64 has 3 processors running simultaneously:

  • VR4300 CPU — 93.75 MHz MIPS III processor
  • RSP — 62.5 MHz MIPS vector coprocessor
  • RDP — Fixed-function rasterizer

Current browser emulators serialize all of this onto a single thread. Our architecture runs these components in parallel on separate Web Workers.


Required HTTP Headers

SharedArrayBuffer requires Cross-Origin Isolation:

// next.config.js
module.exports = {
  async headers() {
    return [{
      source: '/(.*)',
      headers: [
        { key: 'Cross-Origin-Opener-Policy', value: 'same-origin' },
        { key: 'Cross-Origin-Embedder-Policy', value: 'require-corp' },
      ],
    }];
  },
};

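:::warning IMPORTANT
These headers affect ALL resources on the page. With COEP set to require-corp, every cross-origin resource (analytics, ads, CDN assets, etc.) must either be served with a Cross-Origin-Resource-Policy: cross-origin header or be loaded via CORS, that is, with crossorigin="anonymous" on the tag and CORS headers on the response. Plan for this from day one.
:::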


Thread Communication Patterns

Input: Main → Emulation Worker

// Input state is a simple struct in SharedArrayBuffer
// Main thread writes, emulation worker reads
// No locks needed: single writer, single reader, atomic values
// (each field is stored atomically; the fields are not updated as one unit)

const INPUT_OFFSET = 0x10104000; // Fixed address in shared memory

interface N64InputState {
  buttons: Uint16Array;   // 2 bytes: A,B,Z,Start,DUp,DDown,DLeft,DRight,L,R,CUp,CDown,CLeft,CRight
  analogX: Int8Array;     // 1 byte: -128 to +127
  analogY: Int8Array;     // 1 byte: -128 to +127
  timestamp: Uint32Array; // 4 bytes: monotonic counter (for change detection)
}

// Typed-array views over the shared buffer starting at INPUT_OFFSET, created at startup
declare const inputState: N64InputState;

// Main thread (1000Hz polling)
function pollInput() {
  const gamepads = navigator.getGamepads();
  const gp = gamepads[0];
  if (!gp) return;

  // Gamepad axes are in [-1, 1]; scale to the signed 8-bit stick range
  Atomics.store(inputState.analogX, 0, Math.round(gp.axes[0] * 127));
  Atomics.store(inputState.analogY, 0, Math.round(-gp.axes[1] * 127)); // Gamepad Y grows downward, so flip it
  // ... buttons ...
  Atomics.add(inputState.timestamp, 0, 1); // Signal new input (written last)
}
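The snippet above uses inputState without showing how its views are built or how the emulation worker consumes them. A minimal sketch of both follows, assuming the fields are packed in declaration order starting at the given offset. The names createInputViews, readInputIfChanged, and InputSnapshot are hypothetical and not part of the original design.

```typescript
// Hypothetical helper: field offsets follow the byte sizes listed in N64InputState.
interface InputViews {
  buttons: Uint16Array;   // offset + 0, 2 bytes
  analogX: Int8Array;     // offset + 2, 1 byte
  analogY: Int8Array;     // offset + 3, 1 byte
  timestamp: Uint32Array; // offset + 4, 4 bytes
}

function createInputViews(memory: SharedArrayBuffer, offset: number): InputViews {
  // A Uint32Array view must start on a 4-byte boundary.
  if (offset % 4 !== 0) throw new RangeError("offset must be 4-byte aligned");
  return {
    buttons: new Uint16Array(memory, offset, 1),
    analogX: new Int8Array(memory, offset + 2, 1),
    analogY: new Int8Array(memory, offset + 3, 1),
    timestamp: new Uint32Array(memory, offset + 4, 1),
  };
}

interface InputSnapshot {
  buttons: number;
  analogX: number;
  analogY: number;
  stamp: number;
}

// Emulation-worker side: returns a snapshot only when the counter has moved
// since lastStamp, otherwise null.
function readInputIfChanged(views: InputViews, lastStamp: number): InputSnapshot | null {
  const stamp = Atomics.load(views.timestamp, 0);
  if (stamp === lastStamp) return null;
  return {
    buttons: Atomics.load(views.buttons, 0),
    analogX: Atomics.load(views.analogX, 0),
    analogY: Atomics.load(views.analogY, 0),
    stamp,
  };
}
```

Because each field is loaded separately, a snapshot can mix values from two consecutive polls. For controller input that is usually harmless, since the next poll corrects it.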

Frame: Emulation Worker → GFX Worker

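// Double-buffered framebuffer in SharedArrayBuffer
const FB_A = 0x10000000; // 320 * 240 * 4 = 307,200 bytes (0x4B000)
const FB_B = 0x1004B000; // Second buffer, directly after the first

// frameReady is an Int32Array view over shared memory
// (Atomics.wait only accepts Int32Array or BigInt64Array)

// Emulation worker: signal frame complete
Atomics.store(frameReady, 0, currentBuffer);
Atomics.notify(frameReady, 0);

// GFX worker: wait for frame, then render
let lastFrame = Atomics.load(frameReady, 0); // Value before the first new frame
while (true) {
  Atomics.wait(frameReady, 0, lastFrame); // Blocks while the value still equals lastFrame
  const buffer = Atomics.load(frameReady, 0);
  uploadFramebufferToGPU(buffer);
  lastFrame = buffer;
}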
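Since the published value is a buffer index that alternates between two values, two frames finishing back to back can leave frameReady equal to lastFrame, and the waiter will not wake for them. One possible variation, offered as an assumption rather than as the project's actual protocol, is to publish a monotonically increasing frame counter and derive the buffer from its low bit. The names publishFrame, bufferIndexFor, and bufferOffsetFor are hypothetical.

```typescript
const FRAME_BYTES = 320 * 240 * 4; // 307,200 bytes = 0x4B000, the distance from FB_A to FB_B

// Emulation worker: call after the frame's pixels are written.
// Bumps the shared counter, wakes any waiter, and returns the new sequence number.
function publishFrame(frameSeq: Int32Array): number {
  const seq = Atomics.add(frameSeq, 0, 1) + 1; // Atomics.add returns the old value
  Atomics.notify(frameSeq, 0);
  return seq;
}

// Assumption: frame n is rendered into buffer (n & 1).
function bufferIndexFor(seq: number): number {
  return seq & 1;
}

// Byte offset of the buffer holding frame seq, given the address of the first buffer.
function bufferOffsetFor(seq: number, firstBufferOffset: number): number {
  return firstBufferOffset + bufferIndexFor(seq) * FRAME_BYTES;
}
```

The GFX worker would then call Atomics.wait(frameSeq, 0, lastSeq) and read the counter afterwards. If the counter jumped by more than one, it knows a frame was skipped and can render only the newest.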

Audio: Emulation Worker → Audio Worklet

Lock-free ring buffer (see Audio Pipeline for details).
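To make the idea concrete, here is a minimal sketch of a single-producer, single-consumer ring over a SharedArrayBuffer. It is an illustration under assumptions, not the layout used by the Audio Pipeline: the class name SampleRing, the header of two Int32 indices, and the Float32 sample format are all hypothetical.

```typescript
// Hypothetical SPSC ring. Layout: [readIndex: i32][writeIndex: i32][samples: f32 * capacity].
// One slot is always left empty so that "full" and "empty" are distinguishable.
class SampleRing {
  private readonly indices: Int32Array; // [0] = read index, [1] = write index
  private readonly samples: Float32Array;
  private readonly capacity: number;

  constructor(memory: SharedArrayBuffer, capacity: number) {
    this.capacity = capacity;
    this.indices = new Int32Array(memory, 0, 2);
    this.samples = new Float32Array(memory, 8, capacity);
  }

  static bytesFor(capacity: number): number {
    return 8 + capacity * 4;
  }

  // Producer only (emulation worker). Returns how many samples were accepted.
  push(input: Float32Array): number {
    const read = Atomics.load(this.indices, 0);
    const write = Atomics.load(this.indices, 1);
    const used = (write - read + this.capacity) % this.capacity;
    const n = Math.min(this.capacity - 1 - used, input.length);
    for (let i = 0; i < n; i++) {
      this.samples[(write + i) % this.capacity] = input[i];
    }
    // Publish the new write index only after the samples are in place.
    Atomics.store(this.indices, 1, (write + n) % this.capacity);
    return n;
  }

  // Consumer only (audio side). Returns how many samples were copied out.
  pop(output: Float32Array): number {
    const read = Atomics.load(this.indices, 0);
    const write = Atomics.load(this.indices, 1);
    const available = (write - read + this.capacity) % this.capacity;
    const n = Math.min(available, output.length);
    for (let i = 0; i < n; i++) {
      output[i] = this.samples[(read + i) % this.capacity];
    }
    // Release the consumed slots back to the producer.
    Atomics.store(this.indices, 0, (read + n) % this.capacity);
    return n;
  }
}
```

Each index has exactly one writer, so neither side ever needs a lock or a compare-and-swap. Neither method blocks, which matters on the audio side, where waiting would cause dropouts.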


Emscripten pthread Implementation

Emscripten maps POSIX threads to Web Workers:

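// C code in the emulator core
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>

void step_cpu(void);         // Implemented elsewhere in the core
void execute_rsp_task(void); // Implemented elsewhere in the core

static sem_t rsp_semaphore;  // CPU -> RSP: a task is ready
static sem_t rsp_done;       // RSP -> CPU: the task has finished
atomic_bool running = true;
atomic_bool rsp_task_pending = false; // Set by the CPU core when it queues a task

void* rsp_thread_func(void* arg) {
    (void)arg;
    while (running) {
        // Wait for CPU to signal RSP task
        sem_wait(&rsp_semaphore);
        if (!running) break; // Woken for shutdown, not for a task
        // Execute RSP operations (with SIMD!)
        execute_rsp_task();
        // Signal completion
        sem_post(&rsp_done);
    }
    return NULL;
}

int main() {
    sem_init(&rsp_semaphore, 0, 0);
    sem_init(&rsp_done, 0, 0);

    pthread_t rsp_thread;
    pthread_create(&rsp_thread, NULL, rsp_thread_func, NULL);

    // Main emulation loop (runs on emulation worker)
    while (running) {
        step_cpu();
        // Clear the flag as it is consumed so each task wakes the RSP thread once
        if (atomic_exchange(&rsp_task_pending, false)) {
            sem_post(&rsp_semaphore); // Wake RSP thread
        }
    }

    // Unblock the RSP thread so it can see running == false and exit
    sem_post(&rsp_semaphore);
    pthread_join(rsp_thread, NULL);
    return 0;
}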

When built with -pthread at both compile and link time, Emscripten runs each pthread in its own Web Worker automatically.


Thread Pool Pre-creation

Worker creation is expensive (~50ms). We pre-create the pool at startup:

# Build flag
-sPTHREAD_POOL_SIZE=4

This creates 4 Web Workers immediately when the WASM module loads. Thread creation (via pthread_create) then reuses pool workers instead of spawning new ones.


PROXY_TO_PTHREAD

# Build flag
-sPROXY_TO_PTHREAD

This moves the entire main() function to a Web Worker. The browser's main thread becomes a thin proxy with only two jobs:

  • Handling DOM events (input)
  • Forwarding them to the emulation worker

Result: the main thread is essentially idle during gameplay — maximum responsiveness.


Synchronization Strategy


Performance Impact

| Optimization | Estimated Improvement | Mechanism |
| --- | --- | --- |
| Move emulation off main thread | Input latency: -5 ms | Main thread free for polling |
| OffscreenCanvas in worker | Frame delivery: 4x smoother | No main thread contention |
| AudioWorklet | Audio latency: -20 ms | Dedicated thread, fixed timing |
| RSP on separate thread | Throughput: +30-50% | Parallel execution |
| PROXY_TO_PTHREAD | Overall: +10-15% | No main thread overhead |

Fallback for No-SharedArrayBuffer

If headers can't be set (some hosting environments), we gracefully degrade:

  1. Single-threaded emulation (like current emulators)
  2. postMessage for frame transfer (slower but works)
  3. ScriptProcessorNode for audio (deprecated and runs on the main thread, but functional)

Detection:

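const hasThreads = typeof SharedArrayBuffer !== 'undefined';
// True only when both the COOP and COEP headers are in effect; globalThis also works inside workers
const hasCOOP = globalThis.crossOriginIsolated === true;
const useThreads = hasThreads && hasCOOP;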
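The two detection flags can be folded into a single decision that selects every fallback from the list at once. The sketch below is one way to do that, assuming the three degradations are always chosen together. The names RuntimeFeatures, RuntimeConfig, and selectRuntimeConfig are hypothetical.

```typescript
interface RuntimeFeatures {
  hasSharedArrayBuffer: boolean; // typeof SharedArrayBuffer !== 'undefined'
  crossOriginIsolated: boolean;  // crossOriginIsolated === true on the global scope
}

interface RuntimeConfig {
  emulation: "workers" | "single-thread";
  frameTransport: "shared-memory" | "postMessage";
  audio: "AudioWorklet" | "ScriptProcessorNode";
}

// Pure function, so it can be unit-tested without a browser.
function selectRuntimeConfig(features: RuntimeFeatures): RuntimeConfig {
  const threaded = features.hasSharedArrayBuffer && features.crossOriginIsolated;
  if (threaded) {
    return { emulation: "workers", frameTransport: "shared-memory", audio: "AudioWorklet" };
  }
  // Degraded path: the three fallbacks listed above, selected together.
  return { emulation: "single-thread", frameTransport: "postMessage", audio: "ScriptProcessorNode" };
}
```

At startup the page would pass in the two detected values and use the returned config to decide which build of the core to load. Keeping the decision in one place avoids a mixed state where, for example, shared-memory frame transfer is attempted without cross-origin isolation.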