Threading Model
Web Workers + SharedArrayBuffer + Atomics — the architecture that makes everything else possible.
Why Multi-Threading Changes Everything
The N64 has 3 processors running simultaneously:
- VR4300 CPU — 93.75 MHz MIPS III processor
- RSP — 62.5 MHz MIPS vector coprocessor
- RDP — Fixed-function rasterizer
Current browser emulators serialize all of this onto a single thread. Our architecture spreads the work across Web Workers instead.
Required HTTP Headers
SharedArrayBuffer requires Cross-Origin Isolation:
// next.config.js
module.exports = {
  async headers() {
    return [{
      source: '/(.*)',
      headers: [
        { key: 'Cross-Origin-Opener-Policy', value: 'same-origin' },
        { key: 'Cross-Origin-Embedder-Policy', value: 'require-corp' },
      ],
    }];
  },
};
:::warning IMPORTANT
These headers affect ALL resources on the page. Third-party scripts (analytics, ads, etc.) must either be served with a Cross-Origin-Resource-Policy (CORP) header or be loaded with crossorigin="anonymous" from a server that responds with CORS headers. Plan for this from day one.
:::
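One way to catch a missing header early is a small check against the response headers of a deployed page. The sketch below is illustrative only: `missingIsolationHeaders` is a hypothetical helper (not part of Next.js or any library), and it accepts only the two values used in the config above.

```typescript
// Hypothetical helper: list the cross-origin isolation headers that are
// missing from a response or carry an unexpected value.
const REQUIRED_HEADERS: Record<string, string> = {
  'cross-origin-opener-policy': 'same-origin',
  'cross-origin-embedder-policy': 'require-corp',
};

function missingIsolationHeaders(headers: Record<string, string>): string[] {
  // HTTP header names are case-insensitive, so normalise before comparing.
  const normalised = new Map<string, string>();
  for (const [name, value] of Object.entries(headers)) {
    normalised.set(name.toLowerCase(), value.trim().toLowerCase());
  }
  return Object.keys(REQUIRED_HEADERS).filter(
    (name) => normalised.get(name) !== REQUIRED_HEADERS[name],
  );
}
```

An empty result means both headers are present with the expected values; anything else names the header to fix.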
Thread Communication Patterns
Input: Main → Emulation Worker
// Input state is a simple struct in SharedArrayBuffer
// Main thread writes, emulation worker reads
// No locks needed — single writer, single reader, atomic values
const INPUT_OFFSET = 0x10104000; // Fixed address in shared memory
interface N64InputState {
  buttons: Uint16Array;   // 2 bytes: A,B,Z,Start,DUp,DDown,DLeft,DRight,L,R,CUp,CDown,CLeft,CRight
  analogX: Int8Array;     // 1 byte: -128 to +127
  analogY: Int8Array;     // 1 byte: -128 to +127
  timestamp: Uint32Array; // 4 bytes: monotonic counter (for change detection)
}
// Main thread (1000Hz polling)
function pollInput() {
  const gamepads = navigator.getGamepads();
  const gp = gamepads[0];
  if (!gp) return;
  Atomics.store(inputState.analogX, 0, Math.round(gp.axes[0] * 127));
  Atomics.store(inputState.analogY, 0, Math.round(-gp.axes[1] * 127));
  // ... buttons ...
  Atomics.add(inputState.timestamp, 0, 1); // Signal new input
}
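The reader side is not shown above, so here is a minimal sketch of how the emulation worker might map the struct and detect changes. `createInputViews`, `readInputIfChanged`, and the byte layout (buttons at +0, analogX at +2, analogY at +3, timestamp at +4) are assumptions derived from the field sizes in the interface, not the project's actual code. In real use `base` would be `INPUT_OFFSET`.

```typescript
interface InputViews {
  buttons: Uint16Array;
  analogX: Int8Array;
  analogY: Int8Array;
  timestamp: Uint32Array;
}

// Assumed layout: 8 bytes starting at `base`, which must be 4-byte aligned.
function createInputViews(memory: SharedArrayBuffer, base: number): InputViews {
  return {
    buttons: new Uint16Array(memory, base, 1),       // bytes 0-1
    analogX: new Int8Array(memory, base + 2, 1),     // byte 2
    analogY: new Int8Array(memory, base + 3, 1),     // byte 3
    timestamp: new Uint32Array(memory, base + 4, 1), // bytes 4-7
  };
}

interface InputSnapshot {
  buttons: number;
  x: number;
  y: number;
  stamp: number;
}

// Returns a snapshot when the counter has moved since `lastStamp`, else null.
function readInputIfChanged(views: InputViews, lastStamp: number): InputSnapshot | null {
  const stamp = Atomics.load(views.timestamp, 0);
  if (stamp === lastStamp) return null;
  return {
    buttons: Atomics.load(views.buttons, 0),
    x: Atomics.load(views.analogX, 0),
    y: Atomics.load(views.analogY, 0),
    stamp,
  };
}
```

Each load is atomic on its own, but the group is not: a write that lands between two loads can produce a snapshot mixing old and new fields. If that matters, the counter could be re-read after the loads and the read retried when it has changed.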
Frame: Emulation Worker → GFX Worker
// Double-buffered framebuffer in SharedArrayBuffer
const FB_A = 0x10000000; // 320 * 240 * 4 = 307,200 bytes
const FB_B = 0x1004B000; // Second buffer
// Emulation worker: signal frame complete
Atomics.store(frameReady, 0, currentBuffer);
Atomics.notify(frameReady, 0);
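// GFX worker: wait for frame, then render
// frameReady must be an Int32Array view: Atomics.wait only accepts Int32Array or BigInt64Array
let lastFrame = -1; // Assumes frameReady is initialised to -1 (no frame published yet)
while (true) {
  Atomics.wait(frameReady, 0, lastFrame); // Sleeps while the value still equals lastFrame
  const buffer = Atomics.load(frameReady, 0);
  uploadFramebufferToGPU(buffer);
  lastFrame = buffer;
}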
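The buffer addresses follow directly from the frame size, and the handoff can also be keyed on a frame counter rather than a buffer index. The sketch below is a variation under stated assumptions, not the project's implementation: `publishFrame` and `latestFrameAddress` are hypothetical names, and the rule "odd frames live in FB_A, even frames in FB_B" is chosen for illustration. A counter has the advantage that two frames completing before the consumer wakes still register as a change, whereas a two-valued buffer index would return to its previous value.

```typescript
const FB_WIDTH = 320;
const FB_HEIGHT = 240;
const FB_BYTES = FB_WIDTH * FB_HEIGHT * 4; // 307,200 bytes = 0x4B000
const FB_A = 0x10000000;
const FB_B = FB_A + FB_BYTES;              // 0x1004B000

// Producer (emulation worker): call after the frame has been fully written.
// Returns the new sequence number.
function publishFrame(frameSeq: Int32Array): number {
  const seq = Atomics.add(frameSeq, 0, 1) + 1; // Atomics.add returns the old value
  Atomics.notify(frameSeq, 0);                 // Wake a consumer blocked in Atomics.wait
  return seq;
}

// Consumer (GFX worker): non-blocking check for a frame newer than `lastSeq`.
// A worker would typically call Atomics.wait(frameSeq, 0, lastSeq) first.
function latestFrameAddress(
  frameSeq: Int32Array,
  lastSeq: number,
): { seq: number; address: number } | null {
  const seq = Atomics.load(frameSeq, 0);
  if (seq === lastSeq) return null;
  // Assumed convention: frame 1, 3, 5, ... in FB_A; frame 2, 4, 6, ... in FB_B.
  return { seq, address: (seq & 1) === 1 ? FB_A : FB_B };
}
```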
Audio: Emulation Worker → Audio Worklet
Lock-free ring buffer (see Audio Pipeline for details).
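For orientation, a single-producer, single-consumer ring buffer over shared memory can be sketched as follows. This is a generic illustration, not the design from the Audio Pipeline page: the `SampleRing` class, its layout (two Uint32 indices followed by Float32 samples), and the power-of-two capacity requirement are all assumptions.

```typescript
class SampleRing {
  private readonly indices: Uint32Array;  // [0] = read index, [1] = write index
  private readonly samples: Float32Array;
  private readonly mask: number;

  constructor(memory: SharedArrayBuffer, capacity: number) {
    if (capacity <= 0 || (capacity & (capacity - 1)) !== 0) {
      throw new Error('capacity must be a power of two');
    }
    this.indices = new Uint32Array(memory, 0, 2);
    this.samples = new Float32Array(memory, 8, capacity);
    this.mask = capacity - 1;
  }

  // Producer side (emulation worker). Returns how many samples were written.
  push(input: Float32Array): number {
    const read = Atomics.load(this.indices, 0);
    const write = Atomics.load(this.indices, 1);
    const free = this.samples.length - ((write - read) >>> 0);
    const count = Math.min(free, input.length);
    for (let i = 0; i < count; i++) {
      this.samples[(write + i) & this.mask] = input[i];
    }
    // Publish the new write index only after the samples are in place.
    Atomics.store(this.indices, 1, (write + count) >>> 0);
    return count;
  }

  // Consumer side (audio worklet). Returns how many samples were read.
  pop(output: Float32Array): number {
    const read = Atomics.load(this.indices, 0);
    const write = Atomics.load(this.indices, 1);
    const available = (write - read) >>> 0;
    const count = Math.min(available, output.length);
    for (let i = 0; i < count; i++) {
      output[i] = this.samples[(read + i) & this.mask];
    }
    // Release the consumed slots back to the producer.
    Atomics.store(this.indices, 0, (read + count) >>> 0);
    return count;
  }
}
```

No lock is needed because each index has exactly one writer: only the producer advances the write index and only the consumer advances the read index.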
Emscripten pthread Implementation
Emscripten maps POSIX threads to Web Workers:
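// C code in the emulator core
// running, rsp_task_pending, step_cpu() and execute_rsp_task() are defined elsewhere in the core
#include <pthread.h>
#include <semaphore.h>

static sem_t rsp_semaphore; // CPU -> RSP: a task is ready
static sem_t rsp_done;      // RSP -> CPU: the task has finished

void* rsp_thread_func(void* arg) {
    while (running) {
        // Wait for CPU to signal RSP task
        sem_wait(&rsp_semaphore);
        // Execute RSP operations (with SIMD!)
        execute_rsp_task();
        // Signal completion
        sem_post(&rsp_done);
    }
    return NULL;
}
int main() {
    // Both semaphores start at 0 so the first sem_wait blocks
    sem_init(&rsp_semaphore, 0, 0);
    sem_init(&rsp_done, 0, 0);
    pthread_t rsp_thread;
    pthread_create(&rsp_thread, NULL, rsp_thread_func, NULL);
    // Main emulation loop (runs on emulation worker)
    while (running) {
        step_cpu();
        if (rsp_task_pending) {
            sem_post(&rsp_semaphore); // Wake RSP thread
        }
    }
}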
Emscripten compiles this to Web Workers automatically when -pthread is passed at both compile and link time.
Thread Pool Pre-creation
Worker creation is expensive (~50ms). We pre-create the pool at startup:
# Build flag
-sPTHREAD_POOL_SIZE=4
This creates 4 Web Workers immediately when the WASM module loads. Thread creation (via pthread_create) then reuses pool workers instead of spawning new ones.
PROXY_TO_PTHREAD
# Build flag
-sPROXY_TO_PTHREAD
This moves the entire main() function to a Web Worker. The browser's main thread becomes a thin proxy that only:
- Handles DOM events (input)
- Forwards them to the emulation worker
- Nothing else
Result: the main thread is essentially idle during gameplay — maximum responsiveness.
Synchronization Strategy
Performance Impact
| Optimization | Estimated Improvement | Mechanism |
|---|---|---|
| Move emulation off main thread | Input latency: -5ms | Main thread free for polling |
| OffscreenCanvas in worker | Frame delivery: 4x smoother | No main thread contention |
| AudioWorklet | Audio latency: -20ms | Dedicated thread, fixed timing |
| RSP on separate thread | Throughput: +30-50% | Parallel execution |
| PROXY_TO_PTHREAD | Overall: +10-15% | No main thread overhead |
Fallback for No-SharedArrayBuffer
If headers can't be set (some hosting environments), we gracefully degrade:
- Single-threaded emulation (like current emulators)
- postMessage for frame transfer (slower but works)
- ScriptProcessorNode for audio (worse but functional)
Detection:
const hasThreads = typeof SharedArrayBuffer !== 'undefined';
const hasCOOP = window.crossOriginIsolated === true;
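The two checks can be combined into a single decision. A minimal sketch, assuming a hypothetical `chooseThreadingMode` helper and mode names; `Reflect.get(globalThis, ...)` is used so the same code runs on the main thread and inside workers, where `window` does not exist.

```typescript
type ThreadingMode = 'threaded' | 'single-threaded';

// Hypothetical helper: the threaded path needs both the SharedArrayBuffer
// constructor and a cross-origin isolated context.
function chooseThreadingMode(hasThreads: boolean, isolated: boolean): ThreadingMode {
  return hasThreads && isolated ? 'threaded' : 'single-threaded';
}

const mode: ThreadingMode = chooseThreadingMode(
  typeof SharedArrayBuffer !== 'undefined',
  Reflect.get(globalThis, 'crossOriginIsolated') === true,
);
```

Checking both flags avoids the case where the constructor exists but the page is not isolated, in which case the fallback path is the safer choice.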