Inside NSL's Bytecode VM: 158 Opcodes, Threaded Dispatch, and JIT

NSL Team · Feb 20, 2026 · 3 min read · Deep Dive

Two Execution Modes

NSL programs can run via the tree-walking interpreter (default) or the bytecode VM:

nsl program.nsl            # interpreter
nsl --compile program.nsl  # compile to .nslc
nsl --vm program.nslc      # execute bytecode

The VM supports everything the interpreter does -- cognitive primitives, knowledge graphs, constraints, and all 40 stdlib modules.

Register-Based Architecture

256 registers (R0-R255):
  R0-R15:     General purpose (16)
  R16-R31:    Function arguments (16)
  R32-R175:   Local variables (144)
  R176-R191:  Temporaries (16)
  R192-R255:  System/reserved (64)

Calls use a frame-based call stack: each frame saves the caller's PC, base register, and return register.
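The register windows and call frames described above can be sketched in C roughly as follows. The boundaries come straight from the layout; everything else (the names `RegClass`, `Frame`, `CallStack`, the field types, and the `MAX_FRAMES` limit) is a hypothetical illustration, not NSL's actual source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register window boundaries, taken from the layout above. */
enum {
    REG_GENERAL_BASE = 0,    /* R0-R15:    general purpose   */
    REG_ARG_BASE     = 16,   /* R16-R31:   function arguments */
    REG_LOCAL_BASE   = 32,   /* R32-R175:  local variables    */
    REG_TEMP_BASE    = 176,  /* R176-R191: temporaries        */
    REG_SYSTEM_BASE  = 192,  /* R192-R255: system/reserved    */
    REG_COUNT        = 256
};

typedef enum { RC_GENERAL, RC_ARG, RC_LOCAL, RC_TEMP, RC_SYSTEM } RegClass;

/* Classify a register index; uint8_t already restricts it to 0..255. */
static RegClass reg_class(uint8_t r) {
    if (r < REG_ARG_BASE)    return RC_GENERAL;
    if (r < REG_LOCAL_BASE)  return RC_ARG;
    if (r < REG_TEMP_BASE)   return RC_LOCAL;
    if (r < REG_SYSTEM_BASE) return RC_TEMP;
    return RC_SYSTEM;
}

/* Hypothetical call frame holding the three saved values named above. */
typedef struct {
    size_t  saved_pc;    /* where the caller resumes             */
    size_t  base_reg;    /* caller's base register               */
    uint8_t return_reg;  /* register that receives the result    */
} Frame;

#define MAX_FRAMES 64    /* assumed depth limit, not from the post */

typedef struct {
    Frame  frames[MAX_FRAMES];
    size_t depth;
} CallStack;

/* Push a frame on CALL; returns 0 on stack overflow. */
static int push_frame(CallStack *cs, size_t pc, size_t base, uint8_t ret) {
    if (cs->depth == MAX_FRAMES) return 0;
    cs->frames[cs->depth++] = (Frame){ pc, base, ret };
    return 1;
}

/* Pop a frame on RET; the caller must ensure the stack is not empty. */
static Frame pop_frame(CallStack *cs) {
    assert(cs->depth > 0);
    return cs->frames[--cs->depth];
}
```

A CALL handler would push the current PC and base before jumping, and RET would pop the frame and write the result into `return_reg`.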

158 Opcodes in 14 Categories

| Range | Category | Examples |
|-------|----------|----------|
| 0x00-0x0F | Memory | LOADC, LOADN, LOADB, MOVE, PUSH, POP |
| 0x10-0x1F | Variables | GLOAD, GSTORE, LLOAD, LSTORE, UPVAL |
| 0x20-0x2F | Arithmetic | ADD, SUB, MUL, DIV, MOD, NEG, POW, INC, DEC |
| 0x30-0x3F | Bitwise | AND, OR, XOR, NOT, SHL, SHR |
| 0x40-0x4F | Immediate | ADDI, SUBI, MULI, DIVI |
| 0x50-0x5F | Comparison | CMP, EQ, NE, LT, LE, GT, GE, TEST |
| 0x60-0x6F | Control | JMP, JZ, JNZ, CALL, RET |
| 0x70-0x7F | Collections | MKLIST, MKDICT, INDEX, SPREAD, SLICE |
| 0x80-0x8F | Strings | CONCAT, INTERP, FMTSTR |
| 0x90-0x9F | Functions | CLOSURE, INVOKE, CALLNAT |
| 0xA0-0xAF | Cognitive | BLFSET, BLFGET, OBSADD, REVISE, CRYST, CAPSET |
| 0xB0-0xBF | I/O | PRINT, INPUT, FREAD, FWRITE |
| 0xC0-0xCF | Constraints | CONSTR, REQUIRE, INVARIANT, SATISFIES |
| 0xD0-0xDF | Extended | ITER, YIELD, AWAIT, SPAWN, MONITOR |
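Because each category occupies one aligned 16-opcode range, the high nibble of an opcode byte identifies its category. The C sketch below illustrates that property of the table. The `OpCategory` enum and the function names are hypothetical. It only reports which range an opcode falls in, not whether that slot is actually assigned, since 158 of the 224 slots are in use.

```c
#include <assert.h>
#include <stdint.h>

/* One entry per row of the table, in range order (0x0_ .. 0xD_). */
typedef enum {
    CAT_MEMORY, CAT_VARIABLES, CAT_ARITHMETIC, CAT_BITWISE,
    CAT_IMMEDIATE, CAT_COMPARISON, CAT_CONTROL, CAT_COLLECTIONS,
    CAT_STRINGS, CAT_FUNCTIONS, CAT_COGNITIVE, CAT_IO,
    CAT_CONSTRAINTS, CAT_EXTENDED,
    CAT_INVALID            /* 0xE0-0xFF: outside the documented ranges */
} OpCategory;

static const char *const CATEGORY_NAMES[] = {
    "Memory", "Variables", "Arithmetic", "Bitwise",
    "Immediate", "Comparison", "Control", "Collections",
    "Strings", "Functions", "Cognitive", "I/O",
    "Constraints", "Extended"
};

/* The high nibble selects the 16-opcode range, and therefore the category. */
static OpCategory op_category(uint8_t opcode) {
    uint8_t hi = (uint8_t)(opcode >> 4);
    return hi <= 0xD ? (OpCategory)hi : CAT_INVALID;
}

/* Human-readable category name, e.g. for a disassembler listing. */
static const char *op_category_name(uint8_t opcode) {
    OpCategory c = op_category(opcode);
    assert(c <= CAT_INVALID);
    return c == CAT_INVALID ? "invalid" : CATEGORY_NAMES[c];
}
```

For example, `op_category(0x23)` lands in the Arithmetic range and `op_category(0xA0)` in the Cognitive range.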

Three Optimization Tiers

Tier 1: Threaded Dispatch (default)

Replaces the central switch(opcode) with per-handler dispatch:

GCC/Clang: Computed goto (&&label + goto *table[opcode])

MSVC: Switched-goto pattern (replicates switch at each handler's tail)

Both variants are generated from a single X-macro file (opcodes.def), so the Op enum and the dispatch tables come from one source and cannot drift apart. Expected speedup: 1.5-2x for dispatch-bound workloads.
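To make the X-macro and computed-goto combination concrete, here is a minimal C sketch. It is not NSL's dispatch loop: the three-opcode list stands in for opcodes.def, `HALT` is a made-up opcode, and the one-byte immediate encoding and single accumulator are assumptions chosen to keep the example short (the real VM is register-based). On compilers without the GCC/Clang labels-as-values extension it falls back to a plain switch rather than the switched-goto pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for opcodes.def: one X(...) entry per opcode. */
#define DEMO_OPCODES(X) \
    X(LOADC)            \
    X(ADD)              \
    X(HALT)

/* Expansion 1: the Op enum. */
typedef enum {
#define X(name) OP_##name,
    DEMO_OPCODES(X)
#undef X
    OP_COUNT
} Op;

/* Runs a tiny program of the form: LOADC imm, ADD imm, ..., HALT.
 * The bytecode is trusted; this sketch does no bounds or opcode checks. */
static int64_t demo_run(const uint8_t *code) {
    assert(code != NULL);
    int64_t acc = 0;
    const uint8_t *pc = code;

#if defined(__GNUC__) || defined(__clang__)
    /* Expansion 2: a table of handler label addresses, in enum order. */
    static void *const table[OP_COUNT] = {
#define X(name) &&do_##name,
        DEMO_OPCODES(X)
#undef X
    };
    /* Each handler ends with its own indirect jump: no central switch. */
#define DISPATCH() goto *table[*pc++]
    DISPATCH();
do_LOADC: acc = *pc++;  DISPATCH();
do_ADD:   acc += *pc++; DISPATCH();
do_HALT:  return acc;
#undef DISPATCH
#else
    /* Portable fallback: ordinary switch-based dispatch. */
    for (;;) {
        switch ((Op)*pc++) {
        case OP_LOADC: acc = *pc++;  break;
        case OP_ADD:   acc += *pc++; break;
        case OP_HALT:  return acc;
        default:       return -1;
        }
    }
#endif
}
```

Running the program `{ OP_LOADC, 40, OP_ADD, 2, OP_HALT }` returns 42. Adding an opcode means adding one `X(...)` line plus its handler; the enum and the table stay in sync automatically.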

Tier 2: NaN Boxing (opt-in)

Encodes values in 64 bits using IEEE 754 NaN space:

Regular double:  raw IEEE 754
Nil:             QNAN | tag 00
Bool:            QNAN | tag 01 | (0 or 1)
SmallInt:        QNAN | tag 10 | 48-bit signed (about +/- 140.7 trillion, i.e. +/- 2^47)
Heap pointer:    QNAN | SIGN_BIT | 48-bit address

Hot path (arithmetic, comparison, jumps) operates entirely on NaN-boxed values -- zero heap allocation. Cold path (cognitive ops, I/O) converts via bridge functions.

Expected speedup: 5-10x for numeric-heavy workloads.
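The encoding above can be sketched in C as follows. The exact bit positions are assumptions: this sketch sets bits 62-50 for `NB_QNAN` (so a hardware-generated NaN, which leaves bit 50 clear, still reads as a plain double), puts the 2-bit tag in bits 49-48, and uses bits 47-0 for the payload. The `nb_*` names and the promote-to-double behaviour on SmallInt overflow are likewise illustrative, not confirmed details of NSL.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t NanBox;

#define NB_SIGN_BIT UINT64_C(0x8000000000000000)
#define NB_QNAN     UINT64_C(0x7FFC000000000000)  /* assumed bit pattern */
#define NB_TAG_NIL  (UINT64_C(0) << 48)           /* tag 00 */
#define NB_TAG_BOOL (UINT64_C(1) << 48)           /* tag 01 */
#define NB_TAG_INT  (UINT64_C(2) << 48)           /* tag 10 */
#define NB_TAG_MASK (UINT64_C(3) << 48)
#define NB_PAYLOAD  UINT64_C(0x0000FFFFFFFFFFFF)  /* low 48 bits */
#define NB_INT_SIGN UINT64_C(0x0000800000000000)  /* bit 47 */
#define NB_INT_MAX  ((INT64_C(1) << 47) - 1)
#define NB_INT_MIN  (-(INT64_C(1) << 47))

/* Doubles are stored raw; NaNs are canonicalized so they never look tagged. */
static NanBox nb_from_double(double d) {
    NanBox v;
    if (d != d) return UINT64_C(0x7FF8000000000000);
    memcpy(&v, &d, sizeof v);
    return v;
}
static double nb_to_double(NanBox v) {
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}

static int nb_is_double(NanBox v) { return (v & NB_QNAN) != NB_QNAN; }
static int nb_is_ptr(NanBox v) {
    return (v & (NB_QNAN | NB_SIGN_BIT)) == (NB_QNAN | NB_SIGN_BIT);
}
static int nb_has_tag(NanBox v, uint64_t tag) {
    return !nb_is_double(v) && !nb_is_ptr(v) && (v & NB_TAG_MASK) == tag;
}
static int nb_is_nil(NanBox v)  { return nb_has_tag(v, NB_TAG_NIL); }
static int nb_is_bool(NanBox v) { return nb_has_tag(v, NB_TAG_BOOL); }
static int nb_is_int(NanBox v)  { return nb_has_tag(v, NB_TAG_INT); }

static NanBox nb_nil(void)   { return NB_QNAN | NB_TAG_NIL; }
static NanBox nb_bool(int b) { return NB_QNAN | NB_TAG_BOOL | (b ? 1u : 0u); }
static int nb_as_bool(NanBox v) { return (int)(v & 1u); }

static NanBox nb_int(int64_t i) {
    assert(i >= NB_INT_MIN && i <= NB_INT_MAX);
    return NB_QNAN | NB_TAG_INT | ((uint64_t)i & NB_PAYLOAD);
}
/* Sign-extend the 48-bit payload back to 64 bits. */
static int64_t nb_as_int(NanBox v) {
    uint64_t p = v & NB_PAYLOAD;
    return (int64_t)((p ^ NB_INT_SIGN) - NB_INT_SIGN);
}

/* Assumes heap addresses fit in 48 bits, as on current x64 user space. */
static NanBox nb_from_ptr(const void *p) {
    return NB_SIGN_BIT | NB_QNAN | ((uint64_t)(uintptr_t)p & NB_PAYLOAD);
}
static void *nb_as_ptr(NanBox v) { return (void *)(uintptr_t)(v & NB_PAYLOAD); }

/* Hot-path add: no allocation, only bit tests and register arithmetic. */
static double nb_number(NanBox v) {
    assert(nb_is_double(v) || nb_is_int(v));
    return nb_is_int(v) ? (double)nb_as_int(v) : nb_to_double(v);
}
static NanBox nb_add(NanBox a, NanBox b) {
    if (nb_is_int(a) && nb_is_int(b)) {
        int64_t s = nb_as_int(a) + nb_as_int(b);  /* two 48-bit values fit in int64 */
        if (s >= NB_INT_MIN && s <= NB_INT_MAX) return nb_int(s);
        return nb_from_double((double)s);         /* assumed overflow policy */
    }
    return nb_from_double(nb_number(a) + nb_number(b));
}
```

With this layout, `nb_add(nb_int(40), nb_int(2))` yields a SmallInt holding 42 without touching the heap, which is the property the hot path relies on.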

Tier 3: JIT Compilation (opt-in)

Method-based JIT detects hot functions and compiles to native x64:

Bytecode -> Interpreter (cold) -[100 calls]-> JIT Compiler -> Native x64 (hot)

Profiles per-function call counts; compiles at threshold (default 100)
JIT-able: memory, arithmetic, comparison, control flow (~50 opcodes)
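Non-JIT-able: all other opcodes fall back to the interpreter via helper call-outs
W^X compliant: code pages are PAGE_READWRITE during codegen and PAGE_EXECUTE_READ before execution, so they are never writable and executable at the same time
Platforms: Windows x64 (VirtualAlloc/VirtualProtect) and Linux x64 (mmap/mprotect)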
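The call-count tier-up described in this list can be sketched in C as below. Only the threshold of 100 comes from the post. `FuncProfile`, `call_function`, and the stub `demo_compile` (which returns a prebuilt C function instead of emitting x64 code) are hypothetical stand-ins, and whether NSL compiles on the 100th call or just after it is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JIT_THRESHOLD 100  /* default threshold from the post */

typedef int64_t (*NativeFn)(int64_t);

/* Per-function profile: a call counter plus the native entry point. */
typedef struct {
    uint32_t call_count;
    NativeFn native;       /* NULL while the function is still cold */
} FuncProfile;

typedef NativeFn (*CompileFn)(const FuncProfile *);

typedef enum { RAN_INTERPRETED, RAN_NATIVE } RunMode;

/* Decide how one call executes. Cold calls are counted; once the counter
 * reaches the threshold the function is compiled, and later calls go native.
 * If compilation fails (NULL), the function simply stays interpreted. */
static RunMode call_function(FuncProfile *f, CompileFn compile) {
    assert(f != NULL && compile != NULL);
    if (f->native != NULL) return RAN_NATIVE;
    f->call_count++;
    if (f->call_count >= JIT_THRESHOLD) {
        f->native = compile(f);
    }
    return RAN_INTERPRETED;
}

/* Stand-in "compiler": a real JIT would write machine code into a
 * read-write page and flip it to execute-read before returning it. */
static int64_t demo_native(int64_t x) { return x + 1; }
static NativeFn demo_compile(const FuncProfile *f) {
    (void)f;
    return demo_native;
}
```

In this sketch the first 100 calls run interpreted, the 100th triggers compilation, and call 101 onward takes the native path.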

Compilation Pipeline

Source (.nsl) -> Lexer -> Parser -> AST -> Compiler -> Bytecode (.nslc) -> VM

The compiler performs symbol resolution (lexical scoping with frame offsets), code generation to register-based opcodes, and optimization passes (constant folding, dead code elimination).
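As an illustration of the constant-folding pass mentioned above, here is a minimal C sketch over a toy AST. The node shape, the two operators, and the wrap-around overflow behaviour are assumptions made for the example; NSL's real AST and folding rules are not described in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { N_CONST, N_VAR, N_ADD, N_MUL } NodeKind;

typedef struct Node {
    NodeKind     kind;
    int64_t      value;   /* meaningful only for N_CONST */
    struct Node *lhs;     /* operands for N_ADD / N_MUL  */
    struct Node *rhs;
} Node;

/* Fold constant subtrees in place, bottom-up.
 * Returns 1 if the node is (now) a constant, 0 otherwise. */
static int fold_constants(Node *n) {
    assert(n != NULL);
    if (n->kind == N_CONST) return 1;
    if (n->kind == N_VAR)   return 0;

    /* Visit both children unconditionally so each side gets folded
     * even when the other side is not constant. */
    int l = fold_constants(n->lhs);
    int r = fold_constants(n->rhs);
    if (!(l && r)) return 0;

    /* Unsigned arithmetic wraps instead of invoking undefined behaviour. */
    uint64_t a = (uint64_t)n->lhs->value;
    uint64_t b = (uint64_t)n->rhs->value;
    n->value = (int64_t)(n->kind == N_ADD ? a + b : a * b);
    n->kind  = N_CONST;
    n->lhs   = NULL;
    n->rhs   = NULL;
    return 1;
}
```

For an expression like `(2 * 3) + x`, the multiplication collapses to the constant 6 while the addition stays in place because `x` is only known at run time, so the compiler can emit an immediate-operand add instead of a multiply followed by an add.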