We were surprised to discover that adding a new instruction to the Icon/Unicon virtual machine is under-documented in the Icon and Unicon implementation literature. It was likely spelled out at one point in Ralph Griswold's graduate course on the Icon implementation, and lecture notes from that course might still exist somewhere. While we look for them, here are some notes for anyone who needs to perform this admittedly rare task.
If you add a new instruction, old virtual machines will not know how to run
the new opcode, so the first thing to do may well be to change the version
numbers of the ucode and icode formats. Changing version numbers is a bit of
a gnarly thing to do, and you may want to keep a duplicate copy of your
source tree around to mitigate bootstrapping issues when you do it. The
version numbers live in src/h/version.h, which defines several macros that
need to be updated:
VersionNumber VersionDate DVersion UVersion IVersion
VersionNumber and VersionDate are human-readable
and straightforward. DVersion denotes the rtt runtime system
"database version". UVersion refers to the version for the
human-readable text ucode files, which serve as Unicon's object and VM
bytecode assembler format. It will definitely need to change if you add a
new instruction in ucode. IVersion specifies the icode version,
the platform-dependent binary VM bytecode format that is executed by iconx.
While a new instruction in theory requires only new UVersion and IVersion
numbers, it is typical to update all the version numbers together. Version
numbers are used in various files that may have to be regenerated after this
change. For example, src/runtime/rt.db holds a database of type information
for the runtime system; be sure the version number at the top of that file
gets updated. You might get away with editing this file manually, but
rebuilding rt.db from scratch will probably update it automatically.
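To make the role of these macros concrete, here is a hypothetical fragment in the style of src/h/version.h, followed by the kind of check a loader can perform with it. The macro names come from the text above; every value, and the function check_icode_version(), is invented for illustration, and this does not reproduce the real header's format or iconx's actual check. It only sketches the idea that a mismatched version string lets a VM refuse a file up front instead of failing later on an opcode it does not know.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical version.h-style macros. Names are from the text;
 * all values are made up for this sketch. */
#define VersionNumber "99.0"
#define VersionDate   "January 1, 2099"
#define DVersion      "99.0.00"
#define UVersion      "U99.0.01"   /* bumped: ucode gained a new opcode */
#define IVersion      "I99.0.01"   /* bumped: icode gained a new opcode */

/* Hypothetical loader check: compare the version string recorded in an
 * icode file header with the version this VM was built with, and refuse
 * the file on any mismatch. Returns 1 if compatible, 0 otherwise. */
int check_icode_version(const char *header_version)
{
    if (strcmp(header_version, IVersion) != 0) {
        fprintf(stderr, "icode version mismatch: file has %s, expected %s\n",
                header_version, IVersion);
        return 0;
    }
    return 1;
}
```

With this check in place, an icode file produced before the bump (say, one stamped "I99.0.00") is rejected with a clear message rather than executed by a VM that may misinterpret its opcodes.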
Next, define a new opcode for your instruction. The file src/h/opdefs.h
holds the macro definitions for each VM instruction. Pick a number that no
other instruction uses and define a new macro for your instruction. The name
doesn't really matter, but you should follow the naming convention used by
the other instructions (Op_<your instruction's name>). Then navigate to the
Unicon translator directory, src/icont, and open opcode.c. This file holds
the opcode table, optable, which defines the string that corresponds to each
opcode. The optable is sorted alphabetically, so find where your instruction
belongs in that order and add a new entry for it there.
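As a sketch of both steps, the fragment below defines opdefs.h-style macros and a small optable-style table for a hypothetical new instruction named max. The opcode numbers, the struct opentry layout and lookup_opcode() are assumptions for illustration, not copies of the real files. One plausible reason for the alphabetical ordering is that the table is searched by name with a binary search; the sketch uses the standard bsearch() to show why an entry placed out of order could become unreachable.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical opdefs.h-style macros. The numbers are placeholders;
 * Op_Max is the new instruction, given a number assumed to be unused. */
#define Op_Asgn  1
#define Op_Cat   3
#define Op_Dup   4
#define Op_Plus  20
#define Op_Max   200

/* Hypothetical name/opcode pair in the spirit of an optable entry. */
struct opentry {
    const char *name;
    int opcode;
};

/* Sorted by name; the new "max" entry goes between "dup" and "plus". */
static const struct opentry optable[] = {
    { "asgn", Op_Asgn },
    { "cat",  Op_Cat  },
    { "dup",  Op_Dup  },
    { "max",  Op_Max  },
    { "plus", Op_Plus },
};
static const size_t nopcodes = sizeof optable / sizeof optable[0];

/* bsearch() comparator: key is an opcode name, elem is a table entry. */
static int cmp_opentry(const void *key, const void *elem)
{
    return strcmp((const char *)key,
                  ((const struct opentry *)elem)->name);
}

/* Map an opcode name to its number by binary search; -1 if unknown.
 * Binary search is only correct on a sorted table. */
int lookup_opcode(const char *name)
{
    const struct opentry *e = bsearch(name, optable, nopcodes,
                                      sizeof optable[0], cmp_opentry);
    return e ? e->opcode : -1;
}
```

If "max" were appended after "plus" instead, the table would no longer be sorted and a binary search for "max" could miss it, even though the entry is physically present.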
Now that the opcode is defined, we move on to some more complicated
stuff. The file src/icont/tcode.c is the translator file that traverses the
syntax tree and writes out ucode. It contains the traverse() function, along
with some helper functions for the more complicated syntax structures.
traverse() is a recursive function that walks the syntax tree and executes a
switch on every node it encounters. Each case of the switch writes VM
instructions using the emit family of functions. Remember the string
representation of your opcode? If your new instruction is to be produced as
part of the code generation for some piece of Unicon syntax, you'll need to
emit that opcode string somewhere in this switch, in the code generated for
one of the syntax tree nodes. Where you place your emit depends on the
context of your VM instruction. You may also need some of the helper
functions to emit the code you want. (Note: use grep to find the source code
for unfamiliar functions. Grep is your friend!)
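The following sketch shows the general shape of such a recursive, switch-driven traversal on a toy syntax tree. The node kinds, struct node, the output buffer and emit_op() are all hypothetical stand-ins for the real tree and emit functions in tcode.c, and the emitted text is simplified ucode-like output rather than exact ucode. The N_Max case is where a hypothetical new max operator emits the new opcode string.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified syntax tree node kinds. */
enum node_kind { N_Var, N_Cat, N_Max };

struct node {
    enum node_kind kind;
    const char *name;            /* used by N_Var */
    struct node *left, *right;   /* used by the binary operators */
};

/* Hypothetical stand-in for the emit family: appends one line of
 * ucode-like text to a global output buffer. */
static char out[1024];

static void emit_op(const char *op, const char *arg)
{
    size_t len = strlen(out);
    if (arg != NULL)
        snprintf(out + len, sizeof out - len, "\t%s\t%s\n", op, arg);
    else
        snprintf(out + len, sizeof out - len, "\t%s\n", op);
}

/* Recursive traversal: one switch case per node kind. Operands are
 * generated first, then the operator's opcode string is emitted. */
static void traverse(const struct node *t)
{
    switch (t->kind) {
    case N_Var:
        emit_op("var", t->name);
        break;
    case N_Cat:                  /* an existing operator */
        traverse(t->left);
        traverse(t->right);
        emit_op("cat", NULL);
        break;
    case N_Max:                  /* new syntax emits the new opcode string */
        traverse(t->left);
        traverse(t->right);
        emit_op("max", NULL);
        break;
    }
}
```

For a tree representing a max (b cat c), traverse() appends the code for a, b and c, then cat, then max. A real code generator tracks considerably more state than this.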
Now we need to make some changes to the linker so that the human-readable
ucode gets translated into binary icode. The file you want is
src/icont/lcode.c. It also consists of a bunch of helper functions plus a
central function called gencode(), which reads the ASCII ucode and generates
icode. gencode() has a loop containing another big switch statement that
switches on each opcode. Luckily, if your instruction falls into a common
category, such as a new operator or a simple instruction, you probably won't
have to write any complex code. The cases are nicely organized into groups;
if your instruction fits one of them, you can probably just add another case
at the bottom of that group and call it a day. If your instruction is more
complicated, you may have to study how the helper functions work to
implement its icode translation.
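The grouping idea can be sketched as a switch with stacked case labels. Here encode_instr() is a hypothetical, much-simplified stand-in for one step of gencode(): the opcode numbers are placeholders and the one-word-per-field output format is invented. It shows how a simple new instruction such as the hypothetical Op_Max can join an existing group with a single extra case label.

```c
#include <stddef.h>

/* Hypothetical opcode numbers (placeholders, not real opdefs.h values). */
enum { Op_Asgn = 1, Op_Cat = 3, Op_Dup = 4, Op_Int = 10, Op_Max = 200 };

/* Hypothetical encoder: writes the binary form of one instruction into
 * buf and returns the number of words written (0 for unknown opcodes). */
static size_t encode_instr(int op, long operand, long *buf)
{
    switch (op) {
    /* Group: instructions with no operand; emit just the opcode. */
    case Op_Asgn:
    case Op_Cat:
    case Op_Dup:
    case Op_Max:                 /* new instruction added at end of group */
        buf[0] = op;
        return 1;

    /* Group: instructions carrying one immediate operand. */
    case Op_Int:
        buf[0] = op;
        buf[1] = operand;
        return 2;

    default:                     /* unknown opcode: nothing emitted */
        return 0;
    }
}
```

Because Op_Max needs exactly what its neighbours need, no new encoding logic is written; an instruction with unusual operands would instead need its own case body.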
After the changes to the linker are made, you can finally move on to the
main interpreter. The main interpreter loop is located in
src/runtime/interp.r. In interp.r, there is an infinite loop
(for(;;){...}) containing another big switch statement, which also has a
case for every opcode. This is where you'll add the main code that
implements your instruction: add a new case for your opcode and write your
implementation under it.
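To illustrate the shape of that loop, here is a toy stack machine in plain C with the same fetch, switch, repeat structure. It is a hypothetical sketch, not interp.r code: the real interpreter works on descriptors, is written in RTL and has its own stack and error conventions. The opcode numbers and run() are placeholders, and the Op_Max case stands in for your new instruction.

```c
#include <stdio.h>

/* Hypothetical opcode numbers; placeholders only. */
enum { Op_Quit = 0, Op_Int = 10, Op_Plus = 20, Op_Max = 200 };

/* Toy interpreter: fetch an opcode, switch on it, repeat until a case
 * returns. No stack overflow/underflow checks, for brevity. */
long run(const long *ipc)
{
    long stack[64];
    int sp = 0;                        /* index of next free slot */

    for (;;) {
        switch (*ipc++) {
        case Op_Int:                   /* push an immediate operand */
            stack[sp++] = *ipc++;
            break;
        case Op_Plus: {                /* pop two values, push their sum */
            long b = stack[--sp];
            long a = stack[--sp];
            stack[sp++] = a + b;
            break;
        }
        case Op_Max: {                 /* the new instruction */
            long b = stack[--sp];
            long a = stack[--sp];
            stack[sp++] = a > b ? a : b;
            break;
        }
        case Op_Quit:                  /* stop; return top of stack */
            return sp > 0 ? stack[sp - 1] : 0;
        default:
            fprintf(stderr, "unknown opcode\n");
            return -1;
        }
    }
}
```

For example, running the sequence Op_Int 3, Op_Int 7, Op_Max, Op_Quit returns 7. Adding the instruction only required one new case; the fetch and dispatch machinery is untouched.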