In C-based languages, a switch statement corresponds to a
case expression in Unicon. While this article will talk about
case expressions, most of the techniques would be applicable to
switch statements and similar constructs in other languages.
The value used to select which case is executed will be called the
selector value, while the values that are selected from will be
called the case values.
The current implementation of Icon and Unicon executes case
expressions about the same as it would a corresponding sequence of
if-then-else statements to check each case, comparing the
selector value against each case value, one after another. This gets slower
(O(n)) as the number of cases increases, which becomes not OK as n gets
large. Study of this subject starts with ways to implement case
expressions with simple, contiguous sequences of integer case values, and
then considers gradually more difficult case expressions. The
article will only consider optimizing a subset of case
expressions in which all the case values are constants known at
compile time, which is the most common occurrence and for which the problem
is obviously tractable.
Space is cheap, and if you want a fast-ish implementation for case expressions with a large number (> 10, probably >> 10) of cases, one option may be to throw a bit of space at the problem. In the Unicon language as of April 2022, this is a real need. Unicon programs such as the Unicon translator itself, and the collaborative virtual environment CVE, contain code with a large number of case values to select between. For example, there are approximately 150 different cases in the CVE server's protocol command processor.
case expressions from Icon, which are described a little bit in
the Implementation Compendium. Arbitrary expressions including generators
may be used as case values. The implementation works for arbitrary types of
values. Cases are evaluated one at a time, in order, in the usual way. Each
in turn is compared against the selection value with semantics of the ===
operator. The notation is concise and general but its performance is similar
to a long chain of if-then-else-if statements, and in short programs that
was just fine. But we can do better for the common special cases. Most case
expressions in real Unicon programs use either all integer labels, or all
string labels. In Icon and Unicon's graphics facilities, the Event() function
returns integers for mouse or special events, while it returns one-letter
strings for
regular keyboard events, so this language has a third common pattern,
in which case expressions have integer and string constants
as labels.
case x of {
1: write("one")
2: write("two")
3: write("three")
}
We might want to also include case labels that
contain several integer literals in alternation expressions such as (1|2|3)
or (3 to 11 by 2) or (!10).
case x of {
1: write("one")
2|3|5|7|11: write("prime")
4|6|8|10: write("even")
9: write("multiple of three")
default: write("too large to handle")
}
Basic properties of integer case expressions include:
tableswitch
instruction for simple/contiguous cases, and a lookupswitch
instruction for other more general cases. My impression is
that Java's switch VM instructions have a good design.
evaluate selector value to top of stack subtract min from selector value gocase ; pop stack, advance program counter N instructions goto X goto Y goto ZThis representation depends on an instruction (
gocase) that increments the program counter by some value it
pops off the top of the stack. If min is not zero, subtract or add as
necessary so that the first goto instruction refers to the minimum case
value. The instructions that follow waste some space on redundant opcodes;
if you implement an instruction that indexes directly into
an array of labels, such as Java's tableswitch instruction,
you can save some space on the goto opcodes.
If not all values in the range have code to execute, the jump table will contain jumps to either a default case, or to the instruction after the case expression. If density is low, most of the jump table might consist of redundant jumps to this default target.
In a toy virtual machine, adding a new virtual machine instruction is pretty easy. Adding a VM instruction to Unicon almost never happens and is so rare that it is not really documented. So, the next article will walk through that process.