Help Wanted!
The Unicon project is looking for help on the following topics as of 2/6/2013.
Thanks to Hugh Sasse for improving the HTML and adding a table of contents!
Most of these topics are requests from the user community. Many of these would
make excellent independent study or thesis topics. Net volunteers are also
welcome. Anyone willing to pay for any of these projects should drop me a
note, and I will hire appropriate students to do it at their bargain wage
rates.
An asterisk (*) in the title indicates that somebody is believed
to be working on that topic, or has implemented a feature not yet
adopted in the Unicon baseline.
The Unicon translator is written in Unicon and has evolved since around 2000.
Purpose: core language. Skills needed: Unicon expert, compilers.
- The Unicon translator is an appropriate place to implement a
number of classic, easy optimizations such as constant folding
and common subexpression elimination.
Amazingly, icont does not do these. Icont should also do
strength reduction, not only for classic arithmetic operations
but also for string processing; for example, changing
upto('s') to find("s").
Also, there are obvious run-time conversions that can be avoided by
performing them at compile time, such as
changing write(1) to write("1") and the like.
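To make the three rewrites above concrete, here is a minimal sketch in Python rather than Unicon. The node classes (Num, Str, Cset, Var, BinOp, Call) and the optimize() function are hypothetical names invented for illustration; the real translator's syntax tree is organized differently, so treat this as a model of the idea and not as the translator's code.

```python
from dataclasses import dataclass

# Hypothetical mini syntax tree; these node types are assumptions made
# for illustration and do not mirror the Unicon translator's own tree.
@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Str:
    value: str

@dataclass(frozen=True)
class Cset:
    chars: str

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class BinOp:
    op: str
    left: object
    right: object

@dataclass(frozen=True)
class Call:
    func: str
    args: tuple

_ARITH = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

def optimize(node):
    """Rewrite a tree bottom-up, returning a new (possibly simpler) tree."""
    if isinstance(node, BinOp):
        left, right = optimize(node.left), optimize(node.right)
        # Constant folding: both operands are known at compile time.
        if isinstance(left, Num) and isinstance(right, Num) and node.op in _ARITH:
            return Num(_ARITH[node.op](left.value, right.value))
        return BinOp(node.op, left, right)
    if isinstance(node, Call):
        args = tuple(optimize(a) for a in node.args)
        # Strength reduction: upto() on a one-character cset literal
        # can become find() on the equivalent one-character string.
        if (node.func == "upto" and args
                and isinstance(args[0], Cset) and len(args[0].chars) == 1):
            return Call("find", (Str(args[0].chars),) + args[1:])
        # Compile-time conversion: write(1) becomes write("1").
        if node.func == "write":
            args = tuple(Str(str(a.value)) if isinstance(a, Num) else a
                         for a in args)
        return Call(node.func, args)
    return node
```

For example, optimize(BinOp("+", Var("x"), BinOp("+", Num(1), Num(2)))) folds the inner sum and leaves the addition with x alone. Common subexpression elimination is not shown here because it needs side-effect analysis that a sketch of this size cannot do justice to.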
In 2006-2007 Unicon re-incorporated Icon's optimizing compiler,
which is built separately and accessed using "unicon -C" on Unix-based
systems with an available C compiler.
Purpose: faster execution. Skill needed: C expert.
- Dead Code Elimination --
Iconc needs to remove unreferenced procedures and classes prior to type
inferencing in order to speed up compilation and reduce memory requirements.
- Compiler Optimization --
Iconc can use further optimization, both of its own
operation and
particularly of its generated code. An earlier successful project
by Anthony Jones showed me that even an enterprising undergraduate
student can make a real difference (in his case, a 2/3 reduction in
iconc's memory requirements).
- Windows Compiler -- Iconc needs to be ported
to Windows, preferably using a free Windows C compiler such as gcc or lcc.
The win32/gcc configuration provides a starting point.
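The dead code elimination item above amounts to a reachability walk over the program's reference graph before type inferencing starts. The following is a minimal sketch in Python, assuming a hypothetical call_graph dictionary that maps each procedure (or class) name to the names it references; how iconc would actually build such a graph is not specified here. Procedures invoked by name at run time (string invocation) would need conservative treatment that this sketch ignores.

```python
from collections import deque

def reachable_procedures(call_graph, roots=("main",)):
    """Return the set of names reachable from the roots.

    call_graph is a hypothetical structure: it maps each user-defined
    procedure or class name to the names referenced in its body.
    Names missing from call_graph (e.g. built-ins) are skipped.
    """
    seen = set()
    queue = deque(r for r in roots if r in call_graph)
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        for callee in call_graph[name]:
            if callee in call_graph and callee not in seen:
                queue.append(callee)
    return seen

def prune(call_graph, roots=("main",)):
    """Drop every entry that no root can reach, directly or indirectly."""
    live = reachable_procedures(call_graph, roots)
    return {name: refs for name, refs in call_graph.items() if name in live}
```

Note that a procedure referenced only by other dead procedures is removed too, which a simple "is it referenced anywhere?" check would miss.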
The Unicon virtual machine is the Icon virtual machine, with extensions.
It was written in C in the early 1980's. The VM runtime system was
altered in the early 1990's to use an extended-C syntax called RTL
(runtime language). This allowed it to be used for both Iconx and Iconc.
Skills needed: C expert.
- VM optimization -- The virtual
machine translator (icont) and the interpreter and runtime system
(iconx) can be further enhanced for better performance. For example,
the memory allocations performed during most string subscripts can be
avoided with a relatively simple addition of a new virtual machine
instruction. As another example, in many or most calling contexts,
the translator can identify when a generator cannot be resumed. If this
information were passed into the invocation, a suspend might be promoted
into a (much faster) return expression.
- VM translator type inferencing --
The type inferencing mechanism used by iconc has been sped up to the
point where type inferencing could be used to direct VM optimizations,
not just C-compiles.
- Compact structure representations --
Common special-cases of data structures should have special-purpose
representations to save space and time. One example was a user request
for "tiny tables" -- if a program needs millions of small tables, the
memory overhead of such tables becomes important. If HSegs is 20, one
would seemingly be able to store tables of size <= 5 in a single Table
block instead of allocating an array of buckets.
Also, study (and some implementation) is needed to decide if the
implemented array() representation should be used as the default
behavior of list() when its initial value is numeric.
- Dynamic Interpreter Stack --
Short of infinite recursion,
it should be almost impossible to cause an interpreter stack overflow.
Perhaps Icon's own list data type could be used to implement a
dynamic interpreter stack. Alternatively, checking and
realloc'ing the interpreter stack might work.
- VM dynamic code -- The dynamic loading facility built into
Unicon needs to be supplemented and extended with dynamic linking to
allow new code to be generated and executed on the fly.
- Portable bytecode -- It would be nice if Unicon executables
could be delivered in a machine-neutral format, similar to the Java VM.
- cset keyword conversions -- Are keywords such as &lcase
converted to strings often enough to warrant special cases in the cset
conversion code?
- Avoid one-char allocations -- Many functions such as map()
would not need to allocate a string from the heap if that string were
of length 1; they could just return a pointer to that character in
static memory.
- Improve large integer string conversion --
A large integer such as 5^4^3^2 takes a long time to convert to a string,
like 4+ minutes on an older amd64! This could be made much faster,
possibly by reimplementing large integers using GMP or altering their
representation to be base-10-compatible on a per-largeint-chunk basis.
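As an illustration of the "base-10-compatible" idea in the last item, here is a minimal Python sketch in which a large integer is held as chunks in base 10^9 (a value that fits in a 32-bit word). The chunk size and all function names are assumptions made for this example, not Unicon's actual large integer layout. With such a representation, producing the decimal string is a single linear pass with no long division.

```python
CHUNK_DIGITS = 9                 # assumed chunk width; 10**9 < 2**32
CHUNK_BASE = 10 ** CHUNK_DIGITS

def to_decimal_chunks(n):
    """Split a non-negative int into little-endian base-10**9 chunks.

    This step is only here to build test data; a runtime that kept
    its large integers in this form would never need to perform it.
    """
    if n < 0:
        raise ValueError("sketch handles non-negative integers only")
    chunks = []
    while True:
        n, low = divmod(n, CHUNK_BASE)
        chunks.append(low)
        if n == 0:
            return chunks

def chunks_to_string(chunks):
    """Linear-time conversion: print each chunk, zero-padded to 9 digits."""
    head = str(chunks[-1])       # most significant chunk: no padding
    tail = "".join(f"{c:0{CHUNK_DIGITS}d}" for c in reversed(chunks[:-1]))
    return head + tail

def add_chunks(a, b):
    """Schoolbook addition carried out directly on the chunked form."""
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        total = carry + (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
        carry, digit = divmod(total, CHUNK_BASE)
        result.append(digit)
    if carry:
        result.append(carry)
    return result
```

The trade-off is that arithmetic on decimal chunks wastes a little of each word and needs a divmod per carry, so whether this beats a GMP-based reimplementation would have to be measured.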
We have gone to considerable effort (like, my Ph.D. dissertation) to enable
the authoring of advanced tools for Unicon in Unicon. Things are in a bit of
flux right now and our debugging facilities need to be extended to be able
to handle new features such as threads.
Skills needed: Unicon expert.
- Unicon Debugger Enhancements --
The monitoring facilities described in the book
"Program Monitoring and Visualization"
have been used to produce an extensible source-level
debugger, udb.
This debugger is relatively new and can use further refinement.
- Unicon Profiler --
A good profiler would report time and space information about Unicon program
executions, including runtime system time and space, not just source
code modules' time and space. Line-level and built-in-level details are
needed. *Status: a simple profiler prototype named uprof was developed by
a student as a semester project. It is useful enough that it has made it
into the language distribution, but needs further refinement. &time on some
platforms can benefit from improved resolution using high-resolution timers.
On typical Linux machines the current 10ms resolution limits uprof's precision.
- Unicon Lint --
A "lint" for Unicon would detect bugs and probable bugs
by static analysis; for example, it could flag redundant or repeated
type conversions.
Fonts are an important aspect of widening Unicon's suitability to more
applications. They are at present the single biggest obstacle to portability
across platforms.
Skills needed: C expert.
- Unicon Freetype* --
Unicon should add support for the
Freetype
font engine and provide a set of
portable fonts that match, pixel-for-pixel, on all window systems.
*Status: basic freetype support was added to 3D facilities.
It should be extended to work with 2D graphics,
and needs further development.
- Unicon Unicode --
Unicon should add support for Unicode and/or other >8 bit character sets.
- Unicon Native Fonts --
It would be nice if Unicon could add new fonts dynamically, in order
to support interesting languages that are not well supported by
operating systems.
- Unicon Deadkeys --
The iconx X11 client code should be updated to use X11R5+ support for
locales and "dead keys" to compose accent characters using XmbLookupString
and/or the LC_CTYPE stuff.
Skills needed: Unicon expert and/or C expert.
- Class Variables --
Some additional syntax is needed to make it more convenient to declare
variables that are shared among all instances of a class. Currently you
can achieve this effect using globals and packages, and method static
variables are shared among instances, but a more direct syntax would
be handy.
- Private and Read-Only-Publics --
Unicon's predecessor Idol had private semantics and a public keyword.
Private semantics were dropped because they added to complexity and
space consumption without adding functionality. But arguably they
have value and should be an option. While a distinction between
private and protected does not seem very useful in Unicon, a scope
that would be really useful would be a read-only public designation,
to avoid the need for many accessor methods.
The existing work on a SNOBOL-style pattern data type needs to be integrated
better with string scanning. Skills needed: C expert.
Iconx needs to be extended to support directly executing .icn source files.
It should also support "one-liners", where the source code is supplied as a
command line option. Icon 9.5 added some support for this on UNIX;
for Unicon we need a multiplatform solution if possible.
Better programming tools are always in demand. An interactive interpreter,
or an incremental compilation system, would make an excellent project.
There are several ways to execute new Unicon code on the fly that was typed in
interactively:
- using system() -- slow, doesn't pass non-string parameters easily
- using load() -- but load() does not "link" into the current program, and
currently does not support calling procedures in another program directly;
one would have to use a co-expression to change control to the other
"program" and then call a desired procedure via some wrapper code.
Also, load()'ing a lot may
have garbage collection issues that haven't been discovered yet.
- Undergrad-level project: develop a "library" model for Unicon modules,
calling them through a co-expression interface using wrapper procedures
- develop a new mechanism for linking and loading COMPILED Unicon code
as a .so/.dll per loadfunc()
- developing a pure interpreter -- for strings or syntax trees constructed
from a parse of the code.
As an experiment, I wrote a little program that reads lines from the user,
and for each one, calls an eval(s) function that writes it to a file,
compiles it, uses load(), and activates it. This is "slow", but runs in
well under a second, so it is not obvious that we have to discard unicon/icont
and go with some pure interpreter in order to provide this type of service
on modern machines. Handling stored procedures and globals in such an
interpretive environment requires more thought, but still seems doable, and
would be useful to experimenters and new users.
Udaykumar Batchu performed a project to simplify the calling of
C functions from within the runtime system, improving on the traditional
Icon loadfunc() dynamic loading utility. His work needs some refinement,
and student Vincent Ho suggested an "inline C" capability that would fit
in nicely. It would
be interesting to add such a capability to the compiler and
to the interpreter.
Skills needed: Unicon expert. C expert.
It has been requested that we make the interpreter embeddable within
C/C++ applications. Developing a standard mechanism for turning the
Unicon VM into a callable C library would make an interesting project.
Skills needed: C expert.
The graphics facilities would benefit from multiplatform printing support,
including the generation of PostScript or PDF. The database facilities
would benefit from a report generator similar to Crystal Reports.
Skills needed: C expert.
The messaging facilities done by Steve Lumos support popular protocols such
as HTTP and POP. One thing we need to do is port these from UNIX to Win32.
Another thing we need to do is add protocols. We would especially like to
see SSL support added, using OpenSSL or some other free implementation of
SSL. A critical extension for e-mail support is SMTP AUTH, the authenticated
version of the SMTP protocol. We also need FTP, IMAP, NNTP, ...
Single-platform enhancements are uninteresting to users on other platforms,
but occasionally they are necessary or useful in making Unicon suitable for
applications that it otherwise would not be used in.
Skills needed: C expert.
There are currently 11 Windows-native functions in the Windows versions
of Unicon, implementing buttons, scrollbars, menubars, edit regions, and
various dialogs. A larger set of Windows-native GUI capabilities might
allow applications to look more "native" on Windows and be usable by
screen readers.
One of the oft-requested Windows-specific features is COM support.
The technical questions are: (a) is a platform-independent
interface possible (to support CORBA or JavaBeans as well,
for example), and (b) how high-level can we make this API?
Porting iconx to be an Active Script Engine (at one time documented in the
"Visual Programmer" column from Microsoft Systems Journal online) would
allow Icon to be an embedded scripting language for many Windows
applications.
Skills needed: C expert.
Additional means of automating the transmission of structured or
binary data would be valuable to Unicon -- Google's Protocol Buffers
are an example.
Skills needed: C expert.
New platforms of particular interest are PDAs.
David Price performed a preliminary WinCE port including much of
the 2D graphics facilities. It needs extensions in several areas,
such as networking, and a strategy for adapting existing GUI windows
to the small screen (scaling them, or adding automatic scrolling).
It would be neat if Unicon handled common archive and compressed archive
formats such as .zip as easily as it does other file types.
Skills needed: C intermediate.
It would be useful to add I/O modes in which arbitrary structure
values (tables, objects, etc) could be written to and read from disk,
making something like encode()/decode() a built-in.
Skills needed: C intermediate.
Unicon's structure types are all mutable, making them next to useless as
hash keys. Adding a "freeze" bit, a promise from then on that a structure
would never be modified, would enable them to hash on contents instead of
on serial number, and might enable various optimizations.
Skills needed: C intermediate.
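The "freeze" idea above can be modelled in a few lines. This is a sketch in Python, not a design for the Unicon runtime: the FreezableList class, its put() and freeze() methods, and the serial attribute are all hypothetical stand-ins for a structure block with a one-way frozen flag. Before freezing, identity (the serial number) drives hashing and equality; after freezing, contents do.

```python
class FreezableList:
    """Hypothetical list-like structure with a one-way 'frozen' flag."""

    _next_serial = 0

    def __init__(self, items=()):
        self._items = list(items)
        self._frozen = False
        FreezableList._next_serial += 1
        self.serial = FreezableList._next_serial   # identity, as today

    def put(self, value):
        # A frozen structure has promised never to change again.
        if self._frozen:
            raise TypeError("cannot modify a frozen structure")
        self._items.append(value)

    def freeze(self):
        self._frozen = True
        return self

    def __len__(self):
        return len(self._items)

    def __hash__(self):
        # Frozen: hash on contents. Mutable: hash on serial number.
        if self._frozen:
            return hash(tuple(self._items))
        return hash(self.serial)

    def __eq__(self, other):
        if not isinstance(other, FreezableList):
            return NotImplemented
        if self._frozen and other._frozen:
            return self._items == other._items
        return self is other
```

With this model, two separately built frozen lists holding 1, 2, 3 find the same table entry, while an unfrozen list with the same contents does not. One design question the sketch exposes: a structure's hash changes at the moment it is frozen, so a structure already used as a key would need rehashing or freezing would have to be disallowed for it.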
Skills needed: C expert. Graphics API expert.
- Direct3D port. Such a port would enable Unicon to run on
Windows-based platforms that do not support OpenGL well (Vista? Xbox?).
- When a window (possibly 2d, offscreen) is used as a Texture on a
3D object, it should be updated with current contents every time
the 3D object is redrawn. :-)
- Unicon should support PNG as a standard graphics format.
*This has been implemented but seems to behave incorrectly on some images.
- Subwindows (at least) should support a borderwidth attribute,
and have the option of having no border. Perhaps main windows too.
- Mac - We need a Macintosh programmer,
proficient in (or willing to learn) native Mac graphics API's,
to complete the Mac port
of the graphics facilities. A QuickDraw version
reached alpha-stage at Icon 9.3.1
but was not finished. At this point we probably should axe that
codebase and instead pursue a Quartz port for Mac OS X. Prototyping for
this effort showed that a Cocoa GUI thread creating and calling a VM
thread was the right way to organize this execution model.
- Other platforms? - We want our portable graphics on all
platforms for which our community wants to program. For example, we
had an earlier port to OS/2 Presentation Manager, back when that was
in wide use.
The rise of dual-core CPU's makes it inevitable that Unicon should be
extended to support parallel computation. The interesting questions are
whether it should support implicit or explicit parallelism or both.
Skills needed: C expert
- DataParallel Operators* --
Unicon should support (deep) structure-at-a-time operators, such as
L1+L2 producing a list L3 with elements of L1 pairwise-summed with L2.
*Status: experimental modifications to support element-wise addition
are in the runtime system under the #ifdef symbol DataParallel.
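The intended meaning of a (deep) structure-at-a-time addition can be pinned down with a short model. The sketch below is Python, not the experimental runtime code; deep_add is a hypothetical name, and the choices to reject lists of different lengths and to reject list-plus-scalar operands are assumptions, since the text above does not say how those cases should behave.

```python
def deep_add(a, b):
    """Add two values element-wise, recursing into nested lists.

    Assumptions of this sketch (not taken from the Unicon runtime):
    lists must match in length, and a list cannot be added to a scalar.
    """
    a_is_list, b_is_list = isinstance(a, list), isinstance(b, list)
    if a_is_list and b_is_list:
        if len(a) != len(b):
            raise ValueError("lists must have the same length")
        # Each pair of elements may itself be a pair of lists.
        return [deep_add(x, y) for x, y in zip(a, b)]
    if a_is_list or b_is_list:
        raise TypeError("cannot add a list and a scalar in this sketch")
    return a + b
```

So deep_add([1, 2, 3], [10, 20, 30]) yields [11, 22, 33], and nested lists are summed at every level. Each element pair is independent of the others, which is what makes the operation a candidate for parallel execution.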
It would be neat if Unicon supported persistent structures, structures
that survive across program executions. An approximation of this can
be accomplished by storing xencoded structures in GDBM files, but it
would be nice if it were easier and more direct.
Skills needed: C expert
The error messages, particularly from the runtime system, can be enhanced to
improve readability and help the programmer have a clue of how to fix the
problem encountered. Long error tracebacks should be written to a file and
a terser summary printed to standard error output. The default diagnostics
style should be friendlier to new Unicon programmers. It might be possible
to load/attach udb when a runtime error occurs.
Skills needed: C intermediate.
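The split between a long traceback on disk and a terse message on standard error might look like the following. This is a Python model of the proposed behavior only; report_error, its parameters, and the message wording are invented for illustration and say nothing about how the Unicon runtime formats its diagnostics.

```python
import sys
import traceback

def report_error(exc, log_path, stream=None):
    """Write the full traceback to log_path and a one-line summary to stream.

    Hypothetical helper: stream defaults to standard error, and the
    summary text is returned so callers can reuse it.
    """
    stream = sys.stderr if stream is None else stream

    # The long form goes to a file, where it cannot scroll off the screen.
    with open(log_path, "w", encoding="utf-8") as log:
        log.write("".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)))

    # The short form names the innermost location and points at the file.
    frames = traceback.extract_tb(exc.__traceback__)
    if frames:
        where = f"{frames[-1].filename}:{frames[-1].lineno}"
    else:
        where = "unknown location"
    summary = f"Run-time error: {exc} (at {where}; full traceback in {log_path})"
    print(summary, file=stream)
    return summary
```

A friendlier default style could build on the same split, for example by adding a hint about the likely fix to the one-line summary while leaving the file untouched.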
Icont's parser needs to be modified to work with any YACC
implementation. At present it fails on some 64-bit Linuxes
if -O2 is turned on, apparently due to an issue in the old AT&T YACC
parser skeleton.
The Unicon IDE and IVIB need to be joined together.
Skills needed: Unicon intermediate.
Icon's benchmark suite from long ago is inadequate to compare performances
usefully on modern systems. We need a new benchmark suite that (a) exercises
the various features of the language, and (b) ideally would fit in with
a portable benchmark suite used to compare performance across other
very high level languages.
Skills needed: Unicon intermediate.