.Dd September 24, 2024
.Dt MINIMUNGER 1
.Sh NAME
.Nm minimunger
.Nd compiler for a small language for writing text-processing filters
.Sh SYNOPSIS
.Nm minimunger
.Ar source-file
.Sh DESCRIPTION
MiniMunger (MM) is a simple, non-optimizing compiler to C for a small
variant of
.Xr munger 1 .
The compiled language is more "functional" but less functional:  it has
first-class continuations, but lacks lists, first-class symbols, local
side-effects, macros, "eval", "extend", and runtime error-checking.
An interface to the SQLite library is provided.
MM is specialized for, and limited to, writing filters.
.Pp
This manual page describes only the differences between Munger and MM.
For more information, see
.Xr munger 1 .
The minimunger compiler itself is written in Munger.
.Pp
Some example programs are included.  These are installed in
${PREFIX}/share/minimunger.
.Bl -tag -width "options.mm"
.It grep.mm
is an egrep-like filter.
.It fmt.mm
is a fmt-like filter.
.It options.mm
helps process command-line arguments.
.It stacks.mm
provides higher-order functions to apply functions to the elements of
stacks.
.El
.Pp
To build the example filters, invoke:
.Bd -literal -offset left
% make fmt grep
.Ed
.Pp
The filters' source code describes their command-line options.
.Sh IMPLEMENTATION NOTES
The MM compiler is a whole-program compiler.
It reads one input file and produces two output files of C code, which
are then compiled by the C compiler together with the MM runtime to
produce an executable.
Instructions for using the compiler follow this section.
.Bl -bullet
.It
MM supports neither lists nor any list-related functions.  Programs are
written as S-expressions, but programs may not create S-expressions.  The
standard aggregate type of MM is the stack, a dynamically-resizable,
one-dimensional array.
.It
Side-effects are permissible only on globals.
.It
The first-class data types supported are:  stacks, tables, records,
closures, continuations, compiled regular expressions, 8-bit-clean strings
and fixnums.  A fixnum is the size of a C int on the hardware on which MM
is running.
.It
To maximize execution speed, the MM runtime does no type-checking.  If you
call an intrinsic with an argument of the wrong type, your program will
crash.
.It
Global variables are created automatically when their symbols are
encountered.  If you attempt to access a global before it has been
initialized, your program will crash.
.It
Although lambda-expressions can be bound to variables in "let" and "letn"
forms, there is neither "labels" nor "letf" to allow temporary functions
to see their own bindings.  Any function which calls itself must have a
toplevel binding.  In effect, MM forces the programmer to move all but the
most trivial helper functions into separately-defined toplevel functions.
.It
There are no looping constructs.  All iteration is done via recursion.  CPS
conversion is performed during compilation, turning all calls into
tailcalls.  Despite this, tail-recursion will be more space-efficient than
recursion from non-tail positions, because the CPS-converted code of
functions which recurse in non-tail positions will create closures to
capture state.  The stack won't grow, but the heap will.
.It
First-class continuations are captured with the "call_cc" intrinsic which
behaves like "call/cc" does in Scheme.
.It
User-defined functions have fixed-size argument lists.
.It
User-defined macros are not supported in minimunger code, but you can hack
the compiler to define your own compiler macros.  See the function
"make_initial_cte".
.It
All equality comparisons use "eq".
.El
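.Pp
The following sketch contrasts the two kinds of recursion described above
using two hypothetical line-counting functions, count_lines and
count_lines_nontail.
It is untested and assumes that a toplevel function is defined by binding a
lambda-expression to a global with "setq", that "#" begins a comment line
(as the rules for "include" imply), and that "getline" returns each line
with its newline attached, so that a blank line is not mistaken for false.
.Bd -literal -offset left
# Tail-recursive: the recursive call is the last thing done,
# so the same continuation is passed along for every line.
(setq count_lines
  (lambda (n)
    (if (getline)
        (count_lines (+ n 1))
        n)))

# Non-tail: the (+ 1 ...) is still pending when the recursive
# call is made, so the CPS-converted code allocates a closure
# on the heap for every line read.
(setq count_lines_nontail
  (lambda ()
    (if (getline)
        (+ 1 (count_lines_nontail))
        0)))

(println (stringify (count_lines 0)))
.Ed
.Pp
Compiled as a filter, either function counts the lines on standard input,
but only count_lines runs in constant space.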
.Sh COMPILING MM PROGRAMS
MM depends upon the SQLite database library, which must be installed
before you can compile MM programs.  With the pkg package manager, invoke:
.Dl % pkg install sqlite3
.Pp
The MM compiler is written in Munger, and compiles MM code to an
intermediate language, defined as a set of macros in the source of the C
runtime.  Most of the macros expand to in-line code for speed, resulting in
larger executables than might be expected from the size of the original MM
programs.
.Pp
To compile an MM program, invoke the compiler on the main source
file:
.Bd -literal -offset left
% minimunger grep.mm
.Ed
.Pp
The compiler takes some time to perform source-to-source conversions
before it begins to emit code, printing status messages as it does so.
When it has finished, two files will have been created, one named
"functions.c" and one named "functions.h".  To create an executable from
these files, invoke the C compiler on the MM runtime source, which
includes the other two files.  The command line below is the same
when building any program compiled by MM, except for the argument to the -o
option, which names the output file.
.Bd -literal -offset left
% cc -o grep /usr/local/share/minimunger/runtime.c \\
-I./ -I/usr/local/include -L/usr/local/lib -ltre -lsqlite3
.Ed
.Pp
The main source file of a program may include other source files with the
"include" directive.  The "include" directive resembles its similarly-named
C preprocessor counterpart, and consists of the word "include" preceded by
an octothorpe (#) and followed by a double-quote delimited filename.  For
example:
.Bd -literal -offset left
#include "options.mm"
.Ed
.Pp
If the filename itself contains double quotes, they do not need to be
escaped.  Include directives must start in column zero to be recognized.
Otherwise, they will be treated as comments.  Included files themselves may
also "include" other source files.
.Sh THE INTRINSICS
The MM intrinsics bear a strong resemblance to their similarly-named Munger
counterparts.  Some behave differently.  Some accept a differing number of
arguments.  Some accept differing types of arguments.  Some have different
names.  In all cases, however, the differences are minor.  This summary
does not completely document the operation of the intrinsic functions, but
merely lists which are available and how they differ from their Munger
counterparts.  For complete documentation of an intrinsic, see
.Xr munger 1 .
.Ss Control Flow / Side-Effects
The empty string and 0 are boolean false values.  All other objects are
considered boolean true values.  The forms below function identically to
their Munger counterparts, with the exception of the conditionals.  Note
that "setq" is the only means of accomplishing side-effects on variables,
and that side-effects are permissible only on globals.
.Pp
When "if" is invoked with only a "true" subsequent clause, and the test
condition evaluates to a false value, 0 is returned, and not the value of
the failed test condition.  Similarly, if all test clauses of an invocation
of "cond" fail, then 0 is returned, rather than the value of the last
failed test condition.  Both "when" and "unless" also return 0 if their
test conditions fail.
.Bl -column -offset left "unless" "(letn ((symbol expr)+) expr+)"
.It Sy Form Ta Sy Use
.It Li setq   Ta   (setq symbol expr)
.It Li if     Ta   (if test expr1 expr2 ...)
.It Li cond   Ta   (cond (test_expr subsequent ...)+ )
.It Li when   Ta   (when test expr ...)
.It Li unless Ta   (unless test expr ...)
.It Li progn  Ta   (progn expr ...)
.It Li eq     Ta   (eq expr1 expr2)
.It Li or     Ta   (or expr ...)
.It Li and    Ta   (and expr ...)
.It Li not    Ta   (not expr)
.It Li let    Ta   (let ((symbol expr)+) expr+)
.It Li letn   Ta   (letn ((symbol expr)+) expr+)
.It Li exit   Ta   (exit expr)
.It Li die    Ta   (die ...)
.El
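.Pp
A short, untested sketch of the truth rules and of the 0 returned by failed
conditionals, based only on the descriptions above.
It assumes that "#" begins a comment line, as the rules for "include" imply.
.Bd -literal -offset left
# Prints "no": the empty string is false.
(println (if "" "yes" "no"))
# Prints "yes": the string "0" is true; only the fixnum 0 is false.
(println (if "0" "yes" "no"))
# Prints "0": "when" returns 0 when its test fails.
(println (stringify (when 0 "ignored")))
# Prints "0": no clause succeeds, so "cond" returns 0.
(println (stringify (cond ((eq 1 2) 5))))
.Ed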
.Pp
The "call_cc" intrinsic captures the current continuation.  It functions
exactly as call/cc does in Scheme:
.Bl -column -offset left "Intrinsic" "(call_cc monadic_function)"
.It Sy Intrinsic Ta Sy Use
.It Li call_cc Ta (call_cc monadic_function)
.El
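.Pp
As an illustration, the untested sketch below uses "call_cc" to escape from
a loop over a stack of fixnums as soon as a zero is found.
The names product and product_loop are hypothetical.
The sketch assumes that toplevel functions are defined by binding
lambda-expressions with "setq", that "#" begins a comment line, and that a
captured continuation is invoked like a one-argument function, as in Scheme.
.Bd -literal -offset left
# Multiply the elements of stk; on a zero, jump straight out via k.
(setq product_loop
  (lambda (stk i acc k)
    (cond ((eq i (used stk)) acc)
          ((eq (index stk i) 0) (k 0))
          (1 (product_loop stk (+ i 1) (* acc (index stk i)) k)))))

(setq product
  (lambda (stk)
    (call_cc (lambda (k) (product_loop stk 0 1 k)))))

# Prints 24.
(println (stringify (product (push (push (push (stack) 2) 3) 4))))
.Ed
.Pp
Invoking k abandons the rest of the loop and makes 0 the value of the
"call_cc" form.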
.Ss Regular Expressions
.Bl -column -offset left "substitute" "(substitute rx rep str count)" "0 or stack of 2 fixnums"
.It Sy Intrinsic Ta Sy Use Ta Sy Return Value
.It Li regcomp    Ta (regcomp str)               Ta compiled rx
.It Li match      Ta (match rx str)              Ta 0 or stack of 2 fixnums
.It Li matches    Ta (matches rx str)            Ta stack of 20 strings
.It Li substitute Ta (substitute rx rep str cnt) Ta string
.It Li regexpp     Ta (regexpp expr)             Ta 0 or 1
.El
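.Pp
The untested sketch below uses "regcomp" and "match" to build a minimal
filter that copies to standard output every line beginning with "From: ".
The names from_rx and grep_loop are hypothetical.
The sketch assumes that toplevel functions are defined with "setq", that "#"
begins a comment line, and that "getline" keeps each line's newline, so
"print" needs to add none.
.Bd -literal -offset left
# Compile the pattern once, at toplevel.
(setq from_rx (regcomp "^From: "))

(setq grep_loop
  (lambda ()
    (let ((line (getline)))
      # getline returns 0 at EOF and "" on error; both are false.
      (when line
        # match returns 0 on failure, a stack of two fixnums on success.
        (when (match from_rx line)
          (print line))
        (grep_loop)))))

(grep_loop)
.Ed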
.Ss Tables
.Bl -column -offset left "unhash" "(hash table expr1 expr2)" "associated expr"
.It Sy Intrinsic Ta Sy Use Ta Sy Return Value
.It Li table      Ta (table)                       Ta new table
.It Li tablep     Ta (tablep expr)                 Ta 0 or 1
.It Li hash       Ta (hash table expr1 expr2)      Ta table
.It Li unhash     Ta (unhash table expr1)          Ta table
.It Li lookup     Ta (lookup table expr)           Ta associated expr
.It Li keys       Ta (keys table)                  Ta stack of keys
.It Li values     Ta (values table)                Ta stack of values
.El
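.Pp
The untested sketch below builds a table and prints each key with its value.
Because MM has no loops, it walks the stack returned by "keys" recursively,
and it looks up each key object taken from that stack, so it does not depend
on how separately-created keys are compared.
The names prices and print_pairs are hypothetical, toplevel functions are
assumed to be defined with "setq", "#" is assumed to begin a comment line,
and the order in which "keys" returns keys is not specified here.
.Bd -literal -offset left
# hash returns the table, so calls can be nested.
(setq prices (hash (hash (table) "apple" 3) "pear" 5))

# Print key i of ks and its value, then recurse on i + 1.
(setq print_pairs
  (lambda (tbl ks i)
    (when (< i (used ks))
      (println (index ks i) " " (stringify (lookup tbl (index ks i))))
      (print_pairs tbl ks (+ i 1)))))

(print_pairs prices (keys prices) 0)
.Ed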
.Ss Stacks
Note that the "unshift", "push", and "store" intrinsics all return the
affected stack instead of the item being added or stored.
.Pp
The "append" intrinsic concatenates one or more stacks into a single stack.
It creates a new stack and fills it with all the members of all its
arguments, in order.  The "substack" intrinsic returns a contiguous subset
of the elements of a stack, as a new stack.  The first argument must
evaluate to a stack, while the second and third arguments must evaluate to
fixnums specifying the range of indices to be included in the substack.
.Bl -column -offset left "sort_numbers" "(substack stack expr expr)" "item at index expr"
.It Sy Intrinsic Ta Sy Use Ta Sy Return Value
.It Li stack   Ta (stack)                    Ta new stack
.It Li shift   Ta (shift stack)              Ta item at index 0
.It Li unshift Ta (unshift stack expr)       Ta stack
.It Li push    Ta (push stack expr)          Ta stack
.It Li pop     Ta (pop stack)                Ta item at topidx
.It Li assign  Ta (assign stack ...)         Ta stack
.It Li append  Ta (append stack ...)         Ta new stack
.It Li substack Ta (substack stack expr expr) Ta new stack
.It Li index   Ta (index stack expr)         Ta item at index expr
.It Li store   Ta (store stack fixnum expr)  Ta stack
.It Li clear   Ta (clear stack)              Ta stack
.It Li used    Ta (used stack)               Ta stored item count
.It Li sort_numbers Ta (sort_numbers stack)  Ta stack (sorted in situ)
.It Li sort_strings Ta (sort_strings stack)  Ta stack (sorted in situ)
.It Li topidx       Ta (topidx stack)        Ta index of top item
.It Li stackp       Ta (stackp expr)         Ta 0 or 1
.El
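.Pp
The untested sketch below shows the kind of higher-order helper that
stacks.mm provides:  a map over a stack that builds a new stack with "push".
The names map_stack and map_loop are hypothetical and are not taken from
stacks.mm; toplevel functions are assumed to be defined with "setq", and
"#" is assumed to begin a comment line.
.Bd -literal -offset left
# Apply f to each element of src, pushing the results onto dst.
(setq map_loop
  (lambda (f src dst i)
    (if (< i (used src))
        (map_loop f src (push dst (f (index src i))) (+ i 1))
        dst)))

(setq map_stack
  (lambda (f src) (map_loop f src (stack) 0)))

# Squares 1, 2 and 3, then prints 9, the item at index 2.
(println (stringify
  (index (map_stack (lambda (n) (* n n))
                    (push (push (push (stack) 1) 2) 3))
         2)))
.Ed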
.Ss Records
.Bl -column -offset left "Intrinsic" "(setfield expr1 expr2 expr3)" "new record of size n"
.It Sy Intrinsic Ta Sy Use Ta Sy Return Value
.It Li record   Ta (record n)                   Ta new record of size n
.It Li setfield Ta (setfield expr1 expr2 expr3) Ta expr3
.It Li getfield Ta (getfield expr1 expr2)       Ta item in pos expr2
.El
.Ss Fixnums
Each of these functions except "abs", which takes a single argument,
accepts only two arguments, unlike its Munger counterpart.
.Bl -column -offset left "Intrinsic" "(>= expr1 expr2)" "absolute value"
.It Sy Intrinsic Ta Sy Use Ta Sy Return Value
.It Li eq        Ta (eq expr1 expr2)  Ta 0 or 1
.It Li <         Ta (< expr1 expr2)   Ta 0 or 1
.It Li <=        Ta (<= expr1 expr2)  Ta 0 or 1
.It Li >         Ta (> expr1 expr2)   Ta 0 or 1
.It Li >=        Ta (>= expr1 expr2)  Ta 0 or 1
.It Li +         Ta (+ expr1 expr2)   Ta sum
.It Li -         Ta (- expr1 expr2)   Ta difference
.It Li *         Ta (* expr1 expr2)   Ta product
.It Li %         Ta (% expr1 expr2)   Ta remainder
.It Li /         Ta (/ expr1 expr2)   Ta quotient
.It Li abs       Ta (abs expr)        Ta absolute value
.El
.Pp
Note that "stringify" accepts only one argument, which must evaluate to a
fixnum.
.Bl -column -offset left "stringify" "(stringify expr)" "string representation of expr"
.It Sy Intrinsic Ta Sy Use Ta Sy Return Value
.It Li stringify    Ta (stringify expr)  Ta string representation of expr
.It Li numberp      Ta (numberp expr)    Ta 0 or 1
.It Li char         Ta (char expr)       Ta one-character string
.El
.Ss I/O
These are the general I/O functions.  Note that both "getline" and
"readchars" return 0 upon encountering EOF, and the empty string on error.
"flush" does what "flush_stdout" does in Munger.
.Bl -column -offset left "display_error" "(display_error expr)" "value of last expr"
.It Sy Intrinsic Ta Sy Use Ta Sy Return Value
.It Li print         Ta (print expr ...)        Ta 1
.It Li println       Ta (println expr ...)      Ta 1
.It Li flush         Ta (flush)                 Ta fixnum
.It Li die           Ta (die ...)               Ta does not return
.It Li warn          Ta (warn expr ...)         Ta 1
.It Li getline       Ta (getline)               Ta string or 0
.It Li readchars     Ta (readchars expr)        Ta string or 0
.El
.Pp
These intrinsics redirect the standard descriptors to files and
processes.  They return 1 upon success, or a string describing an error
condition.
.Bl -column -offset left "with_output_file_appending" "(with_output_file_appending file expr ...)"
.It Sy Intrinsic Ta Sy Use
.It Li pipe                       Ta (pipe desc program)
.It Li with_input_process         Ta (with_input_process program expr ...)
.It Li with_output_process        Ta (with_output_process program expr ...)
.It Li redirect                   Ta (redirect desc file appending)
.It Li with_input_file            Ta (with_input_file file expr ...)
.It Li with_output_file           Ta (with_output_file file expr ...)
.It Li with_output_file_appending Ta (with_output_file_appending file expr ...)
.It Li resume                     Ta (resume desc)
.El
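.Pp
A minimal, untested sketch of checking the result of a redirection.
It assumes that "with_output_file" evaluates its expressions with standard
output sent to the named file, as its name suggests, and that "#" begins a
comment line; the file name out.txt is hypothetical.
.Bd -literal -offset left
(setq result
  (with_output_file "out.txt"
    (println "first line")
    (println "second line")))

# 1 means success; any string describes the error.
(when (stringp result)
  (die result))
.Ed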
.Ss System-Related
"random" returns a fixnum in the range 0 to one less than its argument.
"setenv" returns the value returned by
.Xr setenv 3 ,
so 0 indicates success.
The "time" intrinsic returns a string representation of the UNIX time
value, padded with leading zeros to sixteen characters, so that two such
strings may be compared with "strcmp".
The "stat" intrinsic returns a five-element stack of strings:  owner name
or uid, group name or uid, time of last access, time of last modification,
and size, with the time values formatted in the same way as those returned
by "time".
The "date" intrinsic returns a textual representation of the current date
and time.
.Bl -column -offset left "directory" "(getenv str str)" "stack of filenames"
.It Sy Intrinsic Ta Sy Use Ta Sy Return Value
.It Li basename      Ta (basename path)      Ta string
.It Li dirname       Ta (dirname path)       Ta string
.It Li directory     Ta (directory expr)     Ta stack of filenames
.It Li symlink       Ta (symlink from to)    Ta 0 or error string
.It Li rename        Ta (rename from to)     Ta 0 or error string
.It Li remove        Ta (remove expr)        Ta 0 or error string
.It Li rmdir         Ta (rmdir expr)         Ta 0 or error string
.It Li stat          Ta (stat expr)          Ta stack or error string
.It Li setenv        Ta (setenv str str)     Ta fixnum
.It Li getenv        Ta (getenv string)      Ta string or 0
.It Li system        Ta (system string)      Ta 0 or error code
.It Li exec          Ta (exec expr)          Ta does not return
.It Li fork          Ta (fork)               Ta same as fork(2)
.It Li random        Ta (random expr)        Ta fixnum
.It Li time          Ta (time)               Ta string
.It Li date          Ta (date)               Ta string
.El
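.Pp
Because "stat" reports times in the same zero-padded format as "time", two
modification times can be ordered with "strcmp".
The untested sketch below defines a hypothetical make-like test, newer, that
returns 1 when file a was modified more recently than file b.
It assumes that toplevel functions are defined with "setq", that "#" begins
a comment line, and that "strcmp" returns a positive fixnum when its first
argument sorts after its second, as
.Xr strcmp 3
does.
.Bd -literal -offset left
(setq newer
  (lambda (a b)
    (let ((sa (stat a))
          (sb (stat b)))
      # stat returns an error string on failure.
      (if (and (stackp sa) (stackp sb))
          # Index 3 holds the time of last modification.
          (> (strcmp (index sa 3) (index sb 3)) 0)
          0))))
.Ed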
.Ss Command-Line Args
These function identically to their Munger counterparts.
.Bl -column -offset left "previous" "(previous)" "0 or string"
.It Sy Intrinsic Ta Sy Use Ta Sy Return Value
.It Li next      Ta (next)      Ta 0 or string
.It Li previous  Ta (previous)  Ta 0 or string
.It Li current   Ta (current)   Ta string
.It Li rewind    Ta (rewind)    Ta string
.El
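.Pp
An untested sketch of walking the command-line arguments with "next", which
returns 0 when none remain.
The name print_args is hypothetical, toplevel functions are assumed to be
defined with "setq", and an empty-string argument would also end the walk,
since the empty string is false.
.Bd -literal -offset left
(setq print_args
  (lambda ()
    (let ((arg (next)))
      (when arg
        (println arg)
        (print_args)))))

(print_args)
.Ed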
.Ss Strings
Unlike the Munger "split", the MM "split" cannot explode a string into
one-character strings.  Use the "explode" intrinsic, which returns a stack
of one-character strings, instead.
.Bl -column -offset left "expand_tabs" "(substring string expr1 expr2)" "stack of strings"
.It Sy Intrinsic Ta Sy Use Ta Sy Return Value
.It Li chop      Ta (chop expr)                    Ta string
.It Li chomp     Ta (chomp expr)                   Ta string
.It Li length    Ta (length expr)                  Ta fixnum
.It Li digitize  Ta (digitize expr)                Ta fixnum
.It Li code      Ta (code expr)                    Ta fixnum
.It Li explode   Ta (explode expr)                 Ta stack of strings
.It Li stringp   Ta (stringp expr)                 Ta 0 or 1
.It Li join      Ta (join delim expr ...)          Ta string
.It Li split     Ta (split delims string [limit])  Ta stack of strings
.It Li concat    Ta (concat expr1 expr2 ...)       Ta string
.It Li substring Ta (substring string expr1 expr2) Ta string
.It Li strcmp    Ta (strcmp expr1 expr2)           Ta fixnum
.It Li expand_tabs Ta (expand_tabs expr1 string)   Ta string
.El
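.Pp
The untested sketch below behaves roughly like awk '{ print $2 }', printing
the second space-delimited field of each input line.
The name print_field2 is hypothetical.
The sketch assumes that toplevel functions are defined with "setq" and that
"#" begins a comment line; whether "split" yields empty fields for runs of
spaces is not specified here.
.Bd -literal -offset left
(setq print_field2
  (lambda ()
    (let ((line (getline)))
      (when line
        # Strip the newline, then split on spaces.
        (let ((fields (split " " (chomp line))))
          (when (> (used fields) 1)
            (println (index fields 1))))
        (print_field2)))))

(print_field2)
.Ed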
.Ss SQLite
These functions provide the interface to the SQLite library.  Only one
database file may be open at any one time.  The database handle is managed
internally by the runtime engine.  Column data is returned as a stack of
strings.
.Bl -column -offset left "sqlite_finalize" "(sqlite_bind expr expr expr)" "sql object or string"
.It Sy Intrinsic       Ta Sy Use                       Ta Sy Return Value
.It Li sqlite_open     Ta (sqlite_open expr)           Ta error string or 1
.It Li sqlite_close    Ta (sqlite_close)               Ta 0 or 1
.It Li sqlite_exec     Ta (sqlite_exec expr)           Ta stack or string
.It Li sqlite_prepare  Ta (sqlite_prepare expr)        Ta sql object or string
.It Li sqlp            Ta (sqlp expr)                  Ta 0 or 1
.It Li sqlite_bind     Ta (sqlite_bind expr expr expr) Ta 0, 1 or string
.It Li sqlite_step     Ta (sqlite_step expr)           Ta 1 or string
.It Li sqlite_row      Ta (sqlite_row expr)            Ta stack or string
.It Li sqlite_reset    Ta (sqlite_reset expr)          Ta 1 or string
.It Li sqlite_finalize Ta (sqlite_finalize expr)       Ta 1 or string
.El
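.Pp
The untested sketch below opens a database, runs one statement, and closes
the database, treating a string result as an error message.
The file name test.db and the table definition are hypothetical.
It assumes that a string returned by "sqlite_exec" signals an error and
that "#" begins a comment line.
.Bd -literal -offset left
# sqlite_open returns 1 on success or an error string.
(setq rc (sqlite_open "test.db"))
(when (stringp rc)
  (die rc))

(setq rc (sqlite_exec "CREATE TABLE IF NOT EXISTS words (w TEXT)"))
(when (stringp rc)
  (die rc))

(sqlite_close)
.Ed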
.Sh AUTHORS
.An James Bailie Aq jimmy@mammothcheese.ca
.br
http://www.mammothcheese.ca
