Collaborative articulation of how abstraction and language is employed in the computational manifestation of numbers -- including analysis of the role of syntax, semantics, and meaning in the specification and use of software interfaces.

[update 2008-05-06T01:23Z: I corrected a small typo before linking to this post from another one in this series on Miser notions. I also included a reference to the work of Church where he introduces f(x, y) as equivalent to (f x) y.]

update 2008-04-12T17:49Z: I am still niggling with this post. There will be a separate post on some of these ideas, but I wanted to add something about being committed to treating functions as definite mathematical objects and viewing procedures as algorithms for (some of) them, even though we don't have a way to uniquely identify and distinguish functions (or procedures, for that matter). I think I'm content with this for now.

update 2008-04-12T02:45Z: After completing this post, I realized that all expressions that stand for functions are ones that characterize functions, but none of them "are" the function. I have revamped the post, today, to emphasize the introduction of functions by characterizing them. This makes some of the prose very awkward, especially if you just wanted to learn Frugal and use a Miser engine to produce something useful. It seems like a necessary foundational exercise for me, to ensure that I have successfully wrestled the relevant concepts to the ground. I took this occasion to do some tweaking to make some other points more clear, although adding the fuss about functions only being known by their characterization is probably a net loss. -- dh.]

On occasion, I mention that Miser is an applicative-programming system. There is a connection with the notion of functional programming. I want to clarify that terminology and, especially, illustrate the impact it has on notations for programming in applicative style.

The key idea of applicative systems is that the fundamental operation is one of application of functions and their arguments or operands. This also involves the notion that functions are definite (mathematical) objects.

Consider, for example, the familiar notation f(x) from mathematics. In mathematics, this notation tends to be used

when the function is being used(that is, mentioned) in some larger expression and

when it is being describedor characterized in terms of some essential relationships:

(1) ( f(x_{2}) - f(x_{1}) ) / (x_{2} - x_{1})

(2) f(i) = f(i-1) + f(i-2); i > 1

f(1) = 1; f(0) = 0

illustrating the respective cases. In (2), f(n) is known as the n-th Fibonacci Number; we will allow ourselves to say that function f determines the Fibonacci Numbers.

In logic and mathematics, the mixture of usages to define/represent/characterize a function and to appeal to a function is commonplace. It is generally rare to consider f as identifying the function itself, as if functions have an independent existence, with f(x) as separately identifying an application of it.

In modern mathematics (i.e., that of the last 100 years or so), it has become useful to consider functions more directly. It happens

when defining the type of the function and

when characterizing the function in the form of an independent entity:

(3) g = Y λfλi.(ifi > 1 thenf(i-1) + f(i-2) else 1)

The form (1) asserts that the function f has its domain of arguments in the natural numbers, N (according to the usual convention), and its range of results (→) is also in the natural numbers, N.

It is very easy to think of functions as depicting operations that produce something, thanks to the prevalence of computation in modern life, in teaching, and in our learning arithmetic (computation) before algebra and higher mathematics. Programming languages use "function" in their codes and their descriptions in a way that adds to this collapsing-together of functions, algorithms, procedures, and program code.

A more abstract way to think of individual functions is as specific relationships or correspondences of each distinct member of the domain with at most one member of the range. An element of the range may stand in the functional relationship to multiple members of the domain, but never vice versa. It is also possible that a function might not be defined at all, have no corresponding member of the range, for a given member of the domain.

Although this sense of functions as correspondences is perhaps more contemporary, our ordinary way of speaking is in terms of actions that determine results from given arguments. It is easy to addpresume causality, speaking as if functions produce members of the range from members of the domain. The notations used in computer programming encourage this confusion of relationship and action or procedure. I notice that I tend to speak of the member of the range being determined by the function together with the selection of a member of the domain.

Computer folk can also think of this correspondence/relation characterization of functions as being realized by a database table with unique keys, each row having a field for the range value that is determined for the domain value represented by each key. This works in the case of finite domains and ranges, but we need something else when there is no such limitation (as is often the mathematical case).

In all of the forms that we've shown, the function is not being exhibited, it is being characterized. There are many characterizations of the same function and many different ways to characterize functions (as we are seeing here). There is no way to see or get your hands on "the function" because the concept is, in effect, abstracted from the relationship that it embodies (abstractly speaking). Nevertheless, we seem to be able to grasp what many functional relationships are from succinct characterizations of them. We rely on functional relationships in day-to-day use of mathematical entities (such as numbers) without having to think about all of this very much.

I'm bringing up this cloudy subject now because digging into the foundations of computing require that we have a sharper notion about functions and what it takes to trace the relationships from domain to range by computation.

The form 1.1(2) involves a λ-expression (pronounced lambda-expression). This particular one, interpreted as the characterization of a function, happens to refer to the function that the expression is intended to characterize (singled out on the left) in the characterizing expression itself (on the right). The given characterization of f is recursive.

The form 1.1(3) specifies the same function in a way that has the characterization be completely accomplished in the right-hand side expression, with no appeal to the name of the function on the right-hand side of the equation. The special Y-operator is a function which transforms other functions into ones that operate recursively. In this case,

illustrating that applications of functions having functions as operands can transform functions to other functions (as characterized).

The λ-calculus is important in Frugalese and for Miser. It is used here to emphasize that there are ways to characterize functions via self-contained λ-expressions. These λ-expressions can appear anywhere in the left or right of application operations, whether as functions, as in

(Y λfλi.(ifi > 1 thenf(i-1) + f(i-2) else 1)) n

or as operands, as

λfλi.(ifi > 1 thenf(i-1) + f(i-2) else 1)

is in

Y λfλi.(ifi > 1 thenf(i-1) + f(i-2) else 1).

The use of "=" with regard to formulae that characterize functions is a bit different than the "=" where we tend to think that a unique determination is happening. To remove any confusion between uniquelly-expressible mathematical entities (such as natural numbers) and the more-ellusive functions, we could have used "≡" or "∽" or some other indicator of equivalence. We won't be so fussy. So how do we determine that two characterizations that are not identical are still for the same function?

The usual approach is some version of the following procedure.

If it can be shown that f(x) = g(y) whenever x = y, then f = g, and vice versa. (We are begging some questions here, but that is the gist of it.)

In applicative systems functions are treated as definite mathematical entities that are equally usable as operands (something a function can be applied to) and as functions (something that can be applied to an operand).

characterizes a function that is very nasty to compute (using the formula as a rule of computation) for increasing values of its natural-number operand.

By having functions be first-class candidates for use as operands and as functions, we have great freedom in characterizing functions, including in terms of application of functions in the establishment of other functions.

Although this seems pretty esoteric, it is useful to keep in mind that programming languages and compilers are literally embodiments of functions that derive (procedure codes for algorithms for) functions from their data (the source-program texts). There is a deep connection with some of the fundamental power of digital computers, and Miser is one way to explore that.

Frugalese is a language for expressing computations carried out by applicative systems such as Miser. Miser is a computational system, not a mathematical one. There is a mathematical connection, and that motivates our giving so much attention to the business of characterizing functions and what that does and does not accomplish.

For Miser and Frugalese, the fundamental operation is called "apply." Apply has two operands. Both operands are Obs. The first Ob is interpreted as a coding for the procedure to be followed. The second Ob is the data that the procedure will manipulate in some way.

We need to be clear, here, about what we mean by computation of a function. We mean that in the usual sense. I have here an operand, the representation of of a (definite) member of the function's domain, and I want to know the (definite) member of the function's range that the function determines, what we commonly refer to as "the result" of the function. An algorithm for a function, if we know one, is a (computational) procedure for arriving at the function-determined result given any valid operand. A coding of the procedure is an expression as data in a form that a computer can follow.

What puts the "functional programming" and "applicative-programming" into Miser and Frugalese is that one kind of computation of a function gives rise to the code for a procedure which can itself be used in the computation of another function, and so on. This doesn't change the computational idea, it just includes some very powerful and important cases at the heart of the power of digital computers.

Because of the connection between functions and procedures for their calculation, we re-use the mathematical notions as computational ones:

The Frugalese notation

pa

signifies an apply operation, where

p determines the Ob used as the first operand (the procedure code) and

a determines the Ob taken as the apply's second operand (the data to the procedure).

Apply operations are the Miser embodiment of the stored program principle:

Every Miser Ob can be used as either first or second operand of an apply.

Programs can treat data as programs and programs as data.

Programs can produce data results that works as programs.

The sense in which an Ob is being used as program or data is entirely determined by context. The same Ob may be used one way or the other at any instant.

Note that the apply operation is signified by simple juxtaposition of the term signifying the procedure and the term signifying the data.

What justifies our appropriating the mathematical notion of functional application as a notation for computation?

It is this connection.

If we can show that (Frugalese) (p a) = (q b) whenever a = b (a and b are the same Obs), then Obs p and q code (not necessarily the same) procedures for the same function: the same function is computed.

We cannot go so far as to say that p = q, because the codes can be different. Because we are here using "=" for identity of Obs, we would need a different symbol to signify that two obs taken as procedures are procedures for the same function. There is no such symbol in Frugalese, however. One motivation of the Miser Project is to sharpen our understanding of why there cannot be such a determination by computational means.

The vice versa case always applies in Miser and Frugalese, of course. If (as Obs), p = q, then (pa) = (pb) whenever (as Obs) a = b (and a result is determined by whatever p is, taken as code of a Miser procedure).

The Miser apply operation is also called ob.ap. It has been sketched a little in earlier posts.

From now on, we are illustrating Frugalese, not mathematical notation.

Parentheses are used when it is necessary to control grouping into the operands for an apply operation:

pa

p(a)

(p) a

(p)(a)

all work.

In general,

pqra

and p( q( r(a) ) )

express the same applicative operations. We will often use variations of the streamlined form, pqra.

These are not equivalent to pqra:

(pqr) a

(pq)(ra)

[Historical Note: Some applicative systems have pqra be equivalent to ((pq) r) a. That is not the case for Frugalese. If you want that form, you will have to write it that way (or use the alternative form introduced below). If you run into one of these systems, you will have to be careful when transliterating between that notation, Frugalese, and vice versa.]

In Frugalese there are no additional operands in a single apply operation. In Miser terms, there's one Ob for the procedure and one Ob for the data. That's it. One way to introduce multiple-operands of functions is by multiple apply operations:

p(q, r, a) is equivalent to ((pq) r) a

keeping in mind that however p is constructed, it has to be such that we end up with what we have in mind for p(q, r, a).

In general,

f(x_{1}, x_{2}, ..., x_{n})

= f(x_{1}) (x_{2}, ..., x_{n})

= f(x_{1}, x_{2}) (..., x_{n})

= f(x_{1}, x_{2}, ...) (x_{n})

= (... (fx_{1}) x_{2}) ...) x_{n}.

This particular device was introduced by Alonzo Church, the inventor of the λ-calculus [1]. The basic idea is as follows.

In an applicative expression such as

plus(1, 2)

the 1 and 2 are not two operands of the function plus, the applicative interpretation is

plus(1) 2

where plus(1) is a new procedure that is then applied to to 2 to determine the result.

If we're very serious about wanting to have a procedure that works on several elements of data at once, rather than deliver intermediate procedures, we can bundle several operand values into a list which is then used as a single data operand in the apply operation:

sum[1, 2, 3]

has the single operand [1, 2, 3], which is a list of the individual operand values 1, 2, and 3.

The choice between an applicative evaluation such as

plus(plus(1, 2), 3)

and sum[1, 2, 3]

will depend on the context and whether it is important to single out plus(plus(1,2)), say, as an important intermediate function. (This is a trivial case meant to illustrate a broader useful practice.)

In the previous section, there is an additional shorthand that needs to be explained.

pqra

is p( q( r(a) ) )

but pq(r) a

is p((qr) a) or pq(r, a)

and not p( q( (r)(a))).

The rule is that bracketed elements are data of the procedure immediately to the left, if any. This is worked out in left-to-right order. For example,

sp(q)(r) a

= s( ((pq) r) a )

= sp(q, r, a)

= sp(q, r) a

among others. The choice of particular form depends on the circumstances, the need for parentheses to control essential grouping, and how one might want the features of the applicative expression to stand out.

For Miser, all of these variations are simply syntactic sugar, a sweetening of expression sequence and bracketed groupings that always boils down to a particular combination of apply operations. The many different ways to group in applicative expressions is a way to express some sense of what the purpose of the expression is. At some point, there are variants of the notation that I find myself using quite naturally and in preference to conventional functional notation.

The grouping rules are straightforward but it is easy to make mistakes with them. Small differences can have a big impact. For example, it is inappropriate to rewrite

sp(q, r) a

as s (p)(q, r) a

which is the same as s(p, q, r) a instead of s(p(q, r) a).

So long as everything is an Ob in the use of Frugalese notation for Miser, there is very little to help detect unintended combinations. The beauty of Miser is that everything is an Ob and Obs are interchangeable as data and as procedure (codes). The difficulty of Miser is that there is no such thing as an inadmissible apply combination.

This also means that there is no perfect way to pretty-print any Miser applicative expression, although using a pretty-printer transformation into some sort of standard form might reveal an unintended consequence. That's something worth looking at.

There are more notational features in Frugalese. The focus here has been on the different ways that apply operations can be expressed for carrying out in a Miser implementation. The purpose, beside taking a try at explaining it, is to equip ourselves to start putting the notation to good use.

Here Church introduces λ-expressions as part of exploring a bigger question with regard to the unavailability of effective procedures for an infinitely large set of problems. The original formulation of Church's Thesis is a footnote in this paper. Church's tendency was to bracket formulae for functions in {...} and for operands in (...), stating: "A formula {F}(X) may be abbreviated as F(X) in any case where F is or is represented by a single symbol. A formula {{F}(X)}(Y) may be abbreviated as {F}(X, Y), or, if F is represented by a single symbol, as F(X, Y). And ... ." Here Church is careful to use F, X, Y, etc., as variables for formulas that may be substituted in the given forms.