Collaborative articulation of how abstraction and language is employed in the computational manifestation of numbers -- including analysis of the role of syntax, semantics, and meaning in the specification and use of software interfaces.

Find a decent way to have Windows Live Writer produce pretty-printed code that appears cleanly on both blog pages and in my RSS feeds.

Refine my definition of a function that has a critical role in the Miser Project.

Object to a strange set of commandments about the size and other characteristics of "functions," exploring how to make them work in functional programming while recognizing the compromises involved.

How Big Should a Function Be? When I first saw the commandment to describe computational functions in around 5 lines, I thought it was a joke. With little consideration for the language or the situation, I found the injunction about keeping function descriptions small (as opposed to functional) simply bizarre. Yet it is from a very smart fellow [1], and I was stopped in my tracks. Although Martin concedes that this is situational, there is no guidance about that. I worry that budding cybersmiths will take this as gospel and a commandment not to be broken.

The echo chamber in the comments, and the dismissal of those who objected to context-free generalizations is surprising. That supports my concern. There's too much fussiness over indentation, nesting of conditionals and other layout considerations. This seemed to flaunt the principled sophistication of guidelines that I favor, such as the C++ Coding Standards of Sutter and Alexandrescu [2].

Breaking the Rules on Principle

Here's the test case that came to mind as soon as I read Martin's article.

The programming notation is Frugalese, an informal, unimplemented language that I use to illustrate functional-programming concepts. That's unimportant. It is important that it is neither Java nor C# and it is completely functional. Frugalese is an applicative notation for computation: the fundamental operation consists of carrying out a procedure described by one operand against the data represented by the second.

The operation ob.ap[f, x] defines the basic interpretive process of oMiser[3]. The operand f is taken as a program and the operand x is taken as data which the program is to process. In oMiser these are both data structures called obs. The Frugalese example is an applicative expression of the oMiser interpreter's basic applicative operation. (That is, we are using an applicative-language notation to describe the implementation of an applicative-machine interpreter. There's no harm in that and there is no chicken-and-egg problem: the oMiser implementation used by oFrugal is not implemented in oFrugal or any higher-level Frugalese language. But ob.ap, below, is a faithful simulation, capturing the essential behavior and functionality.)

I choose this example because I am in the process of rethinking its formulation and how to present it in a direct way. It struck me that this definition illustrates the difficulties and trade-offs of arbitrarily choosing smaller function (procedure) definitions over larger ones.

In this example, there is only the one function, ob.ap, that is defined for use from elsewhere. There are 23 lines, not counting blank lines introduced for layout purposes:

1: defrec/* SELF-CONTAINED SINGLE FUNCTION DEFINITION */ 2: 3: ob.ap[f, x] /* Apply the procedure defined by f to operand x */ 4: 5: = if is-individual(f) /* Atomic (single-instruction) operation */ 6: 7: theniff = ob_Athen ob.a x /* a-part of x */ 8: 9: eliff = ob_Bthen ob.b x /* b-part of x */ 10: 11: eliff = ob_Ethen ob.e x /* enquote x */ 12: 13: else ob.c(f, x) /* prefix f to x, etc. */ 14: 15: elif is-singleton(f) then ob.a f /* quoted constant */ 16: 17: else eval f /* Evaluate applicative form f */ 18: 19: where rec eval p 20: 21: = if is-singleton(p) /* argument reference */ 22: 23: then ifp = ob_ARGthenx 24: 25: elifp = ob_SELFthenf 26: 27: else ob.a p /* or (quoted) literal */ 28: 29: elseletop = ob.a p, 30: 31: arg = ob.b p 32: 33: inifop = ob_C /* pairing form */ 34: then ob.c( eval ob.a arg, 35: eval ob.b arg ) 36: 37: elifop = ob_D /* comparison form */ 38: then ob.d( eval ob.a arg, 39: eval ob.b arg ) 40: /* applicative form */ 41: else ob.ap[ eval op, eval arg ];

The Principle Function

There are at least two functions defined above. The main one is ob.ap[f, x]. This definition is self-referential (signified by defrec), where the defined function is referenced within its definition at line 41. The operation of evaluating this function can lead to multiple recursive evaluations using the same function. The essence of the function is this simple structure:

1: defrec 2: 3: ob.ap[f, x] /* Apply the procedure defined by f to operand x */ 4: 5: = if is-individual(f) 6: 7: then /* do the simple cases for f being simple */

17: else eval f /* handle the case of f being a composed form */

But Wait, There's More

The function for the composed-form evaluation, eval(f), is defined in the following lines. Here's the key definition:

1: defrec 2: 3: ob.ap[f, x] /* Apply the procedure defined by f to operand x */

...

19: where rec eval p /* Evaluate applicative form p */ 20: 21: = if is-singleton(p) /* argument reference */ 22: 23: then ifp = ob_ARGthenx 24: 25: elifp = ob_SELFthenf 26: 27: else ob.a p /* or (quoted) literal */ 28: 29: else /* cases for composite p */

41: ... ob.ap[ eval op, eval arg ];

This function is being defined local to the fragment

17: else eval f /* handle the case of f being a composed form */

That is, the function is only usable (and only defined for use) in conjunction with the expression eval f.

The where introduces a local definition and it introduces it right there. This establishes that this is the only place that definition is required. This makes the dependency exactly clear in both directions: ob.ap(f, x) only requires eval in that one place and eval is only used in one way (other than the ways that eval is recursive).

Equally important is the fact that eval depends on the parameters of ob.ap[f, x]. This is at lines 23 and 25. This is another isolated dependency. It is global to eval yet local to the instance of ob.ap[f, x] in which line 17 is reached. In effect, the definition of eval(p) occurs dynamically in each instance of ob.ap[f, x] evaluation. That instance-specific definition is reused for all recursions of the defined eval.

These kinds of "inner function" have been around since the introductions of the LISP and ALGOL 60 programming languages.

This use of the inner function, a recursive one at that, is very economical. It may not the clearest, but it emphasizes locality and cohesion that are important for this particular algorithm.

Breaking Up Is Maybe Not So Hard To Do?

It is possible to break out the subfunctions used in defining ob.ap[f, x]. These considerations must be weighed:

The contracted interface and definition is that of ob.ap[f, x]. Everything else is incidental and the definition can be replaced by anything that preserves the required behavior. In one sense, this makes refactoring of the interior of the definition permissible. But,

The exposure of additional interfaces creates a separation of concerns that is not required by the definition of ob.ap[f, x]. When a subfunction is no longer local, there is the problem of the maintenance of the dependency with ob.ap[f, x].

The extraction of a subfunction requires expansion of variability beyond that required by locally-incorporated subfunctions. This makes the extracted subfunction more complex than the embedded one.

To see what's involved, pull the definition of eval p outside of the definition of ob.ap[f, x]:

1: defrec 2: 3: ob.ap[f, x] /* Apply the procedure defined by f to operand x */ 4: 5: = if is-individual(f) /* Atomic (single-instruction) operation */ 6: 7: theniff = ob_Athen ob.a x /* a-part of x */ 8: 9: eliff = ob_Bthen ob.b x /* b-part of x */ 10: 11: eliff = ob_Ethen ob.e x /* enquote x */ 12: 13: else ob.c(f, x) /* prefix f to x, etc. */ 14: 15: elif is-singleton(f) then ob.a f /* quoted constant */ 16: 17: else ev[f,x] f, /* Evaluate applicative form f */ 18: 19: ev[f,x] p 20: 21: = if is-singleton(p) /* argument reference */ 22: 23: then ifp = ob_ARGthenx 24: 25: elifp = ob_SELFthenf 26: 27: else ob.a p /* or (quoted) literal */ 28: 29: else... /* UH OH */ 30:

We have succeeded in pulling out ev[f, x] p in place of the internal eval p definition.

The additional parameters, [f, x], are required because we are no longer in any context where those variables are visible globally. In effect, it is necessary to replace eval everywhere in the definition of eval p with ev[f, x].

I don't want to write a redundant ev[f, x] for every eval in the original version. There is not meant to be any variability here, so I choose another approach. This involves defining a third function. There are many other ways to introduce that third function. My choice takes advantage of the fact that a let... in form is equivalent to introduction of a function definition and application:

1: defrec/* INTER-DEPENDENT MULTIPLE-FUNCTION DEFINITION */ 2: 3: ob.ap[f, x] /* Apply the procedure defined by f to operand x */ 4: 5: = if is-individual(f) /* Atomic (single-instruction) operation */ 6: 7: theniff = ob_Athen ob.a x /* a-part of x */ 8: 9: eliff = ob_Bthen ob.b x /* b-part of x */ 10: 11: eliff = ob_Ethen ob.e x /* enquote x */ 12: 13: else ob.c(f, x) /* prefix f to x, etc. */ 14: 15: elif is-singleton(f) then ob.a f /* quoted constant */ 16: 17: else ev[f,x] f, /* Evaluate applicative form f */ 18: 19: ev[f,x] p 20: 21: = if is-singleton(p) /* argument reference */ 22: 23: then ifp = ob_ARGthenx 24: 25: elifp = ob_SELFthenf 26: 27: else ob.a p /* or (quoted) literal */ 28: 29: else evform(ev[f,x], ob.a p, ob.b p), 30: 31: evform(eval, op, arg) 32: 33: = ifop = ob_C /* pairing form */ 34: then ob.c( eval ob.a arg, 35: eval ob.b arg ) 36: 37: elifop = ob_D /* comparison form */ 38: then ob.d( eval ob.a arg, 39: eval ob.b arg ) 40: /* applicative form */ 41: else ob.ap[ eval op, eval arg ];

This refactoring works easily because ev[f, x] is a legitimate intermediate result that can be defined and used as an individual operand.

The use of evform(ev[f, x], ob.a p, ob.b p) in the definition of ev[f, x] p is very much like saying

29: else let eval = ev[f,x], 30: op = ob.a p, 31: arg = ob.b p 32: 33: in ifop = ob_C /* pairing form */ 34: then ob.c( eval ob.a arg, 35: eval ob.b arg ) 36: 37: elifop = ob_D /* comparison form */ 38: then ob.d( eval ob.a arg, 39: eval ob.b arg ) 40: /* applicative form */ 41: else ob.ap[ eval op, eval arg ];

This second form preserves locality at the cost of additional indentation and coupling directly into the superior function. At the same time, it avoids surfacing a function that is only needed in one place.

Confessions and Compromise

It is intriguing that the form with definitions of three functions satisfies Martin's requirements[1].

Confession: When I explain the operation of ob.ap[f, x], I do so using auxiliary function definitions[3]. I use more of them than the three factored out here. One important reason for additional functions is the prospect that some of them are extended as more features are added to the system.

Compromise: When I want to control the coupling among functions, it is important to have an explicit way to reflect that. Locality of auxiliary function definitions allows that. My compromise version would preserve the three functions for explanatory purposes in this locality-controlling form:

1: defrec/* SELF-CONTAINED WITH AUXILARY FUNCTION DEFINITIONS */ 2: 3: ob.ap[f, x] /* Apply the procedure defined by f to operand x */ 4: 5: = if is-individual(f) /* Atomic (single-instruction) operation */ 6: 7: theniff = ob_Athen ob.a x /* a-part of x */ 8: 9: eliff = ob_Bthen ob.b x /* b-part of x */ 10: 11: eliff = ob_Ethen ob.e x /* enquote x */ 12: 13: else ob.c(f, x) /* prefix f to x, etc. */ 14: 15: elif is-singleton(f) then ob.a f /* quoted constant */ 16: 17: else ev[f,x] f, /* Evaluate applicative form f */ 18: 19: where ev[f,x] p 20: 21: = if is-singleton(p) /* argument reference */ 22: 23: then ifp = ob_ARGthenx 24: 25: elifp = ob_SELFthenf 26: 27: else ob.a p /* or (quoted) literal */ 28: 29: else evform(ev[f,x], ob.a p, ob.b p), 30: 31: where evform(eval, op, arg) 32: 33: = ifop = ob_C /* pairing form */ 34: then ob.c( eval ob.a arg, 35: eval ob.b arg ) 36: 37: elifop = ob_D /* comparison form */ 38: then ob.d( eval ob.a arg, 39: eval ob.b arg ) 40: /* applicative form */ 41: else ob.ap[ eval op, eval arg ];

The use of where at the "top level" is a bit like saying private in the method declarations for a class. But we do not need to introduce classes, saving us from having to consider the interactions between access specifiers and inheritance.

More confession: I just made up this additional use of where as an "access" specifier similar to use of rec to impose self-reference in definitions. I invented it on the spot. (The internal form of where clause has been known at least since the work of Burge and Landin[4, 5] and was used by Strachey) I introduced this specifier flavor because I am jealous of the improved readability of the interdependent multi-function definition form.

I will pay a penalty in the complication of the Frugalese grammar and in explanation of how this use of where impacts the definitions in a definition list where it occurs.

This device is not going to assure that Martin's injunction can or should be satisfied in all cases[1]. I'm pleased that I can take advantage of that style without surrendering localization of interdependencies among function definitions. I'm going to use this Frugalese feature much more before I am satisfied that it carries the day well enough.

In testing Martin's guidelines against a specific example of my own, I found a way where following the rules slavishly would surrender other qualities of functional definitions. Since I have the privilege of defining the Frugalese notation, I am able to adopt a compromise approach that keeps auxiliary functions auxiliary while flattening the layout and the presentation. It remains to be determined whether this addition is a significant improvement overall.

"Functions should be small. Very small. A well written function might be 4, 5, perhaps 8 lines long; but no longer. ... Specifically, functions that process information as part of an algorithm should be very small." The ob.ap[f, x] definition is definitely that of an algorithm that processes information, so it is not at all clear to me how that fact correlates with having to be small, to have a brief description. Even though I manage to break out the subfunctions, I could not be satisfied without also enforcing their auxiliary nature. "High level functions should be followed by lower level functions that should be followed by still lower level functions. The calls should all point downwards in the source file if possible." This is certainly satisfied in the definition of ob.ap[f, x] in all of the variations shown. "Of course these rules cannot be followed 100% of the time, but they are good rules nonetheless. Our goal, as programmers, is to write programs that humans can understand and maintain." is a reasonable statement, but for its sequel that seems to suggest there is an obvious boundary between "good" and "bad": "Remember, a product that works, but that has a bad internal structure is a bad product." I suppose my view can be summarized as a willingness to support such practices along with a reluctance to enforce them.

The principle that struck me in this context was "0. Don't sweat the small stuff." Some others that apply to this example are "5. Give one entity one cohesive responsibility," "10. Minimize global and shared data," "18. Declare variables as locally as possible," and "20. Avoid long functions. Avoid deep nesting" (with statement of the principle involved). You can decide how well "6. Correctness, simplicity, and clarity come first."

[3] Dennis E. Hamilton: oMiser Sketch [latest], Miser Theory Notes page N020600f 0.05 2007-05-03.

I am in the process of shrinking and correcting much of this material, my motivation for finding a succinct statement of ob.ap(f, x).

[4] William H. Burge: Recursive Programming Techniques. Addison-Wesley (Reading, MA: 1975). ISBN 0-201-14450-8.

I rely on this approach to functional programming for many idioms and test cases for Frugalese. The structure of Frugalese is different from ISWIM in several respects, although it is generally easy to translate between the notations.

The refactoring performed between my variations of ob.ap[f, x] definition is based on a few of the rewriting cases described by Peter Landin in this paper. Although I also use indentation heavily when writing Frugalese, that is not essential to the language (so far).