2025 Jun 3

> Space-grouping expressions

While listening to an episode of Software Unscripted, I was reminded of a little language syntax I had hashed out a while ago, which I thought might be nice to write up here. Specifically, at around 1:55:00, they were discussing some syntax that looks like x |> y.z() and whether it should be parsed as (x |> y).z() or x |> (y.z()), where the latter looks correct, but the former is what is intended. The problem here is that we visually group things based on the amount of whitespace, and since y.z() has no whitespace, it is grouped tighter than x |> y. This is a headache in their case, but it reminded me of a little experiment I did a while back based on a question: is this actually a problem, or could we use this to our advantage?

Whitespace-sensitivity

Some people love whitespace-sensitive languages like Python, others detest them. I haven't yet met anyone who dislikes using indentation as a visual indicator of grouping, and in fact, even in languages that don't require it, it is generally considered bad style to write source code without any indentation whatsoever.

For instance,

#include <stdio.h>

int main() {
int x = 5;
int y = 6;
for (int i=0; i<x; i++) {
for (int j=0; j<y; j++) {
printf("%d,%d\n", i, j);
}
}
}

is valid C code, but is generally considered poor style. Rather, we'd expect some indentation of blocks to indicate nesting, e.g.

#include <stdio.h>

int main() {
  int x = 5;
  int y = 6;
  for (int i=0; i<x; i++) {
    for (int j=0; j<y; j++) {
      printf("%d,%d\n", i, j);
    }
  }
}

There have been a lot of pixels spilled on what is "good style", and much of it touches on spacing to indicate grouping. Languages like Python go a bit further in their embrace of spacing, and raise it from simply a style choice to a syntactic rule of the language, so in Python, the above example would look more like

def main():
  x = 5
  y = 6
  for i in range(x):
    for j in range(y):
      print(f"{i},{j}")

and the indentation is not optional, something like

def main():
x = 5
y = 6
for i in range(x):
for j in range(y):
print(f"{i},{j}")

is not valid Python: the interpreter rejects it with an IndentationError, because the body of main is not indented.
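To check this concretely, here is a small sketch that hands both versions to Python's built-in compile(), which parses source without running it. The GOOD/BAD strings and the compiles helper are names I made up for illustration.

```python
# The indented version from above, as a source string.
GOOD = '''def main():
  x = 5
  y = 6
  for i in range(x):
    for j in range(y):
      print(f"{i},{j}")
'''

# The same code with every line's leading whitespace stripped.
BAD = "\n".join(line.lstrip() for line in GOOD.splitlines()) + "\n"


def compiles(source):
    """Return None if `source` parses, else the name of the error raised."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as err:  # IndentationError is a subclass of SyntaxError
        return type(err).__name__
    return None


print(compiles(GOOD))  # None
print(compiles(BAD))   # IndentationError
```

So the unindented form never gets as far as running; it is rejected at parse time.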

Where Python requires indentation to delineate nesting in blocks, I had two ideas to lean even more into using whitespace to parse things.

Inline spacing

My first idea was to simply group things based on the number of spaces between different elements. So something like 5+2 * 7 would parse as (5+2) * 7, breaking PEMDAS rules, but looking more (imo) correct. To implement it, I parse from left to right into a tree structure, and every time I encounter a space, I either go up or down the tree: up if the parser is sitting on a completed value (marked L in the traces), down if it is waiting on the right-hand side of an operator (marked R).

"5+2 * 7"

| (5)             | L "5"       |
| (5,+,_)         | R "5+"      |
| (5,+,2)         | L "5+2"     |
| ((5,+,2))       | L "5+2 "    | <- L value, so go up/outside of tree
| ((5,+,2),*,_)   | R "5+2 *"   |
| ((5,+,2),*,(_)) | R "5+2 * "  | <- R value, so go down/inside of tree
| ((5,+,2),*,(7)) | L "5+2 * 7" |

It is a bit resilient, so some more ambiguous expressions still parse. For example:

"5 +2* 7"

| (5)             | L "5"       |
| ((5))           | L "5 "      |
| ((5),+,_)       | R "5 +"     |
| ((5),+,2)       | L "5 +2"    |
| ((5),+,2,*,_)   | R "5 +2*"   |
| ((5),+,2,*,(_)) | R "5 +2* "  |
| ((5),+,2,*,(7)) | L "5 +2* 7" |
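Both traces can be reproduced with a short Python sketch. This is not my original implementation, just a minimal reconstruction under some assumptions: groups are plain nested lists, tokens are limited to non-negative integers and the operators + - * /, each single space moves one level, and the names parse, stack and have_value are made up for this example.

```python
def parse(text):
    """Parse `text` into nested lists, using spaces to move up/down the tree."""
    stack = [[]]        # path from the root group down to the current group
    have_value = False  # True = "L" state (just read a value), False = "R" state
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == " ":
            if have_value:
                # L state: go up. At the root, wrap everything in a new group.
                if len(stack) == 1:
                    stack = [[stack[0]]]
                else:
                    stack.pop()
            else:
                # R state: go down into a fresh, empty group.
                child = []
                stack[-1].append(child)
                stack.append(child)
            i += 1
        elif ch.isdigit():
            # Read a whole run of digits as one number.
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            stack[-1].append(int(text[i:j]))
            have_value = True
            i = j
        elif ch in "+-*/":
            stack[-1].append(ch)
            have_value = False
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return stack[0]


print(parse("5+2 * 7"))  # [[5, '+', 2], '*', [7]]
print(parse("5 +2* 7"))  # [[5], '+', 2, '*', [7]]
```

The results line up with the final rows of the two tables, with lists standing in for the parenthesised groups. The sketch does not evaluate anything and does not handle unary minus; it only builds the grouping.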

I think there's some precedent in actual handwriting: when writing negative numbers, we (at least I) always put the negation - as close to the number as the digits are to each other, so it would be weird to write 7 + - 5 or 7+-5. Similarly, when I'm actually writing stuff that should follow PEMDAS rules, I still space out the groups of operations to give myself a visual cue rather than only relying on the order of operations.

There is also a historical precedent in the dot syntax of Principia Mathematica, but I've never heard anyone say "Principia Mathematica's notation is something I wish we'd copied". It's a bit weirder: while there is a hierarchy along the lines of "more dots means higher in the scope", it uses a bunch of "group starts here and goes as far right/left in the expression as you can without hitting bigger dots or the end of the line" rules. Not to mention that a single dot is a bit overloaded, meaning different things depending on whether it's between letters p . q or surrounding operators _ . = . _. I think my whitespace grouping is a bit more obvious to visually parse, but it has some ambiguous edge cases.

I had a second idea about grouping across "paragraphs" using indentation and newlines, but it was even less clear to me how this would work. It's interesting to think that maybe we could have some other way of grouping operations together than little markers that indicate "group start" ([{ and "group end" }]).

-JD