Imagine you have an expression a + b. That is in its infix form, but a compiler converts it to postfix, where it looks like this: a b +.
All well and good, but I have had to look at complicated expressions and try to work out whether they are well-formed or not. I end up with a type expression like
“ foo bar PAIR baz PAIR biz PAIR boz PAIR buz bez PAIR SET PAIR SET”
and have to
test whether it is well-formed; the operators in this case are SET and PAIR. The whole thing has to be well-formed and a certain part in the middle has to be well-formed;
“ foo bar PAIR baz PAIR biz PAIR boz PAIR buz bez PAIR SET PAIR SET”
must be well-formed too. [In fact the bit on the left in
strikeout, " foo bar PAIR" also has to be well-formed but we are taking that as read for the time being.]
So, we looked at that and thought about a parse tree. That expression can be parsed into a tree as
The left part " foo bar PAIR" is well-formed but ought to fit at the left of the topmost "PAIR". So that expression was not of the requisite type, but that was not possible to tell by just looking at the text.
So I thought about how you could check that automatically without drawing a tree. I thought, each token in the expression denotes a value, or an operator. As I said, we have two operators here. SET is a unary postfix operator consuming 1 token and PAIR is a binary postfix operator which (obviously) consumes two values. At first I thought about counting from right to left, but soon realised it would be easier from left to right. So let's have a look at counting tokens. SET is regarded as +1 for being a token and -1 for consuming 1 token, so it changes the count by 0; similarly PAIR changes the count by -1. So we use that method to count tokens in that
String:
but the middle, removing the struck-out text comes out like this:
Note the 0 in the middle. So the middle part is ill-formed. So it appears that if you count tokens like that, the top expression comes out as well-formed, but the " foo bar PAIR" is in the wrong place on the tree. The middle bit turns out to be ill-formed, and doesn't represent a complete single expression on its own. Signs of an expression being ill-formed:
you get a 0 token count somewhere in the middleYou don't get 1 at the endNow that seems a very simple way of testing expressions; I don't remember seeing it in the parsing books, but it must have been thought of before.
Has anybody come across this, or anything similar before? If so, any idea where I could find more information, please?