aspose file tools*
The moose likes General Computing and the fly likes Counting tokens allows us to test whether a postfix expression is wellformed. Anyone seen it before? Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


Win a copy of Spring in Action this week in the Spring forum!
JavaRanch » Java Forums » Engineering » General Computing
Bookmark "Counting tokens allows us to test whether a postfix expression is wellformed. Anyone seen it before?" Watch "Counting tokens allows us to test whether a postfix expression is wellformed. Anyone seen it before?" New topic
Author

Counting tokens allows us to test whether a postfix expression is wellformed. Anyone seen it before?

Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39396
    
  28
Imagine you have an expression a + b. That is in its infix form, but a compiler converts it to postfix, where it looks like this: a b +.

All well and good, but I have had to look at complicated expressions and try to work out whether they are well-formed or not. I end up with a type expression like
“ foo bar PAIR baz PAIR biz PAIR boz PAIR buz bez PAIR SET PAIR SET”
and have to test whether it is well-formed; the operators in this case are SET and PAIR. The whole thing has to be well-formed and a certain part in the middle has to be well-formed;
foo bar PAIR baz PAIR biz PAIR boz PAIR buz bez PAIR SET PAIR SET
must be well-formed too. [In fact the bit on the left in strikeout, " foo bar PAIR" also has to be well-formed but we are taking that as read for the time being.]

So, we looked at that and thought about a parse tree. That expression can be parsed into a tree asThe left part " foo bar PAIR" is well-formed but ought to fit at the left of the topmost "PAIR". So that expression was not of the requisite type, but that was not possible to tell by just looking at the text.
So I thought about how you could check that automatically without drawing a tree. I thought, each token in the expression denotes a value, or an operator. As I said, we have two operators here. SET is a unary postfix operator consuming 1 token and PAIR is a binary postfix operator which (obviously) consumes two values. At first I thought about counting from right to left, but soon realised it would be easier from left to right. So let's have a look at counting tokens. SET is regarded as +1 for being a token and -1 for consuming 1 token, so it changes the count by 0; similarly PAIR changes the count by -1. So we use that method to count tokens in that String:
but the middle, removing the struck-out text comes out like this:Note the 0 in the middle. So the middle part is ill-formed. So it appears that if you count tokens like that, the top expression comes out as well-formed, but the " foo bar PAIR" is in the wrong place on the tree. The middle bit turns out to be ill-formed, and doesn't represent a complete single expression on its own. Signs of an expression being ill-formed:
  • you get a 0 token count somewhere in the middle
  • You don't get 1 at the end
  • Now that seems a very simple way of testing expressions; I don't remember seeing it in the parsing books, but it must have been thought of before.

    Has anybody come across this, or anything similar before? If so, any idea where I could find more information, please?
    Omar Al Kababji
    Ranch Hand

    Joined: Jan 13, 2009
    Posts: 357
    yes but you have to consider the number you are getting when adding up operands, so you have to add two new rules:

    1) before a PAIR operator your operand counter should be 2 (because its a binary operator)
    2) before a SET operator your operand counter should be 1 (because its a unary operator)

    so when you reach the sequence

    bez PAIR SET
    3 2 2

    you have a bez count of 3 before a PAIR operator which is required to have only an operand count of 2, but its 3 so this means something is going wrong, and the expression is ill-formed.



    did this help??


    Omar Al Kababji - Electrical & Computer Engineer
    [SCJP - 90% - Story] [SCWCD - 94% - Story] [SCBCD - 80% - Story] | My Blog
    Campbell Ritchie
    Sheriff

    Joined: Oct 13, 2005
    Posts: 39396
        
      28
    No, because the operand no 1 can be the operand of a following PAIR operator. The 3 is permissible. Look at an arithmetical formula

    5 + 6 * 7

    That turns into

    5 6 7 * + in postfix. Doing token counting givesSo it is a well-formed expression with a 3 in the middle. The operands for the * are 6 7 and the operands for the + are 5 42.

    The PAIR operator immediately after bez takes buz and bez as its operands. The PAIR is passed to the SET, which is one of the operands for the next PAIR. The other operand for that PAIR is the PAIR before buz. The larger expression is well-formed, but the smaller part created by deleting " foo bar PAIR" and " . . . PAIR SET" was not well-formed.
    Omar Al Kababji
    Ranch Hand

    Joined: Jan 13, 2009
    Posts: 357
    Yes you are completely right, I think we need a tree to maintain the structure, and insert to the tree in a depth first order from left to right (while reading from right to left from the expression), keeping in mind that a parent node can have "n" childs where "n" is the number of operands the operator can have, and operands can't have childs.

    for example for the second expression:

    baz PAIR biz PAIR boz PAIR buz bez PAIR SET ”

    we start parsing from right to left


    we reached to an end because we have only operands at the bottom (buz and bez) left that can't have childs and there are still tokens in the expression so it is ill-formed.



    for the first expression:
    foo bar PAIR baz PAIR biz PAIR boz PAIR buz bez PAIR SET PAIR SET


    we could build a full tree structure and consumed all the expression tokens, so its well-formed.

    what about this solution??

    drawing the tree was very difficult
    Omar Al Kababji
    Ranch Hand

    Joined: Jan 13, 2009
    Posts: 357
    Which by the way can be implemented simply by a stack, having only one result at the termination of the process.

    For each item in the postfix expression from the left:
    if the item is a number push it on the stack
    if the item is an operator (PAIR) then pop two operands off the stack
    if the item is an operator (SET) then pop one operand off the stack
    make the calculation
    push the result on the stack
    When the loop is done the answer is the only item remaining on the stack.
    Campbell Ritchie
    Sheriff

    Joined: Oct 13, 2005
    Posts: 39396
        
      28
    Agree, the numbers represent how many values are on the stack at different stages through the process.

    But my actual question was:

    Has anybody come across this technique before? It seems so simple I am a bit surprised not to have seen it in a book. Maybe I have looked at the wrong page.
    Campbell Ritchie
    Sheriff

    Joined: Oct 13, 2005
    Posts: 39396
        
      28
    Omar Al Kababji wrote: . . . drawing the tree was very difficult
    You go from tight to left. Every time you have an operand, you reach a "leaf". Every time you pass an operator you need branches underneath. SET (like unary minus) takes one value underneath, so you write

    SET
    |
    . . .

    Pair is a binary operator, like + or >, so you would write

    PAIR
    / \
    . . .

    The first value after SET (going right-to-left) goes underneath SET, and the first value after PAIR goes under PAIR\ and the second value underneath /PAIR. If the right-hand value is an operator, you have to start a new branch and you keep going on that branch until you have filled all its leaves. So " foo bar PAIR baz PAIR biz PAIR boz PAIR * buz bez PAIR SET PAIR SET" gives this parsing treeI added a * left of buz bez so you can see where that terminates a branch.
    Omar Al Kababji
    Ranch Hand

    Joined: Jan 13, 2009
    Posts: 357
    drawing the tree was very difficult


    I meant drawing the tree correctly in the editor with all the correct spacing, not drawing the real tree

    back to the days of my Engineering studies I remember we studied these stuff in a course named "compilers" and I think we encountered stuff like this in "system programming" too.
    Campbell Ritchie
    Sheriff

    Joined: Oct 13, 2005
    Posts: 39396
        
      28
    Yes, this came up first in our "language systems" module, which is largely about compilers.I see what you mean about drawing on the editor. Even "code" only helps a little.

    Thank you for your help, Omar.
    Omar Al Kababji
    Ranch Hand

    Joined: Jan 13, 2009
    Posts: 357
    I have to say that back that days, working on compiler and system programming projects was more interesting than developing web applications, you go deep to the bottom of the computer architecture, you write less code, but coding is much more difficult to write, you need to think a lot to find a good performing solution.

    you are welcome Campbell.
    Campbell Ritchie
    Sheriff

    Joined: Oct 13, 2005
    Posts: 39396
        
      28
    Sorry for not replying earlier.


    Omar Al Kababji wrote: . . . coding is much more difficult to write . . .
    Too true . . . too true . . .

    We are having similar changes round here; the compilers module is being dropped. It deprives our undergraduates of an opportunity to write anything at all difficult.
     
    Don't get me started about those stupid light bulbs.
     
    subject: Counting tokens allows us to test whether a postfix expression is wellformed. Anyone seen it before?