Here’s something a bit more computer-sciencey than usual. For something at work, I have had to implement a function parser, so you can type ‘1+2*b‘ and it’ll work it all out, with correct operator precedence and so forth. To do this, I use a three-step process:
- Tokenise the input, and at the same time convert any variables found to their numeric values
- Convert the input tokens to RPN using the Shunting Yard algorithm
- Evaluate the RPN form of the equation using an RPN parser, which is really easy to write
This did the job nicely, however the time came when it had to be extended, and able to accept functions, such as ‘max(1,2,3)‘. The standard shunting yard algorithm can handle functions, but with the restriction that the number of arguments to them is known (really, this is a limitation of the RPN algorithm, but it’s at this point in the process that we need to deal with the problem). In this case, I wanted to handle functions with a variable number of arguments, so ‘max(1,2)‘ would work, and so would ‘max(1,2,3,4,5)‘. To do this, I extended the standard algorithm. Below is the algorithm from Wikipedia. My additions are in bold. This requires two more stacks, a ‘were values‘ stack, and an ‘arg count‘ stack. It also requires that you can attach the number of arguments to an instance of a function. In my case, I did it with a small class that took the function and an argument count, with one of these created during tokenisation for each function encountered.
- While there are tokens to be read:
-
- Read a token.
- If the token is a number, then add it to the output queue. If the were values stack has a value on it, pop it and push true.
- If the token is a function token, then push it onto the stack. Push 0 onto the arg count stack. If the were values stack has a value on it, pop it and push true. Push false onto were values.
- If the token is a function argument separator (e.g., a comma):
-
- Until the topmost element of the stack is a left parenthesis, pop the element onto the output queue. If no left parentheses are encountered, either the separator was misplaced or parentheses were mismatched. Pop were values into w. If w is true, pop arg count into a, increment a and push back into arg count. Push false into were values.
- If the token is an operator, o1, then:
-
- while there is an operator, o2, at the top of the stack, and either
-
-
- o1 is associative or left-associative and its precedence is less than (lower precedence) or equal to that of o2, or
- o1 is right-associative and its precedence is less than (lower precedence) that of o2,
- pop o2 off the stack, onto the output queue;
-
- push o1 onto the operator stack.
- If the token is a left parenthesis, then push it onto the stack.
- If the token is a right parenthesis:
-
- Until the token at the top of the stack is a left parenthesis, pop operators off the stack onto the output queue.
- Pop the left parenthesis from the stack, but not onto the output queue.
- If the token at the top of the stack is a function token
- Pop stack into f
- Pop arg count into a
- Pop were values into w
- If w is true, increment a
- Set the argument count of f to a
- Push f onto output queue
- If the stack runs out without finding a left parenthesis, then there are mismatched parentheses.
- When there are no more tokens to read:
-
- While there are still operator tokens in the stack:
-
- If the operator token on the top of the stack is a parenthesis, then there are mismatched parenthesis.
- Pop the operator onto the output queue.
- Exit.
Note that because I didn’t feel like correctly listifying most of it, consider an if to apply to the end of that sentence only. Operation order is usually important.
With this done, the part of the RPN algorithm that says:
It is known that the function takes n arguments.
can now be satisfied.
eric | 20-Feb-08 at 2:33 am | Permalink
Hi,
first of all, nice job adding this feature to the original algorithm, it’s a must !
But imho there may be a little bug in yours, do you think that an expression like f(g() ) will work ?
I don’t think so, but I may have misread your algorithm.
regards
eric | 20-Feb-08 at 2:45 am | Permalink
sorry, I’ve implemented and tested it, and you were right, I misread : “If the were values stack has a value on it, pop it and push true. Push false onto where values.”
there are two were_value operations, I only read one!
Robin | 20-Feb-08 at 8:17 am | Permalink
Yeah, nested functions were an issue, my first attempt didn’t have the stack for argument counts because I forgot about the possibility. However, it was part of my test case for the implementation
Efstratios Gavves | 11-Apr-08 at 9:47 pm | Permalink
Hi. My name is Stratis. I am in a group that develops an open source Matlab-like program. I am implementing the parser. Your suggestions were very useful. I have added already some new feautures, like using variables without having to subsitute them at the start of the algorithm, suing matrices, like a[1,2,3] and initializing them like
[[1,2],[3,4]]. When I finish, would you like to post the algorithm here?
Robin | 12-Apr-08 at 12:01 am | Permalink
Hi, if I find it interesting, I’d be happy to post it. Most of what I write on my blog is stuff that I’ve done or find useful, but I’m willing to extend that, it’s all free-form
You may be interested, one extension I’ve made to my implementation is allowing boolean functions, in the C-style. basically, if a value is 1 (or 1.0) it is true, else it is false. This allows more spreadsheet-like functions, such as if(1=2, 3, 4) which will return 4. This doesn’t really impact on the algorithm that I documented any, just adding more binary operators.
Out of curiosity, although I’m all for more open source code, why not combine forces with octave or scilab rather than start something new?
Urs | 08-Jun-08 at 1:52 am | Permalink
Hello Robin!
Thank you very much for your extension of the original algorithm. It’s great and it works perfect for me. I wrote a spreadsheet-like processor in SAP’s programming language ABAP Object using the shunting yard algorithm. I find it a very good example for solving recursive problems in a elegant and easy understandable way.
Cheers! Urs
Audi Nugraha | 06-Jul-08 at 8:51 am | Permalink
Is Shunting Yard algorithm could be a syntax check?
Is it better than Recursive-Decent parser?
Robin | 06-Jul-08 at 11:59 am | Permalink
You can use it as a syntax check if you do a full evaluation of the RPN, otherwise it’ll only give you a basic check. I don’t know how it compares to R-D in that sense, I’ve never used it.