Here’s something a bit more computersciencey than usual. For something at work, I have had to implement a function parser, so you can type ‘1+2*b‘ and it’ll work it all out, with correct operator precedence and so forth. To do this, I use a threestep process:
 Tokenise the input, and at the same time convert any variables found to their numeric values
 Convert the input tokens to RPN using the Shunting Yard algorithm
 Evaluate the RPN form of the equation using an RPN parser, which is really easy to write
This did the job nicely, however the time came when it had to be extended, and able to accept functions, such as ‘max(1,2,3)‘. The standard shunting yard algorithm can handle functions, but with the restriction that the number of arguments to them is known (really, this is a limitation of the RPN algorithm, but it’s at this point in the process that we need to deal with the problem). In this case, I wanted to handle functions with a variable number of arguments, so ‘max(1,2)‘ would work, and so would ‘max(1,2,3,4,5)‘. To do this, I extended the standard algorithm. Below is the algorithm from Wikipedia. My additions are in bold. This requires two more stacks, a ‘were values‘ stack, and an ‘arg count‘ stack. It also requires that you can attach the number of arguments to an instance of a function. In my case, I did it with a small class that took the function and an argument count, with one of these created during tokenisation for each function encountered.
 While there are tokens to be read:

 Read a token.
 If the token is a number, then add it to the output queue. If the were values stack has a value on it, pop it and push true.
 If the token is a function token, then push it onto the stack. Push 0 onto the arg count stack. If the were values stack has a value on it, pop it and push true. Push false onto were values.
 If the token is a function argument separator (e.g., a comma):

 Until the topmost element of the stack is a left parenthesis, pop the element onto the output queue. If no left parentheses are encountered, either the separator was misplaced or parentheses were mismatched. Pop were values into w. If w is true, pop arg count into a, increment a and push back into arg count. Push false into were values.
 If the token is an operator, o_{1}, then:

 while there is an operator, o_{2}, at the top of the stack, and either


 o_{1} is associative or leftassociative and its precedence is less than (lower precedence) or equal to that of o_{2}, or
 o_{1} is rightassociative and its precedence is less than (lower precedence) that of o_{2},
 pop o_{2} off the stack, onto the output queue;

 push o_{1} onto the operator stack.
 If the token is a left parenthesis, then push it onto the stack.
 If the token is a right parenthesis:

 Until the token at the top of the stack is a left parenthesis, pop operators off the stack onto the output queue.
 Pop the left parenthesis from the stack, but not onto the output queue.
 If the token at the top of the stack is a function token
 Pop stack into f
 Pop arg count into a
 Pop were values into w
 If w is true, increment a
 Set the argument count of f to a
 Push f onto output queue
 If the stack runs out without finding a left parenthesis, then there are mismatched parentheses.
 When there are no more tokens to read:

 While there are still operator tokens in the stack:

 If the operator token on the top of the stack is a parenthesis, then there are mismatched parenthesis.
 Pop the operator onto the output queue.
 Exit.
Note that because I didn’t feel like correctly listifying most of it, consider an if to apply to the end of that sentence only. Operation order is usually important.
With this done, the part of the RPN algorithm that says:
It is known that the function takes n arguments.
can now be satisfied.
{ 25 } Comments
Hi,
first of all, nice job adding this feature to the original algorithm, it’s a must !
But imho there may be a little bug in yours, do you think that an expression like f(g() ) will work ?
I don’t think so, but I may have misread your algorithm.
regards
sorry, I’ve implemented and tested it, and you were right, I misread : “If the were values stack has a value on it, pop it and push true. Push false onto where values.”
there are two were_value operations, I only read one!
Yeah, nested functions were an issue, my first attempt didn’t have the stack for argument counts because I forgot about the possibility. However, it was part of my test case for the implementation
Hi. My name is Stratis. I am in a group that develops an open source Matlablike program. I am implementing the parser. Your suggestions were very useful. I have added already some new feautures, like using variables without having to subsitute them at the start of the algorithm, suing matrices, like a[1,2,3] and initializing them like
[[1,2],[3,4]]. When I finish, would you like to post the algorithm here?
Hi, if I find it interesting, I’d be happy to post it. Most of what I write on my blog is stuff that I’ve done or find useful, but I’m willing to extend that, it’s all freeform
You may be interested, one extension I’ve made to my implementation is allowing boolean functions, in the Cstyle. basically, if a value is 1 (or 1.0) it is true, else it is false. This allows more spreadsheetlike functions, such as if(1=2, 3, 4) which will return 4. This doesn’t really impact on the algorithm that I documented any, just adding more binary operators.
Out of curiosity, although I’m all for more open source code, why not combine forces with octave or scilab rather than start something new?
Hello Robin!
Thank you very much for your extension of the original algorithm. It’s great and it works perfect for me. I wrote a spreadsheetlike processor in SAP’s programming language ABAP Object using the shunting yard algorithm. I find it a very good example for solving recursive problems in a elegant and easy understandable way.
Cheers! Urs
Is Shunting Yard algorithm could be a syntax check?
Is it better than RecursiveDecent parser?
You can use it as a syntax check if you do a full evaluation of the RPN, otherwise it’ll only give you a basic check. I don’t know how it compares to RD in that sense, I’ve never used it.
I myself stumbled on this problem and I find it easier to implement output queue as queue of trees. Only change to the algorithm is that when operator is pushed to it, you make last two elements its children (at the end you get only 1 tree, which you read recursively to RPN). Functions with arbitrary arguments are handled this way: when f is pushed to operator stack, you push dummy token on the queue. When f is pushed on the queue, you make everything after last dummy its children.
I recently wrote a calculator app for my Mobile 6 phone because the one that came with it did not support anything other than basic arithmetic. I performed the exact same 3 highlevel steps that you used. In addition, I also found it helpful to modify the Shunting Yard algorithm. However, my modification was designed to validate that the correct number of parameters was passed to a function. I counted the number of commas in the parameter list using an arg count stack just like you did. One thing my modification did not do was to ensure that a value was in fact passed for each parameter. My modification was not able to detect null parameters (e.g. min(,3) – first parameter missing). It appears your ‘were values’ stack is designed for just such a purpose.
I had a thought on how to handle a varialbe number of parameters and stil maintain a single postfix expression. If the Shunting Yard algorithm is modified to put the number of arguments passed to a function as the function’s last parameter, you could easily handle variable numbers of parameters when you evaluate the postfix expression.
‘Min(1,2,3,4,5)’ would be converted to postfix as ‘1 2 3 4 5 5 Min’. The Min function would pop the last ‘5’ (which would be on the top of the stack) and know it needed to pop 5 more values to get all it’s parameters. Similarly, ‘Min(1,2)’ would be converted to ‘1 2 3 Min’.
I have not, actually, implemented this but was thinking that it would probably be my approach if I were to. Of course, I would probably want to implement the null value check to so I didn’t corrupt the stack by reading parameters meant for other operations.
Correction:
‘Min(1,2) would be converted to ‘1 2 2 Min’
Do you have a reference to or an idea of how one would extend the Shunting Yard algorithm to handle ternary operations such as a conditional:
a ? b : c
This could be done (as was posted earlier in the comments) using a function, but can the algorithm be modified to handle the above syntax?
Off the top of my head, I can’t think of a way. It’s perhaps possible with some preparsing.
It’s possible to implement ?: with shunting yard algorithm, but it requires inline processing. I suggest studying the Forth programming language and how it implements IF ELSE THEN constructs (no, that’s not a typo). In Forth, IF is an expression, not a statement, for Forth is also RPN.
Hey Robin, I had a similar problem and found your blog post helpful. Eventually I did not use this algorithm but I used something along the lines of:
If the token is a function token, call a function that parses a function call. (That function calls the shunting yard algorithm again, so it becomes recursive). Place the parsed function in a global dictionary that keeps track of parsed functions, and then push the funciton token to the output queue. Continue parsing where the function call stopped.
If the token is a function seperate token (a comma): Break loop
If the token is a right parenthesis
[..] normal operation
If the stack runs out without finding a left parenthesis: break loop
If any nonvalid tokens (other keywords, comments, endline tokens, etc.) are found: break loop
Essentially what this does is treat a function token like any other primitive or variable. And it allows for recursion by automatically stopping to parse. Afterwards, in your RPN algorithm you can get the entire function token from the dictionary again.
It’s a bit dirty, but it is easily applied if you already have much of the other classes that interface with this ready.
Good work! I am currently working on a graphing calculator for a project for my A Levels and didn’t even think about varying numbers of arguments. Definatelly going to include it if I have time though (and of course give you credit for your modification to the algorithm).
I am dealing with the same problem. I just ended up with the same solution that Steven Hughes proposes and it’s implementation doesn’t need any extra stack.
The objective is to add an extra argument to the functions that take a variable number of arguments indicating how many argumentsts the function is using. So, Min (1,2,3) will become 1,2,3,3,Min. To do that you need to be able to tell when a token is actually a function taking a variable number of arguments. That it is easily done. Somewhere you must write down how many arguments each function take. You need that to evaluate the RPN. Just write 1 for the functions taking a variable number of arguments.
I just describe the modifications to the wikipedia algorithm.
When a ‘(‘ is found, check if the operator atthe top of the stack is a variablearguments function, if it is, push number 0 in the stack.
Every time a token is a number, check if the element at the top of the stack is a number. If it is, increment it by 1 and push it back in the stack.
That’s it.
When the rpn is evaluates you only have to:
if the operator is a variableargs function, pop the number of arguments from the stack and then pop that number of arguments from the stack.
This can deal with nested variablenumber args functions. For example Min(1,2, Min(3,4) will become 1,2,3,4,2,Min,3,Min.
It can also deal with left associative function with a variable number of operators. For example, I have to implement the ‘IN’ operator like the mysql one. That is written as ‘A’ IN (‘B’, ‘C’, ‘D’). You handle with that when you evaluate the rpn, actually. Just add this simple rule:
If the token is a varaibleargs function, pop the number of arguments from the stack, if it is also leftassociative, add 1 to the number of arguments.
In this way you will pop the argument to the left of the function.
Hi again,
I have realized that the algorithm changes in my previous post are wrong. The idea is the good one but the solution is a bit more complicated than that. Still, the solution does not require any extra stack.
Here are the changes you need with respect to the algorithm on wikipedia:
Before staring the parsing define an integer like:
int args_num = 1;
The add this rules to the algorithm:
For each token you examin, if (token != ‘)’ and args_num == 0) then args_num = 1;
If (token == ‘,’) then args_num++; then as per wikipedia algorithm
if (token == ‘(‘) then stack.push(args_num); args_num = 0; stach.push(‘(‘);
if (token == ‘)’) then {
pop arguments as per wikipedia algorithm up to first ‘(‘;
args_num_temp = args_num;
stack.pop(); # this pops the ‘(‘;
args_num = stack.pop(); # this pops the args_num pushed just before ‘(‘
if (elemen on top of stack == variable args operator) {
if (elemen on top of stack is left associative) {
args_num_temp++;
}
send args_num_temp to output stream.
}
that’s it. This is tested and working.
The idea of these changes is as follow:
Keep the args_num integer as a running count of the argument number
Before pushing ‘(‘ into the stack, push args_num into the stack. In this way you store the current argument count and start a new count for the ‘(‘ you just pusched in the stack.
Once the args_num has been set to 0, that is, we found a ‘(‘, any token, except ‘)’ will indicate that at least one argument inside the bracket is present. So, any token except ‘)’ will move the counter from 0 to 1.
Once the args_num has move to 1, then only ‘,’ will increase it.
When we found a ‘)’, args_num will contain the arguments found in the bracket pair.
Notice that args_num counts argument every time a left bracket is found. We would not always need to keep the count as not all leftbrackets will be used to include arguments. Leftbrackets can be used just for precedence as in (1 + 2) * 3. Anyway, once the ‘)’ is found we can tell if the args_num is needed or not. In fact, after popping the ‘(‘ we know that the next element will be a number, the args_num of the external count. So pop that and the next element in the stack will either be a variableargs function, in which case the args_num was actually keeping the count, so we need to send it to the output stream, of it was not in which case we ignore it. The args_num we popped from the stack will carry on the count of any external variableargs function that was enclosing our bracke pair.
In case the variableargs function is left associative there will also be the value on the left of the function. Adding 1 to the args_num before sending it to the output stream will keep into account of that too.
I was involved in the design and implementation of the 1. PL/I language and compiler (S/360 DOS) in the sixties (IBM Lab Germany). I used the Dystra method and extended it successfully to implement all mixed data type attributes conversion and functions. 19661967 I implented a study project in the IBM Endicott/USA Lab (Advanced technolgy) a PL/conversational compiler which converted a whole program to RPN. The execution was by an RPN interpreter. it was presented in one of IBM conference. It is a bitty that I did not these all documented and and I have no access to to all of this info .. Does anyone find something ..
Hello Robin,
I was stucked trying to introduced user function into my own math expression evaluator developped in C. Following your pseudo code solved my issues ! Thanx a lot for you explanations of shunting yard algorithm extension. I m still surprised to not see any simple/free implementation on internet, I will certainly post mine to provide this basic tool.
Good Job, Regards !
Xavier
Hey , I just wanted to ask you what do you mean by
w and a when you say like pop were values into w ?
Thank you and you’ve done a great job !!!
I spent a few days on this, and wound up with a pretty simple kludge:
When the token is ListStart (an open paren in most cases), I add it to the output queue.
Then, when converting or executing the RPN output, and I encounter a function call token, I pop items from the stack until I encounter a ListStart (again, an open paren), discard it, and consider everything in between to be an argument to the function.
It was a oneline change to my implementation which was basically transcoded from the Wikipedia algorithm description, and worked great.
Not a bad idea, David! Thank you!
But does it work with “f(a*(b+c),d)”style function calls?..
I have found another, the best way to perform this job.
I treat quantifiers, function and predicate names as unary operators of different priorities. I also treat commas inside function calls and predicate calls as binary leftassociative (or rightassociative, it is not important) operators. So I have only unary and binary operators, and atoms. I store them in structures with all their properties to not confuse a function name and a variable name, for example. All advantages of RPN without its disadvantages. Simple and efficient. Elegant, I would say. đŸ˜€
You are free to use and improve this method.
{ 1 } Trackback
[…] http://www.kallisti.net.nz/blog/2008/02/extensiontotheshuntingyardalgorithmtoallowvariablen… Share this:TwitterFacebookLike this:LikeBe the first to like this post. This entry was posted in Uncategorized. Bookmark the permalink. ← Flee – Fast Lightweight Expression Evaluator […]
Post a Comment