Skip to content

Extension to the ‘Shunting Yard’ algorithm to allow variable numbers of arguments to functions

Here’s something a bit more computer-sciencey than usual. For something at work, I have had to implement a function parser, so you can type ‘1+2*b‘ and it’ll work it all out, with correct operator precedence and so forth. To do this, I use a three-step process:

  1. Tokenise the input, and at the same time convert any variables found to their numeric values
  2. Convert the input tokens to RPN using the Shunting Yard algorithm
  3. Evaluate the RPN form of the equation using an RPN parser, which is really easy to write

This did the job nicely, however the time came when it had to be extended, and able to accept functions, such as ‘max(1,2,3)‘. The standard shunting yard algorithm can handle functions, but with the restriction that the number of arguments to them is known (really, this is a limitation of the RPN algorithm, but it’s at this point in the process that we need to deal with the problem). In this case, I wanted to handle functions with a variable number of arguments, so ‘max(1,2)‘ would work, and so would ‘max(1,2,3,4,5)‘. To do this, I extended the standard algorithm. Below is the algorithm from Wikipedia. My additions are in bold. This requires two more stacks, a ‘were values‘ stack, and an ‘arg count‘ stack. It also requires that you can attach the number of arguments to an instance of a function. In my case, I did it with a small class that took the function and an argument count, with one of these created during tokenisation for each function encountered.

  • While there are tokens to be read:
  • Read a token.
  • If the token is a number, then add it to the output queue. If the were values stack has a value on it, pop it and push true.
  • If the token is a function token, then push it onto the stack. Push 0 onto the arg count stack. If the were values stack has a value on it, pop it and push true. Push false onto were values.
  • If the token is a function argument separator (e.g., a comma):
  • Until the topmost element of the stack is a left parenthesis, pop the element onto the output queue. If no left parentheses are encountered, either the separator was misplaced or parentheses were mismatched. Pop were values into w. If w is true, pop arg count into a, increment a and push back into arg count. Push false into were values.
  • If the token is an operator, o1, then:
  • while there is an operator, o2, at the top of the stack, and either
o1 is associative or left-associative and its precedence is less than (lower precedence) or equal to that of o2, or
o1 is right-associative and its precedence is less than (lower precedence) that of o2,
pop o2 off the stack, onto the output queue;
  • push o1 onto the operator stack.
  • If the token is a left parenthesis, then push it onto the stack.
  • If the token is a right parenthesis:
  • Until the token at the top of the stack is a left parenthesis, pop operators off the stack onto the output queue.
  • Pop the left parenthesis from the stack, but not onto the output queue.
  • If the token at the top of the stack is a function token
    • Pop stack into f
    • Pop arg count into a
    • Pop were values into w
    • If w is true, increment a
    • Set the argument count of f to a
    • Push f onto output queue
  • If the stack runs out without finding a left parenthesis, then there are mismatched parentheses.
  • When there are no more tokens to read:
  • While there are still operator tokens in the stack:
  • If the operator token on the top of the stack is a parenthesis, then there are mismatched parenthesis.
  • Pop the operator onto the output queue.
  • Exit.

Note that because I didn’t feel like correctly listifying most of it, consider an if to apply to the end of that sentence only. Operation order is usually important.

With this done, the part of the RPN algorithm that says:

It is known that the function takes n arguments.

can now be satisfied.

{ 18 } Comments

  1. eric | February 20, 2008 at 02:33 | Permalink

    Hi,

    first of all, nice job adding this feature to the original algorithm, it’s a must !
    But imho there may be a little bug in yours, do you think that an expression like f(g() ) will work ?
    I don’t think so, but I may have misread your algorithm.

    regards

  2. eric | February 20, 2008 at 02:45 | Permalink

    sorry, I’ve implemented and tested it, and you were right, I misread : “If the were values stack has a value on it, pop it and push true. Push false onto where values.”
    there are two were_value operations, I only read one!

  3. Robin | February 20, 2008 at 08:17 | Permalink

    Yeah, nested functions were an issue, my first attempt didn’t have the stack for argument counts because I forgot about the possibility. However, it was part of my test case for the implementation :)

  4. Efstratios Gavves | April 11, 2008 at 21:47 | Permalink

    Hi. My name is Stratis. I am in a group that develops an open source Matlab-like program. I am implementing the parser. Your suggestions were very useful. I have added already some new feautures, like using variables without having to subsitute them at the start of the algorithm, suing matrices, like a[1,2,3] and initializing them like
    [[1,2],[3,4]]. When I finish, would you like to post the algorithm here?

  5. Robin | April 12, 2008 at 00:01 | Permalink

    Hi, if I find it interesting, I’d be happy to post it. Most of what I write on my blog is stuff that I’ve done or find useful, but I’m willing to extend that, it’s all free-form :)

    You may be interested, one extension I’ve made to my implementation is allowing boolean functions, in the C-style. basically, if a value is 1 (or 1.0) it is true, else it is false. This allows more spreadsheet-like functions, such as if(1=2, 3, 4) which will return 4. This doesn’t really impact on the algorithm that I documented any, just adding more binary operators.

    Out of curiosity, although I’m all for more open source code, why not combine forces with octave or scilab rather than start something new?

  6. Urs | June 8, 2008 at 01:52 | Permalink

    Hello Robin!
    Thank you very much for your extension of the original algorithm. It’s great and it works perfect for me. I wrote a spreadsheet-like processor in SAP’s programming language ABAP Object using the shunting yard algorithm. I find it a very good example for solving recursive problems in a elegant and easy understandable way.
    Cheers! Urs

  7. Audi Nugraha | July 6, 2008 at 08:51 | Permalink

    Is Shunting Yard algorithm could be a syntax check?

    Is it better than Recursive-Decent parser?

  8. Robin | July 6, 2008 at 11:59 | Permalink

    You can use it as a syntax check if you do a full evaluation of the RPN, otherwise it’ll only give you a basic check. I don’t know how it compares to R-D in that sense, I’ve never used it.

  9. orion | November 21, 2008 at 03:26 | Permalink

    I myself stumbled on this problem and I find it easier to implement output queue as queue of trees. Only change to the algorithm is that when operator is pushed to it, you make last two elements its children (at the end you get only 1 tree, which you read recursively to RPN). Functions with arbitrary arguments are handled this way: when f is pushed to operator stack, you push dummy token on the queue. When f is pushed on the queue, you make everything after last dummy its children.

  10. Steven Hughes | February 5, 2009 at 04:47 | Permalink

    I recently wrote a calculator app for my Mobile 6 phone because the one that came with it did not support anything other than basic arithmetic. I performed the exact same 3 high-level steps that you used. In addition, I also found it helpful to modify the Shunting Yard algorithm. However, my modification was designed to validate that the correct number of parameters was passed to a function. I counted the number of commas in the parameter list using an arg count stack just like you did. One thing my modification did not do was to ensure that a value was in fact passed for each parameter. My modification was not able to detect null parameters (e.g. min(,3) – first parameter missing). It appears your ‘were values’ stack is designed for just such a purpose.

    I had a thought on how to handle a varialbe number of parameters and stil maintain a single postfix expression. If the Shunting Yard algorithm is modified to put the number of arguments passed to a function as the function’s last parameter, you could easily handle variable numbers of parameters when you evaluate the postfix expression.

    ‘Min(1,2,3,4,5)’ would be converted to postfix as ’1 2 3 4 5 5 Min’. The Min function would pop the last ’5′ (which would be on the top of the stack) and know it needed to pop 5 more values to get all it’s parameters. Similarly, ‘Min(1,2)’ would be converted to ’1 2 3 Min’.

    I have not, actually, implemented this but was thinking that it would probably be my approach if I were to. Of course, I would probably want to implement the null value check to so I didn’t corrupt the stack by reading parameters meant for other operations.

  11. Steven Hughes | February 5, 2009 at 04:49 | Permalink

    Correction:

    ‘Min(1,2) would be converted to ’1 2 2 Min’

  12. Bri | November 17, 2010 at 10:17 | Permalink

    Do you have a reference to or an idea of how one would extend the Shunting Yard algorithm to handle ternary operations such as a conditional:

    a ? b : c

    This could be done (as was posted earlier in the comments) using a function, but can the algorithm be modified to handle the above syntax?

  13. Robin | December 3, 2010 at 23:05 | Permalink

    Off the top of my head, I can’t think of a way. It’s perhaps possible with some pre-parsing.

  14. Munch | January 24, 2011 at 06:00 | Permalink

    It’s possible to implement ?: with shunting yard algorithm, but it requires inline processing. I suggest studying the Forth programming language and how it implements IF ELSE THEN constructs (no, that’s not a typo). In Forth, IF is an expression, not a statement, for Forth is also RPN.

  15. Manadar | February 15, 2011 at 04:27 | Permalink

    Hey Robin, I had a similar problem and found your blog post helpful. Eventually I did not use this algorithm but I used something along the lines of:

    If the token is a function token, call a function that parses a function call. (That function calls the shunting yard algorithm again, so it becomes recursive). Place the parsed function in a global dictionary that keeps track of parsed functions, and then push the funciton token to the output queue. Continue parsing where the function call stopped.

    If the token is a function seperate token (a comma): Break loop

    If the token is a right parenthesis
    [..] normal operation
    If the stack runs out without finding a left parenthesis: break loop

    If any non-valid tokens (other keywords, comments, endline tokens, etc.) are found: break loop

    Essentially what this does is treat a function token like any other primitive or variable. And it allows for recursion by automatically stopping to parse. Afterwards, in your RPN algorithm you can get the entire function token from the dictionary again.

    It’s a bit dirty, but it is easily applied if you already have much of the other classes that interface with this ready.

  16. Ibby | December 28, 2011 at 08:16 | Permalink

    Good work! I am currently working on a graphing calculator for a project for my A Levels and didn’t even think about varying numbers of arguments. Definatelly going to include it if I have time though (and of course give you credit for your modification to the algorithm).

  17. FrankZ | September 1, 2012 at 10:50 | Permalink

    I am dealing with the same problem. I just ended up with the same solution that Steven Hughes proposes and it’s implementation doesn’t need any extra stack.

    The objective is to add an extra argument to the functions that take a variable number of arguments indicating how many argumentsts the function is using. So, Min (1,2,3) will become 1,2,3,3,Min. To do that you need to be able to tell when a token is actually a function taking a variable number of arguments. That it is easily done. Somewhere you must write down how many arguments each function take. You need that to evaluate the RPN. Just write -1 for the functions taking a variable number of arguments.

    I just describe the modifications to the wikipedia algorithm.

    When a ‘(‘ is found, check if the operator atthe top of the stack is a variable-arguments function, if it is, push number 0 in the stack.
    Every time a token is a number, check if the element at the top of the stack is a number. If it is, increment it by 1 and push it back in the stack.

    That’s it.

    When the rpn is evaluates you only have to:
    if the operator is a variable-args function, pop the number of arguments from the stack and then pop that number of arguments from the stack.

    This can deal with nested variable-number args functions. For example Min(1,2, Min(3,4) will become 1,2,3,4,2,Min,3,Min.

    It can also deal with left associative function with a variable number of operators. For example, I have to implement the ‘IN’ operator like the mysql one. That is written as ‘A’ IN (‘B’, ‘C’, ‘D’). You handle with that when you evaluate the rpn, actually. Just add this simple rule:

    If the token is a varaible-args function, pop the number of arguments from the stack, if it is also left-associative, add 1 to the number of arguments.

    In this way you will pop the argument to the left of the function.

  18. FrankZ | September 6, 2012 at 07:44 | Permalink

    Hi again,

    I have realized that the algorithm changes in my previous post are wrong. The idea is the good one but the solution is a bit more complicated than that. Still, the solution does not require any extra stack.

    Here are the changes you need with respect to the algorithm on wikipedia:

    Before staring the parsing define an integer like:

    int args_num = -1;

    The add this rules to the algorithm:

    -For each token you examin, if (token != ‘)’ and args_num == 0) then args_num = 1;

    -If (token == ‘,’) then args_num++; then as per wikipedia algorithm

    -if (token == ‘(‘) then stack.push(args_num); args_num = 0; stach.push(‘(‘);

    -if (token == ‘)’) then {
    pop arguments as per wikipedia algorithm up to first ‘(‘;
    args_num_temp = args_num;
    stack.pop(); # this pops the ‘(‘;
    args_num = stack.pop(); # this pops the args_num pushed just before ‘(‘
    if (elemen on top of stack == variable args operator) {
    if (elemen on top of stack is left associative) {
    args_num_temp++;
    }
    send args_num_temp to output stream.
    }

    that’s it. This is tested and working.

    The idea of these changes is as follow:
    -Keep the args_num integer as a running count of the argument number
    -Before pushing ‘(‘ into the stack, push args_num into the stack. In this way you store the current argument count and start a new count for the ‘(‘ you just pusched in the stack.
    -Once the args_num has been set to 0, that is, we found a ‘(‘, any token, except ‘)’ will indicate that at least one argument inside the bracket is present. So, any token except ‘)’ will move the counter from 0 to 1.
    -Once the args_num has move to 1, then only ‘,’ will increase it.
    -When we found a ‘)’, args_num will contain the arguments found in the bracket pair.
    -Notice that args_num counts argument every time a left bracket is found. We would not always need to keep the count as not all left-brackets will be used to include arguments. Left-brackets can be used just for precedence as in (1 + 2) * 3. Anyway, once the ‘)’ is found we can tell if the args_num is needed or not. In fact, after popping the ‘(‘ we know that the next element will be a number, the args_num of the external count. So pop that and the next element in the stack will either be a variable-args function, in which case the args_num was actually keeping the count, so we need to send it to the output stream, of it was not in which case we ignore it. The args_num we popped from the stack will carry on the count of any external variable-args function that was enclosing our bracke pair.
    -In case the variable-args function is left associative there will also be the value on the left of the function. Adding 1 to the args_num before sending it to the output stream will keep into account of that too.

{ 1 } Trackback

  1. [...] http://www.kallisti.net.nz/blog/2008/02/extension-to-the-shunting-yard-algorithm-to-allow-variable-n… Share this:TwitterFacebookLike this:LikeBe the first to like this post. This entry was posted in Uncategorized. Bookmark the permalink. ← Flee – Fast Lightweight Expression Evaluator [...]

Post a Comment

Your email is never published nor shared. Required fields are marked *