Skip to content

Extension to the ‘Shunting Yard’ algorithm to allow variable numbers of arguments to functions

Here’s something a bit more computer-sciencey than usual. For something at work, I have had to implement a function parser, so you can type ‘1+2*b‘ and it’ll work it all out, with correct operator precedence and so forth. To do this, I use a three-step process:

  1. Tokenise the input, and at the same time convert any variables found to their numeric values
  2. Convert the input tokens to RPN using the Shunting Yard algorithm
  3. Evaluate the RPN form of the equation using an RPN parser, which is really easy to write

This did the job nicely, however the time came when it had to be extended, and able to accept functions, such as ‘max(1,2,3)‘. The standard shunting yard algorithm can handle functions, but with the restriction that the number of arguments to them is known (really, this is a limitation of the RPN algorithm, but it’s at this point in the process that we need to deal with the problem). In this case, I wanted to handle functions with a variable number of arguments, so ‘max(1,2)‘ would work, and so would ‘max(1,2,3,4,5)‘. To do this, I extended the standard algorithm. Below is the algorithm from Wikipedia. My additions are in bold. This requires two more stacks, a ‘were values‘ stack, and an ‘arg count‘ stack. It also requires that you can attach the number of arguments to an instance of a function. In my case, I did it with a small class that took the function and an argument count, with one of these created during tokenisation for each function encountered.

  • While there are tokens to be read:
  • Read a token.
  • If the token is a number, then add it to the output queue. If the were values stack has a value on it, pop it and push true.
  • If the token is a function token, then push it onto the stack. Push 0 onto the arg count stack. If the were values stack has a value on it, pop it and push true. Push false onto were values.
  • If the token is a function argument separator (e.g., a comma):
  • Until the topmost element of the stack is a left parenthesis, pop the element onto the output queue. If no left parentheses are encountered, either the separator was misplaced or parentheses were mismatched. Pop were values into w. If w is true, pop arg count into a, increment a and push back into arg count. Push false into were values.
  • If the token is an operator, o1, then:
  • while there is an operator, o2, at the top of the stack, and either
o1 is associative or left-associative and its precedence is less than (lower precedence) or equal to that of o2, or
o1 is right-associative and its precedence is less than (lower precedence) that of o2,
pop o2 off the stack, onto the output queue;
  • push o1 onto the operator stack.
  • If the token is a left parenthesis, then push it onto the stack.
  • If the token is a right parenthesis:
  • Until the token at the top of the stack is a left parenthesis, pop operators off the stack onto the output queue.
  • Pop the left parenthesis from the stack, but not onto the output queue.
  • If the token at the top of the stack is a function token
    • Pop stack into f
    • Pop arg count into a
    • Pop were values into w
    • If w is true, increment a
    • Set the argument count of f to a
    • Push f onto output queue
  • If the stack runs out without finding a left parenthesis, then there are mismatched parentheses.
  • When there are no more tokens to read:
  • While there are still operator tokens in the stack:
  • If the operator token on the top of the stack is a parenthesis, then there are mismatched parenthesis.
  • Pop the operator onto the output queue.
  • Exit.

Note that because I didn’t feel like correctly listifying most of it, consider an if to apply to the end of that sentence only. Operation order is usually important.

With this done, the part of the RPN algorithm that says:

It is known that the function takes n arguments.

can now be satisfied.

{ 44 } Comments

  1. eric | February 20, 2008 at 02:33 | Permalink

    Hi,

    first of all, nice job adding this feature to the original algorithm, it’s a must !
    But imho there may be a little bug in yours, do you think that an expression like f(g() ) will work ?
    I don’t think so, but I may have misread your algorithm.

    regards

  2. eric | February 20, 2008 at 02:45 | Permalink

    sorry, I’ve implemented and tested it, and you were right, I misread : “If the were values stack has a value on it, pop it and push true. Push false onto where values.”
    there are two were_value operations, I only read one!

  3. Robin | February 20, 2008 at 08:17 | Permalink

    Yeah, nested functions were an issue, my first attempt didn’t have the stack for argument counts because I forgot about the possibility. However, it was part of my test case for the implementation :)

  4. Efstratios Gavves | April 11, 2008 at 21:47 | Permalink

    Hi. My name is Stratis. I am in a group that develops an open source Matlab-like program. I am implementing the parser. Your suggestions were very useful. I have added already some new feautures, like using variables without having to subsitute them at the start of the algorithm, suing matrices, like a[1,2,3] and initializing them like
    [[1,2],[3,4]]. When I finish, would you like to post the algorithm here?

  5. Robin | April 12, 2008 at 00:01 | Permalink

    Hi, if I find it interesting, I’d be happy to post it. Most of what I write on my blog is stuff that I’ve done or find useful, but I’m willing to extend that, it’s all free-form :)

    You may be interested, one extension I’ve made to my implementation is allowing boolean functions, in the C-style. basically, if a value is 1 (or 1.0) it is true, else it is false. This allows more spreadsheet-like functions, such as if(1=2, 3, 4) which will return 4. This doesn’t really impact on the algorithm that I documented any, just adding more binary operators.

    Out of curiosity, although I’m all for more open source code, why not combine forces with octave or scilab rather than start something new?

  6. Urs | June 8, 2008 at 01:52 | Permalink

    Hello Robin!
    Thank you very much for your extension of the original algorithm. It’s great and it works perfect for me. I wrote a spreadsheet-like processor in SAP’s programming language ABAP Object using the shunting yard algorithm. I find it a very good example for solving recursive problems in a elegant and easy understandable way.
    Cheers! Urs

  7. Audi Nugraha | July 6, 2008 at 08:51 | Permalink

    Is Shunting Yard algorithm could be a syntax check?

    Is it better than Recursive-Decent parser?

  8. Robin | July 6, 2008 at 11:59 | Permalink

    You can use it as a syntax check if you do a full evaluation of the RPN, otherwise it’ll only give you a basic check. I don’t know how it compares to R-D in that sense, I’ve never used it.

  9. orion | November 21, 2008 at 03:26 | Permalink

    I myself stumbled on this problem and I find it easier to implement output queue as queue of trees. Only change to the algorithm is that when operator is pushed to it, you make last two elements its children (at the end you get only 1 tree, which you read recursively to RPN). Functions with arbitrary arguments are handled this way: when f is pushed to operator stack, you push dummy token on the queue. When f is pushed on the queue, you make everything after last dummy its children.

  10. Steven Hughes | February 5, 2009 at 04:47 | Permalink

    I recently wrote a calculator app for my Mobile 6 phone because the one that came with it did not support anything other than basic arithmetic. I performed the exact same 3 high-level steps that you used. In addition, I also found it helpful to modify the Shunting Yard algorithm. However, my modification was designed to validate that the correct number of parameters was passed to a function. I counted the number of commas in the parameter list using an arg count stack just like you did. One thing my modification did not do was to ensure that a value was in fact passed for each parameter. My modification was not able to detect null parameters (e.g. min(,3) – first parameter missing). It appears your ‘were values’ stack is designed for just such a purpose.

    I had a thought on how to handle a varialbe number of parameters and stil maintain a single postfix expression. If the Shunting Yard algorithm is modified to put the number of arguments passed to a function as the function’s last parameter, you could easily handle variable numbers of parameters when you evaluate the postfix expression.

    ‘Min(1,2,3,4,5)’ would be converted to postfix as ‘1 2 3 4 5 5 Min’. The Min function would pop the last ‘5’ (which would be on the top of the stack) and know it needed to pop 5 more values to get all it’s parameters. Similarly, ‘Min(1,2)’ would be converted to ‘1 2 3 Min’.

    I have not, actually, implemented this but was thinking that it would probably be my approach if I were to. Of course, I would probably want to implement the null value check to so I didn’t corrupt the stack by reading parameters meant for other operations.

  11. Steven Hughes | February 5, 2009 at 04:49 | Permalink

    Correction:

    ‘Min(1,2) would be converted to ‘1 2 2 Min’

  12. Bri | November 17, 2010 at 10:17 | Permalink

    Do you have a reference to or an idea of how one would extend the Shunting Yard algorithm to handle ternary operations such as a conditional:

    a ? b : c

    This could be done (as was posted earlier in the comments) using a function, but can the algorithm be modified to handle the above syntax?

  13. Robin | December 3, 2010 at 23:05 | Permalink

    Off the top of my head, I can’t think of a way. It’s perhaps possible with some pre-parsing.

  14. Munch | January 24, 2011 at 06:00 | Permalink

    It’s possible to implement ?: with shunting yard algorithm, but it requires inline processing. I suggest studying the Forth programming language and how it implements IF ELSE THEN constructs (no, that’s not a typo). In Forth, IF is an expression, not a statement, for Forth is also RPN.

  15. Manadar | February 15, 2011 at 04:27 | Permalink

    Hey Robin, I had a similar problem and found your blog post helpful. Eventually I did not use this algorithm but I used something along the lines of:

    If the token is a function token, call a function that parses a function call. (That function calls the shunting yard algorithm again, so it becomes recursive). Place the parsed function in a global dictionary that keeps track of parsed functions, and then push the funciton token to the output queue. Continue parsing where the function call stopped.

    If the token is a function seperate token (a comma): Break loop

    If the token is a right parenthesis
    [..] normal operation
    If the stack runs out without finding a left parenthesis: break loop

    If any non-valid tokens (other keywords, comments, endline tokens, etc.) are found: break loop

    Essentially what this does is treat a function token like any other primitive or variable. And it allows for recursion by automatically stopping to parse. Afterwards, in your RPN algorithm you can get the entire function token from the dictionary again.

    It’s a bit dirty, but it is easily applied if you already have much of the other classes that interface with this ready.

  16. Ibby | December 28, 2011 at 08:16 | Permalink

    Good work! I am currently working on a graphing calculator for a project for my A Levels and didn’t even think about varying numbers of arguments. Definatelly going to include it if I have time though (and of course give you credit for your modification to the algorithm).

  17. FrankZ | September 1, 2012 at 10:50 | Permalink

    I am dealing with the same problem. I just ended up with the same solution that Steven Hughes proposes and it’s implementation doesn’t need any extra stack.

    The objective is to add an extra argument to the functions that take a variable number of arguments indicating how many argumentsts the function is using. So, Min (1,2,3) will become 1,2,3,3,Min. To do that you need to be able to tell when a token is actually a function taking a variable number of arguments. That it is easily done. Somewhere you must write down how many arguments each function take. You need that to evaluate the RPN. Just write -1 for the functions taking a variable number of arguments.

    I just describe the modifications to the wikipedia algorithm.

    When a ‘(‘ is found, check if the operator atthe top of the stack is a variable-arguments function, if it is, push number 0 in the stack.
    Every time a token is a number, check if the element at the top of the stack is a number. If it is, increment it by 1 and push it back in the stack.

    That’s it.

    When the rpn is evaluates you only have to:
    if the operator is a variable-args function, pop the number of arguments from the stack and then pop that number of arguments from the stack.

    This can deal with nested variable-number args functions. For example Min(1,2, Min(3,4) will become 1,2,3,4,2,Min,3,Min.

    It can also deal with left associative function with a variable number of operators. For example, I have to implement the ‘IN’ operator like the mysql one. That is written as ‘A’ IN (‘B’, ‘C’, ‘D’). You handle with that when you evaluate the rpn, actually. Just add this simple rule:

    If the token is a varaible-args function, pop the number of arguments from the stack, if it is also left-associative, add 1 to the number of arguments.

    In this way you will pop the argument to the left of the function.

  18. FrankZ | September 6, 2012 at 07:44 | Permalink

    Hi again,

    I have realized that the algorithm changes in my previous post are wrong. The idea is the good one but the solution is a bit more complicated than that. Still, the solution does not require any extra stack.

    Here are the changes you need with respect to the algorithm on wikipedia:

    Before staring the parsing define an integer like:

    int args_num = -1;

    The add this rules to the algorithm:

    -For each token you examin, if (token != ‘)’ and args_num == 0) then args_num = 1;

    -If (token == ‘,’) then args_num++; then as per wikipedia algorithm

    -if (token == ‘(‘) then stack.push(args_num); args_num = 0; stach.push(‘(‘);

    -if (token == ‘)’) then {
    pop arguments as per wikipedia algorithm up to first ‘(‘;
    args_num_temp = args_num;
    stack.pop(); # this pops the ‘(‘;
    args_num = stack.pop(); # this pops the args_num pushed just before ‘(‘
    if (elemen on top of stack == variable args operator) {
    if (elemen on top of stack is left associative) {
    args_num_temp++;
    }
    send args_num_temp to output stream.
    }

    that’s it. This is tested and working.

    The idea of these changes is as follow:
    -Keep the args_num integer as a running count of the argument number
    -Before pushing ‘(‘ into the stack, push args_num into the stack. In this way you store the current argument count and start a new count for the ‘(‘ you just pusched in the stack.
    -Once the args_num has been set to 0, that is, we found a ‘(‘, any token, except ‘)’ will indicate that at least one argument inside the bracket is present. So, any token except ‘)’ will move the counter from 0 to 1.
    -Once the args_num has move to 1, then only ‘,’ will increase it.
    -When we found a ‘)’, args_num will contain the arguments found in the bracket pair.
    -Notice that args_num counts argument every time a left bracket is found. We would not always need to keep the count as not all left-brackets will be used to include arguments. Left-brackets can be used just for precedence as in (1 + 2) * 3. Anyway, once the ‘)’ is found we can tell if the args_num is needed or not. In fact, after popping the ‘(‘ we know that the next element will be a number, the args_num of the external count. So pop that and the next element in the stack will either be a variable-args function, in which case the args_num was actually keeping the count, so we need to send it to the output stream, of it was not in which case we ignore it. The args_num we popped from the stack will carry on the count of any external variable-args function that was enclosing our bracke pair.
    -In case the variable-args function is left associative there will also be the value on the left of the function. Adding 1 to the args_num before sending it to the output stream will keep into account of that too.

  19. Dr. Moniem Ismail | December 15, 2014 at 20:09 | Permalink

    I was involved in the design and implementation of the 1. PL/I language and compiler (S/360 DOS) in the sixties (IBM Lab Germany). I used the Dystra method and extended it successfully to implement all mixed data type attributes conversion and functions. 1966-1967 I implented a study project in the IBM Endicott/USA Lab (Advanced technolgy) a PL/conversational compiler which converted a whole program to RPN. The execution was by an RPN interpreter. it was presented in one of IBM conference. It is a bitty that I did not these all documented and and I have no access to to all of this info .. Does anyone find something ..

  20. THORILLON Xavier | January 7, 2015 at 04:58 | Permalink

    Hello Robin,

    I was stucked trying to introduced user function into my own math expression evaluator developped in C. Following your pseudo code solved my issues ! Thanx a lot for you explanations of shunting yard algorithm extension. I m still surprised to not see any simple/free implementation on internet, I will certainly post mine to provide this basic tool.

    Good Job, Regards !

    Xavier

  21. Jivko | January 22, 2015 at 09:36 | Permalink

    Hey , I just wanted to ask you what do you mean by
    w and a when you say like pop were values into w ?
    Thank you and you’ve done a great job !!!

  22. David Lively | January 27, 2015 at 14:12 | Permalink

    I spent a few days on this, and wound up with a pretty simple kludge:

    When the token is ListStart (an open paren in most cases), I add it to the output queue.

    Then, when converting or executing the RPN output, and I encounter a function call token, I pop items from the stack until I encounter a ListStart (again, an open paren), discard it, and consider everything in between to be an argument to the function.

    It was a one-line change to my implementation which was basically trans-coded from the Wikipedia algorithm description, and worked great.

  23. Kosarar | April 9, 2015 at 10:06 | Permalink

    Not a bad idea, David! Thank you!

  24. Kosarar | April 9, 2015 at 10:10 | Permalink

    But does it work with “f(a*(b+c),d)”-style function calls?..

  25. Kosarar | May 22, 2015 at 07:33 | Permalink

    I have found another, the best way to perform this job.

    I treat quantifiers, function and predicate names as unary operators of different priorities. I also treat commas inside function calls and predicate calls as binary left-associative (or right-associative, it is not important) operators. So I have only unary and binary operators, and atoms. I store them in structures with all their properties to not confuse a function name and a variable name, for example. All advantages of RPN without its disadvantages. Simple and efficient. Elegant, I would say. 😀

    You are free to use and improve this method.

  26. Gene Pavlovsky | May 31, 2015 at 07:07 | Permalink

    Kosarar, that sounds interesting. How about some more details, maybe explaining your algorithm?

  27. Kosarar | May 31, 2015 at 10:17 | Permalink

    I tried to do this in my previous post. Just usual RPN with operators A, B, f, g (like A(), B(5, 6+7-9), f(a), g(a, b), …), … What do not you understand?

  28. Gene Pavlovsky | June 2, 2015 at 00:18 | Permalink

    What do you mean by quantifiers and predicate names? What is a predicate call?
    Can you provide a simple example (but complex enough to introduce the advantages of your way) of infix expression, then what it becomes after parsing, and finally how is it evaluated?
    That would be real helpful. I’m just starting to implement my first “usual RPN” parser and evaluator first. Maybe I’ll start with the basic algorithm first and then come back to this post to get a better understanding of your idea.

  29. Kosarar | June 2, 2015 at 04:15 | Permalink

    “What do you mean by quantifiers”

    Like “@” (“Exist”) or “?” (“Any”). I work with mathematical logic (specifically, formal arithmetic), see? But it does not really matter in this case.

    “and predicate names?”

    P, Q, R like in P(), Q(), R(). Usually.

    “What is a predicate call?”

    Typing P(a,5,f(1)) in the code.

    “Can you provide a simple example (but complex enough to introduce the advantages of your way) of infix expression, then what it becomes after parsing, and finally how is it evaluated?
    That would be real helpful. I’m just starting to implement my first “usual RPN” parser and evaluator first. Maybe I’ll start with the basic algorithm first and then come back to this post to get a better understanding of your idea.”

    Let’s add 1, 2, … to formal arithmetic I have working example with for better understandability. First two example deal only with standard (not related to mathematical logic) infix. The most important for you is the second.

    Infix list: 1 + f ( a )
    RPN list: 1 a f +
    Priorities list: -3 -2 14 10
    Left-associativity list: -2 -2 -1 1

    Infix list: f ( 1 * g ( 2 , 3) )
    RPN list: 1 2 3 , g * f
    Priorities list: -3 -3 -3 8 14 11 14
    Left-associativity list: -2 -2 -2 1 -1 1 -1

    Infix list: @ a P ( a – 1 , b , ! Q )
    PRN list: a 1 – b , Q ! , P a @
    Priorities list: -2 -3 10 -2 8 -1 6 8 9 5 4
    Left-associativity list: -2 -2 1 -2 1 -2 -1 1 -1 -1 -1

  30. Gene Pavlovsky | June 2, 2015 at 07:04 | Permalink

    Now I see :) I didn’t study math logic and English is not my native language, so I was really not sure what the terms mean. Thanks for clearing up.
    And your examples are helpful. So priorities list are just priority values that you assigned to different token types? So if talking about variable arguments, if a function get 3 arguments for example
    Infix: max(1, 2, 3) ,
    RPN becomes 1 2 3 , , max
    So when evaluating that RPN, the three arguments would be pushed onto operands stack, then encountering the two comma operators, how are they (commas) evaluated? I’m having trouble understanding how would max know it should pop 3 operands from the stack?

  31. Kosarar | June 2, 2015 at 07:11 | Permalink

    Yes, you got it right. But I would say your mind is a bit lazy. Change commas in your example to pluses and then ask again your last question.

  32. Gene Pavlovsky | June 3, 2015 at 04:37 | Permalink

    Lazy is right. 1 2 + would make a 3 and push it back on the stack. Are you saying comma takes two numbers and puts them in array, and pushes the array back on the stack?

  33. Kosarar | June 3, 2015 at 07:25 | Permalink

    No arrays here. Plus is exactly as comma here. Binary operation. No difference.

  34. Gene Pavlovsky | June 4, 2015 at 00:41 | Permalink

    I don’t quite get it.
    1 2 +
    Plus pops two operands, 1 and 2, computes their sum, and pushes the result, 3, back to the stack.
    What is the result of evaluating “1 2 ,” ? What is pushed back to the stack?

    I’m assuming the standard RPN evaluation algorithm is used:
    While there are input tokens left
    Read the next token from input.
    If the token is a value
    Push it onto the stack.
    Otherwise, the token is an operator (operator here includes both operators and functions).
    It is known a priori that the operator takes n arguments.
    If there are fewer than n values on the stack
    (Error) The user has not input sufficient values in the expression.
    Else, Pop the top n values from the stack.
    Evaluate the operator, with the values as arguments.
    Push the returned results, if any, back onto the stack.
    If there is only one value in the stack
    That value is the result of the calculation.
    Otherwise, there are more values in the stack
    (Error) The user input has too many values.

    If not, please mention what modifications to this algorithm you made. Me asking you to give more details or taking the time to describe your whole algorithm (like the OP did) is not being lazy, this is not a classroom where a professor gives tough questions for students to figure out. This is an open discussion and if everyone would post their ideas as clear and understandable as possible, many people all over the world might find it of tremendous help. As well as other people might generate even better solutions, inspired by these posts.

  35. Kosarar | June 4, 2015 at 00:57 | Permalink

    “What is the result of evaluating “1 2 ,” ?”

    “1,2”. As intended and as given in infix form.

    “What is pushed back to the stack?”

    Comma.

    “If not, please mention what modifications to this algorithm you made. Me asking you to give more details or taking the time to describe your whole algorithm (like the OP did) is not being lazy, this is not a classroom where a professor gives tough questions for students to figure out. This is an open discussion and if everyone would post their ideas as clear and understandable as possible, many people all over the world might find it of tremendous help. As well as other people might generate even better solutions, inspired by these posts.”

    I tried to explain my modification (this is mostly a “vision” shift) as well as possible. Let’s do it this way. Consider following examples and say if you see any difference between them; I don’t.

    max ( 1 + 2 , 3 + 4 , 5 )
    1 2 + 3 4 + , 5 , max

    max ( 1 * 2 , 3 – 4 , 5 )
    1 2 * 3 4 – , 5 , max

    max ( 1 , 2 , 3 + 4 , 5 )
    1 2 , 3 4 + , 5 , max

    max ( 1 , 2 , 3 , 4 , 5 )
    1 2 , 3 , 4 , 5 , max

    max ( 1 + 2 + 3 + 4 + 5 )
    1 2 + 3 + 4 + 5 + max

    max ( 1 , 2 , 3 + 4 * 5 )
    1 2 , 3 4 5 * + , max

    max ( 1 + 2 , 3 + 4 , 5 )
    1 2 + 3 4 + , 5 , max

    max ( 1 , 2 – 3 + 4 , 5 )
    1 2 3 – 4 + , 5 , max

    max ( 1 + 2 , 3 + 4 + 5 )
    1 2 + 3 4 + 5 + , max

  36. Gene Pavlovsky | June 4, 2015 at 01:53 | Permalink

    Thank you for continuing to try to explain it to me :)
    I understand your concept and comma as a binary operator. In your example I indeed don’t see any difference between +, -, and “,”.
    The only problem I have is how the evaluation is done.
    You mentioned you treat function (even multi-argument functions) as unary operators. So how do you store multiple arguments as one value on the stack? You already said you don’t use arrays, is it an infix string with commas separating the values?
    Let’s say this is the simplistic version of the evaluation pseudo-code (I used ActionScript 3 syntax here).

    while (token = readToken()) {
    if (token.type == ‘value’) {
    stack.push(token.value);
    }
    else if (token.type == ‘operator’) {
    if (token.value == ‘+’)
    stack.push(stack.pop() + stack.pop());
    else if (…)

    else if (token.value == ‘,’)
    stack.push(stack.pop().toString() + ‘,’ + stack.pop().toString());
    }
    else if (token.type == ‘function’) {
    functionMap[token.value].apply(this, stack.pop().toString().split(‘,’));
    }
    }

    Something like that?

  37. Kosarar | June 4, 2015 at 02:01 | Permalink

    “So how do you store multiple arguments as one value on the stack?”

    As any other arguments (in “unmodified” method).

    “You already said you don’t use arrays, is it an infix string with commas separating the values?”

    I think you are having trouble understanding what we discuss here. We discuss the Shunting-yard algorithm, made for translation lines in Infix notation to lines in RPN. It seems, your last question and example somehow belong to the opposite task – processing of RPN. It is much simpler task though due to straightforwardness of RPN as a whole. But I am not sure, clarify your goal.

  38. Gene Pavlovsky | June 4, 2015 at 02:37 | Permalink

    No, I understand that. I already got the idea of how to modify the shunting-yard algorithm to translate infix to RPN, with your changes.
    So this step
    max ( 1 , 2 , 3 , 4 , 5 )
    -> Shunting-Yard Algorithm (modified) ->
    1 2 , 3 , 4 , 5 , max

    I only have a question of how do you evaluate this RPN expression. I proposed two variants – with Array and String, but what is your implementation? Algorithm or pseudo-code is welcome.

  39. Kosarar | June 4, 2015 at 02:39 | Permalink

    What is “evaluating” for you? For me this is the final answer:

    “So this step
    max ( 1 , 2 , 3 , 4 , 5 )
    -> Shunting-Yard Algorithm (modified) ->
    1 2 , 3 , 4 , 5 , max”

  40. Gene Pavlovsky | June 4, 2015 at 06:49 | Permalink

    Shunting-yard is a method of parsing infix expression. In this case, result of parsing is the input expression represented in RPN.
    Evaluating that RPN expression would result in one value.
    In case of max(1,2,3,4,5), evaluating it would result in 5.
    How would you evaluate
    1 2 , 3 , 4 , 5 , max
    Into the resulting value 5?

  41. Kosarar | June 4, 2015 at 08:58 | Permalink

    Just as in infix notation, even simpler (and I was right, your task is closely related to “reverse transformation”; it is not my task though). We can take “max” and then the next (one) set of tokens (since max is unary (because it is a function)). This one argument can have a comma as its rightmost token. In that case we take one set and then another. And pass them to written-by-us function “max”, for example. Or to built-in one.

  42. Gene Pavlovsky | June 4, 2015 at 18:59 | Permalink

    Ok, thanks for nice discussion Kosarar. Your input is appreciated. I think your idea is quite elegant, but the implementation of the evaluate(execute) part involves a bit more changes to the original algorithm than David’s somewhat less-elegant but practical idea:

    “When the token is ListStart (an open paren in most cases), I add it to the output queue.

    Then, when converting or executing the RPN output, and I encounter a function call token, I pop items from the stack until I encounter a ListStart (again, an open paren), discard it, and consider everything in between to be an argument to the function.”

  43. Kosarar | June 5, 2015 at 03:20 | Permalink

    Thank you too, Gene. Maybe evaluation in David’s method is better, maybe not. Views differ.

  44. Gene Pavlovsky | June 20, 2015 at 18:05 | Permalink

    I’ve tried David’s and Kosarar’s ideas and found out both of them are really easy to add to the shunting yard algorithm.

    Relevant portion of the (unmodified) shunting yard algorithm says:
    If the token is a left parenthesis, then push it onto the stack.

    To implement David’s way:
    If the token is a left parenthesis, then push it onto the stack. If there is a function token at the top of the stack, add a left parenthesis (LIST_START) to the output queue.
    Example:
    Infix: F(a,b,c)
    RPN: ( a b c F

    To implement Kosarar’s way just add the comma to the operators list, as the lowest precedence left-associative infix operator (associativity doesn’t really make a difference), and remove the comma handling from the shunting yard algorithm.

    The RPN evaluation itself in my personal opinion so far is easier and more straightforward to implement in David’s way.

    Relevant portion of the (unmodified) RPN evaluation algorithm:

    If the token is a value, push it onto the stack.
    If the token is an operator (operator here includes both operators and functions).
    It is known a priori that the operator takes n arguments.
    If there are fewer than n values on the stack
    (Error) The user has not input sufficient values in the expression.
    Else, Pop the top n values from the stack.
    Evaluate the operator, with the values as arguments.
    Push the returned results, if any, back onto the stack.

    My implementation of the RPN evaluation algorithm using David’s way:

    If the token is a value or a left parenthesis (LIST_START), push it onto the stack.
    If the token is an operator (operator here includes both operators and functions).
    Create an empty arguments array.
    While the item at the top of the stack is not a left parenthesis (LIST_START), pop items off the stack onto the arguments array.
    Pop the left parenthesis (LIST_START) from the top of the stack.
    It is known a priori that the operator takes [minArity; maxArity] arguments.
    If the arguments array contains fewer than minArity or more than maxArity values
    (Error) Too few (too many) operands for the operator.
    Else, reverse the arguments array, and evaluate the operator with these arguments.
    Push the returned results, if any, back onto the stack.

    Example:
    RPN: ( a b c F
    The left parenthesis and all the values a b c gets pushed onto the stack.
    The function call F creates an array [], pops c, b, a onto that array, resulting in [c, b, a].
    Then, the array is reversed, resulting in [a, b, c], the function F is applied with these arguments, and the result is pushed back onto the stack.

    My implementation of the RPN evaluation algorithm using Kosarar’s way consists of two parts:
    The comma binary operator is implemented in the function addArgument:

    addArgument(a, b):
    If argument a is a Number, return a new Array [a, b].
    Else, argument a is an Array, push argument b onto the array a, and return the array.

    The RPN evaluation algorithm is modified like this:

    If the token is an operator or an unary function.
    It is known a priori that the operator takes n arguments.
    If there are fewer than n values on the stack
    (Error) The user has not input sufficient values in the expression.
    Else, Pop the top n values from the stack.
    Evaluate the operator, with the values as arguments.
    Push the returned results, if any, back onto the stack.
    If the token is a binary or higher arity function (including variadic functions this is all about).
    Pop the arguments array from the stack.
    It is known a priori that the operator takes [minArity; maxArity] arguments.
    If the arguments array contains fewer than minArity or more than maxArity values
    (Error) Too few (too many) operands for the operator.
    Else, evaluate the operator with these arguments.
    Push the returned results, if any, back onto the stack.

    Example:
    Infix: F(a,b,c)
    RPN: a b , c , F
    The values a b are pushed onto the stack.
    The operator comma pops b, a from the stack, evaluates addArgument(a, b) which returns Array [a, b], and pushes it back onto the stack.
    The value c is pushed onto the stack.
    The (second) operator comma pops c, [a, b] from the stack, evaluates addArgument([a, b], c) which returns Array [a, b, c], and pushes it back onto the stack.
    The variadic function F pops the Array [a, b, c] from the stack, applies function F to these arguments, and pushes the result back onto the stack.

    Both algorithms have no problem with nested function calls / nested parentheses.

{ 1 } Trackback

  1. […] http://www.kallisti.net.nz/blog/2008/02/extension-to-the-shunting-yard-algorithm-to-allow-variable-n… Share this:TwitterFacebookLike this:LikeBe the first to like this post. This entry was posted in Uncategorized. Bookmark the permalink. ← Flee – Fast Lightweight Expression Evaluator […]

Post a Comment

Your email is never published nor shared. Required fields are marked *