Implementing the native functions in DataWeave

On the journey to build a good, useful language, we decided to enforce consistency wherever we could. One of the fundamental steps toward that goal was getting rid of the hardcoded operators.

Before this, functions such as map, reduce and filter were reserved words of the language. That made the language hard to extend and made fixes scarce, because we had to ship the whole product to fix a single error.
At the same time, we had to maintain consistency with DataWeave 1.0, because we couldn't afford to make users learn the new version from scratch.
That's why we introduced infix functions. These are normal functions that can be written between their arguments instead of in prefix position.

So now

[1, 2, 3] map (val, ix) -> { value: val, index: ix }
// Is the same as
map([1, 2, 3], (val, ix) -> { value: val, index: ix })

Because map is now a function.

How do we create infix functions?

It is really easy: an infix function is just a regular function that takes two arguments.

fun myInfix(lhs: String, rhs: String): String = "$lhs $rhs"

"Hello" myInfix "world!"

That script returns the value "Hello world!".

When we wanted to turn map into a function, we found that it was overloaded for Ranges and Arrays. At that moment we decided to implement function overloading in the language.

fun myInfix(lhs: String, rhs: String): String = "$lhs $rhs"
fun myInfix(lhs: Number, rhs: Number): Number = lhs + rhs

[
  "Hello" myInfix "world!",
  1 myInfix 2
]

// Returns
["Hello world!", 3]


We knew that precedence was complicated in DataWeave 1.0; people spent their time fixing their scripts with dozens of parentheses. We decided to solve it by unifying precedence in a simple way.
Everything on the left side of an infix function is its first argument, and the first expression on the right side is its second.

   1 to 100 map { valX: $ } filter isEven($.valX) groupBy floor($.valX / 4)

// 1 to 100
//          map { valX: $ }
//                          filter isEven($.valX)
//                                                groupBy floor($.valX / 4)
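To make the grouping rule concrete, the chain above can be rewritten with explicit parentheses. This is only a sketch of the same script: the `%dw 2.0` header and the JSON output type are assumed for illustration, and the parentheses are redundant under the rule just described.

```dataweave
%dw 2.0
output application/json
---
// Each infix call takes everything to its left as the first argument,
// so the chain nests from left to right.
(
  (
    ((1 to 100) map { valX: $ })   // first: build the objects
      filter isEven($.valX)        // then: keep the even values
  )
  groupBy floor($.valX / 4)        // finally: group what is left
)
```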

But wait, what is the dollar sign?

It is the argument of an automatically injected function. I explain it in detail here
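As a quick sketch of what that injection amounts to for map: `$` stands for the first argument of the injected function (the value) and `$$` for the second (the index). The field names `implicit` and `explicit` are made up for this example; both fields should evaluate to the same array.

```dataweave
%dw 2.0
output application/json
---
{
  // Shorthand form: the function is injected around the expression
  implicit: [1, 2, 3] map { value: $, index: $$ },
  // Equivalent form with the function written out by hand
  explicit: [1, 2, 3] map ((val, ix) -> { value: val, index: ix })
}
```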

We are still missing a mechanism to replace core functions like map

Previously, we had no way to iterate over arrays or objects from within the language; it simply did not provide any mechanism to do that. We decided to implement tail-call functions to solve that problem.

Briefly, this allows us to create streamable recursive functions, which is really important due to the async nature of DataWeave.
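A minimal sketch of a tail call, assuming a hypothetical `sumAll` helper (it is not part of the core library) and head/tail pattern matching on arrays. The recursive call is the last operation in its branch, so no work is left pending once it is made.

```dataweave
%dw 2.0
output application/json

// sumAll is a hypothetical name used only for this illustration.
// acc carries the running total, so the recursive call is in tail position.
fun sumAll(list: Array<Number>, acc: Number = 0): Number =
  list match {
    // Nothing left to add: the accumulator is the result
    case [] -> acc
    // Add the head to the accumulator and continue with the tail
    case [head ~ tail] -> sumAll(tail, acc + head)
  }
---
sumAll([1, 2, 3, 4])
// Returns 10
```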

We use tail recursion to create lazy generators. A good example is a sequence generator.

fun generator(start: Number = 0): Array<Number> = [start ~ generator(start + 1)]
// This script picks the first 100 numbers from
// the generator output after it has been filtered
(generator(1000) filter isEven($))[0 to 99]

Here is an example of tail call and async generators using this feature. It is a prime number finder using DataWeave.

Going back to the map function: in DataWeave 1.0 it was a Scala binding, a hardcoded operator in the language. Now we can implement the map function directly in DataWeave, without external bindings. It looks like this:

// for clarity, we removed the index argument of the mapping function

fun map<I,O>(list: Array<I>, fn: (elem: I) -> O): Array<O> =
  list match {
    // When we receive an empty list, we return an empty list.
    // The execution finishes there.
    case [] -> []

    // When we receive at least one element in the list,
    // we take off the head and leave a lazy tail on the stack.
    // Then we construct an array with the transformed head element and a
    // lazy tail. Since the construction and deconstruction of tails
    // are lazy operations, this whole process behaves as a stream.
    case [head ~ tail] -> [fn(head) ~ map(tail, fn)]