# rumigaculum.com

## A lazy stream implementation in C++11

(Article also published at codeproject on 2012-12-21)

## Introduction

After attending a free introductory course on functional programming at Coursera I got fascinated by the power of lazy (non-strict) collections. I was taught the basic functioning of the Stream class in Scala (basically a simply linked list with non-strict evaluation). This was a radical change in my view on how collections could be structured and how their elements could be evaluated.

Coming from a mostly imperative world (as a C++ hobbyist), functional programming was an exotic mythical beast I was yearning to tame into a language I felt much more comfortable with.

Browsing in the Internet, my first inspiration came from Example 2 in this Louis Botterill's techy blog insightful post. In this example, Louis depicts a home-made but very inspiring imitation of the Scala Stream class. I took his design and tried to poured it into an equivalent C++03 class, with no success at all after several days. It was not until I moved to C++11 when I began to see the light at the end of the tunnel... But it is not yet time to show the innards of the resulting implementation.

## The fascination of infinite streams

Let´s have a look at this simple instruction:

`lazy_stream<int> stream_a = lazy_stream<int>::from(3);`

To put it simply, stream_a is the set of all integers bigger than 2, i.e.: {3, 4, 5,...}. It represents an infinite set on which we can apply some intriguing operations we may not be used to. The first one is filtering:

`lazy_stream<int> stream_b = stream_a.filter(    [](int value){        return value % 4 == 0;    });`

We have just created another infinite set called stream_b. In this case it is the subset of all multiples of 4 in stream_a. The filtering is achieved by means of an anonymous (lambda) function, in the above example the pattern: [](...){...}. The operation represents the set {4, 8, 12,...}.

Ok. Nice, but how can I hold an infinite set in memory? In fact you can't, but you do not need to. At the point stream_a is created, the only known element is the head of the stream, i.e.: the number 3. The rest (called the tail) is only an embryonic, not yet born stream that can be created by invoking a function we might call tail generator. To invoke the tail generator we use the tail method. Thus:

`stream_a.tail();`

creates a brand new lazy stream whose head is the number 4 and whose tail is another tail generator (just a function).

But not all the operations are equally lazy. from is pure lazy, but filter is not so. In fact it internally calls tail until the first filter-compliant occurrence is found (number 4 in our example). It is easy to see that if no such occurrence is found we will get into trouble (remember: the stream is infinite).

Let's have a look at another pure lazy operation, map:

`typedef std::pair<int, int> pair_type;`

`lazy_stream<pair_type> stream_c = stream_b.map<pair_type>(    [](int value){        return std::pair<int, int>(value, value + 1);        });`

We have created a new lazy stream stream_c by mapping every value element of stream_b into the pair (value, value+1). stream_c is then: {(4, 5), (8, 9), (12, 13),...}.

Once an infinite stream is created, we may be soon interested in exploring its elements. To take the (say) first 10 elements of stream_c, we write:

`lazy_stream<pair_type> stream_d = stream_c.take(10);`

Observe that stream_d is a finite stream. In fact, we can take its size:

`std::cout << stream_d.size() << std::endl; // Prints 10.`

As you may imagine, size is a strict (non-lazy) operation. You have to evaluate all the elements of a lazy stream in order to know how many they are.

If we are only interested in getting a single element, we can use the get method. It just retrieves the element of a specified index n {n=0, 1, 2,...}. Unfortunately, it must evaluate the prior n-1 elements as well.

`std::cout << stream_b.get(2) << std::endl; // Prints 12.`

A worn-out but very illustrative example of a lazy stream is the extraction of the first n prime number by means of the so called Sieve of Eratosthenes.

Without entering in more explanations, this could be the code:

`// The sieve function object does the hard work.std::function<lazy_stream<int> (const lazy_stream<int>&)> sieve =[&sieve](const lazy_stream<int>& start){`

`    int head = start.head();`

`    // We filter the primes with respect to head.    lazy_stream<int> temp = start.filter([head](int value){        return value % head > 0;        });`

`    // We return a new lazy stream with its own head and tail generator.    return lazy_stream<int>(head, [&sieve, temp](){        return sieve(temp);        });};`

`// prime_numbers holds all the prime numbers.lazy_stream<int> prime_numbers = sieve(lazy_stream<int>::from(2));`

`// Let's take the first 100 prime numbers and pour them into a std::list.std::list<int> prime_number_list = prime_numbers.take(100).to_list();`

## The not so fancy finite lazy streams

Not so gorgeous, but they do not look that bad after all. To compose a finite lazy stream, we just prepend elements to an already existing finite stream. The simplest finite lazy stream is the empty one, also called nil.

`lazy_stream<int> stream_e = 1 & (2 & lazy_stream<int>::nil);`

The use of parentheses is a pity, but all the right associative operators in C++ can be overloaded only as members of a class, so I have been compelled to use a left associative operator &. The equivalent Scala operator #:: needs no parentheses, for instance.

Some interesting notes:

• The size of stream_e is 2.
• lazy_stream<int>::nil is equivalent to lazy_stream<int>().
• We can always prepend an element to an already lazy stream, no matter whether this is finite or infinite.

Save the generation function from, the rest of operations (filter, map, take, etc.) are equally available for finite streams. For integral types we can create, for instance, finite ranges:

`// This is the lazy finite set {100, 99, 98,... ,2}.lazy_stream<int> stream_f = lazy_stream<int>::range(100, 1);`

There are other operations it is not worth mentioning. I just would like to highlight one very important reducing function in functional languages: fold_left. This operation first takes a starting value of type U (let's call it start). Then, with each one of the elements in the stream it operates a given function op which takes the start value plus the element value and stores the result in start. It then proceeds with the next element an so on. The final result is a single value of type U.

Let's add the double 0.5 plus all the elements in stream_f:

`// Observe that stream_f is a lazy stream of type int.// The result is of type double instead.// (Types of start and result must be the same).double addition = stream_f.fold_left<double>(0.5)(    [](double accum, int element){        return accum + element;        });`

## Purpose of the lazy_stream<> class

With the writing of lazy_stream<> I did not intend to make a full-fledged, optimized and efficient piece of code. I just wanted to realized how the new C++11 specification could significantly move the scope of the language to a more functional (less imperative) view.

In fact, one of features of lazy_stream<> class is that all its objects are immutable, i.e.: once created, they cannot be modified. This is a common trait of functional programming objects, generally disregarded by imperative languages, because immutability sometimes leads to a lesser memory efficiency; many times it is much cheaper to modify an object than copying it.

On the other hand, I think this design would have been very tricky if not impossible with previous specifications of C++. I will discuss this idea later.

## Implementation details

Both finite and infinite streams keep three smart pointers to dynamically allocated information in the heap, namely:

1. A std::shared_ptr smart pointer, called head_ptr_, to the head (a value of generic type T).
2. A std::shared_ptr smart pointer, called tail_ptr_, to the stream tail (another lazy_stream<T>).
3. A std::shared_ptr smart pointer, called tail_gen_ptr_, to the stream tail generator (a function class of type std::function<lazy_stream<T> ()>).

Pointers [2] and [3] are mutually exclusive in order to keep the design consistent. Thus, one of them is directly set to nullptr since object creation.

The notation std::function<lazy_stream<T> ()>, aliased in the code as tail_gen_type, is quite weird to a C++03 developer eyes. The above type can wrap any function with no input parameters and returning a lazy stream. That includes:

• Plain standalone functions
• Member functions (static or non-static)
• Lambda (anonymous) functions, with or without closures.

The use of tail_gen_type is the keystone of the whole design. The use of lambdas (with or without closures) representing tail generators is widespread along the code. All those functions can be wrapped (or represented) by the same type: tail_gen_type.

As an illustrative example, let's have a look at take method implementation. Remember that this method takes an input stream and returns a new stream with the first n elements of the former.

`// take method implementationtemplate <typename T>lazy_stream<T> lazy_stream<T>::take(size_t n) const{    if(empty_ || n<1)        return lazy_stream<T>();     // [1]`

`    const tail_type& tail_ = tail(); // [2]`

`    return lazy_stream<T>(*head_ptr_, [tail_, n]() ->lazy_stream<T> {return tail_.take(n-1);}) // [3]}`

This method, as well as the rest of methods in lazy_stream<>, is const.

If this stream is empty or if we ask for less than one element, we just return an empty stream [1]. Otherwise we evaluate the tail stream, invoking tail() [2]. Eventually we create and return a brand new lazy stream [3] whose head is this's head and whose tail generator is given by the lambda:

`[tail_, n]() ->lazy_stream<T> {    return tail_.take(n-1);    });`

Observe that once the new lazy stream is created and returned, no further actions are taken. At this point we only know the head of this second stream (*head_ptr_). If we now invoke tail() on this newborn stream, a third lazy stream will be created by evaluating:

`tail_.take(n-1);`

in the lambda, where tail_ is the tail of the first stream.

We have been lazy, but in fact we could have been even more. Consider this alternative (but not implemented) code:

`// Alternative take method implementationtemplate <typename T>lazy_stream<T> lazy_stream<T>::take(size_t n) const{    if(empty_ || n<1)        return lazy_stream<T>();`

`    const tail_ptr_type& tail_ptr = tail_ptr_;    const tail_gen_ptr_type& tail_gen_ptr = tail_gen_ptr_;`

`    return lazy_stream<T>(*head_ptr_,                          [tail_ptr, tail_gen_ptr, n]() ->lazy_stream<T> {                              // This cannot happen                              assert(!(tail_ptr && tail_gen_ptr));`

`                              if(tail_ptr)                                  return tail_ptr->take(n-1);`

`                              return (*tail_gen_ptr)().take(n-1);                              });}`

The above implementation is indeed a little more efficient than the original one: we do not pass the stream head into the lambda. Besides, it is a little lazier: the first invocation of tail() is inside the lambda. I have disregarded it only for a subjective reason: it loses clarity with respect to the original one and it is a little more verbose. Feel free to choose what you like more.

Because the lazy streams herein developed are immutable, they cannot be assigned (assignment operation is disabled). Copy-construction, on the other hand, is available at a reasonable cost. As we already know, a lazy stream object stores its information in the heap, keeping only two active smart pointers pointing to it. When copying, we are actually sharing resources ownership.

## Implementation pitfalls in C++03

My first attempt after the Scala course was to write a lazy stream class in C++03. After racking my brains for a long while, I gave up. In C++03 there are no lambdas, so in order to approach them I tried to use function classes (functors). For instance, let’s turn the following lambda into a functor:

`[tail_, n]() ->lazy_stream<T> {    return tail_.take(n-1);    });`

The equivalent functor could be:

`template <typename T>class take_functor{    typedef lazy_stream<T> tail_type;    T n_;            // Closure parameter    tail_type tail_; // Closure parameter`

`public:`

`    take_functor(const T& n, const tail_type& tail) : n_(n), tail_(tail) {}`

`    tail_type operator()() const    {        return tail_.take(n_-1);    }};`

The essential difference lies in the fact that every lambda function with symbolic: type ()=>lazy_stream<T> (in Scala notation), no matter how it may be, can be represented by a std::function<lazy_stream<T> ()>. As far I know an analogous representation for functors is not possible in C++03. Moreover, std::function does not even exist in C++ specifications prior to 2011. Thus, when using C++03 I was not able to give all the functors I wrote a common representation. What unique type in C++03 can hold a functor with any closure parameters representing a function of type: ()=>lazy_stream<T>? I think none. So the key problem was I could not figure out how to write a tail generator in C++03.

## Source code

The lazy_stream<> implementation code has been attached, along with a main sample code. Only a subset of the corresponding Scala methods (the most insightful, from my viewpoint) has been translated. The class allows extension and further optimization.

The code has been written with Code::Blocks and compiled with MinGw (mingw-get-inst-20120426).