Friday, March 4, 2011

The Problem with Atoms

First: What is an "Atom" (and what is it good for)?
In Erlang, an atom is a special kind of constant: one that can be used without being defined elsewhere and that is identified by its syntax (it starts with a lowercase letter or is enclosed in single quotes):

Printer = spawn(...),
Printer ! { do_print, "hello world" },
...

The spawned actor now might match on the atom do_print:

receive
    { do_print, What } -> ...
end

Atoms are a way to "tag", and thereby structure, tuples and messages. This approach is far better than abusing integer values ("enums").

My last post was about Pattern Matching in C++. Since we match on the types first, atoms should have distinct types, and those types should be declared at the point of use. That's what templates are for. But what should the template parameters be? const char* is obviously a bad choice: string literals cannot be used as template arguments, and the workaround (declaring a string with external linkage) does not solve the problem of distinct types. Variadic templates (char...) do.

If we translate the example above, it might look like this:

template<char... Name>
class atom { ... };

...

auto printer = spawn(...);
printer.send(atom<"do_print">(), "hello world");

...

receive(on<atom<"do_print">, std::string>() >> ... );

Well... that's not valid C++(0x), unfortunately!
I'm not the only one who would love to see this become valid C++, but it's not in the current working draft.

By the way, this is what the compiler would accept:

auto printer = spawn(...);
printer.send(atom<'d','o','_','p','r','i','n','t'>(), "hello world");
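
To see that the verbose form at least delivers distinct types, here is a minimal sketch. It is not libcppa code: the struct body, the name() helper, the do_exit atom, the handle() overloads and atom_demo() are assumptions made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical atom type: the characters of the name are the template
// arguments, so every distinct name yields a distinct type.
template <char... Name>
struct atom {
    // Rebuilds the name at runtime, e.g. for debugging output.
    static std::string name() {
        const char chars[] = {Name..., '\0'};
        return std::string(chars);
    }
};

using do_print = atom<'d', 'o', '_', 'p', 'r', 'i', 'n', 't'>;
using do_exit = atom<'d', 'o', '_', 'e', 'x', 'i', 't'>;

// Different names are different types, checked at compile time.
static_assert(!std::is_same<do_print, do_exit>::value,
              "atoms with different names must be distinct types");

// Two overloads standing in for two message handlers.
inline std::string handle(do_print) { return "printing"; }
inline std::string handle(do_exit) { return "exiting"; }

inline void atom_demo() {
    assert(do_print::name() == "do_print");
    assert(handle(do_print()) == "printing");
    assert(handle(do_exit()) == "exiting");
}
```

Overload resolution picks the handler from the atom's type alone, which is exactly the property a type-based pattern match needs.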

Would you disagree if I said this is ridiculous? I guess not.
But what's the next best thing we can do? Maybe a user-defined suffix:

template<char... Name>
atom<Name...> operator "" _atom();

auto printer = spawn(...);
printer.send("do_print"_atom, "hello world");

...

receive(on<decltype("do_print"_atom), std::string>() >> ... );

It's far better than giving a list of characters, but decltype(...) is annoying (and verbose). And I couldn't test it, because no compiler implements user-defined literals yet (GCC, clang, Visual Studio, Comeau, ...).
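
One caveat, offered as a hedged sketch rather than a verdict: in C++11 as finally standardized, the template<char...> form of a literal operator is only selected for integer and floating-point literals, while a string literal is passed to an operator taking (const char*, std::size_t). So the character pack can be tried out on digits, but not on a quoted name. The names char_pack, _pack and literal_demo below are made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical helper: carries the characters of the literal in its type.
template <char... Cs>
struct char_pack {
    static std::string str() {
        const char chars[] = {Cs..., '\0'};
        return std::string(chars);
    }
};

// Literal operator template: for a numeric literal, each source character
// of the literal arrives as one char template argument.
template <char... Cs>
constexpr char_pack<Cs...> operator""_pack() {
    return char_pack<Cs...>();
}

// Different literals produce different types, just as atoms should.
static_assert(!std::is_same<decltype(12_pack), decltype(21_pack)>::value,
              "different literals must yield distinct types");

inline void literal_demo() {
    assert(decltype(42_pack)::str() == "42");
    assert(decltype(12_pack)::str() != decltype(21_pack)::str());
}
```

The decltype(...) noise is still there, so this does not make the suffix approach any prettier; it only shows where the character pack comes from.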

I'm afraid that I'm forced to skip atoms until user-defined literals are available (or my favorite solution becomes legal in a future draft version).

Pattern Matching

The following example illustrates what libcppa actually looks like:

#include <iostream>
#include "cppa/cppa.hpp"

using std::cout;
using std::endl;
using namespace cppa;

void slogger()
{
    receive(on<int>() >> [](int value) {
        reply((value * 20) + 2);
    });
}

int main(int, char**)
{
    auto sl = spawn(slogger);
    send(sl, 2);
    receive(on<int>() >> [](int value) {
        // prints 42
        cout << value << endl;
    });
    
    return 0;
}

Although this is a very short snippet, it shows two main goals of libcppa:

  1. The library should look like an (internal) DSL and, in particular, should not force the user to write any glue code (e.g. derive from an actor class and override virtual functions).
  2. Any context (even main) is silently converted into an actor if needed.

Messages are matched against a pattern. In Erlang, those patterns may ask for types in the guard. In a strongly typed language like C++, it obviously makes more sense to match the types first.

Patterns are defined as on<T0, T1, ..., TN>() >> X, where T0...TN are the types and X is a callable (function or functor) that is invoked on a match.
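
How such a pattern could work under the hood can be sketched in a few lines. This is not libcppa's implementation: element, message, make_message, rule, try_invoke and match_demo are hypothetical names, and the sketch assumes a C++14 compiler (for std::index_sequence). It only matches exact type lists, without wildcards or value matching.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>
#include <vector>

// Hypothetical message: a sequence of type-erased elements.
struct element {
    std::type_index type;
    std::shared_ptr<void> data;
};
using message = std::vector<element>;

template <typename... Ts>
message make_message(Ts... xs) {
    return message{element{std::type_index(typeid(Ts)),
                           std::make_shared<Ts>(std::move(xs))}...};
}

// on<Ts...>() only records the expected types.
template <typename... Ts>
struct pattern {};

template <typename... Ts>
pattern<Ts...> on() { return pattern<Ts...>(); }

// A rule tries to match a message and reports whether the callback ran.
using rule = std::function<bool(const message&)>;

template <typename... Ts, typename F, std::size_t... Is>
bool try_invoke(pattern<Ts...>, const message& msg, F& f,
                std::index_sequence<Is...>) {
    if (msg.size() != sizeof...(Ts)) return false;
    // Compare the runtime type of each element with the expected type.
    const bool same_type[] = {
        true, (msg[Is].type == std::type_index(typeid(Ts)))...};
    for (bool ok : same_type)
        if (!ok) return false;
    // All types match, so the casts below are safe.
    f(*static_cast<const Ts*>(msg[Is].data.get())...);
    return true;
}

// `pattern >> callable` glues both together into a rule.
template <typename... Ts, typename F>
rule operator>>(pattern<Ts...> p, F f) {
    return [p, f](const message& msg) mutable {
        return try_invoke(p, msg, f, std::index_sequence_for<Ts...>());
    };
}

// Tries the rules in order and returns a description of the first match.
inline std::string match_demo(const message& msg) {
    std::string result = "no match";
    std::vector<rule> rules = {
        on<int>() >> [&result](int v) {
            result = "int " + std::to_string(v);
        },
        on<std::string>() >> [&result](const std::string& s) {
            result = "string " + s;
        }};
    for (auto& r : rules)
        if (r(msg)) break;
    return result;
}
```

Unlike libcppa, this sketch performs no conversions, so a string must be passed as std::string("...") rather than as a plain string literal.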

One more snippet that plays a bit more with patterns:

#include <iostream>
#include "cppa/cppa.hpp"

using std::cout;
using std::endl;
using namespace cppa;

void foo()
{
    auto patterns =
    (
        on<int, anything, int>() >> [](int v1, int v2)
        {
            cout << "pattern1 { " << v1 << ", " << v2 << " }\n";
        },
        on<std::string>() >> [](const std::string& str)
        {
            cout << "pattern2 { \"" << str << "\" }\n";
        },
        on(std::string("1"), any_vals) >> []()
        {
            cout << "pattern3 { \"1\" }\n";
        },
        on(1, val<std::string>(), any_vals) >> [](const std::string& str)
        {
            cout << "pattern4 { \"" << str << "\" }\n";
        }
    );
    // receive 4 messages
    for (int i = 0; i < 4; ++i) receive(patterns);
}

int main(int, char**)
{
    auto f = spawn(foo);
    // prints: pattern2 { "hello foo" }
    send(f, "hello foo");
    // prints: pattern3 { "1" }
    send(f, "1", 2, 3);
    // prints: pattern1 { 1, 3 }
    send(f, 1, "2", 3);
    // prints: pattern4 { "2" }
    send(f, 1, "2", "3");
    await_all_others_done();
    return 0;
}

There are basically two ways to use on():

The first one is to match for types only. In this case, you don't pass any arguments to the function and specify your pattern via the template parameter list. See patterns 1 and 2 for an example.
anything is a wildcard expression matching any number of elements (zero or more).

Passing other pointer or reference types will cause a compile-time error.

The second way is to match for values. See patterns 3 and 4 for an example. The global constexpr value any_vals has the same meaning as anything in the template parameter list. If you want to match for a type only, use the val() template function at that position.
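
The "zero or more elements" rule of the wildcard can be sketched as a small backtracking match over lists of types. This is an illustration under assumptions, not libcppa's algorithm: anything_t, type_list, types(), matches() and wildcard_demo() are hypothetical names, and only types are compared, not values.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <vector>

// Hypothetical wildcard marker, standing in for `anything`.
struct anything_t {};

using type_list = std::vector<std::type_index>;

template <typename... Ts>
type_list types() {
    return type_list{std::type_index(typeid(Ts))...};
}

// Returns true if pattern[pi..] matches msg[mi..].
inline bool matches(const type_list& pattern, const type_list& msg,
                    std::size_t pi = 0, std::size_t mi = 0) {
    const std::type_index wildcard(typeid(anything_t));
    // Pattern exhausted: match only if the message is exhausted too.
    if (pi == pattern.size()) return mi == msg.size();
    if (pattern[pi] == wildcard) {
        // Let the wildcard swallow 0, 1, 2, ... elements, then try the rest.
        for (std::size_t next = mi; next <= msg.size(); ++next)
            if (matches(pattern, msg, pi + 1, next)) return true;
        return false;
    }
    // Ordinary position: types must agree, then continue with the tails.
    return mi < msg.size() && pattern[pi] == msg[mi] &&
           matches(pattern, msg, pi + 1, mi + 1);
}

inline void wildcard_demo() {
    // Like pattern 1 above: int, anything, int.
    assert(matches(types<int, anything_t, int>(),
                   types<int, std::string, int>()));
    // The wildcard may also match zero elements.
    assert(matches(types<int, anything_t, int>(), types<int, int>()));
    // The last element must still be an int.
    assert(!matches(types<int, anything_t, int>(),
                    types<int, std::string, std::string>()));
}
```

A wildcard in the last position, as in patterns 3 and 4, simply accepts whatever is left of the message.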

Why actors?

This blog is mainly about the development of libcppa (a C++ actor library). But why should we use actors in C++ instead of a signal/slot implementation?
Why bother with "yet another way of decoupling"?

Well, the actor model is all about concurrency. Concurrency by design rather than concurrency by implementation. Mutexes, semaphores and thread primitives are the wrong level of abstraction for programming many-core systems. Think about machines with 20 or more cores (this is the future and there's no way to get the free lunch back!). Writing both thread-safe and scalable code using low-level primitives is challenging and error-prone. You should not think about how to make use of those cores. You should use a level of abstraction that allows your runtime system to make use of as many cores as possible. This is the point of libcppa: raise the level of abstraction in C++. Furthermore, the actor model unifies concurrency and distribution. Since actors communicate by network-transparent, asynchronous message passing, one can freely distribute actors among any number of machines.

Of course, the actor model is not the only high-level abstraction for concurrency. There are libdispatch, (software) transactional memory, agents, and many other approaches such as concurrent channels (e.g. in Google's Go) or Intel blocks. However, the actor model is the only computation model that applies to both concurrency and distribution. Real-world implementations of the actor model (such as Erlang or the Scala libraries) have shown that it leads to scalable applications with a (usually) good software design consisting of small, easy-to-understand and easy-to-test software components.

The actor model is very easy to understand and to explain. Everyone knows what a message is and how a mailbox works. All you have to explain is the syntax for getting a message from actor Alice to actor Bob, and how Bob can read and process messages from his mailbox. And there is no reason why C++ should not provide an implementation of the actor model to ease the development of concurrent and distributed systems.
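
To make the mailbox picture concrete, here is a minimal sketch of Alice and Bob exchanging int messages. It is a hypothetical toy built on std::thread, not how libcppa schedules actors; mailbox, ask_bob and mailbox_demo are made-up names, and the low-level primitives stay hidden inside the mailbox class.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical mailbox: a thread-safe FIFO queue of int messages.
class mailbox {
public:
    // Called by the sender; does not wait for the receiver.
    void send(int msg) {
        {
            std::lock_guard<std::mutex> guard(mtx_);
            queue_.push(msg);
        }
        cv_.notify_one();
    }

    // Called by the owner; blocks until a message is available.
    int receive() {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        int msg = queue_.front();
        queue_.pop();
        return msg;
    }

private:
    std::mutex mtx_;
    std::condition_variable cv_;
    std::queue<int> queue_;
};

// Alice sends a number to Bob; Bob replies with (value * 20) + 2.
inline int ask_bob(int value) {
    mailbox alice;
    mailbox bob;
    std::thread bob_thread([&alice, &bob] {
        int v = bob.receive();
        alice.send(v * 20 + 2);
    });
    bob.send(value);
    int reply = alice.receive();
    bob_thread.join();
    return reply;
}

inline void mailbox_demo() {
    assert(ask_bob(2) == 42);
}
```

Neither Alice nor Bob touches a mutex directly; all they see is send and receive. On some platforms, std::thread needs the thread library to be linked explicitly (e.g. -pthread).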