16 July 2011

Scala Actor v Erlang gen_server Performance

One of the key performance attributes for Actors is the speed with which messages can be consumed and processed. If the overhead of messaging is too high then Actors can't be used for solving some types of concurrency problems. Ideally the overhead of sending a message to an Actor should be similar to that of a method call to make Actors generally useful, although some overhead has to be expected. While implementing a simple CountingActor in Scala I noticed lower-than-expected performance, so I decided to compare it to an Erlang OTP gen_server implementation.

All the code to reproduce the results can be found on GitHub.

This is far from an ideal benchmark. For a start it is too short: tests with more than 3 million messages require a lot of memory. The test also doesn't benefit from warm-up in the JVM, and it is clearly a very simple problem for an Actor to solve (albeit similar to what I actually want to do in a real-life problem). On the other hand, its simplicity does give us a good idea of how quickly an actor can receive and consume messages when the logic has very low complexity.

The problem statement

Create an actor that accepts two messages:
  1. A positive number message that is added to an existing count.
  2. A GetAndReset message that returns the current count and sets the count to 0.
For example the Actor might receive the messages 1 and 5. Its count would be 0+1+5=6 after the two messages are processed. If we then passed the message GetAndReset we would get back the value 6 and the internal count would be reset to 0.
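
To make the two messages concrete, here is a minimal sketch of the counting behaviour. It deliberately uses a plain thread and a java.util.concurrent queue as the mailbox instead of scala.actors or gen_server, so it is not the benchmarked code; the names CountingSketch, Counter, Count, GetAndReset and Stop are assumptions made for illustration.

```scala
import java.util.concurrent.{LinkedBlockingQueue, SynchronousQueue}

object CountingSketch {
  // Hypothetical message types (not taken from the benchmark source).
  sealed trait Msg
  final case class Count(n: Long) extends Msg
  final case class GetAndReset(reply: SynchronousQueue[Long]) extends Msg
  case object Stop extends Msg

  final class Counter {
    // Unbounded mailbox: senders never block, so memory grows with backlog.
    private val mailbox = new LinkedBlockingQueue[Msg]()

    private val worker = new Thread(new Runnable {
      def run(): Unit = {
        var count = 0L
        var running = true
        // Messages are handled one at a time, in arrival order.
        while (running) mailbox.take() match {
          case Count(n)           => count += n
          case GetAndReset(reply) => reply.put(count); count = 0L
          case Stop               => running = false
        }
      }
    })
    worker.setDaemon(true)
    worker.start()

    // Asynchronous: returns as soon as the message is queued.
    def send(n: Long): Unit = mailbox.put(Count(n))

    // Synchronous: blocks until the worker has answered.
    def getAndReset(): Long = {
      val reply = new SynchronousQueue[Long]()
      mailbox.put(GetAndReset(reply))
      reply.take()
    }

    def stop(): Unit = mailbox.put(Stop)
  }
}
```

Sending 1 and then 5 followed by getAndReset() should return 6, and a second getAndReset() should return 0, matching the example above.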

The performance test sends 3 million Count messages, each with the count 100, followed by a synchronous GetAndReset. At the end the Actor should have a count of 3000000*100 = 300,000,000.


Implementation     Time (s)    Throughput (msg/s)
Scala              10          300k
Scala recursive    5.9         500k
Erlang OTP         1.7         1.8 million
Erlang receive     1.2         2.8 million
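
The throughput column is simply the message count divided by the elapsed time. As a rough check, the sketch below recomputes it from the Time column, assuming all four rows were measured with the same 3 million messages; the object and method names are made up for this example. Note that 3 million messages in 1.2 seconds works out to 2.5 million msg/s, so the 2.8 million listed for the bare receive version presumably comes from a faster run (the update further down reports a range of 2.4 to 2.8 million).

```scala
object Throughput {
  // Number of Count messages sent per test run, as stated in the post.
  val messages = 3000000L

  // Messages per second for a run that took `seconds`.
  def perSecond(seconds: Double): Double = messages / seconds

  def main(args: Array[String]): Unit = {
    // Times copied from the table above.
    val runs = Seq(
      "Scala"           -> 10.0,
      "Scala recursive" -> 5.9,
      "Erlang OTP"      -> 1.7,
      "Erlang receive"  -> 1.2
    )
    for ((name, secs) <- runs)
      println(f"$name%-16s ${perSecond(secs)}%.0f msg/s")
  }
}
```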

The two languages/implementations achieve wildly different performance results. Both get all 4 cores of my CPU to 100% usage quickly after the start of the test, but you can see that Scala takes a little longer to go past 50%, which I suspect is due to Range.par's implementation of foreach. Ignore the little spikes there; those are just the setting up and capturing of the test results.



Erlang is 6x faster. Since these are both compiled languages, that is quite a big difference. It feels especially large considering the relative feature sets of the two implementations. The Erlang implementation uses an OTP gen_server with all the code replacement and other goodies that come with it; the Scala Actor is vanilla in comparison. 300K messages a second is not what I was hoping for, especially when we can see Erlang achieving millions. If I implemented a very basic Actor in Erlang using the basic send and receive functionality, I suspect I would see even higher performance.

In lines of code the Scala solution is a lot shorter. The first reason is the OTP overhead: there are a number of boilerplate functions to implement for the gen_server behaviour, and then I had to add the associated wrapper functions to make it all work. The second reason is that there isn't a parallel foreach implementation in Erlang by default, so I implemented one that splits a range X ways, to approximate the algorithm used by Scala's parallel Range.
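
The range-splitting idea can be sketched as follows. This is a Scala illustration of the approach, not the Erlang code from the repository, and it is not how Range.par is actually implemented; RangeSplit, split and parForeach are hypothetical names, and each chunk simply gets its own thread.

```scala
object RangeSplit {
  // Split 1..n into `parts` contiguous chunks whose sizes differ by at most one.
  def split(n: Int, parts: Int): Vector[Range] = {
    require(n >= 0 && parts > 0)
    val base  = n / parts // minimum chunk size
    val extra = n % parts // the first `extra` chunks get one more element
    (0 until parts).map { i =>
      val start = i * base + math.min(i, extra) + 1
      val len   = base + (if (i < extra) 1 else 0)
      start until (start + len)
    }.toVector
  }

  // Run f over 1..n using one thread per chunk, then wait for all of them.
  def parForeach(n: Int, parts: Int)(f: Int => Unit): Unit = {
    val threads = split(n, parts).map { r =>
      new Thread(new Runnable { def run(): Unit = r.foreach(f) })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
  }
}
```

With 4 chunks on a 4-core machine, each core would send roughly a quarter of the messages, which is the effect the benchmark relies on to load the actor from several senders at once.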

You can now commence ripping the benchmark apart in the comments.

Update: added timings for the recursive version of the Scala actor. With it, Scala can do 450k messages a second.

The code now also includes an Erlang server implemented with a bare receive instead of gen_server. It's noticeably faster than the OTP version, but its performance is also less consistent, varying from 2.4 million to 2.8 million messages a second. At its best that is over 50% faster than the OTP version, and it widens the gap to Scala with recursive react to 6.2x.


  1. I wonder if the use of loop could be one of the reasons? http://erikengbrecht.blogspot.com/2010/08/scala-actors-loop-react-and-schedulers.html

  2. Removing the use of loop { ... } and instead replacing it with a recursive call to act improved performance to 470k/s. Still massively behind Erlang but an improvement.

    I will also try tweaking the Scheduler as suggested in that post and see where that gets me.

  3. Also, what about switching the Erlang side to use a bare receive clause? gen_server has a certain overhead.

  4. steve.mcjones@googlemail.com, 16 July 2011 at 22:50

    Maybe you should try the Akka actors instead ... there is a reason why they will eventually replace the Scala ones. :-)

  5. Replacing with Akka actors results in Erlang being faster by only 8% in the parallel scenario, while being slower by 44% in the sequential scenario (thus leading me to believe it's the fork-join framework, not message passing, that's slower in Scala). Details posted here: https://plus.google.com/u/0/112820434312193778084/posts/HdKFx4VQtJj

  6. Posted a different set of results observed when using either recursive react or akka here : https://plus.google.com/u/0/112820434312193778084/posts/HdKFx4VQtJj

  7. If you use Akka it looks like the performance of Erlang vs. Scala is fairly close, namely within 50% of each other. Looks like both Erlang and Scala (with Akka) are good choices for high throughput systems.


  8. @Dhananjay, Daniel: Don't destroy a perfectly rigged benchmark!

  9. I converted this test to use AsyncFP: https://github.com/laforge49/Asynchronous-Functional-Programming/wiki

    Here's my code: https://github.com/laforge49/Asynchronous-Functional-Programming/tree/master/Blip/src/test/scala/org/agilewiki/blip/intro/loopTiming

    And here's the output:

    Test took 36.561 seconds for 100000000 messages
    Throughput=2735154 per sec

    I got the impression that the other benchmarks were run on an i5. My i5 laptop is currently under repair, so I ran the test on my old laptop: Intel Core 2 Duo T6400 @ 2GHz.

  10. Been thinking about the differences between AsyncFP and Scala actors. Messaging with AsyncFP is much more like a method call, as both have implicit flow control.

    And while the Scala actor and AsyncFP versions of this benchmark have about the same throughput, the AsyncFP version is much better behaved as a consequence of its flow control--it does not use a lot of memory and is consequently not limited to 3 million messages.

    Contrast this with Scala actors. If you changed the code so that one message is processed at a time, the way AsyncFP does, throughput would drop dramatically. AsyncFP can do this because it processes everything on a single thread by default.

  11. I know this is a bit of an old subject, but what do you think of the results by Franz at:

    Scala, Akka and Erlang Actor Benchmarks

    By his results Akka is able to process 1 million more messages than Erlang?

  12. I think Scala's default Actor library performs quite poorly and that Akka looks more promising on that front, which is good news.

    What I am still looking for is an actor system that does IO processing as well as Erlang does. I have looked through quite a lot of Akka documentation and despite people telling me it has Async IO I can't find it. That is the really important feature I still wish to find that I had in Erlang.

  13. This may help:


  14. What operating system did you run these tests on? I've tried running them on the Mac, and I'm seeing Scala run about 5x faster than Erlang. I'm wondering if the Erlang implementation on OS X is significantly slower.

  15. I ran these tests on Windows 7 64 bit.
    Is one of your results similar to mine and the other much reduced, or are they both totally different?