Oh great, another benchmark: Elixir Task vs Ruby Thread
Note: I'm on an M2 Pro (12 cores)
Yeah, benchmarks across languages usually suck and I'm not claiming that this one will be any better. I just want to share some interesting observations I made this morning. My inspiration came from trying to see how much punishment a GenServer could take if I spun up thousands or millions of processes and had each of them perform some operation on the server. I wanted to know whether it would be able to handle the load and, of course, how long the entire operation would take.
Elixir: Sequential addition
I first made a simple GenServer that would perform addition using the number passed into it.
defmodule Cache do
  use GenServer

  @impl true
  def init(num), do: {:ok, num}

  @impl true
  def handle_call(:get, _from, state), do: {:reply, state, state}

  @impl true
  def handle_cast({:add, num}, state) do
    {:noreply, num + state}
  end
end
There's also a getter function to see what the current state is, just so I could verify that every operation reached the GenServer and was processed (they all did, btw).
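Checking the state is just a synchronous call to the :get handler above; a minimal sketch:

{:ok, pid} = GenServer.start_link(Cache, 0)
GenServer.cast(pid, {:add, 1})
# the call goes through the same mailbox, so it returns only after the cast above was handled
GenServer.call(pid, :get) # => 1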
Here was my first attempt without creating any processes and just hitting the server sequentially.
defmodule Processes do
  def add_to_cache(num) do
    add_without_processes(num)
  end

  defp add_without_processes(num) do
    {:ok, pid} = GenServer.start_link(Cache, 0)

    {t, _pids} =
      :timer.tc(fn ->
        for _n <- 1..num do
          GenServer.cast(pid, {:add, 1})
        end
      end)

    IO.puts("Finished in #{div(t, 1000)}ms")
  end
end
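I ran these from iex, roughly like this (the number shown is just the 1M row from the table below):

iex> Processes.add_to_cache(1_000_000)
Finished in 149ms
:ok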
The GenServer handles this very easily and is able to process a million requests in about 150 milliseconds and 10 million in about 1.6 seconds.
No. of operations | Time (ms) |
---|---|
100 | 0.032 |
1000 | 0.36 |
10_000 | 3.196 |
100_000 | 24.231 |
1_000_000 | 149.732 |
10_000_000 | 1582.025 |
Elixir: Asynchronous addition
So the GenServer can handle a lot of throughput, even if it's just a simple operation. Obviously, I didn't want to stop there. I wanted to see what would happen if we mocked an expensive operation, which really just means sprinkling in a sleep. Something like...
:timer.tc(fn ->
  for _n <- 1..num do
    :timer.sleep(10)
    GenServer.cast(pid, {:add, 1})
  end
end)
No. of operations | Time (ms) |
---|---|
100 | 1099 |
1000 | 11001 |
10_000 | ??? |
100_000 | ??? |
1_000_000 | ??? |
10_000_000 | ??? |
Pretty obvious result: it takes longer. So let's wrap each of these operations in a Task instead and have them work asynchronously.
def add_with_processes(num) do
  {:ok, pid} = GenServer.start_link(Cache, 0)

  {t, _pids} =
    :timer.tc(fn ->
      for _n <- 1..num do
        Task.async(fn ->
          :timer.sleep(10)
          GenServer.cast(pid, {:add, 1})
        end)
      end
    end)

  :timer.sleep(10) # let the remaining processes finish up
  IO.puts("Finished in #{div(t, 1000)}ms")
end
No. of operations | Time (ms) |
---|---|
100 | 6 |
1000 | 3 |
10_000 | 38 |
100_000 | 421 |
1_000_000 | 6605 |
10_000_000 | 91887 |
Clearly, my results show a discrepancy in the way I'm measuring how long these processes take to finish (the timed block only covers spawning the tasks, not waiting for them to complete). But just ignore that! The important thing to note is that for operations >= 10_000 we get the distribution we would expect, and the server is still able to process lots and lots of messages in a reasonable amount of time.
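If I wanted the timed block to cover the work actually completing, one option (a sketch, not what produced the numbers above) is to keep the task structs, await them before the clock stops, and finish with a call to flush the cast queue:

{t, _} =
  :timer.tc(fn ->
    tasks =
      for _n <- 1..num do
        Task.async(fn ->
          :timer.sleep(10)
          GenServer.cast(pid, {:add, 1})
        end)
      end

    # wait for every task to have sent its cast
    Task.await_many(tasks, :infinity)

    # a call goes through the same mailbox, so once it returns
    # every earlier cast has been handled
    GenServer.call(pid, :get)
  end)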
Ruby: Asynchronous addition
Ok so we haven't really learned anything so far. Async make code go fast. What's the point of all of this? Well, you may know that I work quite a bit in Ruby, so I was curious what would happen if we wrote something roughly equivalent in Ruby. Would the results be any different? Here's my (Claude's) implementation in Ruby.
require 'benchmark'
require 'concurrent'

a = 0
num = 100

time = Benchmark.measure do
  threads = []
  num.times do
    threads << Thread.new do
      sleep 0.01
      a += 1
    end
  end
  threads.each(&:join)
end

puts "Execution time: #{time.real * 1_000} milliseconds"
No. of operations | Time (ms) |
---|---|
100 | 2.81 |
1000 | 37.82 |
10_000 | Error: Can't create thread |
100_000 | Error: ... |
1_000_000 | Error: ... |
10_000_000 | Error: ... |
Ruby: Asynchronous addition w/ a thread pool
We run into our first issue, which is Ruby's limited ability to handle massive concurrency: there simply aren't enough threads to go around. Let's see what happens when we implement a thread pool.1
require 'benchmark'
require 'monitor'
require 'concurrent'

a = 0
num = 100
pool_size = 1000
pool = Concurrent::FixedThreadPool.new(pool_size)

time = Benchmark.measure do
  latch = Concurrent::CountDownLatch.new(num)
  num.times do
    pool.post do
      begin
        sleep 0.01
        a += 1
      ensure
        latch.count_down # Signal task completion
      end
    end
  end
  latch.wait
end

puts "Execution time: #{time.real * 1_000} milliseconds"
No. of operations | Ruby: Time (ms) | Elixir: Time (ms) |
---|---|---|
100 | 14.81 | 6 |
1000 | 66.43 | 3 |
10_000 | 254.397 | 38 |
100_000 | 2138.565 | 421 |
1_000_000 | 17800.38 | 6605 |
10_000_000 | ??? | 91887 |
Conclusion
"Elixir is better than Ruby!" Yeah, we've heard that a million times and I'm not really trying to contribute to that with this post. I just wanted to expose a few things that I want to implement when writing Elixir going forward.
- Why do I never use a GenServer? It can clearly handle an insane amount of load and it's a simple way to create a caching layer. I need to think deeply about where I should be using it but GenServer is now firmly at the top of my mind.
- Why do I never use Task or processes? Async has always been pretty taboo as it unlocks a whole new set of bugs. However, that's what Elixir was built for! Same as above, I need to think about where I can find ways to make my code async when applicable. There are massive performance gains to be found.
- Ruby obviously has limits with concurrency; literally no one is surprised. But what I am surprised about is how I never once had to think about implementing a thread pool equivalent in Elixir (though if you wanted one, see the sketch right after this list). Being able to create 10 million processes and not worry about any resource restrictions is kinda crazy!
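For what it's worth, if I ever did want to cap concurrency on the Elixir side, Task.async_stream already gives you a pool-ish knob via max_concurrency. A rough sketch (not something I benchmarked above):

{:ok, pid} = GenServer.start_link(Cache, 0)

1..1_000_000
|> Task.async_stream(
  fn _ ->
    :timer.sleep(10)
    GenServer.cast(pid, {:add, 1})
  end,
  max_concurrency: 1_000, # roughly the Ruby pool_size
  ordered: false
)
|> Stream.run()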
You may be wondering if I varied the pool size during testing, and I did. I increased the pool to 10k at one point, which actually made performance worse. I did not attempt to find the "magic" number of threads that would maximize performance because I just don't care enough. I also doubt it would make a difference when dealing with much, much larger numbers of operations.↩