Scaling Up
I’m starting with the sequential version of WideFinder. I have two implementations of Ruby, so this will be interesting.
Ruby 1.8.6 pegs the CPU at 100%, and completes the sample size in 7s. JRuby 1.0.1 goes all the way to 110% CPU utilization, but that’s reported against the JVM, I’m assuming 100% is taken by the processing, the other 10% by overhead, like the garbage collector. Unfortunately, I have to be 50s patient waiting for the results.
The parallelized version fares worse on CRuby. CRuby doesn’t scale out, so still only 100% CPU utilization, plus the overhead of parallelizing on a single thread. On the other hand, JRuby has fun all the way to 150~160% CPU time.
On Linux each core is considered a separate CPU, so the theoretical upper limit for a dual-core CPU is 200%. In practice, there are other things running in the background, and the best I’ve seen so far is 180%~190%.
I’m not sure why JRuby can’t get there. It does in fact stretch its legs all the way to 180%~190% as reported for Java in the first few seconds, while the JIT kicks in. It might in fact be the JIT working the 20% shift. Past the first few seconds, it slows down to a constant rate of 110% (sequential) or 150~160% (parallel).
Perhaps 160% is the upper limit on reading the log file?
Scaling Out
This time I’m turning WideFinder into an N>1 problem, where each “machine” has a log file to process, and one process that drives them all and collects the results. I’m quoting “machine”, since I’m still running this benchmark on a single machine, but now that I’m only coordinating as opposed to feeding them all a single log file, I can scale out easily.
The sample size is different, so don’t compare processing times with the previous run.
The first run is single threaded, one log file at a time, single core (CRuby) pegged at 100% and takes 1m28s to complete.
The second run is parallelized, each process handles a different log file (ten in total), still single core pegged at 100%, completes in 1m30s. 2 seconds difference from pretending to parallelize.
The third run forks a process to handle each log file, stretching it all the way up to 150-160% and finishing in 46s.
Interesting because now I’m keeping both cores busy and completing the workload in half the time, even though it’s barely pushing 160%. I’m not exactly sure how to explain this result, other than forking multiple processes makes magic happen.
Scaling Away
Erlang has the right concurrency model, too bad about the language.
I intentionally did not include the Erlang WideFinder in my tests. I’ll just go by the reports that it’s slow, because frankly I don’t care. In the overall scheme of things, you spend more efforts scaling a room full of servers that generate logs than programs that process these logs. I prefer to focus on those servers first, and from what I hear Erlang is up to that task, so the WideFinder performance can be forgiven.
The point of the exercise was to illustrate the concurrency model, and how you can retrofit it into a variety of languages. When scaling out, it’s as simple as programming against a library. For low-level tasks, it clearly requires the performance and simplicity that comes from changing the underlying VM. And this can be done for Java or Ruby or Python. Though I doubt we’ll see it happen in Java or Ruby, much for the same reason I doubt Erlang will hit a tipping point.
I think the problem is rooted in how developers react to concurrency. When faced with uncertainty, we resort to hierarchy and structure.
Concurrency is fundamentally a problem of uncertainty. Order is undetermined. Some people embrace that by building systems that thrive on concurrency, and once you start a world of opportunities opens up. Have a look at Erlang and Pict. But most developers race (no pun intended) to solve it with structural programming. Threads, forks, synchronization blocks, atomic transactions, parallel FORtran loops. Those are all examples where we scope race conditions and face the consequences, only to avoid thinking in terms of message passing.
I’m waiting for the day when “Synchronization blocks considered harmful” will become the hot topic of the day, and we’ll learn to do without method signatures and without waiting for a return value. Today, though, we’re still forced to do asynchronous, but prefer to think synchronous.