
In The Beginning There Was CGI
That’s inaccurate; in the beginning there were punch cards. But this post is about building scalable applications using Web technologies, and in this story, everything starts with CGI.
CGIs are incredibly slow. Every time you make a request to the server, it starts up another process. You can build simple and light processes if you focus on optimizing those CGIs and tuning every line of code. But you end up developing at the speed of traffic jams.
To develop fast enough you need to reuse code, pull in libraries, hold state in the database. All of which means you end up with significant startup costs. Starting up a VM and a Web framework can take seconds, enough to kill the response time for anything more than a handful of users.
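To make the startup cost concrete, here is a minimal Python sketch. It is an illustration, not how any real CGI or FastCGI server is built. The CGI-style function launches a fresh interpreter for every request. The persistent function keeps one worker process alive and feeds it requests over a pipe, which is roughly the idea FastCGI later formalized. The one-request-per-line protocol and all function names are hypothetical.

```python
import subprocess
import sys

# Hypothetical worker: reads one request per line, replies "<pid> <REQUEST>".
WORKER = (
    "import os, sys\n"
    "for line in sys.stdin:\n"
    "    print(os.getpid(), line.strip().upper(), flush=True)\n"
)


def _parse(line):
    """Split a worker reply into (pid, body)."""
    pid, body = line.strip().split(" ", 1)
    return int(pid), body


def cgi_style(requests):
    """Start a brand-new interpreter for every request, as CGI does."""
    replies = []
    for req in requests:
        out = subprocess.run(
            [sys.executable, "-c", WORKER],
            input=req + "\n", capture_output=True, text=True, check=True,
        ).stdout
        replies.append(_parse(out))
    return replies


def persistent_style(requests):
    """Pay the startup cost once, then reuse the same worker process."""
    worker = subprocess.Popen(
        [sys.executable, "-c", WORKER],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )
    replies = []
    for req in requests:
        worker.stdin.write(req + "\n")
        worker.stdin.flush()
        replies.append(_parse(worker.stdout.readline()))
    worker.stdin.close()  # EOF ends the worker's read loop
    worker.wait()
    worker.stdout.close()
    return replies
```

The PIDs in the replies show the difference: one per request for the CGI-style run, and a single PID for the persistent worker, which paid interpreter startup only once.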
There must be a better way. Incidentally, that’s when the LAMP, Windows and Java crowds each go their own way.
In the LAMP world, processes are everything. If you want to pull out data from a file, sort it, and e-mail the result, you pipe several programs together. You’re building a solution by assembling processes.
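For readers who haven’t wired processes together by hand, here is a minimal Python sketch of the first two stages of that kind of pipeline (`grep pattern file | sort`). It assumes `grep` and `sort` are on the PATH. The mail stage is left out, and the sorted lines are returned instead. Passing arguments as a list, rather than as one shell string, also avoids shell quoting problems.

```python
import os
import subprocess


def grep_sort(pattern, path):
    """Equivalent of `grep pattern path | sort`: two processes joined by a pipe."""
    env = dict(os.environ, LC_ALL="C")  # byte-order sort, independent of locale
    grep = subprocess.Popen(["grep", "--", pattern, path],
                            stdout=subprocess.PIPE, env=env)
    sort = subprocess.Popen(["sort"], stdin=grep.stdout,
                            stdout=subprocess.PIPE, text=True, env=env)
    grep.stdout.close()  # sort owns the read end now; grep gets SIGPIPE if sort exits
    out, _ = sort.communicate()
    grep.wait()
    return out.splitlines()
```

Each stage is an ordinary program that knows nothing about the others. Swapping `sort` for `sort -u`, or adding a third stage, is a one-line change.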
And for more complex tasks you add even more processes. Want to do things on a schedule? Fire them up with cron. Need to improve throughput? Start up a cache process. Monitor uptime? That’s another process for you.
For LAMP developers, multi-process is the natural order of things. Multi-threading is reserved for the light stuff.
Windows views the world in a different way. Windows is, after all, a GUI operating system. Building GUIs with threads is easier than with processes. Windows has always focused on the GUI at the expense of its underdeveloped command line. As a result, it never developed an ecosystem of processes you can wire together.
The Windows developer mentality focuses on threads and APIs over processes and tasks. Not surprisingly, Windows is also optimized for threads and has too much overhead when it comes to processes.
Java is not Windows. But it tried. Years before J2EE, Java evolved into an alternative to Windows. There was even talk of replacing the entire operating system and office suite with one written in Java. So Java followed Windows and did threads.
While Windows focused on GUI, Java focused on platform independence. So it went and re-invented the operating system, one library at a time. In Java you don’t scan files with grep, you use a library. You don’t pipe e-mails to sendmail, you use a library. All the features you need are folded into the VM.
Which turned a snappy VM into a huge behemoth that takes a couple of minutes to boot as it sets up libraries, frameworks and containers. You don’t want to start up the JVM more than once.
Java’s Love/Hate Relationship With Threads
Java did to multi-threaded applications what it did to garbage collection and JIT: it brought them to the wider developer community and made them accessible and acceptable. It’s not the best language for multi-threaded development, but it’s a damn good one. And the most popular.
When it comes to multi-threaded development, Java was years ahead of the C/C++ it came to replace, and the VB/Delphi it competed with.
Then it realized what a problem that is. Multi-threaded developers understand concurrency, how threads interact and how to deal with shared resources and data. Most developers don’t. And so they get to suffer the unintended consequences of side effects.
The solution was to force developers to build separate units of work that can run in a multi-threaded environment without conflicts, and scale to more than one server. What comes naturally to every PHP and Perl developer had to be forced down the throat of Java developers.
It was aptly named EJB.
So Java made multi-threaded development incredibly easy, and then made multi-threaded development intentionally hard. Perl, PHP and their crowd didn’t do much to address multi-threading, instead falling back on processes.
Why Threads Don’t Scale
When you build software, you spend a lot of time tackling problems. You’re solving one set of problems, which means you’re not solving another, or building new features.
You probably want to run more than one application on your server, and handle more than one request at a time. So multi-threaded developers need to deal with issues like deployment units, managing containers, and tackling class loaders. Anyone who has developed in Java knows those technologies are not as slick as they’re made to look in the brochure.
Multi-process developers don’t have to deal with that. But they’re not saving any time. They have to deal with managing processes, deploying in different locations, file system paths. They have their own share of problems to solve. But every time they solve one of these problems, they’re solving a problem that helps them scale.
You build for one machine, but you’re building for a cluster.
When multi-threaded developers optimize their system, they first focus on “in-VM” optimization. How to get components to share resources and talk to each other and build a solution out of objects.
Multi-process developers can’t do that. Instead they have to focus on the cost of sending data from one process to another and coordinating their work.
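Here is a small Python sketch of what that cost looks like. It assumes the data crosses the process boundary as JSON over a pipe; any serialization format would do. The records are encoded, copied through the kernel, and decoded in the child, and the result travels back the same way. Code in the same process would simply read the original list. The child program and function names are hypothetical.

```python
import json
import subprocess
import sys

# Hypothetical child task: read JSON records from stdin, write back their total.
CHILD = (
    "import json, sys\n"
    "records = json.load(sys.stdin)\n"
    "json.dump(sum(r['amount'] for r in records), sys.stdout)\n"
)


def total_in_child(records):
    """Ship records to another process and get a result back.

    Returns (total, bytes_sent). Everything the child needs must be
    serialized, and bytes_sent is the payload that crossed the boundary.
    """
    payload = json.dumps(records)
    result = subprocess.run([sys.executable, "-c", CHILD], input=payload,
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout), len(payload.encode())


def total_in_process(records):
    """The in-process equivalent: no copy, no encoding, just a function call."""
    return sum(r["amount"] for r in records)
```

Both functions compute the same answer. The difference is that the child version has to budget for encoding, the pipe, decoding and process startup on every call.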
You have a finite budget for optimizing, and you’ll probably run into a deadline before you max it out. So multi-threaded developers end up optimizing for much better performance on any given machine, but multi-process developers end up optimizing for scale.
Scalability is not performance. Scalability is getting from small to big (and back). I count software design in that.
Multi-threaded developers tend to scale through objects, libraries and frameworks. When you focus on the components around you, you don’t pay much attention to anything outside the sandbox. The level of abstraction is the API.
Multi-process developers scale by assembling programs together, chaining them or running them in parallel. If it’s not in the framework, you look for a program (or a combination of programs) that does what you need. The level of abstraction is the task.
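Running programs in parallel is just as mechanical as chaining them. Here is a minimal Python sketch that assumes each task is an independent command. The tiny Python one-liners in `TASKS` are hypothetical stand-ins for real programs. The sketch starts all the commands, then collects their output.

```python
import subprocess
import sys


def run_parallel(commands):
    """Start every command at once, then wait for each and collect its stdout."""
    procs = [subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
             for cmd in commands]  # all processes are running concurrently here
    return [p.communicate()[0].strip() for p in procs]


# Hypothetical stand-ins for real programs (a report generator, a log scanner, ...).
TASKS = [
    [sys.executable, "-c", "print(sum(range(10)))"],
    [sys.executable, "-c", "print(2 ** 10)"],
]
```

Because each task is a separate process, the operating system is free to schedule them on different cores. The caller needs no locks because the tasks share nothing.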
I happen to think tasks are the right level of abstraction. The more complex the system is, the more you need to shift focus away from pieces of code and toward what those pieces can do.
Performance
If you’re solving problems, not building software for its own sake, then the right level of abstraction is the task. Start working with processes and you’ll discover that you can do a lot more than with one process, no matter how big it is on the inside.
The more independent processes you have, the easier they are to combine into new and interesting uses. In fact, you might be seeing an analogy here with Mashups and SOA, and you’re right.
There are different ways to measure performance. In my experience multi-threaded has higher throughput on a single machine, but there’s only so much one machine can do. Eventually, you’ll have to think outside that box.
But the real measure of performance is how fast you can go from idea to working solution. Hardware is cheaper than the people who work with it. Developers who think processes have an easier time building to scale, and an easier time assembling solutions out of existing pieces of code.
It won’t show up in your profiler, but you will get better design and longer lasting code.
Update: Some people confused this post with “how to build a Web server”. This post is about how to build applications higher up the stack. Different tools for different jobs. I also happen to use threads often; I just believe the “everything in one process” approach is misguided. It’s limited by design.
Photo by K /.
I can agree with your statement that less shared context is better than more, in general, but it is worth mentioning that the quality of the scheduler is of paramount importance. Most modern operating systems have good (i.e., O(1)) schedulers, but you may or may not have much control over where system-level resources are allocated. Alas, subtleties abound.
Pingback: Qix.it
> If you want to pull out data from a file, sort it, and e-mail the result, you pipe several programs together.
So you are saying that LAMP programmers create three processes to do what normal programmers can do in about a dozen lines of code?
I don’t believe you.
Jonathan,
grep pattern file | sort | sendmail me@somewhere.com
Or I could build a program that does the same thing, but most likely I won’t.
It’s easy to underestimate simple solutions because, well, they do so little. But if you can do something simple, you do that often. And if you can’t, you just never get around to it.
Now show us how to do it in the context of a LAMP program.
Or better yet, instead of emailing the sorted file send it back as a web page.
The old adage that applies here is this: “When all you have’s a hammer, then everything’s a nail”.
Every good software engineer should understand that things like threading and multi-processing are just tools in the total set of ‘things you should know’. A good engineer knows when each tool applies and (equally important) when they do not. Any good engineer can offer up specific cases to support either implementation, and really good ones can tell you why sometimes you use both.
Typically, things like synchronization, context sharing, message lag, queueing, etc. drive toward one solution over another. The simple chaining of processes together like a typical shell script does have its place, but I’m pretty sure no one believes that scales well. Process-to-Process communication also has its place and can scale well, but more often the latency imposed by the communications medium is onerous.
Learn both. Use both. In fact, learn as many methodologies as you can, and make intelligent decisions about when they work or don’t work for your problem set. The more methodologies (patterns) you understand, the more likely your tool box has the right tools for the job.
One other rant: keep in mind that, while the developer is more expensive than the hardware, development itself is a fixed cost while hardware and network are ongoing costs. So the real cost of the application is rarely driven by the cost to build it. In fact, with use, development cost per transaction moves toward zero. Support costs stay constant.
Jonathan:
system("/usr/bin/grep pattern '{$file_path}' | /usr/bin/sort | /usr/sbin/sendmail '{$email_address}'");
Perhaps
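my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    # child: replace this process with a shell running the pipeline
    exec("/bin/sh", "-c", "grep pattern $file | sort | sendmail $email");
    exit(1);  # only reached if exec fails
}
waitpid($pid, 0);  # parent: reap the child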
or something similar (I don’t program a lot in Perl/PHP/Python/Ruby etc.). You take the initial hit of creating the new process in fork(), but we all knew that, right?
Jonathan, Assaf: My guess is that you’ve got different interpretations of what constitutes a LAMP program. However, since Jonathan asked:
Nuts – forgot to escape my code:
<?php print `grep pattern file | sort`; ?>
Jonathan,
To send as HTML, add this between sort and sendmail:
| cat header - footer |
Header and footer are just stock boundaries that set the MIME content type and transfer encoding on the body part. And you can reuse them; I have about four of these one-line scripts.
I usually do that with curl to grab a Web page I’m monitoring, strip the essential content, and mail the rest as HTML.
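The header and footer files do by hand what a MIME library does in-process. For comparison, here is a minimal sketch that uses Python’s standard `email` package to wrap an HTML body with the right Content-Type and transfer encoding. The addresses and subject are placeholders. The raw bytes could be piped to `sendmail -t`, which takes the recipients from the message headers.

```python
from email.message import EmailMessage


def html_mail(html_body, sender, recipient, subject):
    """Build a message whose single body part is text/html."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # set_content chooses a charset and Content-Transfer-Encoding for us
    msg.set_content(html_body, subtype="html")
    return msg


# Placeholder addresses; substitute your own.
msg = html_mail("<p>Page changed</p>", "monitor@example.com",
                "me@example.com", "Page update")
raw = msg.as_bytes()  # e.g. pipe these bytes into `sendmail -t`
```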
Or I could write a program that uses the HTTP library, does the content modification, and then uses the SMTP library. But it ends up taking much longer.
Your execution code is very sloppy and incredibly insecure. You seemingly rely on the program’s environment (ENV) variables and allow anything to be executed via the $file_path variable. You do not drop privileges to the lowest possible level (nobody), nor do you chroot the programs to an isolated part of the system where they can do no harm.
I have made an alternative to the automake, GNUMake and autoconf tools, that people can run on any number of systems as normal or super users. I know what the security implications are and, like SQL injections, how common it is for no one to give a damn.
Trivial typo:
s/Then it realized what a problem that it./Then it realized what a problem that is./
Nick,
Thanks. Fixed it.
Pingback: the jackol’s den » Why Processes Scale Better Than Threads - Mikhail Esteves
Pingback: mokshore » Blog Archive » links for 2006-08-30
Check out:
http://www.jroller.com/page/cpurdy?entry=fastcgi_not_so_fast
Peace.
Cameron,
Thanks for the link. It’s an interesting read.
It doesn’t match my experience, though. I set up FCGI to dynamically adjust the number of processes, so that part works. And it’s the framework (not FCGI) that manages the connection lifecycle.
False dichotomy. Java applications can be multi-threaded, multi-process, or both. And the idea that VMs normally take minutes to load is complete nonsense (unless it’s 1997 where you live).
Pingback: Deviant Abstraction » Blog Archive » Processes vs. Threads
Where is your proof? Have you ever built a LAMP solution that could handle 10K+ client connections per box? Don’t post crap like this unless you have.
Mike,
This post is about scalability, as in throwing more boxes at the problem.
I’m sorry, how do you figure that throwing fewer boxes at the problem in order to scale is worse scalability?
Try running Tomcat on Linux both with and without NPTL with a huge number of threads some time. You’ll see that things run a LOT better with threads instead of processes. Maybe you have observed multiple processes being faster, but were they using non-blocking I/O? Not a good comparison if so, which is why I suggest really comparing threads to processes 1-on-1.
Mike,
In my experience, the only two metrics that matter are how much it will cost you to serve (bandwidth, CPUs, etc.) and how much it will cost you to develop.
Generally, optimizing for processes is cheaper because hardware is cheaper than development. $3000 gets you a very nice server, but not even two man-weeks of development.
I have not observed that multiple processes run faster, I observed that you get a lot more features in a lot shorter time and it scales very well. And for most Web applications, that’s the better tradeoff.
Anyone who has done distributed programming will attest that communication within the same process is more than 1000 times faster than communication between external processes. Also, LAMP components are more coarse-grained than what you might find in Java or C, where you are invoking methods of another layer or class. Nevertheless, there are physical limitations to one machine, and ultimately you have to scale to multiple machines. I have found the best option is a combination of the two, and there are many options for deployment. For example, you can deploy the entire application stack on multiple machines or have different machines for specific layers.
There’s some safety with the process model. For the cheap virtual hosting sites running a zillion PHP processes potentially containing untold horrors of code (either purposefully or inadvertently written), the process model is great. Processes provide infrastructure for keeping the entire server environment safe.
But for dedicated applications, this is less of an issue as they nominally don’t run potentially unsafe code (that’s vetted out in the development and QA process).
But the expense of that security tends to outweigh the performance benefits of a threaded environment.
If you take the simplest case of a multithreaded server servicing stateless requests without the use of any shared resources, then the overhead is the cost of the OS context switching the threads vs switching the processes, and threads are undoubtedly much cheaper to switch than processes today.
Once you start sharing resources, then the threaded server does even better.
Process servers can’t share resources easily. One of the beauties of the process is the impenetrable wall that the OS throws up around it, giving us lots of security and privacy to do what we want, and how we want. At the naive level, a process server is most likely going to have to serialize any resource it doesn’t natively contain. It will load this data either from a DB or a filesystem, typically. It has to hoist that data over that process “firewall” that the OS has built around the code.
If you do this once at startup, then all of the processes have a copy of the data and it’s simply a startup cost of the process. But now you’re “wasting space” because every process is duplicating data. If you do this to service each request, then you must pay the serialization costs.
You can share resources with shared memory, but as soon as you do that you get all of the coordination issues that are associated with threads. Not a problem with read-only data, but it is for modifiable data. So you pay the synchronization price that shared resources incur in threaded environments, but without the benefits of a threaded environment.
Of course, with processes, the interprocess communication components are all going to be kernel space routines, not user space, and therefore more expensive. The kernel is also surrounded by a firewall of sorts, so anytime you need to talk to the kernel, data needs to be thrown back and forth across that boundary.
With threaded environments, most of these problems are mitigated. Sharing data is implicit in the design, you only need a single copy of the data, and all of your IPC is done in user space rather than in the kernel. If you have an application sharing read-only data among threads, your application isn’t written any differently than a process application, but you still get all the lightweight benefits of the threaded app.
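Here is a minimal Python sketch of that point, using a hypothetical shared counter. Every thread sees the same object with no copying. But once the data is modifiable, the threads still have to coordinate, here with a `threading.Lock`.

```python
import threading


class SharedCounter:
    """One copy of the data, visible to every thread in the process."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Read-modify-write must be atomic; without the lock, updates can be lost.
        with self._lock:
            self.value += 1


def hammer(counter, threads=8, per_thread=10_000):
    """Have several threads update the same counter concurrently."""
    def work():
        for _ in range(per_thread):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value
```

If the data were read-only, the lock could be dropped entirely. That is the case where threads share data most cheaply.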
The biggest risk in a threaded environment is robustness. A single thread can destroy the entire process, including other “unrelated” clients. Processes are much safer than that.
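The difference in blast radius is easy to demonstrate. In this Python sketch, `os._exit` stands in for a fatal error such as a crash in native code. That substitution is an assumption: an ordinary Python exception in a thread would not kill the process. In the first case, one thread’s failure ends the whole process, main thread included. The demo runs that program in a child so the demo itself survives. In the second case, the failing work runs in its own process, and the parent just sees an exit code.

```python
import subprocess
import sys

# One process, two threads: the worker thread hits a fatal error.
THREADED = """
import os, threading
def worker():
    os._exit(3)  # simulated fatal error: ends the entire process
t = threading.Thread(target=worker)
t.start()
t.join()
print("main thread finished")  # never reached
"""


def crash_in_thread():
    """Run the threaded program; report its exit code and what it printed."""
    r = subprocess.run([sys.executable, "-c", THREADED],
                       capture_output=True, text=True)
    return r.returncode, r.stdout


def crash_in_worker_process():
    """The same failure isolated in a child process; the caller keeps running."""
    r = subprocess.run([sys.executable, "-c", "import os; os._exit(3)"])
    return r.returncode  # the parent can log this and retry or move on
```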
But all that safety comes at a cost in performance for the application.
A well-written threaded application will perform better than a well-written process-based application, if for no other reason than simply demanding less of the encompassing operating system services and resources.
Thus a threaded application will provide better system utilization than a process based application.
While a process based developer doesn’t have to “worry about” resource sharing, the operating system DOES have to worry about that. It’s not like the problems go away, they’re simply delegated to the OS to manage them. So, you have to pay the same price as threaded applications (even if you’re paying that price unknowingly) but you don’t get the advantage of threaded environments in terms of performance and resource sharing.
Finally, regarding the containers and such, those are not required to write threaded applications. You can write a threaded application that doesn’t involve any of the packaging or deployment issues and the like.
The benefit of the containers is simply that we don’t have to reinvent that code when we can create a simple bundle for our application and delegate that low level plumbing to a tested and proven container.
So, feel free to assert that processes can scale, but threaded applications scale BETTER, provide better system utilization, and will reduce deployment infrastructure costs and long term operation costs.
Some recommendations:
a) You should learn what scalability is.
b) You should learn what a process is.
c) You should learn what a thread is.
After that go write blog entries.
All the best,
Tom
The addition of a $3000 machine is not where the scalability story ends.
If you follow what the costs are in the industry, additions of machines imply additions of sysadmins to care for and feed them. You just traded the cost of development, perhaps limited, for a guaranteed cost increase, both in hardware and in staffing to care for those extra machines.
The extra cost of development (optimizing for vertical scaling) may let you reduce the slope of the cost increase for your horizontal scaling, leading to lower costs in incremental hardware and associated sysadmin staff.
Adding another box to an existing one gets me to about 100% performance improvement, give or take. Scalability is not linear, but the hardware keeps getting better.
There’s admin cost, but it’s incremental and in the low percentage rate (per machine).
Optimizing 100% of your code won’t get you 100% more performance, maybe 20%. It will take several man-months, and that’s not counting the fact that optimized (for CPUs, not developers or users) code is harder to maintain.
Should you optimize? For the small efforts that give you a lot of gain, certainly! I put indexes on my tables, choose good sorting algorithms, and pick all the low-hanging fruit.
But if you prefer an environment that’s optimized 100% of the time, for the 20% performance gain you get, you’re wasting a lot of resources.
I love that article… So much so that I forwarded it to the tools programmers at my work, causing quite a stir!