<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Why Processes Scale Better Than Threads</title>
	<atom:link href="http://labnotes.org/2006/08/29/why-processes-scale-better-than-threads/feed/" rel="self" type="application/rss+xml" />
	<link>http://labnotes.org/2006/08/29/why-processes-scale-better-than-threads/</link>
	<description></description>
	<lastBuildDate>Thu, 18 Mar 2010 06:29:32 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<item>
		<title>By: tnkgrl</title>
		<link>http://labnotes.org/2006/08/29/why-processes-scale-better-than-threads/comment-page-1/#comment-11745</link>
		<dc:creator>tnkgrl</dc:creator>
		<pubDate>Thu, 14 Sep 2006 23:49:16 +0000</pubDate>
		<guid isPermaLink="false">http://blog.labnotes.org/2006/08/29/why-processes-scale-better-than-threads/#comment-11745</guid>
		<description>I love that article... So much so that I forwarded it to the tools programmers at my work, causing quite a stir!</description>
		<content:encoded><![CDATA[<p>I love that article&#8230; So much so that I forwarded it to the tools programmers at my work, causing quite a stir!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Assaf</title>
		<link>http://labnotes.org/2006/08/29/why-processes-scale-better-than-threads/comment-page-1/#comment-10410</link>
		<dc:creator>Assaf</dc:creator>
		<pubDate>Thu, 07 Sep 2006 17:10:34 +0000</pubDate>
		<guid isPermaLink="false">http://blog.labnotes.org/2006/08/29/why-processes-scale-better-than-threads/#comment-10410</guid>
		<description>Adding another box to an existing one gets me about a 100% performance improvement, give or take. Scalability is not linear, but the hardware keeps getting better.

There&#039;s an admin cost, but it&#039;s incremental and a low percentage (per machine).

Optimizing 100% of your code won&#039;t get you 100% more performance, maybe 20%. It will take several man-months, and that&#039;s not counting the fact that optimized code (optimized for CPUs, not for developers or users) is harder to maintain.

Should you optimize? For the small efforts that give you a lot of gain, certainly! I put indexes on my tables, choose good sorting algorithms, and pick all the low-hanging fruit.

But if you prefer an environment that&#039;s optimized 100% of the time, for the 20% performance gain you get, you&#039;re wasting a lot of resources.</description>
		<content:encoded><![CDATA[<p>Adding another box to an existing one gets me about a 100% performance improvement, give or take. Scalability is not linear, but the hardware keeps getting better.</p>
<p>There&#8217;s an admin cost, but it&#8217;s incremental and a low percentage (per machine).</p>
<p>Optimizing 100% of your code won&#8217;t get you 100% more performance, maybe 20%. It will take several man-months, and that&#8217;s not counting the fact that optimized code (optimized for CPUs, not for developers or users) is harder to maintain.</p>
<p>Should you optimize? For the small efforts that give you a lot of gain, certainly! I put indexes on my tables, choose good sorting algorithms, and pick all the low-hanging fruit.</p>
<p>But if you prefer an environment that&#8217;s optimized 100% of the time, for the 20% performance gain you get, you&#8217;re wasting a lot of resources.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Arieh Markel</title>
		<link>http://labnotes.org/2006/08/29/why-processes-scale-better-than-threads/comment-page-1/#comment-10383</link>
		<dc:creator>Arieh Markel</dc:creator>
		<pubDate>Thu, 07 Sep 2006 12:16:00 +0000</pubDate>
		<guid isPermaLink="false">http://blog.labnotes.org/2006/08/29/why-processes-scale-better-than-threads/#comment-10383</guid>
		<description>The addition of a $3000 machine is not where the scalability story ends.

If you follow industry costs, adding machines means adding sysadmins to care for and feed them.

You just traded the cost of development (perhaps limited) for a guaranteed cost increase, both in hardware and in staff to care for those extra machines.

---

The extra cost of development (optimizing for vertical scaling) may reduce the &#039;slope&#039; of cost growth for your horizontal scaling, leading to lower incremental costs in hardware and associated sysadmin staff.</description>
		<content:encoded><![CDATA[<p>The addition of a $3000 machine is not where the scalability story ends.</p>
<p>If you follow industry costs, adding machines means adding sysadmins to care for and feed them.</p>
<p>You just traded the cost of development (perhaps limited) for a guaranteed cost increase, both in hardware and in staff to care for those extra machines.</p>
<p>&#8212;</p>
<p>The extra cost of development (optimizing for vertical scaling) may reduce the &#8216;slope&#8217; of cost growth for your horizontal scaling, leading to lower incremental costs in hardware and associated sysadmin staff.</p>
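<p>To make the &#8216;slope&#8217; argument concrete, here is a back-of-the-envelope cost model, sketched in Python. Only the $3000 box comes from this thread; the admin salary, the machines-per-admin ratio, the load each machine can serve and the up-front development cost are all hypothetical placeholders, as are the function names.</p>

```python
import math


def horizontal_cost(machines, hw_per_machine=3_000, machines_per_admin=10,
                    admin_cost=60_000):
    # Hardware plus the sysadmins needed to care for it. Only the $3000
    # box comes from the thread; the other defaults are made-up placeholders.
    admins = math.ceil(machines / machines_per_admin)
    return machines * hw_per_machine + admins * admin_cost


def total_cost(load, load_per_machine, dev_cost=0):
    # Optimized code serves more load per machine, which lowers the slope of
    # horizontal scaling in exchange for an up-front development cost.
    machines = math.ceil(load / load_per_machine)
    return dev_cost + horizontal_cost(machines)


if __name__ == "__main__":
    for load in (1_000, 10_000, 100_000):
        plain = total_cost(load, load_per_machine=100)
        tuned = total_cost(load, load_per_machine=150, dev_cost=40_000)
        print(f"load={load}: unoptimized={plain} optimized={tuned}")
```

<p>With these invented inputs, the unoptimized setup is cheaper at a load of 1,000 (90,000 vs 121,000) while the optimized one wins at 100,000 (9,000,000 vs 6,061,000). Where the two curves cross depends entirely on the numbers you plug in, which is really what both sides of this thread are arguing about.</p>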
]]></content:encoded>
	</item>
	<item>
		<title>By: tom</title>
		<link>http://labnotes.org/2006/08/29/why-processes-scale-better-than-threads/comment-page-1/#comment-10114</link>
		<dc:creator>tom</dc:creator>
		<pubDate>Mon, 04 Sep 2006 12:09:11 +0000</pubDate>
		<guid isPermaLink="false">http://blog.labnotes.org/2006/08/29/why-processes-scale-better-than-threads/#comment-10114</guid>
		<description>Some recommendations:

a) You should learn what scalability is.
b) You should learn what a process is.
c) You should learn what a thread is.

After that go write blog entries.

All the best,
Tom</description>
		<content:encoded><![CDATA[<p>Some recommendations:</p>
<p>a) You should learn what scalability is.<br />
b) You should learn what a process is.<br />
c) You should learn what a thread is.</p>
<p>After that go write blog entries.</p>
<p>All the best,<br />
Tom</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Will</title>
		<link>http://labnotes.org/2006/08/29/why-processes-scale-better-than-threads/comment-page-1/#comment-9835</link>
		<dc:creator>Will</dc:creator>
		<pubDate>Sat, 02 Sep 2006 00:55:11 +0000</pubDate>
		<guid isPermaLink="false">http://blog.labnotes.org/2006/08/29/why-processes-scale-better-than-threads/#comment-9835</guid>
		<description>There&#039;s some safety in the process model. For cheap virtual hosting sites running a zillion PHP processes that potentially contain untold horrors of code (written either purposefully or inadvertently), the process model is great. Processes provide the infrastructure for keeping the entire server environment safe.

But for dedicated applications, this is less of an issue, as they nominally don&#039;t run potentially unsafe code (that&#039;s vetted out in the development and QA process).

But the expense of that security tends to outweigh the performance benefits of a threaded environment.

If you take the simplest case of a multithreaded server servicing stateless requests without the use of any shared resources, then the overhead is the cost of the OS context switching the threads vs switching the processes, and threads are undoubtedly much cheaper to switch than processes today.

Once you start sharing resources, the threaded server does even better.

Process servers can&#039;t share resources easily. One of the beauties of the process is the impenetrable wall the OS throws up around it, giving us lots of security and privacy to do what we want, how we want. At the naive level, a process server will most likely have to serialize any resource it doesn&#039;t natively contain, typically loading that data from a DB or a filesystem. It has to hoist that data over the process &quot;firewall&quot; the OS has built around the code.

If you do this once at startup, then every process has its own copy of the data, and it&#039;s simply a startup cost of the process. But now you&#039;re &quot;wasting space&quot; because every process duplicates the data. If you do this to service each request, then you pay the serialization cost every time.

You can share resources with shared memory, but as soon as you do, you get all of the coordination issues associated with threads. That&#039;s not a problem for read-only data, but it is for modifiable data. So you pay the synchronization price that shared resources incur in threaded environments, but without the benefits of a threaded environment.

Of course, with processes, the interprocess communication components are all going to be kernel-space routines, not user-space, and therefore more expensive. The kernel is also surrounded by a firewall of sorts, so any time you need to talk to the kernel, data has to be thrown back and forth across that boundary.

With threaded environments, most of these problems are mitigated. Sharing data is implicit in the design, you only need a single copy of the data, and all of your IPC is done in user space rather than in the kernel. If you have an application sharing read-only data among threads, it isn&#039;t written any differently than a process application, but you still get all the lightweight benefits of the threaded app.

The biggest risk in a threaded environment is robustness. A single thread can take down the entire process, including other &quot;unrelated&quot; clients. Processes are much safer than that.

But all that safety comes at a cost in performance for the application.

A well-written threaded application will perform better than a well-written process-based application, if for no other reason than that it demands less of the encompassing operating system&#039;s services and resources.

Thus a threaded application will provide better system utilization than a process-based application.

While a process-based developer doesn&#039;t have to &quot;worry about&quot; resource sharing, the operating system DOES have to worry about it. The problems don&#039;t go away; they&#039;re simply delegated to the OS to manage. So you pay the same price as threaded applications (even if you pay it unknowingly), but you don&#039;t get the advantages of threaded environments in performance and resource sharing.

Finally, regarding containers and the like: they are not required to write threaded applications. You can write a threaded application without any of the packaging or deployment issues they bring.

The benefit of containers is simply that we don&#039;t have to reinvent that code: we can create a simple bundle for our application and delegate the low-level plumbing to a tested and proven container.

So feel free to assert that processes can scale, but threaded applications scale BETTER, provide better system utilization, and reduce deployment infrastructure and long-term operating costs.</description>
		<content:encoded><![CDATA[<p>There&#8217;s some safety in the process model. For cheap virtual hosting sites running a zillion PHP processes that potentially contain untold horrors of code (written either purposefully or inadvertently), the process model is great. Processes provide the infrastructure for keeping the entire server environment safe.</p>
<p>But for dedicated applications, this is less of an issue, as they nominally don&#8217;t run potentially unsafe code (that&#8217;s vetted out in the development and QA process).</p>
<p>But the expense of that security tends to outweigh the performance benefits of a threaded environment.</p>
<p>If you take the simplest case of a multithreaded server servicing stateless requests without the use of any shared resources, then the overhead is the cost of the OS context switching the threads vs switching the processes, and threads are undoubtedly much cheaper to switch than processes today.</p>
<p>Once you start sharing resources, the threaded server does even better.</p>
<p>Process servers can&#8217;t share resources easily. One of the beauties of the process is the impenetrable wall the OS throws up around it, giving us lots of security and privacy to do what we want, how we want. At the naive level, a process server will most likely have to serialize any resource it doesn&#8217;t natively contain, typically loading that data from a DB or a filesystem. It has to hoist that data over the process &#8220;firewall&#8221; the OS has built around the code.</p>
<p>If you do this once at startup, then every process has its own copy of the data, and it&#8217;s simply a startup cost of the process. But now you&#8217;re &#8220;wasting space&#8221; because every process duplicates the data. If you do this to service each request, then you pay the serialization cost every time.</p>
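<p>A minimal Python sketch of the duplication point, assuming Python&#8217;s <code>threading</code> and <code>multiprocessing</code> modules as stand-ins for the threaded and process servers discussed here (the names <code>shared_config</code>, <code>bump</code>, <code>run_in_thread</code> and <code>run_in_process</code> are hypothetical). A thread mutates the parent&#8217;s dict in place; a child process only ever touches its own copy.</p>

```python
import multiprocessing as mp
import threading

# Hypothetical resource loaded once at startup (stands in for data read
# from a database or filesystem).
shared_config = {"hits": 0}


def bump(cfg):
    # Mutate whatever copy of the dict this worker can see.
    cfg["hits"] += 1


def run_in_thread():
    # Threads share the parent's address space: same dict object.
    t = threading.Thread(target=bump, args=(shared_config,))
    t.start()
    t.join()


def run_in_process():
    # The child gets its own copy (pickled under "spawn", copy-on-write
    # pages under "fork"), so its mutation never reaches the parent.
    p = mp.Process(target=bump, args=(shared_config,))
    p.start()
    p.join()


if __name__ == "__main__":
    run_in_thread()
    print("after thread:", shared_config["hits"])   # the thread changed our dict
    run_in_process()
    print("after process:", shared_config["hits"])  # unchanged: child had a copy
```

<p>Getting the child&#8217;s result back would need an explicit channel (a pipe, a queue, shared memory), which is exactly the serialization or coordination cost the comment describes.</p>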
<p>You can share resources with shared memory, but as soon as you do, you get all of the coordination issues associated with threads. That&#8217;s not a problem for read-only data, but it is for modifiable data. So you pay the synchronization price that shared resources incur in threaded environments, but without the benefits of a threaded environment.</p>
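<p>Python&#8217;s <code>multiprocessing.Value</code> is one real example of memory shared across processes. This small sketch (function names hypothetical) shows that the lock is still required: without it, <code>+= 1</code> is a read-modify-write race between processes, just as it would be between threads.</p>

```python
import multiprocessing as mp


def add_many(counter, n):
    # Each worker increments a counter that lives in shared memory.
    for _ in range(n):
        # Without the lock, concurrent "+= 1" updates could be lost.
        with counter.get_lock():
            counter.value += 1


def run(workers=4, n=10_000):
    counter = mp.Value("i", 0)  # a C int in shared memory, with its own lock
    procs = [mp.Process(target=add_many, args=(counter, n)) for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(run())  # 40000
```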
<p>Of course, with processes, the interprocess communication components are all going to be kernel-space routines, not user-space, and therefore more expensive. The kernel is also surrounded by a firewall of sorts, so any time you need to talk to the kernel, data has to be thrown back and forth across that boundary.</p>
<p>With threaded environments, most of these problems are mitigated. Sharing data is implicit in the design, you only need a single copy of the data, and all of your IPC is done in user space rather than in the kernel. If you have an application sharing read-only data among threads, it isn&#8217;t written any differently than a process application, but you still get all the lightweight benefits of the threaded app.</p>
<p>The biggest risk in a threaded environment is robustness. A single thread can take down the entire process, including other &#8220;unrelated&#8221; clients. Processes are much safer than that.</p>
<p>But all that safety comes at a cost in performance for the application.</p>
<p>A well-written threaded application will perform better than a well-written process-based application, if for no other reason than that it demands less of the encompassing operating system&#8217;s services and resources.</p>
<p>Thus a threaded application will provide better system utilization than a process-based application.</p>
<p>While a process-based developer doesn&#8217;t have to &#8220;worry about&#8221; resource sharing, the operating system DOES have to worry about it. The problems don&#8217;t go away; they&#8217;re simply delegated to the OS to manage. So you pay the same price as threaded applications (even if you pay it unknowingly), but you don&#8217;t get the advantages of threaded environments in performance and resource sharing.</p>
<p>Finally, regarding containers and the like: they are not required to write threaded applications. You can write a threaded application without any of the packaging or deployment issues they bring.</p>
<p>The benefit of containers is simply that we don&#8217;t have to reinvent that code: we can create a simple bundle for our application and delegate the low-level plumbing to a tested and proven container.</p>
<p>So feel free to assert that processes can scale, but threaded applications scale BETTER, provide better system utilization, and reduce deployment infrastructure and long-term operating costs.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Shahzad Bhatti</title>
		<link>http://labnotes.org/2006/08/29/why-processes-scale-better-than-threads/comment-page-1/#comment-9764</link>
		<dc:creator>Shahzad Bhatti</dc:creator>
		<pubDate>Fri, 01 Sep 2006 12:58:35 +0000</pubDate>
		<guid isPermaLink="false">http://blog.labnotes.org/2006/08/29/why-processes-scale-better-than-threads/#comment-9764</guid>
		<description>Anyone who has done distributed programming will attest that communication within the same process is more than 1000 times faster than communication between separate processes. Also, LAMP components are more coarse-grained than what you might find in Java or C, where you are invoking methods of another layer or class. Nevertheless, one machine has physical limits, and ultimately you have to scale to multiple machines. I have found the best option is a combination of the two, and there are many deployment options. For example, you can deploy the entire application stack on multiple machines, or dedicate different machines to specific layers.</description>
		<content:encoded><![CDATA[<p>Anyone who has done distributed programming will attest that communication within the same process is more than 1000 times faster than communication between separate processes. Also, LAMP components are more coarse-grained than what you might find in Java or C, where you are invoking methods of another layer or class. Nevertheless, one machine has physical limits, and ultimately you have to scale to multiple machines. I have found the best option is a combination of the two, and there are many deployment options. For example, you can deploy the entire application stack on multiple machines, or dedicate different machines to specific layers.</p>
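<p>One rough way to check the in-process vs cross-process gap on your own machine is sketched below in Python (all function names are hypothetical). It times a plain function call against the same call made through a <code>multiprocessing.Pipe</code> round trip to a child process. The ratio you see will vary by hardware and OS, and Python&#8217;s own call overhead narrows it compared with C or Java, so treat the numbers as indicative only.</p>

```python
import multiprocessing as mp
import time


def square(x):
    return x * x


def echo_server(conn):
    # Child process: answer each request until it receives None.
    while True:
        x = conn.recv()
        if x is None:
            break
        conn.send(square(x))
    conn.close()


def time_in_process(n):
    start = time.perf_counter()
    total = sum(square(i) for i in range(n))
    return total, time.perf_counter() - start


def time_cross_process(n):
    parent, child = mp.Pipe()
    proc = mp.Process(target=echo_server, args=(child,))
    proc.start()
    start = time.perf_counter()
    total = 0
    for i in range(n):
        parent.send(i)          # pickle and write through a kernel pipe...
        total += parent.recv()  # ...then block until the reply comes back
    elapsed = time.perf_counter() - start
    parent.send(None)           # tell the child to exit
    proc.join()
    return total, elapsed


if __name__ == "__main__":
    n = 10_000
    a, t_local = time_in_process(n)
    b, t_ipc = time_cross_process(n)
    assert a == b
    print(f"in-process:    {t_local / n * 1e9:.0f} ns/call")
    print(f"cross-process: {t_ipc / n * 1e9:.0f} ns/call")
    print(f"ratio: {t_ipc / t_local:.0f}x")
```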
]]></content:encoded>
	</item>
	<item>
		<title>By: Assaf</title>
		<link>http://labnotes.org/2006/08/29/why-processes-scale-better-than-threads/comment-page-1/#comment-9752</link>
		<dc:creator>Assaf</dc:creator>
		<pubDate>Fri, 01 Sep 2006 07:52:11 +0000</pubDate>
		<guid isPermaLink="false">http://blog.labnotes.org/2006/08/29/why-processes-scale-better-than-threads/#comment-9752</guid>
		<description>Mike,

In my experience, the only two metrics that matter are: how much will it cost you to serve (bandwidth, CPUs, etc.), and how much will it cost you to develop?

Generally, optimizing for processes is cheaper because hardware is cheaper than development. $3000 gets you a very nice server, but not even two man-weeks of development.

I have not observed that multiple processes run faster; I observed that you get a lot more features in a lot less time, and it scales very well. And for most Web applications, that&#039;s the better tradeoff.</description>
		<content:encoded><![CDATA[<p>Mike,</p>
<p>In my experience, the only two metrics that matter are: how much will it cost you to serve (bandwidth, CPUs, etc.), and how much will it cost you to develop?</p>
<p>Generally, optimizing for processes is cheaper because hardware is cheaper than development. $3000 gets you a very nice server, but not even two man-weeks of development.</p>
<p>I have not observed that multiple processes run faster; I observed that you get a lot more features in a lot less time, and it scales very well. And for most Web applications, that&#8217;s the better tradeoff.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike</title>
		<link>http://labnotes.org/2006/08/29/why-processes-scale-better-than-threads/comment-page-1/#comment-9740</link>
		<dc:creator>Mike</dc:creator>
		<pubDate>Fri, 01 Sep 2006 04:42:35 +0000</pubDate>
		<guid isPermaLink="false">http://blog.labnotes.org/2006/08/29/why-processes-scale-better-than-threads/#comment-9740</guid>
		<description>I&#039;m sorry, how do you figure that throwing fewer boxes at the problem in order to scale is worse scalability?

Try running Tomcat on Linux both with and without NPTL with a huge number of threads sometime. You&#039;ll see that things run a LOT better with threads instead of processes. Maybe you have observed multiple processes being faster, but were they using non-blocking I/O? If so, that&#039;s not a good comparison, which is why I suggest really comparing threads to processes 1-on-1.</description>
		<content:encoded><![CDATA[<p>I&#8217;m sorry, how do you figure that throwing fewer boxes at the problem in order to scale is worse scalability?</p>
<p>Try running Tomcat on Linux both with and without NPTL with a huge number of threads sometime. You&#8217;ll see that things run a LOT better with threads instead of processes. Maybe you have observed multiple processes being faster, but were they using non-blocking I/O? If so, that&#8217;s not a good comparison, which is why I suggest really comparing threads to processes 1-on-1.</p>
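<p>A 1-on-1 comparison in this sense keeps the workload, the blocking I/O model and the worker count identical, and swaps only the concurrency primitive. The sketch below does that with Python&#8217;s <code>concurrent.futures</code>; it is not Tomcat or NPTL, <code>time.sleep</code> is an assumed stand-in for a blocking I/O call (it releases Python&#8217;s GIL, so threads are not penalized here), and <code>handle_request</code> and <code>benchmark</code> are hypothetical names.</p>

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def handle_request(i):
    # Stand-in for a blocking I/O call (database query, socket read, ...).
    time.sleep(0.01)
    return i * 2


def benchmark(executor_cls, workers=8, requests=200):
    # Same task, same worker count, same blocking model: only the pool differs.
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(handle_request, range(requests)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        results, elapsed = benchmark(cls)
        print(f"{cls.__name__}: {elapsed:.3f}s for {len(results)} requests")
```

<p>Mixing in non-blocking I/O on only one side would measure the I/O model rather than threads vs processes, which is the objection raised above.</p>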
]]></content:encoded>
	</item>
	<item>
		<title>By: Assaf</title>
		<link>http://labnotes.org/2006/08/29/why-processes-scale-better-than-threads/comment-page-1/#comment-9734</link>
		<dc:creator>Assaf</dc:creator>
		<pubDate>Fri, 01 Sep 2006 01:45:34 +0000</pubDate>
		<guid isPermaLink="false">http://blog.labnotes.org/2006/08/29/why-processes-scale-better-than-threads/#comment-9734</guid>
		<description>Mike,

This post is about scalability, as in throwing more boxes at the problem.</description>
		<content:encoded><![CDATA[<p>Mike,</p>
<p>This post is about scalability, as in throwing more boxes at the problem.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
