Sunday 6 October 2013

Windows vs Linux: Processes and Threads

I'm both a Windows and Linux (Ubuntu, of course) user, and I'm pretty happy with both systems. I find strengths and weaknesses in both of them, and I love trying to understand how similar and how different they are. It's important to note that I don't have any sort of "moral bias" against Commercial Software. I deeply appreciate Open Source, and almost all the software I run on my home PCs is Open Source, but I have absolutely nothing against selling software. On the contrary, provided that it's sold at a fair price, I fully support it (until the day capitalism is overthrown and we start to live in a perfect "communist with a human face" society...). People buy and sell hardware, so what's the problem with buying and selling software?

What really annoys me (so much that it made me move away from Linux for several years) are the typical open source bigots who spend the whole day bashing Microsoft (a company where employees earn pretty decent salaries and enjoy a huge level of respect from their employer) because of the inherent evilness of selling software, but don't give a shit about wearing clothes produced by people earning 2 dollars a month under enslavement conditions... It's obvious that if you're involved in an anarchist hacklab you should avoid Closed Software, but someone with an iPhone in the pocket of his Levi's trousers is not entitled to give moral lessons to Microsoft, Adobe or whoever... Well, enough philosophy, let's get down to business :-)

There are a few Windows/Linux differences that I find interesting and would like to touch upon. I'll start off today with Processes and Threads:

For years I've had the impression that Threads play a rather less important role in Linux than in Windows. I can think of a handful of reasons for this:

  • It seems to be common knowledge that Process creation is cheaper in Linux; this discussion makes a pretty enriching read. In short, fork and even fork + exec seem cheaper than CreateProcess, and some aspects of Windows (like security) are far more complicated (which does not necessarily mean better) than in Linux, which adds overhead. Regarding fork, when a process A starts a second copy of itself it's just a simple fork not followed by an exec, so my understanding is that no hard disk access is involved, while a CreateProcess will always involve disk access.
  • Traditionally Linux threads have been far from optimal, though all this seems to have changed since the introduction of NPTL in Kernel 2.6.
  • I think we could say that for the Linux Kernel a Thread and a Process are much more similar than they are for the Windows Kernel. In Linux both Process creation and Thread creation make use of the clone syscall (invoked by fork for the former and by pthread_create for the latter), but the two calls pass different flags, which determine whether some data structures (memory space, processor state, stack, PID, open files, etc) are shared or not. This paragraph I found somewhere is worth noting:

    Most of today's operating systems provide multi-threading support and linux is not different from these operating systems. Linux support threads as they provide concurrency or parallelism on multiple processor systems. Most of the operating systems like Microsoft Windows or Sun Solaris differentiate between a process and a thread i.e. they have an explicit support for threads which means different data structures are used in the kernel to represent a thread and a process.
    Linux implementation of threads is totally different as compared to the above-mentioned operating systems. Linux implements threads as a process that shares resources among themselves. Linux does not have a separate data structure to represent a thread. Each thread is represented with task_struct and the scheduling of these is the same as that of a process. It means the scheduler does not differentiate between a thread and a process.

    Regarding the last sentence, please note that the Windows Scheduler does not differentiate between threads and processes either: it just schedules threads, irrespective of their process. It's nicely confirmed here:

    Scheduling in Windows is at the thread granularity. The basic idea behind this approach is that processes don't run but only provide resources and a context in which their threads run. Coming back to your question, because scheduling decisions are made strictly on a thread basis, no consideration is given to what process the thread belongs to. In your example, if process A has 1 runnable thread and process B has 50 runnable threads, and all 51 threads are at the same priority, each thread would receive 1/51 of the CPU time—Windows wouldn't give 50 percent of the CPU to process A and 50 percent to process B. To understand the thread-scheduling algorithms, you must first understand the priority levels that Windows uses.

    This is another good read about Linux Threads and Processes
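
The practical difference between the two kinds of clone described above (share the memory space or copy it) can be seen with a small experiment. This is just a minimal sketch, not taken from any of the sources quoted here: it assumes a Unix system (os.fork does not exist on Windows), and the function names increment_in_thread and increment_in_fork are made up for the illustration. A thread changes the very same list the caller holds, while a forked child only changes its own copy.

```python
import os
import threading


def increment_in_thread(counter):
    """Bump counter[0] from a second thread and return what the caller sees."""
    def work():
        counter[0] += 1      # same address space: the caller sees this change

    t = threading.Thread(target=work)
    t.start()
    t.join()
    return counter[0]


def increment_in_fork(counter):
    """Bump counter[0] in a forked child and return what the parent sees."""
    pid = os.fork()          # Unix only (assumption of this sketch)
    if pid == 0:
        counter[0] += 1      # changes the child's private copy only
        os._exit(0)          # leave the child at once, skipping normal cleanup
    os.waitpid(pid, 0)       # wait for the child to finish
    return counter[0]


if __name__ == "__main__":
    print("thread:", increment_in_thread([0]))
    print("fork:  ", increment_in_fork([0]))
```

Run as a script, the thread version reports 1 and the fork version reports 0, since the child's increment never reaches the parent's memory.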

One consequence of this difference in importance is that getting thread figures is more straightforward in Windows.
Viewing the threads associated with a process is pretty simple in Windows: you don't even need the almighty ProcessExplorer and can get by with just Task Manager if you add the Threads column to it. It's not that out of the box in Linux. Ubuntu's System Monitor does not have a Threads column, and most command line tools do not show the thread count by default, so you'll need to use some additional parameters:

With ps you can use the o option to specify the nlwp column (number of threads), so you can end up with something like this:
ps axo pid,ppid,rss,vsz,nlwp,cmd
When using top, in principle you can pass the -H parameter so that it shows threads rather than processes, but I find the output confusing.
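
If you'd rather get the figure from a script, the kernel also exposes it as the "Threads:" field of /proc/<pid>/status. Below is a minimal sketch, assuming a Linux system with /proc mounted; the helper names parse_thread_count and thread_count are my own for this example, not part of any tool.

```python
def parse_thread_count(status_text):
    """Return the value of the 'Threads:' field from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1])
    raise ValueError("no 'Threads:' field found")


def thread_count(pid="self"):
    """Read the thread count of a process (Linux only, needs /proc)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_thread_count(f.read())


if __name__ == "__main__":
    print("threads in this process:", thread_count())
```

Calling thread_count(1234) would give the figure for a hypothetical process with PID 1234, the same number ps shows in its nlwp column.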

I think another clear example of the differences in "thread culture" between the Linux and Windows communities is Node.js. Its asynchronous programming model is great for many scenarios, but it's easy to get to a point where you really need two "tasks" running in parallel (two CPU-bound tasks, like decrypting 2 separate streams). When I first read that the only solution for those cases is spawning a new process, that answer came as a shock, as I've got mainly a Windows background. When you consider that Node.js, though now massively used on Windows, started with Linux as its main target, the answer is not that surprising.
