Achim's blog: Pattern Recognition
February 14
The operating system knows best - coarse grain concurrency using pipes
In my previous post I wrote about how current languages (and runtimes) have a hard time exploiting the benefits that the new multicore processors bring. That post dealt mainly with exploiting concurrency at a fine-grained level (e.g. loops).
I don't know exactly what triggered my realization that there is an old-school mechanism in both Windows and Un*x that enables many scenarios of coarse-grained concurrency: anonymous pipes. It was either the announcement of Yahoo! Pipes or my reluctance to rewrite a bunch of scripts that interact heavily via command line IO.
How do anonymous pipes enable coarse grain concurrency? Say we have a problem that requires some multi-step sequential manipulation/filtering of some kind of data. One way to attack this would be to code this sequence into the main routine of the program in your favorite programming language and pass the data around in programming language structures or references to them. Works fine, except that we'd have a hard time getting any of these steps to run concurrently on a multicore processor with today's languages/runtimes.
The better way with regard to concurrency would be to split the steps into their own little programs, each of which receives its input via STDIN and writes the result of its processing to STDOUT. Every step then runs as a separate process, and the operating system is free to schedule those processes on different cores. To execute all steps in sequence you just string them together with anonymous pipes:
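A minimal sketch of the idea in Python. The three stage programs and the `run_pipeline` helper are hypothetical, made up for illustration: each stage is a tiny standalone program that reads STDIN and writes STDOUT, and the parent only wires them together with anonymous pipes. In a shell, the same wiring is what the `|` operator does.

```python
import subprocess
import sys

# Hypothetical stages: each is a standalone program reading STDIN, writing STDOUT.
UPPER = "import sys\nfor line in sys.stdin: sys.stdout.write(line.upper())"
KEEP_A = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if line.startswith('A'): sys.stdout.write(line)"
)
COUNT = "import sys\nprint(sum(1 for _ in sys.stdin))"


def run_pipeline(stage_sources, input_text):
    """Start every stage as its own OS process, connected by anonymous pipes."""
    procs = []
    prev_stdout = None
    for i, source in enumerate(stage_sources):
        proc = subprocess.Popen(
            [sys.executable, "-c", source],
            # First stage is fed by the parent; later stages read the previous stage.
            stdin=subprocess.PIPE if i == 0 else prev_stdout,
            stdout=subprocess.PIPE,
            text=True,
        )
        if prev_stdout is not None:
            # The parent no longer needs its copy of the upstream read end.
            # Closing it lets an upstream stage see a broken pipe if its reader exits.
            prev_stdout.close()
        procs.append(proc)
        prev_stdout = proc.stdout

    # Fine for small inputs; large inputs would need a writer thread so the
    # parent does not block on a full pipe buffer before it starts reading.
    procs[0].stdin.write(input_text)
    procs[0].stdin.close()
    output = procs[-1].stdout.read()
    procs[-1].stdout.close()
    for proc in procs:
        proc.wait()
    return output


if __name__ == "__main__":
    print(run_pipeline([UPPER, KEEP_A, COUNT], "apple\nbanana\navocado\n"), end="")
```

All three processes are alive at the same time, so a downstream stage can start working on the first lines while an upstream stage is still producing the rest.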
There are many benefits:
Of course there are a lot of things to be cautious about:
The Lambda the Ultimate blog had a good discussion on this and the relation to functional programming concepts a little over a year ago.
October 17
Multicore processors or how to choose a programming language for the next 5 years
I love the open source scripting languages: Perl, Python, PHP and Ruby - the P in LAMP (ok, they have to rename the last one). They are easy to pick up, support different programming paradigms, are available for many platforms, have extensive libraries, have great communities driving them forward, are free (as in beer and freedom) ... I could go on and on.
All of them are in different phases of growing up. Perl 6, Python 3000, Ruby 2.0 all promise bigger and better things in terms of language design and functionality. But (judging from my web searches) not many people talk about how the next revolution in programming - multicore computing - affects the language runtimes (I know, big words, but bear with me).
The battle between PC hardware companies is heating up again; ads for multicore processors are shown on TV during primetime. What happens, though, when you run a Perl script containing code like this on one of these shiny new multicore machines?
One of the cores is awfully busy while the others are idly sitting around! Assuming the function some_expensive_function has no side effects, the task could easily be split up among the different processor cores.
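To make the point concrete, here is a rough Python analogue (not the Perl code from the post). The body of `some_expensive_function` and the helper names `run_sequential` and `run_parallel` are assumptions for illustration. Because the function has no side effects, the calls are independent and can be handed to worker processes by hand, here via the standard `multiprocessing` module.

```python
import multiprocessing as mp


def some_expensive_function(n):
    """Stand-in for the post's side-effect-free function (hypothetical body)."""
    return sum(i * i for i in range(n))


def run_sequential(inputs):
    # A plain loop: a single core does all the work, one call after another.
    return [some_expensive_function(n) for n in inputs]


def run_parallel(inputs, workers=None):
    # Assumption: prefer the "fork" start method where the platform offers it;
    # elsewhere fall back to the default, which needs the __main__ guard below.
    methods = mp.get_all_start_methods()
    ctx = mp.get_context("fork") if "fork" in methods else mp.get_context()
    # workers=None lets the pool pick a size based on the number of CPUs.
    with ctx.Pool(processes=workers) as pool:
        # Safe only because the calls do not depend on each other.
        return pool.map(some_expensive_function, inputs)


if __name__ == "__main__":
    data = [200_000] * 8
    # Both variants compute the same list; only the scheduling differs.
    assert run_sequential(data) == run_parallel(data)
```

This is exactly the kind of explicit restructuring the script author has to do today; the runtime does not do it on its own.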
I hear you saying: "But yes, of course the script has to be multi-thread enabled to make use of all the cores." However, this requires additional, non-trivial work - as Herb Sutter says in his excellent 2005 article: "The free lunch is over". Herb urges everybody to brush up on their skills in writing multi-threaded applications. He says: "Implicitly parallelizing compilers can help a little, but don’t expect much; they can’t do nearly as good a job of parallelizing your sequential program as you could do by turning it into an explicitly parallel and threaded version."
This is one way to approach the problem - what I would call "handcrafting" your parallelism. I'm sure you can get very well performing applications out of this; applications like the video encoders used to measure multicore performance are already multi-thread enabled today.
That is, if you can get this handcrafting right - the web is full of tales of multi-threaded programming gone bad. Applications like this are notoriously hard to debug.
Is there a better solution? If I have to (re-)learn concepts, is there one that deals with this problem a little more elegantly?
Turns out there is: functional programming. From the Wikipedia entry: "Disallowing side effects provides for referential transparency, which makes it easier to verify, optimize, and parallelize programs, and easier to write automated tools to perform those tasks".
Excellent - so I just have to adopt the functional programming constructs available in Perl, Python and Ruby and the things that are parallelizable will be parallelized automatically for me? Wishful thinking for now, unfortunately. I couldn't find any info that any of the present runtimes are thread-aware (not just thread-safe), especially for functional programming constructs.
What to do? Wait for the new Parrot, YARV, CPython runtimes? Possibly contribute there? Judging from the Perl 6 history this could take a while.
Use one of the functional programming languages like Erlang or F#? Certainly attractive from a learning point of view, but I would always have to trade off at least one of the advantages of the P languages mentioned at the start of the post.
Fortunately there seems to be a way out: Python. For Python, unlike for Perl and Ruby, there are multiple runtimes, among them the Java VM and the .NET runtime. There is a strong motivation for Sun and Microsoft to make the bytecode of these runtimes work as fast as possible on multicore machines. All we need now is for IronPython and Jython to analyze the functional constructs and parallelize them automatically if possible.
For the really tricky performance bottlenecks there will be C/C++ extensions using the tried and tested OpenMP.
Natural language processing requires a lot of computing power. Multicore machines promise to make this available.
Update: Just found out about RubyCLR. Along with JRuby, this will make it possible to target .NET and Java with Ruby. Both seem to be less mature than IronPython and Jython, though.