Thoughts of the day

Actually, this afternoon I was talking with my colleagues here at AXANT about the ongoing discussion about tail recursion elimination happening around Python, and they wanted me to post about it.

But, as I think that functional programming doesn’t really have to be embedded inside Python if we don’t need it, I don’t really want to talk about that! (I’m not going to talk about it, so please, leave this post now, because I won’t say a word!) I usually think of functional programming as a good tool to implement generic programming. Tail recursion elimination can be an optimization in compiled languages, where a loop costs less than a function call because it is just an integer increment; even so, on modern architectures this might not be true, and on Python it might even be worse.

In Python, performing a function call requires: performing a function call (of course…). Iterating over something requires incrementing an iterator (of course again?), which means performing a function call. So both would cost the same, but I think that function calls are probably better optimized than iterator incrementation inside CPython, as the first is implemented in pure C while the second is mostly performed in Python itself. Anyway, the complexity of the algorithm from which you are eliminating tail recursion doesn’t change at all, so it isn’t a real optimization.
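As a rough illustration of the point (a minimal sketch, not a rigorous benchmark — the function names and sizes are made up for the example), you can compare a tail-recursive function with its loop equivalent using the standard timeit module and see that CPython pays the full function-call cost on every recursive step:

```python
import sys
import timeit

# CPython does not eliminate tail calls, so the recursive version
# actually consumes stack frames; raise the limit to be safe.
sys.setrecursionlimit(20000)

def rec_sum(n, acc=0):
    # Tail-recursive form: every step is a real function call.
    if n == 0:
        return acc
    return rec_sum(n - 1, acc + n)

def loop_sum(n):
    # Iterative form: every step advances an iterator instead.
    acc = 0
    for i in range(1, n + 1):
        acc += i
    return acc

# Both compute the same result...
assert rec_sum(1000) == loop_sum(1000) == 500500

# ...and both are O(n); only the constant factors differ.
print("recursive:", timeit.timeit(lambda: rec_sum(1000), number=1000))
print("iterative:", timeit.timeit(lambda: loop_sum(1000), number=1000))
```

Whatever the timings turn out to be on your machine, the asymptotic complexity is identical either way, which is exactly the point above.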

Fine, I ended up talking about tail recursion elimination even though I didn’t want to!

Time for the real post!

This evening, while surfing around the web, I found an interesting consideration that makes me hope that in the future computer science might not be in the hands of business people as much as I think. In recent years it seemed that, for business people, Ruby was becoming the new Java: every company thought it could solve all its problems just by switching to it. All the interest around Ruby was confirmed by the fact that in 2005 O’Reilly reported that Ruby book sales were up by around 1500%; looking again at the same report for 2008, it seems that the Ruby hype is fading away (look at the Objective-C increase! Might that be because of the iPhone? Uhm, I think Java will see something similar because of the Android launch).

This makes me think that maybe in the future we will be able to use Ruby for what it is and where it behaves best, instead of trying to make everything with it without being able to tell our customers that “it isn’t a good idea”, since the only response we would get is that TechCrunch reported that “random name company” increased its income by using Ruby. (I’m still trying to understand how they usually don’t realize that “random name company” increased its income probably because of what it did using Ruby, not because of the fact of using Ruby itself.)

Maybe someday Italian companies will stop trying to make decisions about their IT infrastructure using stock market values, and will start to realize that using Drupal, Plone or even Django might be a better solution than spending thousands of euros on implementing their own CMS from scratch using RoR just because they wanted to use RoR (at least please use Railfrog or something like that if you really want RoR!). Software development languages and frameworks should be the tool to achieve your target, not the target itself. I’m looking forward to the future, or to the nearest sniper rifle shop.

Profiling Proficiency

Trying to assess the impact of the Ragel state machine generator on software, I’ve been trying to come up with a quick way to seriously benchmark some simple testcases, making sure that the results are as objective as possible. Unfortunately, I’m not a profiling expert; to keep with the alliteration in the post’s title, I’m not proficient with profilers.

Profiling can be done either internally to the software (manually or with GCC’s help), externally with a profiling tool, or in the kernel with a sampling profiler. Respectively, you’d be using the rdtsc instruction (on x86/amd64), the gprof command, valgrind, and either sysprof or oprofile. It turns out that almost none of these options gives you anything truly useful: you have to decide whether to get theoretical data or practical timing, and in both cases you don’t end up with a very useful result.

Of that list of profiling software, the one that looked most promising was actually oProfile; it is especially interesting since the AMD Family 10h (Barcelona) CPUs supposedly have a way to report instructions executed accurately, which should combine the precise timing reported by oProfile with the invariant execution profile that valgrind provides. Unfortunately oprofile’s documentation is quite lacking, and I could find nothing to get that “IBS” feature working.

Since oprofile is a sampling profiler, there are a few important points to note: it requires kernel support (which in my case required a kernel rebuild, because I had profiling support disabled); it requires root to set up and start profiling, through a daemon process; and by default it profiles everything running on the system, which on a busy system running a tinderbox might actually be _too_ much. Support for oprofile in Gentoo also doesn’t seem to be perfect; for instance, there is no standardised way to start/stop/restart the daemon. That might not sound so bad to most people, but it is actually useful, because sometimes you forget the daemon is running, try to start it time and time again, and it doesn’t work as you expect. Also, for some reason it stores its data in /var/lib instead of /var/cache; this wouldn’t be a problem if it weren’t that, if you don’t pay enough attention, you can easily see the filesystem holding /var/lib filling up (remember: it runs as root, so it bypasses the root-reserved space).

More worrisome, you’ll never get proper profiling on Gentoo yet, at least on AMD64 systems. The problem is that all sampling profilers (so the same holds true for sysprof too) require frame pointer information; the well-known -fomit-frame-pointer flag, which frees a precious register on x86 and used to break debugging support, can become a problem, as Mart pointed out to me. The tricky issue is that, since a few GCC and GDB versions ago, the frame pointer is no longer needed to get complete backtraces of processes being debugged; this means that, for instance on AMD64, the flag is now automatically enabled by -O2 and higher. On some architectures this is still not a problem for sample-based profiling, but on AMD64 it is. Now, the testcases I had to profile are quite simple and minimal and only call into the C library (and that barely calls into the kernel to read the input files), so I only needed the C library built with frame pointers to break down the functions; unfortunately this wasn’t as easy as I hoped.

The first problem is that the Gentoo ebuild for glibc does have a glibc-omitfp USE flag, but it does not seem to explicitly disable frame pointer omission, just to explicitly enable it. Changing that, thus, didn’t help; adding -fno-omit-frame-pointer also doesn’t work properly, because flags are (somewhat obviously) filtered out; adding that flag to the whitelist in flag-o-matic doesn’t help either. I have yet to see it working as it’s supposed to, so for now oprofile is just accumulating dust.

My second choice was to use gprof together with GCC’s profiling support, but that only seems to provide a breakdown of execution time in percentages, which is also not what I was aiming for. The final option was to fall back to valgrind, in particular the callgrind tool. Unfortunately the output of that tool is not human-readable, and you usually end up using software like KCacheGrind to parse it; but since this system is KDE-free I couldn’t rely on that, and KCacheGrind does not provide an easy way to extract data out of it to compile benchmark tables and tests anyway.

On the other hand, the callgrind_annotate script does provide the needed information, although still not in an easily software-parsable fashion; indeed, to find the actual information I had to look at the main() call trace and then select the function I’m interested in profiling, which gave me the instruction count for its execution. Unfortunately this does not really help me as much as I was hoping, since it tells me nothing about how much time it will take the CPU to execute those instructions (which depends on the amount of cache and other things like that), but it’s at least something to start with.
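To turn that manual digging into something a benchmark script can consume, one option is to scrape the per-function instruction counts out of callgrind_annotate’s text output. This is only a sketch under assumptions: the “count, then file:function” line layout assumed below matches what callgrind_annotate typically prints, but it may vary between valgrind versions, and the sample text and helper name are made up for the example.

```python
import re

# Assumed line shape: optional spaces, a comma-separated instruction
# count (Ir), whitespace, then "file:function". Lines that don't fit
# (headers, "PROGRAM TOTALS", separators) simply won't match.
LINE_RE = re.compile(r'^\s*([\d,]+)\s+\S*:(\w+)')

def instruction_count(annotate_output, function):
    """Return the Ir count for `function`, or None if not found."""
    for line in annotate_output.splitlines():
        m = LINE_RE.match(line)
        if m and m.group(2) == function:
            # Strip the thousands separators and return an int.
            return int(m.group(1).replace(',', ''))
    return None

# Made-up sample mimicking callgrind_annotate output:
sample = """
--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
12,345,678  PROGRAM TOTALS

 9,876,543  parser.c:parse_input
 1,234,567  parser.c:main
"""

print(instruction_count(sample, 'parse_input'))
```

With counts extracted like this, comparing two builds of the same testcase becomes a simple subtraction, even if (as said above) instruction counts still say nothing about actual CPU time.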

I’ll be providing the complete script, benchmark and hopefully results once I can actually have a way to correlate them to real-life situations.

Canvas 3D and various modern web technologies reflections

Every ~10 years we face an incredible new technology that looks a lot like an old technology that spent its life in total oblivion.

Recently we saw the return of clustered computing under the new brand of cloud computing, and the return of time-sharing systems under the software-as-a-service paradigm; lately even 3D movies have returned to life from the 1970s. Technology improves, and sometimes we realize that something we have already tried can be made better and released under a new, less geeky brand.

Recently HTML5, which is trying to remove the need for plugins for actions that are common nowadays (like playing videos online) and to end the HTML vs XHTML war by permitting SVG and MathML to be embedded in HTML, caused the return of the idea of doing complex graphics online.

Indeed, being able to draw a few pixels using <canvas> doesn’t mean being able to draw complex 3D graphics, and so we started to see the old idea of VRML getting new life under the flags of Canvas 3D and O3D. (Actually VRML and Canvas 3D/O3D have little in common, but we are still talking about 3D graphics on the web.)

This was quite inevitable, and I think that in HTML6 we will probably see some kind of 3D API integrated into the standard canvas; but for now I’m just curious to see whether they will be used for something good, or whether they will face the same doom as VRML and Flash, which have been used 90% of the time to create disturbing and useless web pages, animations and intros.

Anyway, CADIE was already moving on this front and has provided the solution for the 3D web.

At least it would be cool to be able to retrieve your heavy machine gun inside Quake 6 with jQuery('#machinegun') but, as the world hates me, 3D objects won’t be part of the DOM and will live only in your JavaScript heap. I’m very sad about this :/

Google loves Ruby

Can you see any reference to Ruby here?

They even have an Objective-C library, probably because of the huge amount of Objective-C web frameworks, but no official Ruby library.

Do they hate Ruby? Please, do not hate Ruby! I’m a poor little interpreted language used by everyone who is making real money, please love me! pleaseeeee, I’ll make you rich!

Yeah, ok, I’m being sarcastic, and there obviously are some unofficial Ruby libraries out there for GData.

CentOS and apache

A great part of VPS users run CentOS, and a lot of real servers run CentOS nowadays too. Usually this happens because CentOS users hope it will be more stable, tested and secure than other distributions, being derived from an enterprise commercial one.

This might be true in most cases, but while developing PyHP we found a lot of “unknown mysterious bugs” that happened only on CentOS. After some investigation we found that apr_stat on CentOS always returns 0 as the file size (this made it quite interesting to allocate buffers or use mmap to read files) and also that bucket brigades had a strange behaviour; and by strange I mean that in some conditions they never considered the request terminated and caged the user in a wonderful infinite loop (as in “while(true)”, not as in teleporting the user to Apple headquarters).

As a big percentage of PyHP users rely on CentOS, we had to rewrite some parts to use lstat instead of apr_stat and also to move away from bucket brigades to ap_should_client_block. If you are using CentOS and find any problem with PyHP, try upgrading to the latest svn trunk; also, if you are already using the svn trunk, please upgrade to the latest revision, as there was a bug caused by the migration from bucket brigades to ap_should_client_block that might prevent your users from uploading big files.