(Val)grinding your code

One of the most esoteric tools that developers working on software projects should be using more often than they do is certainly Valgrind. It’s a multi-faceted tool: timing profiler, memory profiler, race condition detector, and more. It can do all this because it is essentially a simulated CPU that executes the code, keeps track of what happens, and replaces some basic functions such as the allocation routines.

This works pretty well, up to a point; unfortunately it does not always work out, because of a few issues: Valgrind needs to know about every instruction used in the program (otherwise the program dies with SIGILL), and right now it does not know about SSE4 on x86 and x86-64 processors. It also has to be told, through a series of suppression files, about the places in the C library where it cannot keep score correctly. It turns out it currently fails to cope with the changes in glibc 2.10, which makes it pretty difficult to find the real issues right now with the most bleeding-edge tools available.

But besides these problems with modern tools and hardware, there are other interesting rough edges; the one I’m going to write about today relates to the way the memory checker tool identifies leaks in software: all the memory areas that cannot be reached in any way at the end of the execution are considered leaked. Unfortunately, this often also counts memory areas that are allocated at one point and would only be freed at the end of the process, which arguably are not leaks at all.

Memory areas that would only be freed at the end of a process don’t really need to be freed at all: the operating system will take care of that. Real leaks are those where memory is allocated and never freed nor used again during execution; especially bad are the leaks that happen inside iterative cycles or on similar inputs, because they build up.

Explicitly freeing resources that do not need to be freed can increase the size of the executable sections of a binary (executable or library) for no good reason during production use of the software, and it is thus often considered a bad idea. At the same time, freeing them makes it much easier to find the actual bad leaks in the software. My choice is to write the source code that does the freeing, but only compile it when debug code is enabled, just like assertions and debug output.

This way, production builds won’t carry unneeded code, while debug builds will give proper results on tools like Valgrind. But it doesn’t stop at freeing the resources at the end of the main() function: sometimes you don’t keep a reference to the memory areas allocated during initialization (since they are not strictly needed), or you only keep them as static local variables; both are reported as unreachable by the end of the process, and neither is easy to free up. To work around this, in feng I’ve been using a slightly more sophisticated solution, making use of a feature of recent GCC and ICC versions: destructor functions.

Destructor functions are called within the fini execution path, after main() returns and before the process is unloaded and its resources are freed automatically. Freeing the memory at that point is the ideal way to make sure that all resources are released properly. Of course, this restricts the cases where the code is enabled to builds that have both the debug system enabled and a compiler that supports destructor functions, but that is a broad enough definition to allow proper development. A simple check with autoconf and the thing is done.

By switching the local static variables to unit-static variables (which, once compiled, are basically the same thing), and adding a cleanup function, almost all memory can be freed without an explicit deallocation function in the normal flow. There is still the matter of initializing these memory areas; here there are two options: either you initialize them explicitly, or you initialize them the first time they are needed. Both options allow the size of the structures to be chosen at runtime (from a configuration file), for example the number of configuration contexts to initialize. The third way, initializing them with constructor functions, also works well for fixed-size memory areas, but since I don’t see any way in which they would be better than simply calling a function during initialization, I have no reason to take them into consideration.

For dynamic initialization, the usual interface is the “once” interface, which also allows multithreaded applications to have run-once blocks of code without having to fiddle with mutexes and locks; this is implemented by the pthread_once_t handling functions from the POSIX threads interface, or by GOnce as provided by glib. It allows delayed allocation and initialization of structures, which is especially useful for parts of the code that are not used constantly but only in some specific cases.
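
Just to illustrate the pattern outside of C, here is a minimal run-once lazy initialization sketch in Python; build_cache() is a hypothetical expensive initializer, and feng itself obviously relies on pthread_once/GOnce rather than anything like this:

import threading

_cache = None
_cache_lock = threading.Lock()

def get_cache():
    # initialize the structure the first time it is needed, exactly once,
    # even if several threads race on the first call
    global _cache
    if _cache is None:
        with _cache_lock:
            if _cache is None:
                _cache = build_cache()  # hypothetical expensive initializer
    return _cache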

All the code can be found in your local feng repository or the nearest git server!

CrawlBot Wars

Everybody who has ever wanted to write a “successful website” (or, more recently, thanks to the Web 2.0 hype, a “successful blog”) knows the blessing and curse of crawlers, or bots, that are unleashed by all kinds of entities to scan the web and report the content back to their owners.

Most of these crawlers are run by search engines, such as Google, Microsoft Live Search, Yahoo! and so on. With the widespread use of feeds, at least Google and Yahoo! have added feed-specific crawlers to their standard bots, used to aggregate blogs and other feeds into nice interfaces for their users (think Google Reader). Alongside this kind of crawler, though, there are less useful, sometimes nastier crawlers that either don’t answer to any search engine, or answer to search engines whose ethical standing makes one wonder.

Good or bad, at the end of the day you might not want some bots to crawl your site; some Free Software -bigots- activists some time ago wanted, for instance, to exclude the Microsoft bot from their sites (while I have some other ideas), but there are certain bots that are even more useful to block, like the so-called “marketing bots”.

You might like Web 2.0 or you might not, but certainly lots of people found the new paradigm of the Web a gold mine for making money out of content others have written – and incidentally these are not, as RIAA, MPAA and SIAE insist, the “pirates” who copy music and movies, but rather companies whose objective is to provide other companies with marketing research and data based on the content of blogs and similar services. While some people might be interested in having their blog scanned by these crawlers anyway, I’d guess that for most users who host their own blog this is just a waste of bandwidth: the crawlers tend to be quite pernicious, since they don’t use If-Modified-Since or Etag headers in their requests, and even when they do, they tend to make quite a few requests to the feeds per hour (compare this with Google’s Feedfetcher bot, which requests the same feed at most once per hour – well, unless it is confused by multiple compatibility redirects, as it unfortunately is with my main blog).
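
For contrast, this is roughly what a well-behaved feed consumer looks like with the Universal Feed Parser: it stores the Etag and Last-Modified validators and sends them back on the next request, so an unchanged feed costs a single 304 response. The URL is a placeholder; the etag/modified keywords are feedparser’s documented conditional-GET support.

import feedparser

url = "http://blog.example.org/feed"  # placeholder feed URL

# first fetch: remember the validators the server handed back
first = feedparser.parse(url)
etag, modified = first.get("etag"), first.get("modified")

# later fetches: send the validators along; a 304 means nothing to re-download
later = feedparser.parse(url, etag=etag, modified=modified)
if getattr(later, "status", None) == 304:
    print("feed unchanged, nothing to do")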

While there is a voluntary exclusion protocol (represented by the omnipresent robots.txt file), only genuinely “good” robots honour it, while evil or rogue robots can simply ignore it. Also, it might be counter-productive to block rogue robots even when they do look at it. Say a rogue robot wants your data, and to pass as a good one it advertises itself in the User-Agent string, complete with a link to a page explaining what it’s supposedly doing, and accepts the exclusion. If you exclude it in robots.txt, you give it enough information to choose a _different_ User-Agent string that is not listed in the exclusion rules.
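
For reference, honouring the protocol amounts to nothing more than this check before each request (Python 3 standard library; under Python 2 the same module is simply called robotparser; the site and bot names are placeholders), which is exactly why it offers no protection against a bot that simply skips it:

from urllib import robotparser

rp = robotparser.RobotFileParser("http://www.example.org/robots.txt")
rp.read()  # fetch and parse the exclusion rules

# a polite bot asks before every request; a rogue one just doesn't
if rp.can_fetch("ExampleBot/1.0", "http://www.example.org/private/page.html"):
    print("allowed to crawl")
else:
    print("excluded by robots.txt")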

One way to deal with the problem is to block the requests at the source, answering straight away with an HTTP 403 (Access Denied) on the web server when such a bot makes a request. When using the Apache web server, the easiest way to do this is with modsecurity and a blacklist rule for rogue robots, similar to the antispam system I’ve been using for a few months already. The one problem I see with this is that Apache’s mod_rewrite seems to be executed _before_ mod_security, which means that for any request that is rewritten by compatibility rules (moved, renamed, …) there is first a 301 response and only after that the actual 403.

I’m currently working on compiling such a blacklist by analysing the logs of my server; the main problem is deciding which crawlers to block and which to keep. When the description page explicitly states they do marketing research, blocking them is quite straightforward; when they _seem_ to provide an actual search service, it’s murkier, and it comes down to checking the behaviour of the bot itself on the site. And then there are the vulnerability scanners.

Still, it doesn’t stop here: given that in the description of GoogleBot, Google provides a (quite longish, to be honest) method to verify that a bot actually is the GoogleBot it advertises itself to be, one has to assume that there are rogue bots out there trying to pass for GoogleBot or other good and legit bots. This is very likely the case, because some websites that are usually visible only to registered users make an exception so that search engine crawlers can access and index their content.

Malware in particular, looking for backdoors into a web application, is likely to forge the User-Agent of a known good search engine bot (one that is likely _not_ blocked by the robots.txt exclusion list), so that it doesn’t set off any alarm in the logs. Finding “fake” search engine bots is therefore likely to be an important step in securing a web server running web applications, whether they are trusted or not.

As far as I know there is currently no way in Apache to check that a request actually comes from the bot it declares to come from. The nslookup method that Google suggests works fine for forensic analysis, but it’s almost impossible to perform properly within Apache itself, and not even modsecurity, by itself, can do much about that. On the other hand, there is one thing in the recent 2.5 versions of modsecurity that could probably be used to implement a working check: the ability to load Lua scripts. Which is what I’m going to work on as soon as I find some extra free time.
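
The verification Google describes boils down to a reverse lookup on the client address, a check on the resulting domain, and a forward lookup to confirm it maps back to the same address. Here is a rough offline sketch of it in Python (fine for log forensics, which is precisely the limitation described above):

import socket

def is_real_googlebot(ip):
    # 1. reverse lookup of the client IP
    try:
        host = socket.gethostbyaddr(ip)[0]
    except socket.herror:
        return False
    # 2. the reverse name must live under googlebot.com or google.com
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    # 3. the forward lookup must give back the original IP
    try:
        return ip in socket.gethostbyname_ex(host)[2]
    except socket.gaierror:
        return False

print(is_real_googlebot("66.249.66.1"))  # an address taken from the access logs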

5 lines RSS reader

Recently, while creating AXANT Labs, we decided to put a little RSS aggregator on the page, to mix news from our projects. At first we took a look at Planet, but it was a bit too big for our needs, so we developed this short RSS feed reader using the Universal Feed Parser. I’m sharing it here because the source is really compact and might be useful in other situations.

import feedparser, operator, time

feeds = ("http://blog.axant.it/feed", "http://www.lscube.org/rss.xml")

# fetch every feed and merge all the entries into a single list
feeds = map(lambda x: feedparser.parse(x).entries, feeds)
feeds = reduce(operator.concat, feeds)

# sort with the newest entries first (Python 2 cmp-style comparison)
feeds = sorted(feeds, lambda x, y: cmp(y.date_parsed, x.date_parsed))

for entry in feeds:
    print '%s (%s) -> %s' % (entry.title, time.strftime("%Y-%m-%d %H:%M", entry.date_parsed), entry.description)
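
Since reduce, cmp and the print statement are Python 2 idioms, a rough Python 3 equivalent of the same snippet (keeping the attribute names used above, which may differ across feedparser versions) would be:

import time
import feedparser

feeds = ("http://blog.axant.it/feed", "http://www.lscube.org/rss.xml")

# flatten all the entries from every feed into one list
entries = [entry for url in feeds for entry in feedparser.parse(url).entries]

# date_parsed is a time.struct_time, so it sorts naturally; newest first
entries.sort(key=lambda e: e.date_parsed, reverse=True)

for entry in entries:
    print('%s (%s) -> %s' % (entry.title,
                             time.strftime("%Y-%m-%d %H:%M", entry.date_parsed),
                             entry.description))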

Autotools Come Home

Through our experience as Gentoo developers, Luca and I have dealt with a wide range of build systems; while there are obvious degrees of goodness and badness in the build system world, we prefer autotools over most custom build systems, and especially over CMake-based build systems, which seem to be riding high on the tide thanks to KDE over the last two years.

I have recently written up my views on build systems, in which I explain why I dislike CMake and why I don’t mind it when it replaces a very custom, bad build system. The one reason I gave for using CMake is the lack of support for the Microsoft Visual C++ compiler, which some kinds of projects need under Windows (GCC still lacks way too many features); this is starting to become a moot point.

Indeed, if you look at the NEWS file for the latest release, 1.11 (unleashed yesterday), there is this note:

– The `depcomp’ and `compile’ scripts now work with MSVC under MSYS.

This means that when running configure scripts under MSYS (that is, with most of the POSIX/GNU tools available from the Windows terminal prompt), it’s possible to use the Microsoft compiler, thanks to the compile wrapper script. Of course this does not mean the features are on par with CMake yet, mostly because all the configure scripts I’ve seen up to now expect GCC or compatible compilers, which means it will take more complex tests, and especially macro archives, to replace the Visual Studio project files. Also, since CMake has a fairly standard way to handle options and extra dependencies, it can offer a GUI to select them, whereas autotools are still tremendously fragmented in that regard.

Additionally, one of the most-recreated and probably most useless features, the Linux-kernel-style quiet-but-not-entirely build, is now implemented directly in automake through the silent-rules option. While I don’t see much point in calling that a killer feature, I’m sure there are people who are interested in it.

While many people seem to think that autotools are dead and that they should disappear, there is actually fairly active development behind them, and the whole thing is likely going to progress and improve over the next months. Maybe I should find the time to try making the compile wrapper script work with Borland’s compiler too, of which I have a license; it would be one feature that CMake is missing.

At any rate, I’ll probably extend my autotools guide for automake 1.11, together with a few extras, in the next few days. And maybe I can continue my Autotools Mythbuster series that I’ve been writing on my blog for a while.

On Sprox

Lately I have tried to use Sprox with Elixir.

First of all I have to thank percious: he is incredibly reliable and helpful. There is a bug in Sprox that makes it treat one-to-many relationships as one-to-one relationships, so it shows a single selection field instead of a multiple selection field. This can be worked around by changing the field type to sprox.widgets.PropertyMultipleSelectField, but percious was kind enough to fix it on the fly while I was testing the problem for him, and now Sprox correctly detects the field type by default.

Unfortunately, there is a big problem with Elixir. Since Sprox probably creates internal instances of the Entity you pass to it, this causes an undesired behaviour. With plain SQLAlchemy, an object is not saved to the database until you add it to the session, but with Elixir creating an object means saving it to the database, and this results in multiple empty entities being saved each time you open a form generated with Sprox. If your entity has any required field, your application will crash, as it won’t be able to save it.

In the end I had to switch back from Elixir to DeclarativeBase for my application, and everything worked fine.

Using Elixir with TG2

I had to spend some time to allow a project of ours to use Elixir inside TG2. Maybe someone with more experience than me has a better answer, but I have been able to make Elixir work this way:

First of all I had to make Elixir use my TG2 metadata and session, by adding this line to each model file that has a class inheriting from elixir.Entity:

from project_name.model import metadata as __metadata__, DBSession as __session__

Then I had to move to the model’s __init__.py and add elixir.setup_all() to the init_model function, just after DBSession.configure. This is really important, as it makes Elixir create all the SQLAlchemy tables; without it you won’t see anything happen for your Elixir-based models.
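
Put together, init_model ends up looking roughly like this; the lines around the elixir.setup_all() call follow the stock TG2 quickstart template, so the details may differ in your project:

# project_name/model/__init__.py (sketch)
import elixir
from sqlalchemy import MetaData
from sqlalchemy.orm import scoped_session, sessionmaker

DBSession = scoped_session(sessionmaker(autoflush=True, autocommit=False))
metadata = MetaData()

def init_model(engine):
    DBSession.configure(bind=engine)
    metadata.bind = engine
    elixir.setup_all()  # makes Elixir create its SQLAlchemy tables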

Also, you can now import into your model scope every elixir.Entity-derived class, just like you usually do for DeclarativeBase children.

Thoughts of the day

This afternoon I was talking with my colleagues here at AXANT about the ongoing discussion around tail recursion elimination in Python, and they wanted me to post about it.

But, as I think that functional programming doesn’t really have to be embedded inside Python if we don’t need it, I don’t really want to talk about that! (I’m not going to talk about that, so please, leave this post now, because I won’t say a word!) I usually think of functional programming as a good tool for implementing generic programming. Tail recursion elimination can be an optimization in compiled languages, where a loop costs less than a function call because it is just an integer increment; even there, on modern architectures, this might not be true, and in Python it might even be worse.

In Python, performing a function call requires: performing a function call (of course…); and iterating over something requires: incrementing an iterator (of course again?), which also means performing a function call. So both should cost the same, but I suspect function calls are better optimized than iterator increments inside CPython, as the former is implemented in pure C while the latter is mostly performed in Python itself. Anyway, the complexity of the algorithm from which you eliminate tail recursion doesn’t change at all, so it isn’t a real optimization.
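
To make the comparison concrete, this is the rewrite that tail call elimination would perform, done here by hand; in CPython the recursive version also trips over the default recursion limit (usually 1000 frames) long before speed becomes the issue:

import sys

def sum_to_recursive(n, acc=0):
    # tail-recursive form: the recursive call is the very last operation
    if n == 0:
        return acc
    return sum_to_recursive(n - 1, acc + n)

def sum_to_loop(n):
    # the same computation with the tail call turned into a loop
    acc = 0
    while n > 0:
        acc += n
        n -= 1
    return acc

print(sum_to_loop(10000))        # fine
print(sys.getrecursionlimit())   # usually 1000
print(sum_to_recursive(10000))   # blows up: maximum recursion depth exceeded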

Fine, I ended up talking about tail recursion elimination even though I didn’t want to!

Time for the real post!

This evening, while surfing around the web, I found an interesting consideration that makes me hope that in the future computer science might not be in the hands of business people as much as I think it is now. In recent years it seemed that, for business people, Ruby was becoming the new Java: every company thought it could solve all its problems just by switching to it. All the interest around Ruby was confirmed by the fact that in 2005 O’Reilly reported that Ruby book sales were up by around 1500%; looking at the same report now for 2008, it seems the Ruby hype is fading away (look at the Objective-C increase! Might that be because of the iPhone? Hmm, I think Java will see something similar because of the Android launch).

This makes me think that maybe in the future we will be able to use Ruby for what it is and where it behaves best, instead of trying to do everything with it without being able to tell our customers that “it isn’t a good idea”, only to receive as the sole response that TechCrunch reported that “random name company” increased its income by using Ruby. (I’m still trying to understand why they usually don’t realize that “random name company” increased its income probably because of what they did using Ruby, not because of the mere fact of using Ruby.)

Maybe some day Italian companies will stop trying to make decisions about their IT infrastructure based on stock market values and will start to realize that using Drupal, Plone or even Django might be a better solution than spending thousands of euros implementing their own CMS from scratch in RoR just because they wanted to use RoR (at least please use Railfrog or something like that if you really want RoR!). Software development languages and frameworks should be the tool to achieve your target, not the target itself. I’m looking forward to the future, or to the nearest sniper rifle shop.

Profiling Proficiency

Trying to assess the impact of the Ragel state machine generator on software, I’ve been trying to come up with a quick way to seriously benchmark some simple test cases, making sure that the results are as objective as possible. Unfortunately, I’m not a profiling expert; to keep with the alliteration in the post’s title, I’m not proficient with profilers.

Profiling can be done either internally to the software (manually or with GCC’s help), externally with a profiling tool, or in the kernel with a sampling profiler. Respectively, you’d find yourself using the rdtsc instruction (on x86/amd64), the gprof command, valgrind, and either sysprof or oprofile. It turns out that almost all of these options fail to give you anything truly useful: you have to decide whether to get theoretical data or practical timings, and in both cases you don’t end up with a very useful result.

Of that list of profiling software, the one that looked most promising was actually oprofile; it is especially interesting since the AMD Family 10h (Barcelona) CPUs supposedly have a way to report executed instructions accurately, which should combine the precise timing reported by oprofile with the invariant execution profile that valgrind provides. Unfortunately oprofile’s documentation is quite lacking, and I could find nothing to get that “IBS” feature working.

Since oprofile is a sampling profiler, a few important points have to be noted: it requires kernel support (which in my case meant a kernel rebuild, because I had profiling support disabled); it requires root to set up and start profiling, through a daemon process; and by default it profiles everything running on the system, which on a busy box running a tinderbox might actually be _too_ much. Support for oprofile in Gentoo also doesn’t seem to be perfect: for instance, there is no standardised way to start/stop/restart the daemon, which might not sound that bad to most people, but is actually useful because sometimes you forget the daemon is running, try to start it time and time again, and it doesn’t work as you expect. Also, for some reason it stores its data in /var/lib instead of /var/cache; this wouldn’t be a problem if it weren’t that, if you don’t pay enough attention, you can easily see your /var/lib filesystem filling up (remember: it runs as root, so it bypasses the root-reserved space).

More worrisome: you won’t get proper profiling on Gentoo yet, at least on AMD64 systems. The problem is that all sampling profilers (so the same holds true for sysprof) require frame pointer information; the well-known -fomit-frame-pointer flag, which frees up a precious register on x86 and used to break debug support, becomes a problem here, as Mart pointed out to me. The tricky issue is that, since a few GCC and GDB versions ago, the frame pointer is no longer needed to get complete backtraces of processes being debugged; this means that, for instance on AMD64, the flag is now automatically enabled by -O2 and higher. On some architectures this is still not a problem for sample-based profiling, but on AMD64 it is. Now, the test cases I had to profile are quite simple and minimal and only call into the C library (which barely calls into the kernel to read the input files), so I only needed the C library to be built with frame pointers to break down the functions; unfortunately this wasn’t as easy as I had hoped.

The first problem is that the Gentoo ebuild for glibc does have a glibc-omitfp USE flag, but it does not seem to explicitly disable frame pointer omission, only explicitly enable it. Changing that, thus, didn’t help; adding -fno-omit-frame-pointer also does not work properly, because flags are (somewhat obviously) filtered out; adding that flag to the whitelist in flag-o-matic does not help either. I have yet to see it working as it’s supposed to, so for now oprofile is just accumulating dust.

My second choice was to use gprof together with GCC’s profiling support, but that only seems to provide a breakdown of execution time in percentages, which is also not what I was aiming for. The final option was to fall back to valgrind, in particular its callgrind tool. Unfortunately the output of that tool is not human-readable, and you usually end up using software like KCacheGrind to parse it; but since this system is KDE-free I couldn’t rely on that, and KCacheGrind also doesn’t provide an easy way to extract data out of it to build benchmark tables and tests.

On the other hand, the callgrind_annotate script does provide the needed information, although still not in an exactly software-parsable fashion; to find the actual information I had to look at the main() call trace and then select the function I’m interested in profiling, which gave me the instruction count for its execution. Unfortunately this does not help me as much as I was hoping, since it tells me nothing about how much time the CPU will take to execute those instructions (which depends on the amount of cache and other things like that), but it’s at least something to start with.

I’ll be providing the complete script, benchmark and hopefully results once I can actually have a way to correlate them to real-life situations.

PyHP-SVN got a new parser

PyHP got a new parser in the Subversion repository (click here to download); the new parser is still experimental and needs a lot of testing, but it has some interesting new features:

  • Parsing error reporting
  • Block-based indentation
  • Faster parsing and inclusion

First of all, parsing errors such as mismatched tags are now reported in the PyHP log, together with the line and the file where the error happened (even when the file was included from another one).

Second, you can now indent your code as you wish inside your <?pyhp?> code blocks. PyHP will re-align the indentation based on the indentation of the first line: you just have to indent your code consistently inside the <?pyhp?> block, and the blocks themselves can now be indented like HTML tags.
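
For example, a template like the following should now be accepted: the block lines up with the surrounding HTML, and PyHP strips the first line’s indentation from the whole block (render_item() is just a hypothetical helper, not part of PyHP):

<div class="news">
    <?pyhp
        # indented to match the HTML around it; PyHP normalizes the block
        # using the indentation of this first line
        for item in items:
            render_item(item)  # hypothetical helper
    ?>
</div>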

The new parser is written with the Ragel state machine generator, and it performs faster than the old one.

As the new parser might have bugs, you can still compile with the old parser enabled by passing --enable-newparser=no to the configure script.

Python 3.0

Python 3.0 changes have been called “arbitrary and academic”, of no real use to real programmers. Before starting to talk about the Python changes, I would like to reflect on the meaning of “real programmers” in that sentence.

How would we define a “real programmer”?

  • Are hackers “real programmers”?
  • Are professors of language theory “real programmers”?
  • Are experts in algorithm theory “real programmers”?
  • Are people who do coding for a living “real programmers”?

People matching the first definition probably didn’t mind much how Python 2.x did things and could live with it. They will just be angry because suddenly their tools and scripts will stop working. But those people can also stay on Python 2.x, as they don’t tend to have complex software maintenance processes and would probably just be happy to have a way to keep running their code.

People matching the second and third definitions will probably be happier, as the Python 3.0 clean-up will satisfy their concept of elegance a bit more.

People matching the fourth definition will probably be angry thinking of how many hours they will have to spend porting their code to Python 3.0, as their code will probably have to live for the next 5 years and hundreds of customers will have to use it.

This might be right, but I think we have to go deeper into the question to really understand the effect of those changes. I actually do code for a living, and in most of our projects we try to refactor often, to make long-term maintenance of the code as easy as possible. Like us, there are probably hundreds of other companies that refactor often, and all of us do it for one common goal: keeping the code as simple as possible, so that it stays clear and obvious even after years.

Usually this means things like:

  • converting strange behaviours to standard ones within the application domain
  • converting scattered values to constants with a meaning
  • removing duplicated code so there is only one unit of work
  • changing functions and classes to make their role as simple to understand as possible
  • separating mixed code blocks into methods/classes with a clear scope
  • and things like that…

Uhm, wait for a second… isn’t this exactly what Python is doing for the 3.0 release?

  • They changed the print keyword into a standard function, conforming it to the behaviour of every other function in the language (see the small example after this list).
  • They removed some of the duplicated modules, like urllib and urllib2.
  • They took care of modules that did the same things, trying to localize everything in one clear place.
  • They moved things into clearer namespaces instead of having them all mixed in the global one.
  • They removed old modules that were still available even though they had been superseded by new ones.
  • They renamed some things to conform to Python naming conventions.
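
As a tiny example of the kind of mechanical change this means for application code, here is the print change and the urllib reorganization side by side (the 2to3 tool handles both automatically):

# Python 2.x:
#   print "fetching", url
#   import urllib2
#   data = urllib2.urlopen(url).read()

# Python 3.0:
from urllib.request import urlopen

url = "http://www.example.org/"
print("fetching", url)       # print is now an ordinary function
data = urlopen(url).read()   # urllib/urllib2 functionality lives in urllib.request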

In the end, they actually forced us to perform refactoring of our own projects by refactoring a piece of them: “the underlying library and language”. This is just the same thing that happens every day with self-written libraries that you reuse in multiple projects inside your company.

But if thousands of people perform refactoring every day on things their lives depend on, and if Google is accepting Python 3.0 when a lot of its business uses Python (and they also hired Guido, so they could probably have influenced Python 3.0 development a lot if they had wanted to), how can it be such a bad thing?

I’m starting to think that people blaming Python for its changes are mostly angry because they have been forced to refactor when they didn’t want to, or just saw their software stop working. But actually you don’t have to change anything: your software didn’t stop working, and Python 2.x will be maintained for at least another year. You have the choice to do what you want.

As for me, I’ll probably just thank the Python team for refactoring libraries and things that I use every day without asking me to pay a thing. And I’ll just migrate my projects to 3.0 using the 2to3 tool and something I already do every day: code refactoring…