Gluster Experience (part one)

Recently we started to dabble with clustered file systems, in particular a rather new and promising one called Gluster.

So far, even though people suggest using the upcoming 2.0 version, we already found some annoying glitches in 2.0.0rc1: the write-behind capability wasn't working at all, reducing write speed to 3 MB/s (on a gigabit link to a cluster of 3 nodes, each with a theoretical peak speed of 180 MB/s). Luckily they fixed it in their git repository; sadly the peak speed for a single node is about 45 MB/s for a single transfer and around 75 MB/s when aggregating 5 concurrent transfers, while NFS on the same node reaches 95 MB/s on a single transfer.

Since it looks like a lot of time is wasted waiting somewhere (as the experiment with concurrent transfers hints), we'll probably investigate further and obviously look for advice.

The current setup uses io-cache and write-behind as performance translators and maps the nodes as 6 bricks (2 bricks exported per node), replicating 3 times (once per node) and using DHT to join the 2 replicated groups.
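A client-side volume file for that layout might look roughly like the following sketch. This is an illustration only: the host and brick names are invented, and the exact options are omitted; the translator type names are the GlusterFS 2.0 ones.

```
# hypothetical client volfile matching the setup described above
volume node1-a
  type protocol/client
  option remote-host node1
  option remote-subvolume brick-a
end-volume
# ... five more protocol/client volumes: node1-b, node2-a, node2-b, node3-a, node3-b

volume repl-a                 # first replica set: one brick from each node
  type cluster/replicate
  subvolumes node1-a node2-a node3-a
end-volume

volume repl-b                 # second replica set
  type cluster/replicate
  subvolumes node1-b node2-b node3-b
end-volume

volume dht                    # DHT joins the two replicated groups
  type cluster/distribute
  subvolumes repl-a repl-b
end-volume

volume wb                     # performance translators stacked on top
  type performance/write-behind
  subvolumes dht
end-volume

volume cache
  type performance/io-cache
  subvolumes wb
end-volume
```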

PyHP-SVN got a new parser

PyHP has a new parser in the Subversion repository (click here to download). The new parser is still experimental and needs a lot of testing, but it has some interesting new features:

  • Parse error reporting
  • Block-based indentation
  • Faster parsing and inclusion

First of all, parsing errors like unmatched tags are now reported in the PyHP log, including the line and the file where the error happened (even if the file was included from another one).

Next, you can now indent your code as you wish inside your <?pyhp?> code blocks. PyHP reformats the indentation based on the indentation of the first line: you just have to indent your code consistently inside the <?pyhp?> block, and the blocks themselves can now be indented like HTML tags.
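For example, a hypothetical page might look like this: the whole block is indented to match the surrounding HTML, and the code inside only needs to be consistent with its own first line (the output call is illustrative, not PyHP's actual API).

```
<html>
  <body>
    <?pyhp
      # the block is indented with the HTML; PyHP re-bases the code's
      # indentation on this first line
      for item in ("a", "b", "c"):
          print(item)
    ?>
  </body>
</html>
```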

The new parser is written with the Ragel state machine generator and performs faster than the old one.

As the new parser might have bugs, you can still compile with the old parser by passing --enable-newparser=no to the configure script.

Brother MFC-5890 on Gentoo Linux 64-bit

We recently bought a new Brother MFC-5890 printer for the office. We have a quite homogeneous environment with mainly Linux systems, some running Gentoo, some Ubuntu, a few OS X systems and two Windows machines. We have been quite satisfied with the fact that Brother releases packages for Ubuntu, but we also had to face the fact that there wasn't a quick way to install them on Gentoo.

So I spent some time adapting the Brother package to install on 64-bit Gentoo and created a .tgz to let other Gentoo users install the printer drivers quickly.

You can download the package from here: http://www.axant.it/static/files/Brother.tgz

Simply unpack it in /usr/local (the paths are not relative, so you must unpack it inside /usr/local), enter the directory /usr/local/Brother/Printer/mfc5890cn/cupswrapper and run cupswrappermfc5890cn. It will perform every step needed for the installation and restart CUPS; you will then be able to configure your printer from CUPS.

Python 3.0

Python 3.0's changes have been called “arbitrary and academic”, of no real use for real programmers. Before talking about Python's changes, I would like to think about the meaning of “real programmers” in that sentence.

How would we define a “real programmer”?

  • Are hackers “real programmers”?
  • Are professors of language theory “real programmers”?
  • Are experts in algorithm theory “real programmers”?
  • Are people who do coding for a living “real programmers”?

Probably people matching the first definition didn't mind much how Python 2.x did things and could live with it. They will just be angry because suddenly their tools and scripts will stop working. But those people can also stay on Python 2.x, as they don't tend to have complex software maintenance processes and would probably just be happy to have a way to run their code.

People matching the second and third definitions will probably be happier, as the Python 3.0 clean-up will satisfy their concept of elegance a bit more.

People matching the fourth definition will probably be angry thinking of how many hours they will have to spend porting their code to Python 3.0, as that code will probably have to live for the next 5 years and hundreds of customers will have to use it.

This might be right, but I think we have to go deeper into the question to really understand the effect of those changes. I actually do code for a living, and in most of our projects we try to refactor often to make long-term maintenance of the code as easy as possible. Like us, there are probably hundreds of other companies that refactor often, and all of us do it for one common goal: keeping the code as simple as possible so it stays clear and obvious even after years.

Usually this means things like:

  • converting strange behaviours to standard ones within the application domain
  • converting scattered magic values to constants with a meaning
  • removing duplicated code so there is only one unit of work
  • changing functions and classes to make their role as easy to understand as possible
  • separating mixed code blocks into methods/classes with a clear scope
  • and things like that…

Uhm, wait a second… isn't this exactly what Python is doing for the 3.0 release?

  • They changed the print keyword into a standard function, conforming it to the behaviour of every other function in the language.
  • They merged duplicated modules like urllib and urllib2.
  • They took care of modules that did the same things, trying to localize everything in one clear place.
  • They moved things into clearer namespaces instead of having them all mixed in the global one.
  • They removed old modules that were still available but had been superseded by new ones.
  • They renamed some things to conform to Python naming conventions.
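A couple of the points above are visible in a minimal Python 3 snippet: print taking keyword arguments like any other function, and the old urllib/urllib2 functions now living under the urllib package namespace.

```python
# print is now an ordinary function, so it accepts keyword arguments
print("spam", "eggs", sep=", ")

# urllib and urllib2 were merged and re-namespaced under the urllib package
from urllib.parse import urlencode, urljoin

query = urlencode({"q": "python 3"})          # 'q=python+3'
url = urljoin("http://example.com/a/", "b")   # 'http://example.com/a/b'
print(url + "?" + query)
```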

In the end they actually forced us to refactor our own projects by refactoring a piece of them: the underlying library and language. This is just the same thing that happens every day with self-written libraries that you reuse across multiple projects inside your company.

But if thousands of people perform refactoring every day on things their livelihoods depend on, and if Google is accepting Python 3.0 while most of its business uses Python (and they also hired Guido, so they could probably have influenced Python 3.0 development a lot if they had wanted to), how can it be such a bad thing?

I'm starting to think that people blaming Python for its changes are mostly angry because they have been forced to refactor when they didn't want to, or just saw their software stop working. But actually you don't have to change anything: your software didn't stop working, and Python 2.x will be maintained for at least one more year. You have the choice to do what you want.

As for me, I'll probably just thank the Python team for refactoring libraries and tools that I use every day without asking me to pay a thing. And I'll migrate my projects to 3.0 using the 2to3 tool and something I already do every day: code refactoring…

Alternatives 2.0

You may know that the Ruby and Rails hype is fading and that there are lots of interesting platforms sharing Rails' good points, like rapid development, even if they are almost unknown. Some, like Perl's Catalyst, are faster; others, like the new Python TurboGears, provide everything Rails gives you, and overall in a cleaner, more rational shape.

Obviously you have hardly heard about them or how good they are, since there aren't enough people blogging and gloating about how good they are or how many cool points you achieve by using them.

Well, I'll start a series of small posts with some obviously biased comparisons, just to raise curiosity and foster discussion. Let's start with the template engines.

Catalyst suggests TT (Template Toolkit) by default. It is fast, quite lightweight, and simplifies the Perl constructs a bit.

Rails has ERB as its template engine, with some implementations faster than the stock one, like Erubis (which usually saves your day once you notice how pitiful Rails is about performance).

TurboGears lets you pick whatever you like, but right now it suggests and bundles Genshi. It's quite fast; it uses an XML-compliant markup, so you can edit templates with your favourite XML/XHTML editor; the parser errors out if what you wrote isn't compliant with your DTD/schema, giving you a nice output pointing to where it is broken, like tidy does for your static content; and since it is XML, you can generate rich XHTML pages embedding SVG and MathML with relative ease.

That's all. I hope someone will debate whether it is better to have your template engine use a more compact markup, even if you then cannot use common tools to edit your views' structure, or whether the possibility of fully harnessing XML's good points outweighs the relative verbosity.

I will discuss ORM and models in the next post.

LSCube Flux 1.0 Released

After a few months of work and testing, the LSCube project released the first component of the imminent rewrite for the ESOF 2010 event.

The component is Flux, the RTP mixer/stream manipulator of the project. It was born to replace the old felix tool, which was the core tool for performing live streaming by injecting packets into the feng server. For its first release, Flux has the same features as felix, and it aims to implement more complete and powerful packet manipulation such as transcoding, overlaying and mixing.

Flux also has a cleaner and easier to understand architecture, making it possible for anyone to implement new IO classes or parsers in an easy and clean manner.

PyHP Sessions support improvements

Good news for PyHP: the latest SVN version got a few bug fixes and a big session management improvement. Session collisions are now handled in a more coherent way; it's a major improvement, and I suggest that everyone using PyHP upgrade to the SVN revision. A few fixes have also been implemented to catch programming errors by the developer that used to crash PyHP, making it more stable.

Liskov Substitution Principle Reflection

I have found a quite interesting Shakespearean article about the Liskov Substitution Principle. It is mainly about C++, but since the principle is at the core of the object-oriented paradigm, I think it might be interesting for everyone.

Most of the issue is nowadays resolved with mixins and policy-based design, but those stray a bit from pure object orientation, so it is still an interesting reflection.

So if you are interested, take a look at this blog post.

FlyPDF now has C# support

FlyPDF, the multi-language PDF generation library we develop in collaboration with OS3, now has C# (.NET) bindings. Improvements have also been made to the C bindings, implementing callbacks to override default header and footer generation. The Write method for flowing text has also been made available to all the language bindings, including C++ and Python.

Fabio Rotondo from OS3 also gave a talk about FlyPDF at the Italian PyCon, which was quite a big success.

PyHP now supports multiple headers

SVN revision 24 of PyHP now supports multiple headers!

This broke code that iterates over the old headers_in dictionary, because every value inside headers_in is now a list instead of a string. On the other hand, code writing into headers_out should continue to work, because it is possible to insert either a string or a list into the headers_out dictionary to set a single header or multiple headers.
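As a hypothetical illustration (plain dicts standing in for PyHP's real request objects), code reading headers_in now has to expect a list per header name, while headers_out accepts either shape:

```python
# hypothetical stand-ins for PyHP's headers dictionaries

# headers_in: every value is now a list, even for single-valued headers
headers_in = {
    "Host": ["example.com"],
    "Accept": ["text/html", "application/xml"],
}

for name, values in headers_in.items():
    for value in values:  # old code expecting a bare string breaks here
        print("%s: %s" % (name, value))

# headers_out: a string sets one header, a list sets several
headers_out = {}
headers_out["X-Powered-By"] = "PyHP"                        # single header
headers_out["Set-Cookie"] = ["a=1; Path=/", "b=2; Path=/"]  # multiple headers
```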

You can try it from here: http://pyhp.svn.sourceforge.net/viewvc/pyhp.tar.gz?view=tar