Many VPS users run CentOS, and so do a lot of physical servers nowadays. People usually choose it hoping it will be more stable, better tested and more secure than other distributions, since it is derived from a commercial enterprise distribution.
This may be true in most cases, but while developing PyHP we ran into a lot of “unknown mysterious bugs” that happened only on CentOS. After some investigation we found that apr_stat on CentOS always returns 0 as the file size (which made it quite interesting to allocate buffers or use mmap to read files) and that bucket brigades behaved strangely: under some conditions they never considered the request terminated and trapped the user in a wonderful infinite loop (as in “while(true)”, not as in teleporting the user to Apple headquarters).
As a large share of PyHP users rely on CentOS, we had to rewrite some parts to use lstat instead of apr_stat and to move away from bucket brigades to ap_should_client_block. If you are using CentOS and hit any problem with PyHP, try upgrading to the latest svn trunk. If you are already on svn trunk, please update to the latest revision: the migration from bucket brigades to ap_should_client_block introduced a bug that might prevent your users from uploading big files.
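To show why a file size of 0 is such a nuisance, here is a minimal sketch in Python rather than in the C that PyHP is written in. It sizes an mmap from lstat, the same call we switched to. The function name read_whole_file is made up for this example and is not part of PyHP.

```python
import mmap
import os


def read_whole_file(path):
    """Read a file through mmap, sizing the mapping from lstat()."""
    # Note: lstat() does not follow symlinks, so for a symlink this is the
    # size of the link itself, not of the file it points to.
    size = os.lstat(path).st_size
    if size == 0:
        # A zero-length mapping is not possible, so a stat call that wrongly
        # reports 0 would make every file look empty at this point.
        return b""
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), size, access=mmap.ACCESS_READ) as view:
            return view[:size]
```

If the reported size is wrong, everything downstream (the buffer, the mapping, the amount of data returned) is wrong too, which matches the kind of mysterious behaviour we were seeing.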
PyHP gained the ability to save sessions in a database. The new PyHPSessionBackend config variable can be set to “db” or “file”, and PyHPDBSessionUser, PyHPDBSessionPass and PyHPDBSessionUri specify which database stores the sessions and which user accesses it.
This works only with MySQL, as MySQL is currently the only database supported by the PyHP database backend. Sessions are saved in the pyhp_sessions table, which PyHP creates itself, so remember to grant the user you choose permission to create tables in the selected database.
Luckily the issue I was experiencing with gluster 2.0.0rc1 was just an ugly bug that was squashed in the 2.0.0rc2 release. For now I’m keeping the configuration I blogged about, and we are thinking about topologies and expansion.
Right now the big issue is providing enough bandwidth for replicated writes, since a single Gbit link isn’t enough. It’s too late to order InfiniBand, so I’m stuck working out the best topology given that we have a single writer, 70 readers, 3 storage (gluster) nodes, about 4 24-port gigabit switches with unused 10Gbit expansion links, and at least 2 gigabit interfaces per node.
More will follow soon.
PS: I’m wondering how hard it would be to write a round-robin translator that accelerates replicated writes by issuing a write from the client node to just one of the N replicating nodes and then letting them sync among themselves automatically…
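A rough back-of-the-envelope sketch of why one Gbit link is not enough for the writer. It assumes the writer itself sends one full copy to each of the 3 storage nodes, that a gigabit link tops out at the theoretical 125 MB/s, and that two interfaces aggregate perfectly. None of these numbers are measurements.

```python
# Theoretical ceiling of one gigabit link in MB/s, ignoring protocol overhead.
GBIT_LINK_MB_S = 1000 / 8


def client_write_ceiling(links, copies_sent_by_client):
    """Upper bound on useful write speed when the client sends every copy."""
    return GBIT_LINK_MB_S * links / copies_sent_by_client


fan_out_one_link = client_write_ceiling(links=1, copies_sent_by_client=3)
fan_out_two_links = client_write_ceiling(links=2, copies_sent_by_client=3)
# The round-robin idea: the client sends one copy, the nodes sync the rest.
delegated = client_write_ceiling(links=1, copies_sent_by_client=1)

print(f"{fan_out_one_link:.1f} {fan_out_two_links:.1f} {delegated:.1f}")
# prints: 41.7 83.3 125.0
```

Under these assumptions even two aggregated interfaces on the writer stay below what a single link could deliver if the storage nodes replicated among themselves.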
Recently we started to dabble with clustered file systems, in particular a rather new and promising one called gluster.
So far, even though people suggest using the upcoming 2.0 version, we have already found some annoying glitches in 2.0.0rc1. Namely, the writebehind capability wasn’t working at all, reducing the write speed to 3Mb/s (on a gigabit link to a cluster of 3 nodes, each with a theoretical peak speed of 180Mb/s). Luckily they fixed it in their git. Sadly, the peak speed for a single node is about 45Mb/s for a single transfer and around 75Mb/s when aggregating 5 concurrent transfers, while nfs on the same node reaches 95Mb/s on a single transfer.
Since it looks like a lot of time is being wasted waiting somewhere (as the experiment with concurrent transfers hints), we’ll probably investigate further and obviously look for advice.
The current setup uses iocache+writebehind as performance translators and maps the nodes as 6 bricks (2 bricks exported per node), replicating 3 times (one for each node) and using dht to join the 2 replicating groups.
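The layout described above can be pictured with a toy model. This is only a sketch: the node names and brick paths are invented, and the crc32 hash merely stands in for whatever dht really uses to pick a group.

```python
import zlib

# Hypothetical node names; each node exports two bricks.
NODES = ["node1", "node2", "node3"]

# Two replicating groups, each holding one brick from every node,
# so every file ends up with 3 copies on 3 different machines.
REPLICA_GROUPS = [
    [f"{node}:/export/brick{group}" for node in NODES]
    for group in (1, 2)
]


def bricks_for(filename):
    """Pick a replicating group by hash and return the bricks holding a copy."""
    # crc32 is an illustrative stand-in, not the hash gluster actually uses.
    group = zlib.crc32(filename.encode("utf-8")) % len(REPLICA_GROUPS)
    return REPLICA_GROUPS[group]


for name in ("video.avi", "notes.txt"):
    print(name, "->", bricks_for(name))
```

The point of the arrangement is that the distribution layer only chooses between the 2 groups, while replication inside a group guarantees one copy per node.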
PyHP got a new parser in the subversion repository. The new parser is still experimental and needs a lot of testing, but it has some interesting new features:
- Parsing error reporting
- Block-based indentation
- Faster parsing and inclusion
First of all, parsing errors such as unmatched tags are now reported in the pyhp log. The report includes the line and the file where the error happened (even if the file was included from another one).
Next, you can now indent your code as you wish inside your <?pyhp?> code blocks. PyHP will reformat the indentation based on the indentation of the first line. You just have to indent your code consistently inside the <?pyhp?> block, while the blocks themselves can now be indented like html tags.
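The idea behind block-based indentation can be sketched in a few lines of Python. This is not the real parser (that one is generated with ragel); reindent_block is a hypothetical helper that only shows the rule "strip the first line's indentation from every line".

```python
def reindent_block(code):
    """Remove the first non-blank line's leading whitespace from every line."""
    lines = code.splitlines()
    first = next((ln for ln in lines if ln.strip()), "")
    prefix = first[: len(first) - len(first.lstrip())]
    out = []
    for ln in lines:
        if ln.startswith(prefix):
            out.append(ln[len(prefix):])
        elif not ln.strip():
            # Blank lines may carry less whitespace than the prefix.
            out.append("")
        else:
            raise ValueError(f"inconsistent indentation: {ln!r}")
    return "\n".join(out)


block = "    x = 1\n    if x:\n        print(x)\n"
print(reindent_block(block))
```

The block above is indented by four spaces as if it sat inside nested html tags, and it comes out starting at column zero with the relative indentation of the if body preserved.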
The new parser is written with the ragel state machine generator and performs faster than the old one.
As the new parser might have bugs, you can still build with the old parser by passing --enable-newparser=no to the configure script.
You may know that the ruby and rails hype is fading, and there are lots of interesting platforms that share rails’ good points, such as rapid development, even if they are almost unknown. Some, like perl’s catalyst, are faster; others, like the new python turbogears, provide everything rails gives you, overall in a cleaner, more rational shape.
Obviously you have hardly heard about them or how good they are, since there aren’t enough people blogging and gloating about how good they are or how many kool points you achieve by using them.
Well, I’ll start a series of small posts about them with some obviously biased comparisons, just to raise curiosity and foster discussion. Let’s start with the template engines.
Catalyst suggests the use of TT by default. It is fast, quite lightweight and simplifies the perl constructs a bit.
Rails has erb as its template engine, with some implementations faster than the stock one, like erubis (which usually saves your day once you notice how pitiful rails is about performance).
Turbogears lets you pick whatever you like, but right now it suggests/bundles genshi. It is quite fast and uses an xml-compliant markup, so you may edit it with your favourite xml/xhtml editor. The parser will error out if what you wrote doesn’t comply with your dtd/schema, giving you nice output pointing to where it is broken, like tidy does for your static content. Since it is xml, you may generate rich xhtml pages embedding svg and mathml with relative ease.
That’s all. I hope someone will debate whether it is better to have your template engine use a more compact markup, even if you then cannot use common tools to edit the structure of your views, or whether the possibility of fully harnessing xml’s good points outweighs the relative verbosity.
I will discuss ORM and models in the next post.
After a few months of work and tests the LSCube project released the first component of the imminent rewrite for the ESOF 2010 event.
The component is Flux, the RTP mixer/stream manipulator of the project. It was born to replace the old felix tool, which was the core tool for live streaming, injecting packets into the feng server. In this first release Flux has the same features as felix, and it aims to implement more complete and powerful packet manipulation such as transcoding, overlaying and mixing.
Flux also has a cleaner, easier-to-understand architecture, making it possible for anyone to implement new IO classes or parsers in an easy and clean manner.
Good news for PyHP. The latest SVN version got a few bug fixes and a big session management improvement. Session collisions are now handled in a more coherent way; it’s a major improvement, and I suggest everyone using PyHP upgrade to the latest svn revision. A few fixes have also been implemented to catch developer programming errors that caused PyHP to crash, making it more stable.
FlyPDF, the multi-language PDF generation library we develop in collaboration with OS3, now has C# (.Net) bindings. The C bindings have also been improved with callbacks to override the default header and footer generation, and the Write method for flowing text has been made available to all the language bindings, including C++ and Python.
Fabio Rotondo from OS3 also gave a talk about FlyPDF at the Italian PyCon, which was quite a big success.
SVN revision 24 of PyHP now supports multiple headers!
This breaks code that iterates over the old headers_in dictionary, because every value inside headers_in is now a list instead of a string. On the other hand, code writing to headers_out should continue to work, because you can insert either a string or a list into the headers_out dictionary to set a single header or multiple headers.
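A small sketch of how code might cope with both shapes. It uses plain dictionaries as stand-ins for the real headers_in and headers_out objects, the helper name iter_headers and the sample header values are made up, and it is written for current Python rather than the version PyHP embeds.

```python
def iter_headers(headers):
    """Yield (name, value) pairs whether a value is a string or a list."""
    for name, values in headers.items():
        if isinstance(values, str):
            # Single header given as a plain string.
            values = [values]
        for value in values:
            yield name, value


# Stand-in for headers_in after the change: every value is a list.
headers_in = {"Accept": ["text/html", "application/xml"], "Host": ["example.org"]}

# Stand-in for headers_out: a string sets one header, a list sets several.
headers_out = {"Content-Type": "text/html", "Set-Cookie": ["a=1", "b=2"]}

for name, value in iter_headers(headers_in):
    print(f"{name}: {value}")
```

Code that used to read a header value directly as a string now needs either a loop like this or an explicit pick of one entry from the list.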
You can try it from here: http://pyhp.svn.sourceforge.net/viewvc/pyhp.tar.gz?view=tar