Software Development
libacr published on PyPI
Wednesday, July 28th, 2010 | Computer Science, Opensource, Software Development, Web | Comments
As we are moving to make libacr a very cool and functional python CMF, we published libacr on PyPI!
Now you just need to:
pip install libacr
and you are done: it is as easy as that!
More details on http://pypi.python.org/pypi/libacr/
GCC using C++
Monday, May 31st, 2010 | Opensource, Software Development | Comments
I got this news http://gcc.gnu.org/ml/gcc/2010-05/msg00705.html and it puzzled me a bit: the system C compiler will depend on C++, which in fact means it is no longer self-hosting.
That alone makes me think that whoever decided this, and whoever requested it, is next to suicidal. GCC is known for having _quite_ a shaky C++ standard library AND ABI, with at least one incompatibility at every major version and sometimes even at minor ones.
I dislike C++ usage mostly on this basis, let alone the fact that it is an overly large language, with not enough people able to dabble in it properly, let alone be proficient.
There are already compilers written in C++; one that many people find interesting is llvm. It doesn’t aim to be a system compiler and it’s not exactly self-hosting.
Many have already stated that they would switch to the llvm clang front-end once it reaches full maturity (and FreeBSD has now shown that this level has pretty much been achieved). I didn’t consider fully switching to it only because I was concerned that it depends on C++, and by how easy it is to get subtle yet major breakages in that language’s implementations.
The llvm people look to me way more capable of managing C++ than the GCC ones, and I welcomed with pleasure the fact that they already have a libc++ implementation.
Back to being suicidal: if I have to pick between people who did well with C++ and people who botched it many times in the same field, whom would I pick?
The current discussions on the GCC mailing list are about C++ coding style, which features to pick and which to forbid, re-architecting the whole beast to use a “proper” hierarchy and such; basically some (many?) want to redo everything with the new toy. That makes me think again that llvm will be a better target for the next months or year.
I hope there are enough GCC developers and/or concerned parties who will fork gcc now and keep a C branch. A radical cleanup and refactoring is probably a completely orthogonal issue and should be done whether they pick C++ or C as their implementation language: GCC has lots of cruft, starting with its bad usage of the autotools.
VideoLAN Web Plugin: xpi vs crx
Tuesday, April 27th, 2010 | Hardware, Opensource, Software Development, Uncategorized, Web | Comments
One of the main issues when preparing a streaming solution is answering the obnoxious question:
- Question: Is it possible to use the service through a browser?
- Answer: No, rtsp isn’t* http, and a browser isn’t a tool for accessing any network content.
* Actually it would be neat to have rtsp support within the video tag, but that’s yet another large can of worms
Once you say that, half of your audience leaves. Non-technical people are far too used to considering the browser the one and only key to the internet. The remaining ones will ask something along these lines:
- Question: My target user is -a complete idiot- -technically impaired- naive and unaccustomed and could not be confronted with the hassle of a complex installation procedure, is there something that fits the bill?
- Answer: VideoLAN Web Plugin
Usually that makes some people happy, since it’s something they actually know or have at least heard about. Some might start complaining, since they tried an old version and, well, it crashed a lot. What you should beware of is the following one:
- Question: Actually I need to install the VideoLAN Web Plugin and it requires attention, isn’t there a quicker route?
- Answer: Yes, xpi and crx for Firefox and Chrome
Ok, that answer is more or less from the future and it’s the main subject of this post: seamlessly bundling something as big and complex as vlc to make our non-technical and naive target user happy.
I picked the VideoLAN web plugin since it is actually quite good already, has a nice javascript interface that lets you do _lots_ of nice stuff, and there are people actually working on it. Additional points since it is available on Windows and Mac OS X. Some time ago I investigated how to use the extension facility of firefox to get the fabled “one click” install. The current way is quite straightforward and has already landed in the vlc git tree, for the curious and lazy:
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:em="http://www.mozilla.org/2004/em-rdf#">
  <Description about="urn:mozilla:install-manifest">
    <em:id>vlc-plugin@videolan.org</em:id>
    <em:name>VideoLAN</em:name>
    <em:version>1.2.0-git</em:version>
    <em:targetApplication>
      <Description>
        <em:id>{ec8030f7-c20a-464f-9b0e-13a3a9e97384}</em:id>
        <em:minVersion>1.5</em:minVersion>
        <em:maxVersion>3.6.*</em:maxVersion>
      </Description>
    </em:targetApplication>
  </Description>
</RDF>
Putting that as install.rdf in a zip containing a directory called plugins with libvlc, its modules and obviously the npapi plugin does the trick quite well.
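As a rough illustration of that layout, here is a minimal Python sketch that assembles such an archive in memory. The function name build_xpi and the plugin_files mapping are hypothetical, made up for this example; the only things taken from the post are install.rdf at the top level and a plugins directory next to it.

```python
import io
import zipfile


def build_xpi(install_rdf, plugin_files):
    """Return the bytes of an .xpi archive.

    install_rdf  -- the text of the install manifest
    plugin_files -- hypothetical mapping of file name -> bytes for
                    everything that goes under plugins/
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as xpi:
        # install.rdf sits at the top level of the archive
        xpi.writestr("install.rdf", install_rdf)
        # libvlc, its modules and the npapi plugin go under plugins/
        for name, data in sorted(plugin_files.items()):
            xpi.writestr("plugins/" + name, data)
    return buffer.getvalue()
```

Writing the returned bytes to a file with an .xpi extension is all that is left; a real build would read the DLLs from the vlc build tree instead of passing them in as a dictionary.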
Chrome now has something similar and it seems even easier, so this is what I put in the manifest.json:
{
  "name": "VideoLAN",
  "version": "1.2.0.99",
  "description": "VideoLan Web Plugin Bundle",
  "plugins": [
    {"path": "plugins/npvlc.dll", "public": true}
  ]
}
Looks simpler and neater, doesn’t it? Now we get to the problematic part of chrome extension packaging:
It is mostly a zip, BUT you have to prepend a small header to it containing more or less just the signature.
You can do that either with chrome’s built-in facility or with a small ruby script. Reimplementing the same logic in a Makefile using openssl is an option; for now I’ll stick with crxmake.
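To make the "zip plus a small header" idea concrete, here is a hedged Python sketch of the header assembly only. It assumes the version 2 crx layout that Chrome documented at the time (the magic string Cr24, a version number, then the lengths of the public key and of the signature, all little-endian 32-bit integers). The function name build_crx is hypothetical, and the DER-encoded public key and the signature over the zip are assumed to have been produced elsewhere, for instance with openssl.

```python
import struct

CRX_MAGIC = b"Cr24"
CRX_VERSION = 2  # assumption: the version 2 package layout


def build_crx(zip_bytes, public_key_der, signature):
    """Prepend the crx header to an already built zip archive.

    public_key_der and signature are taken as given: signing the zip
    is outside the scope of this sketch.
    """
    header = CRX_MAGIC + struct.pack(
        "<III", CRX_VERSION, len(public_key_der), len(signature)
    )
    # header, then key, then signature, then the untouched zip payload
    return header + public_key_der + signature + zip_bytes
```

The zip payload is appended untouched, which is why unpacking tools that tolerate leading data can still read a crx as a zip.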
The first test builds for win32 are available as xpi and crx, hosted on lscube.org as usual.
Sadly the crx file layout and the not-so-tolerant firefox xpi unpacker make it impossible to have a single zip, containing both the manifest.json and the install.rdf, served as both xpi and crx.
by the way, wordpress really sucks
Remember me in Turbogears2
Wednesday, March 10th, 2010 | Software Development | Comments
One of the problems with TG2 is that the current version doesn’t support a “standard” way to remember users after they close the browser. We found a quick and dirty solution that we would like to share here. Keep in mind that this solution only works with recent versions of repoze.who: it works with TG2.0.3, but might not work with previous releases of TG2.
Inside login.html we set a cookie carrying the remember_me option to pass it to the controller, and then inside post_login we change the authentication cookie.
Supposing you have a #remember_me checkbox inside your login.html, you can add this to set the cookie:
function set_remember_cookie() {
    // length is 1 when the checkbox is ticked, 0 otherwise
    var is_checked = jQuery('#remember_me:checked').length;
    if (is_checked)
        document.cookie = 'remember_me=1';
    else
        document.cookie = 'remember_me=0';
}

jQuery(document).ready(function() {
    set_remember_cookie();
    jQuery('#remember_me').click(set_remember_cookie);
});
Then inside your post_login method in the root controller you can place:
remember_me = request.cookies.get('remember_me', 0)
try:
    remember_me = int(remember_me)
except (TypeError, ValueError):
    remember_me = 0

if remember_me:
    request.identity['max_age'] = 2592000  # 30 days, in seconds
    request.identity['userdata'] = "max_age"  # force cookie refresh
This would remember the user for 30 days even if he closes the browser.
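The cookie handling can also be pulled out into a small function that is easy to test without a running TG2 application. This is only a sketch: remember_max_age is a hypothetical helper name, and cookies stands for any dictionary-like object such as request.cookies. Note that 30 days is 30 * 24 * 60 * 60 = 2592000 seconds.

```python
THIRTY_DAYS = 30 * 24 * 60 * 60  # 2592000 seconds


def remember_max_age(cookies):
    """Return the auth cookie lifetime in seconds, or None for a session cookie."""
    try:
        wants_remember = int(cookies.get("remember_me", 0))
    except (TypeError, ValueError):
        # a tampered or malformed cookie is treated as "do not remember"
        wants_remember = 0
    return THIRTY_DAYS if wants_remember else None
```

Inside post_login the result would then be assigned to request.identity['max_age'] only when it is not None.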
Turbogears authentication over mongodb users database
Wednesday, March 3rd, 2010 | Software Development | Comments
Since there isn’t a lot of documentation around on how to perform authentication in turbogears against mongodb, we decided to write a simple code snippet and publish it here to help people trying to achieve the same thing.
This is mainly a proof of concept and a quick and dirty way to get there. You will probably have something like ming as your model, instead of accessing mongo directly.
This code also validates the password against a clear-text one; you will probably have hashed passwords in your database, so remember to change the validate_password method as required.
To make it work you will have to place this code inside your config.app_cfg; it also expects your database to be exposed as db inside your model.
# The next three imports are assumed to be what a stock TG2.0
# config/app_cfg.py already provides; the snippet relies on them.
import my_app
from tg.configuration import AppConfig
from pylons import config as pylons_config

from my_app.model import db
from zope.interface import implements
from repoze.who.interfaces import IAuthenticator, IMetadataProvider
from repoze.who.plugins.friendlyform import FriendlyFormPlugin
from repoze.who.plugins.auth_tkt import AuthTktCookiePlugin
from repoze.who.middleware import PluggableAuthenticationMiddleware


def validate_password(user, password):
    # clear-text comparison: replace with a hash check in real code
    return user['password'] == password


class MongoAuthenticatorPlugin(object):
    implements(IAuthenticator)

    # IAuthenticator
    def authenticate(self, environ, identity):
        if not ('login' in identity and 'password' in identity):
            return None

        login = identity.get('login')
        user = db.users.find_one({'user_name': login})
        if user and validate_password(user, identity.get('password')):
            return identity['login']


class MongoUserMDPlugin(object):
    implements(IMetadataProvider)

    def add_metadata(self, environ, identity):
        user_data = {'user_name': identity['repoze.who.userid']}
        identity['user'] = db.users.find_one(user_data)


class MyAppConfig(AppConfig):
    auth_backend = 'sqlalchemy'  # this is a fake, but it's needed to enable
                                 # auth middleware at least on TG2.0

    login_url = '/login'
    login_handler = '/login_handler'
    post_login_url = None
    logout_handler = '/logout_handler'
    post_logout_url = None
    login_counter_name = None

    def add_auth_middleware(self, app, skip_authentication):
        cookie_secret = pylons_config.get('auth_cookie_secret',
                                          'myapp_adsfsdfh3423')
        cookie_name = pylons_config.get('auth_cookie_name', 'myapp_auth')

        who_args = {}

        form_plugin = FriendlyFormPlugin(self.login_url,
                                         self.login_handler,
                                         self.post_login_url,
                                         self.logout_handler,
                                         self.post_logout_url,
                                         login_counter_name=self.login_counter_name,
                                         rememberer_name='cookie')
        challengers = [('form', form_plugin)]

        auth = MongoAuthenticatorPlugin()
        authenticators = [('mongoauth', auth)]

        cookie = AuthTktCookiePlugin(cookie_secret, cookie_name)
        identifiers = [('cookie', cookie), ('form', form_plugin)]

        provider = MongoUserMDPlugin()
        mdproviders = [('mongoprovider', provider)]

        from repoze.who.classifiers import default_request_classifier
        from repoze.who.classifiers import default_challenge_decider
        log_stream = None

        app = PluggableAuthenticationMiddleware(app,
                                                identifiers,
                                                authenticators,
                                                challengers,
                                                mdproviders,
                                                default_request_classifier,
                                                default_challenge_decider)
        return app


base_config = MyAppConfig()
base_config.renderers = []

base_config.package = my_app

# Set the default renderer
base_config.default_renderer = 'genshi'
base_config.renderers.append('genshi')
base_config.renderers.append('json')

# Configure the base SQLAlchemy setup
base_config.use_sqlalchemy = False
base_config.model = my_app.model
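As noted before the snippet, validate_password should not compare clear-text passwords in a real deployment. One possible replacement, sketched here with only the Python standard library and written for a current Python 3 rather than the Python 2 the snippet targets, derives a salted PBKDF2 hash. The helper hash_password and the salt and iterations fields on the user document are assumptions made for this example, not something the original code stores.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=100000):
    """Build the fields to store on the user document (hypothetical layout)."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return {"salt": salt, "iterations": iterations, "password": digest}


def validate_password(user, password):
    """Recompute the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), user["salt"], user["iterations"]
    )
    return hmac.compare_digest(candidate, user["password"])
```

The signature of validate_password is unchanged, so the authenticator plugin would not need to know which scheme is in use.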
(Val)grinding your code
Saturday, June 6th, 2009 | Software Development | Comments
One of the most esoteric tools for working on software projects, and one that developers should use more often than they do, is certainly Valgrind. It’s a multi-faceted tool: timing profiler, memory profiler, race condition detector, … It can do all this because it’s mainly a simulated CPU, which executes the code, keeps track of what happens, and replaces some basic functions like the allocation routines.
So this works pretty well, to a point; unfortunately it does not always work out that well because there are a few issues: it needs to know about all the instructions used in the program (otherwise it raises SIGILL crashes), and right now it does not know about SSE4 on x86 and x86-64 processors. It also has to know, via a series of suppression files, about the places in the C library where it cannot keep score correctly. It turns out it currently fails to account for the changes in glibc 2.10, which makes it pretty difficult to find the real issues right now with the most bleeding-edge tools available.
But besides these problems with modern tools and hardware, there are other interesting edges; one of these, the one I’m going to write about today, relates to the way the memory checker tool identifies leaks in the software: all the memory areas that cannot be reached in some way at the end of the execution are considered leaked. Unfortunately this often enough includes memory areas that are allocated at one point and would just be freed at the end of the process, which can be said not to be leaks at all.
All the memory areas that are only to be freed at the end of a process don’t really need to be freed at all: the operating system will take care of that; leaks are those where the memory is used up and never freed nor used during execution; especially bad are those leaks that happen inside either iterative cycles or upon similar inputs, because they build up.
Actually freeing up resources that do not need to be freed can increase the size of the executable section of a binary (executable or library) for no good reason during production usage of the software; it is thus often considered a bad idea. At the same time, freeing them up makes it easier to find the actual bad leaks in the software. My choice is to write the actual source code to do the freeing, but only compile it when debug code is enabled, just like assertions and debug output.
This way, the production builds won’t have unneeded code, while the debug builds will give proper results out on tools like valgrind. But it doesn’t stop at freeing the resources at the end of the main() function, because sometimes you don’t preserve reference to the actual memory areas allocated during initialization (since they are not strictly needed), or you just keep them as static local variables, which are both reported as unreachable by the end of the process, and are not easy to free up. To work this out, in feng, I’ve been using a slightly more sophisticated solution, making use of a feature of recent GCC and ICC versions: destructor functions.
Destructor functions are called within the fini execution path, after the main() and before the process is unloaded and the resource freed automatically. Freeing the memory at that point is the ideal situation to make sure that all the resources are freed up properly. Of course, this reduces the cases when the code is enabled to having both debug system enabled, and a compiler that supports the destructor functions. But this is a broad enough definition to allow proper development. A simple check with autoconf and the thing is done.
By replacing the local static variables with unit-static variables (which, once compiled, are basically the same thing), and adding a cleanup function, almost all memory can be freed without having an explicit deallocation function. There is still the matter of initializing these memory areas; in this case there are two options: either you initialize them explicitly, or you initialize them the first time they are needed. Both options allow the size of the structure to be chosen at runtime (from a configuration file), for example for the number of configuration contexts to initialize. The third way, initializing them with constructor functions, also works out well for fixed-size memory areas, but since I don’t have a good reason why they would be better than the alternative of just calling a function during initialization, I don’t take them into consideration.
For dynamic initialization, the interface usually used is the “once” interface, which also allows for multithreaded applications to have run-once blocks of code without having to fiddle with mutexes and locks; this is implemented by the pthread_once_t handling functions from the POSIX thread interface, or GOnce as provided by glib. This allows delayed allocation and initialization of structures which is especially useful for parts of the code that are not used constantly but just in some specific cases.
All the code can be found in your local feng repository or the nearest git server!
CrawlBot Wars
Sunday, May 24th, 2009 | Opensource, Software Development, Web | Comments
Everybody who ever wanted to write a “successful website” (or more recently, thanks to the Web 2.0 hype, a “successful blog”) knows the blessing and curse of crawlers, or bots, that are unleashed by all kinds of entities to scan the web and report the content back to their owners.
Most of these crawlers are run by search engines, such as Google, Microsoft Live Search, Yahoo! and so on. With the widespread use of feeds, at least Google and Yahoo! added to their standard crawler bots feed-specific crawlers that are used to aggregate blogs and other feeds into nice interfaces for their users (think Google Reader). Alongside this kind of crawler, though, there are less useful, sometimes nastier crawlers that either don’t belong to search engines, or belong to search engines whose ethics make one wonder.
Good or bad, at the end of the day you might not want some bots to crawl your site; some Free Software -bigots- activists some time ago wanted, for instance, to exclude the Microsoft bot from their sites (while I have some other ideas), but there are certain bots that are even more useful to block, like the so-called “marketing bots”.
You might like Web 2.0 or you might not, but certainly lots of people found the new paradigm of the Web to be a gold mine for making more money out of content others have written. Incidentally these are not, as RIAA, MPAA and SIAE insist, the “pirates” that copy music and movies, but rather companies whose objective is to provide other companies with marketing research and data based on the content of blogs and similar services. While some people might be interested in getting their blog scanned by these crawlers either way, I’d guess that for most users who host their own blog this is just a waste of bandwidth: the crawlers tend to be quite pernicious since they don’t use If-Modified-Since or ETag headers in their requests, and even when they do, they tend to make quite a few requests for the feeds per hour (compare this with Google’s Feedfetcher bot, which requests at most one copy of the same feed per hour; well, if it isn’t confused by multiple compatibility redirects like it unfortunately is with my main blog).
While there is a voluntary exclusion protocol (represented by the omnipresent robots.txt file), only actually “good” robots honour it, while evil or rogue robots can simply ignore it. Also, it might be counter-productive to block rogue robots even when they do look at it. Say that a rogue robot wants your data and, to pass as a good one, advertises itself in the User-Agent string, complete with a link to a page explaining what it is supposedly doing, and accepts the exclusion. If you exclude it in robots.txt you give it enough information to choose a _different_ User-Agent string that is not listed in the exclusion file.
One way to deal with the problem is to block the requests at the source, answering straight away with an HTTP 403 (Forbidden) as soon as the request reaches the web server. When using the Apache web server, the easiest way to do this is by using modsecurity and a blacklist rule for rogue robots, similar to the antispam system I’ve been using for a few months already. The one problem I see with this is that Apache’s mod_rewrite seems to be executed _before_ mod_security, which means that for any request that is rewritten by compatibility rules (moved, renamed, …) there is first a 301 response and only after that an actual 403.
I’m currently working on compiling such a blacklist by analysing the logs of my server; the main problem is deciding which crawlers to block and which to keep. When the description page explicitly states they do marketing research, blocking them is quite straightforward; when they _seem_ to provide an actual search service, that’s murkier, and it comes down to checking the behaviour of the bot itself on the site. And then there are the vulnerability scanners.
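A first pass over the logs can be as simple as counting requests per User-Agent. The sketch below assumes Apache’s combined log format, where the User-Agent is the last quoted field on each line; count_user_agents is a hypothetical helper name, and agents containing escaped quotes would need a stricter parser.

```python
import re
from collections import Counter

# In the combined log format the User-Agent is the last quoted field.
USER_AGENT = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(log_lines):
    """Return a Counter mapping each User-Agent string to its request count."""
    counts = Counter()
    for line in log_lines:
        match = USER_AGENT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Sorting the result with most_common() puts the hungriest agents on top, which is a reasonable place to start reading description pages.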
Still, it doesn’t stop here: given that in the Google description of GoogleBot they provide a (quite longish, to be honest) method to verify that a bot is actually GoogleBot as it advertises itself to be, one has to assume that there are rogue bots out there trying to pass for GoogleBot or other good and legitimate bots. This is very likely the case because some websites that are usually visible only to registered users make an exception for search engine crawlers to access and index their content.
Malware in particular, looking for backdoors into a web application, is likely to forge the User-Agent of a known good search engine bot (one that is likely _not_ blocked by the robots.txt exclusion list), so that it doesn’t raise any alarm in the logs. So finding “fake” search engine bots is likely to be an important step in securing a webserver running web applications, be they trusted or not.
As far as I know there is currently no way in Apache to check that a request actually does come from the bot it declares to come from. The nslookup method that Google suggests works fine for forensic analysis but it’s almost impossible to perform properly with Apache itself, and not even modsecurity, by itself, can do much about that. On the other hand, there is one thing in the recent 2.5 versions of modsecurity that can probably be used to implement an actually working check: the loading of Lua scripts. Which is what I’m going to work on as soon as I find some extra free time.
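For the forensic side, the check Google describes (a reverse lookup of the address, a look at the domain of the returned name, then a forward lookup to confirm it maps back to the same address) is easy to script outside Apache. The following Python sketch is one way to do it; the function name, the list of trusted suffixes and the injectable resolver arguments are assumptions made for this example, the last one only so that the logic can be tried without network access.

```python
import socket

# Assumption: host names of the genuine crawler end in one of these.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def is_genuine_googlebot(ip, reverse=socket.gethostbyaddr,
                         forward=socket.gethostbyname_ex):
    """Reverse-resolve ip, check the domain, then confirm with a forward lookup."""
    try:
        hostname = reverse(ip)[0]
    except (socket.herror, socket.gaierror):
        return False
    if not hostname.endswith(TRUSTED_SUFFIXES):
        return False
    try:
        addresses = forward(hostname)[2]
    except (socket.herror, socket.gaierror):
        return False
    # the forward lookup must lead back to the address we started from
    return ip in addresses
```

Doing two DNS lookups per request is what makes this awkward inside the web server itself, so a script like this fits better as a batch job over the addresses found in the logs.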
5 lines RSS reader
Tuesday, May 19th, 2009 | Software Development, Web | Comments
Recently, while creating AXANT Labs, we decided to put a little RSS aggregator inside the page to mix news from our projects. At first we took a look at Planet, but it was a bit too big for our needs, so we developed this short RSS feed reader using Universal Feed Parser. I’m sharing it as the sources are really compact and might be useful in other situations:
import feedparser, operator, time
from functools import reduce
feeds = ("http://blog.axant.it/feed", "http://www.lscube.org/rss.xml")
feeds = map(lambda x : feedparser.parse(x).entries, feeds)
feeds = reduce(operator.concat, feeds)
feeds = sorted(feeds, key=lambda entry : entry.date_parsed, reverse=True)
for entry in feeds: print('%s (%s) -> %s' % (entry.title, time.strftime("%Y-%m-%d %H:%M", entry.date_parsed), entry.description))
Autotools Come Home
Monday, May 18th, 2009 | Opensource, Software Development | Comments
With our experience as Gentoo developers, Luca and I have dealt with a wide range of build systems; while there are obvious degrees of goodness and badness in the build system world, we prefer autotools over most custom build systems, and especially over cmake-based build systems, which seem to be riding high thanks to KDE in the last two years.
I have recently written up my views on build systems, explaining why I dislike CMake and why I don’t mind it when it replaces a bad, very custom build system. The one reason I gave for using CMake is autotools’ lack of support for the Microsoft Visual C++ compiler, which is needed by some types of projects under Windows (GCC still lacks way too many features); this is starting to become a moot point.
Indeed, if you look at the NEWS file for the latest automake release, 1.11 (unleashed yesterday), there is this note:
- The `depcomp’ and `compile’ scripts now work with MSVC under MSYS.
This means that when running configure scripts under MSYS (which means having most of the POSIX/GNU tools available under the Windows terminal prompt), it’s possible to use the Microsoft compiler, thanks to the compile wrapper script. Of course this does not mean the features are on par with CMake yet, mostly because all the configure scripts I’ve seen up to now seem to expect GCC or compatible compilers, which means that more complex tests, and especially macro archives, will be required to replace the Visual Studio project files. Also CMake, having a fairly standard way to handle options and extra dependencies, can have a GUI to select those, whereas autotools are still tremendously fragmented in that regard.
Additionally, one of the most-recreated and probably useless features, the Linux-style quiet-but-not-entirely build, is now implemented directly in automake through the silent-make option. While I don’t see much point in calling that a killer feature I’m sure there are people who are interested in seeing that.
While many people seem to think that autotools are dead and that they should disappear, there is actually fairly active development behind them, and the whole thing is likely going to progress and improve over the next months. Maybe I should find the time to try making the compile wrapper script work with Borland’s compiler too, of which I have a license; it would be one feature that CMake is missing.
At any rate, I’ll probably extend my autotools guide for automake 1.11, together with a few extras, in the next few days. And maybe I can continue my Autotools Mythbuster series that I’ve been writing on my blog for a while.
On Sprox
Thursday, May 7th, 2009 | Software Development, Web | Comments
Lately I have tried to use Sprox with Elixir.
First of all I have to thank percious. He is incredibly reliable and helpful. There was a bug in sprox that made it treat one-to-many relationships as one-to-one relationships and show a single selection field instead of a multiple selection field. This can be avoided by changing the field type to sprox.widgets.PropertyMultipleSelectField, but percious was so kind as to fix it on the fly while I was testing the problem for him, and now sprox correctly detects the field type by default.
Unfortunately there is a big problem with Elixir. Sprox probably creates internal instances of the Entity you pass to it, and this causes undesired behaviour. With SQLAlchemy, an object won’t be saved to the database until you add it to the session, but with Elixir creating an object means saving it to the db, and this results in multiple empty entities being saved to the db each time you open a form generated with Sprox. If your entity has any required field, your application will crash as it won’t be able to save it.
In the end I had to switch back from Elixir to DeclarativeBase for my application, and everything worked fine.