MongoDB Aggregation Pipeline and Pagination

One of the most common problems in web development is paginating a set of data. Often the set of data we need to show our users is pretty big, and we might want to show only a part of it, retrieving the next slices only when requested.

Skip/Limit Pagination

The most common way to achieve this is to count the number of items and split them into pages, each with a fixed number of items. This is usually achieved through the limit, skip and count operators. For the purpose of this post I'll be relying on a collection containing tickets for a project management tool; each document looks like: {u'_id': ObjectId('4eca327260fc00346500000f'), u'_sprint': ObjectId('4eca318460fc00346500000b'), u'complexity': u'days', u'description': u'This […]
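The skip/limit arithmetic can be sketched with a couple of helpers; the `tickets` collection name in the comment is an assumption taken from the post's example documents:

```python
import math

def page_bounds(page, per_page):
    """Return (skip, limit) for a 1-indexed page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return (page - 1) * per_page, per_page

def total_pages(count, per_page):
    """How many pages are needed to show `count` items."""
    return math.ceil(count / per_page)

# With pymongo this would translate to something like:
#   skip, limit = page_bounds(3, 20)
#   docs = tickets.find().skip(skip).limit(limit)
```

Page 3 with 20 items per page therefore skips the first 40 documents and fetches the next 20.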

MongoDB and UnitOfWork love or hate?

One of the most common sources of issues for MongoDB newcomers is the lack of transactions. People have been used to working with transactions for the past 10 years, and their web frameworks probably start, commit and roll back transactions for them automatically whenever something goes wrong. So we are pretty used to web development environments where the problem of writing only a part of our changes is usually solved out of the box. When people first approach MongoDB, I noticed that this behaviour is often taken for granted, and messed-up data might arise from code that crashes while creating or updating entities on the database.

No transaction, no party?

To showcase the issue, I'll try to come up […]
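The failure mode described above can be simulated without a database at all; here a plain dict of lists stands in for MongoDB collections, and the collection and field names are made up for illustration:

```python
def create_sprint_with_ticket(db, sprint, ticket, crash=False):
    """Two dependent writes with no transaction around them.

    If the process dies between the two appends, the first write
    survives and the data is left in an inconsistent state.
    """
    db["sprints"].append(sprint)
    if crash:
        raise RuntimeError("process died between the two writes")
    db["tickets"].append(ticket)
```

Running it with `crash=True` leaves an orphan sprint behind with no ticket, which is exactly the partial-write problem a transactional unit of work would normally roll back.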

Redis and MongoDB insertion performance analysis

Recently we had to study a piece of software where reads can be slow, but writes need to be as fast as possible. Starting from this requirement, we thought about which of Redis and MongoDB would better fit the problem. Redis should be the obvious choice, as its simpler data structure should make it lightning fast, and that is actually true, but we found a few interesting things that we would like to share. This first graph compares MongoDB insertion with Redis RPUSH. Up to 2000 entries the two are quite equivalent; then Redis starts to get faster, usually twice as fast as MongoDB. I expected this, and I have to say that antirez did a good job in thinking […]
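A comparison like the one in the graph can be driven by a tiny timing harness; the client objects and key names in the comments are assumptions, not the post's actual benchmark code:

```python
import time

def time_inserts(insert_fn, n):
    """Time n calls to insert_fn and return elapsed seconds."""
    start = time.perf_counter()
    for i in range(n):
        insert_fn({"n": i})
    return time.perf_counter() - start

# With real clients this might look like:
#   mongo_secs = time_inserts(collection.insert_one, 2000)
#   redis_secs = time_inserts(lambda doc: r.rpush("items", str(doc)), 2000)
```

Repeating the measurement at increasing values of `n` (1000, 2000, 5000, …) is what produces the kind of curve the graph shows.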

Turbogears authentication over mongodb users database

As we saw that there isn't a lot of documentation around about how to perform authentication in TurboGears over MongoDB, we decided to create a simple code snippet and publish it here to help people trying to obtain the same thing. This is mainly a proof of concept and a quick and dirty way to obtain it. You will probably have something like Ming as your model, instead of directly accessing Mongo. This code also validates the password against the clear-text one; you will probably have hashed passwords in your database, so remember to change the validate_password method as required. To make it work you will have to place this code inside your config.app_cfg; it also expects you to have you […]
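As a sketch of the "change validate_password as required" advice, here is one way a hashed check could look; the class shape and field names are assumptions, not the snippet from the post, and salted SHA-256 is just one of several possible schemes:

```python
import hashlib
import hmac

class User:
    """Minimal stand-in for a user document (field names are assumptions)."""

    def __init__(self, user_name, password_hash, salt):
        self.user_name = user_name
        self.password_hash = password_hash  # hex digest stored in MongoDB
        self.salt = salt                    # per-user random bytes

    def validate_password(self, password):
        """Compare a salted SHA-256 digest instead of the clear text."""
        digest = hashlib.sha256(self.salt + password.encode()).hexdigest()
        return hmac.compare_digest(digest, self.password_hash)
```

`hmac.compare_digest` is used so the comparison takes constant time regardless of where the digests differ.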

Tweelter, the twitter filter

While speaking with the top-ix people during a meeting, we started to talk about the need for a way to filter out "noise" from Twitter searches. Probably everyone has found that searching for something on Twitter returns a big list of retweets and duplicated tweets. As those reduce the ability to follow a discussion or an event on Twitter, they are usually more of a problem than a useful result. At the end of that meeting, Tweelter was born. Tweelter is a Twitter search engine which filters out duplicated entries and retweets, and permits searching results older than one month on the most followed topics. More interestingly, Tweelter performs those searches in parallel and on a distributed MongoDB. While […]
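The core filtering idea can be sketched in a few lines; this is a simplified illustration of deduplication, not Tweelter's actual implementation, and it assumes tweets are dicts with a `text` field:

```python
def filter_tweets(tweets):
    """Drop retweets and duplicate texts from a list of tweet dicts."""
    seen = set()
    kept = []
    for tweet in tweets:
        text = tweet["text"].strip()
        if text.startswith("RT "):
            continue          # classic retweet marker: skip it
        key = text.lower()
        if key in seen:
            continue          # duplicate of an earlier tweet
        seen.add(key)
        kept.append(tweet)
    return kept
```

A case-insensitive key catches copies that differ only in capitalisation; a real filter would also normalise whitespace, shortened URLs and trailing hashtags.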