Redis and MongoDB insertion performance analysis

Recently we had to study a piece of software where reads can be slow but writes need to be as fast as possible. Starting from this requirement, we considered whether Redis or MongoDB would fit the problem better. Redis should be the obvious choice, as its simpler data structures should make it lightning fast, and that turned out to be true, but we found a few interesting things that we would like to share.

This first graph compares MongoDB insert with Redis RPUSH.
Up to 2000 entries the two are roughly equivalent; beyond that Redis gets faster, usually about twice as fast as MongoDB. I expected this, and I have to say that antirez did a good job in designing the Redis paradigm: in some situations it is the perfect match.
That said, I would have expected MongoDB to be even slower, given the features a MongoDB collection offers over a simple list.
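
One way to tell a constant-factor gap (such as "twice as fast") from a difference in growth rate is to estimate the slope of total time against entry count on log-log axes: a slope close to 1 means linear growth, whatever the constant. The sketch below assumes nothing about the real measurements; the timings are made-up placeholders and `loglog_slope` is an illustrative helper name.

```python
import math

def loglog_slope(sizes, times):
    """Least-squares slope of log(time) against log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Placeholder data, NOT taken from the benchmarks in this post.
sizes = [1000, 2000, 5000, 10000]
fast_linear = [0.05 * n / 1000 for n in sizes]   # linear growth
slow_linear = [0.10 * n / 1000 for n in sizes]   # linear, twice as slow
quadratic = [0.05 * (n / 1000) ** 2 for n in sizes]

for name, times in [("fast linear", fast_linear),
                    ("slow linear", slow_linear),
                    ("quadratic", quadratic)]:
    print(name, round(loglog_slope(sizes, times), 2))
```

Both linear series give the same slope even though one is twice as slow, while the quadratic series gives a slope close to 2.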

This second graph compares Redis RPUSH, MongoDB $push and MongoDB insert, and I find it really interesting.
Up to 5000 entries MongoDB $push is faster even than Redis RPUSH; after that it becomes incredibly slow. Probably the MongoDB array type has linear insertion time, so each push gets slower as the array grows. MongoDB might gain some performance by exposing a list type with constant-time insertion, but even the linear-time array type (which can guarantee constant-time look-up) has its applications for small sets of data.
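
The linear-insertion hypothesis can be illustrated with a toy cost model. This is only a sketch of the guess made above, not a description of how MongoDB actually stores arrays: it assumes that every append to the array rewrites all existing elements, and compares that with a list where an append touches only the new element. The function names are hypothetical.

```python
def copies_for_rewrite_append(n):
    """Element writes if every append rewrites the whole array (O(n) each)."""
    copies = 0
    length = 0
    for _ in range(n):
        copies += length + 1  # existing elements plus the new one
        length += 1
    return copies

def copies_for_constant_append(n):
    """Element writes if every append touches only the new element (O(1) each)."""
    return n

for n in (1000, 5000, 10000):
    print(n, copies_for_rewrite_append(n), copies_for_constant_append(n))
```

Under this assumption the total work for n appends is n(n+1)/2, so it grows quadratically while the constant-time list grows linearly, which would match a curve that looks fine for small sets and then degrades quickly.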

I would like to stress that these benchmarks have no real value, as usual, and were performed just out of curiosity.

Here are the three benchmark snippets:

# Benchmark 1: Redis RPUSH
import time
import redis

MAX_NUMS = 1000

r = redis.Redis(host='localhost', port=6379, db=0)
r.delete('list')  # start from an empty list

nums = range(0, MAX_NUMS)
clock_start = time.process_time()  # CPU time used by the client process
time_start = time.time()           # wall-clock time
for i in nums:
    r.rpush('list', i)
time_end = time.time()
clock_end = time.process_time()

print('TOTAL CLOCK', clock_end - clock_start)
print('TOTAL TIME', time_end - time_start)
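
# Benchmark 2: MongoDB insert
import time
import pymongo

MAX_NUMS = 1000

# w=0: unacknowledged (fire-and-forget) writes
con = pymongo.MongoClient(w=0)
db = con.test_db
db.testcol.delete_many({})   # start from empty collections
db.testlist.delete_many({})

nums = range(0, MAX_NUMS)
clock_start = time.process_time()  # CPU time used by the client process
time_start = time.time()           # wall-clock time
for i in nums:
    db.testlist.insert_one({'v': i})
time_end = time.time()
clock_end = time.process_time()

print('TOTAL CLOCK', clock_end - clock_start)
print('TOTAL TIME', time_end - time_start)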
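
# Benchmark 3: MongoDB $push onto an array in a single document
import time
import pymongo

MAX_NUMS = 1000

# w=0: unacknowledged (fire-and-forget) writes
con = pymongo.MongoClient(w=0)
db = con.test_db
db.testcol.delete_many({})   # start from empty collections
db.testlist.delete_many({})
# The document whose 'values' array receives every $push
oid = db.testcol.insert_one({'name': 'list'}).inserted_id

nums = range(0, MAX_NUMS)
clock_start = time.process_time()  # CPU time used by the client process
time_start = time.time()           # wall-clock time
for i in nums:
    db.testcol.update_one({'_id': oid}, {'$push': {'values': i}})
time_end = time.time()
clock_end = time.process_time()

print('TOTAL CLOCK', clock_end - clock_start)
print('TOTAL TIME', time_end - time_start)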
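
The RPUSH loop in the first snippet sends one command and waits for its reply before sending the next, so part of what it measures is round-trip latency. A minimal sketch of a pipelined variant follows. It assumes a redis-py style client whose `pipeline(transaction=False)` returns an object with `rpush` and `execute`; the helper name `rpush_pipelined` and the default `batch_size` are illustrative and were not part of the benchmarks above.

```python
def rpush_pipelined(client, key, values, batch_size=1000):
    """Queue RPUSH commands and flush them in batches.

    Returns the number of values sent. `client` is assumed to be a
    redis-py style client (hypothetical parameter name).
    """
    # transaction=False: plain pipelining, no MULTI/EXEC wrapper
    pipe = client.pipeline(transaction=False)
    pending = 0
    sent = 0
    for value in values:
        pipe.rpush(key, value)   # buffered client-side, not sent yet
        pending += 1
        if pending == batch_size:
            pipe.execute()       # one round trip for the whole batch
            sent += pending
            pending = 0
    if pending:
        pipe.execute()           # flush the final partial batch
        sent += pending
    return sent
```

Against a local server it could be called as `rpush_pipelined(redis.Redis(host='localhost', port=6379, db=0), 'list', range(MAX_NUMS))` and timed with the same `time.time()` calls as the snippets above. Batching keeps client memory bounded while still amortising the round trip over many commands.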

13 thoughts on “Redis and MongoDB insertion performance analysis”

  1. Your graphs are a bit misleading.
    They suggest exponential growth.

    If you use an exponential horizontal axis, you should use an exponential vertical axis.

    Currently, the graphs suggest neither system scales at all. In reality, (when we look at your numbers), the relationship seems to be linear. And the difference between redis and mongodb seems to be a low constant factor.

    That is: not something that affects scalability at all.

  2. Hello! I expect the difference to be *much* higher if the test is run against a big dataset with many concurrent writes. Under those conditions I think the difference for LPUSH will be an order of magnitude, but I would like to see it in practice. If you see this difference in *latency* (with a single client this is what you are really measuring), the difference with real parallel load and big datasets should be much larger.

    Btw thanks for the test.

  3. Pawel, this seems like a good idea at first, but it's not so easy. For instance, the Redis benchmark figures are obtained with client and server on the same computer, so Redis is actually faster than the 130k ops/sec we claim; but how do you easily get a link as fast as loopback between two separate boxes?

  4. As far as I know, Redis is slower over the network than Mongo. Today I'll be doing replication and performance tests with Mongo (5 / 10 / 50 / 200 / 500 connections) over a 1 Gbps link to 2 pairs of servers.
    First pair (each has): 2x quad core, 12GB RAM and 6x 143GB SAS (RAID 10)
    Second pair (each has): 2x quad core, 12GB RAM and 4x 50GB SSD (RAID 10).
    I can write about the results if you like.

  5. My tests stem from the use case of a distributed queue in which to put data to be processed later, so I needed writes to be as fast as possible, and I didn't want to use an AMQP server as I have only 1 consumer but 8-12 producers. The data set will never get huge, as entries will be processed every hour and discarded; it will probably be bigger than 2000 entries, though, so Redis seems to perform better for my use case.

    Something that I found interesting has been the clock time.
    I didn't report it as you are usually interested in real time spent to perform the action, but redis client clock time is usually a lot higher than the mongodb one.

    I didn't have time to check if it is due to python redis bindings or due to the redis protocol itself being more complicated than the mongo one to parse, so I just report it here for curiosity.

  6. For inserts/updates, Mongo is one-way (no ack is sent by Mongo to the client) while Redis is two-way (an ack is sent by Redis for each command). Your code looks purely synchronous, so for the Redis test you measure the latency of your network (or the loopback) rather than the performance of Redis. To get good performance from Redis and really compare it to Mongo, you need to go asynchronous or pipeline your commands (most Redis clients support the latter).
