Building Fast, Scalable Multi Tenant Apps with MongoDB

MULTI TENANT

We are all building apps in the cloud: accounting apps, to-do apps, word processing apps, everything. With clients trusting you with their data, the question becomes how to keep that data safe and separate. Keeping your clients' data separate is the difference between living in a group commune and being a tenant in an apartment. This is called multi-tenancy.

Multi-tenancy refers to a principle in software architecture where a single instance of the software runs on a server, serving multiple customers (tenants). It is an important feature of cloud computing, because in a multi-tenant environment customers do not share or see each other’s data.

There are three main ways to build multi-tenant databases in MongoDB. The first is putting all tenants in a single database. The second is putting each tenant in its own individual database. The third is putting each tenant in its own collection.

PUTTING ALL TENANTS IN A SINGLE DATABASE

This is the most common form of multi-tenancy and where most web apps start. Putting all your tenants together is a lot simpler, so much so that we do not even think of it as multi-tenancy. It does require putting the multi-tenancy logic at the application level. Enforcing security at the application level can be as simple as applying a user or customer filter to all data queries, e.g. adding a user id condition to every database query.
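Here is a minimal sketch of that kind of filter in Python. The field name tenant_id and the helper tenant_query are my own assumptions for illustration, not something MongoDB provides. The idea is that application code never builds a query without passing it through the helper first.

```python
from typing import Any, Dict, Optional

TENANT_FIELD = "tenant_id"  # assumed field name, present on every document


def tenant_query(tenant_id: str, query: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return a copy of `query` that is restricted to a single tenant."""
    if not tenant_id:
        raise ValueError("tenant_id is required")
    scoped = dict(query or {})
    # Refuse a query that already names a different tenant
    # instead of silently overwriting it.
    if scoped.get(TENANT_FIELD, tenant_id) != tenant_id:
        raise ValueError("query targets a different tenant")
    scoped[TENANT_FIELD] = tenant_id
    return scoped
```

With a driver such as PyMongo, the result is passed straight to a query, for example db.invoices.find(tenant_query("acme", {"status": "open"})). The collection name invoices is hypothetical.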

For a “freemium” business, this will be the better model, since each MongoDB database occupies at least 32 MB. Creating hundreds of databases for hundreds of non-paying customers can waste a lot of resources.

GIVING EACH TENANT A SEPARATE COLLECTION

This is probably the worst way to do it in MongoDB. I won’t go into great detail, but there are a couple of reasons you really want to avoid this method.

First, collections in the same database share the same database lock. MongoDB concurrency has been steadily improving, but the lock is still there. Second, the default MongoDB nssize setting limits the number of namespaces (collections plus their indexes) in a database to about 24,000. You can go up to roughly 3 million by raising the nssize setting in the configuration.

GIVING EACH TENANT A SEPARATE DATABASE

This may be the best way, depending on your app. Giving each tenant its own database gives you flexibility in managing and optimising your MongoDB setup. With a separate database per customer, things like moving, managing and deleting client databases become trivial. Since each database is separate, you can create different indexes for different clients depending on their needs.
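A database per tenant needs a predictable way to turn a tenant id into a database name. Below is a rough sketch. The tenant_ prefix and the helper tenant_db_name are assumptions of mine. The whitelist is deliberately stricter than MongoDB's own naming rules (which forbid characters such as /, \, ., spaces, double quotes and $, and require names shorter than 64 characters), so that two different tenant ids can never map to the same database.

```python
import re

DB_PREFIX = "tenant_"                    # assumed naming convention
_TENANT_ID = re.compile(r"[a-z0-9_]+")   # conservative whitelist


def tenant_db_name(tenant_id: str) -> str:
    """Map a tenant id to the name of that tenant's own database."""
    # Reject anything outside the whitelist rather than "cleaning" it,
    # so "a-b" and "a_b" cannot collide on one database name.
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    name = DB_PREFIX + tenant_id
    # MongoDB database names must be shorter than 64 characters.
    if len(name) >= 64:
        raise ValueError("database name too long")
    return name
```

With PyMongo, client[tenant_db_name("acme")] returns that tenant's database, and client.drop_database(tenant_db_name("acme")) deletes the tenant in one call, which is the "trivial" delete mentioned above.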

The downside is that each client takes up space. If your clients are paying customers, this is not a problem. If you have a free service, each client will use at least 32 MB of disk space, which adds up quickly if you have a lot of inactive clients.

Even with multi-tenancy, it can be hard to pick a shard key. A hashed shard key in MongoDB can provide good performance even at scale (depending on your application).
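As a sketch of what a hashed shard key looks like in practice, the helper below builds the two admin command documents involved: enableSharding for the database, then shardCollection with a "hashed" key. The helper name and the database, collection and field names in the usage note are hypothetical. Whether hashing a tenant field is the right choice depends on how evenly your tenants are sized.

```python
from typing import Any, Dict, List


def hashed_shard_commands(database: str, collection: str, field: str) -> List[Dict[str, Any]]:
    """Build the admin commands that shard a collection on a hashed key."""
    namespace = f"{database}.{collection}"
    return [
        # Sharding has to be enabled on the database first.
        {"enableSharding": database},
        # "hashed" asks MongoDB to distribute documents by a hash of the
        # field's value instead of by ranges of the raw value.
        {"shardCollection": namespace, "key": {field: "hashed"}},
    ]
```

Each document would be run against the admin database through a mongos, for example with PyMongo: for cmd in hashed_shard_commands("app", "invoices", "tenant_id"): client.admin.command(cmd).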

WRAP UP

For most things, performance is application specific, especially with MongoDB, and the advice here needs to be seen in the light of your application. As always, your mileage may vary. If you are starting an app, you can certainly use one large database for development while writing your app logic to support one database per tenant. You may not use this initially, but putting multi-database logic in your app code from the start will save you a lot of heartache if you have any sort of success. And one last thing: shard early. Picking a shard key is something that is hard to change later.
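One way to keep that option open, sketched under my own assumptions: route every data access through a single function that decides which database a tenant lives in and which filter must be applied. The names SHARED_DB, TenantTarget and resolve_tenant are hypothetical. The point is only that moving a tenant to its own database later changes one flag, not every query.

```python
from dataclasses import dataclass
from typing import Any, Dict

SHARED_DB = "app"  # assumed name of the single shared database


@dataclass
class TenantTarget:
    database: str                 # database to open for this tenant
    base_filter: Dict[str, Any]   # conditions to merge into every query


def resolve_tenant(tenant_id: str, dedicated: bool) -> TenantTarget:
    """Decide where a tenant's data lives and how queries must be scoped."""
    if dedicated:
        # The tenant has its own database, so no extra filter is needed.
        return TenantTarget(database=f"tenant_{tenant_id}", base_filter={})
    # Shared database: every query must carry the tenant condition.
    return TenantTarget(database=SHARED_DB, base_filter={"tenant_id": tenant_id})
```

During development every tenant can resolve with dedicated=False. Paying customers can be switched to dedicated=True once their data has been copied to their own database.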

 



NoSQL Popularity – Comparing MongoDB, Riak, Redis and CouchDB

NoSQL POPULARITY

Starting the new year, I wanted to take a look at the most popular NoSQL databases as rated by searches on Google. The more people search for a database, the more popular it should be, as unscientific as this kind of poll is.

The graph below is a popularity contest: it shows how many times users searched for each term. It does not represent which database is better. I would never do a chart on which database is better; I would not want to start a religious war and spend my day answering comments.

GOOGLE SEARCH POPULARITY

I used Google Zeitgeist to examine four popular NoSQL databases: MongoDB, Riak, Redis and CouchDB.

[Chart: Google search popularity of MongoDB, Riak, Redis and CouchDB, January 2014]

So what can we learn from this chart?

MongoDB – Strong and growing in popularity. Yeah! Since the name of the blog is The Mongo DBA, you probably know I am biased.

Riak – I thought Riak was more popular, and I was surprised at the result. Que sera, sera (roughly, ‘whatever’).

Redis – Has pretty much flatlined. I think Redis has had its day, and we have reached ‘peak Redis’.

CouchDB – CouchDB is declining, which is almost as surprising as the results from Riak.

RESULTS

If you are looking for a popular NoSQL database, look at Redis and MongoDB. They have more ‘mindshare’ and more visibility in the open source community.

This is my humble opinion, so take it with a grain of salt. Riak and CouchDB are written in Erlang, and that may have something to do with their popularity. It is a lot easier to gain open source traction if hackers can use C or C++, the languages Redis and MongoDB are written in.

I did try to search for Cassandra, but Cassandra is such a common word that lots of non-database results appear, over-reporting ‘Cassandra’. If you look for Apache Cassandra, you miss a lot of posts, under-reporting ‘Apache Cassandra’.