Moving A Cluster to MongoDB Enterprise with SSL

WHY SSL

MongoDB’s SSL support allows MongoDB clients to talk to the database using encrypted connections for security. If you try to run SSL from a regular distribution of MongoDB, it probably will not work, because the free version of MongoDB does not contain support for SSL. To use SSL, you must either build MongoDB yourself or buy MongoDB Enterprise.

This blog post is about how to move a cluster running MongoDB to MongoDB Enterprise with SSL, with a little background on what is going on with the MongoDB servers in the process.

It is more of an outline for getting started with SSL, and it assumes that you have already installed a build of MongoDB that includes SSL support and that your client driver supports SSL.

There are two parts relevant to moving your cluster to SSL: the server side, where the servers communicate with each other, and the client side, which sends queries to the servers. In MongoDB 2.6, there is a new net.ssl.mode parameter that can ease the transition.

MIXED MODE

The net.ssl.mode parameter is new in version 2.6. There are four modes that SSL can operate in. The major difference between them is how the servers communicate with each other. One of the reasons you may want a mixed mode is client drivers: not every client can switch to SSL at the same moment.


disabled     No SSL encrypted connections.
allowSSL     Between servers, do not use SSL. Otherwise accept both SSL and non-SSL.
preferSSL    Between servers, use SSL. Otherwise accept both SSL and non-SSL.
requireSSL   Only SSL encrypted connections.
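
To make the difference between the modes concrete, here is a small Python sketch of the mode list above. It is only a model of the behaviour described in this post; the function names are made up for illustration and are not part of MongoDB or any driver.

```python
# Toy model of the four net.ssl.mode values described above.
# accepts_incoming and outgoing_uses_ssl are illustrative names, not MongoDB APIs.

SSL_MODES = ("disabled", "allowSSL", "preferSSL", "requireSSL")


def accepts_incoming(mode, connection_uses_ssl):
    """Return True if a server running in `mode` accepts the incoming connection."""
    if mode == "disabled":
        return not connection_uses_ssl
    if mode == "requireSSL":
        return connection_uses_ssl
    if mode in ("allowSSL", "preferSSL"):
        return True  # the mixed modes take both SSL and non-SSL connections
    raise ValueError("unknown mode: %s" % mode)


def outgoing_uses_ssl(mode):
    """Return True if a server running in `mode` uses SSL when it talks to another server."""
    if mode not in SSL_MODES:
        raise ValueError("unknown mode: %s" % mode)
    return mode in ("preferSSL", "requireSSL")
```

The point of the two mixed modes is that accepts_incoming is true for both kinds of connection, so old and new nodes can keep talking while the cluster is being moved.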

 

MongoDB Servers

Servers can treat client certificates in three ways. The first is encryption only, where everything is encrypted but the client does not have to present a certificate. The second requires clients to have a certificate from a certificate authority, which rules out self-signed certificates. In the third, the server accepts a client with a valid certificate or with NO certificate. This last mode only fails if the client passes an invalid certificate.

To upgrade a cluster, you go through the three SSL modes. First, start all the server nodes with allowSSL. Then run this command on each node to move the entire cluster to preferSSL:

db.getSiblingDB('admin').runCommand( { setParameter: 1, sslMode: "preferSSL" } )

And finally, the last move is to requireSSL, which blocks any non-SSL nodes:

db.getSiblingDB('admin').runCommand( { setParameter: 1, sslMode: "requireSSL" } )

After this, update /etc/mongodb.conf to requireSSL so the setting stays the same after a reboot.
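
The order matters here: every node should be on preferSSL before any node moves to requireSSL. A minimal Python sketch of that ordering, assuming all nodes were already restarted with allowSSL. The host names are hypothetical, and the helper only builds the command documents shown above; actually sending them to each node is left to the shell or your driver.

```python
def upgrade_plan(nodes):
    """Return (node, command document) pairs in the order they should be sent.

    Assumes every node is already running with allowSSL.
    """
    steps = []
    for mode in ("preferSSL", "requireSSL"):
        # Finish one mode on every node before starting the next mode.
        for node in nodes:
            steps.append((node, {"setParameter": 1, "sslMode": mode}))
    return steps


# Hypothetical host names, for illustration only.
for node, command in upgrade_plan(["db1:27017", "db2:27017", "db3:27017"]):
    print(node, command)
```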

MongoDB SSL Clients

Now that the servers are running with SSL, let’s look at the MongoDB client. All the mongo tools (mongo, mongodump, mongoexport, mongofiles, mongoimport, mongooplog, mongorestore, mongostat, mongotop) need to have SSL configured in the same way as the shell. Since you will be upgrading your cluster, you need your shell configured first.

As an example with mongo (which is just as valid for the other mongo utilities), you would pass --ssl along with a .pem file.

mongo --ssl --sslPEMKeyFile /etc/ssl/client.pem

If the server only cares about encryption, then passing --ssl on its own is fine:

mongo --ssl
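
If you script the tools, it helps to build the SSL arguments in one place so every tool gets the same flags. A small Python sketch, using only the --ssl and --sslPEMKeyFile flags shown above; the helper name is made up, and the result is an argument list you could hand to subprocess.

```python
def ssl_tool_command(tool, pem_key_file=None, extra_args=()):
    """Build the argument list for a mongo tool with SSL switched on."""
    command = [tool, "--ssl"]
    if pem_key_file is not None:
        # Only needed when the server wants a client certificate.
        command += ["--sslPEMKeyFile", pem_key_file]
    command += list(extra_args)
    return command


print(" ".join(ssl_tool_command("mongo", "/etc/ssl/client.pem")))
print(" ".join(ssl_tool_command("mongostat")))
```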

Not all client drivers support SSL connections. This is a pain, and another reason why you should use the official driver. I was using a C# driver that did not support SSL, and when the requirement to use SSL came along, a lot of refactoring had to happen to switch to the official driver, which did support SSL.

A Note on FIPS

The United States government defines many (several hundred) “Federal Information Processing Standard” (FIPS) documents. One of the FIPS regulations, FIPS 140, governs the use of encryption and cryptographic services. You will see this referred to as “FIPS Mode”, which is really “FIPS 140 Mode”. Both MongoDB Enterprise and MongoDB compiled with --ssl can operate in FIPS 140 Mode.


Building Fast, Scalable Multi Tenant Apps with MongoDB

MULTI TENANT

We are all building apps in the cloud. Accounting apps, to-do apps, word processing apps, everything. With clients trusting you with their data, the question becomes how to keep their data safe and separate. Keeping your clients’ data separate is the difference between living in a group commune and being a tenant in an apartment. This is called multi-tenancy.

Multi-tenancy refers to a principle in software architecture where a single instance of the software runs on a server, serving multiple customers (tenants). It is an important feature of cloud computing, because in a multi-tenant environment customers do not share or see each other’s data.

There are three main ways to build multi-tenant databases in MongoDB. The first is by putting all tenants in a single database. The second is putting each tenant in their own individual database, and finally, each tenant in its own collection.

PUTTING ALL TENANTS IN A SINGLE DATABASE

This is the most common form of multi-tenancy and where most web apps start. Putting all your tenants together is a lot simpler, and we do not even think about calling it multi-tenancy. Putting all tenants in a single database requires putting the multi-tenancy logic into the application level. Enforcing security at the application level can be something as simple as enforcing a user or customer filter on all data queries, e.g. prefixing every database query with a user id.
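
As a rough sketch of what that filter can look like, here is a small Python helper that forces a tenant id into every query document before it goes to the database. The field name tenant_id and the function name are assumptions for the example; use whatever your schema calls it.

```python
def tenant_query(tenant_id, query=None):
    """Return a copy of `query` that can only match documents owned by `tenant_id`."""
    scoped = dict(query or {})
    # Refuse a query that explicitly asks for somebody else's data.
    if "tenant_id" in scoped and scoped["tenant_id"] != tenant_id:
        raise ValueError("query tries to read another tenant's data")
    scoped["tenant_id"] = tenant_id
    return scoped


print(tenant_query("acme", {"status": "open"}))
```

The idea is that application code never builds a query without going through a helper like this, so forgetting the filter is not possible by accident.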

For a “freemium” business, this will be the better model, since each MongoDB database occupies at least 32 MB. Creating hundreds of databases for hundreds of non-paying customers can waste a lot of resources.

GIVING EACH TENANT A SEPARATE COLLECTION

This is probably the worst way for MongoDB for a couple of reasons. I won’t go into detail, but this is the method you really want to avoid.

First, collections in the same database share the same database lock. MongoDB concurrency has been steadily improving, but the lock is still there. Second, the default MongoDB nssize setting limits a database to about 24,000 namespaces, and every collection and every index counts as one. You can go up to about 3 million by changing the nssize setting in the configuration.

GIVING EACH TENANT A SEPARATE DATABASE

This may be the best way, depending on your app. Giving each tenant their own database gives you flexibility in managing and optimizing your MongoDB setup. With a separate database per customer, things like moving, managing and deleting client databases become trivial. Since each database is separate, you can create different indexes for different clients depending on their needs.
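
With one database per tenant, the app needs a predictable way to turn a tenant id into a database name. A minimal Python sketch, with deliberately conservative rules (lowercase letters, digits and underscores, at most 63 characters) chosen for this example rather than copied from MongoDB’s exact naming rules; the prefix and function name are made up.

```python
import re


def tenant_database_name(tenant_id, prefix="tenant_"):
    """Map a tenant id to a conservative, predictable database name."""
    # Lowercase first so two ids that differ only by case cannot both be created.
    safe = re.sub(r"[^a-z0-9_]", "_", tenant_id.lower())
    if not safe:
        raise ValueError("tenant id is empty")
    name = prefix + safe
    if len(name) > 63:
        raise ValueError("database name too long: %d characters" % len(name))
    return name


print(tenant_database_name("Acme Inc."))  # tenant_acme_inc_
```

Note that ids which differ only in punctuation end up with the same name, so in a real app you would want to start from an id that is already unique and simple, such as a numeric customer id.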

The downside to this is that each client takes space. If your clients are paying customers, this is not a problem. If you have a free service, then each client will use 32 MB of disk space which is quite a lot if you have a lot of inactive clients.
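
To put a number on it, here is the arithmetic as a tiny Python sketch. The 32 MB floor is the figure quoted in this post; the real minimum depends on your MongoDB version and settings, so treat it as an assumption.

```python
MIN_DATABASE_MB = 32  # figure quoted in this post; check it against your own setup


def minimum_footprint_gb(tenant_count, per_database_mb=MIN_DATABASE_MB):
    """Disk space used by empty tenant databases, in GB."""
    return tenant_count * per_database_mb / 1024.0


print(minimum_footprint_gb(1000))   # 31.25
print(minimum_footprint_gb(10000))  # 312.5
```

So ten thousand free accounts would cost over 300 GB of disk before anyone stores a single document.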

Even with multi-tenancy, it can be hard to pick a shard key. The hashed shard key in MongoDB can provide good performance even at scale (depending on your application).
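
Why does hashing help? With an ever-increasing key, range-based chunks send every new insert to the last chunk, while a hashed key spreads them around. The Python sketch below is a toy model of that effect only: it uses SHA-256 as a stand-in hash and a simple modulo, which is not how MongoDB computes hashed shard keys or places chunks, and the chunk boundaries are made up.

```python
import hashlib
from collections import Counter


def range_chunk(key, boundaries):
    """Index of the range chunk holding `key`; `boundaries` are sorted upper bounds."""
    for index, upper in enumerate(boundaries):
        if key < upper:
            return index
    return len(boundaries)


def hashed_chunk(key, chunk_count):
    """Index of the chunk for `key` when chunks are assigned by a hash of the key."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % chunk_count


new_keys = range(10000, 11000)   # 1,000 new inserts with ever-increasing ids
boundaries = [2500, 5000, 7500]  # four range chunks split on the existing data

by_range = Counter(range_chunk(key, boundaries) for key in new_keys)
by_hash = Counter(hashed_chunk(key, 4) for key in new_keys)

print("range:", dict(by_range))  # every insert lands in the last chunk
print("hash: ", dict(by_hash))   # inserts are spread over all four chunks
```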

WRAP UP

For most things, performance is application specific, especially with MongoDB, and the advice here needs to be seen in the light of your application. As always, your mileage may vary. If you are starting an app, you can certainly use one large database during development while writing your app logic to support one database per tenant. You may not use this initially, but putting multi-database logic in your app code from the start will save you a lot of heartache if you have any sort of success. And one last thing: shard early. Picking a shard key is something that is hard to change later.

 


Installing Elastic Search with Docker

DOCKER, THE LATEST COOLNESS

Docker is an open-source project to easily create lightweight, portable, self-sufficient containers from any application. The same container that a developer builds and tests on a laptop can run at scale, in production, on VMs, bare metal, OpenStack clusters, public clouds and more.

That is from the docker.io website. In plain language, it is the ability to package applications and run them easily on Linux.

Think of it as the ability to make applications that run as easily as a desktop application on a Mac. They just run.

WHY ??

My biggest concern is why would I want to use something like Docker. At my last gig, I would just fire up a new VM and do whatever I wanted.  Either I would copy images or make puppet scripts to install and configure what I wanted.

So where does Docker fit in all this? Ease of use for the non-sysadmin. If you are a developer and you need a development environment, then Docker is probably for you.

In this case, I want to try it with Elasticsearch. I REALLY feel installing Docker for this is WAY more work than just installing Elasticsearch, which I can do fairly quickly. The reason I am installing it is for the next application I want to test.

ON TO ELASTICSEARCH

To install Docker on Ubuntu, go here, /install/ubuntulinux/ ; the instructions are straightforward, so I won’t repeat them here. The only thing you have to be concerned about is running on Ubuntu 12.04. Docker runs best on a 3.8 kernel, so on 12.04 you will have to install a newer, backported kernel. Ubuntu 13.10 already ships with a kernel newer than 3.8, so you have that going for you.

Once you have Docker installed, installing Elasticsearch from a trusted build is a single command:

docker pull dockerfile/elasticsearch

In my next post, I will query this Elasticsearch install from a project I am working on.


New Mongo DB Class : MongoDB Advanced Deployment and Operations

NEW COURSE

MongoDB is offering an advanced course for MongoDB DBAs. This is a follow-on to their 102 course for MongoDB DBAs. The course has not started as I write this, but from the description, it seems to cover more real-life use cases. If you have been a MongoDB DBA for a while, I am sure it will be boring, but for those of you who have not run MongoDB in production, it is probably a good course given the cost. Also, these courses are a good way to keep up with the changes to MongoDB.

HOW I USE THE COURSES

I have used the courses and I will continue to use the MongoDB classes. I started using MongoDB during the 1.6/1.8 days, when MongoDB was not nearly as polished. To keep up with all the changes, I use the courses as my continuing education. I take a class and head straight to the quizzes and homework. Usually I just breeze through them without spending too much time. When I run into a quiz question I cannot answer, it is usually about a new feature that has been added to MongoDB. Then I go through the videos, and “keep my tools sharp”.

Fork Me on Github

I JOINED THE CROWD

I finally have a public GitHub repo for my stuff. I do not know what I am going to put there exactly, but it will be MongoDB goodness. I have a lot of little tools I have written for various things. Some JavaScript, some Python, some C# and some Bash. Even some straight C.

MONGODB ADMIN

Probably the best repo to start with is the MongoDB admin repo. That is probably where I will start adding things.

 

NoSQL Popularity – Comparing MongoDb, Riak, Redis and CouchDb

NoSQL POPULARITY

Starting the new year, I wanted to take a look at the most popular NoSQL databases as rated by searches on Google. The more people search for a database, the more popular it should be, as unscientific as this kind of poll is.

The graph below is a popularity contest. It shows how many times users searched for the term. It does not represent which database is better. I would never do a chart on which database is better; I would not want to start a religious war and spend my day answering comments.

GOOGLE SEARCH POPULARITY

I used Google Zeitgeist to examine four popular NoSQL databases: MongoDB, Riak, Redis and CouchDB.

[Chart: Google search popularity of MongoDB, Riak, Redis and CouchDB, January 2014]

So what can we learn from this chart?

MongoDB – Strong and growing in popularity. Yeah ! Since the name of the blog is The Mongo DBA, you probably know I am biased.

Riak – I thought Riak was more popular, and I was surprised at the result. Que sera sera (which is French for ‘Whatever’).

Redis – Has pretty much flatlined. I think Redis had its day, and we have reached ‘peak Redis’.

CouchDb – CouchDb is declining, which is almost as surprising as the results from Riak.

RESULTS

If you are looking for a popular NoSQL database, look at Redis and MongoDB. They have more ‘mindshare’ and more visibility in the open source community.

This is my humble opinion, so take it with a grain of salt. Riak and CouchDB are written in Erlang, and that may have something to do with the popularity of those databases. It is a lot easier to gain open source traction if hackers can use C or C++, the languages Redis and MongoDB are written in.

I did try to search for Cassandra, but Cassandra is such a popular word that lots of non-database results appear, over-reporting ‘Cassandra’. If you look for Apache Cassandra, you miss a lot of posts, under-reporting ‘Apache Cassandra’.

Scrum: How Sprint Retrospectives Can Really Improve Your Sprints

I am writing too much about Scrum in a Mongodb blog, but this is a really good post over at Loomio called Occupy Scrum: How Sprint Retrospectives Brought us to Agile Nirvana.

I do not agree with them 100%, but I do agree with them for the most part, and in particular on the retrospectives. Too many times I have seen teams say they are using Scrum, but they are not learning and improving as much as they could in each sprint.

The real magic of Scrum is not just the Scrum techniques themselves, it’s that continuous process improvement is built in. You can’t do that without retrospectives.

Good stuff. ™


Another Week, Another Sprint

I am a big fan of Scrum. I think it is very easy to do Scrum, but very hard to do well.  If you are just starting in Scrum, take a look at http://www.scrumprimer.org/scrumprimer20_small.pdf

For this project I am doing one-week sprints, Monday to Sunday. It is a habit I have had for a long time: the sprint runs through the week, and the weekend is for integration and planning the next week’s sprint.

This week’s sprint is:

[1] Get the first Android app up and running for the company that shall remain nameless.

[2] Get the API in respectable shape

Study: A Simple Surgery Checklist Saves Lives

Sticks and stones may break your bones — but if you need surgery, the right words used in the operating room can be more powerful than many drugs. New research published today in the New England Journal of Medicine found that when surgical teams…

Using the apiary.io Mock Server

For agile development in the API space, I think a mock server is essential. When you are revving quickly, waiting for others to update the API so you can test can be a real pain.

With a mock server, you can rev the API and keep development running. Without one, you start getting into dependency issues.

Our goal here is to keep the Scrum sprints on schedule, and that leads me to apiary.io. They give you a mock server when you define the API. It kills two birds with one stone: in writing the documentation, you are also writing the mock server.
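
If you have not used a mock server before, the idea is simple enough to sketch in a few lines of Python with the standard library. This is not apiary.io’s format or implementation, just an illustration of the principle: the API definition is a table of canned responses, and the server answers from that table. The routes and payloads are invented for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical API definition: (method, path) -> (status code, JSON body).
ROUTES = {
    ("GET", "/notes"): (200, [{"id": 1, "title": "Buy milk"}]),
    ("POST", "/notes"): (201, {"id": 2, "title": "New note"}),
}


def mock_response(method, path, routes=ROUTES):
    """Look up the canned response for a request, or a 404 if it is not defined."""
    return routes.get((method, path), (404, {"error": "not defined in the API"}))


class MockHandler(BaseHTTPRequestHandler):
    """Answers every GET and POST from the ROUTES table; request bodies are ignored."""

    def _reply(self):
        status, body = mock_response(self.command, self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    do_GET = _reply
    do_POST = _reply


def serve(port=8000):
    """Run the mock server on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), MockHandler).serve_forever()
```

Calling serve() starts it on port 8000. The client team can code against it right away, and changing the API is just a matter of editing the table.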

Will this work? I don’t know. This is the first time I am using their service, but time will tell by the end of this sprint.