Document Database | Robin Osborne

My third session was about MongoDB and how you might implement it in Azure, presented by MongoDB’s own Gregor Macadam (@gregormacadam).

I had only limited knowledge of MongoDB before this session (a document-based data store, much like CouchDB and other NoSQL variants), so an intro to MongoDB, as opposed to MongoDB on Azure specifically, suited me just fine!

Here are the basic notes I made during Gregor’s talk (although you may as well just go to mongodb.org and read the intro):

MongoDB uses sharding for write throughput.
The REST interface uses JSON as the data transport format.
Data is stored in BSON (binary JSON) format.
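
To make the JSON/BSON distinction concrete, here is a minimal sketch of a BSON encoder in Python. It is not how MongoDB or its drivers implement it; it handles only flat documents with string and 32-bit integer values, and the function name encode_bson is made up for illustration. The byte layout follows the public BSON specification: a little-endian length prefix, typed elements, and a trailing null byte.

```python
import json
import struct


def encode_bson(doc: dict) -> bytes:
    """Encode a flat dict of str / int32 values as BSON (tiny illustrative subset)."""
    body = b""
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"  # element name is a C string
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # Type 0x02 = UTF-8 string: int32 byte length (incl. null), then the bytes
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        elif isinstance(value, int) and not isinstance(value, bool):
            # Type 0x10 = 32-bit signed integer, little-endian
            body += b"\x10" + name + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported value type in this sketch: {type(value)!r}")
    # Total size counts the 4-byte length prefix, the elements and the final null
    total = 4 + len(body) + 1
    return struct.pack("<i", total) + body + b"\x00"


doc = {"hello": "world"}
as_json = json.dumps(doc)   # text, as sent over the REST interface
as_bson = encode_bson(doc)  # binary, as stored
```

The same document is human-readable text as JSON and a length-prefixed binary blob as BSON, which is what lets a reader skip over fields without parsing them.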

The database structure (usually?) consists of three nodes: a single primary and two replicated secondaries. Together these are referred to as a Replica Set.
A Replica Set has a single write node (the primary), which replicates asynchronously to the other set members; reads can be served from any member.

The write history (known as the oplog, short for operations log) records each change in the form "move from state A to state B", so as to avoid overwriting changed data.
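
As a rough illustration of the replica set and write history described above, here is a toy Python model: one primary takes the writes and appends them to an operation log, and the secondaries catch up later by replaying the entries they have not yet applied. The class and method names (Node, ReplicaSet, replicate) are hypothetical, and real replication is far more involved than this.

```python
class Node:
    """One member of the toy replica set."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied = 0  # how many log entries this node has applied

    def apply(self, entry):
        op, key, value = entry
        if op == "set":
            self.data[key] = value
        elif op == "delete":
            self.data.pop(key, None)
        self.applied += 1


class ReplicaSet:
    """A single primary plus two secondaries sharing one operation log."""

    def __init__(self):
        self.primary = Node("primary")
        self.secondaries = [Node("secondary1"), Node("secondary2")]
        self.oplog = []

    def write(self, op, key, value=None):
        # Writes only ever go to the primary, which records them in the log
        entry = (op, key, value)
        self.primary.apply(entry)
        self.oplog.append(entry)

    def replicate(self):
        # Asynchronous step: each secondary replays whatever it has missed
        for node in self.secondaries:
            for entry in self.oplog[node.applied:]:
                node.apply(entry)


rs = ReplicaSet()
rs.write("set", "sku-1", 3)
rs.write("set", "sku-2", 7)
rs.write("delete", "sku-1")
stale = dict(rs.secondaries[0].data)  # still empty: replication has not run yet
rs.replicate()
```

Until replicate() runs, a read from a secondary returns stale data, which is the trade-off of asynchronous replication.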

If a write to the primary fails, an automatic election determines which of the remaining nodes becomes the new primary; usually this is the node with the latest data.

Writes can be configured to go to multiple hosts, but then the write won’t return until all of those writes have completed.

An "arbiter" can act as the tie-breaker in determining the new primary node during an election, and we can specify a weighting for that process.
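
The election notes above can be sketched roughly as follows. This is a deliberate simplification and not MongoDB's actual election protocol: it assumes an election needs a majority of all members to be reachable, that arbiters vote but never hold data, and that the winner is the reachable data node with the latest data, with the weighting used only to break ties. The Member fields and the elect_primary function are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    optime: int = 0         # position of the last write this node has applied
    priority: float = 1.0   # the "weighting" (assumed here to act as a tie-breaker)
    arbiter: bool = False   # votes in elections but stores no data
    reachable: bool = True


def elect_primary(members):
    """Return the new primary, or None if no majority of voters is reachable."""
    voters = [m for m in members if m.reachable]
    if len(voters) * 2 <= len(members):
        return None  # no strict majority, so no primary can be elected
    candidates = [m for m in voters if not m.arbiter and m.priority > 0]
    if not candidates:
        return None
    # Prefer the latest data, then the higher weighting
    return max(candidates, key=lambda m: (m.optime, m.priority))


old_primary = Member("node-a", optime=10, reachable=False)
survivor = Member("node-b", optime=9)

# Two data nodes only: one survivor out of two is not a majority
without_arbiter = elect_primary([old_primary, survivor])

# Add an arbiter: two of three members can vote, so node-b is elected
with_arbiter = elect_primary([old_primary, survivor, Member("arb", arbiter=True)])
```

The example shows why an arbiter is useful as a tie-breaker: with an even number of data nodes, losing one leaves the set unable to form a majority on its own.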

"Read" scales with more read nodes; "Write" scales with multiple read/write groups (replica sets), i.e. sharding.

A config database is needed to define the key ranges for sharding, etc.

The mongos process runs on another node and knows which shard to write your data to.

Updates are released on Windows and Linux at the same time.
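
A minimal sketch of range-based routing, assuming the config database holds a sorted list of shard key boundaries and the router process (mongos) looks up the range a key falls into. The boundaries, shard names and the route function are all made up for illustration; real chunk metadata is richer and changes as data is rebalanced.

```python
import bisect

# Hypothetical config data: keys below "g" live on shard0,
# keys from "g" up to (but not including) "p" on shard1, the rest on shard2.
BOUNDARIES = ["g", "p"]
SHARDS = ["shard0", "shard1", "shard2"]


def route(shard_key: str) -> str:
    """Return the shard whose key range contains shard_key."""
    # bisect_right counts the boundaries <= shard_key, which is the shard index
    return SHARDS[bisect.bisect_right(BOUNDARIES, shard_key)]


targets = {key: route(key) for key in ["apple", "monkey", "zebra"]}
```

Because each key maps to exactly one shard, writes for different key ranges land on different replica sets, which is where the extra write throughput comes from.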

Within Azure

Data is persisted in blob storage.
MongoDB runs in a worker role.
The page blob is an NTFS cloud drive (data drive?).

The mongos router process, not the Azure load balancer, is required to route access to the correct node; the Azure load balancer can end up sending a write request to a non-primary node.

The OS disk has caching enabled by default; the data disk doesn’t.

The code is open source and can be found on GitHub, and issues can be raised on the MongoDB Jira site.

You can sign up for the free MongoDB Monitoring Service (MMS) from 10gen.
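
The difference can be shown with a small sketch, assuming a generic round-robin load balancer on one side and a router that knows which node is primary on the other. The node list and both function names are hypothetical; this does not model Azure's load balancer or the real router in any detail.

```python
import itertools

# Hypothetical three-node replica set; only node0 accepts writes
NODES = [
    {"name": "node0", "primary": True},
    {"name": "node1", "primary": False},
    {"name": "node2", "primary": False},
]


def round_robin_targets(nodes, n_requests):
    """Where a role-agnostic load balancer would send n write requests."""
    cycle = itertools.cycle(nodes)
    return [next(cycle)["name"] for _ in range(n_requests)]


def primary_aware_targets(nodes, n_requests):
    """Where a replica-set-aware router would send the same requests."""
    primary = next(n for n in nodes if n["primary"])
    return [primary["name"]] * n_requests


blind = round_robin_targets(NODES, 3)    # two of three writes hit a secondary
aware = primary_aware_targets(NODES, 3)  # every write reaches the primary
```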

The main point I took away from this is that it sounds like you need a large number of Azure VMs to get MongoDB running: one for each node, one for each mongos service, and one for an arbiter (maybe more; I didn’t catch all of the details raised by a couple of good questions from the audience).

Although I have a big plan to use NoSQL for the front end of an ecommerce website, I don’t think that MongoDB’s Azure offering is mature enough yet. I’ll be looking into CouchDB and Raven initially and keeping an eye on MongoDB. (Interested in how to get Raven running on Azure? Wait for the next post!)

The slide deck from this session is here

Next up – node.js

Robin Osborne

Always learning more about Performance, Observability, DevOps, and Tech Leadership


MongoDB @ UKWAUG: MS Cloud Day – Windows Azure Spring Release
