Scripting a StatsD, MongoDB, ElasticSearch metrics server on Azure with Powershell

At Mailcloud I am constantly destroying and rebuilding environments (intentionally!), especially the performance test ones. I also need to gather oodles of metrics from these tests, and I have a simple script to create the VM and another to install all of the tools I need.

This article will cover using a simple Powershell script to create the Linux VM and then a slightly less simple bash script to install all the goodies. It’s not as complicated as it may look, and considering you can get something running in a matter of minutes from a couple of small scripts I think it’s pretty cool!

Stage 1, Creating the VM using Powershell

Set up the variables

[powershell]
# os configuration
# Get-AzureVMImage | Select ImageName
# I wanted an ubuntu 14.10 VM:
$ImageName = "b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu-14_10-amd64-server-20140625-alpha1-en-us-30GB"

# account configuration
$ServiceName = "your-service-here"
$SubscriptionName= "your azure subscription name here"
$StorageAccount = "your storage account name here"
$Location = "your location here"

# vm configuration – setting up ssh keys is better, username/pwd is easier.
$user = "username"
$pwd = "p@ssword"

# ports
## ssh
$SSHPort = 53401 #set something specific for ssh else powershell generates a random one

## statsd
$StatsDInputPort = 1234
$StatsDAdminPort = 5678

## elasticsearch
$ElasticSearchPort = 12345
[/powershell]

Get your Azure subscription info

[powershell]
Set-AzureSubscription -SubscriptionName $SubscriptionName `
-CurrentStorageAccountName $StorageAccount

Select-AzureSubscription -SubscriptionName $SubscriptionName
[/powershell]

Create the VM

Change “Small” to one of the other valid instance sizes if you need to.

[powershell]
New-AzureVMConfig -Name $ServiceName -InstanceSize Small -ImageName $ImageName `
| Add-AzureProvisioningConfig -Linux -LinuxUser $user -Password $pwd -NoSSHEndpoint `
| New-AzureVM -ServiceName $ServiceName -Location $Location
[/powershell]

Open the required ports and map them

[powershell]
# StatsD takes metrics over UDP (8125) but serves its admin interface over TCP (8126)
Get-AzureVM -ServiceName $ServiceName -Name $ServiceName `
| Add-AzureEndpoint -Name "SSH" -LocalPort 22 -PublicPort $SSHPort -Protocol tcp `
| Add-AzureEndpoint -Name "StatsDInput" -LocalPort 8125 -PublicPort $StatsDInputPort -Protocol udp `
| Add-AzureEndpoint -Name "StatsDAdmin" -LocalPort 8126 -PublicPort $StatsDAdminPort -Protocol tcp `
| Add-AzureEndpoint -Name "ElasticSearch" -LocalPort 9200 -PublicPort $ElasticSearchPort -Protocol tcp `
| Update-AzureVM

Write-Host "now run: ssh $serviceName.cloudapp.net -p $SSHPort -l $user"
[/powershell]

The whole script is in a gist here

Stage 1 complete

Now we have a shiny new VM running up in Azure, so let’s configure it for gathering metrics using a bash script.

Stage 2, Installing the metrics software

You could probably have the powershell script automatically upload and execute this, but it’s no big deal to SSH in, “sudo nano/vi” a new file, paste it in, chmod, and execute the below.
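If you would rather not paste things in by hand, the upload-and-run step can be sketched from any machine with scp and ssh. This is only a rough outline, not what I actually run: the `push_and_run` and `run` function names and the `install_metrics.sh` filename are made up for illustration, and the host, port and user are whatever you set in the Powershell script. Setting `DRY_RUN=1` just prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch: copy the install script to the new VM and run it over SSH.
# Function names and file names here are hypothetical placeholders.

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: push_and_run <host> <ssh-port> <user> <local-script>
push_and_run() {
  local host="$1" port="$2" user="$3" script="$4"
  local remote
  remote="/tmp/$(basename "$script")"
  # scp takes the port as -P (upper case); ssh takes it as -p (lower case)
  run scp -P "$port" "$script" "$user@$host:$remote" || return 1
  # -t allocates a terminal so sudo can prompt and screen can attach
  run ssh -t -p "$port" -l "$user" "$host" "chmod +x $remote && $remote"
}
```

For example: `push_and_run your-service-here.cloudapp.net 53401 username ./install_metrics.sh`.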

Set up the prerequisites

[bash]
# Prerequisites
echo "#### Starting"
echo "#### apt-get updating and installing prereqs"
sudo apt-get update
sudo apt-get install screen libexpat1-dev libicu-dev git build-essential curl -y
[/bash]

Install nodejs

[bash]
# Node
echo "#### Installing node"
. ~/.bashrc
export "PATH=$HOME/local/bin:$PATH"
mkdir $HOME/local
mkdir $HOME/node-latest-install

pushd $HOME/node-latest-install
curl http://nodejs.org/dist/node-latest.tar.gz | tar xz --strip-components=1
./configure --prefix="$HOME/local"
make install
popd

## the path isn’t always correct, so set up a symlink
sudo ln -s /usr/bin/nodejs /usr/bin/node

## nodemon
echo "#### npming nodemon"
sudo apt-get install npm -y
sudo npm install -g nodemon

[/bash]

Add StatsD

Here I’ve configured it to use mongo-statsd-backend as the only backend and not graphite. Configuring Graphite is a PAIN as you have to set up python and a web server and deal with all the permissions, etc. Gah.

[bash]
# StatsD
echo "#### installing statsd"
pushd /opt
sudo git clone https://github.com/etsy/statsd.git
# overwrite rather than append, so re-running the script doesn't produce a broken config
cat > /tmp/localConfig.js << EOF
{
  port: 8125
, dumpMessages: true
, debug: true
, mongoHost: 'localhost'
, mongoPort: 27017
, mongoMax: 2160
, mongoPrefix: true
, mongoName: 'statsD'
, backends: ['/opt/statsd/mongo-statsd-backend/lib/index.js']
}
EOF

sudo cp /tmp/localConfig.js /opt/statsd/localConfig.js
popd
[/bash]

Mongo and a patched mongo-statsd-backend

You could use npm to install mongo-statsd-backend, but that version has a few pending pull requests to patch a couple of issues that mean it doesn’t work out of the box. As such, I use my own patched version and install from source.

[bash]
# MongoDB
echo "#### installing mongodb"
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10
echo 'deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen' | sudo tee /etc/apt/sources.list.d/mongodb.list
sudo apt-get update && sudo apt-get install mongodb-org -y
sudo service mongod start
cd /opt/statsd

## Mongo Statsd backend – mongo-statsd-backend
## the version on npm has issues; use a patched version on github instead:
sudo git clone https://github.com/rposbo/mongo-statsd-backend.git
cd mongo-statsd-backend
sudo npm install
[/bash]
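Before kicking anything off, it can be worth checking that everything actually landed on the PATH (the symlink comment above hints that this isn't always a given). A small pre-flight check might look like the following; `require_cmds` is a name I've made up for this sketch.

```shell
# Sketch: fail early if a required command is missing from PATH.
# require_cmds is a hypothetical helper, not part of any of the tools above.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    # command -v exits non-zero when the command cannot be found
    if ! command -v "$cmd" > /dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Something like `require_cmds node npm nodemon mongod || exit 1` just before starting StatsD would then stop the script with a list of what is missing.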

Ready to start?

Let’s kick off a screen

[bash]
# Start StatsD
screen nodemon /opt/statsd/stats.js /opt/statsd/localConfig.js
[/bash]
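Once StatsD is running you can poke it with a test metric. StatsD's wire format is a plain text line of the form `name:value|type` (for example `c` for a counter, `ms` for a timer, `g` for a gauge) sent over UDP. The sketch below assumes a bash that was built with the `/dev/udp` pseudo-device enabled; the function names and the metric name are invented for the example.

```shell
# Sketch: send a test metric to StatsD from bash.
# statsd_line / statsd_send are hypothetical helper names.

# Format one metric in the StatsD line protocol: <name>:<value>|<type>
statsd_line() {
  printf '%s:%s|%s' "$1" "$2" "$3"
}

# Send a metric over UDP using bash's /dev/udp pseudo-device (bash only).
# Usage: statsd_send <host> <port> <name> <value> <type>
statsd_send() {
  local host="$1" port="$2"
  shift 2
  statsd_line "$@" > "/dev/udp/$host/$port"
}
```

From your own machine that would be something like `statsd_send your-service-here.cloudapp.net 1234 perf.test.login 320 ms`, using the public StatsD input port from the Powershell script. With `dumpMessages: true` in the config, the message should show up in the StatsD output inside the screen session.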

Fancy getting ElasticSearch in there too?

To pull down and install the java runtime, install ES and the es-head, kopf, and bigdesk plugins, add the below script just before you kick off the “screen” command.


[bash]
# ElasticSearch
echo "#### installing elasticsearch"
sudo apt-get update && sudo apt-get install default-jre default-jdk -y
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.1.1.deb && sudo dpkg -i elasticsearch-1.1.1.deb
sudo update-rc.d elasticsearch defaults 95 10
sudo /etc/init.d/elasticsearch start

## Elasticsearch plugins
sudo /usr/share/elasticsearch/bin/plugin -install mobz/elasticsearch-head
sudo /usr/share/elasticsearch/bin/plugin -install lukas-vlcek/bigdesk
[/bash]

You can now browse to /_plugin/bigdesk (or the others) on the public $ElasticSearchPort port you configured in the powershell script to see your various ES web interfaces.
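A quick way to check that the endpoint mapping works before opening a browser is to ask Elasticsearch for its cluster health over the public port. A minimal sketch, assuming curl is available locally; `es_url` and `es_health` are names made up for this example.

```shell
# Sketch: check Elasticsearch through the public Azure endpoint.
# es_url / es_health are hypothetical helper names.

# Build the URL for an Elasticsearch path on the given host and public port.
es_url() {
  local host="$1" port="$2" path="${3:-}"
  # strip a leading slash from the path so we never emit a double slash
  printf 'http://%s:%s/%s' "$host" "$port" "${path#/}"
}

# Ask the cluster for its health; -f makes curl exit non-zero on HTTP errors.
es_health() {
  curl -fsS "$(es_url "$1" "$2" "_cluster/health?pretty")"
}
```

For example `es_health your-service-here.cloudapp.net 12345`, and `es_url your-service-here.cloudapp.net 12345 /_plugin/bigdesk` gives you the address to paste into the browser.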

The whole script is in a gist here.

Stage 2 complete

I use StatsD to calculate a few bits of info around the processing of common tasks, in order to find those with max figures that are several standard deviations away from the average and highlight them as possible areas of concern.
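To give a flavour of the "several standard deviations away" idea, here is a rough sketch using awk. It is not my actual calculation, just an illustration: it reads one timing per line, works out the mean and the population standard deviation, and prints any value more than `k` standard deviations above the mean. The `flag_outliers` name and the default of 3 are assumptions for the example.

```shell
# Sketch: print values more than k standard deviations above the mean.
# flag_outliers is a hypothetical helper; input is one number per line on stdin.
flag_outliers() {
  local k="${1:-3}"
  awk -v k="$k" '
    NF { v[++n] = $1; sum += $1; sumsq += $1 * $1 }   # collect values, skip blank lines
    END {
      if (n == 0) exit
      mean = sum / n
      var = sumsq / n - mean * mean                   # population variance
      sd = (var > 0) ? sqrt(var) : 0
      for (i = 1; i <= n; i++)
        if (sd > 0 && v[i] > mean + k * sd) print v[i]
    }'
}
```

Feeding it nine timings of 100 and one of 1000 with `flag_outliers 2` prints just the 1000: the mean is 190, the standard deviation is 270, so the cut-off is 730.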

I have an Azure Worker Role to pull azure diagnostics from table and blob storage and spew it into the Elasticsearch instance for easier searching; still figuring out how to get it looking pretty in a Grafana instance though – I’ll get there eventually.

Setting up an Ubuntu development VM: Scripted

Having seen this blog post about setting up a development Linux VM in a recent Morning Brew, I had to have a shot at doing it all in a script instead, since it looked like an awful lot of hard work to do it manually.

The post I read covers downloading and installing VirtualBox (which could be scripted also, using the amazing Chocolatey) and then installing Ubuntu, logging in to the VM, downloading and installing Chrome, SublimeText2, MongoDB, Robomongo, NodeJs, NPM, nodemon, and mocha.

Since all of this can be handled via apt-get and a few other cunning configs, here’s my attempt using Vagrant. Firstly, vagrant init a directory, then paste the following into the Vagrantfile:

Vagrantfile

[ruby]
Vagrant.configure(2) do |config|

  config.vm.box = "precise32"
  config.vm.box_url = "http://files.vagrantup.com/precise32.box"

end
[/ruby]

Setup script

Now create a new file in the same dir as the Vagrantfile (since this directory is automatically configured as a shared folder, saving you ONE ENTIRE LINE OF CONFIGURATION), calling it something like set_me_up.sh. I apologise for the constant abuse of > /dev/null – I just liked having a clear screen sometimes:

[bash]
#!/bin/bash

clear
echo "******************************************************************************"
echo "Don’t go anywhere – I’m going to need your input shortly.."
read -p "[Enter to continue]"

### Set up dependencies
# Configure sources & repos
echo "** Updating apt-get"
sudo apt-get update -y > /dev/null

echo "** Installing prerequisites"
sudo apt-get install libexpat1-dev libicu-dev git build-essential curl software-properties-common python-software-properties -y > /dev/null

### deal with interactive stuff first
## needs someone to hit "enter"
echo "** Adding a new repo ref – hit Enter"
sudo add-apt-repository ppa:webupd8team/sublime-text-2

echo "** Creating a new user; enter some details"
## needs someone to enter user details
sudo adduser developer

echo "******************************************************************************"
echo "OK! All done, now it’s the unattended stuff. Go make coffee. Bring me one too."
read -p "[Enter to continue]"

### Now the unattended stuff can kick off
# For mongo db – http://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/
echo "** More prerequisites for mongo and chrome"
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10 > /dev/null
sudo sh -c 'echo "deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen" | sudo tee /etc/apt/sources.list.d/mongodb.list' > /dev/null
# For chrome – http://ubuntuforums.org/showthread.php?t=1351541
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -

echo "** Updating apt-get again"
sudo apt-get update -y > /dev/null

## Go, go, gadget installations!
# chrome
echo "** Installing Chrome"
sudo apt-get install google-chrome-stable -y > /dev/null

# sublime-text
echo "** Installing sublimetext"
sudo apt-get install sublime-text -y > /dev/null

# mongo-db
echo "** Installing mongodb"
sudo apt-get install mongodb-10gen -y > /dev/null

# desktop!
echo "** Installing ubuntu-desktop"
sudo apt-get install ubuntu-desktop -y > /dev/null

# node – the right(?) way!
# http://www.joyent.com/blog/installing-node-and-npm
# https://gist.github.com/isaacs/579814

echo "** Installing node"
echo 'export "PATH=$HOME/local/bin:$PATH"' >> ~/.bashrc
. ~/.bashrc
mkdir ~/local
mkdir ~/node-latest-install
cd ~/node-latest-install
curl http://nodejs.org/dist/node-latest.tar.gz | tar xz --strip-components=1
./configure --prefix="$HOME/local"
make install

# other node goodies
# node lives in ~/local, which sudo's PATH doesn't include, so install globally as the current user
npm install -g nodemon > /dev/null
npm install -g mocha > /dev/null

## shutdown message (need to start from VBox now we have a desktop env)
echo "******************************************************************************"
echo "**** All good – now quitting. Run *vagrant halt* then restart from VBox to go to desktop ****"
read -p "[Enter to shutdown]"
sudo shutdown 0
[/bash]

The gist is here, should you want to fork and edit it.

You can now open a prompt in that directory and run
[bash]
vagrant up && vagrant ssh
[/bash]
which will provision your VM and ssh into it. Once connected, just execute the script by running:
[bash]
. /vagrant/set_me_up.sh
[/bash]

(/vagrant is the shared directory created for you by default)
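About all those > /dev/null redirects: a slightly kinder alternative is to send the noise to a log file and only show it when something fails. A minimal sketch, where the `quiet` function name and the log path are assumptions of mine rather than anything in the gist:

```shell
# Sketch: keep the screen clear but keep the output too.
# quiet and LOG_FILE are hypothetical names for this example.
LOG_FILE="${LOG_FILE:-/tmp/set_me_up.log}"

# Run a command with stdout and stderr appended to the log;
# on failure, show the last few log lines and return non-zero.
quiet() {
  if ! "$@" >> "$LOG_FILE" 2>&1; then
    echo "!! command failed: $*" >&2
    tail -n 20 "$LOG_FILE" >&2
    return 1
  fi
}
```

You would then write `quiet sudo apt-get install ubuntu-desktop -y` instead of redirecting to /dev/null. Don't wrap the interactive commands (the repo prompt and adduser) in it, since their prompts would end up in the log instead of on screen.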

Nitty Gritty

Let’s break this up a bit. First up, I decided to group together all of the apt-get configuration so I didn’t need to keep calling apt-get update after each reconfiguration:

[bash]
# Configure sources & repos
echo "** Updating apt-get"
sudo apt-get update -y > /dev/null

echo "** Installing prerequisites"
sudo apt-get install libexpat1-dev libicu-dev git build-essential curl software-properties-common python-software-properties -y > /dev/null

### deal with interactive stuff first
## needs someone to hit "enter"
echo "** Adding a new repo ref – hit Enter"
sudo add-apt-repository ppa:webupd8team/sublime-text-2
[/bash]

Then I decided to set up a new user, since you will be left with either the vagrant user or a guest user once this script has completed; and the vagrant one doesn’t have a desktop/home nicely configured for it. So let’s create our own one right now:

[bash]
echo "** Creating a new user; enter some details"
## needs someone to enter user details
sudo adduser developer

echo "******************************************************************************"
echo "OK! All done, now it’s the unattended stuff. Go make coffee. Bring me one too."
read -p "[Enter to continue]"
[/bash]
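If you want to go fully hands-off, both interactive steps can probably be made unattended. This is an untested sketch rather than part of my script: it assumes your version of add-apt-repository supports `-y`, uses adduser's `--disabled-password` and `--gecos` options to skip the questions, and then sets the password with chpasswd. The function names are invented, and `DRY_RUN=1` prints the commands instead of running them.

```shell
# Sketch: unattended versions of the two interactive steps.
# run, add_repo_unattended and add_user_unattended are hypothetical helpers.

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Add a PPA without waiting for someone to hit Enter.
add_repo_unattended() {
  run sudo add-apt-repository -y "$1"
}

# Create a user without prompts, then set the password.
# Usage: add_user_unattended <user> <password>
add_user_unattended() {
  local user="$1" password="$2"
  # --gecos "" skips the full name / room / phone questions
  run sudo adduser --disabled-password --gecos "" "$user" || return 1
  # chpasswd reads user:password pairs from stdin
  printf '%s:%s\n' "$user" "$password" | run sudo chpasswd
}
```

The obvious trade-off is that the password now lives in the script (or an environment variable), which is fine for a throwaway dev VM and not much else.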

Ok, now the interactive stuff is done, let’s get down to the installation guts:

[bash]
### Now the unattended stuff can kick off
# For mongo db – http://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/
echo "** More prerequisites for mongo and chrome"
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10 > /dev/null
sudo sh -c 'echo "deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen" | sudo tee /etc/apt/sources.list.d/mongodb.list' > /dev/null
# For chrome – http://ubuntuforums.org/showthread.php?t=1351541
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -

echo "** Updating apt-get again"
sudo apt-get update -y > /dev/null
[/bash]

Notice the URLs in there referencing where I found out the details for each section.

The only reason these config sections are not at the top with the others is that they can take a WHILE and I don’t want the user to have to wait too long before creating a user and being told they can go away. Now we’re all configured, let’s get installing!

[bash]
## Go, go, gadget installations!
# chrome
echo "** Installing Chrome"
sudo apt-get install google-chrome-stable -y > /dev/null

# sublime-text
echo "** Installing sublimetext"
sudo apt-get install sublime-text -y > /dev/null

# mongo-db
echo "** Installing mongodb"
sudo apt-get install mongodb-10gen -y > /dev/null

# desktop!
echo "** Installing ubuntu-desktop"
sudo apt-get install ubuntu-desktop -y > /dev/null
[/bash]

Pretty easy so far, right? ‘Course it is. Now let’s install nodejs on linux the – apparently – correct way. Well it works better than compiling from source or apt-getting it.

[bash]
# node – the right(?) way!
# http://www.joyent.com/blog/installing-node-and-npm
# https://gist.github.com/isaacs/579814

echo "** Installing node"
echo 'export "PATH=$HOME/local/bin:$PATH"' >> ~/.bashrc
. ~/.bashrc
mkdir ~/local
mkdir ~/node-latest-install
cd ~/node-latest-install
curl http://nodejs.org/dist/node-latest.tar.gz | tar xz --strip-components=1
./configure --prefix="$HOME/local"
make install
[/bash]

Now let’s finish up with a couple of nodey lovelies:
[bash]
# other node goodies
# node lives in ~/local, which sudo's PATH doesn't include, so install globally as the current user
npm install -g nodemon > /dev/null
npm install -g mocha > /dev/null
[/bash]

All done! Then it’s just a case of running vagrant halt and restarting the VM from VirtualBox (or editing the Vagrantfile to include a line about booting to GUI); you’ll be booted into an Ubuntu desktop login. Use the newly created user to log in and BEHOLD THE AWE.
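For the "boot to GUI" option, the VirtualBox provider in Vagrant has a `gui` setting. Here is a sketch that writes out a Vagrantfile with it switched on; the `write_gui_vagrantfile` function name is made up, and the box details are just the ones from earlier in this post.

```shell
# Sketch: generate a Vagrantfile that boots the VM with the VirtualBox GUI.
# write_gui_vagrantfile is a hypothetical helper; it overwrites any existing Vagrantfile.
write_gui_vagrantfile() {
  local dir="${1:-.}"
  # quoted 'EOF' so nothing inside the Ruby is expanded by the shell
  cat > "$dir/Vagrantfile" <<'EOF'
Vagrant.configure(2) do |config|
  config.vm.box = "precise32"
  config.vm.box_url = "http://files.vagrantup.com/precise32.box"

  # show the VirtualBox window instead of running headless
  config.vm.provider "virtualbox" do |vb|
    vb.gui = true
  end
end
EOF
}
```

After that, `vagrant halt && vagrant up` should bring the machine back with a window you can log in to.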

Enough EPICNESS, now the FAIL…

Robomongo Fail 🙁

The original post also installs Robomongo for mongodb administration, but I just couldn’t get that running from a script. Booo! Here’s the script that should have worked; please have a crack and try to sort it out! qt5 fails to install for me which then causes everything else to bomb out.

[bash]
# robomongo
INSTALL_DIR=$HOME/opt
TEMP_DIR=$HOME/tmp

# doesn’t work
sudo apt-get install -y git qt5-default qt5-qmake scons cmake

# Get the source code from Git. Perform a shallow clone to reduce download time.
mkdir -p $TEMP_DIR
cd $TEMP_DIR
sudo git clone --depth 1 https://github.com/paralect/robomongo.git

# Compile the source.
sudo mkdir -p robomongo/target
cd robomongo/target
sudo cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$INSTALL_DIR
make
make install

# As of the time of this writing, the Robomongo makefile doesn’t actually
# install into the specified install prefix, so we have to install it manually.
mkdir -p $INSTALL_DIR
mv install $INSTALL_DIR/robomongo
mkdir -p $HOME/bin
ln -s $INSTALL_DIR/robomongo/bin/robomongo.sh $HOME/bin/robomongo

# Clean up.
rm -rf $TEMP_DIR/robomongo
[/bash]

Not only is there the gist, but the whole shebang is over on github too.

ENJOOOYYYYY!

MongoDB @ UKWAUG: MS Cloud Day – Windows Azure Spring Release

My third session was about MongoDB and how you might implement it in Azure, presented by MongoDB’s own Gregor Macadam (@gregormacadam).

I only had limited knowledge of what MongoDB was before this session (a document-based data store, much like CouchDB and other NoSQL variants), so the fact that this session turned out to be an intro to MongoDB, rather than MongoDB on Azure, suited me just fine!

Here are the basic notes I made during Gregor’s talk (although you may as well just go to MongoDB.org and read the intro):

MongoDB uses sharding for write throughput.
The REST interface uses JSON as the data transport format
Data is saved in BSON structure

The db structure (usually?) consists of three nodes; a single primary and two replicated secondary – these are referred to as a Replica Set.
A Replica Set has a single write node with asynchronous replication to the other set members; reads can be served from all of them.

The write history (known as the oplog) is in the format "move from state A, to state B" so as to avoid overwriting changed data.

If a write to the primary fails, an automatic election determines which of the remaining nodes becomes the new primary; usually it is the node with the latest data.

It can be configured to write to multiple hosts, but the write won’t return until all writes are completed

An "arbiter" can be the tie breaker in determining the new primary node during election, and we can specify weighting for that process.

"Read" scales with more read nodes, "Write" scales with multiple read/write groups (replica sets) or sharding.

Need config database to define key ranges for sharding etc

MongoS process runs on another node and knows which shard to write your data to.

The updates are released on windows and Linux at same time
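As far as I understand it, a client copes with the replica set notes above by listing several members in its connection string and letting the driver work out which one is currently primary. A rough sketch of building such a string follows; the `replica_set_uri` function name, the hostnames and the set name are all invented for the example, and `w=majority` (wait for most members, not all of them) is my choice of write setting rather than something from the talk.

```shell
# Sketch: build a MongoDB connection string for a replica set.
# replica_set_uri is a hypothetical helper.
# Usage: replica_set_uri <set-name> <database> <host:port> [<host:port> ...]
replica_set_uri() {
  local set_name="$1" db="$2"
  shift 2
  local hosts
  # join the remaining arguments with commas
  hosts="$(IFS=,; echo "$*")"
  printf 'mongodb://%s/%s?replicaSet=%s&w=majority' "$hosts" "$db" "$set_name"
}
```

So `replica_set_uri rs0 shop node1:27017 node2:27017 node3:27017` gives `mongodb://node1:27017,node2:27017,node3:27017/shop?replicaSet=rs0&w=majority`.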

Within Azure

Data is persisted in blob storage
MongoDB runs in worker role
page blob is NTFS cloud drive (data drive?)

MongoS router process is required to load balance access to correct node, not the Azure load balancer; the Azure load balancer can end up sending the write request to a non-primary node.

OSdisk has caching enabled by default, data disk doesn’t

Code is Open Source and can be found on github and issues can be raised on the Mongo Jira site

You can sign up for a free Mongo Monitoring Service on 10gen

The main point that I took away from this is that it sounds like you need a large number of Azure VMs to get Mongo running; one for each node, one for each MongoS service, one for an arbiter (maybe more – I didn’t catch all of the details that were raised by a couple of good questions from the audience).

Although I have a big plan to use NoSQL for the front end of an ecommerce website, I don’t think that MongoDB’s Azure offering is mature enough yet. I’ll be looking into CouchDB and Raven initially and keeping an eye on MongoDB. (Interested in how to get Raven running on Azure? Wait for the next post!)

The slide deck from this session is here

Next up – node.js