Wednesday, November 20, 2013

Javascript testing with Real Browsers


karma run --runner-port 9100
PhantomJS 1.4 (Linux): Executed 1 of 1 SUCCESS (0.397 secs / 0.071 secs)
Chrome 30.0 (Linux): Executed 1 of 1 SUCCESS (0.518 secs / 0.06 secs)
TOTAL: 2 SUCCESS

Sunday, October 20, 2013

Chef server internal error (11.08)

Tried the new version of chef-server 11.08 and looks like is broken. There is a bug into the jira CHEF-4339. I tried onto CentOS but looks like Ubuntu is broken as well (see bug description). How to see the error logs
$ chef-server-ctl tail

==> /var/log/chef-server/nginx/access.log <==
192.168.122.1 - - [20/Oct/2013:14:56:42 +0000]  "PUT /sandboxes/000000000000a38d5dd8e2763f913c6c HTTP/1.1" 500 "8.109" 36 "-" "Chef Knife/11.6.0 (ruby-1.9.3-p429; ohai-6.18.0; x86_64-linux; +http://opscode.com)" "127.0.0.1:8000" "500" "8.049" "11.6.0" "algorithm=sha1;version=1.0;" "chef-user" "2013-10-20T14:54:00Z" "oMRtV6loUDnbKJuGcW6nqBbF8ww=" 1029

==> /var/log/chef-server/erchef/current <==
2013-10-20_14:56:42.62140 
2013-10-20_14:56:42.62144 =ERROR REPORT==== 20-Oct-2013::14:56:42 ===
2013-10-20_14:56:42.62145 webmachine error: path="/sandboxes/000000000000a38d5dd8e2763f913c6c"
2013-10-20_14:56:42.62145 {error,
2013-10-20_14:56:42.62146     {throw,
2013-10-20_14:56:42.62146         {checksum_check_error,26},
2013-10-20_14:56:42.62146         [{chef_wm_named_sandbox,validate_checksums_uploaded,2,
2013-10-20_14:56:42.62147              [{file,"src/chef_wm_named_sandbox.erl"},{line,144}]},
2013-10-20_14:56:42.62147          {chef_wm_named_sandbox,from_json,2,
2013-10-20_14:56:42.62148              [{file,"src/chef_wm_named_sandbox.erl"},{line,99}]},
2013-10-20_14:56:42.62148          {webmachine_resource,resource_call,3,
2013-10-20_14:56:42.62148              [{file,"src/webmachine_resource.erl"},{line,166}]},
2013-10-20_14:56:42.62149          {webmachine_resource,do,3,
2013-10-20_14:56:42.62149              [{file,"src/webmachine_resource.erl"},{line,125}]},
2013-10-20_14:56:42.62150          {webmachine_decision_core,resource_call,1,
2013-10-20_14:56:42.62150              [{file,"src/webmachine_decision_core.erl"},{line,48}]},
2013-10-20_14:56:42.62150          {webmachine_decision_core,accept_helper,0,
2013-10-20_14:56:42.62151              [{file,"src/webmachine_decision_core.erl"},{line,583}]},
2013-10-20_14:56:42.62151          {webmachine_decision_core,decision,1,
2013-10-20_14:56:42.62151              [{file,"src/webmachine_decision_core.erl"},{line,489}]},
2013-10-20_14:56:42.62152          {webmachine_decision_core,handle_request,2,
2013-10-20_14:56:42.62153              [{file,"src/webmachine_decision_core.erl"},{line,33}]}]}}

==> /var/log/chef-server/erchef/erchef.log.1 <==
2013-10-20T14:56:42Z erchef@127.0.0.1 ERR req_id=rOkhxZcSowyaKaD+WsjFKg==; status=500; method=PUT; path=/sandboxes/000000000000a38d5dd8e2763f913c6c; user=chef-user; msg=[]; req_time=8043; rdbms_time=5; rdbms_count=2; s3_time=8028; s3_count=1


However the integration tests all pass ...

$ chef-server-ctl test

...

Sandboxes API Endpoint
  Sandboxes Endpoint, POST
    when creating a new sandbox
      should respond with 201 Created
  Sandboxes Endpoint, PUT
    when committing a sandbox after uploading files
      should respond with 200 OK
Deleting client pedant_admin_client ...
Deleting client pedant_client ...
Pedant did not create the user admin, and will not delete it
Deleting user pedant_non_admin_user ...
Deleting user knifey ...

Finished in 54.02 seconds
70 examples, 0 failures

Hopefully will be fixed soon.

Monday, October 14, 2013

Vim paste tricks

Vim is cool! But sometimes can be annoying - for example you edit your file etc. and have a short snippet of code you want to insert into it, so copy and paste BUT ... vim has the indent on. Now there are different types of indent and you can try to turned them off - see Indenting source code. A neat trick is the :paste option however there are cases where you want to turn off indented for different reasons. You can use something like this into your .vimrc
function! IndentPasteOff()
  set noai nocin nosi inde=
endfunction

function! IndentPasteOn()
  set ai cin si
endfunction

nmap _0  :call IndentPasteOff() 
nmap _1  :call IndentPasteOn() 
" paste on/off
set pastetoggle=
Now you don't want any indent type _0 and to indent again _1. Happy viming!

Wednesday, August 28, 2013

Why schematics is awesome

Schematics it's a python library that has primary use to validate json data.

Why is this awsome versus other validation tools like validictory or jsonschema ?

The workflow by design is based on the django/sqlalchemy, so you get back an object that has fields and each field can have it's own type, even complex types like objects that contain their own fields and so on.

One more thing that schematics has is - default values. This comes very very handy if you want your data to be normalized,

This is a simple example on how it works:

>>> from schematics import models
>>> from schematics import types
>>>
>>> class Client(models.Model):
>>>     name = types.StringType(required=True, min_length=1, max_length=255)
>>>     email = types.EmailType(required=True)
>>>     active = types.IntType(default=1)
>>>
>>> c = Client(raw_data={'name': 'John', 'email': 'john@example.com'})
>>> c.validate()
>>> c.serialize()
>>> {'active': 1, 'email': u'john@example.com', 'name': u'John'}

There are many other options to validate data - see Schematics.

Wednesday, February 27, 2013

A new era - Azure Cloud

It's official I started my first Windows Azure instance


$ ssh azureuser@kickrobot.cloudapp.net
The authenticity of host 'kickrobot.cloudapp.net (168.61.33.28)' can't be established.
RSA key fingerprint is 0a:aa:74:ec:6a:0d:13:de:1c:c7:e2:8c:e5:74:0b:cf.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'kickrobot.cloudapp.net,168.61.33.28' (RSA) to the list of known hosts.
azureuser@kickrobot.cloudapp.net's password: 

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

Welcome to Ubuntu 12.10 (GNU/Linux 3.5.0-21-generic x86_64)

 * Documentation:  https://help.ubuntu.com/

  System information as of Thu Feb 28 01:04:05 UTC 2013

  System load:  0.04              Processes:           92
  Usage of /:   3.0% of 29.52GB   Users logged in:     0
  Memory usage: 16%               IP address for eth0: 10.74.234.17
  Swap usage:   0%

  Graph this data and manage this system at https://landscape.canonical.com/

45 packages can be updated.
26 updates are security updates.

Get cloud support with Ubuntu Advantage Cloud Guest
  http://www.ubuntu.com/business/services/cloud

Monday, February 18, 2013

Ansible within AWS (ec2)

Ansible is a new configuration/orchestration management framework and is just awesome!

Why is that ?

  • very short learning curve
  • no need for a specific data service language
  • can be used to both execute/configure machines
  • very simple to write your own modules
  • can be used into a push or pull model
  • ... ansible.cc ... for more info

This is how you can use it within aws(ec2) to manage services.

# Install ansible via git
$ cd /tmp
$ git clone https://github.com/ansible/ansible.git
$ cd ansible
$ python setup.py install
$ pip install boto # used for the ec2 inventory

# setup aws variables
$ export ANSIBLE_HOSTS=/tmp/ansible/plugins/inventory/ec2.py # ec2 inventory
$ export AWS_ACCESS_KEY_ID='YOUR_AWS_API_KEY'
$ export AWS_SECRET_ACCESS_KEY='YOUR_AWS_API_SECRET_KEY'

# setup ssh access
$ ssh-agent 
SSH_AUTH_SOCK=/tmp/ssh-dFUXvhH31724/agent.31724; export SSH_AUTH_SOCK;
SSH_AGENT_PID=31725; export SSH_AGENT_PID;
echo Agent pid 31725;
$ ssh-add /PATH_TO/YOUR_SSH_KEY_OR_PEM

# I use ec2-user onto a amazon linux
ansible -m ping all -u ec2-user
ec2-54-242-33-49.compute-1.amazonaws.com | success >> {
    "changed": false, 
    "ping": "pong"
}

The ec2.py inventory has connected to the aws api and obtained all the instances running within the account that has the exported credentials AWS SECRET/KEY. Then ansible used the ping module -m ping to ping the host(s). The ping module just connects via ssh to a host and reports pong with changed: false.

Now that we can connect let's see if we can leverage some of the metadata offered by AWS. My server runs into the security group ssh-web and to access this information from within ansible all you have to do is to use security_group_ssh-web. Where this come from is the ec2.py inventory script, if you run the script directly you will see something like this.

$ /tmp/ansible/plugins/inventory/ec2.py

{
  "i-e4c9ca9c": [
    "ec2-54-242-33-49.compute-1.amazonaws.com"
  ], 
  "key_mykey": [
    "ec2-54-242-33-49.compute-1.amazonaws.com"
  ], 
  "security_group_ssh-web": [
    "ec2-54-242-33-49.compute-1.amazonaws.com"
  ], 
  "tag_Name_srv01": [
    "ec2-54-242-33-49.compute-1.amazonaws.com"
  ], 
  "type_t1_micro": [
    "ec2-54-242-33-49.compute-1.amazonaws.com"
  ], 
  "us-east-1": [
    "ec2-54-242-33-49.compute-1.amazonaws.com"
  ], 
  "us-east-1b": [
    "ec2-54-242-33-49.compute-1.amazonaws.com"
  ]
}

In order to start the apache web server on all instances belonging to the ssh-web group is as simple as:

ansible -m service -a "name=httpd state=started"  security_group_ssh-web  -u ec2-user -s
ec2-54-242-33-49.compute-1.amazonaws.com | success >> {
    "changed": true, 
    "name": "httpd", 
    "state": "started"
}

# notice -s which stands for use sudo without password 
From here on sky is the limit, you can take a look at the docs site http://ansible.cc/docs/ for more complex examples.

Tuesday, January 29, 2013

MongoDB EC2 Deployment

MongoDB it's part of the NoSQL ecosystem and is presented as scalable, high-performance, auto sharding database. A full list of features that MongoDB offers can be found at mongodb.org.

In this post I'll explain how MongoDB can be deployed into the cloud - EC2 specifically in order to support a three layers architecture for a web application. Before I go into technical details and show how it can be done we'll have to understand why things are done that way, since these days there are many ways to achieve the same thing. Will wear different hats and start with being the architect. First lets' establish who is involved into the whole process to build a web application. Obviously there is somebody that will pay for the application - will call this entity - the business. The next entity will be developers who will actually write the code and the last(but not the least) will be the infrastructure people the admins (the architect belongs to this group). All these three entities can be combined into a single physical person or spread apart different i departments into an enterprise but the roles remain the same.

Now that we know who is involved let's see what does the business want. Well that's quite simple and usually spans from a single line as I want an application that is resilient to failure and always responsive to a full business case that has all the fine grained details. The more details the better but is not necessarily responding to the question I have in mind. My real question is how popular is going to be the app, this will give an estimate on what sort of traffic you will get, based on the traffic volume and hopefully pattern, with this you can determine quite a lot - from sizing the environment to how much it will cost to operate(remember in EC2 there is only OPEX cost). In my experience the business will not have any more input into this equation and that is just fine. So the next step will be to start looking at how developers think about it, would be a 'real time', how many reads and writes will be done from the web servers to the database, how all pieces will fall into place, etc.

Will start building on the following premises about the application

  • has to be resilient to failures (business's requirement)
  • has to be responsive all the time (business's requirement)
  • will need to support an initial high volume of users with the option to grow (business's requirement)
  • balanced as 70% reads and 30% writes for the database traffic(developers predict)
  • to be cost effective (this is always relative to the business's budget)
  • because of the above (cost) constraint the business agree that if a catastrophic failures happens into a region will be ok to have downtime but not ok to not be able to bring the site up somewhere else in a manner of a few hours.

At this point we have enough details to start putting all the pieces together. Will not mention the load balancer (piece no.1) and the application servers (piece no.2) into the three layers architecture web app. The focus is going to be the database (piece no.3) - MongoDB.

Starting at the bottom the smallest part is an individual server (instance), will explain how this can be made resilient to failure. Into ec2 the instances are into a flat network (not talking about VPC) and the storage is divided into two types.

  • ephemeral or instance storage - this is the disk space you have on the local hypervisor that hosts your instance and will be destroy just after you terminated the instance
  • network block storage - EBS volumes - which will be resilient to terminations of your instance

Obviously you will not store the database's data on the ephemeral storage so the only real option will be the EBS (network storage). The EBS come in two flavors these days - with provision IOPS and normal. The difference between the two, the first will have a guarantee performance and the second is best effort with a minimum (which is quite low) of performance. A consequence of this is that the provision IOPS is quite expensive compared with a normal EBS. Since we do have a constraint on cost will have to look into using the normal EBS but there is hope - you can group a number of volumes and use some of the Linux tools as LVM or Raid to stripe or mirror. So will attach 10 EBS volumes to each server that acts as a MongoDB database (you can use less than that of course but I'll do a Raid 10).

    
        # you will need to have your ebs volumes attached to the server
        mdadm --create --verbose /dev/md0 --level=10 --raid-devices=8 /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq
        # now create a file system
        mkfs.xfs /dev/md0
        #mount the drive
        mount /dev/md0 /mnt/mongo/data
        # I said 10 and there are 8 ...
        # The rest of the 2 volumes are used for Journaling - threfore if Journaling is enabled will not affect the data
    

With this done will look onto how MongoDB will offer resilience to failure. The solutions that are offered are Replica Sets and Sharding more to say partitioning at the database layer.

MongoDB Replica Sets - what is it and how it works. The idea is to have a set of database group together part of a replica set which will replicate between them the data asynchronous . Within the replica set there is a PRIMARY and a number of SECONDARY databases. Who is the PRIMARY is established by a process called voting. How a database server will become PRIMARY is based on different criteria, the important thing about it is that is the only database server that accepts write into the replica set. All other servers part of the replica set will just accept reads. An other very important factor is that a replica set is considered healthy only if it has 51% of the capacity available. That is from 3 servers you can loose only 1 - hence you just lost 33.3% and still have 66.6% up. If you have 4 servers and you loose 2, guess what ... well you just lost 50% of your capacity and the replica set is not healthy - remember you need to have 51% available. Knowing this is obvious that using odd numbers into a replica set is the preferred choice. There are workarounds for this where you can have servers not participating in voting, have arbiters (special servers that only participate in voting) and you can as well give more weight to a server into the voting process.

Why all this happens you may wonder - well there is the CAP theorem and MongoDB will choose CA from it - this is how 10Gen - makers of MongoDB decided to design the product and we have to live with it. So from the CAP MongoDB will choose CA - you can read why and how at Consistency and Availability at MongDB.

Now let's pause for a minute and see how the overall database cluster will look with a Replica Set:

1ZfNjpswFIWfhmUQxkBgmTCZdtNN06pSN5UTHLDGYGScCeTpC4MJcQxt1ML8SAkyx9jc+/lwDQYM0/ITR3nyhUWYGrYVlQZ8MGzbt6z62AjVjRBzErUSkMKRRLhQJMEYFSRXxT3LMrwXinZgVJ0sRzHWhO0eUV39QSKRyOBsr9c/YxIn3W2AF7Q9hai6OSJ8QEcqFi+S3XaXlhwQyOsrKdhO0E2dKSGcGUsVgeOCnNUwD0RNd8d4hLkiUZI9XXOCGwOGnDHRttIyxLRZmA56uwqPI70XPBxn4p4BMv1nRI8y9K84Jiyrte/bxWa1/WbIGa4JFieSUpRhORhzgcvRAPq0aq9hlmLBq4a4DKDzlfSZ7crzU7+8DpRacrW0l4FI4osvc/f51g2Z8ggvPX892QTlTTMt4+ZJMdGpMGPOjnlhomdEKNoRSkT168zuBGL/EYjjKzwGcPi2TgOACWj4/wYjQgLtUIFNHtU3Xx8IpSGjrPU59B7tB6++3boQnD3hriebhpbnmK7Ca+l2wrWBAp2YG0xgH/ju7AOWzl/9483kH+B8PAOBQK0/r+sf9935B97U4yH/DFTjafzjfTz/QOC9ZQWCYACZNfhb/awPi/oPVhrVOn8xRTWGNyyAp6NwgY6ie3/7LxRDe/kIgfVcBG7LyRAAx50JwNBuNAIgnAsABMs7PDAbAn0HSlkWs4UFZvO8uuBOMPD0e0Pvr1Okq+8fMl29cE5lcH/5hvnqG4TMF87n59db3/q0//J76bv6Noeb3w==

However having all instances into one single region doesn't look too good in case of a total failure on that region is always better to have an instance outside it - in this case I choose US-WEST - California.

1ZhRj6IwEMc/DY8a2gLio7ru3cu9nHfZ5F4uVSo2W6gpdUU//cFSlNLibgyYvUQJnUo7/19nplQHLZL8m8D73Q8eEeZAN8od9ORAGLpucS0Np5YhFjSqTEAZDjQimWaSnDNJ97pxw9OUbKRm23KmD7bHMTEMqw1mpvWFRnKnnIPB1f6d0HhXTwOCadWTyVM9RkS2+MDk6N0Eq+7cVSN5Vfuk2tCb1iOnmgdnzhPNIEhGz7qXW6rcUOjWXEREaD9hNH1tYkJLBy0E57K6S/IFYeW61MyrkZ47ei90BEnlZx5Q6t8wOyjXf5KY8rSw/V6NlrPVL6d2vgEwO9KE4ZSoh4mQJO904CqrCDXCEyLFqQSuePuKt/LHV7Mdr4sLJ8q2aywsrMMRK3rxZeir3OJGKe7AZco3te7wvrxN8rjMkzE+ZuNY8MM+G+M3TBleU0bl6e+Zf5IHvMkDgbFOxAIkhCYPAHrgEd6HI8ISr3FGxiIqJp9vKWMLzngV6Ch4hk9BMd08k4K/kron7YeXF2q0Jn6Nr8GrTuEmL3/aQ/igLxc+oNZ6iR4LjzAYKH6A9/8FEAj0fJsED4sfBCy4XOtn9qe4jIovmBlEC+2yj0yaaBwuXBogfGApxEEPIGx1uEP/fCj97UAo2rZa4g+EwEydhKcxH7lgsAVHra3Gm1oEB7bNtw/Bfpdgs2b0tsKwVRwfphaYaoffKm6/eXmhq9FArnWvsADpZ68IvuJe8QGyadBKGQQfuF3YXjc6quTi/hz6CMGkXTXAQ+ukGTaqbKDhJBuJ8rgqab6SNw9oxfVlWZ7R2onT3wHNqx+5dUJzbfLvOKEVzevZ972v8ecEWv4D

This is how we are doing so far in respect to initial requirements:

  • has to be resilient to failures - Replica Set will provide that
  • if a catastrophic failures happens into a region will be ok to have downtime but not ok to not be able to bring the site up somewhere else in a manner of a few hours. Having a member of the Replica Set into US-WEST will fulfill it.

Well how about the rest of the requirements ?!

From the infrastructure point of view there is only one more requirement that you can satisfy - the place to grow. This can be solved initially by vertically scaling the instance sizes but you can upscale up to the biggest instance ... and then what else you can do is to add what MongoDB calls Shards, basically you will have to partition the database. This may sound very scary but MongoDB makes it quite easy. It will automatically split the data based on a key(s). You can provide the key yourself or let MongoDB use the _id - this is a special object that will be provided and is called Object identifier for every document stored.

Alright ! - we are doing quite ok so far - from infrastructure point of view we solved all the problems that the business asked, but is not over yet - what about backups ? You can't run a site without it ! First let's pause and think about what the application will be doing in the first place with the actual setup. Based on the capabilities that MongoDB offers will write to the PRIMARY and will read from the SECONDARY ... well that means that if the application servers are hosted into region US-EAST than will have to go over the wire to read from the servers hosted in US-WEST ?!. Not a very good idea ... but there is hope, MongoDB has a an option when you create a Replica Set which says that a specific member of the replica will be hidden, meaning will replicate data but will not make itself available for any reads or writes.

With this in place we can actually have servers into a remote region, US-WEST that will just replicate the data but never been actually used by the application servers. Well this is the best candidate for backups from all other members!

The final infrastructure diagram including two Replica Sets, two Shards and the backups looks like this:

3Vtbb+I4FP41fWwUX3LhsfSy+zAjVe2uRtqXkSEuWA0xSkwL/fXrQAIkx1Ca2ikdiVZgJ3bOd26fj50Lcj1b/pWz+fSnTHh6gf1keUFuLjCOfV//LxtWrYZJLpJNE6oaFiLhRaNJSZkqMW82jmWW8bFqtD3JtDnYnE04aHgcsxS2/hKJmlatKBzsOv7mYjKt5olxuOko1KoeI+FPbJGqy3UT3nQv/U0frS5fVb/Jdtys8QBvUs4aDTkvxFvzIZ9E9RAVciOZJzxvXJKK7HkfJXJ7Qa5zKdXm22x5zdNSLTXkm5HuDvRuwcl5pk65oRL+haWL6tF/ymwib4a68TpdFEo/bhu94lXMUpbx6laeK748OP1OKG1nXM64ylcl2pveiFAPBxXg1Q1+5FUtrzv94vqq6Z5qSVjhyioEJ9sJdiLrL5XUByCDEECJp2xefp0tJ6WreOy18Ca5XMwLj70wkbKRSIVa/X6TJ6KCj6JCkNfEJPABIDGGeCBkAY+4GxwJU2zECu7liZ58+CTS9FqmcmPsJLzDN6GeblioXD7zuiezgxeNG2hFgcGA6ADiFQwsmA85O/NBiJxgP6Ej+0H0+xkQwoMWYr3aEEEGyHzj5+o//e9S/6ErgKqWX9nwpqgZj0MIRIAgEHWa/RQQplh8QP6hK/mhMZggoIZ8ZAUC6D6zMiNf+kh/XMl8Ssapk611iYMjEsPYYUvLiL4v8i4K2Ba6drJzYh3Ui5qIYDSAecMV76DfkHgEqM1eMaU9Jo563LMiH+FJZuSKftRe/K3oB8EAs17tqM7n50FAoFcRDC3IFQcJzoKDwGBswsAVCQngoqZKydglCTEkoNjgBa6ISHCQemGXRASjLxYb8i8dIPNEN7nTdRy02BcmJpFDV/wrgGxjK7QzVcc+KHbF8bapF7kJlLvfgh/129YeGIg3JiZj9y3wAwStHcprmzJtUD8cAqL2UsRUfYgNkNip2ITdIHFLmd7BrF4n15sTCODlrFoD4dqtV0n32GHFSKLQM1XKrSxZTcToqz2HEBhO+vMc2rH4+6WegwKQhnRi6nHReoxmOfQf0qYcATHl3shV5oW2MhVJwjPdNuOzEc+Li3qf0IH4ISDXKDRyj61jWI8fbnjmO3LXT36UZiJnZT6YK2zQzPdMnWJQl+pX7nqtsif3kI2fdR4oB85K+W8eust/nGPGPiinBBhyTG3onitnD2F59z4XM7Z+zEehuDPZEWmrPjK6eeSofFLXh/cll7K84ur+vhSe5y/rUHdojUGGU5kLTRAU03Pc+BZAoWC1aT5kgILYyJ6snDMIO9a3x3I2Xyju8TGGVOHublAey+lMFTbaOlyJi0EcwQC1wIBYYMOHOha3zwuw2LB76giwqCNBPy/AkKEc4Ayxjvz9zHwSGYmsM9DoHwFa3J+ZDTpGsoyrV5k/i2zi8ZQVSox/p5Ilv0dMZ8pxeZCwDSMOS1bXHcbjeRSHoGobmPJoEBlwjD6PI/JP8FeeTPhj9VPLJdTqQYOnhMxutz3re8oLOwJBMG4DgbBxMeWbSg/H6pjVnPdS6Ec5NmG8m7Aep5CLfMyrW3ewgtFo3B4N1YvyeijF8glXYKi1hrZ4nKi0E+KFRaXpcdcw7LveBxSJBruFcj+K3Jvw44r0G2NR2nogm2o8oUrch++FsOSHie8Rh0oLaTvuEZ929r4IFKJKm2uPZlNzJ5w16kdzhp1O7MVONQeMhfixF9rTHcVOdWc4TftDUwDdMqxIAFxKWyolYGj12DfHRuqqjoRO2ZtieS5fP22dCIiLB+4ME/kHZ/uoTUI9gaFsGuQHyhhsrEr6OSzNUIxZ+oONeHovC1HGFd0/kkqVL9FsL7hKxaTsUHJuwYBpe8cMRcZ9DoNGrbxYUh97+45Q4cB4/swVVBjWi78NVGSPkPSCFUwJD3xSyo79fx8vb68e/4HY2TvBsRVs1ZRzv7Zs2nCldoSHm0j7wv+67Vf4wLCvEMUHTo1/XH79c/cq4CZa717VJLf/Aw==

Now let's switch roles and have the developer hat on. You've been told about all this setup and you may think all you have to do is drop the code on the application servers and you are done, but wait a minute what will happen if a full replica set will be unavailable ? Well this is the part called design for failure.

Let's assume the application has a few a entities, one of them will be the User. In a typical SQL world you would have a table that has something like user_id, first_name and so on. Then all other tables will be linked to this table with Foreign Primary Key on user_id. If we would try to replicate this scenario the schema will look like this:

    

    /* users collection */
    > db.users.find().pretty()
    {
            "_id" : ObjectId("5107fb6736141503d37b6a31"),
            "username" : "johnd",
            "password" : "bc7a0154948baa69ecbe1d7843b25113fc5f3f20",
            "first_name" : "fname"
    }
    >
    /* objects collection - linking back to users via user_id */
    > db.objects.find().pretty()
    {
            "_id" : ObjectId("5107fc7536141503d37b6a32"),
            "user_id" : ObjectId("5107fb6736141503d37b6a31"),
            "data" : "all the goodies you need"
    }
    >

    

What can go wrong with this schema ?! Well, let's say we will shard on the users._id key, than MongoDB will split data based on the chunk size as need it and distribute it accordingly. On the objects collection we will have different options to shard, we can use the objects._id or objects.user_id etc. However if user X is located on shard01 and most of his entries (if not all) are located on shard02 than if shard02 is down the user will be without entries! If shard01 is down the user can't even use the system. So what will be a better approach? Locality of the data, have the user collection contain the objects collection. So this will look as:



    /* users collection embeds the objects collection */
    > db.users.find( {"username": "bobc"}).pretty()
    {
            "_id" : ObjectId("5107fe2936141503d37b6a33"),
            "username" : "bobc",
            "password" : "ad7a0154948baa69ecbe1d7843b25113fc5f3f20",
            "first_name" : "fname",
            "data" : {
                    "key" : "value"
            }
    }


With this schema if any of the shards are down than the user can NOT use the system but in case it can use the system his data will be consistent.

The process of choosing the 'right' sharding key is very tricky, has a few constraints from the MongoDB part as well it depends on your data structure and requirements. For more info see Shard keys for MongoDB.

Final review of the total requirements:

  • has to be resilient to failures
  • MongoDB ability to function into a Replica Set.
  • has to be responsive all the time
  • MongoDB Replica Set, write to Primary and read from Secondary. In the case you need more capacity there are two options, upscale of instance size and sharding.
  • will need to support an initial high volume of users with the option to grow
  • Again Sharding and Replica Sets will fulfill.
  • balanced as 70% reads and 30% writes for the database traffic
  • For the writes you have the option to add more shards, for reads you can add more servers to the existing Replica Sets, I choose three servers but nobody stops you from adding five for example.
  • to be cost effective (this is always relative to the business's budget)
  • Considering all other requirements having three members per Replica Set will be the minimum to have.
  • because of the above (cost) constraint the business agree that if a catastrophic failures happens into a region will be ok to have downtime but not ok to not be able to bring the site up somewhere else in a manner of a few hours.
  • In the case of total failure of Primary Site from region US-EAST you can still have the data in US-WEST. If your infrastructure is automated it will take no time to re-create all three architecture layers.
  • Backups of the data
  • The nodes from US-WEST are the perfect candidate for this. Use EBS snapshots and ship data to S3. With these snapshots you can recover even if all nodes (including US-WEST) are down.