
Wednesday, December 29, 2021

Victoria Metrics on an AWS EC2 instance

We will configure a single EC2 instance as a Victoria Metrics server to be used as Prometheus remote storage.

Access to VM (Victoria Metrics) is done via port 8247 and is protected by HTTP basic auth. All traffic is encrypted with a self-signed certificate.

Installation

We will install it manually by downloading the release from GitHub and configuring the local system.

Download binaries

# create a group and user for vm
$ sudo groupadd -r victoriametrics
$ sudo useradd -g victoriametrics victoriametrics
 
# download
$ curl -L https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.70.0/victoria-metrics-amd64-v1.70.0.tar.gz --output victoria-metrics-amd64-v1.70.0.tar.gz

# unpack and install it
$ sudo tar xvf victoria-metrics-amd64-v1.70.0.tar.gz -C /usr/local/bin/
$ sudo chown root:root /usr/local/bin/victoria-metrics-prod

# create data directory
$ sudo mkdir /var/lib/victoria-metrics-data
$ sudo chown -v victoriametrics:victoriametrics /var/lib/victoria-metrics-data

Configure the service

sudo tee /etc/systemd/system/victoriametrics.service > /dev/null <<'EOF'
[Unit]
Description=High-performance, cost-effective and scalable time series database, long-term remote storage for Prometheus
After=network.target

[Service]
Type=simple
User=victoriametrics
Group=victoriametrics
StartLimitBurst=5
StartLimitInterval=0
Restart=on-failure
RestartSec=1
ExecStart=/usr/local/bin/victoria-metrics-prod \
        -storageDataPath=/var/lib/victoria-metrics-data \
        -httpListenAddr=127.0.0.1:8428 \
        -retentionPeriod=1
ExecStop=/bin/kill -s SIGTERM $MAINPID
LimitNOFILE=65536
LimitNPROC=32000

[Install]
WantedBy=multi-user.target

EOF

At this point you can start the service with systemctl enable victoriametrics.service --now. However, port 8428 is neither protected nor encrypted, so we will add basic authentication and TLS encryption with a self-signed certificate (any valid certificate will work as well). Note that the service listens only on localhost.
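A quick sanity check after enabling the unit (a minimal sketch; the single-node binary exposes /health and /metrics on its listen address):

# reload systemd, then enable and start the service
$ sudo systemctl daemon-reload
$ sudo systemctl enable victoriametrics.service --now

# it only answers on localhost at this point; /health should return OK
$ curl -s http://127.0.0.1:8428/health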

Vmauth

To protect the service we will use vmauth, which is part of the utilities tool set released by Victoria Metrics.

# download and install the vm utils

$ curl -L https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.70.0/vmutils-amd64-v1.70.0.tar.gz --output vmutils-amd64-v1.70.0.tar.gz
$ sudo tar xvf vmutils-amd64-v1.70.0.tar.gz -C /usr/local/bin/
$ sudo chown -v root:root /usr/local/bin/vm*-prod
Configure vmauth

Create a config file (config.yml) to enable basic authentication.

The format of the file is simple: you need a username and a password.

$ sudo mkdir -p /etc/victoriametrics/ssl/
$ sudo chown -vR victoriametrics:victoriametrics /etc/victoriametrics
$ sudo touch /etc/victoriametrics/config.yml
$ sudo chown -v victoriametrics:victoriametrics /etc/victoriametrics/config.yml

# generate a password for our user
$ python3  -c 'import secrets; print(secrets.token_urlsafe())'
KGKK_NoiciEMn6KdBk6CkcLHZt6TpB-Cgt12UFqnutU

# write the config
$ sudo tee -a /etc/victoriametrics/config.yml > /dev/null <<'EOF'
> users:
>   - username: "user1"
>     password: "KGKK_NoiciEMn6KdBk6CkcLHZt6TpB-Cgt12UFqnutU"
>     url_prefix: "http://127.0.0.1:8428"
> # end config
> EOF
Install a self-signed certificate
$ sudo openssl req -x509 -nodes -days 365 -newkey rsa:4096 -keyout /etc/victoriametrics/ssl/victoriametrics.key -out /etc/victoriametrics/ssl/victoriametrics.crt

$ sudo chown -Rv victoriametrics:victoriametrics /etc/victoriametrics/ssl/
Enable the vmauth service
sudo tee /etc/systemd/system/vmauth.service > /dev/null <<'EOF'
[Unit]
Description=Simple auth proxy, router and load balancer for VictoriaMetrics
After=network.target

[Service]
Type=simple
User=victoriametrics
Group=victoriametrics
StartLimitBurst=5
StartLimitInterval=0
Restart=on-failure
RestartSec=1
ExecStart=/usr/local/bin/vmauth-prod \
        --tls=true \
        --auth.config=/etc/victoriametrics/config.yml \
        --httpListenAddr=0.0.0.0:8247 \
        --tlsCertFile=/etc/victoriametrics/ssl/victoriametrics.crt \
        --tlsKeyFile=/etc/victoriametrics/ssl/victoriametrics.key
ExecStop=/bin/kill -s SIGTERM $MAINPID
LimitNOFILE=65536
LimitNPROC=32000

[Install]
WantedBy=multi-user.target


EOF

Start and enable it with systemctl enable vmauth.service --now.

To test, you first need to construct a base64 string from the username and password you have written into the config.yml file.

For example, with user vmuser and password secret:

$ echo -n 'vmuser:secret' | base64
dm11c2VyOnNlY3JldA==

# to test vmauth
$ curl -H 'Authorization: Basic dm11c2VyOnNlY3JldA==' --insecure https://localhost:8247/api/v1/query -d 'query={job=~".*"}'
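
With vmauth in place, a Prometheus server can ship samples here via remote_write. A minimal sketch of the Prometheus side, assuming EC2-HOST stands for your instance address and user1 with the generated password from config.yml (insecure_skip_verify is needed because of the self-signed certificate):

# excerpt for prometheus.yml on the Prometheus server
remote_write:
  - url: "https://EC2-HOST:8247/api/v1/write"
    basic_auth:
      username: "user1"
      password: "KGKK_NoiciEMn6KdBk6CkcLHZt6TpB-Cgt12UFqnutU"
    tls_config:
      insecure_skip_verify: true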

Operations

Snapshots

List what's available (add the same Authorization header and --insecure flag as in the test above; omitted below for brevity)

curl 'https://localhost:8247/snapshot/list'

{"status":"ok","snapshots":["20211227145126-16C1DDB61673BA11"

Create a new snapshot

curl 'https://localhost:8247/snapshot/create'

{"status":"ok","snapshot":"20211227145526-16C1DDB61673BA12"}

List again the snapshots

curl -s 'https://localhost:8247/snapshot/list' | jq .
{
  "status": "ok",
  "snapshots": [
    "20211227145126-16C1DDB61673BA11",
    "20211227145526-16C1DDB61673BA12"
  ]
}
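
Old snapshots can be removed through the same HTTP API; a sketch, using the snapshot/delete endpoint with the same Authorization header and --insecure flag as above:

# delete a single snapshot by name
curl -H 'Authorization: Basic dm11c2VyOnNlY3JldA==' --insecure \
    'https://localhost:8247/snapshot/delete?snapshot=20211227145126-16C1DDB61673BA11'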

Backups

The snapshots are located on the local disk under the data path (the -storageDataPath parameter); on my instance that resolves to /var/lib/victoria-metrics-data/.

The data in the snapshots is compressed with Zstandard.

To push the backups to s3 you can use vmbackup.

$ sudo vmbackup-prod -storageDataPath=/var/lib/victoria-metrics-data  -snapshotName=20211227145526-16C1DDB61673BA12 -dst=s3://BUCKET-NAME/`date +%s`

...

2021-12-29T16:07:20.571Z        info    VictoriaMetrics/app/vmbackup/main.go:105        gracefully shutting down http server for metrics at ":8420"
2021-12-29T16:07:20.572Z        info    VictoriaMetrics/app/vmbackup/main.go:109        successfully shut down http server for metrics in 0.001 seconds

For more info you can see vmbackup.
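
Restoring goes the other way around with vmrestore, which ships in the same vmutils archive. A sketch, assuming the bucket path used above (TIMESTAMP is the epoch suffix created by date +%s) and that the service is stopped while the data directory is rewritten:

# stop the service so the data directory is not in use
$ sudo systemctl stop victoriametrics.service

# pull the backup back from s3
$ sudo vmrestore-prod -src=s3://BUCKET-NAME/TIMESTAMP \
    -storageDataPath=/var/lib/victoria-metrics-data

# restore ownership and start again
$ sudo chown -R victoriametrics:victoriametrics /var/lib/victoria-metrics-data
$ sudo systemctl start victoriametrics.service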

Sunday, March 2, 2014

Getting started with the new AWS tools

AWS replaced their Java-based tools with a neat Python package for Linux (I didn't try the Windows ones yet ...). Why are these tools nice?

  • written in Python
  • one tool covers all services
  • wizard-style configuration

To get started:

# use virtualenv or global
# this example shows virtualenv

$ mkdir AWS
$ virtualenv AWS 
...
$ source AWS/bin/activate

# install the tools from pypi

$ pip install awscli
...
# configure

$ aws configure
AWS Access Key ID [None]: XXXXXX
AWS Secret Access Key [None]: XXXXXX
Default region name [None]: us-west-1
Default output format [None]: json

$ aws ec2 describe-regions
{
    "Regions": [
        {
            "Endpoint": "ec2.eu-west-1.amazonaws.com", 
            "RegionName": "eu-west-1"
        }, 
        {
            "Endpoint": "ec2.sa-east-1.amazonaws.com", 
            "RegionName": "sa-east-1"
        }, 
        {
            "Endpoint": "ec2.us-east-1.amazonaws.com", 
            "RegionName": "us-east-1"
        }, 
        {
            "Endpoint": "ec2.ap-northeast-1.amazonaws.com", 
            "RegionName": "ap-northeast-1"
        }, 
        {
            "Endpoint": "ec2.us-west-2.amazonaws.com", 
            "RegionName": "us-west-2"
        }, 
        {
            "Endpoint": "ec2.us-west-1.amazonaws.com", 
            "RegionName": "us-west-1"
        }, 
        {
            "Endpoint": "ec2.ap-southeast-1.amazonaws.com", 
            "RegionName": "ap-southeast-1"
        }, 
        {
            "Endpoint": "ec2.ap-southeast-2.amazonaws.com", 
            "RegionName": "ap-southeast-2"
        }
    ]
}

# Done!
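
The CLI also supports a --query option (JMESPath expressions) to trim the JSON down to what you need; for example, just the region names:

# print only the region names as plain text
$ aws ec2 describe-regions --query 'Regions[].RegionName' --output text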

For more info: the project is hosted at github.com, the reference table is at Aws tools references, and the home page is at aws.amazon.com/cli.

Monday, February 18, 2013

Ansible within AWS (ec2)

Ansible is a new configuration/orchestration management framework and is just awesome!

Why is that ?

  • very short learning curve
  • no need for a domain-specific language
  • can be used both to run ad-hoc commands on and to configure machines
  • very simple to write your own modules
  • can be used in a push or pull model
  • ... ansible.cc ... for more info

This is how you can use it within AWS (EC2) to manage services.

# Install ansible via git
$ cd /tmp
$ git clone https://github.com/ansible/ansible.git
$ cd ansible
$ python setup.py install
$ pip install boto # used for the ec2 inventory

# setup aws variables
$ export ANSIBLE_HOSTS=/tmp/ansible/plugins/inventory/ec2.py # ec2 inventory
$ export AWS_ACCESS_KEY_ID='YOUR_AWS_API_KEY'
$ export AWS_SECRET_ACCESS_KEY='YOUR_AWS_API_SECRET_KEY'

# setup ssh access
$ ssh-agent 
SSH_AUTH_SOCK=/tmp/ssh-dFUXvhH31724/agent.31724; export SSH_AUTH_SOCK;
SSH_AGENT_PID=31725; export SSH_AGENT_PID;
echo Agent pid 31725;
$ ssh-add /PATH_TO/YOUR_SSH_KEY_OR_PEM

# I use ec2-user onto a amazon linux
ansible -m ping all -u ec2-user
ec2-54-242-33-49.compute-1.amazonaws.com | success >> {
    "changed": false, 
    "ping": "pong"
}

The ec2.py inventory connected to the AWS API and obtained all the instances running within the account whose credentials (AWS KEY/SECRET) were exported. Then Ansible used the ping module (-m ping) to ping the host(s). The ping module just connects via SSH to a host and reports pong with changed: false.

Now that we can connect, let's see if we can leverage some of the metadata offered by AWS. My server runs in the security group ssh-web, and to use this information from within Ansible all you have to do is target the group security_group_ssh-web. Where this comes from is the ec2.py inventory script; if you run the script directly you will see something like this.

$ /tmp/ansible/plugins/inventory/ec2.py

{
  "i-e4c9ca9c": [
    "ec2-54-242-33-49.compute-1.amazonaws.com"
  ], 
  "key_mykey": [
    "ec2-54-242-33-49.compute-1.amazonaws.com"
  ], 
  "security_group_ssh-web": [
    "ec2-54-242-33-49.compute-1.amazonaws.com"
  ], 
  "tag_Name_srv01": [
    "ec2-54-242-33-49.compute-1.amazonaws.com"
  ], 
  "type_t1_micro": [
    "ec2-54-242-33-49.compute-1.amazonaws.com"
  ], 
  "us-east-1": [
    "ec2-54-242-33-49.compute-1.amazonaws.com"
  ], 
  "us-east-1b": [
    "ec2-54-242-33-49.compute-1.amazonaws.com"
  ]
}

Starting the Apache web server on all instances belonging to the ssh-web group is as simple as:

ansible -m service -a "name=httpd state=started"  security_group_ssh-web  -u ec2-user -s
ec2-54-242-33-49.compute-1.amazonaws.com | success >> {
    "changed": true, 
    "name": "httpd", 
    "state": "started"
}

# notice -s, which stands for "use sudo without a password"
From here on the sky is the limit; take a look at the docs site http://ansible.cc/docs/ for more complex examples.
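
The same action can also be captured in a small playbook so it stays repeatable; a sketch using the Ansible syntax of that era (the file name webservers.yml is just an example):

$ cat > webservers.yml <<'EOF'
---
# install and start apache on every instance in the ssh-web security group
- hosts: security_group_ssh-web
  user: ec2-user
  sudo: yes
  tasks:
    - name: install apache
      yum: name=httpd state=present
    - name: make sure apache is running
      service: name=httpd state=started
EOF

# the inventory still comes from the ANSIBLE_HOSTS ec2.py script
$ ansible-playbook webservers.yml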

Tuesday, January 29, 2013

MongoDB EC2 Deployment

MongoDB is part of the NoSQL ecosystem and is presented as a scalable, high-performance, auto-sharding database. A full list of features that MongoDB offers can be found at mongodb.org.

In this post I'll explain how MongoDB can be deployed into the cloud - EC2 specifically - in order to support a three-layer architecture for a web application. Before I go into technical details and show how it can be done, we'll have to understand why things are done that way, since these days there are many ways to achieve the same thing. We will wear different hats, starting with the architect's. First let's establish who is involved in the whole process of building a web application. Obviously there is somebody that will pay for the application - we'll call this entity the business. The next entity is the developers, who will actually write the code, and the last (but not least) is the infrastructure people, the admins (the architect belongs to this group). All three entities can be combined into a single physical person or spread across different departments in an enterprise, but the roles remain the same.

Now that we know who is involved, let's see what the business wants. That's usually quite simple and spans from a single line such as "I want an application that is resilient to failure and always responsive" to a full business case with all the fine-grained details. The more details the better, but they don't necessarily answer the question I have in mind. My real question is: how popular is the app going to be? This gives an estimate of what sort of traffic you will get, and based on the traffic volume and (hopefully) pattern you can determine quite a lot - from sizing the environment to how much it will cost to operate (remember, in EC2 there is only OPEX cost). In my experience the business will not have any more input into this equation, and that is just fine. So the next step is to look at how the developers think about it: will it be 'real time', how many reads and writes will be done from the web servers to the database, how all the pieces will fall into place, etc.

We will start building on the following premises about the application:

  • has to be resilient to failures (business's requirement)
  • has to be responsive all the time (business's requirement)
  • will need to support an initial high volume of users with the option to grow (business's requirement)
  • database traffic balanced as 70% reads and 30% writes (the developers' prediction)
  • to be cost effective (this is always relative to the business's budget)
  • because of the above (cost) constraint, the business agrees that if a catastrophic failure happens in a region, downtime is acceptable, but not being able to bring the site up somewhere else within a few hours is not.

At this point we have enough details to start putting all the pieces together. I will not cover the load balancer (piece no. 1) and the application servers (piece no. 2) of the three-layer web application. The focus is going to be the database (piece no. 3) - MongoDB.

Starting at the bottom, the smallest part is an individual server (instance); I'll explain how this can be made resilient to failure. In EC2 the instances are in a flat network (not talking about VPC) and the storage is divided into two types:

  • ephemeral or instance storage - this is the disk space you have on the local hypervisor that hosts your instance and it will be destroyed as soon as you terminate the instance
  • network block storage - EBS volumes - which survive termination of your instance

Obviously you will not store the database's data on the ephemeral storage, so the only real option is EBS (network storage). EBS comes in two flavors these days - provisioned IOPS and standard. The difference between the two: the first has guaranteed performance, while the second is best-effort with a (quite low) minimum. A consequence of this is that provisioned IOPS is quite expensive compared with a standard EBS volume. Since we do have a constraint on cost we will have to use standard EBS, but there is hope - you can group a number of volumes and use Linux tools such as LVM or software RAID to stripe or mirror them. So we will attach 10 EBS volumes to each server that acts as a MongoDB database (you can use fewer than that of course, but I'll do a RAID 10).

    
        # you will need to have your ebs volumes attached to the server
        mdadm --create --verbose /dev/md0 --level=10 --raid-devices=8 /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq
        # now create a file system
        mkfs.xfs /dev/md0
        #mount the drive
        mount /dev/md0 /mnt/mongo/data
        # I said 10 volumes and there are only 8 here ...
        # The remaining 2 volumes are used for journaling - therefore, if journaling is enabled it will not affect the data volumes
    
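The remaining two volumes can be grouped the same way and dedicated to the journal, so journal writes don't compete with data writes; a sketch (the device names /dev/sdr and /dev/sds and the mount layout are assumptions):

# mirror the last two ebs volumes for the journal
mdadm --create --verbose /dev/md1 --level=1 --raid-devices=2 /dev/sdr /dev/sds
mkfs.xfs /dev/md1

# mount it inside the data path so mongod writes its journal there
# (do this before mongod creates any journal files)
mkdir -p /mnt/mongo/data/journal
mount /dev/md1 /mnt/mongo/data/journal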

With this done, let's look at how MongoDB offers resilience to failure. The solutions offered are Replica Sets and Sharding - that is, partitioning at the database layer.

MongoDB Replica Sets - what are they and how do they work? The idea is to have a set of database servers grouped together as a replica set, which replicate the data between them asynchronously. Within the replica set there is one PRIMARY and a number of SECONDARY databases. Who the PRIMARY is gets established by a process called voting. How a database server becomes PRIMARY is based on different criteria; the important thing is that it is the only database server in the replica set that accepts writes. All other servers that are part of the replica set will only accept reads. Another very important factor is that a replica set is considered healthy only if it has 51% of its capacity available. That is, out of 3 servers you can lose only 1 - you just lost 33.3% and still have 66.6% up. If you have 4 servers and you lose 2, guess what ... you just lost 50% of your capacity and the replica set is not healthy - remember, you need to have 51% available. Knowing this, it is obvious that using odd numbers in a replica set is the preferred choice. There are workarounds for this, where you can have servers not participating in voting, have arbiters (special servers that only participate in voting), and you can also give more weight to a server in the voting process.
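
In practice, setting up a Replica Set means starting every mongod with the same --replSet name and then initiating the set from one member. A sketch from the mongo shell (the set name and host names are placeholders):

    > rs.initiate({
            "_id" : "rs0",
            "members" : [
                    { "_id" : 0, "host" : "mongo01:27017" },
                    { "_id" : 1, "host" : "mongo02:27017" },
                    { "_id" : 2, "host" : "mongo03:27017" }
            ]
    })
    > rs.status()    /* shows which member is PRIMARY and which are SECONDARY */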

Why all this happens, you may wonder - well, there is the CAP theorem, and MongoDB chooses CA from it. This is how 10gen, the makers of MongoDB, decided to design the product, and we have to live with it. You can read why and how at Consistency and Availability at MongoDB.

Now let's pause for a minute and see how the overall database cluster will look with a Replica Set:

[diagram: MongoDB Replica Set within a single region]

However, having all instances in one single region doesn't look too good: in case of a total failure of that region it is always better to have an instance outside it - in this case I chose US-WEST (California).

[diagram: Replica Set in US-EAST with an additional member in US-WEST]

This is how we are doing so far in respect to initial requirements:

  • has to be resilient to failures - Replica Set will provide that
  • if a catastrophic failure happens in a region, downtime is ok, but not being able to bring the site up somewhere else within a few hours is not - having a member of the Replica Set in US-WEST fulfills this.

Well how about the rest of the requirements ?!

From the infrastructure point of view there is only one more requirement you can satisfy - the room to grow. This can be solved initially by scaling the instance sizes vertically, but you can only upscale up to the biggest instance ... and after that what else you can do is add what MongoDB calls shards - basically you partition the database. This may sound very scary, but MongoDB makes it quite easy: it will automatically split the data based on a key (or keys). You can provide the key yourself or let MongoDB use the _id - a special object, called the Object identifier, that is provided for every document stored.
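
Turning sharding on is done from a mongos router once the shards (each one a Replica Set) are added; a sketch, assuming a database named mydb and the default _id as the shard key:

    > sh.enableSharding("mydb")
    > sh.shardCollection("mydb.users", { "_id" : 1 })
    > sh.status()    /* shows the shards and how the chunks are distributed */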

Alright! We are doing quite ok so far - from the infrastructure point of view we solved all the problems the business asked about, but it is not over yet - what about backups? You can't run a site without them! First let's pause and think about what the application will be doing with the actual setup. Based on the capabilities that MongoDB offers, it will write to the PRIMARY and read from the SECONDARY ... well, that means that if the application servers are hosted in region US-EAST they will have to go over the wire to read from the servers hosted in US-WEST?! Not a very good idea ... but there is hope: MongoDB has an option, when you create a Replica Set, to mark a specific member of the replica as hidden, meaning it will replicate the data but will not make itself available for any reads or writes.

With this in place we can have servers in a remote region, US-WEST, that just replicate the data but are never actually used by the application servers. This makes them the best candidates for backups out of all the members!
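
A hidden member is simply declared with priority 0 and hidden set to true in the replica set configuration; a sketch of adding such a member in US-WEST from the PRIMARY (the host name is a placeholder):

    > rs.add({ "_id" : 3, "host" : "mongo-uswest01:27017", "priority" : 0, "hidden" : true })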

The final infrastructure diagram including two Replica Sets, two Shards and the backups looks like this:

[diagram: two sharded Replica Sets across US-EAST and US-WEST, with hidden backup members]

Now let's switch roles and put the developer hat on. You've been told about all this setup and you may think all you have to do is drop the code on the application servers and you are done - but wait a minute, what will happen if a full replica set becomes unavailable? Well, this is the part called designing for failure.

Let's assume the application has a few entities, one of them being the User. In a typical SQL world you would have a table with something like user_id, first_name and so on. Then all other tables would be linked to this table with a foreign key on user_id. If we tried to replicate this scenario the schema would look like this:

    

    /* users collection */
    > db.users.find().pretty()
    {
            "_id" : ObjectId("5107fb6736141503d37b6a31"),
            "username" : "johnd",
            "password" : "bc7a0154948baa69ecbe1d7843b25113fc5f3f20",
            "first_name" : "fname"
    }
    >
    /* objects collection - linking back to users via user_id */
    > db.objects.find().pretty()
    {
            "_id" : ObjectId("5107fc7536141503d37b6a32"),
            "user_id" : ObjectId("5107fb6736141503d37b6a31"),
            "data" : "all the goodies you need"
    }
    >

    

What can go wrong with this schema?! Well, let's say we shard on the users._id key; MongoDB will then split the data into chunks as needed and distribute them accordingly. On the objects collection we have different options to shard - we can use objects._id or objects.user_id, etc. However, if user X is located on shard01 and most of his entries (if not all) are located on shard02, then if shard02 is down the user will be without entries! If shard01 is down the user can't even use the system. So what would be a better approach? Locality of the data: have the user collection contain the objects collection. This would look as follows:



    /* users collection embeds the objects collection */
    > db.users.find( {"username": "bobc"}).pretty()
    {
            "_id" : ObjectId("5107fe2936141503d37b6a33"),
            "username" : "bobc",
            "password" : "ad7a0154948baa69ecbe1d7843b25113fc5f3f20",
            "first_name" : "fname",
            "data" : {
                    "key" : "value"
            }
    }


With this schema, if the shard holding a user is down then that user cannot use the system, but whenever the user can use the system their data will be consistent.

The process of choosing the 'right' shard key is very tricky; it has a few constraints on the MongoDB side and it also depends on your data structure and requirements. For more info see Shard keys for MongoDB.

Final review of the total requirements:

  • has to be resilient to failures
      MongoDB's ability to function as a Replica Set provides this.
  • has to be responsive all the time
      MongoDB Replica Set: write to the Primary and read from the Secondaries. In case you need more capacity there are two options: upscale the instance size or shard.
  • will need to support an initial high volume of users with the option to grow
      Again, Sharding and Replica Sets fulfill this.
  • balanced as 70% reads and 30% writes for the database traffic
      For the writes you have the option to add more shards; for reads you can add more servers to the existing Replica Sets - I chose three servers but nobody stops you from adding five, for example.
  • to be cost effective (this is always relative to the business's budget)
      Considering all the other requirements, three members per Replica Set is the minimum to have.
  • because of the above (cost) constraint the business agrees that downtime after a catastrophic regional failure is ok, but not being able to bring the site up somewhere else within a few hours is not
      In the case of a total failure of the primary site in region US-EAST you still have the data in US-WEST. If your infrastructure is automated it will take no time to re-create all three architecture layers.
  • backups of the data
      The nodes in US-WEST are the perfect candidates for this. Use EBS snapshots and ship the data to S3; with these snapshots you can recover even if all nodes (including US-WEST) are down (a sketch of this follows below).
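
The backup itself then boils down to flushing and locking the hidden US-WEST member and snapshotting its EBS volumes; a sketch using the ec2-api-tools (the volume id is a placeholder; repeat the snapshot command for every volume in the RAID set):

# on the hidden US-WEST member: flush writes and block new ones
mongo --eval "db.fsyncLock()"

# snapshot each ebs volume that is part of the RAID set
ec2-create-snapshot -C cert.pem -K key.pem -d "mongo backup" vol-xxxxxxxx

# once the snapshots have been started, unlock the member
mongo --eval "db.fsyncUnlock()"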

Tuesday, October 30, 2012

Ec2 (aws) - delete snapshots

EC2 snapshots are a way to make backups of your data in the Amazon cloud. To take snapshots you will need the ec2-api-tools and either your access key and secret or the x509 certificates for your AWS account. Obviously, after you snapshot, you will eventually need to delete snapshots that you don't need anymore. This example shows how to use the ec2-api-tools in a shell to delete snapshots that are not part of the current month. You can have a cronjob that runs every last day of the month, which gives you roughly 30 days of snapshots.
# describe snapshots and sort by date
ec2-describe-snapshots -C cert.pem  -K key.pem | sort -k 5

# delete all but the current month (not the last 30 days)
ec2-describe-snapshots -C cert.pem  -K key.pem | grep -v $(date +%Y-%m-) |  awk '{print $2}' | xargs -n 1 -t ec2-delete-snapshot -K key.pem -C cert.pem
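
The cronjob for "every last day of the month" can be expressed with a small date trick; a sketch (the cleanup script path is a placeholder, and % must be escaped inside crontabs):

# run at 23:30 on days 28-31, but only when tomorrow is the 1st of the month
30 23 28-31 * * [ "$(date -d tomorrow +\%d)" = "01" ] && /usr/local/bin/clean-old-snapshots.sh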

Monday, December 12, 2011

EC2 raid10 for mongo db

Running MongoDB on a RAID 10 (software raid) in EC2 is done via EBS volumes. I'll show you how to:

  • create the raid10 on 8 ebs volumes
  • (re) start the mdadm on the raid device
  • mount the raid10 device and start using
I didn't use any config files for the raid devices, so you will need to know how the devices are mapped and what UUID the raid10 array has.

Initial Creation of the raid

# you will need to have your ebs volumes attached to the server
mdadm --create --verbose /dev/md0 --level=10 --raid-devices=8 /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq

# now create a file system 
mkfs.xfs /dev/md0

#mount the drive
mount /dev/md0 /mnt/mongo/data


# Obtain information about the array
mdadm --detail /dev/md0 # query detail

/dev/md0:
        Version : 0.90
  Creation Time : Wed Oct 26 19:37:16 2011
     Raid Level : raid10
     Array Size : 104857344 (100.00 GiB 107.37 GB)
  Used Dev Size : 26214336 (25.00 GiB 26.84 GB)
   Raid Devices : 8
  Total Devices : 8
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Dec 12 15:56:48 2011
          State : clean
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0

         Layout : near=2
     Chunk Size : 64K

           UUID : 144894cd:3b083374:1fa88d23:e4200572
         Events : 0.30

    Number   Major   Minor   RaidDevice State
       0       8      144        0      active sync   /dev/sdj
       1       8      160        1      active sync   /dev/sdk
       2       8      176        2      active sync   /dev/sdl
       3       8      192        3      active sync   /dev/sdm
       4       8      208        4      active sync   /dev/sdn
       5       8      224        5      active sync   /dev/sdo
       6       8      240        6      active sync   /dev/sdp
       7      65        0        7      active sync   /dev/sdq

# note the UUID and the devices
# Start the mongo database
/etc/init.d/mongod start

Shut down (reboot) the server

# restart the array device - you need to have the ebs volumes re-attached!
mdadm -Av /dev/md0 --uuid=144894cd:3b083374:1fa88d23:e4200572  /dev/sd*
mdadm: looking for devices for /dev/md0
mdadm: cannot open device /dev/sda1: Device or resource busy
mdadm: /dev/sda1 has wrong uuid.
mdadm: cannot open device /dev/sdb: Device or resource busy
mdadm: /dev/sdb has wrong uuid.
mdadm: cannot open device /dev/sdc: Device or resource busy
mdadm: /dev/sdc has wrong uuid.
mdadm: cannot open device /dev/sdr: Device or resource busy
mdadm: /dev/sdr has wrong uuid.
mdadm: cannot open device /dev/sds: Device or resource busy
mdadm: /dev/sds has wrong uuid.
mdadm: /dev/sdj is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdk is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdl is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdm is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdn is identified as a member of /dev/md0, slot 4.
mdadm: /dev/sdo is identified as a member of /dev/md0, slot 5.
mdadm: /dev/sdp is identified as a member of /dev/md0, slot 6.
mdadm: /dev/sdq is identified as a member of /dev/md0, slot 7.
mdadm: added /dev/sdk to /dev/md0 as 1
mdadm: added /dev/sdl to /dev/md0 as 2
mdadm: added /dev/sdm to /dev/md0 as 3
mdadm: added /dev/sdn to /dev/md0 as 4
mdadm: added /dev/sdo to /dev/md0 as 5
mdadm: added /dev/sdp to /dev/md0 as 6
mdadm: added /dev/sdq to /dev/md0 as 7
mdadm: added /dev/sdj to /dev/md0 as 0
mdadm: /dev/md0 has been started with 8 drives.

# now you can mount the array
mount /dev/md0 /mnt/mongo/data/

# start the mongo database
/etc/init.d/mongod start
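
If you prefer not to carry the UUID around, the array definition can also be persisted in /etc/mdadm.conf so it reassembles by name; a sketch:

# record the array in mdadm's config file (append, keep any existing content)
mdadm --detail --scan >> /etc/mdadm.conf

# after a reboot (with the ebs volumes re-attached) a plain assemble is enough
mdadm --assemble /dev/md0
mount /dev/md0 /mnt/mongo/data/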

Thursday, November 3, 2011

How to create an AMI from a running instance with the EC2 CLI

In order to create an AMI from a running EC2 instance you will need:

  • certificate file from your aws account credentials
  • private key for the certificate file from your aws account credentials (you can download this only at certificate creation)
  • access by ssh to your running instance
  • access key for AWS
  • access secret key for AWS
  • any ec2 tools - I used amitools

# create the bundle under /mnt
ec2-bundle-vol -d /mnt -k /root/key.pem -c /root/cer.pem -u xxxxxxxxxxxx
# xxxxxxxxxxxx is your account number without dashes
ec2-upload-bundle -b YOURBUCKET -m /mnt/image.manifest.xml -a YOUR_ACCESS_KEY -s YOUR_ACCESS_SECRET_KEY
# register the ami so is available 
ec2-register -K /root/key.pem -C /root/cer.pem -n SERVER_NAME YOURBUCKET/image.manifest.xml
# this will respond with something like 
IMAGE   ami-xxxxxxxx

# At this point you can go into the aws console and boot a new instance from the ami you registered.
# to deregister the ami 
ec2-deregister  ami-xxxxxxxx
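
Booting from the new AMI can also be done from the command line instead of the console; a sketch (the key pair name and instance type are placeholders):

# launch one instance from the registered ami
ec2-run-instances ami-xxxxxxxx -n 1 -k YOUR_KEYPAIR -t m1.small -K /root/key.pem -C /root/cer.pem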

Tuesday, September 20, 2011

Ec2 metadata

In case you are looking for more info while you are in an EC2 instance, you can call the EC2 metadata API server from within the instance.

$ curl http://169.254.169.254/latest/meta-data/
ami-id
ami-launch-index
ami-manifest-path
block-device-mapping/
hostname
instance-action
instance-id
instance-type
kernel-id
local-hostname
local-ipv4
mac
network/
placement/
profile
public-hostname
public-ipv4
public-keys/
ramdisk-id
reservation-id
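
Each entry can be fetched directly by appending its name to the URL; for example:

# instance id and public ip of the current instance
$ curl http://169.254.169.254/latest/meta-data/instance-id
$ curl http://169.254.169.254/latest/meta-data/public-ipv4

# user-data (if any was passed at launch) lives one level up
$ curl http://169.254.169.254/latest/user-data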