Automating Backups in Public Cloud using Cloud Files

Hey folks, I know it’s been a little while since I put an article together. However, I have been putting together a really detailed article explaining how to write bespoke backup systems for the Rackspace Community. It’s a proof of concept/demonstration/tutorial as opposed to a production application, but people looking to create custom cloud backup scripts may benefit from reading through it.

You can see the article at the below URL:

https://community.rackspace.com/products/f/25/t/7857

Checking File Integrity with Cloud Files After Uploading a File

So, as you may already be aware, I am working on a lightweight backup script called ‘obscene redundancy’: a redundant backup tool capable of pushing up to 18 replicas of your data to the Rackspace Cloud Files API service. It’s so redundant… it’s obscene redundancy.

For more details visit the project URL:
https://github.com/aziouk/obsceneredundancy/

Today I was discussing with a colleague that it’s all very well uploading your tar to Cloud Files, but wouldn’t you really like to know that the file you uploaded is completely identical, down to the number and order of its bits? Enter the Cloud Files ‘HEAD’ request and its Etag header: our MD5 friend.

What I did to improve the obscene redundancy script was quite simple:

# We define a variable that takes the 'Etag' (MD5 sum) value for the Cloud Files archive
cfmd5sum=$(swiftly --conf swiftly-configs/swiftly-${SHORT_REGION,,}.conf head \
"${BACKUP_DEST}/${FILE}" | grep -i Etag | awk '{print $2}')

# We define a variable that generates an 'MD5 sum' for the local file archive
localmd5sum=$(md5sum "$BACKUP_DIR"/"$FILE")

echo "Checking Data integrity of Cloud Files upload to $REGION"
echo "Cloud Files Archive MD5:  $cfmd5sum  ....... Local File Archive MD5: $localmd5sum"

# If these values differ, the upload cannot be trusted
if [[ "$cfmd5sum" != "$localmd5sum" ]];
then
echo "VALUES NOT EQUAL"
echo "$REGION CRC missing, in error, or NOT OK..."
else
echo "VALUES EQUAL"
echo "$REGION CRC OK..."
fi

After all this I found that the script wasn’t working properly… so I did some debugging, checking, first of all, the length of each variable.

   if [[ "$cfmd5sum" == "$localmd5sum" ]]; then
                        echo "VALUES EQUAL, (local md5sum length given first)"
                        echo "$localmd5sum"| wc -L
                        echo "$cfmd5sum"| wc -L


                        echo "$REGION CRC OK..."
                else
                        echo "VALUES NOT EQUAL"
                        echo "$localmd5sum"|wc -L
                        echo "$cfmd5sum"|wc -L
                        echo "$REGION CRC missing, in error, or NOT OK..."
                fi

The output showed me that the variable lengths were different. At this stage I’ve no idea why, but I will add updates here. I’m going to commit this to obsceneredundancy anyway, because the proof of concept is working and valid, as shown by the output of the script (i.e. the method is fine, it’s just the way the string is compared in the if statement; I suspect it is to do with special characters or \n characters, as I have had before). So, when I made this addition to the multi-dc-backup.sh script, the output looked like this:

Creating Container in LON for obsceneredundancy

LON: Backing up ...
Source: /var/www/ ---> Dest: cloudfiles://LON/obsceneredundancy/varwww-2016-07-06-6bd657e9-d268-4883-9f40-3859f690aadb.tar.gz

Checking Data integrity of Cloud Files upload to BACKUP_TO_LON
Cloud Files Archive MD5:  65147eb66f8bbeff03a229570b0a1be7  ....... Local File Archive MD5: 65147eb66f8bbeff03a229570b0a1be7  /var/backup/varwww-2016-07-06-6bd657e9-d268-4883-9f40-3859f690aadb.tar.gz
VALUES NOT EQUAL
107
32
BACKUP_TO_LON CRC missing, in error, or NOT OK...
lon: COMPLETED OK 15504796/15504796
ORD: Not backing up ...



Creating Container in IAD for obsceneredundancy

IAD: Backing up ...
Source: /var/www/ ---> Dest: cloudfiles://IAD/obsceneredundancy/varwww-2016-07-06-6bd657e9-d268-4883-9f40-3859f690aadb.tar.gz

Checking Data integrity of Cloud Files upload to BACKUP_TO_IAD
Cloud Files Archive MD5:  65147eb66f8bbeff03a229570b0a1be7  ....... Local File Archive MD5: 65147eb66f8bbeff03a229570b0a1be7  /var/backup/varwww-2016-07-06-6bd657e9-d268-4883-9f40-3859f690aadb.tar.gz
VALUES NOT EQUAL
107
32
BACKUP_TO_IAD CRC missing, in error, or NOT OK...
iad: COMPLETED OK 15504796/15504796
DFW: Not backing up ...

As we can see the 107 (localmd5size) and the 32 (cloudfilesmd5size) are different! I’ve no idea why, since when echoing the variables they look the same. I suspect gremlins and Trolls. A fresh head tomorrow will probably solve this in a few minutes!
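One thing worth checking with that fresh head: md5sum prints the checksum followed by the file path, and the local echo above does show the path tacked onto the hash, which would account rather neatly for the 107 versus 32 difference. A minimal sketch of the sort of fix I have in mind (not yet committed to the project, so treat it as an assumption) is just to strip everything after the first field:

# Keep only the checksum field from md5sum's "hash  filename" output
localmd5sum=$(md5sum "$BACKUP_DIR"/"$FILE" | awk '{print $1}')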

Cheers &
Best wishes,
Adam

Obscene Redundancy utilizing Rackspace Cloud Files

So, you may have noticed over the past weeks and months I have been a little bit quieter about the articles I have been writing. That’s mainly because I’ve been working on a new GitHub project which, although simple and lightweight, is actually rather outrageously powerful.

https://github.com/aziouk/obsceneredundancy

Imagine being able to take 15+ redundant replica copies of your files, across 5 or 6 different datacentres. It’s powered by the Rackspace Cloud Files API, but also has a lot of the flexibility of the Bourne Again Shell (BASH).

This was actually quite a neat achievement and I am pleased with the results. There are still some limitations of this redundant replica application, and there are a few bugs, but it is a great proof of concept which shows what you can do with the API both quickly and cheaply (ish). Using filesystems as a service will be the future, given some further innovation in worldwide network infrastructure, and it would only take a small breakthrough to rapidly alter the way that operating systems and machines boot and back up.

If you want to see the project and read the source code before I lay out and describe/explain the entire process of writing this software as well as how to deploy it with cron on linux, then you need wait no longer. Revision 1 alpha is now tested, ready and working in 5 different datacentres.

You can actually toggle which datacentres you wish to utilize as well, so it is slightly flexible. The only important consideration here is to understand that there are some limitations, such as a lack of de-duplication, and that it uses tars and swiftly instead of directly querying the API. Since uploading a tar file directly through the API is relatively simple, I will probably implement it that way, as I have before, and get rid of swiftly in future iterations. Even so, such a project is really ideal for learning more about BASH, cron, APIs and the programmatic, sequential automation of filesystems, utilizing functional programming and division of labour between workers.
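As a rough illustration of the toggle idea (the flag names here just echo the BACKUP_TO_LON style labels visible in the output earlier; check the repo for the real configuration and logic):

# Illustrative sketch only - the real flags and logic live in the obsceneredundancy repo
BACKUP_TO_LON=true
BACKUP_TO_ORD=false

if [ "$BACKUP_TO_LON" = "true" ]; then
        echo "LON: Backing up ..."
        # the swiftly put for the LON endpoint would run here
else
        echo "LON: Not backing up ..."
fi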

https://github.com/aziouk/obsceneredundancy

Test it (please note it will be a little bit buggy in different environments and there are no instructions yet):

git clone https://github.com/aziouk/obsceneredundancy

Cheers &

Best wishes,
Adam

Downloading / Backing up all Rackspace Cloud Files

Here’s a quick and dirty way to download your entire Rackspace Cloud Files container. This comes up a lot at work.

INSTALLING SWIFTLY

# Debian / Ubuntu systems
apt-get install python-pip
# CentOS and Redhat Systems
yum install python-pip
pip install swiftly
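A quick sanity check that the install worked is simply to ask swiftly for its usage text:

swiftly help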

Once you have installed swiftly, you will want to configure your swiftly client. This is also relatively easy.

CONFIGURING SWIFTLY

# Create a file in your 'home' directory. ~ is the root user's home directory
# if you are logged in as root on a unix server

touch ~/.swiftly.conf 

You will want to edit the file above:

pico ~/.swiftly.conf 

The file needs to look exactly like the text below:

[swiftly]
auth_user = yourmycloudusername
auth_key = yourapikey
auth_url = https://identity.api.rackspacecloud.com/v2.0
region = LON

To save in pico you type CTRL + O, and CTRL + X to exit.

You have now installed and configured swiftly. You should then be able to simply run the following command:

Running swiftly to download all containers/files on Rackspace Cloud Files

swiftly get --all-objects --output=mycloudfiles/
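If you only want a single container rather than your whole account, you should (as far as I recall; do check swiftly's help output) be able to name the container in the path:

swiftly get mycontainer --all-objects --output=mycloudfiles/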

This comes up a lot, I am sure that some people out there will appreciate this!

Using Rackspace Cloud Files, swiftly and cron to Backup Data to multiple data-centres cheaply

So, you have some really important data: so much so that 99.99% redundancy is not enough for you. One solution to this is to keep multiple copies in multiple datacentres. Most enterprise backup strategies have an on-site copy, an off-site copy, and an archival copy. What I’m going to show here is how to make 4 different copies of your data, in 4 different datacentres around the world. This provides very highly redundant storage and greatly reduces the likelihood of data loss. Although it costs a bit more, this kind of solution may be suitable for many small, medium and large businesses, depending, naturally, on the size of the data and the importance of redundancy.

You might not have many files to back up, perhaps a small CD's worth, in which case it will be very inexpensive. However, due to the way that Cloud Files is billed, copying data to Cloud Files costs money in bandwidth when writing from a server in London to Cloud Files in Sydney, Chicago or Dallas, for instance. So it's very important to consider the impact of bandwidth costs when utilizing the additional 3 Cloud Files endpoints that are not in the local datacentre region, which is essentially what we are doing in this guide.
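To make that concrete with a purely hypothetical figure (check the current Rackspace pricing page for real per-GB rates): at, say, $0.12 per GB of outbound bandwidth, pushing a 10 GB archive to the 3 remote regions would cost roughly 10 GB x 3 x $0.12 = $3.60 per run, on top of the storage itself.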

Set up swiftly

yum install python-devel python-pip -y
pip install swiftly eventlet 

Create your swiftly environments (setting the name for each file)

==> /root/.swiftly-dfw.conf <==
[swiftly]
auth_user = myusername
auth_key = censored
auth_url = https://identity.api.rackspacecloud.com/v2.0
region = dfw

==> /root/.swiftly-iad.conf <==
[swiftly]
auth_user = myusername
auth_key = censored
auth_url = https://identity.api.rackspacecloud.com/v2.0
region = iad

==> /root/.swiftly-ord.conf <==
[swiftly]
auth_user = myusername
auth_key = censored
auth_url = https://identity.api.rackspacecloud.com/v2.0
region = ord

==> /root/.swiftly-syd.conf <==
[swiftly]
auth_user = myusername
auth_key = censored
auth_url = https://identity.api.rackspacecloud.com/v2.0
region = syd
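If you'd rather not hand-edit four near-identical files, a quick loop along these lines will generate them (a sketch only; substitute your own username and API key):

# Generate one swiftly config per region - fill in your own credentials first
for region in dfw iad ord syd; do
cat > ~/.swiftly-${region}.conf <<EOF
[swiftly]
auth_user = myusername
auth_key = censored
auth_url = https://identity.api.rackspacecloud.com/v2.0
region = ${region}
EOF
done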

Create your Script

#!/bin/bash
# Adam Bull, Rackspace UK
# May 17, 2016

# This can be sequential or parallel; not sure which is better yet. Use & to run the uploads in parallel.
# This backs up the /documents directory and puts it in the 'managed_backup' Cloud Files container
# at the following 4 datacentres: DFW, IAD, ORD and SYD

swiftly --verbose --conf ~/.swiftly-dfw.conf --concurrency 100 put -i /documents /managed_backup
swiftly --verbose --no-snet --conf ~/.swiftly-iad.conf --concurrency 100 put -i /documents /managed_backup
swiftly --verbose --no-snet --conf ~/.swiftly-ord.conf --concurrency 100 put -i /documents /managed_backup
swiftly --verbose --no-snet --conf ~/.swiftly-syd.conf --concurrency 100 put -i /documents /managed_backup

Because the other 3 endpoints are in different datacentres, we can't use ServiceNet for them, so we pass the --no-snet option to swiftly for those uploads, as above.

Execute your script

chmod +x multibackup.sh
./multibackup.sh

This is obviously a basic system and script for taking backups, and it is not for production use (yet). It is an alpha project I started today. The cool thing is that it works, and quite nicely, although it is far from finished as a workable script.

Once the script is made, you can simply add it to crontab -e as you would usually. Make sure the user you run it as under cron has access to the .conf files in their home directory!
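For example, a nightly run at 2am would look something like this (the script and log paths are just illustrative; point them at wherever you keep multibackup.sh):

# Run the multi-datacentre backup every night at 02:00
0 2 * * * /root/multibackup.sh >> /var/log/multibackup.log 2>&1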

Delete All Cloud Backup Data from Cloud Files

Please note that performing the below command can be destructive.

TAKE CAUTION WHEN USING THIS COMMAND IT CAN DELETE EVERYTHING IF YOU DO SOMETHING WRONG!!!!

# swiftly --verbose --eventlet --concurrency=100 for "" --prefix z_DO_NOT_DELETE --output-names do delete "" --recursive --until-empty

This particular command *should* only remove the Cloud Files containers whose names start with z_DO_NOT_DELETE, which is where Cloud Backup stores its data. I have tested it and it appears to work correctly.
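Before running the destructive version, it may be worth checking what would match first. A simple non-destructive check (an assumption on my part rather than something I ran against the account above) is to list the account's containers and grep for the prefix:

# swiftly get | grep ^z_DO_NOT_DELETE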