Reverting a yum transaction & controlling auto-updates for yum-cron with excludes

Hey, so a customer was running yum-cron, the Redhat version of Canoninical’s unattended upgrades. An auto update for a package ran, which actually broke their backend.

# yum history info 172
Loaded plugins: replace, rhnplugin
This system is receiving updates from RHN Classic or Red Hat Satellite.
Transaction ID : 172
Begin time     : Sat May  6 05:45:29 2017
Begin rpmdb    : 742:11fcb243cc5701b9d2293d90cb4161e5edc34bb8
End time       :            05:45:31 2017 (2 seconds)
End rpmdb      : 742:0afbbf517315f18985b2d01d4c2e5250caf0afb5
User           : root <root>
Return-Code    : Success
Transaction performed with:
    Installed     rpm-4.11.3-21.el7.x86_64                @rhel-x86_64-server-7
    Installed     yum-3.4.3-150.el7.noarch                @rhel-x86_64-server-7
    Installed     yum-metadata-parser-1.1.4-10.el7.x86_64 @anaconda/7.1
    Installed     yum-rhn-plugin-2.0.1-6.1.el7_3.noarch   @rhel-x86_64-server-7
Packages Altered:
    Updated php56u-pecl-xdebug-2.5.1-1.ius.el7.x86_64 @rackspace-rhel-x86_64-server-7-ius
    Update                     2.5.3-1.ius.el7.x86_64 @rackspace-rhel-x86_64-server-7-ius
history info

The customer asked us to reverse the transaction, so I did, this was quite simple to do;

[root@web user]# yum history undo 172
Loaded plugins: replace, rhnplugin
This system is receiving updates from RHN Classic or Red Hat Satellite.
Undoing transaction 172, from Sat May  6 05:45:29 2017
    Updated php56u-pecl-xdebug-2.5.1-1.ius.el7.x86_64 @rackspace-rhel-x86_64-server-7-ius
    Update                     2.5.3-1.ius.el7.x86_64 @rackspace-rhel-x86_64-server-7-ius
Resolving Dependencies
--> Running transaction check
---> Package php56u-pecl-xdebug.x86_64 0:2.5.1-1.ius.el7 will be a downgrade
---> Package php56u-pecl-xdebug.x86_64 0:2.5.3-1.ius.el7 will be erased
--> Finished Dependency Resolution

Dependencies Resolved

============================================================================================================================================================================================================================================
 Package                                                  Arch                                         Version                                               Repository                                                                Size
============================================================================================================================================================================================================================================
Downgrading:
 php56u-pecl-xdebug                                       x86_64                                       2.5.1-1.ius.el7                                       rackspace-rhel-x86_64-server-7-ius                                       205 k

Transaction Summary
============================================================================================================================================================================================================================================
Downgrade  1 Package

Total download size: 205 k
Is this ok [y/d/N]: y
Downloading packages:
Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
php56u-pecl-xdebug-2.5.1-1.ius.el7.x86_64.rpm                                                                                                                                                                        | 205 kB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : php56u-pecl-xdebug-2.5.1-1.ius.el7.x86_64                                                                                                                                                                                1/2
  Cleanup    : php56u-pecl-xdebug-2.5.3-1.ius.el7.x86_64                                                                                                                                                                                2/2
  Verifying  : php56u-pecl-xdebug-2.5.1-1.ius.el7.x86_64                                                                                                                                                                                1/2
  Verifying  : php56u-pecl-xdebug-2.5.3-1.ius.el7.x86_64                                                                                                                                                                                2/2

Removed:
  php56u-pecl-xdebug.x86_64 0:2.5.3-1.ius.el7

Installed:
  php56u-pecl-xdebug.x86_64 0:2.5.1-1.ius.el7

Complete!

In CentOS 7 and I believe RedHat RHEL 7 as well, if you don’t want to disable yum-cron altogether by running a yum remove yum-cron, you could exclude the specific package, or use wildcards to exclude all of them like php* , http*, etc.

vi /etc/yum/yum-cron.conf.

If you wish to exclude some packages from auto-update mechanism, you’ll have to add an exclude line, at the bottom of the file, in the base section.

[base]
exclude = kernel* php* httpd*

etc, I hope that this is of some assist,

cheers &
Best wishes,
Adam

Magento Rewrite Woes … really woes

I had a customer this week that had some terrible rewrite woes with their magento site. They knew that a whole ton of their images were getting 404’s most likely because rewrite wasn’t getting to the correct filesystem path that the file resided. This was due to their cache being broken, and their second developer not creating proper rewrite rule.

As a sysadmin our job is not a development role, we are a support role, and in order to enable the developer to fix the problem, the developer needs to be able to see exactly what it is, enter the sysads task. I wrote this really ghetto script, which essentially hunts in the nginx error log for requests that failed with no such file, and then qualifies them by grepping for jpg file types. This is not a perfect way of doing it, however, it is really effective at identifying the broken links.

Then I have a seperate routine that strips the each of the file uri’s down to the filename, and locates the file on the filesystem, and matches the filename on the filesystem that the rewrite should be going to, as well as the incorrect path that the rewrite is presently putting the url to. See the script below:

#!/bin/bash

# Author: Adam Bull
# Company: Rackspace LTD, Hayes
# Purpose:
#          This customer has a difficulty with nginx rewriting to the incorrect file
# this script goes into the nginx error.log and finds all the images which have the broken rewrite rules
# then after it has identified the broken rewrite rule files, it searches for the correct file on the filesystem
# then correlates it with the necessary rewrite rule that is required
# this could potentially be used for in-place upgrade by developers
# to ensure that website has proper redirects in the case of bugs with the ones which exist.

# This script will effectively find all 404's and give necessary information for forming a rewrite rule, i.e. the request url, from nginx error.log vs the actual filesystem location
# on the hard disk that the request needs to go to, but is not being rewritten to file path correctly already

# that way this data could be used to create rewrite rules programmatically, potentially
# This is a work in progress


# These are used for display output
cat /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep 'No such file' | awk '{print "URL Request:",$21,"\nFilesystem destination missing:",$7"\n"}'
zcat /var/log/nginx/*error*.gz  | grep 'No such file' | awk '{print "URL Request:",$21,"\nFilesystem destination detected missing:",$7"\n"}'

# These below are used for variable population for locating actual file paths of missing files needed to determine the proper rewrite path destination (which is missing)
# we qualify this with only *.jpg files

cat /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep 'No such file' | awk '{print $7}' | sed 's/\"//g' |  sed 's/.*\///' | grep jpg > lost.txt
zcat /var/log/nginx/*error*.gz  | grep 'No such file' | awk '{print $7}' | sed 's/\"//g' |  sed 's/.*\///' | grep jpg >> lost.txt

# FULL REQUEST URL NEEDED AS WELL
cat /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep 'No such file' | awk '{print "http://mycustomerswebsite.com",$21}' | sed 's/\"//g' | grep jpg > lostfullurl.txt
zcat /var/log/nginx/*error*.gz  | grep 'No such file' | awk '{print "http://customerwebsite.com/",$21}' | sed 's/\"//g' | grep jpg >> lostfullurl.txt

# The below section is used for finding the lost files on filesystem and pairing them together in variable pairs
# for programmatic usage for rewrite rules


while true
do
  read -r f1 <&3 || break
  read -r f2 <&4 || break
  printf '\n\n'
  printf 'Found a broken link getting a 404 at : %s\n'
  printf "$f1\n"
  printf 'Locating the correct link of the file on the filesystem: %s\n'
        find /var/www/magento | grep $f2
done 3<lostfullurl.txt 4<lost.txt

I was particularly proud of the last section which uses a ‘dual loop for two input files’ in a single while statement, allowing me to achieve the descriptions above.

Output is in the form of:


Found a broken link getting a 404 at :
http://customerswebsite.com/media/catalog/product/cache/1/image/800x700/9df78eab33525d08d6e5fb8d27136e95/b/o/image-magick-file-red.jpg
Locating the correct link of the file on the filesystem:
/var/www/magento/media/catalog/product/b/o/image-magick-file-red.jpg

As you can see the path is different on the filesystem to the url that the rewrite is putting the request to, hence the 404 this customer is getting.

This could be a really useful script, and, I see no reason why the script could not generate the rewrite rules programatically from the 404 failures it finds, it could actually create rules that are necessary to fix the problem. Now, this is not an ideal fix, however the script will allow you to have an overview either to fix this properly as a developer, or as a sysadmin to patch up with new rewrite rules.

I’m really proud of this one, even though not everyone may see a use for it. There really really is, and this customer is stoked, think of it like this, how can a developer fix it if he doesn’t have a clear idea of the things that are broken, and this is the sysads job,

Cheers &
Best wishes,
Adam

Less Ghetto Log Parser for Website Hitcount/Downtime Analysis

Yesterday I created a proof of concept script, which basically goes off and identifies the hitcounts of a website, and can give a technician within a short duration of time (minutes instead of hours) exactly where hitcounts are coming from and where.

This is kind of a tradeoff, between a script that is automated, and one that is flexible.

The end goal is to provide a hitcount vs memory commit metric value. A NEW TYPE OF METRIC! HURRAH! (This is needed by the industry IMO).

And also would be nice to generate graphing and mean, average, and ranges, etc. So can provide output like ‘stat’ tool. Here is how I have progress

#!/bin/bash
#
# Author: 	Adam Bull, Cirrus Infrastructure, Rackspace LTD
# Date: 	March 20 2017
# Use:		This script automates the analysis of webserver logs hitcounts and
# 		provides a breakdown to indicate whether outages are caused by website visits
#		In correlation to memory and load avg figures


# Settings

# What logfile to get stats for
logfile="/var/log/httpd/google.com-access.log"

# What year month and day are we scanning for minute/hour hits
year=2017
month=Mar
day=9

echo "Total HITS: MARCH"
grep "/$month/$year" "$logfile" | wc -l;

# Hours
for i in 0{1..9} {10..24};

do echo "      > 9th March 2017, hits this $i hour";
grep "$day/$month/$year:$i" "$logfile" | wc -l;

        # break down the minutes in a nested visual way thats AWsome

# Minutes
for j in 0{1..9} {10..60};
do echo "                  >>hits at $i:$j";
grep "$day/$month/$year:$i:$j" "$logfile" | wc -l;
done

done

Thing is, after I wrote this, I wasn’t really happy, so I refactored it a bit more;

#!/bin/bash
#
# Author: 	Adam Bull, Cirrus Infrastructure, Rackspace LTD
# Date: 	March 20 2017
# Use:		This script automates the analysis of webserver logs hitcounts and
# 		provides a breakdown to indicate whether outages are caused by website visits
#		In correlation to memory and load avg figures


# Settings

# What logfile to get stats for
logfile="/var/log/httpd/someweb.biz-access.log"

# What year month and day are we scanning for minute/hour hits
year=2017
month=Mar
day=9

echo "Total HITS: $month"
grep "/$month/$year" "$logfile" | wc -l;

# Hours
for i in 0{1..9} {10..24};

do
hitsperhour=$(grep "$day/$month/$year:$i" "$logfile" | wc -l;);
echo "    > $day $month $year, hits this $ith hour: $hitsperhour"

        # break down the minutes in a nested visual way thats AWsome

# Minutes
for j in 0{1..9} {10..59};
do
hitsperminute=$(grep "$day/$month/$year:$i:$j" "$logfile" | wc -l);
echo "                  >>hits at $i:$j  $hitsperminute";
done

done

Now it’s pretty leet.. well, simple. but functional. Here is what the output of the more nicely refined script; I’m really satisfied with the tabulation.

[root@822616-db1 automation]# ./list-visits.sh
Total HITS: Mar
6019301
    > 9 Mar 2017, hits this  hour: 28793
                  >>hits at 01:01  416
                  >>hits at 01:02  380
                  >>hits at 01:03  417
                  >>hits at 01:04  408
                  >>hits at 01:05  385
^C

All About NOVA and Xen Tools in Rackspace Cloud – why can’t I connect to my Windows server?

Why can’t I connect to my Rackspace Windows cloud-server, you ask? 2 important questions.

1. Is it a new build?
2. Is it using a custom image (a non rackspace base image).

(because the rackspace base images all have correct nova-agent and xen tools, so get networking information OK. But customer images don’t!). In the case you have run the below tests to see if nova-agent is running (or installed), you will need to install them.

Checking for the nova-agent and xe-guest-utilities

ps auxfwww | grep nova-agent
yum -qa xe-guest-utilities nova-agent
dpkg -l xe-guest-utilities nova-agent

Explanation and solution

Thanks for reaching out to us with your inquiry today. I’m glad to convey to you that I understand what the problem is with your cloud-server not being contactable.

Main reasons for breakage

The main reason why this is not working is most likely caused by some important pieces of software being missing. There is a piece of software called nova-agent, which is responsible for setting your cloud-servers IPV4 address, network subnet/mask, and ip routes, when it is first built. This is important, since the server image you built the server from, has different network details.

The rackspace build process giving networking detail to the VM is completely dependent on xe-guest-utilities and nova-agent

What has happened in this case, because the nova-agent wasn’t running on the cloud-server, the hypervisor software Rackspace use to automate cloud-server builds wasn’t able to contact the nova-agent running on your cloud-server, and therefore nova-agent wasn’t able to update the networking information. And hence, your not able to connect to it on it’s IPv4 address you are given at build time.

The steps to resolution: installing nova-agent and xen guest utilities
As such, nova-agent needs to be installed on the cloud-server you take the image from, it can be installed as follows:

https://community.rackspace.com/products/f/25/t/5694

Also nova-agent uses another piece of important software called xe-guest-utilities, or (Xen Tools) for your windows servers, this is an important ‘PV’ paravirtualization tools, responsible for seamless management of cloud-servers. Sorry that in this case it’s not working out seamlessly, but this can happen with images taken of servers which have had nova-agent disabled, uninstalled, or similar.

Upgrading the tools that nova-agent depends upon, can be installed by following the instructions at the following location:

https://support.rackspace.com/how-to/upgrade-citrix-xen-server-tools-for-windows-cloud-servers/

# Options of how to do this / Summary of Solution Steps

Naturally, you might be wondering how to achieve these changes, if you cannot RDP to the server. This is quite understandable, there are two ways to get this working;

Option 1) Manually install nova-agent on the current server you cannot access, then manually install the Xen Tools in the same way. This shall fix the OS on the server itself, and not the original image you built the server from. So it is important to create a new cloud-server image after performing these steps and us verifying tools + nova-agent installed correctly.

2) Manually install nova-agent on the source server you initially taken the image from, and install Xen Tools, then re-image the server, and then re-deploy. This should seamlesssly work each time on build with that image, provided the tools are installed. You will not need to recreate the image, since your fixing the problem on the cloud-server source that the original image was taken from.

I appreciate that these things are not 100% simple to get your head around and can be confusing for customers, I hope my explanation and summary makes this a little more painless to fix. Of course if you have additional questions, comments or concerns or don’t understand something I’ve said, please don’t hesitate to reach out to us, we are here to help!

Adding nodes and Updating nodes behind a Cloud Load Balancer

I have succeeded in putting together a basic script documenting exactly how API works and for adding node(s), listing the nods behind the LB, as well as updating the nodes (such as DRAINING, DISABLED, ENABLED).

Use update node to set one of your nodes to gracefully drain (not accept new connections, wait for present connections to die). Naturally, you will want to put the secondary server in behind the load balancer first, with addnode.sh.

Once new node is added as enabled, set the old node to ‘DRAINING’. This will gracefully switch over the server.

# List Load Balancers

#!/bin/bash

USERNAME='yourmycloudusernamegoeshere'
APIKEY='apikeygoeshere'
LB_ID='157089'
CUSTOMER_ID='10017858'

TOKEN=`curl https://identity.api.rackspacecloud.com/v2.0/tokens -X POST -d '{ "auth":{"RAX-KSKEY:apiKeyCredentials": { "username":"'$USERNAME'", "apiKey": "'$APIKEY'" }} }' -H "Content-type: application/json" |  python -mjson.tool | grep -A5 token | grep id | cut -d '"' -f4`



curl -v -H "X-Auth-Token: $TOKEN" -H "content-type: application/json" -X GET "https://lon.loadbalancers.api.rackspacecloud.com/v1.0/$CUSTOMER_ID/loadbalancers/$LB_ID"

#

# Add Node(s) addnode.sh

#!/bin/bash

USERNAME='yourmycloudusernamegoeshere'
APIKEY='apikeygoeshere'
LB_ID='157089'
CUSTOMER_ID='10017858'

TOKEN=`curl https://identity.api.rackspacecloud.com/v2.0/tokens -X POST -d '{ "auth":{"RAX-KSKEY:apiKeyCredentials": { "username":"'$USERNAME'", "apiKey": "'$APIKEY'" }} }' -H "Content-type: application/json" |  python -mjson.tool | grep -A5 token | grep id | cut -d '"' -f4`

# Add Node
curl -v -H "X-Auth-Token: $TOKEN" -d @addnode.json -H "content-type: application/json" -X POST "https://lon.loadbalancers.api.rackspacecloud.com/v1.0/$CUSTOMER_ID/loadbalancers/$LB_ID/nodes"



## 

For the addnode script you require a file, called addnode.json
that file must contain the snet ip's you wish to add

#
# addnode.json

{"nodes": [
        {
            "address": "10.0.0.1",
            "port": 80,
            "condition": "ENABLED",
            "type":"PRIMARY"
        }
    ]
}

##

##

# updatenode.sh

#!/bin/bash

USERNAME='yourmycloudusernamegoeshere'
APIKEY='apikeygoeshere'
LB_ID='157089'
CUSTOMER_ID='100101010'
NODE_ID=719425

TOKEN=`curl https://identity.api.rackspacecloud.com/v2.0/tokens -X POST -d '{ "auth":{"RAX-KSKEY:apiKeyCredentials": { "username":"'$USERNAME'", "apiKey": "'$APIKEY'" }} }' -H "Content-type: applic

# Update Node

curl -v -H "X-Auth-Token: $TOKEN" -d @updatenode.json -H "content-type: application/json" -X PUT "https://lon.loadbalancers.api.rackspacecloud.com/v1.0/$CUSTOMER_ID/loadbalancers/$LB_ID/nodes/$NODE_ID"

##

##

## updatenode.json

{"node":{
            "condition": "DISABLED",
            "type":"PRIMARY"
        }
}

Naturally, you will be able to change condition to ENABLED, DISABLED, or DRAINING.

I recommend to use DRAINING, since it will gracefully remove the cloud-server, and any existing connections will be waited on, before removing the server from LB.

Automating Backups in Public Cloud using Cloud Files

Hey folks, I know it’s been a little while since I put an article together. However I have been putting together a really article explaining how to write bespoke backup systems for the Rackspace Community. It’s a proof of concept/demonstration/tutorial as opposed to a production application. However people looking to create custom cloud backup scripts may benefit from the experience of reading thru it.

You can see the article at the below URL:

https://community.rackspace.com/products/f/25/t/7857

Using Nova and Supernova to manage Firewall IP access lists, automation & more

So, a customer today reached out to us asking if Rackspace provided the entire infrastructure IP address ranges in use on cloud. The answer is, no. However, that doesn’t mean that making your firewall rules, or autoscale automation need to be painful.

In fact, Rackspace Cloud utilizes Openstack which fully supports API calls which will easily be able to provide this detail in just a few simple short steps. To do this you require nova to be installed, this is really relatively easy to install, and instructions for installing it can be found here;

https://support.rackspace.com/how-to/installing-python-novaclient-on-linux-and-mac-os/

Once you have installed nova, it’s simply a case of making sure you set these 4 lines correctly in your .bash_profile

OS_USERNAME=mycloudusernamegoeshere
OS_TENANT_NAME=yourrackspaceaccountnumbergoeshereusuallysomethinglike1010101010
OS_AUTH_SYSTEM=rackspace
OS_PASSWORD=apikeygoeshere
OS_AUTH_URL=https://identity.api.rackspacecloud.com/v2.0/
OS_REGION_NAME=LON
OS_NO_CACHE=1
export OS_USERNAME OS_TENANT_NAME OS_AUTH_SYSTEM OS_PASSWORD OS_AUTH_URL OS_REGION_NAME OS_NO_CACHE

OS_USERNAME is your mycloud login username (normally the primary user).
OS_TENANT_NAME is your Customer ID, it’s the number that appears in the URL of your control panel link, see below picture for illustration

Screen Shot 2016-08-10 at 2.45.05 PM

OS_PASSWORD is a bit misleading, this is actually where your apikey goes , but I think it’s possible to authenticate using your control panel password too, don’t do that for security reasons.

OS_REGION_NAME is pretty self explanatory, this is simply the region that you would like to list cloud-server IP’s in or rather, the region that you wish to perform NOVA API calls.

Making the API call using the nova API wrapper

# supernova lon list --tenant 100010101 --fields accessIPv4,name
[SUPERNOVA] Running nova against lon...
+--------------------------------------+-----------------+-----------+
| ID                                   | accessIPv4      | Name      |
+--------------------------------------+-----------------+-----------+
| 7e5a7f99-60ae-4c28-b2b8              | 1.1.1.1  |  xapp      |
| 94747603-812d-4594-850b              | 1.1.1.1  |   rabbit2   |
| d5b318aa-0fa2-4269-ae00              | 1.1.1.1  |   elastic5  |
| 6c1d8d33-ae5e-44be-b9f0              | 1.1.1.1  | | elastic6  |
| 9f79a7dc-fd19-4f8f-9c26              |1.1.1.1   | | elastic3  |
| 05b1c52b-6ced-4db0-8af2              | 11.1.1.1 | | elastic1  |
| c8302366-f2f9-4c36-8f7a              | 1.1.1.1  | | app5      |
| b159cd07-8e68-49bc-83ee              | 1.1.1.1  | | app6      |
| f1f31eef-97c6-4c68-b01a              | 1.1.1.1  | | ruby1     |
| 64b7f0fd-8f2f-4d5f-8f89              | 1.1.1.1  | | build3    |
| e320c051-b5cf-473a-9f96              | 1.1.1.1  |   mysql2    |
| 4fddd022-59a8-4502-bf6e              | 1.1.1.1  | | mysql1    |
| c9ad6951-f5f9-4351-b31d              | 1.1.1.1  | | worker2   |
+--------------------------------------+-----------------+-----------+

This is pretty useful for managing autoscale permissions if you need to make sure your corporate network can be connected to from your cloud-servers when new cloud-servers with new IP are built out. considerations like this are really important when putting together a solution. The nice thing is the tools are really quite simple and flexible. If I wanted I could have pulled out detail for servicenet instead. I hope this helps make some folks lives a bit easier and works to demystify API to others that haven’t had the opportunity to use it.

You are probably wondering though, what field names can I use? a nova show will reveal this against one of your server UUID’s

# supernova lon show someuuidgoeshere
+-------------------------------------+------------------------------------------------------------------+
| Property                            | Value                                                            |
+-------------------------------------+------------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                           |
| OS-EXT-SRV-ATTR:host                | censored                                                   |
| OS-EXT-SRV-ATTR:hypervisor_hostname | censored                                                 |
| OS-EXT-SRV-ATTR:instance_name       | instance-734834278-sdfdsfds-                   |
| OS-EXT-STS:power_state              | 1                                                                |
| OS-EXT-STS:task_state               | -                                                                |
| OS-EXT-STS:vm_state                 | active                                                           |
| censorednet network                 | censored                                                     |
| accessIPv4                          | censored                                                 |
| accessIPv6                          | censored                      |
| created                             | 2015-12-11T14:12:08Z                                             |
| flavor                              | 15 GB I/O v1 (io1-15)                                            |
| hostId                              | 860...         |
| id                                  | 9f79a7dc-fd19-4f8f-9c26-72a335ed2be8                             |
| image                               | Debian 8 (Jessie) (PVHVM) (cf16c435-7bed-4dc3-b76e-57b09987866d) |
| metadata                            | {"build_config": "", "rax_service_level_automation": "Complete"} |
| name                                | elastic3                                                         |
| private network                     |                                                 |
| progress                            | 100                                                              |
| public network                      |          |
| status                              | ACTIVE                                                           |
| tenant_id                           |                                                    |
| updated                             | 2016-02-27T09:30:20Z                                             |
| user_id                             |                             |
+-------------------------------------+------------------------------------------------------------------+

I censored some of the fields.. but you can see all of the column names, so if you wanted to see metadata and progress only, with the server uuid and server name.



nova list --fields name, metadata, progress

This could be pretty handy for detecting when a process has finished building, or detecting once automation has completed. The possibilities with API are quite endless. API is certainly the future, and, there is no reason why, in the future, people won't be building and deploying websites thru API only, and some sophisticated UI wrapper like NOVA.

Admittedly, this is very far away, but that should be what the future technology will be made of, stuff like LAMBDA, serverless architecture, will be the future.

Adding some excludes for Lsyncd

A customer was having some issues with their syncing, as was shown by their inotify

Error: Terminating since out of inotify watches.
Consider increasing /proc/sys/fs/inotify/max_user_watches

Fix was quite simple, to remove other folders from sync that aren’t necessary.

Adding this line to the /etc/lsyncd.conf

excludeFrom="/etc/lsyncd-excludes.txt",

And creating the ‘excludes’ file for LsyncD, i.e. what folders you want to ignore, in this case we wanted to ignore old httpdocs.OLD backup.

# cat /etc/lsyncd-excludes.txt
somewebsite.com/httpdocs.OLD/

A shockingly simple fix.

Please note that the path in lsyncd-excludes.txt is determined by the path in lsyncd. (do not give full path, give relative path inside the parent). It was a simple fix.

Preventing /etc/resolv.conf reset on startup/boot

Today, a customer approached us after a Host Server Down complaining that, although the server is up again their website and application were down & not working. Even though the server was online and functioning correctly.

The customer discovered that the source of the issue was that there /etc/resolv.conf is blank, this means that they will not be able to resolve DNS A/PTR/CNAME record hostnames into a resolved IP. This is called hostname to IP resolution. Its means that if /etc/resolv.conf is blank and the customer uses hostnames in their calls, such a failure will break the connectivity due to failure to resolve to IP to communicate on the TCP stack.

There is actually a very simple way to prevent the /etc/resolv.conf file from being changed. But first, it’s important to understand why /etc/resolv.conf is being reset.

On All Rackspace cloud-servers there is a process called nova-agent, and when the server starts up, the /etc/resolv.conf file will be reset along with the networking configuration. This happens each time your server is restarted and is used to set new networking details, specifically if you take an image and build server on a new ip address or if your server is live-migrated to a new host, it makes sure on the next reboot it comes up with correct networking detail transparently. However this can cause some issues, such as in this case with the /etc/resolv.conf file. Fortunately there are some novel ways of preventing your /etc/resolv.conf being modified after you have added the correct nameservers you desire to it.

You can use the chattr immutable file setting to stop processes from modifying it after you have made the changes to your resolv.conf that are desired;

Set Immutable File

chattr +i /etc/resolv.conf 

Un-set Immutable File

chattr -i /etc/resolv.conf 

This /etc/resolv.conf issue is a common problem, however using immutable file flag and chattr should prevent it from being changed ever again.

Simple way to perform a body check on a website

So, I was testing with curl today and I know that it’s possible to direct to /dev/null to suppress the page. But that’s not very handy if you are checking whether html page loads, so I came up with some better body checks to use.

A Basic body check using wc -l to count the lines of the site

 time curl https://www.google.com/ > 1; echo "non zero indicates server up and served content of n lines"; cat 1 | wc -l
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  167k    0  167k    0     0  79771      0 --:--:--  0:00:02 --:--:-- 79756

real	0m2.162s
user	0m0.042s
sys	0m0.126s
non zero indicates server up and served content of n lines
2134

A body check for Google analytics

$ time curl https://www.groundworkjobs.com/ > 1; echo "Checking for google analytics html elements string"; cat 1 | grep "www.google-analytics.com/analytics.js"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  167k    0  167k    0     0  76143      0 --:--:--  0:00:02 --:--:-- 76152

real	0m2.265s
user	0m0.042s
sys	0m0.133s
Checking for google analytics html elements string
				})(window,document,'script','//www.google-analytics.com/analytics.js','ga');

Such commands might be useful when troubleshooting a cluster for instance, where one server shows more up to date versions, (different number of lines). There’s probably better way to do this with ls and awk and use the html filesize, since number of lines wouldn’t be so accurate.

Check Filesize from request

$ time curl https://www.groundworkjobs.com/ > 1; var=$(ls -al 1 | awk '{print $5}') ; echo "Page size is: $var kB"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  167k    0  167k    0     0  79467      0 --:--:--  0:00:02 --:--:-- 79461

real	0m2.170s
user	0m0.048s
sys	0m0.111s
Page size is: 171876 kB

Pretty simple.. but you could take the oneliner even further… populate a variable called $var with the filesize using ls and awk , and then use an if statement to check that var is not 0, indicating the page is answering positively, or alternatively not answering at all.

Check Filesize and populate a variable with the filesize, then validate variable

$ time curl https://www.groundworkjobs.com/ > 1; var=$(ls -al 1 | awk '{print $5}') ; echo "Page size is: $var kB"; if [ "$var" -gt 0 ] ; then echo "The filesize was greater than 0, which indicates box is up but may be giving an error page"; fi
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  167k    0  167k    0     0  78915      0 --:--:--  0:00:02 --:--:-- 78950

real	0m2.185s
user	0m0.041s
sys	0m0.132s
Page size is: 171876 kB
The filesize was greater than 0, which indicates box is up but may be giving an error page

The second exercise is not particularly useful or practical as a means of testing, since if the site was timing out the script would take ages to reply and make the whole test pointless, but as a learning exercise being able to assemble one liners on the fly like this is an enjoyable, rewarding and useful investment of time and effort. Understanding such things are the fundamentals of automating tasks. In this case with output filtering, variable creation, and subsequent validation logic. It’s a simple test, but the concept is exactly the same for any advanced automation procedure too.