Less Ghetto Log Parser for Website Hitcount/Downtime Analysis

Yesterday I created a proof of concept script, which basically goes off and identifies the hitcounts of a website, and can give a technician within a short duration of time (minutes instead of hours) exactly where hitcounts are coming from and where.

This is kind of a tradeoff, between a script that is automated, and one that is flexible.

The end goal is to provide a hitcount vs memory commit metric value. A NEW TYPE OF METRIC! HURRAH! (This is needed by the industry IMO).

And also would be nice to generate graphing and mean, average, and ranges, etc. So can provide output like ‘stat’ tool. Here is how I have progress

#!/bin/bash
#
# Author: 	Adam Bull, Cirrus Infrastructure, Rackspace LTD
# Date: 	March 20 2017
# Use:		This script automates the analysis of webserver logs hitcounts and
# 		provides a breakdown to indicate whether outages are caused by website visits
#		In correlation to memory and load avg figures


# Settings

# What logfile to get stats for
logfile="/var/log/httpd/google.com-access.log"

# What year month and day are we scanning for minute/hour hits
year=2017
month=Mar
day=9

echo "Total HITS: MARCH"
grep "/$month/$year" "$logfile" | wc -l;

# Hours
for i in 0{1..9} {10..24};

do echo "      > 9th March 2017, hits this $i hour";
grep "$day/$month/$year:$i" "$logfile" | wc -l;

        # break down the minutes in a nested visual way thats AWsome

# Minutes
for j in 0{1..9} {10..60};
do echo "                  >>hits at $i:$j";
grep "$day/$month/$year:$i:$j" "$logfile" | wc -l;
done

done

Thing is, after I wrote this, I wasn’t really happy, so I refactored it a bit more;

#!/bin/bash
#
# Author: 	Adam Bull, Cirrus Infrastructure, Rackspace LTD
# Date: 	March 20 2017
# Use:		This script automates the analysis of webserver logs hitcounts and
# 		provides a breakdown to indicate whether outages are caused by website visits
#		In correlation to memory and load avg figures


# Settings

# What logfile to get stats for
logfile="/var/log/httpd/someweb.biz-access.log"

# What year month and day are we scanning for minute/hour hits
year=2017
month=Mar
day=9

echo "Total HITS: $month"
grep "/$month/$year" "$logfile" | wc -l;

# Hours
for i in 0{1..9} {10..24};

do
hitsperhour=$(grep "$day/$month/$year:$i" "$logfile" | wc -l;);
echo "    > $day $month $year, hits this $ith hour: $hitsperhour"

        # break down the minutes in a nested visual way thats AWsome

# Minutes
for j in 0{1..9} {10..59};
do
hitsperminute=$(grep "$day/$month/$year:$i:$j" "$logfile" | wc -l);
echo "                  >>hits at $i:$j  $hitsperminute";
done

done

Now it’s pretty leet.. well, simple. but functional. Here is what the output of the more nicely refined script; I’m really satisfied with the tabulation.

[root@822616-db1 automation]# ./list-visits.sh
Total HITS: Mar
6019301
    > 9 Mar 2017, hits this  hour: 28793
                  >>hits at 01:01  416
                  >>hits at 01:02  380
                  >>hits at 01:03  417
                  >>hits at 01:04  408
                  >>hits at 01:05  385
^C

PHP5 newrelic agent not collecting data

So today I had a newrelic customer who was having issues after installing the newrelic php plugin. He couldn’t understand why it wasn’t collecting data. For it to collecting data you need to make sure newrelic-daemon process is running by using ps auxfwww | grep newrelic-daemon.

We check the process of the daemon is running

[root@rtd-production-1 ~]# ps -ef | grep newrelic-daemon
root     26007 18914  0 09:59 pts/0    00:00:00 grep newrelic-daemon

We check the status of the daemon process

[root@rtd-production-1 ~]# service newrelic-daemon status
newrelic-daemon is stopped...

Copy basic NewRelic configuration template to correct location

[root@rtd-production-1 ~]# cp /etc/newrelic/newrelic.cfg.template /etc/newrelic/newrelic.cfg

Start the daemon

[root@rtd-production-1 ~]# service newrelic-daemon start
Starting newrelic-daemon:                                  [  OK  ]

Site keeps on going down because of spiders

So a Rackspace customer was consistently having an issue with their site going down, even after the number of workers were increased. It looked like in this customers case they were being hit really hard by yahoo slurp, google bot, a href bot, and many many others.

So I checked the hour the customer was affected, and found that over that hour just yahoo slurp and google bot accounted for 415 of the requests. This made up like 25% of all the requests to the site so it was certainly a possibility the max workers were being reached due to spikes in traffic from bots, in parallel with potential spikes in usual visitors.

[root@www logs]#  grep '01/Mar/2017:10:' access_log | egrep -i 'www.google.com/bot.html|http://help.yahoo.com/help/us/ysearch/slurp' |  wc -l
415

It wasn’t a complete theory, but was the best with all the available information I had, since everything else had been checked. The only thing that remains is the number of retransmits for that machine. All in all it was a victory, and this was so awesome, I’m now thinking of making a tool that will do this in more automated way.

I don’t know if this is the best way to find google bot and yahoo bot spiders, but it seems like a good method to start.

Aaron Mehar’s CBS to VHD solution for Rackspace Cloud

Hey. So another one of my colleagues put together this really awesome article. Although I was aware this could be done, he’s done a really good job or putting together the procedures, of turning your CBS BFV (boot from (network) volume) disk into a VHD file.

Rackspace CBS disks works over iscsi and are presented via the network. The difference between instance store on the hypervisor, (utilized by cloud-server images), and the disk store on the CBS is that the CBS disk is not a VHD, but an disk presented over network via iscsi.

So, to take a VHD, or an equivalent cloud-server image snapshot, you need to image the disk manually, as well as convert it to VHD.

Taking an image of a volume is not possible, and would not be downloadable. However there are some workarounds that can be done.

*** Please NOTE ****
This is not supported, and we can not assist beyond these instructions. I could provide some clarity if required, however, my collegaues may not be able to help should I become unavailable.

If you just want the data, then you could just download the data to your local machine, however, if you a VHD to create a local VM, then the below instructions will achieve this.

Steps

Please take special care, making a mistake working with partitioner can wipe all your data

1. Shutdown the server
2. Clone the disk, by Starting a volume clone and start the server back up.
3. Attach the newly created clone to the server
4. create another new CBS volume of a slightly larger size (+5GB is OK)

Now that is done, we can image the disk. You will need to ensure you have the corrects disks. The second disk with data should be xvbd and the new CBS should be xvdc

Create partition and filesystem for xvdbc. Please see this guide: https://support.rackspace.com/how-to/prepare-your-cloud-block-storage-volume/

the image xvdb to xvdc

   dd if=/dev/xvdb of=/mnt/cbsvolume1/myimage.dd

The download the image to your workstation, and install VirtualBox, and run the below command

   VBoxManage convertfromraw myfile.dd myfile.vhd --format VHD

Please take special care, making a mistake working with partitioner can wipe all your data

All About NOVA and Xen Tools in Rackspace Cloud – why can’t I connect to my Windows server?

Why can’t I connect to my Rackspace Windows cloud-server, you ask? 2 important questions.

1. Is it a new build?
2. Is it using a custom image (a non rackspace base image).

(because the rackspace base images all have correct nova-agent and xen tools, so get networking information OK. But customer images don’t!). In the case you have run the below tests to see if nova-agent is running (or installed), you will need to install them.

Checking for the nova-agent and xe-guest-utilities

ps auxfwww | grep nova-agent
yum -qa xe-guest-utilities nova-agent
dpkg -l xe-guest-utilities nova-agent

Explanation and solution

Thanks for reaching out to us with your inquiry today. I’m glad to convey to you that I understand what the problem is with your cloud-server not being contactable.

Main reasons for breakage

The main reason why this is not working is most likely caused by some important pieces of software being missing. There is a piece of software called nova-agent, which is responsible for setting your cloud-servers IPV4 address, network subnet/mask, and ip routes, when it is first built. This is important, since the server image you built the server from, has different network details.

The rackspace build process giving networking detail to the VM is completely dependent on xe-guest-utilities and nova-agent

What has happened in this case, because the nova-agent wasn’t running on the cloud-server, the hypervisor software Rackspace use to automate cloud-server builds wasn’t able to contact the nova-agent running on your cloud-server, and therefore nova-agent wasn’t able to update the networking information. And hence, your not able to connect to it on it’s IPv4 address you are given at build time.

The steps to resolution: installing nova-agent and xen guest utilities
As such, nova-agent needs to be installed on the cloud-server you take the image from, it can be installed as follows:

https://community.rackspace.com/products/f/25/t/5694

Also nova-agent uses another piece of important software called xe-guest-utilities, or (Xen Tools) for your windows servers, this is an important ‘PV’ paravirtualization tools, responsible for seamless management of cloud-servers. Sorry that in this case it’s not working out seamlessly, but this can happen with images taken of servers which have had nova-agent disabled, uninstalled, or similar.

Upgrading the tools that nova-agent depends upon, can be installed by following the instructions at the following location:

https://support.rackspace.com/how-to/upgrade-citrix-xen-server-tools-for-windows-cloud-servers/

# Options of how to do this / Summary of Solution Steps

Naturally, you might be wondering how to achieve these changes, if you cannot RDP to the server. This is quite understandable, there are two ways to get this working;

Option 1) Manually install nova-agent on the current server you cannot access, then manually install the Xen Tools in the same way. This shall fix the OS on the server itself, and not the original image you built the server from. So it is important to create a new cloud-server image after performing these steps and us verifying tools + nova-agent installed correctly.

2) Manually install nova-agent on the source server you initially taken the image from, and install Xen Tools, then re-image the server, and then re-deploy. This should seamlesssly work each time on build with that image, provided the tools are installed. You will not need to recreate the image, since your fixing the problem on the cloud-server source that the original image was taken from.

I appreciate that these things are not 100% simple to get your head around and can be confusing for customers, I hope my explanation and summary makes this a little more painless to fix. Of course if you have additional questions, comments or concerns or don’t understand something I’ve said, please don’t hesitate to reach out to us, we are here to help!

Creating a proper Method of Retrieving, Sorting, and Parsing Rackspace CDN Access Logs

So, this has been rather a bane on the life which is lived as Adam Bull. Basically, a large customer of ours had 50+ CDN’s, and literally hundreds of gigabytes of Log Files. They were all in Rackspace Cloud Files, and the big question was ‘how do I know how busy my CDN is?’.

screen-shot-2016-11-07-at-12-41-30-pm

This is a remarkably good question, because actually, not many tools are provided here, and the customer will, much like on many other CDN services, have to download those logs, and then process them. But that is actually not easier either, and I spent a good few weeks (albeit when I had time), trying to figure out the best way to do this. I dabbled with using tree to display the most commonly used logs, I played with piwik, awstats, and many others such as goaccess, all to no avail, and even used a sophisticated AWK script from our good friends in Operations. No luck, nothing, do not pass go, or collect $200. So, I was actually forced to write something to try and achieve this, from start to finish. There are 3 problems.

1) how to easily obtain .CDN_ACCESS_LOGS from Rackspace Cloud Files to Cloud Server (or remote).
2) how to easily process these logs, in which format.
3) how to easily present these logs, using which application.

The first challenge was actually retrieving the files.

swiftly --verbose --eventlet --concurrency=100 get .CDN_ACCESS_LOGS --all-objects -o ./

Naturally to perform this step above, you will need a working, and setup swiftly environment. If you don’t know what swiftly, is or understand how to set up a swiftly envrionment, please see this article I wrote on the subject of deleting all files with swiftly (The howto explains the environment setup first! Just don’t follow the article to the end, and continue from here, once you’ve setup and installed swiftly)

Fore more info see:
https://community.rackspace.com/products/f/25/t/7190

Processing the Rackspace CDN Logs that we’ve downloaded, and organising them for further log processing
This required a lot more effort, and thought

The below script sits in the same folder as all of the containers

# ls -al 
total 196
drwxrwxr-x 36 root root  4096 Nov  7 12:33 .
drwxr-xr-x  6 root root  4096 Nov  7 12:06 ..
# used by my script
-rw-rw-r--  1 root root  1128 Nov  7 12:06 alldirs.txt

# CDN Log File containers as we downloaded them from swiftly Rackspace Cloud Files (.CDN_ACCESS_LOGS)
drwxrwxr-x  3 root root  4096 Oct 19 11:22 dev.demo.video.cdn..com
drwxrwxr-x  3 root root  4096 Oct 19 11:22 europe.assets.lon.tv
drwxrwxr-x  5 root root  4096 Oct 19 11:22 files.lon.cdn.lon.com
drwxrwxr-x  3 root root  4096 Oct 19 11:23 files.blah.cdn..com
drwxrwxr-x  5 root root  4096 Oct 19 11:24 files.demo.cdn..com
drwxrwxr-x  3 root root  4096 Oct 19 11:25 files.invesco.cdn..com
drwxrwxr-x  3 root root  4096 Oct 19 11:25 files.test.cdn..com
-rw-r--r--  1 root root   561 Nov  7 12:02 generate-report.sh
-rwxr-xr-x  1 root root  1414 Nov  7 12:15 logparser.sh

# Used by my script
drwxr-xr-x  2 root root  4096 Nov  7 12:06 parsed
drwxr-xr-x  2 root root  4096 Nov  7 12:33 parsed-combined
#!/bin/bash

# Author : Adam Bull
# Title: Rackspace CDN Log Parser
# Date: November 7th 2016

echo "Deleting previous jobs"
rm -rf parsed;
rm -rf parsed-combined

ls -ld */ | awk '{print $9}' | grep -v parsed > alldirs.txt


# Create Location for Combined File Listing for CDN LOGS
mkdir parsed

# Create Location for combined CDN or ACCESS LOGS
mkdir parsed-combined

# This just builds a list of the CDN Access Logs
echo "Building list of Downloaded .CDN_ACCESS_LOG Files"
sleep 3
while read m; do
folder=$(echo "$m" | sed 's@/@@g')
echo $folder
        echo "$m" | xargs -i find ./{} -type f -print > "parsed/$folder.log"
done < alldirs.txt

# This part cats the files and uses xargs to produce all the Log oiutput, before cut processing and redirecting to parsed-combined/$folder
echo "Combining .CDN_ACCESS_LOG Files for bulk processing and converting into NCSA format"
sleep 3
while read m; do
folder=$(echo "$m" | sed 's@/@@g')
cat "parsed/$folder.log" | xargs -i zcat {} | cut -d' ' -f1-10  > "parsed-combined/$folder"
done < alldirs.txt


# This part processes the Log files with Goaccess, generating HTML reports
echo "Generating Goaccess HTML Logs"
sleep 3
while read m; do
folder=$(echo "$m" | sed 's@/@@g')
goaccess -f "parsed-combined/$folder" -a -o "/var/www/html/$folder.html"
done < alldirs.txt

How to easily present these logs

I kind of deceived you with the last step. Actually, because I have already done it, with the above script. Though, you will naturally need to have an httpd installed, and a documentroot in /var/www/html, so make sure you install apache2:

yum install httpd awstats

De de de de de de da! da da!

screen-shot-2016-11-07-at-12-41-30-pm

Some little caveats:

Generating a master index.html file of all the sites


[root@cdn-log-parser-mother html]# pwd
/var/www/html
[root@cdn-log-parser-mother html]# ls -al | awk '{print $9}' | xargs -i echo " {}
" > index.html

I will expand the script to generate this automatically soon, but for now, leaving like this due to time constraints.

DISASTER RECOVERY! Exporting a Broken Cloud-server image VHD from Rackspace and attempting to recover data

Thanks to my colleague Marcin for thie guestmount tools protip.

I wrote a previous guide which explains how to download/export a Cloud server image VHD from Rackspace Cloud, which is failing to build. It might allow you to perform data recovery, even if the image can’t be booted. Which I’m guessing someone is going to run into sooner or later, and will be pleased to see this article, it will at least give you a best shot at reading the VHD and recovering it, since as you might know already, just because boot or kernel is broken, doesn’t mean that the data isn’t there!

# A better article to use if you want to download via commandline
https://community.rackspace.com/products/f/25/t/3583

# My article doing this thru a web-browser which might be useful too for some customers
https://community.rackspace.com/products/f/25/t/7089

Once the image gets downloaded to your new cloud instance you can use ‘libguestfs-tools’ package (same name on Ubuntu and CentOS) which contains tools necessary for mounting .vhd image files.

The command would be (read-only mode):

guestmount -a {image-name}.vhd -i --ro {mount-point}

Taking strace output from stderr and piping to other utilities

Well, this is a strange thing to do, but say you want to know how fast an application is processing data. how to tell? Enter strace, and… a bit of wc -l with the assistance of tee and 2>&1 proper redirection.

strace -p 9653 2>&1 | tee >(wc -l)

where 9653 is the process id (pid) and wc -l is the command you want to pipe to!

read(4, "2.74.2.34 - - [26/05/2015:15:15"..., 8192) = 8192
read(4, "o) Version/7.1.6 Safari/537.85.1"..., 8192) = 8192
read(4, "ident/6.0)\"\n91.233.154.8 - - [26"..., 8192) = 8192
^C1290

1290 lines in the output.. perfect, thats what I wanted to know, roughly how quickly this log parser is going thru my logs 😀

Checking Rackspace DNS API Rate Limits

so, you want to check your Rackspace DNS API rate Limits?

Here is how you do it.


#!/bin/bash

# Username used to login to control panel
USERNAME='mycloudusernamegoeshere'

# Find the APIKey in the 'account settings' part of the menu of the control panel
APIKEY='mycloudapikeygoeshere'

# Customer ID of the account (the numbers that are in the URL when you login to mycloud control panel)
CUSTOMER_ID="100101010"

# This section simply retrieves the TOKEN
TOKEN=`curl https://identity.api.rackspacecloud.com/v2.0/tokens -X POST -d '{ "auth":{"RAX-KSKEY:apiKeyCredentials": { "username":"'$USERNAME'", "apiKey": "'$APIKEY'" }} }' -H "Content-type: application/json" |  python -mjson.tool | grep -A5 token | grep id | cut -d '"' -f4`

# Retrieve Rate Limit detail for the account
curl "https://dns.api.rackspacecloud.com/v1.0/$CUSTOMER_ID/limits" -H "X-Auth-Token: $token" | python -m json.tool

Cheers &
Best wishes,
Adam