Rapid Troubleshooting of Mail Not Arriving at its Destination

Today I had a customer whose mail was not arriving at its destination. In my customer's case, they knew the mail was arriving at the destination but going into the spam folder, which is most likely due to blacklisting. However, some ISPs or clients don't know the mail is reaching the spam folder at the other end, and since the most common causes are a missing SPF (Sender Policy Framework), MX or DKIM record, it is possible to rapidly check the DNS for each using dig, if the sender domain is known.
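For example, assuming the sender domain is example.com and the mail server IP is 203.0.113.10 (both placeholders), a quick set of dig checks might look something like this; note the DKIM selector varies per setup, so 'default' is just a guess:

# MX records for the sending domain
dig +short MX example.com

# SPF record (published as TXT)
dig +short TXT example.com | grep -i spf

# DKIM key (replace 'default' with whatever selector the mailer actually uses)
dig +short TXT default._domainkey.example.com

# PTR (reverse DNS) for the sending mail server's IP
dig +short -x 203.0.113.10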

To check for IP blacklistings on a mail server, use its IP in one of the many blacklist checkers;

https://mxtoolbox.com/SuperTool.aspx?action=blacklist

In other cases the mail might be bouncing because of the PTR, MX or DKIM records, and not even getting into the inbox. You can see that on the sending mail server using a simple grep command;

cat /var/log/maillog | grep -i status=bounced

You probably want to save the output to a file as well

cat /var/log/maillog | grep -i status=bounced > bouncedemail.txt 

If you wanted to know which domains bounced email (assuming you've already ensured all sending domains are correctly configured via DNS)..

[root@api ~]# cat /var/log/maillog | grep -i status=bounced | awk '{print $7}'
to=<pi.magnum@generationgame.com>,
to=<john.partridge@boeeb.com>,

You could use sed to extract which domains failed..
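Building on the awk output above, something like this might do it, counting bounces per destination domain (just a sketch):

cat /var/log/maillog | grep -i status=bounced | awk '{print $7}' | sed 's/.*@//;s/>,*$//' | sort | uniq -c | sort -rn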

Then you could use whois against the domains to reach out to the email contact with some automation that explains 'we've checked DKIM, MX and SPF and all are configured correctly, and we believe this is an error at your end, etc, blah blah'..

Such a thing would be a good idea to implement for some large email providers, and I’m sure you could automate the DNS checking as well. You could likely automate the whole thing, just by watching the logs and DNS records, and some intelligent grep and awking.

Not bad.

Enabling MySQL Slow Query Logs

If you're seeing very long page load times and have already checked your application, it always pays to check how the database performs when interacting with the application, whether they are on the same or separate servers, as this significantly affects the way your application will run.


mysql> SET GLOBAL slow_query_log = 'ON' ;
Query OK, 0 rows affected (0.01 sec)

mysql> SET GLOBAL slow_query_log_file = '/slow_query_logs/slow_query_logs.txt';
Query OK, 0 rows affected (0.00 sec)

mysql> SET GLOBAL long_query_time = 5;
Query OK, 0 rows affected (0.00 sec)
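Bear in mind that SET GLOBAL changes only survive until the next MySQL restart, so it's worth verifying them and, if you want them to persist, adding the equivalent options to my.cnf. A quick sketch (the log file path is just the example used above):

mysql> SHOW VARIABLES LIKE 'slow_query_log%';
mysql> SHOW VARIABLES LIKE 'long_query_time';

And in /etc/my.cnf, under the [mysqld] section:

slow_query_log      = 1
slow_query_log_file = /slow_query_logs/slow_query_logs.txt
long_query_time     = 5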

Less Ghetto Log Parser for Website Hitcount/Downtime Analysis

Yesterday I created a proof of concept script, which basically goes off and identifies the hitcounts of a website, and can show a technician within a short duration of time (minutes instead of hours) exactly where and when the hits are coming from.

This is kind of a tradeoff, between a script that is automated, and one that is flexible.

The end goal is to provide a hitcount vs memory commit metric value. A NEW TYPE OF METRIC! HURRAH! (This is needed by the industry IMO).

It would also be nice to generate graphs and the mean, average, ranges, etc., so it can provide output like the 'stat' tool. Here is how I have progressed

#!/bin/bash
#
# Author: 	Adam Bull, Cirrus Infrastructure, Rackspace LTD
# Date: 	March 20 2017
# Use:		This script automates the analysis of webserver logs hitcounts and
# 		provides a breakdown to indicate whether outages are caused by website visits
#		In correlation to memory and load avg figures


# Settings

# What logfile to get stats for
logfile="/var/log/httpd/google.com-access.log"

# What year month and day are we scanning for minute/hour hits
year=2017
month=Mar
day=9

echo "Total HITS: MARCH"
grep "/$month/$year" "$logfile" | wc -l;

# Hours
for i in 0{0..9} {10..23};

do echo "      > 9th March 2017, hits this $i hour";
grep "$day/$month/$year:$i" "$logfile" | wc -l;

        # break down the minutes in a nested visual way thats AWsome

# Minutes
for j in 0{0..9} {10..59};
do echo "                  >>hits at $i:$j";
grep "$day/$month/$year:$i:$j" "$logfile" | wc -l;
done

done

Thing is, after I wrote this, I wasn’t really happy, so I refactored it a bit more;

#!/bin/bash
#
# Author: 	Adam Bull, Cirrus Infrastructure, Rackspace LTD
# Date: 	March 20 2017
# Use:		This script automates the analysis of webserver logs hitcounts and
# 		provides a breakdown to indicate whether outages are caused by website visits
#		In correlation to memory and load avg figures


# Settings

# What logfile to get stats for
logfile="/var/log/httpd/someweb.biz-access.log"

# What year month and day are we scanning for minute/hour hits
year=2017
month=Mar
day=9

echo "Total HITS: $month"
grep "/$month/$year" "$logfile" | wc -l;

# Hours
for i in 0{0..9} {10..23};

do
hitsperhour=$(grep "$day/$month/$year:$i" "$logfile" | wc -l;);
echo "    > $day $month $year, hits this $ith hour: $hitsperhour"

        # break down the minutes in a nested visual way thats AWsome

# Minutes
for j in 0{0..9} {10..59};
do
hitsperminute=$(grep "$day/$month/$year:$i:$j" "$logfile" | wc -l);
echo "                  >>hits at $i:$j  $hitsperminute";
done

done

Now it's pretty leet.. well, simple, but functional. Here is the output of the more nicely refined script; I'm really satisfied with the tabulation.

[root@822616-db1 automation]# ./list-visits.sh
Total HITS: Mar
6019301
    > 9 Mar 2017, hits in hour 01: 28793
                  >>hits at 01:01  416
                  >>hits at 01:02  380
                  >>hits at 01:03  417
                  >>hits at 01:04  408
                  >>hits at 01:05  385
^C
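To edge towards the 'stat'-style output mentioned earlier, the per-minute counts can be piped through awk to get a minimum, maximum and mean. A rough sketch, assuming the refactored script is saved as list-visits.sh as in the run above:

./list-visits.sh | awk '/>>hits at/ {
    n++; sum += $NF;
    if (min == "" || $NF < min) min = $NF;
    if ($NF > max) max = $NF;
} END {
    if (n) printf "minutes: %d  min: %d  max: %d  mean: %.1f\n", n, min, max, sum/n
}'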

Ghetto but simple Log Parser for testing website performance

So… I got fed up with constantly writing my own stuff for basic things. I'm going to turn this into something more spectacular that accepts command-line input, and also allows you to define which days, months, ranges, and stuff like that.

It’s a no-frills-ghetto log parser.

#!/bin/bash

echo "Total HITS: MARCH"
grep "/Mar/2017" /var/log/httpd/somewebsite.com-access_log | wc -l;

for i in 0{0..9} {10..23};

do echo "      > 9th March 2017, hits this $i hour";
grep "09/Mar/2017:$i" /var/log/httpd/somesite.com-access_log | wc -l;

        # break down the minutes in a nested visual way thats AWsome
for j in 0{0..9} {10..59};
do echo "                  >>hits at $i:$j";
grep "09/Mar/2017:$i:$j" /var/log/httpd/somesite.com-access_log | wc -l;
done

done

It’s not perfect, it’s just a proof of concept, really.

PHP5 newrelic agent not collecting data

So today I had a NewRelic customer who was having issues after installing the NewRelic PHP agent. He couldn't understand why it wasn't collecting data. For it to collect data, you need to make sure the newrelic-daemon process is running, by using ps auxfwww | grep newrelic-daemon.

First we check whether the daemon process is running

[root@rtd-production-1 ~]# ps -ef | grep newrelic-daemon
root     26007 18914  0 09:59 pts/0    00:00:00 grep newrelic-daemon

We check the status of the daemon process

[root@rtd-production-1 ~]# service newrelic-daemon status
newrelic-daemon is stopped...

Copy basic NewRelic configuration template to correct location

[root@rtd-production-1 ~]# cp /etc/newrelic/newrelic.cfg.template /etc/newrelic/newrelic.cfg

Start the daemon

[root@rtd-production-1 ~]# service newrelic-daemon start
Starting newrelic-daemon:                                  [  OK  ]
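With the daemon up, it's worth re-checking its status and, if data still doesn't appear, restarting the web server (or PHP-FPM) so the PHP agent can reconnect to the daemon; reports usually take a few minutes to show up in the NewRelic UI. Something along these lines:

[root@rtd-production-1 ~]# service newrelic-daemon status
[root@rtd-production-1 ~]# service httpd restart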

Site keeps on going down because of spiders

So a Rackspace customer was consistently having an issue with their site going down, even after the number of workers was increased. It looked like in this customer's case they were being hit really hard by Yahoo Slurp, Googlebot, the Ahrefs bot, and many many others.

So I checked the hour the customer was affected, and found that over that hour Yahoo Slurp and Googlebot alone accounted for 415 of the requests. This made up around 25% of all the requests to the site, so it was certainly a possibility that the max workers were being reached due to spikes in traffic from bots, in parallel with potential spikes in usual visitors.

[root@www logs]#  grep '01/Mar/2017:10:' access_log | egrep -i 'www.google.com/bot.html|http://help.yahoo.com/help/us/ysearch/slurp' |  wc -l
415

It wasn't a complete theory, but it was the best I had with all the available information, since everything else had been checked. The only thing that remained was the number of retransmits for that machine. All in all it was a victory, and this was so awesome that I'm now thinking of making a tool that will do this in a more automated way.

I don’t know if this is the best way to find google bot and yahoo bot spiders, but it seems like a good method to start.
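A more general way to see which crawlers (and other clients) hit the site hardest is to break that hour down by User-Agent, which is the sixth quoted field in the standard combined log format:

# grep '01/Mar/2017:10:' access_log | awk -F'"' '{print $6}' | sort | uniq -c | sort -rn | head -20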

Count the number of IPs over a given time period in an Apache Log

So, a customer had an outage and wasn't sure what caused it. It looked like some IPs were hammering the site, so I wrote this quick one-liner to sort the IPs so that uniq -c can count the duplicate requests, then sort again by count; this way we can see exactly how many times a given IP makes a request in any given minute, hour, day or month:

Any given minute

# grep '24/Feb/2017:10:03' /var/www/html/website.com/access.log | awk '{print $1}' | sort | uniq -c | sort -rn

Any given hour

# grep '24/Feb/2017:10:' /var/www/html/website.com/access.log | awk '{print $1}' | sort | uniq -c | sort -rn

Any Given day

# grep '24/Feb/2017:' /var/www/html/website.com/access.log | awk '{print $1}' | sort | uniq -c | sort -rn

Any Given Month

# grep '/Feb/2017:' /var/www/html/website.com/access.log | awk '{print $1}' | sort | uniq -c | sort -rn

Any Given Year

# grep '/2017:' /var/www/html/website.com/access.log | awk '{print $1}' | sort | uniq -c | sort -rn

Matching on just '/2017:' is less specific, though, and could pick up the pattern elsewhere in the log line, so I'm sure there is a better way of doing that which is more precise.
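One way to tighten it up is to match the date against the timestamp field itself ($4 in the common/combined log format) rather than the whole line, so the pattern can't accidentally match inside the request or referer; for example, for a given day:

# awk -v d='24/Feb/2017' '$4 ~ d {print $1}' /var/www/html/website.com/access.log | sort | uniq -c | sort -rn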

A Unique Situation for grep (finding the files with content matching a specific pattern Linux)

This article explains how to find all the files that have a specific text or pattern within them. If that's what you need, this is the article you've been looking for!

So today I was dealing with a customer's server where he had tried to configure basic auth. I'd found the httpd.conf file for the specific site, but I couldn't see which file had the basic auth set up incorrectly. To save me (and also to save YOU) from looking through hundreds of configuration files for this specific pattern, why not use grep to recursively search files for the pattern, and use -n to give the filename and line number of the files containing text that matches it.

I really enjoyed this one-liner, and I'd been meaning to put something like this together, because this kind of issue comes up a lot, and this can save a lot of time!

 grep -rnw '/' -e "PermitRootLogin"

# OUTPUT looks like

/usr/share/vim/vim74/syntax/sshdconfig.vim:157:syn keyword sshdconfigKeyword PermitRootLogin
/usr/share/doc/openssh-5.3p1/README.platform:37:instead the PermitRootLogin setting in sshd_config is used.

The above recursively searches all files under the root filesystem '/' looking for PermitRootLogin.

I wanted to find which .htaccess file was responsible so I ran;

# grep -rnw '/' -e "/path/to/.htpasswd'

# OUTPUT looks like
/var/www/vhosts/somesite.com/.htaccess:14:AuthUserFile /path/to/.htpasswd
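Searching from '/' works, but it can take a long time on a big filesystem; if you already know roughly where the vhosts live, GNU grep's --include option lets you restrict the recursive search to .htaccess files. A quicker variant, assuming the vhosts are under /var/www/vhosts:

# grep -rn --include='.htaccess' 'AuthUserFile' /var/www/vhosts/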

Creating a proper Method of Retrieving, Sorting, and Parsing Rackspace CDN Access Logs

So, this has been rather a bane on the life which is lived as Adam Bull. Basically, a large customer of ours had 50+ CDNs and literally hundreds of gigabytes of log files. They were all in Rackspace Cloud Files, and the big question was 'how do I know how busy my CDN is?'.


This is a remarkably good question, because actually not many tools are provided here, and the customer will, much like on many other CDN services, have to download those logs and then process them. But that is not easy either, and I spent a good few weeks (albeit when I had time) trying to figure out the best way to do this. I dabbled with using tree to display the most commonly used logs, I played with piwik, awstats, and many others such as goaccess, all to no avail, and even used a sophisticated AWK script from our good friends in Operations. No luck, nothing, do not pass go, do not collect $200. So I was actually forced to write something to try and achieve this, from start to finish. There are 3 problems:

1) how to easily obtain .CDN_ACCESS_LOGS from Rackspace Cloud Files to Cloud Server (or remote).
2) how to easily process these logs, in which format.
3) how to easily present these logs, using which application.

The first challenge was actually retrieving the files.

swiftly --verbose --eventlet --concurrency=100 get .CDN_ACCESS_LOGS --all-objects -o ./

Naturally, to perform this step above, you will need a working swiftly environment. If you don't know what swiftly is, or don't understand how to set up a swiftly environment, please see this article I wrote on the subject of deleting all files with swiftly (the howto explains the environment setup first! Just don't follow the article to the end; continue from here once you've set up and installed swiftly).

For more info see:
https://community.rackspace.com/products/f/25/t/7190

Processing the Rackspace CDN Logs that we’ve downloaded, and organising them for further log processing
This required a lot more effort, and thought

The below script sits in the same folder as all of the containers

# ls -al 
total 196
drwxrwxr-x 36 root root  4096 Nov  7 12:33 .
drwxr-xr-x  6 root root  4096 Nov  7 12:06 ..
# used by my script
-rw-rw-r--  1 root root  1128 Nov  7 12:06 alldirs.txt

# CDN Log File containers as we downloaded them from swiftly Rackspace Cloud Files (.CDN_ACCESS_LOGS)
drwxrwxr-x  3 root root  4096 Oct 19 11:22 dev.demo.video.cdn..com
drwxrwxr-x  3 root root  4096 Oct 19 11:22 europe.assets.lon.tv
drwxrwxr-x  5 root root  4096 Oct 19 11:22 files.lon.cdn.lon.com
drwxrwxr-x  3 root root  4096 Oct 19 11:23 files.blah.cdn..com
drwxrwxr-x  5 root root  4096 Oct 19 11:24 files.demo.cdn..com
drwxrwxr-x  3 root root  4096 Oct 19 11:25 files.invesco.cdn..com
drwxrwxr-x  3 root root  4096 Oct 19 11:25 files.test.cdn..com
-rw-r--r--  1 root root   561 Nov  7 12:02 generate-report.sh
-rwxr-xr-x  1 root root  1414 Nov  7 12:15 logparser.sh

# Used by my script
drwxr-xr-x  2 root root  4096 Nov  7 12:06 parsed
drwxr-xr-x  2 root root  4096 Nov  7 12:33 parsed-combined

Here is the logparser.sh script itself;

#!/bin/bash

# Author : Adam Bull
# Title: Rackspace CDN Log Parser
# Date: November 7th 2016

echo "Deleting previous jobs"
rm -rf parsed;
rm -rf parsed-combined

ls -ld */ | awk '{print $9}' | grep -v parsed > alldirs.txt


# Create Location for Combined File Listing for CDN LOGS
mkdir parsed

# Create Location for combined CDN or ACCESS LOGS
mkdir parsed-combined

# This just builds a list of the CDN Access Logs
echo "Building list of Downloaded .CDN_ACCESS_LOG Files"
sleep 3
while read m; do
folder=$(echo "$m" | sed 's@/@@g')
echo $folder
        echo "$m" | xargs -i find ./{} -type f -print > "parsed/$folder.log"
done < alldirs.txt

# This part cats the files and uses xargs to produce all the Log output, before cut processing and redirecting to parsed-combined/$folder
echo "Combining .CDN_ACCESS_LOG Files for bulk processing and converting into NCSA format"
sleep 3
while read m; do
folder=$(echo "$m" | sed 's@/@@g')
cat "parsed/$folder.log" | xargs -i zcat {} | cut -d' ' -f1-10  > "parsed-combined/$folder"
done < alldirs.txt


# This part processes the Log files with Goaccess, generating HTML reports
echo "Generating Goaccess HTML Logs"
sleep 3
while read m; do
folder=$(echo "$m" | sed 's@/@@g')
goaccess -f "parsed-combined/$folder" -a -o "/var/www/html/$folder.html"
done < alldirs.txt

How to easily present these logs

I kind of deceived you with the last step; actually, I have already done it with the above script. Though you will naturally need to have an httpd installed, with a document root in /var/www/html, so make sure you install apache2 (httpd) and goaccess:

yum install httpd goaccess
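On CentOS/RHEL, goaccess normally comes from the EPEL repository rather than the base repos, so you may need to enable that first (assuming a CentOS box):

yum install epel-release
yum install goaccess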

De de de de de de da! da da!


Some little caveats:

Generating a master index.html file of all the sites


[root@cdn-log-parser-mother html]# pwd
/var/www/html
[root@cdn-log-parser-mother html]# ls -al | awk '{print $9}' | xargs -i echo "<a href=\"{}\">{}</a><br>" > index.html

I will expand the script to generate this automatically soon, but for now I'm leaving it like this due to time constraints.