Ghetto but simple Log Parser for testing website performance

So… I got fed up with constantly writing my own stuff for basic things. I’m going to turn this into something more spectacular that accepts command-line input and lets you define which days, months, and date ranges to report on.

It’s a no-frills-ghetto log parser.

#!/bin/bash

# Every grep reads the same log; adjust the path for your site
LOG=/var/log/httpd/somesite.com-access_log

echo "Total HITS: MARCH"
grep -c "/Mar/2017" "$LOG"

# hours in the access log run 00-23
for i in 0{0..9} {10..23};

do echo "      > 9th March 2017, hits this $i hour";
grep -c "09/Mar/2017:$i:" "$LOG"

        # break down the minutes in a nested visual way that's awesome (minutes run 00-59)
for j in 0{0..9} {10..59};
do echo "                  >>hits at $i:$j";
grep -c "09/Mar/2017:$i:$j:" "$LOG"
done

done

It’s not perfect, it’s just a proof of concept, really.
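Here is a rough sketch of where the command-line version could go. It is only an illustration, not a finished tool: the function name hits_per_hour and its two arguments (log file, then a date in the access-log form such as 09/Mar/2017) are my own invention, and it assumes the usual Apache timestamp layout of [dd/Mon/yyyy:HH:MM:SS.

```shell
#!/bin/bash
# hits_per_hour LOGFILE DATE
# Hypothetical helper: prints one "HH:00 count" line per hour of the given day.
# DATE must look like the log timestamp, e.g. 09/Mar/2017.
hits_per_hour() {
    local log="$1" day="$2" hour
    # seq -w zero-pads, so we get 00 01 ... 23
    for hour in $(seq -w 0 23); do
        # grep -c prints the number of matching lines (0 when there are none)
        printf '%s:00 %s\n' "$hour" "$(grep -c "\[$day:$hour:" "$log")"
    done
}
```

You would call it like hits_per_hour /var/log/httpd/somesite.com-access_log 09/Mar/2017, and wrap it in another loop if you want a range of days.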

Migrated Plesk site keeps going to the default Plesk page

So today a customer had this really weird issue: a website domain that had been moved from one server to a new Plesk server wasn’t loading correctly. It actually turned out to be simple. When I tried to access a file on the domain, like info.php, I got the phpinfo output back.

curl http://www.customerswebsite.com/info.php 

This suggested to me that the website documentroot was working, and the only thing missing was probably the index. That is what it actually turned out to be.

I wanted to test, though, that info.php really was in this documentroot and not in some other virtualhost’s documentroot. So I renamed info.php to randomnumbers12313.php on the filesystem and requested that name instead, and the page still loaded. That confirmed I was looking at the correct site, which is important when troubleshooting vast configurations.

I also found a really handy one-liner for working out which log file (and so which virtualhost) a request ends up in. This might not be great on a really busy server, but you could still grep for your IP address as well.

Visit the broken/affected website we will troubleshoot

curl -I somecustomerswebsite.com

Show all visitors to all Apache websites as they happen, whilst we visit the site ourselves for testing

tail -f /var/log/httpd/*.log 

This will show us which virtualhost and/or path is being accessed, from where.

Show only the visits to all Apache websites coming from a given IP

tail -f /var/log/httpd/*.log  | grep 4.2.2.4

Where 4.2.2.4 is the IP address you’re using to visit the site. If you don’t know what your IP is, search Google for icanhazip or ‘what is my ip’, job done.
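One catch with the plain grep is that it throws away the "==> file <==" headers that tail prints when it follows several files, so you can lose track of which log a hit landed in. It also treats the dots as wildcards, so 4.2.2.4 can match other addresses. The sketch below is one way around both. The function name filter_ip is made up for illustration, and it assumes the client IP is the first field on each line, as in the common and combined log formats.

```shell
#!/bin/bash
# filter_ip IP
# Hypothetical helper: reads `tail -f` output on stdin and prints only the
# lines whose first field is exactly IP, prefixed with the log file name.
filter_ip() {
    awk -v ip="$1" '
        # tail prints "==> /path/to/file <==" when it switches files; remember it
        /^==> .* <==$/ { file = $2; next }
        # exact string comparison, so 14.2.2.41 does not match 4.2.2.4
        $1 == ip { print (file ? file ": " : "") $0 }
    '
}
```

Usage would be tail -f /var/log/httpd/*.log | filter_ip 4.2.2.4.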

Fixing the Plesk website without a directory index

[root@mehcakes-App1 conf]# plesk bin domain --update somecustomerswebsite.com -nginx-serve-php true -apache-directory-index index.php

Simple enough… but could be a pain if you don’t know what you’re looking for.

Site keeps on going down because of spiders

So a Rackspace customer was consistently having an issue with their site going down, even after the number of workers was increased. It looked like, in this customer’s case, they were being hit really hard by Yahoo Slurp, Googlebot, the Ahrefs bot, and many, many others.

So I checked the hour the customer was affected, and found that over that hour Yahoo Slurp and Googlebot alone accounted for 415 of the requests. That was about 25% of all the requests to the site, so it was certainly possible that the max workers limit was being reached due to spikes in traffic from bots, in parallel with potential spikes in usual visitors.

[root@www logs]#  grep '01/Mar/2017:10:' access_log | egrep -i 'www.google.com/bot.html|http://help.yahoo.com/help/us/ysearch/slurp' |  wc -l
415

It wasn’t a complete theory, but it was the best I could do with the information available, since everything else had been checked. The only thing that remains to be checked is the number of retransmits for that machine. All in all it was a victory, and this was so awesome that I’m now thinking of making a tool that will do this in a more automated way.

I don’t know if this is the best way to find google bot and yahoo bot spiders, but it seems like a good method to start.
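A first stab at automating this could look something like the sketch below. Treat it as a rough idea rather than the tool itself: the function name bot_hits is hypothetical, and the three user-agent strings it looks for (Googlebot, Slurp, AhrefsBot) are my assumption about what these crawlers send, so check them against your own logs first.

```shell
#!/bin/bash
# bot_hits LOGFILE HOUR
# Hypothetical helper: prints the total hits for one hour, then the hits per bot.
# HOUR is the timestamp prefix to match, e.g. '01/Mar/2017:10:'.
bot_hits() {
    local log="$1" hour="$2" bot
    # -F treats the timestamp as a fixed string, -c counts matching lines
    printf 'total %s\n' "$(grep -cF "$hour" "$log")"
    # Assumed user-agent substrings; extend the list as needed
    for bot in Googlebot Slurp AhrefsBot; do
        printf '%s %s\n' "$bot" "$(grep -F "$hour" "$log" | grep -ciF "$bot")"
    done
}
```

Running bot_hits access_log '01/Mar/2017:10:' gives one line per bot next to the total, which makes it easy to see what share of the hour’s traffic is spiders.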

A Unique Situation for grep (finding the files with content matching a specific pattern Linux)

This article explains how to find all the files that have a specific text or pattern within them. This is the article you’ve been looking for!

So today, I was dealing with a customer’s server where he had tried to configure BASIC AUTH. I’d found the httpd.conf file for the specific site, but I couldn’t see which file had basic auth set up wrongly. I wanted to save myself (and YOU) from looking through hundreds of configuration files for this specific pattern. So why not use grep -r to search files recursively, -n to show the line number of each match, and -w to match whole words only? When grep searches more than one file it prints the filename in front of each match as well.

I really enjoyed this one-liner, and I’ve been meaning to put something like this together, because this kind of issue comes up a lot, and this can save a lot of time!

 grep -rnw '/' -e "PermitRootLogin"

# OUTPUT looks like

/usr/share/vim/vim74/syntax/sshdconfig.vim:157:syn keyword sshdconfigKeyword PermitRootLogin
/usr/share/doc/openssh-5.3p1/README.platform:37:instead the PermitRootLogin setting in sshd_config is used.

The above searches recursively all files in the root filesystem ‘/’ looking for PermitRootLogin.

I wanted to find which .htaccess file was responsible so I ran;

# grep -rnw '/' -e "/path/to/.htpasswd"

# OUTPUT looks like
/var/www/vhosts/somesite.com/.htaccess:14:AuthUserFile /path/to/.htpasswd
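Searching all of ‘/’ can take a while on a big server. If you already know you are hunting for a .htaccess file, something like the sketch below narrows it down. It assumes GNU grep (for --include), and the function name find_in_htaccess is just for illustration.

```shell
#!/bin/bash
# find_in_htaccess DIR TEXT
# Hypothetical helper: searches only files named .htaccess under DIR for TEXT.
find_in_htaccess() {
    # -r recurse, -n show line numbers, -F treat TEXT as a fixed string
    # --include limits the search to files whose name matches the glob
    grep -rnF --include='.htaccess' -e "$2" "$1"
}
```

For example, find_in_htaccess /var/www/vhosts '/path/to/.htpasswd' would only look inside the .htaccess files under the vhosts directory.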

Comparing Files on the internet or CDN with MD5 to determine if they present same content

So, a customer today was having some issues with their CDN. They said that their SSL CDN was presenting a different image than the HTTP CDN. So, I thought the best way to begin any troubleshooting process would be to try and recreate the issue first. To do that, I needed a way to compare the files programmatically. Enter md5sum, a handy little shell application usually installed by default on most Linux distributions.

[user@cbast3 ~]$ curl https://3485asd3jjc839c9d3-08e84cacaacfcebda9281e3a9724b749.ssl.cf3.rackcdn.com/companies/5825cb13f2e6c9632807d103/header.jpeg -o file ; cat file | md5sum
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  382k  100  382k    0     0  1726k      0 --:--:-- --:--:-- --:--:-- 1732k
e917a67bbe34d4eb2d4fe5a87ce90de0  -
[user@cbast3 ~]$ curl http://3485asd3jjc839c9d3-08e84cacaacfcebda9281e3a9724b749.r45.cf3.rackcdn.com/companies/5825cb13f2e6c9632807d103/header.jpeg -o file2 ; cat file2 | md5sum
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  382k  100  382k    0     0  2071k      0 --:--:-- --:--:-- --:--:-- 2081k
e917a67bbe34d4eb2d4fe5a87ce90de0  -

As we can see from the output of both, the md5sum (the hash) of the two files is the same. This means there is a statistically very, very high chance the content is exactly the same. MD5 condenses a file of any length into a 128-bit digest, so two different files matching by accident is roughly a 1 in 2^128 event, whatever their size. (MD5 is no longer safe against someone deliberately crafting a collision, but that isn’t a concern when checking whether a CDN serves the same image twice.)

In this case I was able to disprove the customer’s claims. Not because I wanted to, but because I wanted to solve their issue. These results show me that the issue, if it is with the CDN, must be with an edge node local to the customer having the issue. Since I am unable to recreate it from my location, it is not unreasonable to assume that it is either a client-side issue or a failure on our CDN edge node side, local to the customer. That’s how I troubleshot this, and I’m quite happy with this one! It took about 2 minutes to do, and a few minutes to come up with. A quick and useful check indeed, which considerably reduces the number of possibilities when tracing down the issue!
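If you end up doing this check a lot, the comparison itself can be wrapped up so you don’t have to eyeball two long hashes. This is only a sketch: the function name files_match is made up, and it works on files you have already downloaded (for example with curl -o as above).

```shell
#!/bin/bash
# files_match FILE1 FILE2
# Hypothetical helper: prints MATCH or DIFFER plus the MD5 digest(s),
# and returns non-zero when the two files differ.
files_match() {
    local a b
    # md5sum prints "<digest>  -" when reading stdin; keep only the digest
    a=$(md5sum < "$1" | cut -d' ' -f1)
    b=$(md5sum < "$2" | cut -d' ' -f1)
    if [ "$a" = "$b" ]; then
        echo "MATCH $a"
    else
        echo "DIFFER $a $b"
        return 1
    fi
}
```

After fetching both copies into file and file2, files_match file file2 tells you straight away whether the two CDN endpoints served the same bytes.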

Cheers &
Best wishes,
Adam

Please note the real CDN location has been altered for privacy reasons