Less Ghetto Log Parser for Website Hitcount/Downtime Analysis

Yesterday I created a proof of concept script which basically goes off and identifies the hitcounts of a website, and can show a technician within a short duration of time (minutes instead of hours) exactly where the hitcounts are coming from and when.

This is kind of a tradeoff between a script that is automated and one that is flexible.

The end goal is to provide a hitcount vs memory commit metric value. A NEW TYPE OF METRIC! HURRAH! (This is needed by the industry IMO).

It would also be nice to generate graphs and the mean, average, ranges, etc., so it can provide output like the ‘stat’ tool. Here is how I have progressed:

#!/bin/bash
#
# Author: 	Adam Bull, Cirrus Infrastructure, Rackspace LTD
# Date: 	March 20 2017
# Use:		This script automates the analysis of webserver logs hitcounts and
# 		provides a breakdown to indicate whether outages are caused by website visits
#		In correlation to memory and load avg figures


# Settings

# What logfile to get stats for
logfile="/var/log/httpd/google.com-access.log"

# What year month and day are we scanning for minute/hour hits
year=2017
month=Mar
day=9

echo "Total HITS: MARCH"
grep "/$month/$year" "$logfile" | wc -l;

# Hours
for i in 0{0..9} {10..23};

do echo "      > 9th March 2017, hits this $i hour";
grep "$day/$month/$year:$i" "$logfile" | wc -l;

        # break down the minutes in a nested visual way that's awesome

# Minutes
for j in 0{0..9} {10..59};
do echo "                  >>hits at $i:$j";
grep "$day/$month/$year:$i:$j" "$logfile" | wc -l;
done

done

Thing is, after I wrote this, I wasn’t really happy, so I refactored it a bit more;

#!/bin/bash
#
# Author: 	Adam Bull, Cirrus Infrastructure, Rackspace LTD
# Date: 	March 20 2017
# Use:		This script automates the analysis of webserver logs hitcounts and
# 		provides a breakdown to indicate whether outages are caused by website visits
#		In correlation to memory and load avg figures


# Settings

# What logfile to get stats for
logfile="/var/log/httpd/someweb.biz-access.log"

# What year month and day are we scanning for minute/hour hits
year=2017
month=Mar
day=9

echo "Total HITS: $month"
grep "/$month/$year" "$logfile" | wc -l;

# Hours
for i in 0{0..9} {10..23};

do
hitsperhour=$(grep "$day/$month/$year:$i" "$logfile" | wc -l);
echo "    > $day $month $year, hits this ${i}th hour: $hitsperhour"

        # break down the minutes in a nested visual way that's awesome

# Minutes
for j in 0{0..9} {10..59};
do
hitsperminute=$(grep "$day/$month/$year:$i:$j" "$logfile" | wc -l);
echo "                  >>hits at $i:$j  $hitsperminute";
done

done

Now it’s pretty leet.. well, simple, but functional. Here is what the output of the more nicely refined script looks like; I’m really satisfied with the tabulation.

[root@822616-db1 automation]# ./list-visits.sh
Total HITS: Mar
6019301
    > 9 Mar 2017, hits this  hour: 28793
                  >>hits at 01:01  416
                  >>hits at 01:02  380
                  >>hits at 01:03  417
                  >>hits at 01:04  408
                  >>hits at 01:05  385
^C

Site keeps on going down because of spiders

So a Rackspace customer was consistently having an issue with their site going down, even after the number of workers was increased. It looked like in this customer’s case they were being hit really hard by yahoo slurp, google bot, a href bot, and many many others.

So I checked the hour the customer was affected, and found that over that hour just yahoo slurp and google bot accounted for 415 of the requests. This made up roughly 25% of all the requests to the site, so it was certainly a possibility that the max workers were being reached due to spikes in traffic from bots, in parallel with potential spikes in usual visitors.

[root@www logs]#  grep '01/Mar/2017:10:' access_log | egrep -i 'www.google.com/bot.html|http://help.yahoo.com/help/us/ysearch/slurp' |  wc -l
415

It wasn’t a complete theory, but it was the best with the available information I had, since everything else had been checked. The only thing that remained was the number of retransmits for that machine. All in all it was a victory, and this was so awesome that I’m now thinking of making a tool that will do this in a more automated way.

I don’t know if this is the best way to find the google bot and yahoo bot spiders, but it seems like a good method to start with.
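If you want a broader view than just those two bots, a quick way to see who is hammering the site is to count hits per user agent for the problem hour. This is only a sketch, and it assumes the standard combined log format, where the user agent is the sixth double-quote-delimited field:

# Top user agents for the hour in question (combined log format assumed)
grep '01/Mar/2017:10:' access_log | awk -F'"' '{print $6}' | sort | uniq -c | sort -rn | head -20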

A Unique Situation for grep (finding the files with content matching a specific pattern on Linux)

This article explains how to find all the files that have a specific text or pattern within them. If that’s what you need, this is the article you’ve been looking for!

So today I was dealing with a customer’s server where he had tried to configure BASIC AUTH. I’d found the httpd.conf file for the specific site, but I couldn’t see which file had the basic auth set up wrongly. To save me (and also YOU) from looking through hundreds of configuration files for this specific pattern, why not use grep to recursively search files for the pattern, and why not use -n to give the filename and line number of the files whose text matches this pattern?

I really enjoyed this oneliner, and I’ve been meaning to put something like this together, because this kind of issue comes up a lot and it can save a lot of time!

 grep -rnw '/' -e "PermitRootLogin"

# OUTPUT looks like

/usr/share/vim/vim74/syntax/sshdconfig.vim:157:syn keyword sshdconfigKeyword PermitRootLogin
/usr/share/doc/openssh-5.3p1/README.platform:37:instead the PermitRootLogin setting in sshd_config is used.

The above recursively searches all files in the root filesystem ‘/’ looking for PermitRootLogin.

I wanted to find which .htaccess file was responsible so I ran;

# grep -rnw '/' -e '/path/to/.htpasswd'

# OUTPUT looks like
/var/www/vhosts/somesite.com/.htaccess:14:AuthUserFile /path/to/.htpasswd
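Searching the whole of ‘/’ is the brute-force option; if you already know roughly where the vhosts or configs live, you can narrow the search with --include (the paths below are just examples) and it will finish much faster:

# Only look at .htaccess files under the vhosts directory
grep -rnw /var/www/vhosts --include='.htaccess' -e 'AuthUserFile'

# Only look at .conf files under the apache config directory
grep -rnw /etc/httpd --include='*.conf' -e 'AuthUserFile'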

Using omconfig to add a RAID 1 device for a Perc 6/i Dell Raid Controller

So, I’ve been provisioning disks and stuff recently.. this is how I did it on a Dell. Quite an easy thing to do!

omconfig storage controller action=createvdisk controller=0 raid=r1 size=max readpolicy=ara pdisk=0:0:2,0:0:3
Command successful!

In this case the two disks newly added were 0:0:2 and 0:0:3 on the SAS ‘bus’.

An additional primary partition was created on this newly added device (sdb1), and a filesystem of the same kind (ext3) as the system disk was created;
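The partitioning step itself isn’t shown above; a minimal sketch using parted (assuming the new virtual disk appears as /dev/sdb) would be something like:

# Label the new disk and create one primary partition spanning it (assumes /dev/sdb)
parted -s /dev/sdb mklabel msdos
parted -s /dev/sdb mkpart primary ext3 1MiB 100%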

 mkfs.ext3 /dev/sdb1

mke2fs 1.39 (29-May-2006)
....
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

You will naturally need to mount the partition and create an fstab entry to make this permanent;

mount /dev/sdb1 /mnt/backup

echo "/dev/sdb1               /mnt/backup             ext3    defaults        1 1" >> /etc/fstab

You may wish to consider adding the above line to fstab manually instead; it’s not a good idea to use echo for it in case you make a mistake ;-D
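Either way, it’s worth proving the fstab entry actually works before you rely on it; something along these lines (mount -a simply mounts everything listed in fstab):

# Unmount, then remount purely from the fstab entry to confirm it is correct
umount /mnt/backup
mount -a
df -h /mnt/backup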

Cheers &
Best wishes,
Adam

Configuring SFTP without chroot (the easy way)

So, I wouldn’t normally recommend this to customers. However, there are secure ways to add SFTP access without the SFTP subsystem having to be modified. It’s also possible to achieve a similar setup in a location like /home/john/public_html.

Let’s assume that public_html and everything underneath it is chowned john:john. So john:john has all the access, and apache2 runs with its own uid/gid. This was a pretty strange setup, and you don’t see it every day. But actually, it allowed me to solve another problem that I’ve been seeing, and seeing customers have, for a long time: the problem of effectively and easily managing permissions. Once I figured this out it was a serious ‘aha!’ moment! Here’s why.

Inside /etc/group, we find the customer’s developer has done something tragic:

[root@web public_html]# cat /etc/group | grep apache
apache:x:48:john,bob

But fine.. we’ll run with it.

We can see all the files inside their /home/john/public_html; the sight is not good:

]# ls -al 
total 232
drwxrwxr-x 27 john john  4096 Dec 20 15:56 .
drwxr-xr-x 12 john john  4096 Dec 15 11:08 ..
drwxrwxr-x 10 john john  4096 Dec 16 09:56 administrator
drwxrwxr-x  2 john john  4096 Dec 14 11:18 bin
drwxrwxr-x  4 john john  4096 Nov  2 15:05 build
-rw-rw-r--  1 john john   714 Nov  2 15:05 build.xml
drwxrwxr-x  3 john john  4096 Nov  2 15:05 c
drwxrwxr-x  3 john john 45056 Dec 20 13:09 cache
drwxrwxr-x  2 john john  4096 Dec 14 11:18 cli
drwxrwxr-x 32 john john  4096 Dec 14 11:18 components
-rw-rw-r--  1 john john  1863 Nov  2 15:05 configuration-live.php
-rw-r--r--  1 john john  3173 Dec 15 11:08 configuration.php
drwxrwxr-x  3 john john  4096 Nov  2 15:05 docs
drwxrwxr-x  8 john john  4096 Dec 16 17:17 .git
-rw-rw-r--  1 john john  1734 Dec 14 11:21 .gitignore

It gets worse..

# cat /etc/passwd | grep john
john:x:501:501::/home/john:/bin/sh

Now, adding an SFTP user into this might look like a nightmare, but actually, with some thought, it was really easy.

Solving this mess:

Install Scponly

yum install scponly

Create new ‘SFTP’ user:

scponlyuser:x:504:505::/home/john:/usr/bin/scponly
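The passwd entry above is the end result; one way to get there (the uid/gid will differ on your system) is something like:

# Create the SFTP-only user with john's home directory and the scponly shell
# -M = don't create a home directory, since /home/john already exists
useradd -d /home/john -s /usr/bin/scponly -M scponlyuser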

Create a password for user scponlyuser

 
passwd scponlyuser

Solution to john:john permissions

[root@web public_html]# cat /etc/group | grep john
apache:x:48:john,bob
john:x:501:scponlyuser

We simply make scponlyuser part of the john group by adding it to that second line there. That way, scponlyuser will have read/write access to the same files as the shell user, without exposing anything additional.
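Rather than editing /etc/group by hand, the same change can be made with usermod:

# Append scponlyuser to the john group (-a appends rather than replaces)
usermod -aG john scponlyuser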

This was a cool solution to fixing this customer’s insecure setup, which they wanted to keep the way it was, and it was also a great way to add an SFTP account without requiring a root jail. Whether it’s better than the root jail is really debatable; however, scponly enforces that this account can only be used for SCP/SFTP, without a jail.

I was proud of this achievement.. it goes to show Linux permissions are really more flexible than we can imagine. Whether you really want to flex those permissions muscles, though, should be of concern. I advised this customer to change this setup, remove the /bin/sh, among other things..

We finally test that SFTP is working as expected with the new scponlyuser.
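The session below was captured after connecting from a client machine; the hostname here is just a placeholder:

sftp scponlyuser@web.example.com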


sftp> rmdir test
sftp> get index.php
Fetching /home/john/public_html/index.php to index.php
/home/john/public_html/index.php                                                                                     100% 1420     1.4KB/s   00:00
sftp> put index.php
Uploading index.php to /home/john/public_html/index.php
index.php                                                                                                                100% 1420     1.4KB/s   00:00
sftp> mkdir test
sftp> rmdir test

Just replace ‘scponlyuser’ with whatever username you’re setting up. The only part that you need to keep is the ‘/usr/bin/scponly’ bit; this is the environment being logged into. Apologies that scponly is so similar to scponlyuser ;-D

scponlyuser:x:504:505::/home/john:/usr/bin/scponly

I was very pleased with this! Hope that you find this useful too!

Help! I can’t log in to my cloud-server even though I’ve reset my root password

The most common cause of this is that PermitRootLogin is set to no, although there might be other causes, like a really broken sshd_config, rather than just one variable. The procedure for looking into this is pretty much the same regardless of the breakage that has occurred.

Here’s the full procedure:

1) Put the server into rescue mode.
2) Login to the cloud-server on the SSH port; please note rescue mode gives you a new temporary root password, allowing you to reset the password for SSH on the ‘original disk’.
3) Once logged in, mount the /dev/xvdb device; this may be /dev/xvdb1 or /dev/xvdb2 but is usually /dev/xvdb1, then chroot (change root) to the ‘original disk’.

# Mount old disk
mount /dev/xvdb1 /mnt

# Change to the ‘old disk’
chroot /mnt

# Set the new password for root on the old disk:

passwd
# enter the new password when prompted

Then specifically ensure that /etc/ssh/sshd_config has this line:

PermitRootLogin no

changed to:

PermitRootLogin yes
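While still chrooted you can make that change with sed rather than opening an editor; a one-liner along these lines (assuming the line is present and not commented out):

# Flip PermitRootLogin from no to yes in the original disk's sshd_config
sed -i 's/^PermitRootLogin no/PermitRootLogin yes/' /etc/ssh/sshd_config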

Your developer or sysadmin won’t be able to log in until you reset the root password here, and if you do not know a username to su to root from, it is absolutely critical to perform this work, otherwise you won’t be able to access the server.

Also, once you have allowed the root login and changed the password to something you recognise, you will be able to exit rescue mode through the control panel and log in to the machine as normal.

For more detail about how to do this (although pretty much all the steps are here), please see:

https://support.rackspace.com/how-to/rackspace-cloud-essentials-rescue-mode-on-linux-cloud-servers/

I hope this helps you folks out some,

DISASTER RECOVERY! Exporting a Broken Cloud-server image VHD from Rackspace and attempting to recover data

Thanks to my colleague Marcin for the guestmount tools protip.

I wrote a previous guide which explains how to download/export a Cloud server image VHD from Rackspace Cloud when it is failing to build. It might allow you to perform data recovery, even if the image can’t be booted. I’m guessing someone is going to run into this sooner or later and will be pleased to see this article; it will at least give you a best shot at reading the VHD and recovering it, since as you might know already, just because the boot or kernel is broken doesn’t mean that the data isn’t there!

# A better article to use if you want to download via commandline
https://community.rackspace.com/products/f/25/t/3583

# My article doing this thru a web-browser which might be useful too for some customers
https://community.rackspace.com/products/f/25/t/7089

Once the image gets downloaded to your new cloud instance you can use the ‘libguestfs-tools’ package (same name on Ubuntu and CentOS), which contains the tools necessary for mounting .vhd image files.

The command would be (read-only mode):

guestmount -a {image-name}.vhd -i --ro {mount-point}
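Putting it together with made-up names for the image, the mount point and the data being rescued, the whole thing looks roughly like this:

# Install the tools, mount the VHD read-only, copy the data out, then unmount
yum install libguestfs-tools        # or: apt-get install libguestfs-tools
mkdir -p /mnt/recovery
guestmount -a broken-server.vhd -i --ro /mnt/recovery
cp -a /mnt/recovery/var/www /root/recovered-www
guestunmount /mnt/recovery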

Securing your WordPress with chmod 644 and chmod 755 the easy (but pro) way

Let’s say we have a document root like:

/var/www/mysite.com/htdocs

It’s interesting to note that the instructions for this will vary from environment to environment; it depends on which user is looking after apache2, etc.

Make all files owned by the apache2 user, read/write for the owner and read-only for everyone else

root@meine:/var/www/mysite.com/htdocs# find . -type f -exec chown apache2:apache2 {} \; 
root@meine:/var/www/mysite.com/htdocs# find . -type f -exec chmod 644 {} \;

Make all folders accessible Read + Execute, but with no write permissions for group or other

root@meine:/var/www/mysite.com/htdocs# find . -type d -exec chmod 755 {} \;
root@meine:/var/www/mysite.com/htdocs# find . -type d -exec chown apache2:apache2 {} \;

PLEASE NOTE THIS BREAKS YOUR WORDPRESS ABILITY TO AUTO-UPDATE ITSELF. BUT IT IS MORE SECURE 😀

Note: Debian users may need to use www-data:www-data instead.
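On a large docroot the same commands run noticeably faster if you let find batch the arguments with ‘+’ instead of forking one chown/chmod per file with ‘\;’:

# Same result as above, far fewer process forks
find . -type f -exec chown apache2:apache2 {} +
find . -type f -exec chmod 644 {} +
find . -type d -exec chown apache2:apache2 {} +
find . -type d -exec chmod 755 {} +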

Using Sar to tell a story

So, a customer is experiencing slowness/sluggishness in their app. You know from instinct there is no issue with the hypervisor, but instinct isn’t enough. Tools like xentop, sar and bwm-ng are critical parts of live and historical troubleshooting.

Sar can tell you a story, if you ask the storyteller the right questions, or even better, pick up the book and read it properly. You’ll understand the plot, scenario and situation, and exactly how to proceed with troubleshooting, by paying attention to these data and knowing which things to check under certain circumstances.

This article doesn’t go into depth on that, but it gives you a good reference for a variety of tests, the most important being CPU usage, I/O usage, network usage, and load averages.

CPU Usage of all processors

# Grab details live
sar -u 1 3

# Use historical binary sar file
# sa10 means '10th day' of current month.
sar -u -f /var/log/sa/sa10 

CPU Usage of a particular Processor

sar -P ALL 1 1

‘-P 1’ means check only the 2nd Core. (Core numbers start from 0).

sar -P 1 1 5

The above command displays real time CPU usage for core number 1, every 1 second for 5 times.

Observing Changes in Memory over time

sar -r 1 3

The above command provides memory stats every 1 second for a total of 3 times.

Observing Swap usage over time

sar -S 1 5

The above command reports swap statistics every 1 second, a total of 5 times.

Overall I/O activity

sar -b 1 3 

The above command checks every 1 second, 3 times.

Individual Block Device I/O Activities

This is a useful check for LUN , block devices and other specific mounts

sar -d 1 1
sar -d -p 1 1

The -p flag pretty-prints the device names (sda rather than dev8-0).

DEV – indicates the block device, e.g. sda, sda1, sdb1 etc.

Total number of processes created per second / Context switches

sar -w 1 3

Run Queue and Load Average

sar -q 1 3 

This reports the run queue size and the load average of the last 1 minute, 5 minutes, and 15 minutes. “1 3” means report every 1 second, a total of 3 times.

Report Network Statistics

sar -n KEYWORD

Available KEYWORDS:

DEV – Displays network devices vital statistics for eth0, eth1, etc.,
EDEV – Display network device failure statistics
NFS – Displays NFS client activities
NFSD – Displays NFS server activities
SOCK – Displays sockets in use for IPv4
IP – Displays IPv4 network traffic
EIP – Displays IPv4 network errors
ICMP – Displays ICMPv4 network traffic
EICMP – Displays ICMPv4 network errors
TCP – Displays TCPv4 network traffic
ETCP – Displays TCPv4 network errors
UDP – Displays UDPv4 network traffic
SOCK6, IP6, EIP6, ICMP6, UDP6 are for IPv6
ALL – This displays all of the above information. The output will be very long.

sar -n DEV 1 1

Specify Start Time

sar -q -f /var/log/sa/sa11 -s 11:00:00
sar -q -f /var/log/sa/sa11 -s 11:00:00 | head -n 10
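You can also bound the window at both ends with -s and -e, which is handy when you already know the rough time of an outage:

# Run queue / load average between 11:00 and 12:30 on the 11th of the month
sar -q -f /var/log/sa/sa11 -s 11:00:00 -e 12:30:00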