Rapid Troubleshooting mail not arriving at its destination

Today I had a customer whose mail was not arriving at it’s destination. IN my customers case, they knew that it was arriving at the destination but going into the spam folder. This is likely due to blacklisting, however some ISP or clients dont know it’s reaching the spam folder at the other end, and since the most common cause of this happening is due to a missing SPF (Sending policy framework record), MX record or DKIM record it is possibly to rapidly check the DNS of each using dig, if the sender domain is known.

To check for the IP Blacklistings on a mailserver use it’s ip in one of the many spam a checker;


In other cases it might be being caused by email bounces, due to the PTR, MX or DKIM records, and not even getting into the inbox, you can see that on the sending mail server using a simple grep command;

cat /var/log/maillog | grep -i status=bounced

You probably want to save the file as well

cat /var/log/maillog | grep -i status=bounced > bouncedemail.txt 

If you wanted to know which domains bounced email, if you’ve ensured all sending domains are correctly configured via DNS..

[root@api ~]# cat /var/log/maillog | grep -i status=bounced | awk '{print $7}'

You could use sed to extract which domains failed..

Then you could use whois against the domains to reach out to the email contact with some automation that explains that ‘weve checked DKIM, MX and SPF and all are configured correctly and believe this is an error on your behalf, etc, blah blah’..

Such a thing would be a good idea to implement for some large email providers, and I’m sure you could automate the DNS checking as well. You could likely automate the whole thing, just by watching the logs and DNS records, and some intelligent grep and awking.

Not bad.

Calculating the Average Hits per minute en-mass for thousands of sites

So, I had a customer having some major MySQL woes, and I wanted to know whether the MySQL issues were query related, as in due to the frequency of queries alone, or the size of the database. VS it being caused by the number of visitors coming into apache, therefore causing more frequency of MySQL hits, and explaining the higher CPU usage.

The best way to achieve this is to inspect /var/log/httpd with ls -al,

First we take a sample of all of the requests coming into apache2, as in all of them.. provided the customer has used proper naming conventions this isn’t a nightmare. Apache is designed to make this easy for you by the way it is setup by default, hurrah!

[root@box-DB1 logparser]# time tail -f /var/log/httpd/*access_log > allhitsnow

real	0m44.560s
user	0m0.006s
sys	0m0.031s

Time command prefixed here, will tell you how long you ran it for.

[root@box-DB1 logparser]# cat allhitsnow | wc -l

The above command shows you the number of lines in allhitsnow file, which was written to with all the new requests coming into sites from all the site log files. Simples! 1590 queries a minute is quite a lot.

Enabling MySQL Slow Query Logs

In the case your seeing very long pageload times, and have checked your application. It always pays to check the way in which the database performs when interacting with the application, especially if they are on either same or seperate server, as these significantly affect the way that your application will run.

mysql> SET GLOBAL slow_query_log = 'ON' ;
Query OK, 0 rows affected (0.01 sec)

mysql> SET GLOBAL slow_query_log_file = '/slow_query_logs/slow_query_logs.txt';
Query OK, 0 rows affected (0.00 sec)

mysql> SET GLOBAL long_query_time = 5;
Query OK, 0 rows affected (0.00 sec)

Setting X-Frame-Options HTTP Header to allow SAME or NON SAME ORIGINS

It’s possible to increase the security of a webserver running a website, by ensuring that the X-FRAME-OPTIONS header pushes a header to the browser, which enforces the origin (server) serving the site. It prevents the website then providing objects which are not local to the site, in the stream. An admirable option for those which wish to increase their server security.

Naturally, there are some reasons why you might want to disable this, and in proper context, it can be secure. Always be sure to discuss with your pentester or PCI compliance officer, such considerations before proceeding, especially making sure that if you do not want to use SAME ORIGIN you always use the most secure option for the required task. Always check if there is a better way to achieve what your trying to do, when making such changes to your server configuration.

Insecure X-Frame-Option allows remote non matching origins

Header always append X-Frame-Options ALLOWALL

Secure X-Frame-Option imposes on the browser to not allow non origin(al) connections for the domain, which can prevent clickjack and other attacks.

Header always append X-Frame-Options SAMEORIGIN

Moving a WordPress site – much ado about nothing !

Have you noticed, there is all kinds of advise on the internet about the best way to move WordPress websites? There is literally a myriad of ways to achieve this. One of the methods I read on
wp.com was:

Changing Your Domain Name and URLs

Moving a website and changing your domain name or URLs (i.e. from http://example.com/site to http://example.com, or http://example.com to http://example.net) requires the following steps - in sequence.

    Download your existing site files.
    Export your database - go in to MySQL and export the database.
    Move the backed up files and database into a new folder - somewhere safe - this is your site backup.
    Log in to the site you want to move and go to Settings > General, then change the URLs. (ie from http://example.com/ to http://example.net ) - save the settings and expect to see a 404 page.
    Download your site files again.
    Export the database again.
    Edit wp-config.php with the new server's MySQL database name, user and password.
    Upload the files.
    Import the database on the new server.

I mean this is truly horrifying steps to take, and I don’t see the point at all. This is how I achieved it for one my customers.

1. Take customer Database Dump
2. Edit the database searching for 'siteurl' with vi
vi mysqldump.sql

And just swap out the values, confirming after editing the file;

[root@box]# cat somemysqldump.sql  | grep siteurl -A 2
(1, 'siteurl', 'https://www.newsiteurl.com', 'yes'),
(2, 'home', 'https://www.newsiteurl.com', 'yes'),
(3, 'blogname', 'My website name', 'yes'),

Job done, no stress https://codex.wordpress.org/Moving_WordPress.

There might be additional bits but this is certainly enough for them to access the wp-admin panel. If you have problems add this line to the wp-config.php file;


Just before the line which says

/* That’s all, stop editing! Happy blogging. */

And then just do the import/restore as normal;

mysql -u newmysqluser -p newdatabase_to_import_to < old_database.sql

Simples! I really have no idea why it is made to be so complicated on other hosting sites or platforms.

Magento Rewrite Woes … really woes

I had a customer this week that had some terrible rewrite woes with their magento site. They knew that a whole ton of their images were getting 404’s most likely because rewrite wasn’t getting to the correct filesystem path that the file resided. This was due to their cache being broken, and their second developer not creating proper rewrite rule.

As a sysadmin our job is not a development role, we are a support role, and in order to enable the developer to fix the problem, the developer needs to be able to see exactly what it is, enter the sysads task. I wrote this really ghetto script, which essentially hunts in the nginx error log for requests that failed with no such file, and then qualifies them by grepping for jpg file types. This is not a perfect way of doing it, however, it is really effective at identifying the broken links.

Then I have a seperate routine that strips the each of the file uri’s down to the filename, and locates the file on the filesystem, and matches the filename on the filesystem that the rewrite should be going to, as well as the incorrect path that the rewrite is presently putting the url to. See the script below:


# Author: Adam Bull
# Company: Rackspace LTD, Hayes
# Purpose:
#          This customer has a difficulty with nginx rewriting to the incorrect file
# this script goes into the nginx error.log and finds all the images which have the broken rewrite rules
# then after it has identified the broken rewrite rule files, it searches for the correct file on the filesystem
# then correlates it with the necessary rewrite rule that is required
# this could potentially be used for in-place upgrade by developers
# to ensure that website has proper redirects in the case of bugs with the ones which exist.

# This script will effectively find all 404's and give necessary information for forming a rewrite rule, i.e. the request url, from nginx error.log vs the actual filesystem location
# on the hard disk that the request needs to go to, but is not being rewritten to file path correctly already

# that way this data could be used to create rewrite rules programmatically, potentially
# This is a work in progress

# These are used for display output
cat /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep 'No such file' | awk '{print "URL Request:",$21,"\nFilesystem destination missing:",$7"\n"}'
zcat /var/log/nginx/*error*.gz  | grep 'No such file' | awk '{print "URL Request:",$21,"\nFilesystem destination detected missing:",$7"\n"}'

# These below are used for variable population for locating actual file paths of missing files needed to determine the proper rewrite path destination (which is missing)
# we qualify this with only *.jpg files

cat /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep 'No such file' | awk '{print $7}' | sed 's/\"//g' |  sed 's/.*\///' | grep jpg > lost.txt
zcat /var/log/nginx/*error*.gz  | grep 'No such file' | awk '{print $7}' | sed 's/\"//g' |  sed 's/.*\///' | grep jpg >> lost.txt

cat /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep 'No such file' | awk '{print "http://mycustomerswebsite.com",$21}' | sed 's/\"//g' | grep jpg > lostfullurl.txt
zcat /var/log/nginx/*error*.gz  | grep 'No such file' | awk '{print "http://customerwebsite.com/",$21}' | sed 's/\"//g' | grep jpg >> lostfullurl.txt

# The below section is used for finding the lost files on filesystem and pairing them together in variable pairs
# for programmatic usage for rewrite rules

while true
  read -r f1 <&3 || break
  read -r f2 <&4 || break
  printf '\n\n'
  printf 'Found a broken link getting a 404 at : %s\n'
  printf "$f1\n"
  printf 'Locating the correct link of the file on the filesystem: %s\n'
        find /var/www/magento | grep $f2
done 3<lostfullurl.txt 4<lost.txt

I was particularly proud of the last section which uses a ‘dual loop for two input files’ in a single while statement, allowing me to achieve the descriptions above.

Output is in the form of:

Found a broken link getting a 404 at :
Locating the correct link of the file on the filesystem:

As you can see the path is different on the filesystem to the url that the rewrite is putting the request to, hence the 404 this customer is getting.

This could be a really useful script, and, I see no reason why the script could not generate the rewrite rules programatically from the 404 failures it finds, it could actually create rules that are necessary to fix the problem. Now, this is not an ideal fix, however the script will allow you to have an overview either to fix this properly as a developer, or as a sysadmin to patch up with new rewrite rules.

I’m really proud of this one, even though not everyone may see a use for it. There really really is, and this customer is stoked, think of it like this, how can a developer fix it if he doesn’t have a clear idea of the things that are broken, and this is the sysads job,

Cheers &
Best wishes,

Recovering Corrupt RPM DB

I had a support specialist today that had an open yum task they couldn’t kill gracefully, after kill -9, this was happening

[root@mybox home]# yum list | grep -i xml
rpmdb: Thread/process 31902/140347322918656 failed: Thread died in Berkeley DB library
error: db3 error(-30974) from dbenv-&gt;failchk: DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db3 -  (-30974)
error: cannot open Packages database in /var/lib/rpm
Error: rpmdb open failed
[root@mybox home]#

The solution to fix this is to manually erase the yumdb and to manually rebuild it;

[root@mybox home]# rm -f /var/lib/rpm/__*
[root@mybox home]# rpm --rebuilddb

Increasing the Limits of PHP-FPM

It’s important to know how to increase the limits for php-fpm www pools, or any other named alias pools you might have setup.

You might see an error like

tail -f /var/log/php7.1-fpm.log
[24-Apr-2017 11:23:09] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 11 total


[24-Apr-2017 10:51:38] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it

The solution is quite simple, we just need to go in and edit the php fpm configuration on the server and increase these values to safe ones that is supported by available RAM.

pm.max_children = 15

; The number of child processes created on startup.
; Note: Used only when pm is set to 'dynamic'
; Default Value: min_spare_servers + (max_spare_servers - min_spare_servers) / 2
pm.start_servers = 2

; The desired minimum number of idle server processes.
; Note: Used only when pm is set to 'dynamic'
; Note: Mandatory when pm is set to 'dynamic'
pm.min_spare_servers = 1

; The desired maximum number of idle server processes.
; Note: Used only when pm is set to 'dynamic'
; Note: Mandatory when pm is set to 'dynamic'
pm.max_spare_servers = 8

Then monitor the site with

tail -f /var/log/php7.1-fpm.log

To ensure no further limits are being hit.

Obviously if you are using different version of fpm your log location might be different.

Finding all of the php.ini Files and finding what the memory_limit in them is

Simple as it says.

root@aea-php-slave:~# grep -rnw '/etc' -e 'memory_limit'
/etc/php/5.6/cli/php.ini:393:memory_limit = -1
/etc/php/5.6/apache2/php.ini:393:memory_limit = 128M
/etc/php5/fpm/php.ini:406:memory_limit = 128M
/etc/php5/fpm/pool.d/www.conf.disabled:392:;php_admin_value[memory_limit] = 32M
/etc/php5/fpm/pool.d/site.conf:392:;php_admin_value[memory_limit] = 32M
/etc/php5/cli/php.ini:406:memory_limit = -1
/etc/php5/apache2/php.ini:406:memory_limit = 128M


Setting 404 Error page in nginx

It’s easy to setup an custom error page in nginx. Just create a file in your documentroot like /404.html so if your documentroot is /var/www/html create a file called 404.html, then go into your /etc/nginx/nginx.conf, or /etc/nginx/conf.d/mysite.conf, and add this in your configuration between server { } directive.

Remember if your running https and http you will need the directive for each of them.

error_page 404 /404.html;

It’s possible to define a really custom location securely;

        error_page 404 /cust_404.html;
        location = /cust_404.html {
                root /usr/share/nginx/html;

Just make sure you don’t have a filename called 404.html in /404.html as I’m not sure this would work otherwise.