Magento Rewrite Woes … really woes

I had a customer this week that had some terrible rewrite woes with their magento site. They knew that a whole ton of their images were getting 404’s most likely because rewrite wasn’t getting to the correct filesystem path that the file resided. This was due to their cache being broken, and their second developer not creating proper rewrite rule.

As a sysadmin our job is not a development role, we are a support role, and in order to enable the developer to fix the problem, the developer needs to be able to see exactly what it is, enter the sysads task. I wrote this really ghetto script, which essentially hunts in the nginx error log for requests that failed with no such file, and then qualifies them by grepping for jpg file types. This is not a perfect way of doing it, however, it is really effective at identifying the broken links.

Then I have a seperate routine that strips the each of the file uri’s down to the filename, and locates the file on the filesystem, and matches the filename on the filesystem that the rewrite should be going to, as well as the incorrect path that the rewrite is presently putting the url to. See the script below:


# Author: Adam Bull
# Company: Rackspace LTD, Hayes
# Purpose:
#          This customer has a difficulty with nginx rewriting to the incorrect file
# this script goes into the nginx error.log and finds all the images which have the broken rewrite rules
# then after it has identified the broken rewrite rule files, it searches for the correct file on the filesystem
# then correlates it with the necessary rewrite rule that is required
# this could potentially be used for in-place upgrade by developers
# to ensure that website has proper redirects in the case of bugs with the ones which exist.

# This script will effectively find all 404's and give necessary information for forming a rewrite rule, i.e. the request url, from nginx error.log vs the actual filesystem location
# on the hard disk that the request needs to go to, but is not being rewritten to file path correctly already

# that way this data could be used to create rewrite rules programmatically, potentially
# This is a work in progress

# These are used for display output
cat /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep 'No such file' | awk '{print "URL Request:",$21,"\nFilesystem destination missing:",$7"\n"}'
zcat /var/log/nginx/*error*.gz  | grep 'No such file' | awk '{print "URL Request:",$21,"\nFilesystem destination detected missing:",$7"\n"}'

# These below are used for variable population for locating actual file paths of missing files needed to determine the proper rewrite path destination (which is missing)
# we qualify this with only *.jpg files

cat /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep 'No such file' | awk '{print $7}' | sed 's/\"//g' |  sed 's/.*\///' | grep jpg > lost.txt
zcat /var/log/nginx/*error*.gz  | grep 'No such file' | awk '{print $7}' | sed 's/\"//g' |  sed 's/.*\///' | grep jpg >> lost.txt

cat /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep 'No such file' | awk '{print "",$21}' | sed 's/\"//g' | grep jpg > lostfullurl.txt
zcat /var/log/nginx/*error*.gz  | grep 'No such file' | awk '{print "",$21}' | sed 's/\"//g' | grep jpg >> lostfullurl.txt

# The below section is used for finding the lost files on filesystem and pairing them together in variable pairs
# for programmatic usage for rewrite rules

while true
  read -r f1 <&3 || break
  read -r f2 <&4 || break
  printf '\n\n'
  printf 'Found a broken link getting a 404 at : %s\n'
  printf "$f1\n"
  printf 'Locating the correct link of the file on the filesystem: %s\n'
        find /var/www/magento | grep $f2
done 3<lostfullurl.txt 4<lost.txt

I was particularly proud of the last section which uses a ‘dual loop for two input files’ in a single while statement, allowing me to achieve the descriptions above.

Output is in the form of:

Found a broken link getting a 404 at :
Locating the correct link of the file on the filesystem:

As you can see the path is different on the filesystem to the url that the rewrite is putting the request to, hence the 404 this customer is getting.

This could be a really useful script, and, I see no reason why the script could not generate the rewrite rules programatically from the 404 failures it finds, it could actually create rules that are necessary to fix the problem. Now, this is not an ideal fix, however the script will allow you to have an overview either to fix this properly as a developer, or as a sysadmin to patch up with new rewrite rules.

I’m really proud of this one, even though not everyone may see a use for it. There really really is, and this customer is stoked, think of it like this, how can a developer fix it if he doesn’t have a clear idea of the things that are broken, and this is the sysads job,

Cheers &
Best wishes,

Leave a Reply

Your email address will not be published. Required fields are marked *