So I run a server at home hosting several sites, including a few personal sandboxes for development. I went to check one of my sites and noticed it was taking an incredibly long time to load. When I logged into the machine, I got the dreaded Usage of /: 99.9% of 27.50GB disk space error. Now, this can happen for all sorts of reasons. Maybe a program got overzealous writing out logs, or backing up files, or any one of a thousand rogue operations. I have run into this error before, and thankfully I had a few quick commands on hand to track down the cause.
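If you hit the same message, a quick df confirms which filesystem is actually full before you start hunting:
df -h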
Find Largest Folders
du -kx / | sort -nr | head -10
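A quick breakdown: -k reports sizes in 1 KB blocks, -x keeps du from crossing into other mounted filesystems, and sort -nr | head -10 keeps only the ten largest directories.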
Find Largest Files
find . -type f -exec ls -s {} \; | sort -n -r | head -5
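Note that this one works relative to the current directory, so cd into a suspect directory first. If you'd rather sweep the whole root filesystem for big files in one shot, something along these lines should work (the 100M threshold is just an example; adjust to taste):
find / -xdev -type f -size +100M -exec ls -lh {} \; 2>/dev/null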
Find Largest Folders and Files
du -a /var | sort -n -r | head -n 10
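Since logs were the prime suspect in my case, pointing the same pipeline at /var/log narrows things down even further:
du -a /var/log | sort -n -r | head -n 10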
Problem
Eventually I found the culprit in my Apache error and access logs. After some poking around in the huge files and rolled-over logs, I determined that some bots were trying to gain access to the admin sections of one of my hosted sites. I found hundreds of thousands of lines looking like the ones below (identifying information has been removed).
[Fri Aug 16 22:34:52 2013] [error] [client XX.XX.XX.XX] File does not exist: /var/www/XXXXXXXXXX/phpMyAdmin
[Fri Aug 16 22:34:52 2013] [error] [client XX.XX.XX.XX] File does not exist: /var/www/XXXXXXXXXX/PMA
[Fri Aug 16 22:34:52 2013] [error] [client XX.XX.XX.XX] File does not exist: /var/www/XXXXXXXXXX/pma
[Fri Aug 16 22:34:53 2013] [error] [client XX.XX.XX.XX] File does not exist: /var/www/XXXXXXXXXX/admin
[Fri Aug 16 22:34:53 2013] [error] [client XX.XX.XX.XX] File does not exist: /var/www/XXXXXXXXXX/dbadmin
[Fri Aug 16 22:34:53 2013] [error] [client XX.XX.XX.XX] File does not exist: /var/www/XXXXXXXXXX/sql
[Fri Aug 16 22:34:53 2013] [error] [client XX.XX.XX.XX] File does not exist: /var/www/XXXXXXXXXX/mysql
[Fri Aug 16 22:34:54 2013] [error] [client XX.XX.XX.XX] File does not exist: /var/www/XXXXXXXXXX/myadmin
Not only were these attacks filling up my server with garbage in the log files, they were also degrading page load times. I had already set up a robots.txt file to keep well-behaved crawlers in the correct locations, so I needed to come up with a better solution.
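To put numbers on it, a quick tally of offending client IPs helps. This assumes the 2.2-style error log format shown above and a log at /var/log/apache2/error.log; adjust the path and field handling for your setup:
grep 'File does not exist' /var/log/apache2/error.log | awk '{print $8}' | tr -d ']' | sort | uniq -c | sort -nr | head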
Solution
After much analysis, and some IP lookups, I determined that all of these attacks were coming from overseas, specifically from Chinese and Russian IPs. After some more research, I decided to try implementing mod_geoip2. While this blocks all traffic from the countries I disallow, I examined my usual traffic in Google Analytics and confirmed that essentially none of it originates from either country I was having issues with. I followed the installation instructions on MaxMind's site. Based on the examples provided, I set up an .htaccess file like the one below:
GeoIPEnable On
GeoIPScanProxyHeaders On
GeoIPDBFile /etc/apache2/GeoIP.dat
SetEnvIf GEOIP_COUNTRY_CODE CN BlockCountry
SetEnvIf GEOIP_COUNTRY_CODE RU BlockCountry
Deny from env=BlockCountry
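One note for anyone copying this: Deny from env= is the Apache 2.2 syntax (it still works on 2.4 via mod_access_compat). On a plain Apache 2.4 setup, the same BlockCountry variable can be consumed with Require instead; a minimal sketch:
<RequireAll>
    Require all granted
    Require not env BlockCountry
</RequireAll>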
Conclusion
So far my logs have been clean. Other than the occasional bad link to old, moved, or cached files, I have had no requests for files that don't exist. I was worried the GeoIP lookup on every request might hurt performance, but I haven't noticed any degradation on the system. In fact, I've noticed pages responding much more quickly.