Photo by Isaac Quesada / Unsplash

WordPress site went down due to bots and crawlers like facebookexternalhit and mj12bot

Tech Issues Jul 3, 2024

One of my WordPress sites went down about 20 days ago. It is still under construction, so I was not paying much attention and did not check on it for a while.

Last weekend I tried to SSH into the server, which is hosted on AWS Lightsail, and could not connect. I restarted the server, and after about 5 minutes I was able to log in and check the services such as Webmin and mysqld. Everything looked fine, the site came up, and I could browse the pages. But about 5 minutes later it went down again. This time I could still SSH in, and I wondered whether a malicious script was blocking the site or keeping the server busy, but I found no clue.

Finally, today I looked through the site's access log and understood that it was being hammered by crawlers and bots such as facebookexternalhit and mj12bot. The suggestions in the links below, such as .htaccess rules or a robots.txt file, did not bring the traffic down.

85.208.96.204 - - [03/Jul/2024:04:41:48 +0000] "GET /shop/page/1/?filter_flavor=robust%2Chigh%2Cvery-low%2Coriginal%2Csugar-free%2Clow&lay_style=3&query_type_flavor=or&shop_layout=4 HTTP/1.1" 301 4326 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"
95.91.111.56 - - [03/Jul/2024:04:41:47 +0000] "GET /size/5-5oz-20-servings/?filter_flavor=sugar-free,very-low&query_type_flavor=or&lay_style=1 HTTP/1.1" 200 252260 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"
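
To see which crawlers are responsible for most of the traffic, it can help to count requests per user agent. The snippet below is a minimal sketch, not the exact method I used: it assumes the access log is in the combined log format shown above, where the user agent is the last quoted field on each line. The function names and the log path in the usage note are made up for illustration.

```python
import re
from collections import Counter

# Assumption: combined log format, so the user agent is the last
# double-quoted field on each line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Count requests per user-agent string in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def top_user_agents(log_path, limit=10):
    """Return the `limit` most frequent user agents found in a log file."""
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        return count_user_agents(handle).most_common(limit)
```

Calling `top_user_agents("/path/to/access.log")` (the path is a placeholder, use wherever your web server writes its access log) returns a list of `(user_agent, request_count)` pairs, busiest first.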

https://webmasters.stackexchange.com/questions/129316/trying-to-determine-if-bot-crawling-my-site-is-malicious-mj12bot

https://wordpress.org/support/topic/facebookexternalhit-1-1-thousands-of-requests/

https://stackoverflow.com/questions/9773954/why-facebook-is-flooding-my-site

https://developers.facebook.com/community/threads/974370274080457/

Our DNS is hosted on Cloudflare, which sparked the idea of looking for an option there. I found that WAF rules could help, so I configured custom rules to block the user agents of the bots mentioned above. The traffic still did not come down. After a while I understood that Cloudflare rules only take effect when the proxy is enabled on the DNS entries. Soon after I enabled it, the traffic dropped sharply and I could no longer see any entries from the bots. Uff!!!
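
For anyone who wants to try the same thing, here is a rough sketch of how a user-agent filter for a Cloudflare custom rule could be put together. It is not the exact rule I configured. The helper `build_block_expression` is a made-up name; it only builds the expression text using Cloudflare's `http.user_agent` field, the `contains` operator, and the `lower()` function, so that the match is case-insensitive. You would paste the result into the expression editor of a custom rule and set the action to Block.

```python
def build_block_expression(bot_names):
    """Build a Cloudflare rule expression that matches any listed bot name.

    Hypothetical helper: it only produces the expression string, it does
    not talk to Cloudflare.
    """
    clauses = []
    for name in bot_names:
        needle = name.strip().lower()
        if not needle:
            continue  # skip empty entries
        if '"' in needle or "\\" in needle:
            raise ValueError(f"unsupported character in bot name: {name!r}")
        # lower() makes the comparison case-insensitive.
        clauses.append(f'(lower(http.user_agent) contains "{needle}")')
    return " or ".join(clauses)


if __name__ == "__main__":
    print(build_block_expression(["facebookexternalhit", "MJ12bot"]))
```

Remember that the rule does nothing unless the DNS record for the site is proxied through Cloudflare, which is exactly the step I missed at first.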

If any of you have faced such issues, do let me know in the comments how you solved them, or suggest good solutions to the issue described above.
