Baidu Spider crawl diagnosis fails with a socket read/write error or connection timeout: what to do

Baidu Spider crawl diagnosis reports an exception: what should I do about socket read/write errors?

If your website has not been indexed by Baidu, first run a crawl diagnosis on the Baidu Search Resource Platform.

What should I do if Baidu's crawl diagnosis fails?

If the crawl diagnosis fails several times in a row, a firewall may be blocking the crawler.

Baidu Search Resource Platform > Crawl Diagnosis > Crawl Exception Information: socket read/write error


  • This is especially likely when the site is behind the Cloudflare CDN, where the crawler is blocked by default.
  • Some posts online suggest allowlisting the IP range xxx.xxx.xxx.xxx/24.
  • I tried that, but it made no difference.
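For reference, a /24 entry covers only 256 consecutive addresses, which may be one reason a single allowlisted range does not help when the crawler connects from elsewhere. Below is a minimal Python sketch of that membership check using the standard `ipaddress` module. The range and addresses are documentation placeholders (192.0.2.0/24 and 198.51.100.0/24), not real Baidu Spider addresses, and `is_allowed` is a hypothetical helper name.

```python
import ipaddress

# Placeholder range taken from the reserved documentation block 192.0.2.0/24.
# This is an assumption for illustration; the article elides the real range.
ALLOWED_RANGES = [ipaddress.ip_network("192.0.2.0/24")]


def is_allowed(ip, ranges=ALLOWED_RANGES):
    """Return True if the address falls inside any allowlisted network."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in ranges)


if __name__ == "__main__":
    print(ALLOWED_RANGES[0].num_addresses)  # size of a /24 range
    print(is_allowed("192.0.2.77"))         # inside the placeholder range
    print(is_allowed("198.51.100.5"))       # outside the placeholder range
```

An address outside the listed range is not covered by the allowlist entry, so a crawler that uses several ranges would still be blocked on most requests.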

I had not blocked Baidu Spider on the server, so the problem was most likely Cloudflare's WAF!
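One rough way to check whether the block depends on the user agent is to request the same URL twice, once with a browser user agent and once with the Baidu Spider user agent, and compare the status codes. The sketch below uses only Python's standard `urllib`. The helper names, the browser user agent string, and the wording of the interpretations are assumptions made for illustration. A request from your own machine does not come from a Baidu IP address, so Cloudflare may treat it differently from the real spider, and the result is only a hint.

```python
import urllib.error
import urllib.request

# Baidu Spider user agent as listed in this article.
BAIDU_UA = ("Mozilla/5.0 (compatible; Baiduspider/2.0; "
            "+http://www.baidu.com/search/spider.html)")
# An arbitrary desktop browser user agent (assumption, used for comparison only).
BROWSER_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0 Safari/537.36")


def build_request(url, user_agent):
    """Create a GET request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent, timeout=10.0):
    """Return the HTTP status code, or None on connection failure or timeout."""
    try:
        request = build_request(url, user_agent)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code          # the server answered with an error status
    except (urllib.error.URLError, OSError):
        return None              # no usable HTTP response at all


def interpret(browser_status, spider_status):
    """Turn the two status codes into a rough, non-authoritative hint."""
    if browser_status == 200 and spider_status == 200:
        return "no user-agent based blocking seen from this machine"
    if browser_status == 200:
        return "spider user agent is treated differently; check the WAF rules"
    return "site not reachable with a browser user agent; check the origin"
```

For example, `interpret(fetch_status(url, BROWSER_UA), fetch_status(url, BAIDU_UA))` with `url` set to your own page gives a quick first impression before you run the crawl diagnosis on the Baidu side.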

Log in to Cloudflare → Security → WAF → Firewall Rules → Create Firewall Rule

  • Looking through the crawler-related WAF rule options on Cloudflare, I found one for "legitimate robot crawler".


    • After creating the firewall rule, I waited 10 minutes and ran the crawl diagnosis again. Every attempt succeeded!

Why does the Baidu crawler fail to fetch the Sitemap or hit a connection timeout?

After you submit the URL of a Sitemap file on the Baidu Search Resource Platform, you may see errors such as crawl failure and connection timeout.


How to fix the Baidu crawler failing to fetch the Sitemap

Log in to Cloudflare → Security → WAF → Firewall Rules → Create Firewall Rule

  1. Field: select "User Agent".
  2. Operator: select "contains".
  3. To add another user agent, click "Or" at the end of the last row.
  4. Value: enter each of the following Baidu Spider user agent (UA) strings, one per row:
    • Baiduspider/2.0
    • Baiduspider-image
    • Baiduspider-render/2.0
    • http://www.baidu.com/search/spider.html
    • Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
    • Mozilla/5.0 (Linux;u;Android 4.2.2;zh-cn;) AppleWebKit/534.46 (KHTML,like Gecko) Version/5.1 Mobile Safari/10600.6.3 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
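The same rule can also be written as a single expression in Cloudflare's rule expression editor, where the User Agent field is `http.user_agent`. The Python sketch below builds that expression from the list above. Because "contains" is a substring match, the two full user agent strings are already covered by the shorter `Baiduspider/2.0` entry, so the sketch drops them. The helper names are hypothetical, and the rule's action is chosen separately in the Cloudflare form and is not covered here.

```python
# User agent strings exactly as listed in the article.
BAIDU_UA_PATTERNS = [
    "Baiduspider/2.0",
    "Baiduspider-image",
    "Baiduspider-render/2.0",
    "http://www.baidu.com/search/spider.html",
    "Mozilla/5.0 (compatible; Baiduspider/2.0; "
    "+http://www.baidu.com/search/spider.html)",
    "Mozilla/5.0 (Linux;u;Android 4.2.2;zh-cn;) AppleWebKit/534.46 "
    "(KHTML,like Gecko) Version/5.1 Mobile Safari/10600.6.3 "
    "(compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
]


def drop_redundant(patterns):
    """Keep only patterns that do not contain another listed pattern.

    With a substring operator, a longer pattern that includes a shorter
    one can never match anything the shorter one misses.
    """
    return [p for p in patterns
            if not any(q != p and q in p for q in patterns)]


def quote(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_expression(patterns):
    """Join one 'contains' clause per pattern with 'or'."""
    clauses = [f"(http.user_agent contains {quote(p)})" for p in patterns]
    return " or ".join(clauses)


if __name__ == "__main__":
    print(build_expression(drop_redundant(BAIDU_UA_PATTERNS)))
```

Entering all six strings in the form, as the steps above describe, should match the same requests; the shorter list is only easier to maintain.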

    After saving the rule, run the crawl test again. The result returns HTTP status 200, which indicates that the fetch succeeded:

    • Crawl Diagnosis > Crawl Details
      The following are the Baidu Spider crawl results and page information:
    • Submitted URL: https://www.etufo.org/sitemap_baidu.xml
    • Crawled URL: https://www.etufo.org/sitemap_baidu.xml
    • Crawl UA: Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
    • Crawl time: 2022-11-11 19:03:44
    • Site IP: 172.***.***.149
    • Download time: 0.868 seconds
    • Returned HTTP header: HTTP/2 200

    You can look up the user agents of other spiders and crawlers and add them in the same way.

    I hope this article, shared by Chen Weiliang Blog ( https://www.chenweiliang.com/ ), is helpful to you.

    Link to this article: https://www.chenweiliang.com/cwl-29315.html
