Baidu spider crawl diagnosis fails with a "socket read/write error" or "connection timeout": what to do

Baidu spider crawl diagnosis reports an exception: what to do about a socket read/write error?

If your website has not been indexed by Baidu, first run a crawl diagnosis on the Baidu Search Resource Platform.

What should I do if the Baidu crawl diagnosis fails?

If the crawl diagnosis fails several times in a row, a firewall may be blocking the crawler.

Baidu Search Resource Platform > Crawl Diagnosis > Crawl Exception Information: socket read/write error

Fixing the Baidu spider crawl diagnosis "socket read/write error" and "connection timeout"

  • This is especially likely when using Cloudflare CDN, which blocks the crawler by default.
  • Some posts online suggest allowing the IP range xxx.xxx.xxx.xxx/24.
  • I tried that, but it made no difference.

I had not blocked Baidu spiders on the server, so the problem had to be Cloudflare's WAF.

Log in to Cloudflare → Security → WAF → Firewall Rules → Create Firewall Rule

  • Looking through the crawler-related WAF rules in Cloudflare, I found the "legitimate robot crawler" option.


  • After creating the firewall rule, I waited 10 minutes and ran the crawl diagnosis again; every fetch succeeded.

Why does Baidu fail to crawl the Sitemap with a connection timeout?

After you submit the address of the Sitemap file on the Baidu Search Resource Platform, the crawl may fail or the connection may time out.


Solution when the Baidu crawler fails to fetch the Sitemap

Log in to Cloudflare → Security → WAF → Firewall Rules → Create Firewall Rule

  1. Field: select "User Agent".
  2. Operator: select "contains".
  3. To add another user agent, click "Or" at the end of the row.
  4. Value: enter the following Baidu spider user agent (UA) strings, one per condition:
    • Baiduspider/2.0
    • Baiduspider-image
    • Baiduspider-render/2.0
    • http://www.baidu.com/search/spider.html
    • Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
    • Mozilla/5.0 (Linux;u;Android 4.2.2;zh-cn;) AppleWebKit/534.46 (KHTML,like Gecko) Version/5.1 Mobile Safari/10600.6.3 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
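If you prefer to paste a rule into Cloudflare's expression editor instead of clicking through the form, the conditions above can be sketched as one expression. This is a minimal sketch under assumptions: `build_ua_expression` is a hypothetical helper, not part of any Cloudflare tooling, and it only produces the text of the expression. The action to attach to the rule is not stated above, so it is left out here.

```python
# Substrings taken from the user agent list above.
BAIDU_UA_SUBSTRINGS = [
    "Baiduspider/2.0",
    "Baiduspider-image",
    "Baiduspider-render/2.0",
    "http://www.baidu.com/search/spider.html",
]


def build_ua_expression(substrings):
    """Join one `http.user_agent contains "..."` clause per substring with `or`.

    Hypothetical helper: it only builds a string for the expression editor.
    """
    clauses = []
    for value in substrings:
        # Escape backslashes and double quotes so the quoted string stays valid.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'(http.user_agent contains "{escaped}")')
    return " or ".join(clauses)


expression = build_ua_expression(BAIDU_UA_SUBSTRINGS)
print(expression)
```

Because the operator is "contains", the two full user agent strings in the list already match the `Baiduspider/2.0` clause, so the sketch leaves them out.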

    After saving the rule, run the crawl diagnosis again. A returned HTTP header of 200 indicates that the fetch succeeded:

    • Crawl Diagnosis > Crawl Details
      The following are the Baidu Spider crawl results and page information:
    • Submitted URL: https://www.etufo.org/sitemap_baidu.xml
    • Crawled URL: https://www.etufo.org/sitemap_baidu.xml
    • Crawl UA: Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
    • Crawl time: 2022-11-11 19:03:44
    • Site IP: 172.***.***.149
    • Download time: 0.868 seconds
    • Returned HTTP header: HTTP/2 200
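For a rough local check between runs of the crawl diagnosis, you can request the Sitemap yourself with the Baidu spider user agent and look at the status code. The sketch below uses only the Python standard library; `build_request` and `fetch_status` are hypothetical helper names. It assumes the server answers HEAD requests, and the request still comes from your own IP address, so Cloudflare may treat it differently from the real spider. Treat the result as an indication only.

```python
import urllib.error
import urllib.request

# Desktop Baidu spider user agent, as shown in the crawl details above.
BAIDU_UA = (
    "Mozilla/5.0 (compatible; Baiduspider/2.0; "
    "+http://www.baidu.com/search/spider.html)"
)


def build_request(url):
    """Build a HEAD request that presents the Baidu spider user agent."""
    return urllib.request.Request(
        url, headers={"User-Agent": BAIDU_UA}, method="HEAD"
    )


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for `url`.

    Error statuses such as 403 are returned rather than raised.
    Connection failures and timeouts still raise an exception.
    """
    request = build_request(url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
```

Calling `fetch_status("https://www.etufo.org/sitemap_baidu.xml")` performs one network request; a 403 would suggest the firewall is still rejecting this user agent.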

    The user agents of other spiders and crawlers can be looked up and allowed in the same way.
