Baidu Spider Crawl Diagnosis Exception: What to Do About a Socket Read/Write Error
If your website has not yet been indexed by Baidu, start by running a crawl diagnosis on the Baidu Search Resource Platform.
What to do when the Baidu crawl diagnosis fails to fetch your links
If the Baidu crawl diagnosis fails repeatedly, a firewall may be blocking the crawler.
Baidu Search Resource Platform > Crawl Diagnosis > Crawl exception information: socket read/write error ▼

- This is especially likely behind Cloudflare CDN, which blocks the crawler by default.
- Advice found online is to allowlist the IP range xxx.xxx.xxx.xxx/24.
- However, I tried that without success.
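Allowlisting a /24 range means any source address whose first three octets match the listed network is let through. A minimal Python sketch of that membership check using the standard `ipaddress` module; the network `203.0.113.0/24` is a hypothetical documentation range standing in for the elided xxx.xxx.xxx.xxx/24, not a real Baidu range.

```python
import ipaddress

# Hypothetical allowlist: 203.0.113.0/24 is a reserved documentation range
# used here only as a stand-in for the elided range mentioned above.
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_allowlisted(ip: str) -> bool:
    """Return True if `ip` falls inside any allowlisted network."""
    addr = ipaddress.ip_address(ip)
    # `in` is always False when the address and network are different IP versions.
    return any(addr in net for net in ALLOWED_NETWORKS)


if __name__ == "__main__":
    # A /24 covers the 256 addresses 203.0.113.0 through 203.0.113.255.
    print(is_allowlisted("203.0.113.42"))  # True
    print(is_allowlisted("203.0.114.42"))  # False
```

A crawler that connects from any address outside the listed range is still blocked by such a rule.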
I had not blocked Baiduspider on the server itself, so the problem had to be Cloudflare's WAF.
Log in to Cloudflare → Security → WAF → Firewall Rules → Create Firewall Rule
- Look through Cloudflare's crawler-related WAF options; there is one for "legitimate robot crawlers" ▼

- After creating the firewall rule, wait about 10 minutes and run the crawl diagnosis again. Every fetch succeeded.
Why does Baidu fail to crawl the Sitemap with a connection timeout?
After you submit the Sitemap URL on the Baidu Search Resource Platform, errors such as "crawl failed" and "connection timed out" may appear ▼

How to fix Baidu failing to crawl the Sitemap
Log in to Cloudflare → Security → WAF → Firewall Rules → Create Firewall Rule ▼

- Field: select "User Agent"
- Operator: select "contains"
- To add another user agent, click "Or" at the end of the row
- Value: enter each of the following Baidu Spider user agents, one per condition:
  - Baiduspider/2.0
  - Baiduspider-image
  - Baiduspider-render/2.0
  - http://www.baidu.com/search/spider.html
  - Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
  - Mozilla/5.0 (Linux;u;Android 4.2.2;zh-cn;) AppleWebKit/534.46 (KHTML,like Gecko) Version/5.1 Mobile Safari/10600.6.3 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
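The rule above is a chain of "User Agent contains ..." conditions joined with "Or": a request matches if its User-Agent contains any one of the values. Below is a minimal Python sketch of that matching logic, treating "contains" as a case-sensitive substring test. It also includes a hypothetical helper, `to_expression`, that writes the same conditions as text in Cloudflare's rule-expression style (`http.user_agent contains "..."`), in case you prefer pasting into the expression editor. The helper name and its quote escaping are assumptions, so check the generated text in the Cloudflare dashboard before saving.

```python
# Values entered in the Cloudflare rule, copied from the list above.
BAIDU_UA_VALUES = (
    "Baiduspider/2.0",
    "Baiduspider-image",
    "Baiduspider-render/2.0",
    "http://www.baidu.com/search/spider.html",
    "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
    "Mozilla/5.0 (Linux;u;Android 4.2.2;zh-cn;) AppleWebKit/534.46 (KHTML,like Gecko) "
    "Version/5.1 Mobile Safari/10600.6.3 (compatible; Baiduspider/2.0; "
    "+http://www.baidu.com/search/spider.html)",
)


def matches_rule(user_agent: str, values=BAIDU_UA_VALUES) -> bool:
    """Emulate 'User Agent contains A Or contains B ...' as a case-sensitive substring test."""
    return any(value in user_agent for value in values)


def to_expression(values=BAIDU_UA_VALUES) -> str:
    """Hypothetical helper: join the values into Cloudflare-style expression text."""

    def quote(s: str) -> str:
        # Escape backslashes first, then double quotes, inside the string literal.
        return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

    return " or ".join(f"(http.user_agent contains {quote(v)})" for v in values)


if __name__ == "__main__":
    print(matches_rule(BAIDU_UA_VALUES[4]))                          # True
    print(matches_rule("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # False
    print(to_expression(BAIDU_UA_VALUES[:2]))
```

Because "Baiduspider/2.0" is already a substring of both full Mozilla strings, those two longer values never change the outcome. Keeping them in the list is redundant but harmless.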
When you are done, run the fetch test again. It should return HTTP 200, which means the fetch succeeded ▼
Crawl Diagnosis > Crawl Details: below are Baidu Spider's fetch result and page information:
- Submitted URL: https://www.etufo.org/sitemap_baidu.xml
- Fetched URL: https://www.etufo.org/sitemap_baidu.xml
- Fetch UA: Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
- Fetch time: 2022-11-11 19:03:44
- Website IP: 172.***.***.149
- Download time: 0.868 s
- Returned HTTP header: HTTP/2 200
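To check the Sitemap from your own machine before re-running Baidu's diagnosis, you can request it with the same UA that Baidu reported above. Below is a minimal Python sketch using the standard `urllib.request`. The URL in the usage line is a placeholder for your own Sitemap address. Because the request comes from your IP rather than Baidu's, it only exercises UA-based rules like the one above, not IP-based ones.

```python
import urllib.error
import urllib.request

# UA string reported in Baidu's crawl details above.
BAIDU_UA = (
    "Mozilla/5.0 (compatible; Baiduspider/2.0; "
    "+http://www.baidu.com/search/spider.html)"
)


def build_request(url: str) -> urllib.request.Request:
    """Create a GET request that identifies itself with the Baiduspider UA."""
    return urllib.request.Request(url, headers={"User-Agent": BAIDU_UA})


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code, including 4xx/5xx codes such as a WAF block page.

    Connection failures and timeouts are not caught; they propagate as exceptions
    (for example urllib.error.URLError).
    """
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for error statuses; it still carries the code.
        return err.code


if __name__ == "__main__":
    # Placeholder URL: replace with your own Sitemap address.
    print(fetch_status("https://example.com/sitemap.xml"))
```

A 200 here means the UA rule lets the request through. A 403 usually points back at a firewall rule.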
You can look up the user agents of other spiders and crawlers yourself and add them in the same way.
I hope this post from Chen Weiliang Blog ( https://www.chenweiliang.com/ ), "Baidu Spider Crawl Failure Diagnosis Abnormal Information What to Do if Socket Read and Write Error Connection Timed Out", is helpful to you.
Feel free to share this article: https://www.chenweiliang.com/cwl-29315.html