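One advantage of Selenium for Python is that it drives a real browser, so requests look like genuine visits; the downside is that it is slow.
First, we run an image search on Bing and capture the data source the results are loaded from.
It is this request (the images/async URL used in the code below):
Then we use selenium + lxml to fetch and parse it.
The code is as follows: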
import time

from lxml import etree
from selenium import webdriver

# Start a real Edge browser session
edge = webdriver.Edge()
url="https://cn.bing.com/images/async?q=%E4%BA%8C%E6%AC%A1%E5%85%83%E5%9B%BE%E7%89%87&first=1&count=100&cw=1177&ch=938&relp=35&datsrc=I&layout=ColumnBased&apc=0&mmasync=1&dgState=c*6_y*1664s1912s1975s1789s1831s1804_i*36_w*186&IG=A6460BCE95654EA3990CEED39809611F&SFX=2&iid=images.5562"
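edge.get(url)
time.sleep(3)  # crude fixed wait so the page can finish loading

# Parse the rendered HTML and collect the src attribute of every <img>
content = etree.HTML(edge.page_source)
detail = content.xpath("//img/@src")
edge.quit()  # the page source is already captured, so close the browser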
# Append one image URL per line; "with" closes the file even if a write fails
with open("bimg.txt", "a", encoding="UTF-8") as f:
    for i in detail:
        f.write(f"{i}\n")
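
The parsing step can also be tried on its own, without opening a browser. Below is a minimal sketch, assuming a made-up HTML snippet (SAMPLE_HTML) and a hypothetical helper named extract_image_urls, neither of which is part of the script above. It applies the same //img/@src XPath, then keeps only http(s) links and drops duplicates, since a rendered page may contain inline data: URIs or the same thumbnail more than once.

```python
from lxml import etree

# Made-up sample markup standing in for edge.page_source (assumption)
SAMPLE_HTML = """
<html><body>
  <img src="https://example.com/a.jpg">
  <img src="data:image/gif;base64,R0lGODlhAQABAAAAACw=">
  <img src="https://example.com/a.jpg">
  <img src="http://example.com/b.png">
  <img alt="no src here">
</body></html>
"""


def extract_image_urls(html):
    """Hypothetical helper: return unique http(s) image URLs in page order."""
    tree = etree.HTML(html)
    seen = set()
    urls = []
    for src in tree.xpath("//img/@src"):
        # Skip inline data: URIs and anything else that is not a web link
        if not src.startswith(("http://", "https://")):
            continue
        # Keep only the first occurrence of each URL
        if src not in seen:
            seen.add(src)
            urls.append(src)
    return urls


if __name__ == "__main__":
    for link in extract_image_urls(SAMPLE_HTML):
        print(link)
```

In the script above, the same helper could be called as extract_image_urls(edge.page_source) before writing to bimg.txt; whether filtering is wanted depends on what the page actually returns.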