ayuliao/AntiCrawlers

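This repository does not declare an open-source license file (LICENSE); before using it, check the project description and the upstream dependencies of its code.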
案例5-关键数据图片化反爬.py (Case 5: anti-crawler defense that renders key data as images), 1.36 KB
二两的分身 committed on 2021-06-28 15:02 +08:00: enjoy code
import re
from urllib.parse import urljoin

import pytesseract
import requests
from bs4 import BeautifulSoup
from PIL import Image

cookies = {
    'session': '.eJyrViotTi1SsqpWyiyOT0zJzcxTsjLQUcrJTwexSopKU3WUcvOTMnNSlayUDM3gQEkHrDE-M0XJyhjCzkvMBSmKKTVNMjMDkiamFkq1tQDfeR3n.YLOC4w.Xbnx1QbrvUh8OUPb5jauC_Aau9U',
}
headers = {
    'Connection': 'keep-alive',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_16_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.92 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
}
root_url = 'http://47.103.13.124:8001/'
url = urljoin(root_url, 'phone_picture')
response = requests.get(url, headers=headers, cookies=cookies, verify=False)
soup = BeautifulSoup(response.text, 'lxml')
imgs = soup.find('tbody').find_all('img')
urls = [img.attrs.get('src') for img in imgs]
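for src in urls:
    # Build the absolute image URL from the (possibly relative) src attribute
    img_url = urljoin(root_url, src)
    raw = requests.get(img_url, stream=True).raw
    img = Image.open(raw)
    res = pytesseract.image_to_string(img)
    # Take the first run of digits; OCR output may carry leading whitespace or noise
    match = re.search(r'\d+', res)
    phone_number = match.group() if match else ''
    print(f'URL: {img_url}, OCR Result: {res}, PhoneNumber: {phone_number}')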
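
The digit-extraction step at the end of the loop can be pulled into a small helper that is testable without a network connection or a Tesseract install. The following is only a sketch: the function name `extract_phone_number` and the `min_digits` threshold are hypothetical and not part of the original script, and it assumes OCR may insert spaces or hyphens between digits.

```python
import re
from typing import Optional


def extract_phone_number(ocr_text: str, min_digits: int = 7) -> Optional[str]:
    """Return the first plausible phone number in OCR output, or None.

    Hypothetical helper: `min_digits` is an assumed threshold, not a value
    taken from the original script.
    """
    # Drop whitespace and hyphens that OCR may insert between digits
    compact = re.sub(r'[\s\-]', '', ocr_text)
    # Look for the first run of at least `min_digits` consecutive digits
    match = re.search(r'\d{%d,}' % min_digits, compact)
    return match.group() if match else None
```

Inside the loop, this could stand in for the regex lines, for example `phone_number = extract_phone_number(res)`. Because separators are stripped before matching, two unrelated numbers in the same image would be joined, so the helper suits images that contain a single number.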