Using the Tuling CAPTCHA recognition platform + Python + Selenium to recognize Bilibili's Chinese CAPTCHA and log in automatically
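I have long wanted to write a Python program that logs in to Bilibili for me and completes a few click tasks. If you know, you know =v=
I finally found the time to do it. The hardest part is recognizing the Chinese CAPTCHA, and with an API that turns out to be easy too. The full source is shared below (the first part walks through the process; if you only want the complete code, skip to the end):
First, import the required libraries and define the constants: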
import base64
import json
import os.path
import time

import requests
from PIL import Image
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = 'https://passport.bilibili.com/login'
# executable_path is accepted by older Selenium releases; recent Selenium 4 releases expect a Service object instead
driver = webdriver.Chrome(executable_path=r"C:\Users\ASUS\AppData\Local\Google\Chrome\Application\chromedriver.exe")
driver.set_window_size(1100, 958)
bool = True  # loop flag; the login loop below exits via break
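Note that executable_path above must be changed to the location of your own chromedriver.exe. chromedriver.exe is the component Selenium uses to drive the Chrome browser, so it is required.
Also, leave set_window_size(1100, 958) unchanged, because the crop coordinates used later depend on it. If you do change the window size, adjust the screenshot crop numbers below to match.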
while bool:
    driver.get(url)
    # Fill in the username and password
    xpath = '//*[@id="login-username"]'
    driver.find_element(By.XPATH, xpath).send_keys('your_username')
    xpath = '//*[@id="login-passwd"]'
    driver.find_element(By.XPATH, xpath).send_keys('your_password')
    # Click the login button
    time.sleep(0.5)
    class_name = 'btn-login'
    driver.find_element(By.CLASS_NAME, class_name).click()
Replace your_username with your Bilibili login name and your_password with your login password.
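A password written directly into the script is easy to leak when the file is shared. One possible alternative is to read both values from environment variables. This is only a minimal sketch: the helper `load_credentials` and the variable names `BILI_USERNAME` and `BILI_PASSWORD` are hypothetical and not part of the original script.

```python
import os


def load_credentials(env=None):
    """Return (username, password) read from environment variables.

    BILI_USERNAME and BILI_PASSWORD are hypothetical names chosen for
    this sketch; use whatever names suit your own setup.
    """
    env = os.environ if env is None else env
    username = env.get("BILI_USERNAME")
    password = env.get("BILI_PASSWORD")
    if not username or not password:
        raise RuntimeError("Set BILI_USERNAME and BILI_PASSWORD first")
    return username, password
```

The returned pair could then be passed to send_keys in place of the 'your_username' and 'your_password' literals.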
    # Create the folders used to save the CAPTCHA images
    image_name = str(int(1000000 * time.time())) + '.png'
    if not os.path.exists("yzm_small"):
        os.mkdir("yzm_small")
    if not os.path.exists("yzm_large"):
        os.mkdir("yzm_large")
    # Wait for the CAPTCHA image to appear
    class_name = 'geetest_tip_img'
    try:
        element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, class_name)))
    except Exception as e:
        print(e)
        continue
    # Take a screenshot and crop out the small and large CAPTCHA images
    time.sleep(0.5)
    driver.save_screenshot('yzm.png')
    # Compute the stretch rate: the Windows display scaling differs between machines and shifts pixel positions in the screenshot
    img = Image.open('yzm.png')
    width = img.size[0]
    stretch_rate = width / 1100
    print('Current screen scaling factor: ' + str(stretch_rate))
    cropped = img.crop((800*stretch_rate, 246*stretch_rate, 925*stretch_rate, 283*stretch_rate))  # (left, upper, right, lower)
    cropped.save('yzm_small/' + image_name)
    cropped = img.crop((668.5*stretch_rate, 287*stretch_rate, 925*stretch_rate, 547*stretch_rate))  # (left, upper, right, lower)
    cropped.save('yzm_large/' + image_name)
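This is the key step for obtaining the CAPTCHA. I crop it directly from a screenshot of the browser window, so if you reuse this code, pay close attention to stretch_rate. The display scaling factor varies between machines (you can see it in your system display settings), and without a correction the crop can land in the wrong place. The stretch_rate factor compensates for this, so unless the site itself changes its layout, the crop should capture the CAPTCHA correctly.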
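The scaling arithmetic above can be tried on its own, without a browser. The sketch below uses the window width and crop boxes from the script; the helper names `stretch_rate_for` and `scale_box` are hypothetical, and rounding to whole pixels is a choice made here (the original passes the scaled floats straight to crop).

```python
def stretch_rate_for(screenshot_width, window_width=1100):
    """Ratio between the screenshot width in pixels and the requested window width."""
    return screenshot_width / window_width


def scale_box(box, rate):
    """Scale a (left, upper, right, lower) crop box and round to whole pixels."""
    return tuple(round(value * rate) for value in box)


# Crop boxes taken from the script, defined for an unscaled 1100-pixel-wide window
SMALL_BOX = (800, 246, 925, 283)
LARGE_BOX = (668.5, 287, 925, 547)

rate = stretch_rate_for(2200)          # e.g. a display scaled to 200%
small_scaled = scale_box(SMALL_BOX, rate)
large_scaled = scale_box(LARGE_BOX, rate)
print(rate, small_scaled, large_scaled)
```

With a 2200-pixel-wide screenshot the rate is 2.0, so every coordinate simply doubles.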
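Next comes the most troublesome part: recognizing the Chinese CAPTCHA. This script calls the Tuling CAPTCHA recognition platform, the only one I found that recognizes Chinese CAPTCHAs accurately. (Mostly, it is cheap...)
Tuling: online image CAPTCHA recognition platform, CAPTCHA solving platform, Chinese CAPTCHA image recognition platform
Official site: http://fdyscloud.com.cn/
Choose the Chinese CAPTCHA recognition models and look up the model IDs you need:
We use models 12 and 9 to recognize the small and large CAPTCHA images respectively:
I will not go over how to call the API, since the Tuling site already documents it in detail. Here is the code: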
# Recognize the CAPTCHA with the Tuling CAPTCHA recognition platform
# Tuling CAPTCHA recognition platform: http://www.fdyscloud.com.cn/static/index.html
# Helper function; define it at module level, before the login loop
def tuling_api(username, password, img_path, ID):
    with open(img_path, 'rb') as f:
        b64_data = base64.b64encode(f.read())
    b64 = b64_data.decode()
    data = {"username": username, "password": password, "ID": ID, "b64": b64}
    data_json = json.dumps(data)
    result = json.loads(requests.post("http://www.fdyscloud.com.cn/tuling/predict", data=data_json).text)
    return result

    # Recognize the small image
    img_path = 'yzm_small/' + image_name
    result_small = tuling_api(username="your_tuling_username", password="your_tuling_password", img_path=img_path, ID="02156188")
    result_small = result_small['result']
    print(result_small)
    # Recognize the large image
    img_path = 'yzm_large/' + image_name
    result_large = tuling_api(username="your_tuling_username", password="your_tuling_password", img_path=img_path, ID="05156485")
    print(result_large)
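To see exactly what tuling_api sends, the request body can be assembled separately and inspected without any network call. This is a small sketch of the same base64 and JSON steps; the helper name `build_payload` and the sample bytes are hypothetical, while the field names come from the script above.

```python
import base64
import json


def build_payload(username, password, image_bytes, model_id):
    """Build the JSON request body: credentials, model ID and base64 image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({
        "username": username,
        "password": password,
        "ID": model_id,
        "b64": b64,
    })


# Placeholder bytes stand in for the contents of a real PNG file
sample_image = b"\x89PNG placeholder bytes"
payload = build_payload("user", "secret", sample_image, "02156188")

# Decoding the payload recovers the original image bytes unchanged
decoded = json.loads(payload)
restored = base64.b64decode(decoded["b64"])
print(restored == sample_image)
```

Base64 is needed because raw image bytes cannot be placed in a JSON string directly.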
Finally, use Selenium to click the Chinese characters automatically:
    class_name = 'geetest_tip_content'
    element = driver.find_element(By.CLASS_NAME, class_name)
    try:
        for i in range(0, len(result_small)):
            # Look up the position of each character from the small image in the large-image result
            result = result_large[result_small[i]]
            # Divide by stretch_rate inside int() so that both offsets are whole numbers
            ActionChains(driver).move_to_element(element).move_by_offset(-70 + int(result['X坐标值'] / stretch_rate), 24 + int(result['Y坐标值'] / stretch_rate)).click().perform()
            time.sleep(1)
    except Exception:
        continue
    class_name = 'geetest_commit_tip'
    driver.find_element(By.CLASS_NAME, class_name).click()
    break
# Automatic login succeeded!
Automatic login succeeded!!!
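The offset passed to move_by_offset can be worked out by hand. The sketch below repeats that arithmetic with the base offsets -70 and 24 from the script. It assumes, as the division by stretch_rate suggests, that the recognized coordinates are in screenshot pixels; the helper name `click_offset` is hypothetical.

```python
def click_offset(x, y, rate, base=(-70, 24)):
    """Convert recognized coordinates into whole-number offsets.

    x, y  : coordinates returned for one character (assumed screenshot pixels)
    rate  : the stretch_rate computed from the screenshot width
    base  : fixed offsets taken from the script, relative to the element
            that move_to_element has just moved to
    """
    return base[0] + int(x / rate), base[1] + int(y / rate)


# A character found at (200, 100) on a display scaled to 200%
offset = click_offset(200, 100, 2.0)
print(offset)
```

Here 200 / 2.0 = 100 and 100 / 2.0 = 50, so the offsets are -70 + 100 = 30 and 24 + 50 = 74.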
Here is the complete code:
import base64
import json
import os.path
import time

import requests
from PIL import Image
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = 'https://passport.bilibili.com/login'
# executable_path is accepted by older Selenium releases; recent Selenium 4 releases expect a Service object instead
driver = webdriver.Chrome(executable_path=r"C:\Users\ASUS\AppData\Local\Google\Chrome\Application\chromedriver.exe")
driver.set_window_size(1100, 958)
bool = True  # loop flag; the login loop below exits via break


# Recognize the CAPTCHA with the Tuling CAPTCHA recognition platform
# Tuling CAPTCHA recognition platform: http://www.tulingtech.xyz/static/index.html
def tuling_api(username, password, img_path, ID):
    with open(img_path, 'rb') as f:
        b64_data = base64.b64encode(f.read())
    b64 = b64_data.decode()
    data = {"username": username, "password": password, "ID": ID, "b64": b64}
    data_json = json.dumps(data)
    result = json.loads(requests.post("http://www.fdyscloud.com.cn/tuling/predict", data=data_json).text)
    return result


while bool:
    driver.get(url)
    # Fill in the username and password
    xpath = '//*[@id="login-username"]'
    driver.find_element(By.XPATH, xpath).send_keys('your_username')
    xpath = '//*[@id="login-passwd"]'
    driver.find_element(By.XPATH, xpath).send_keys('your_password')
    # Click the login button
    time.sleep(0.5)
    class_name = 'btn-login'
    driver.find_element(By.CLASS_NAME, class_name).click()
    # Create the folders used to save the CAPTCHA images
    image_name = str(int(1000000 * time.time())) + '.png'
    if not os.path.exists("yzm_small"):
        os.mkdir("yzm_small")
    if not os.path.exists("yzm_large"):
        os.mkdir("yzm_large")
    # Wait for the CAPTCHA image to appear
    class_name = 'geetest_tip_img'
    try:
        element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, class_name)))
    except Exception as e:
        print(e)
        continue
    # Take a screenshot and crop out the small and large CAPTCHA images
    time.sleep(0.5)
    driver.save_screenshot('yzm.png')
    # Compute the stretch rate: the Windows display scaling differs between machines and shifts pixel positions in the screenshot
    img = Image.open('yzm.png')
    width = img.size[0]
    stretch_rate = width / 1100
    print('Current screen scaling factor: ' + str(stretch_rate))
    cropped = img.crop((800*stretch_rate, 246*stretch_rate, 925*stretch_rate, 283*stretch_rate))  # (left, upper, right, lower)
    cropped.save('yzm_small/' + image_name)
    cropped = img.crop((668.5*stretch_rate, 287*stretch_rate, 925*stretch_rate, 547*stretch_rate))  # (left, upper, right, lower)
    cropped.save('yzm_large/' + image_name)
    # Recognize the small image
    img_path = 'yzm_small/' + image_name
    result_small = tuling_api(username="your_tuling_username", password="your_tuling_password", img_path=img_path, ID="02156188")
    result_small = result_small['result']
    print(result_small)
    # Recognize the large image
    img_path = 'yzm_large/' + image_name
    result_large = tuling_api(username="your_tuling_username", password="your_tuling_password", img_path=img_path, ID="05156485")
    print(result_large)
    # Click each character in the order given by the small image
    class_name = 'geetest_tip_content'
    element = driver.find_element(By.CLASS_NAME, class_name)
    try:
        for i in range(0, len(result_small)):
            result = result_large[result_small[i]]
            # Divide by stretch_rate inside int() so that both offsets are whole numbers
            ActionChains(driver).move_to_element(element).move_by_offset(-70 + int(result['X坐标值'] / stretch_rate), 24 + int(result['Y坐标值'] / stretch_rate)).click().perform()
            time.sleep(1)
    except Exception:
        continue
    class_name = 'geetest_commit_tip'
    driver.find_element(By.CLASS_NAME, class_name).click()
    break
# Automatic login succeeded!
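If you have questions, feel free to message me. If you found this useful, a like, coin, and favorite would be much appreciated, haha~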