Python: Simulating Website Login by Recognizing Slider CAPTCHAs

Updated: 2021-03-17 17:21:00  Author: 可爱的黑精灵

This article shows how to simulate a website login in Python by recognizing slider CAPTCHAs, to help readers understand and practice Python web-scraping techniques.

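Simple slider verification

Take http://admin.emaotai.cn/login.aspx as an example. With this kind of CAPTCHA you only need to drag the slider to a given position, so it is fairly easy to handle. Before dragging, scroll the page so that the slider element is in view.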

import time
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

# Create a Selenium Firefox browser (geckodriver must be installed and on PATH)
browser = webdriver.Firefox()

# Login page of the site
url = 'http://admin.emaotai.cn/login.aspx'

# Open the login page
browser.get(url)

browser.maximize_window()

# Wait up to 5 s whenever an element is looked up
browser.implicitly_wait(5)

# The slider handle
draggable = browser.find_element(By.ID, 'nc_1_n1z')

# Scroll the slider into view
browser.execute_script("arguments[0].scrollIntoView();", draggable)

time.sleep(2)

# Press the handle, drag it 247 px to the right, then release it
ActionChains(browser).click_and_hold(draggable).move_by_offset(xoffset=247, yoffset=0).release().perform()
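The 247 px offset above is hard-coded for this page. A minimal sketch, assuming the goal is simply to push the handle to the right end of its track (typical for this kind of slider), derives the offset from the element widths instead. The helper names are ours, and the track element is an assumption: pass in whichever WebElement wraps the slider track on your page.

```python
def full_drag_distance(track_width, handle_width):
    """Pixels needed to move the handle from the left end to the right end of its track."""
    return max(track_width - handle_width, 0)


def drag_to_end(browser, handle, track):
    """Drag `handle` to the end of `track` (both Selenium WebElements).

    Assumes the handle starts at the left edge of the track.
    """
    # Imported here so full_drag_distance() can be used without Selenium installed
    from selenium.webdriver import ActionChains

    # WebElement.size is a dict with 'width' and 'height' in CSS pixels
    distance = full_drag_distance(track.size['width'], handle.size['width'])
    (ActionChains(browser)
        .click_and_hold(handle)
        .move_by_offset(distance, 0)
        .release()
        .perform())
    return distance
```

For example, `drag_to_end(browser, draggable, browser.find_element(By.ID, 'nc_1_n1t'))`, where `'nc_1_n1t'` is a hypothetical id for the track element; inspect the page to find the real one.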

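Jigsaw slider verification

We use om.cn (欧模网) as the example; many sites use a similar scheme. Both the CAPTCHA background and the puzzle piece have clear, bright edges, so the images are easy to tell apart. We therefore first use cv2 edge detection to extract the edges, and then run template matching to find where the puzzle piece sits in the background image.

Edge detection

Common edge-detection operators include Sobel, Scharr, Laplacian, Prewitt, Canny and Marr-Hildreth, and each produces different results. cv2 provides Sobel, Scharr, Laplacian and Canny directly (cv2.Sobel, cv2.Scharr, cv2.Laplacian, cv2.Canny); Prewitt and Marr-Hildreth can be built from cv2.filter2D and from Gaussian blur plus cv2.Laplacian, respectively. Here we use the Canny operator, which gave the best results of the operators we tested.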

We use a small program with two trackbars to tune the Canny thresholds until the output contains only the puzzle outlines.

import cv2

# Initial trackbar positions
lowThreshold = 0
maxThreshold = 100

# Suggested ranges: minimum threshold 0 ~ 500, maximum threshold 100 ~ 1000
max_lowThreshold = 500
max_maxThreshold = 1000


def show_canny():
    # Blur to suppress noise, then run Canny with the current pair of thresholds
    blur = cv2.GaussianBlur(img, (3, 3), 0)
    canny = cv2.Canny(blur, lowThreshold, maxThreshold)
    cv2.imshow('canny', canny)


def canny_low_threshold(value):
    # Remember the new minimum so moving the other trackbar does not reset it
    global lowThreshold
    lowThreshold = value
    show_canny()


def canny_max_threshold(value):
    global maxThreshold
    maxThreshold = value
    show_canny()


# Flag 0 reads the image in grayscale
img = cv2.imread('vcode.png', 0)

cv2.namedWindow('canny', cv2.WINDOW_NORMAL | cv2.WINDOW_KEEPRATIO)
cv2.createTrackbar('Min threshold', 'canny', lowThreshold, max_lowThreshold, canny_low_threshold)
cv2.createTrackbar('Max threshold', 'canny', maxThreshold, max_maxThreshold, canny_max_threshold)
show_canny()

# Press Esc to exit
if cv2.waitKey(0) == 27:
    cv2.destroyAllWindows()

After testing several images, we found that a minimum threshold of 100 and a maximum threshold of 500 gave fairly clean output.

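Puzzle matching

We use cv2's matchTemplate method for template matching, with CV_TM_CCOEFF_NORMED (normalized correlation coefficient) as the matching method; in the Python binding this constant is cv2.TM_CCOEFF_NORMED.

The available methods are: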

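[1] Squared difference, method=CV_TM_SQDIFF (square difference / error)
These methods match using the squared difference: a perfect match is 0, and the worse the match, the larger the value.
[2] Normalized squared difference, method=CV_TM_SQDIFF_NORMED
[3] Correlation, method=CV_TM_CCORR
These methods multiply the template and the image, so larger values indicate a better match and 0 indicates the worst match.
[4] Normalized correlation, method=CV_TM_CCORR_NORMED
[5] Correlation coefficient, method=CV_TM_CCOEFF
These methods correlate the template relative to its mean with the image relative to its mean. For the normalized variant, 1 is a perfect match, -1 is the worst (inverted) match, and 0 means no correlation (a random sequence).
[6] Normalized correlation coefficient, method=CV_TM_CCOEFF_NORMED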

canndy_test.py:

import cv2
import numpy as np


def matchImg(imgPath1, imgPath2):
    """Locate the puzzle piece (imgPath2) in the background image (imgPath1).

    Shows the inputs and the match side by side, then returns the x
    coordinate of the best match in the background image.
    """
    imgs = []

    # Original color images, used for display
    sou_img1 = cv2.imread(imgPath1)
    sou_img2 = cv2.imread(imgPath2)

    # Read as grayscale and extract edges
    # (minimum threshold 100, maximum threshold 500)
    img1 = cv2.imread(imgPath1, 0)
    blur1 = cv2.GaussianBlur(img1, (3, 3), 0)
    canny1 = cv2.Canny(blur1, 100, 500)
    cv2.imwrite('temp1.png', canny1)

    img2 = cv2.imread(imgPath2, 0)
    blur2 = cv2.GaussianBlur(img2, (3, 3), 0)
    canny2 = cv2.Canny(blur2, 100, 500)
    cv2.imwrite('temp2.png', canny2)

    # Reload the edge maps as 3-channel images so they can be
    # stacked next to the color images below
    target = cv2.imread('temp1.png')
    template = cv2.imread('temp2.png')

    # Resize for display and add a white border
    target_temp = cv2.resize(sou_img1, (350, 200))
    target_temp = cv2.copyMakeBorder(target_temp, 5, 5, 5, 5, cv2.BORDER_CONSTANT, value=[255, 255, 255])

    template_temp = cv2.resize(sou_img2, (200, 200))
    template_temp = cv2.copyMakeBorder(template_temp, 5, 5, 5, 5, cv2.BORDER_CONSTANT, value=[255, 255, 255])

    imgs.append(target_temp)
    imgs.append(template_temp)

    theight, twidth = template.shape[:2]

    # Match the puzzle piece against the background
    result = cv2.matchTemplate(target, template, cv2.TM_CCOEFF_NORMED)

    # Normalize the scores to [0, 1]
    cv2.normalize(result, result, 0, 1, cv2.NORM_MINMAX, -1)

    min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)

    # Draw a rectangle around the best match
    cv2.rectangle(target, max_loc, (max_loc[0] + twidth, max_loc[1] + theight), (0, 0, 255), 2)

    target_temp_n = cv2.resize(target, (350, 200))
    target_temp_n = cv2.copyMakeBorder(target_temp_n, 5, 5, 5, 5, cv2.BORDER_CONSTANT, value=[255, 255, 255])

    imgs.append(target_temp_n)

    imstack = np.hstack(imgs)

    cv2.imshow('stack' + str(max_loc), imstack)

    # Press any key to close the preview
    cv2.waitKey(0)
    cv2.destroyAllWindows()

    # x offset of the match, used by the login script to compute the drag distance
    return max_loc[0]


# Only run the demo when this file is executed directly, not when imported
if __name__ == '__main__':
    matchImg('vcode_data/out_' + str(1) + '.png', 'vcode_data/in_' + str(1) + '.png')

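Testing a few sets of data, we found the accuracy good enough for experimenting. max_loc is the top-left corner (x, y) of the matched region in the background image, so we only need to drag the slider according to its x coordinate.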

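Complete program

Complete workflow

1. Instantiate the browser

2. Click Login to bring up the slider verification dialog

3. Open the background image and the puzzle piece, each in a new tab

4. Take a full-window screenshot and crop it to the image size

5. Match the two images and get the position of the match

6. Convert that position into a drag distance on the page

7. Drag the slider to the target position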
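Step 4 recovers each image by cropping its natural size out of the screenshot, which assumes the browser shows the standalone image centered and unscaled in the tab (so the window must be at least as large as the image). The program below does this twice with the same arithmetic; a small helper (our own name, `center_crop`) computes the same region:

```python
import numpy as np


def center_crop(img, w, h):
    """Return the centered w x h region of an image array (rows x cols [x channels])."""
    rows, cols = img.shape[:2]
    if w > cols or h > rows:
        raise ValueError('crop size is larger than the image')
    top = (rows - h) // 2
    left = (cols - w) // 2
    return img[top:top + h, left:left + w]


# Demo on a dummy 100 x 120 three-channel "screenshot"
screenshot = np.arange(100 * 120 * 3, dtype=np.int64).reshape(100, 120, 3)
crop = center_crop(screenshot, 40, 20)
print(crop.shape)  # (20, 40, 3)
```

In the program this would read, for example, `cv2.imwrite('temp_target_crop.png', center_crop(cv2.imread('temp_target.png'), 680, 390))`.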

import time
import cv2
import canndy_test
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

# Create a Selenium Firefox browser (geckodriver must be installed and on PATH)
browser = webdriver.Firefox()

# Login page of the site
url = 'https://www.om.cn/login'

# Open the login page
browser.get(url)

# Remember the handle of the login tab
handle = browser.current_window_handle

# Wait up to 3 s for elements to appear while the page scripts load
browser.implicitly_wait(3)

# Click the login button to bring up the slider CAPTCHA
btn = browser.find_element(By.CLASS_NAME, 'login_btn1')
btn.click()

# Get the CAPTCHA iframe and switch into it
frame = browser.find_element(By.ID, 'tcaptcha_iframe')
browser.switch_to.frame(frame)

time.sleep(1)

# src of the background image
targetUrl = browser.find_element(By.ID, 'slideBg').get_attribute('src')

# src of the puzzle piece
tempUrl = browser.find_element(By.ID, 'slideBlock').get_attribute('src')


# Open a new tab
browser.execute_script("window.open('');")
# Switch to the new tab
browser.switch_to.window(browser.window_handles[1])

# Open the background image and take a screenshot
browser.get(targetUrl)
time.sleep(3)
browser.save_screenshot('temp_target.png')

# Natural size of the background image
w = 680
h = 390

img = cv2.imread('temp_target.png')

size = img.shape

# The image is shown centered in the tab, so crop the centered w x h region
top = int((size[0] - h) / 2)
height = int(h + ((size[0] - h) / 2))
left = int((size[1] - w) / 2)
width = int(w + ((size[1] - w) / 2))

cropped = img[top:height, left:width]

# Save the cropped background
cv2.imwrite('temp_target_crop.png', cropped)

# Open another new tab for the puzzle piece
browser.execute_script("window.open('');")

browser.switch_to.window(browser.window_handles[2])

browser.get(tempUrl)
time.sleep(3)

browser.save_screenshot('temp_temp.png')

# Natural size of the puzzle piece
w = 136
h = 136

img = cv2.imread('temp_temp.png')

size = img.shape

top = int((size[0] - h) / 2)
height = int(h + ((size[0] - h) / 2))
left = int((size[1] - w) / 2)
width = int(w + ((size[1] - w) / 2))

cropped = img[top:height, left:width]

cv2.imwrite('temp_temp_crop.png', cropped)

# Back to the login tab. Switching windows resets the frame context to the
# top-level page, so re-enter the CAPTCHA iframe before using the slider.
browser.switch_to.window(handle)
browser.switch_to.frame(frame)

# Match the two images to get the x position of the puzzle piece
move = canndy_test.matchImg('temp_target_crop.png', 'temp_temp_crop.png')

# Convert the match position into a drag distance on the page
distance = int(move / 2 - 27.5) + 2

draggable = browser.find_element(By.ID, 'tcaptcha_drag_thumb')

# Press the slider, drag it, then release it
ActionChains(browser).click_and_hold(draggable).move_by_offset(xoffset=distance, yoffset=0).release().perform()

# Keep the browser open for 10 s to see the result
time.sleep(10)
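The line `distance = int(move / 2 - 27.5) + 2` hard-codes three page-specific numbers. A hedged reading, not confirmed by the original: the cropped background is 680 px wide but presumably displayed at 340 px on the login page (hence `/ 2`), 27.5 px is presumably the piece's starting left offset in page pixels, and `+ 2` is an empirical correction. Under those assumptions the conversion can be parameterized so the displayed width is measured rather than assumed (`drag_offset` and its parameter names are ours):

```python
def drag_offset(match_x, natural_width, rendered_width, start_offset=27.5, correction=2):
    """Convert a match x position in the natural-size image into a slider drag distance.

    match_x:        x of the best match in the natural-size background image
    natural_width:  width of the image that was matched (680 in the original)
    rendered_width: width at which the page displays the background (assumed 340)
    start_offset:   assumed starting left offset of the piece, in page pixels
    correction:     empirical adjustment taken from the original formula
    """
    scale = rendered_width / natural_width
    return int(match_x * scale - start_offset) + correction


# With the original numbers this reproduces int(move / 2 - 27.5) + 2
for move in (100, 341, 500):
    print(move, drag_offset(move, 680, 340))
```

While still inside the iframe, the displayed width could be read with `browser.find_element(By.ID, 'slideBg').size['width']` and passed as `rendered_width`.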

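Tip: the first attempt may fail. Even when the slider is dragged to the right position, the CAPTCHA may report a network problem or a missing puzzle piece, so you can retry in a loop until it succeeds. To detect the outcome, check whether the element with id slideBg still exists in the iframe: after a successful verification it is gone; after a failure the puzzle is refreshed and you are asked to drag again.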

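from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By


def isEleExist(browser, ele_id):
    # True if an element with this id exists in the current frame
    try:
        browser.find_element(By.ID, ele_id)
        return True
    except NoSuchElementException:
        return False


# After dragging (still inside the CAPTCHA iframe, and after a short wait):
if isEleExist(browser, 'slideBg'):
    pass  # failed: the puzzle was refreshed, so fetch the new images, match and drag again
else:
    print('Verification passed')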
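The retry described in the tip can be wrapped in a small loop. In this sketch `retry_until` is our own helper; `attempt` would be a hypothetical function that repeats steps 3 to 7 (fetch the new images, match, drag), and `succeeded` would check that slideBg is gone, e.g. `lambda: not isEleExist(browser, 'slideBg')`. The pause and the attempt limit are arbitrary defaults.

```python
import time


def retry_until(attempt, succeeded, max_tries=5, delay=2.0):
    """Call attempt() until succeeded() is true or max_tries is reached.

    Returns the number of the successful attempt, or None if every attempt failed.
    """
    for n in range(1, max_tries + 1):
        attempt()
        # Give the CAPTCHA time to verify, or to load a fresh puzzle
        time.sleep(delay)
        if succeeded():
            return n
    return None
```

Returning the attempt count (rather than just True/False) makes it easy to log how often the first drag fails, which is a useful signal when tuning the distance correction.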
