How to find files with identical content in specified directories with Python

Updated: 2015-06-28 12:34:11  Author: 秋风秋雨

This article shows how to find files with identical content in specified directories using Python, and illustrates some common file-handling techniques along the way. The full example follows.

The Python script below finds files that have identical content, and several directories can be searched in a single run.
Usage: python doublesdetector.py c:\;d:\;e:\ > doubles.txt
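The directory argument is a single semicolon-separated string. The script also re-joins all command-line arguments with spaces before splitting them, so an unquoted path containing a space, such as c:\program files, is not broken apart by the shell. Here is a minimal sketch of that parsing step as a standalone function. The name `split_directories` is hypothetical, and the sketch also drops empty entries, such as the one a trailing semicolon leaves behind:

```python
def split_directories(argv_tail):
    """Turn the script's command-line arguments into a list of directories."""
    # Re-join with spaces so that ['c:\\program', 'files'] becomes one path again
    joined = " ".join(argv_tail)
    # Split on ';' and skip empty entries (e.g. from a trailing ';')
    return [d for d in joined.split(";") if d]
```

For example, `split_directories(sys.argv[1:])` turns `c:\;d:\;e:\` into three directories. On Unix-like shells, `;` separates commands, so the whole argument has to be quoted there, e.g. `python doublesdetector.py '/home;/mnt/backup'`.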

# Hello, this script is written in Python - http://www.python.org
# doublesdetector.py 1.0p
# Requires Python 3: uses hashlib (the old sha module no longer exists) and os.walk
import os
import sys
import hashlib

# Raw string so that backslashes in the Windows paths are kept literally
message = r"""
doublesdetector.py 1.0p
This script will search for files that are identical
(whatever their name/date/time).
 Syntax : python %s <directories>
   where <directories> is a directory or a list of directories
   separated by a semicolon (;)
Examples : python %s c:\windows
      python %s c:\;d:\;e:\ > doubles.txt
      python %s c:\program files > doubles.txt
This script is public domain. Feel free to reuse and tweak it.
The author of this script is Sebastien SAUVAGE <sebsauvage at sebsauvage dot net>
http://sebsauvage.net/python/
""" % ((sys.argv[0],) * 4)

def fileSHA(filepath):
    r"""Compute the SHA-1 (Secure Hash Algorithm) digest of a file.
    Input : filepath : full path and name of file (eg. 'c:\windows\emm386.exe')
    Output : string : hexadecimal representation of the SHA-1 of the file,
             or None if the file could not be read (file not found, no read rights...)
    """
    digest = hashlib.sha1()
    try:
        with open(filepath, 'rb') as f:
            # Read in 64 KiB chunks so large files are never loaded into memory at once
            for data in iter(lambda: f.read(65536), b''):
                digest.update(data)
    except OSError:
        return None
    return digest.hexdigest()

def detectDoubles(directories):
    fileslist = {}
    # Group all files by size (in the fileslist dictionary)
    for directory in directories.split(';'):
        if not directory:  # skip empty entries, e.g. from a trailing ';'
            continue
        directory = os.path.abspath(directory)
        sys.stderr.write('Scanning directory ' + directory + '...')
        for dirpath, dirnames, filenames in os.walk(directory):
            callback(fileslist, dirpath, filenames)
        sys.stderr.write('\n')
    sys.stderr.write('Comparing files...')
    # Keep only sizes shared by more than one file: a file with a unique size
    # cannot have an identical twin, so it never needs to be hashed.
    # (Build a new list instead of deleting keys while iterating over the dict.)
    candidates = [files for files in fileslist.values() if len(files) > 1]
    # Now compute the SHA-1 of files that have the same size,
    # and group files by digest (in the filessha dictionary)
    filessha = {}
    for listoffiles in candidates:
        for filepath in listoffiles:
            sys.stderr.write('.')
            digest = fileSHA(filepath)
            if digest is not None:  # skip files that could not be read
                filessha.setdefault(digest, []).append(filepath)
    # Keep only digests shared by more than one file
    filessha = {digest: files for digest, files in filessha.items() if len(files) > 1}
    sys.stderr.write('\n')
    return filessha

def callback(fileslist, directory, files):
    sys.stderr.write('.')
    for fileName in files:
        filepath = os.path.join(directory, fileName)
        if os.path.isfile(filepath):
            filesize = os.path.getsize(filepath)  # same value as os.stat(filepath).st_size
            fileslist.setdefault(filesize, []).append(filepath)

if __name__ == '__main__':
    if len(sys.argv) > 1:
        # Re-join the arguments so that an unquoted path with spaces
        # (e.g. c:\program files) is treated as a single directory
        doubles = detectDoubles(" ".join(sys.argv[1:]))
        print('The following files are identical:')
        print('\n'.join("----\n%s" % '\n'.join(files) for files in doubles.values()))
        print('----')
    else:
        print(message)
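The script works in two passes. It first groups files by size, since only files of equal size can be identical, and then computes a SHA-1 digest only for files whose size is shared with at least one other file. Below is a self-contained sketch of the same two-pass idea as an importable function that returns the groups instead of printing them. The names `file_digest` and `find_duplicates` are hypothetical, and this version takes a Python list of directories rather than a semicolon-separated string:

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-1 hex digest of a file, or None if it cannot be read."""
    digest = hashlib.sha1()
    try:
        with open(path, "rb") as f:
            # Hash in fixed-size chunks so large files never sit fully in memory
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
    except OSError:
        return None
    return digest.hexdigest()


def find_duplicates(directories):
    """Return a list of groups (sorted lists of paths) of files with identical content."""
    # Pass 1: group regular files by size; only same-size files can be identical
    by_size = defaultdict(list)
    for top in directories:
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.isfile(path):
                    by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files whose size is shared with at least one other file
    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            digest = file_digest(path)
            if digest is not None:  # skip unreadable files
                by_digest[digest].append(path)

    # Keep only digests shared by two or more files
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

For example, `for group in find_duplicates(["/home/me/photos", "/mnt/backup"]): print("\n".join(group))` prints each group of identical files (the paths here are placeholders). Returning the groups keeps the search separate from the reporting, so the caller can decide whether to print, delete or hard-link the duplicates. Equal SHA-1 digests are treated as identical content. If hash collisions are a concern, `filecmp.cmp(a, b, shallow=False)` can confirm a match byte by byte.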

I hope this article helps with your Python programming.
