Reading files in Python and PowerShell: answers

【Question title】: Reading in files in Python and PowerShell
【Posted】: 2016-05-19 17:23:48
【Question】:

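I am migrating a bunch of scripts from PowerShell to Python for better cross-platform compatibility. However, I am having trouble reading a file and getting the same hash in both. In PowerShell, the function I use is: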

Function Get-Hash {
param (
    [string]$someText
)
$hasher = new-object -TypeName System.Security.Cryptography.SHA1CryptoServiceProvider
$utf8 = new-object -TypeName System.Text.UTF8Encoding
return ([System.BitConverter]::ToString($hasher.ComputeHash($utf8.GetBytes($someText)))).Replace("-","")
}

In Python I do this:

import hashlib
hashlib.sha1(some_text.encode('utf-8')).hexdigest().upper()

With these two functions I get the same hash for a string. For example, the hashes match when I run:

#Powershell:
get-hash -someText 'testing'
#Python:
hashlib.sha1('testing'.encode('utf-8')).hexdigest().upper()

However, the problem appears when I try to read a file that contains newlines:

#Powershell:
$fileContent = get-content 'c:\path\to\file.txt'
get-hash -someText $fileContent

#Python:
with open(r'c:\path\to\file.txt', 'r', encoding='utf-8') as file:
    file_content = file.read()
hashlib.sha1(file_content.encode('utf-8')).hexdigest().upper()

The hashes are not the same. I think the cause is the way I read the file, but I cannot get them to match.

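【Comments】:

Why are you encoding the file before hashing it? Read it as binary data and hash the raw bytes.
Can you elaborate? In Python, hashlib.sha1(file_content).hexdigest().upper() gives me the error 'Unicode-objects must be encoded before hashing'.
hashlib.sha1(open('c:\\path\\to\\file.txt','rb').read()).hexdigest().upper()
But you still may not get the same value, because in PowerShell you are hashing an encoded string that holds the file's data, whereas here you are hashing the file's binary content, which is the recommended way to hash a file.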
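
The comments above recommend hashing the file's raw bytes. A minimal sketch of that approach follows; it reads in chunks so a large file does not have to fit in memory. The names `sha1_of_file` and `chunk_size` are chosen here for illustration and do not come from the thread.

```python
import hashlib


def sha1_of_file(path, chunk_size=65536):
    """Return the uppercase SHA-1 hex digest of a file's raw bytes."""
    hasher = hashlib.sha1()
    # Binary mode: no decoding and no newline translation takes place,
    # so the digest depends only on the bytes stored on disk.
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest().upper()
```

It would be called as sha1_of_file(r'c:\path\to\file.txt'). As the last comment points out, this digest should only be expected to match PowerShell if the PowerShell side hashes the same bytes rather than a string built from them.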

Tags: python powershell hash sha1

【Solution 1】:

Have you tried strip? For example, file_content.strip()?
Or try removing the newlines: ''.join((c for c in file_content if c != '\n'))

【Comments】:

.strip() gives me the same hash as without .strip(); removing the newlines gives me a different hash, but one that still does not match the PowerShell function.
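
A possible explanation for these results, offered as an assumption that the thread itself does not confirm: Get-Content without -Raw returns an array with one string per line, and binding that array to a [string] parameter joins the elements with a single space (the default $OFS). If so, the PowerShell function hashes "line1 line2", which differs both from the text with newlines and from the text with newlines removed. The sketch below imitates that behaviour in Python; `sha1_upper` and `hash_like_get_content` are hypothetical helper names.

```python
import hashlib


def sha1_upper(text):
    """Uppercase SHA-1 hex digest of text encoded as UTF-8, as in the question."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest().upper()


def hash_like_get_content(text):
    """Hypothetical helper: hash text the way the PowerShell call appears to.

    Assumption: Get-Content yields the lines without their terminators, and
    passing that array to a [string] parameter joins them with one space.
    """
    # Normalize CRLF and lone CR so every line ends in "\n".
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # A final newline ends the last line; it does not start an empty one.
    if normalized.endswith("\n"):
        normalized = normalized[:-1]
    return sha1_upper(" ".join(normalized.split("\n")))


sample = "first\nsecond\n"  # what Python's text mode returns for a two-line file
print(sha1_upper(sample))                    # text as read, newlines kept
print(sha1_upper(sample.replace("\n", "")))  # newlines removed
print(hash_like_get_content(sample))         # lines joined with a space
```

The three printed digests differ from one another, which is consistent with the observations above. If the assumption holds, reading the file on the PowerShell side with Get-Content -Raw (available from PowerShell 3.0) should hand the function the file text as a single string. A byte order mark or a non-UTF-8 default encoding could still cause a mismatch.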