OK, enough people are getting this wrong that I'm going to post some code I have for identifying TIFFs:
    private const int kTiffTagLength = 12;
    private const int kHeaderSize = 2;
    private const int kMinimumTiffSize = 8;
    private const byte kIntelMark = 0x49;    // 'I': little-endian byte order
    private const byte kMotorolaMark = 0x4d; // 'M': big-endian byte order
    private const ushort kTiffMagicNumber = 42;

    private bool IsTiff(Stream stm)
    {
        stm.Seek(0, SeekOrigin.Begin);
        if (stm.Length < kMinimumTiffSize)
            return false;
        byte[] header = new byte[kHeaderSize];
        stm.Read(header, 0, header.Length);
        // The two byte-order bytes must match: "II" or "MM".
        if (header[0] != header[1] || (header[0] != kIntelMark && header[0] != kMotorolaMark))
            return false;
        bool isIntel = header[0] == kIntelMark;
        // The magic number is stored in the byte order the header just declared.
        ushort magicNumber = ReadShort(stm, isIntel);
        if (magicNumber != kTiffMagicNumber)
            return false;
        return true;
    }

    private ushort ReadShort(Stream stm, bool isIntel)
    {
        byte[] b = new byte[2];
        stm.Read(b, 0, b.Length);
        return ToShort(isIntel, b[0], b[1]);
    }

    private static ushort ToShort(bool isIntel, byte b0, byte b1)
    {
        if (isIntel)
        {
            return (ushort)(((int)b1 << 8) | (int)b0);
        }
        else
        {
            return (ushort)(((int)b0 << 8) | (int)b1);
        }
    }

I hacked this out of some more general code.
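
To make the byte layout concrete, here is a minimal self-contained sketch of the same check run against an in-memory header. The class name `TiffHeaderSketch` and the method `HasTiffHeader` are made up for this illustration and are not part of the code above. A little-endian TIFF starts with `49 49 2A 00` and a big-endian one with `4D 4D 00 2A`.

```csharp
using System;

public static class TiffHeaderSketch
{
    // Hypothetical helper: the same logic as IsTiff, applied to a byte array.
    public static bool HasTiffHeader(byte[] data)
    {
        // A TIFF header is 8 bytes: byte order (2), magic number (2), first IFD offset (4).
        if (data == null || data.Length < 8)
            return false;

        bool isIntel = data[0] == 0x49 && data[1] == 0x49;    // "II"
        bool isMotorola = data[0] == 0x4d && data[1] == 0x4d; // "MM"
        if (!isIntel && !isMotorola)
            return false;

        // Decode the 16-bit magic number in the declared byte order.
        int magic = isIntel
            ? (data[3] << 8) | data[2]
            : (data[2] << 8) | data[3];
        return magic == 42;
    }
}
```

For example, `HasTiffHeader(new byte[] { 0x49, 0x49, 0x2A, 0x00, 0x08, 0, 0, 0 })` and `HasTiffHeader(new byte[] { 0x4D, 0x4D, 0x00, 0x2A, 0, 0, 0, 0x08 })` both return true, while a header that mixes the marks (`49 4D ...`) returns false.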

For PDF, my code looks like this:
    public bool IsPdf(Stream stm)
    {
        stm.Seek(0, SeekOrigin.Begin);
        PdfToken token;
        while ((token = GetToken(stm)) != null)
        {
            if (token.TokenType == MLPdfTokenType.Comment)
            {
                if (token.Text.StartsWith("%PDF-1."))
                    return true;
            }
            if (stm.Position > 1024)
                break;
        }
        return false;
    }

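Now, GetToken() is a call into a scanner that tokenizes a stream into PDF tokens. It is non-trivial, so I'm not going to paste it here. I'm using the tokenizer instead of looking for a substring to avoid problems like this: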
    % the following is a PostScript file, NOT a PDF file
    % you'll note that in our previous version, it started with %PDF-1.3,
    % incorrectly marking it as a PDF
    %
    clippath stroke showpage

The code snippet above flags this sample as not a PDF, whereas a simpler substring check would incorrectly flag it as a PDF.
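
Since GetToken() is not shown, here is a rough stand-in that captures only the part that matters for this check. It is a simplified sketch, not my actual tokenizer: it assumes a comment runs from a `%` to the end of the line and ignores `%` characters inside string literals or streams, which a real PDF scanner would have to handle. The names `PdfHeaderSketch` and `LooksLikePdf` are invented for this example.

```csharp
using System;
using System.IO;
using System.Text;

public static class PdfHeaderSketch
{
    private const int kMaxHeaderOffset = 1024;
    private const string kHeaderPrefix = "%PDF-1.";

    // Returns true when a comment that *begins* with "%PDF-1." starts
    // within the first 1024 bytes of the stream.
    public static bool LooksLikePdf(Stream stm)
    {
        stm.Seek(0, SeekOrigin.Begin);
        // Read slightly past the limit so a header starting near it is not cut off.
        byte[] buffer = new byte[kMaxHeaderOffset + kHeaderPrefix.Length];
        int count = 0;
        int n;
        while (count < buffer.Length &&
               (n = stm.Read(buffer, count, buffer.Length - count)) > 0)
        {
            count += n;
        }
        // ASCII decoding yields one char per byte, so offsets are preserved.
        string text = Encoding.ASCII.GetString(buffer, 0, count);

        int i = 0;
        while (i < text.Length && i < kMaxHeaderOffset)
        {
            if (text[i] != '%')
            {
                i++;
                continue;
            }
            // Simplification: the comment extends to the next CR or LF.
            int end = text.IndexOfAny(new[] { '\r', '\n' }, i);
            if (end < 0)
                end = text.Length;
            if (string.CompareOrdinal(text, i, kHeaderPrefix, 0, kHeaderPrefix.Length) == 0)
                return true;
            // Skip the rest of the comment so a "%PDF-1." inside it is not matched.
            i = end;
        }
        return false;
    }
}
```

Because the scan jumps to the end of each comment once it has seen the opening `%`, the `%PDF-1.3` buried in the second line of the PostScript sample is never examined as a comment start, which is the behavior the tokenizer gives me.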
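
I should also point out that the current ISO specification lacks the implementation notes that were in the earlier Adobe-owned specifications. The most important of these is from the PDF Reference, version 1.6: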
Acrobat viewers require only that the header appear somewhere within
the first 1024 bytes of the file.