Decode base45 string: answers

【Question title】: Decode base45 string
【Posted】: 2021-09-07 21:15:42
【Question】:

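We are trying to implement verification of the new EU coronavirus test/vaccination certificates, but we cannot get the base45 decoding to work.

The spec is here: https://datatracker.ietf.org/doc/draft-faltstrom-base45/

Our class is almost finished, but we sometimes get wrong values.

The goal is this: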

Encoding example 1: The string "AB" is the byte sequence [65 66].
The 16 bit value is 65 * 256 + 66 = 16706. 16706 equals 11 + 45 * 11
+ 45 * 45 * 8 so the sequence in base 45 is [11 11 8].  By looking up
these values in the Table 1 we get the encoded string "BB8".

Encoding example 2: The string "Hello!!" as ASCII is the byte
sequence [72 101 108 108 111 33 33].  If we look at each 16 bit
value, it is [18533 27756 28449 33].  Note the 33 for the last byte.
When looking at the values modulo 45, we get [[38 6 9] [36 31 13] [9
2 14] [33 0]] where the last byte is represented by two.  By looking
up these values in the Table 1 we get the encoded string "%69
VD92EX0".

Encoding example 3: The string "base-45" as ASCII is the byte
sequence [98 97 115 101 45 52 53].  If we look at each 16 bit value,
it is [25185 29541 11572 53].  Note the 53 for the last byte.  When
looking at the values modulo 45, we get [[30 19 12] [21 26 14] [7 32
5] [8 1]] where the last byte is represented by two.  By looking up
these values in the Table 1 we get the encoded string "UJCLQE7W581".
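The arithmetic in these examples can be checked with integer division. The following is a minimal Ruby sketch, assuming the digit order described above (least significant base45 digit first). The helper name `base45_digits` is made up for illustration and does not come from the draft.

```ruby
# Hypothetical helper: split a byte string into groups of base45 digits,
# least significant digit first, as in the draft's examples.
def base45_digits(str)
  str.bytes.each_slice(2).map do |pair|
    if pair.size == 2
      value = pair[0] * 256 + pair[1]   # two bytes form one 16-bit value
      e, rest = value.divmod(45 * 45)   # most significant digit
      d, c = rest.divmod(45)
      [c, d, e]                         # always three digits, zeros included
    else
      d, c = pair[0].divmod(45)         # a trailing single byte
      [c, d]                            # always two digits, zeros included
    end
  end
end

p base45_digits("AB")       # [[11, 11, 8]]
p base45_digits("Hello!!")  # [[38, 6, 9], [36, 31, 13], [9, 2, 14], [33, 0]]
```

Note that the last group of "Hello!!" is [33, 0]: the zero digit is part of the encoding and must not be dropped.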

Here is my current code, which produces wrong values:

class Base45

  ALPHABET = {
    "00" => "0",
    "01" => "1",
    "02" => "2",
    "03" => "3",
    "04" => "4",
    "05" => "5",
    "06" => "6",
    "07" => "7",
    "08" => "8",
    "09" => "9",
    "10" => "A",
    "11" => "B",
    "12" => "C",
    "13" => "D",
    "14" => "E",
    "15" => "F",
    "16" => "G",
    "17" => "H",
    "18" => "I",
    "19" => "J",
    "20" => "K",
    "21" => "L",
    "22" => "M",
    "23" => "N",
    "24" => "O",
    "25" => "P",
    "26" => "Q",
    "27" => "R",
    "28" => "S",
    "29" => "T",
    "30" => "U",
    "31" => "V",
    "32" => "W",
    "33" => "X",
    "34" => "Y",
    "35" => "Z",
    "36" => " ",
    "37" => "$",
    "38" => "%",
    "39" => "*",
    "40" => "+",
    "41" => "-",
    "42" => ".",
    "43" => "/",
    "44" => ":"
  }.freeze

  def self.encode_base45(text)
    restsumme = text.unpack('S>*')

    # not sure what this is doing, but without it, it works worse :D
    restsumme << text.bytes[-1] if text.bytes.size > 2 && text.bytes[-1] < 256

    bytearr = restsumme.map do |bytes|
      arr = []
      multiplier, rest = bytes.divmod(45**2)
      arr << multiplier if multiplier > 0

      multiplier, rest = rest.divmod(45)
      arr << multiplier if multiplier > 0
      arr << rest if rest > 0
      arr.reverse
    end
    return bytearr.flatten.map{|a| ALPHABET[a.to_s.rjust(2, "0")]}.join
  end

  def self.decode_base45(text)
    arr = text.split("").map do |char|
      ALPHABET.invert[char]
    end
    textarr = arr.each_slice(3).to_a.map do |group|
      subarr = group.map.with_index do |val, index|
        val.to_i * (45**index)
      end
      ap subarr
      subarr.sum
    end

    return textarr.pack("S>*") # returns wrong values
  end
end

Results:

Base45.encode_base45("AB")
=> "BB8" # works
Base45.decode_base45("BB8")
=> "AB" # works

Base45.encode_base45("Hello!!")
=> "%69 VD92EX" # works
Base45.decode_base45("%69 VD92EX0")
=> "Hello!\x00!" # wrong \x00


Base45.encode_base45("base-45")
=> "UJCLQE7W581" # works
Base45.decode_base45("UJCLQE7W581")
=> "base-4\x005" # wrong \x00

Any hints are appreciated :(

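【Comments】:

This happens because pack("S>*") treats the data as 16-bit integers, which only works for strings with an even number of characters/bytes. You have to handle the extra byte separately, just as in the encoding. (That is the "not sure what this is doing" part.)
Yes, I have thought about that, but I don't know how to do it.
@BvuRVKyUVlViVIc7 I don't think "%69 VD92EX" is a valid base45 string. The length mod 3 of a valid string should be 0 or 2, and your string's length mod 3 is 1. Here is a quote from the document you mentioned: "If it is an even number of bytes, then the encoded form is a string with a length which is evenly divisible by 3. If it is an odd number of bytes, then the last (rightmost) byte is encoded on two characters as described above."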
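Both comments can be illustrated with a short Ruby sketch. It assumes the alphabet from the question. The names `BASE45_CHARS`, `encode_last_byte`, `pack_all_16_bit` and `pack_with_trailing_byte` are hypothetical and only serve the illustration.

```ruby
BASE45_CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:".freeze

# Hypothetical helper: encode a trailing single byte as exactly two
# characters. Zero digits are kept, so 33 becomes "X0" rather than "X".
def encode_last_byte(byte)
  d, c = byte.divmod(45)
  BASE45_CHARS[c] + BASE45_CHARS[d]
end

# Packs every value as two bytes, which is what pack("S>*") does.
def pack_all_16_bit(values)
  values.pack("S>*")
end

# Packs all but the last value as two bytes and the last one as one byte.
# Only appropriate when the final group had two characters.
def pack_with_trailing_byte(values)
  values[0...-1].pack("S>*") + [values.last].pack("C")
end

values = [18533, 27756, 28449, 33]   # decoded groups of "%69 VD92EX0"
p encode_last_byte(33)               # "X0"
p pack_all_16_bit(values)            # "Hello!\x00!"
p pack_with_trailing_byte(values)    # "Hello!!"
```

The stray \x00 comes from packing the final value 33 as two bytes (0x00 0x21) instead of one.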

Tags: ruby base45

【Answer 1】:

If you want a more flexible approach:

return textarr.map{|x| x<256 ? [x].pack("C*") : [x].pack("n*") }.join

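Looking at the scheme, it feels like a strange way of encoding, since we are dealing with numbers... If it were me, I would start at the tail of the string and work toward the head, but that is because we are working with numbers.

Anyway, the reason my bodge works is that it treats small elements/numbers as 8-bit unsigned instead of 16-bit unsigned.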
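One caveat with deciding by value: a three-character group can also decode to a number below 256, for example when the first byte of a pair is 0x00. The sketch below compares the value-based rule with a rule based on the group size. Both helper names are hypothetical, and the second rule is one possible alternative, not what this answer uses.

```ruby
# The rule from the one-liner above: small values become a single byte.
def heuristic_bytes(value)
  value < 256 ? [value].pack("C") : [value].pack("n")
end

# Hypothetical alternative: decide by how many characters the group had.
# Three characters always stand for two bytes, two characters for one byte.
def group_to_bytes(value, group_size)
  group_size == 3 ? [value].pack("n") : [value].pack("C")
end

# "K10" has the digits [20, 1, 0], so its value is 20 + 1 * 45 + 0 = 65.
# It encodes the two bytes 0x00 0x41.
value = 20 + 1 * 45 + 0 * 45 * 45
p heuristic_bytes(value)     # "A"     (the leading zero byte is lost)
p group_to_bytes(value, 3)   # "\x00A"
```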

...

Slightly easier on the eye, but probably no better:

def self.decode_base45(text)
  arr = text.split("").map do |char|
    ALPHABET.invert[char]
  end
  textarr = arr.each_slice(3).to_a.map do |group|
    subarr = group.map.with_index do |val, index|
      val.to_i * (45**index)
    end
    # split each group sum into a high byte and a low byte
    subarr.sum.divmod(256)
  end.flatten.reject(&:zero?) # caveat: this also drops genuine 0x00 bytes

  return textarr.pack("C*")
end

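【Comments】:

Doesn't this still return "base-4\x005" for the base-45 test string ("UJCLQE7W581")?
Odd, I don't think it did that when I tested it. Rejecting the zeros should fix it, or you could change the sum.divmod bit to asum = subarr.sum ; asum > 256 ? asum.divmod(256) : asum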

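【Answer 2】:

After struggling to get the other answers to work, I came up with my own approach based on your question and this snippet. The answers above work in most cases but not in all of them, especially when the string length mod 3 = 2.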

class Base45
  ALPHABET = {
    0 => "0",
    1 => "1",
    2 => "2",
    3 => "3",
    4 => "4",
    5 => "5",
    6 => "6",
    7 => "7",
    8 => "8",
    9 => "9",
    10 => "A",
    11 => "B",
    12 => "C",
    13 => "D",
    14 => "E",
    15 => "F",
    16 => "G",
    17 => "H",
    18 => "I",
    19 => "J",
    20 => "K",
    21 => "L",
    22 => "M",
    23 => "N",
    24 => "O",
    25 => "P",
    26 => "Q",
    27 => "R",
    28 => "S",
    29 => "T",
    30 => "U",
    31 => "V",
    32 => "W",
    33 => "X",
    34 => "Y",
    35 => "Z",
    36 => " ",
    37 => "$",
    38 => "%",
    39 => "*",
    40 => "+",
    41 => "-",
    42 => ".",
    43 => "/",
    44 => ":"
  }.freeze

  def self.decode_base45(text)
    raise ArgumentError, "invalid base45 string" if text.size % 3 == 1

    arr = text.split("").map do |char|
      ALPHABET.invert[char]
    end

    arr.each_slice(3).to_a.map do |group|
      if group.size == 3
        x = group[0] + group[1] * 45 + group[2] * 45 * 45
        raise ArgumentError, "invalid base45 string" if x > 0xFFFF
        x.divmod(256)
      else
        x = group[0] + group[1] * 45
        raise ArgumentError, "invalid base45 string" if x > 0xFF
        x
      end
    end.flatten.pack("C*")
  end
end

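【Comments】:

【Answer 3】:

This is probably not the proper solution to the problem.

But adding textarr.pack("S>*").gsub(/\x00/, "") fixes the problem for the given decoding examples. Also, your encode version does not work well for me (it gives wrong results for the first two examples), which is really strange.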
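Stripping every \x00 only works while the payload contains no zero bytes of its own. A small sketch of the failure case follows, assuming a decoded payload of the three bytes 0x41 0x00 0x42. The helper name `decode_values_with_gsub` is made up for illustration.

```ruby
# The workaround above, wrapped in a hypothetical helper.
def decode_values_with_gsub(values)
  values.pack("S>*").gsub(/\x00/, "")
end

# Group values for the bytes [0x41, 0x00, 0x42]:
# the pair (0x41, 0x00) is 65 * 256 + 0 = 16640, and the trailing byte is 66.
result = decode_values_with_gsub([16640, 66])
p result         # "AB"
p result.bytes   # [65, 66], the genuine zero byte is gone
```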

Anyway, this thread prompted me to contribute something by turning it into a gem.

【Comments】: