Question title: Extract all `INDENT` tokens using Python's tokenize
Posted: 2021-03-18 06:23:09
Question:

I'm trying to use Python's tokenize library to tokenize Python code. For the sample input:

p = "def cal_cone_curved_surf_area(slant_height,radius):\n\tpi=3.14\n\treturn pi*radius*slant_height\n\n"

I'm using the following code to get all the tokens (here p is the sample input string):

import io
import tokenize
text = tokenize.generate_tokens(io.StringIO(p).readline)
[tok for tok in text]
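A self-contained version of the snippet, narrowed down to INDENT tokens only, makes the behaviour easy to check. It reuses the sample string from the question; `indents` is just an illustrative variable name. With this input the filtered list contains a single INDENT token (the tab at the start of line 2), which agrees with the full listing.

```python
import io
import tokenize

# Sample input from the question: a function whose body is indented with a tab.
p = "def cal_cone_curved_surf_area(slant_height,radius):\n\tpi=3.14\n\treturn pi*radius*slant_height\n\n"

# generate_tokens() takes a readline callable that returns str lines.
tokens = list(tokenize.generate_tokens(io.StringIO(p).readline))

# tokenize re-exports the token type constants, so INDENT can be compared directly.
indents = [tok for tok in tokens if tok.type == tokenize.INDENT]

for tok in indents:
    # tok_name maps the numeric type back to its name, e.g. 5 -> 'INDENT'.
    print(tokenize.tok_name[tok.type], repr(tok.string), tok.start)
```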

After running the snippet, I get the following output:

[TokenInfo(type=1 (NAME), string='def', start=(1, 0), end=(1, 3), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
 TokenInfo(type=1 (NAME), string='cal_cone_curved_surf_area', start=(1, 4), end=(1, 29), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
 TokenInfo(type=53 (OP), string='(', start=(1, 29), end=(1, 30), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
 TokenInfo(type=1 (NAME), string='slant_height', start=(1, 30), end=(1, 42), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
 TokenInfo(type=53 (OP), string=',', start=(1, 42), end=(1, 43), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
 TokenInfo(type=1 (NAME), string='radius', start=(1, 43), end=(1, 49), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
 TokenInfo(type=53 (OP), string=')', start=(1, 49), end=(1, 50), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
 TokenInfo(type=53 (OP), string=':', start=(1, 50), end=(1, 51), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
 TokenInfo(type=4 (NEWLINE), string='\n', start=(1, 51), end=(1, 52), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
 TokenInfo(type=5 (INDENT), string='\t', start=(2, 0), end=(2, 1), line='\tpi=3.14\n'),
 TokenInfo(type=1 (NAME), string='pi', start=(2, 1), end=(2, 3), line='\tpi=3.14\n'),
 TokenInfo(type=53 (OP), string='=', start=(2, 3), end=(2, 4), line='\tpi=3.14\n'),
 TokenInfo(type=2 (NUMBER), string='3.14', start=(2, 4), end=(2, 8), line='\tpi=3.14\n'),
 TokenInfo(type=4 (NEWLINE), string='\n', start=(2, 8), end=(2, 9), line='\tpi=3.14\n'),
 TokenInfo(type=1 (NAME), string='return', start=(3, 1), end=(3, 7), line='\treturn pi*radius*slant_height\n'),
 TokenInfo(type=1 (NAME), string='pi', start=(3, 8), end=(3, 10), line='\treturn pi*radius*slant_height\n'),
 TokenInfo(type=53 (OP), string='*', start=(3, 10), end=(3, 11), line='\treturn pi*radius*slant_height\n'),
 TokenInfo(type=1 (NAME), string='radius', start=(3, 11), end=(3, 17), line='\treturn pi*radius*slant_height\n'),
 TokenInfo(type=53 (OP), string='*', start=(3, 17), end=(3, 18), line='\treturn pi*radius*slant_height\n'),
 TokenInfo(type=1 (NAME), string='slant_height', start=(3, 18), end=(3, 30), line='\treturn pi*radius*slant_height\n'),
 TokenInfo(type=4 (NEWLINE), string='\n', start=(3, 30), end=(3, 31), line='\treturn pi*radius*slant_height\n'),
 TokenInfo(type=56 (NL), string='\n', start=(4, 0), end=(4, 1), line='\n'),
 TokenInfo(type=6 (DEDENT), string='', start=(5, 0), end=(5, 0), line=''),
 TokenInfo(type=0 (ENDMARKER), string='', start=(5, 0), end=(5, 0), line='')]

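As you can see, I only get one INDENT token (line 10 of the output), but not a second one after the second NEWLINE. How can I make sure I get all the correct INDENT tokens for my source code?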

Tags: python python-3.x tokenize


    Answer 1:

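    An INDENT token is generated when a block is entered, not for every line in the block. When the block is exited, generate_tokens() produces a DEDENT token. All tokens from an INDENT up to the next INDENT or the matching DEDENT are at the same indentation level. The leading whitespace of the remaining lines in the block produces no token at all, which is why the `return` token on line 3 starts at column 1 rather than column 0.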
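    If what you actually need is the indentation level of every line, a minimal sketch is to keep a running depth counter: add one on INDENT, subtract one on DEDENT, and record the current depth at the first token of each logical line. `logical_line_depths` is a hypothetical helper, not part of tokenize, and it assumes the input is complete, syntactically valid Python.

```python
import io
import tokenize


def logical_line_depths(source):
    """Map the starting line number of each logical line to its block depth.

    Hypothetical helper: depth rises by one on INDENT and falls by one on
    DEDENT. Lines inside a block get no INDENT of their own, so the current
    depth is simply carried forward to them.
    """
    depths = {}
    depth = 0
    at_line_start = True  # True until the first real token of a logical line
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.INDENT:
            depth += 1
        elif tok.type == tokenize.DEDENT:
            depth -= 1
        elif tok.type == tokenize.NEWLINE:
            at_line_start = True  # the logical line has ended
        elif tok.type in (tokenize.NL, tokenize.COMMENT, tokenize.ENDMARKER):
            pass  # blank lines, comments and end of input do not start a logical line
        elif at_line_start:
            depths[tok.start[0]] = depth
            at_line_start = False
    return depths


sample = "def cal_cone_curved_surf_area(slant_height,radius):\n\tpi=3.14\n\treturn pi*radius*slant_height\n\n"
nested = "if a:\n    if b:\n        x = 1\n    y = 2\nz = 3\n"

print(logical_line_depths(sample))
print(logical_line_depths(nested))
```

    For the question's input this reports depth 1 for both lines 2 and 3, even though only line 2 is preceded by an INDENT token; the nested example shows the counter going back down as each DEDENT arrives.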
