Convert a regular Python string to a raw string: answers

【Question title】: Convert regular Python string to raw string
【Posted】: 2011-05-23 20:23:49
【Question】:

I have a string s whose contents are variable. I want to turn it into a raw string. How can I do that?

Something similar to the r'' method.

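【Comments】:

Raw strings are just a different syntax for defining string constants. What specifically do you want from the variable that makes you think of using a raw string?
Are you using Python 2 or Python 3? If Python 3, might you be asking about the bytes type?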

【Solution 1】:

A slight correction to @Jolly1234's answer. Here is the code:

raw_string = path.encode('unicode_escape').decode()
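
To make the effect concrete, here is a minimal sketch, assuming path is an ordinary str. The sample value is made up for illustration and does not come from the answer:

```python
# Hypothetical sample value: "\t" and "\n" in this literal are a real tab and newline.
path = "C:\temp\new"

# unicode_escape turns control characters back into backslash sequences (as bytes);
# decode() converts those ASCII bytes to a str again.
raw_string = path.encode('unicode_escape').decode()

print(raw_string)                      # C:\temp\new
print(raw_string == r"C:\temp\new")    # True
```

Note that the same codec also escapes non-ASCII characters (for example 'é' becomes \xe9), so this is a way to make escapes visible rather than a general path fixer.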

【Comments】:

【Solution 2】:

I think the repr function can help you:

>>> s = 't\n'
>>> repr(s)
"'t\\n'"
>>> repr(s)[1:-1]
't\\n'
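
A short sketch of the same idea as a script, with one caveat that is not in the answer: repr chooses its own quote style and escapes a quote character when the string contains both kinds, so slicing off the first and last character is only a rough approximation. The variable names are illustrative:

```python
s = 't\n'
escaped = repr(s)[1:-1]
print(escaped)               # t\n  (three characters: t, backslash, n)

# Caveat: when a string contains both quote styles, repr escapes the single quote.
both = 'a"b\'c'              # five characters: a " b ' c
escaped_both = repr(both)[1:-1]
print(escaped_both)          # a"b\'c  (six characters, one backslash was added)
```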

【Comments】:

【Solution 3】:

Format it like this:

s = "your string"
raw_s = r'{0}'.format(s)

【Comments】:

【Solution 4】:

Simply use the encode function.

my_var = 'hello'
my_var_bytes = my_var.encode()
print(my_var_bytes)

To convert it back to a regular string, do this:

my_var_bytes = b'hello'  # a bytes literal; a plain str has no .decode() in Python 3
my_var = my_var_bytes.decode()
print(my_var)

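--EDIT--

Note that this does not turn the string into a raw string; it only encodes it to bytes and decodes it again.

【Comments】:

str.encode encodes the string to bytes; it does not create a raw string, which is a string in which backslashes are treated literally rather than as escape characters.
Sorry, I got a bit confused about them.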
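
The distinction drawn in the comment above can be sketched as follows. This is only an illustration with made-up values: encode changes the type (str to bytes), while the r prefix changes nothing but how the literal is read from source code.

```python
text = r'hello\n'              # raw literal: backslash and n are two separate characters
same = 'hello\\n'              # regular literal describing the very same str

data = text.encode()           # bytes (UTF-8 by default), not a "raw string"
print(type(data).__name__)     # bytes
print(data.decode() == text)   # True: the round trip returns the original str
print(text == same)            # True: raw and regular literals can denote the same value
```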

【Solution 5】:

s = "hel\nlo"
raws = '%r' % s  # conversion to the escaped ("raw"-looking) representation
# print(raws) prints 'hel\nlo' including the single quotes.
print(raws[1:-1])  # prints hel\nlo without the single quotes.
# raws[1:-1] slices off the surrounding quote characters
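
For what it is worth, '%r' % s is just another spelling of repr(s) when s is a str. A small sketch, with illustrative variable names:

```python
s = "hel\nlo"

via_percent = '%r' % s     # printf-style formatting with the %r conversion
via_repr = repr(s)         # the built-in that %r calls
via_fstring = f'{s!r}'     # the f-string spelling (Python 3.6+)

print(via_percent == via_repr == via_fstring)   # True
print(via_percent[1:-1])                        # hel\nlo
```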

【Comments】:

【Solution 6】:

As of Python 3.6, you can use the following (similar to @slashCoder):

def to_raw(string):
    return fr"{string}"

my_dir ="C:\data\projects"
to_raw(my_dir)

This yields 'C:\\data\\projects'. I use it on a Windows 10 machine to pass directories to functions.

【Comments】:

This does not give the same output as raw:
>>> def to_raw(string):
...     return fr"{string}"
...
>>> normal
'The\n'
>>> to_raw(normal)
'The\n'
>>> raw
'The\\n'
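This doesn't actually do anything. my_dir is already 'C:\\data\\projects', because \d and \p are unrecognized escape sequences, so the backslashes are kept. Unrecognized escape sequences will raise a SyntaxError in a future version of Python. Also try my_dir = 'C:\Users', which raises a SyntaxError immediately.
That's correct - I use a TkInter dialog to get a file path, store it in a string, and then want to convert it to a raw string to reopen the file. So I can't just add r'string', but fr"{string}" works perfectly!
@ChemEnger return string would work too; fr"{string}" == string in all cases where string is an actual string rather than an int.
Then doing nothing is also correct, because this doesn't actually do anything.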
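
The point made in these comments can be checked directly. The sketch below reuses to_raw from the answer; normal, raw and runtime_path are illustrative names, and the runtime path is simulated with chr(92) instead of a real file dialog:

```python
def to_raw(string):
    return fr"{string}"

normal = 'The\n'     # 4 characters: T, h, e, newline
raw = r'The\n'       # 5 characters: T, h, e, backslash, n

print(to_raw(normal) == normal)   # True: the value is returned unchanged
print(to_raw(normal) == raw)      # False: nothing was "made raw"

# A path that arrives at runtime already contains real backslash characters,
# so no conversion is needed. chr(92) is the backslash character.
runtime_path = "C:" + chr(92) + "Users"
print(runtime_path == r"C:\Users")            # True
print(to_raw(runtime_path) == runtime_path)   # True
```

Escape processing happens only when Python reads a string literal in source code, which is why a string obtained from a dialog or a file works with or without the wrapper.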

【Solution 7】:

Because strings in Python are immutable, you cannot "make it" anything different. You can, however, create a new raw string from s, like this:

raw_s = r'{}'.format(s)

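【Comments】:

This does not give the same output as raw:
>>> raw_s = r'{}'.format(normal)
>>> raw_s
'The\n'
>>> normal
'The\n'
>>> raw = r"The\n"
>>> raw
'The\\n'
This does nothing. r'{}'.format('\n') == '\n'. The r prefix only applies to what is inside the string literal, i.e. the braces.
For Windows paths, be sure to use os.path.join when converting a path string, to avoid trouble with the so-called "escape character" (also known as the backslash).
This is about as useful as str(str(str(str(s)))); using format to put one string into another is just wasteful if one of them is empty.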
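
A quick check of what the commenters describe, as a sketch with illustrative values: the r prefix affects only the characters written inside the template literal, never the value substituted into it.

```python
s = 'The\n'
raw_s = r'{}'.format(s)

print(raw_s == s)            # True: format inserted s unchanged
print(len(s), len(raw_s))    # 4 4

# The prefix matters only when the template itself contains a backslash.
template_result = r'\n{}'.format('x')
print(template_result == '\\nx')   # True: backslash, n, x
```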

【Solution 8】:

For Python 3, the way to do this without adding doubled backslashes, simply preserving \n, \t, etc., is:

a = 'hello\nbobby\nsally\n'
# str methods return a new string, so the result has to be assigned
a = a.encode('unicode-escape').decode().replace('\\\\', '\\')
print(a)

This gives a value that can be written to a CSV:

hello\nbobby\nsally\n

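There doesn't seem to be a solution for other special characters, however, which may end up with a single \ in front of them. It's a bummer. Solving that would be complicated.

For example, to serialize a pandas.Series containing lists of strings with special characters to a text file in the format BERT expects, with a newline between each sentence and a blank line between each document: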

# `sentences` is assumed to be a pandas.Series whose values are lists of sentence strings
with open('sentences.csv', 'w') as f:
    current_idx = 0
    for idx, doc in sentences.items():
        # Insert a blank line to separate documents
        if idx != current_idx:
            f.write('\n')
        # Write each sentence exactly as it appeared, one per line
        for sentence in doc:
            f.write(sentence.encode('unicode-escape').decode().replace('\\\\', '\\') + '\n')

This outputs (for the GitHub CodeSearchNet docstrings in all languages, tokenized into sentences):

Makes sure the fast-path emits in order.
@param value the value to emit or queue up\n@param delayError if true, errors are delayed until the source has terminated\n@param disposable the resource to dispose if the drain terminates

Mirrors the one ObservableSource in an Iterable of several ObservableSources that first either emits an item or sends\na termination notification.
Scheduler:\n{@code amb} does not operate by default on a particular {@link Scheduler}.
@param  the common element type\n@param sources\nan Iterable of ObservableSource sources competing to react first.
A subscription to each source will\noccur in the same order as in the Iterable.
@return an Observable that emits the same sequence as whichever of the source ObservableSources first\nemitted an item or sent a termination notification\n@see ReactiveX operators documentation: Amb


...

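【Comments】:

You don't need those extra backslash replacements after .decode()
@wrivas You know, I couldn't get it to work without them, but that may be specific to my use case.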
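
Whether the replace call matters seems to depend on the input. In the sketch below, escape_control_chars is a hypothetical helper that simply wraps the expression from the answer; the sample strings are made up:

```python
def escape_control_chars(text):
    """Hypothetical helper wrapping the expression used in the answer."""
    return text.encode('unicode-escape').decode().replace('\\\\', '\\')

# No literal backslash in the input: the replace has nothing to do.
print(escape_control_chars('hello\nbobby\n'))    # hello\nbobby\n

# Input with a literal backslash: unicode-escape doubles it, replace undoes that.
without_replace = 'a\\b\n'.encode('unicode-escape').decode()
print(without_replace)                           # a\\b\n
print(escape_control_chars('a\\b\n'))            # a\b\n

# Side effect: a real newline and a literal backslash followed by n
# become indistinguishable in the output.
print(escape_control_chars('\n') == escape_control_chars('\\n'))   # True
```

So the replace only changes the result when the text already contains backslashes, and in that case the output can no longer be decoded unambiguously.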

【Solution 9】:

I believe what you're looking for is the str.encode("string-escape") function. For example, if you have a variable that you want to "raw string":

>>> a = '\x89'
>>> a.encode('unicode_escape')  # Python 3: the result is a bytes object
b'\\x89'

Note: for Python 2.x and earlier, use string-escape instead of unicode_escape.

I was looking for a similar solution and found it via: casting raw strings python

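【Comments】:

This is the solution.
On Python 3.5.1: LookupError: unknown encoding: string-escape
This "fails" for the input "an uppercase a is \x41"
To me, the "r'' operator" and .encode() don't seem to be the same. These three: '\bla\ \n' --- r'\bla\ \n' --- ('\bla\ \n').encode("unicode_escape").decode() all seem to give different strings: '\x08la\\ \n' --- '\\bla\\ \\n' --- '\\x08la\\\\ \\n'
In case it helps others too, I also needed to add an extra .decode() at the end, as in the referenced source, to get something like what I get from r"string_goes_here" alone. However, this was a fairly complicated case in which I was reproducing and working around a problem like the one here.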
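
The "\x41" case from the comments shows why the original spelling cannot be recovered. A minimal sketch, with illustrative variable names:

```python
s = "an uppercase a is \x41"      # \x41 is resolved to 'A' when the literal is read
print(s)                          # an uppercase a is A

# The str no longer remembers how it was spelled, so escaping cannot bring \x41 back.
escaped = s.encode('unicode_escape').decode()
print(escaped == s)               # True: there is nothing to escape in plain ASCII text

# Going the other way is well defined: interpret the backslash sequences of a raw literal.
raw = r"an uppercase a is \x41"
decoded = raw.encode('ascii').decode('unicode_escape')
print(decoded == s)               # True
```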

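【Solution 10】:

Raw strings are not a different kind of string. They are a different way of describing a string in your source code. Once the string is created, it is what it is.

【Comments】:

+1 for the rare and accurate usage of "it is what it is"
This is actually wrong. There's another answer here that gets it right: "raw strings do not escape anything inside them".
@igorsantos07 No, you are confused. When you create a string, you may need to escape something; but once the string contains what it contains, "escaping it" is not a well-defined operation (though of course you can create a string with different content, for example by interpreting a literal backslash as an escape code).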
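
The answer's claim can be verified in a few lines. This is only a sketch with made-up values:

```python
a = r'C:\new'      # raw literal
b = 'C:\\new'      # regular literal with an escaped backslash

print(a == b)                       # True: both literals describe the same six characters
print(type(a) is type(b) is str)    # True: there is no separate "raw string" type
print(len(r'\n'), len('\n'))        # 2 1: the prefix changes how the literal is read
```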

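【Solution 11】:

Raw strings only apply to string literals. They exist so that you can more conveniently express strings that would otherwise be modified by escape sequence processing. This is especially useful when writing out regular expressions or other forms of code in string literals. If you want a unicode string without escape processing, just prefix it with ur, like ur'somestring' (Python 2 only; Python 3 does not accept the ur prefix).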
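
The regular expression use case mentioned above can be sketched like this; the pattern and sample text are made up for illustration:

```python
import re

word_raw = r'\bcat\b'        # raw literal: the regex engine receives \b (word boundary)
word_plain = '\\bcat\\b'     # the same pattern written with doubled backslashes
backspace = '\bcat\b'        # without r, \b is the backspace character

print(word_raw == word_plain)                       # True
print(bool(re.search(word_raw, 'the cat sat')))     # True
print(bool(re.search(backspace, 'the cat sat')))    # False: no backspace in the text
```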

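【Comments】:

I wouldn't expect @TokenMacGuy to know this, but they're also useful for defining paths on Windows, which uses the backslash as the separator in paths, e.g. r'C:\Python27\Tools\Scripts\2to3.py'
Alas, TokenMacGuy is just a name. My host machine runs Windows. The real reason I don't use raw strings for file paths is that I never hardcode pathnames.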
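
One way to build such a path without typing a backslash at all is sketched below. An earlier comment suggests os.path.join; this example swaps in pathlib.PureWindowsPath from the standard library so that it behaves the same on any operating system. The path components come from the comment above and the variable name is illustrative:

```python
from pathlib import PureWindowsPath

# Join components with / and let pathlib insert the Windows separator.
scripts = PureWindowsPath('C:/') / 'Python27' / 'Tools' / 'Scripts' / '2to3.py'

print(str(scripts))                                            # C:\Python27\Tools\Scripts\2to3.py
print(str(scripts) == r'C:\Python27\Tools\Scripts\2to3.py')    # True
```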