Get HomePage from string of website address [duplicate] answers

【Question title】: Get HomePage from string of website address [duplicate]
【Posted】: 2021-12-22 04:33:04
【Question description】:

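I have a list of strings containing company websites.

Here is an example: ['www.apple.com/about', 'go-sharp.ai/services', 'http.titos.com.br']

I need to replace each of them with its homepage.

The result must be: ['www.apple.com', 'go-sharp.ai', 'http.titos.com.br']

Could you suggest the best way to do this (maybe some API)?

Thank you for your time!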

【Question discussion】:

Hi, urllib has plenty of tools for working with URLs.

Tags: python arrays string nlp text-mining

【Solution 1】:

Given your examples, you can easily write a simple parser like this:

sites = ['www.apple.com/about', 'go-sharp.ai/services', 'http.titos.com.br']
for s in sites:
    print(s.split('/')[0])
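Splitting on '/' works for the question's examples, but if some inputs also carry an explicit scheme (an assumption, not in the question's list), 'https://www.apple.com/about'.split('/')[0] returns 'https:'. A minimal sketch that strips a scheme first; the helper name homepage is hypothetical, and query strings or fragments without a preceding '/' are not handled:

```python
def homepage(site: str) -> str:
    # Assumption: a scheme, if present, is written as "<scheme>://"
    if "://" in site:
        site = site.split("://", 1)[1]
    # Everything before the first "/" is the host part
    return site.split("/", 1)[0]


sites = ['www.apple.com/about', 'go-sharp.ai/services',
         'http.titos.com.br', 'https://www.apple.com/about']
print([homepage(s) for s in sites])
# ['www.apple.com', 'go-sharp.ai', 'http.titos.com.br', 'www.apple.com']
```

Note that 'http.titos.com.br' is left alone because it contains no "://".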

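As @Be Chiller Too said, you can also use urllib.parse.urlparse, but make sure your websites are correctly formatted. As the documentation states:

"Following the syntax specifications in RFC 1808, urlparse recognizes a netloc only if it is properly introduced by '//'. Otherwise the input is presumed to be a relative URL and thus to start with a path component."

See: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlparse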
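A minimal sketch of the caveat quoted above: passed as-is, 'www.apple.com/about' is parsed as a path and netloc comes back empty. One workaround, assuming that any input without "://" has no scheme, is to prepend "//" so urlparse treats the host as the netloc. The helper name homepage is hypothetical:

```python
from urllib.parse import urlparse


def homepage(site: str) -> str:
    # urlparse only fills netloc when it is introduced by "//",
    # so prepend "//" to inputs that carry no scheme (assumption: no "://" means no scheme)
    if "://" not in site:
        site = "//" + site
    return urlparse(site).netloc


# Without the "//" prefix the host ends up in .path, not .netloc
print(urlparse('www.apple.com/about').netloc)  # prints an empty string
print([homepage(s) for s in ['www.apple.com/about', 'go-sharp.ai/services', 'http.titos.com.br']])
# ['www.apple.com', 'go-sharp.ai', 'http.titos.com.br']
```

netloc keeps any port or user info (for example 'host:8080'); urlparse(...).hostname drops those and lowercases the name, if that is preferred.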

【Discussion】:

【Solution 2】:

One way: use the split method

array = ['www.apple.com/about', 'go-sharp.ai/services', 'http.titos.com.br']
result = []
for ar in array:
    result.append(ar.split("/")[0])
print(result)

Output: ['www.apple.com', 'go-sharp.ai', 'http.titos.com.br']

【Discussion】:
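The same loop can be written more compactly as a list comprehension. Passing maxsplit=1 to str.split stops after the first '/', which is all that is needed here; the result is identical:

```python
array = ['www.apple.com/about', 'go-sharp.ai/services', 'http.titos.com.br']
# Keep only the text before the first "/" of each entry
result = [ar.split("/", 1)[0] for ar in array]
print(result)  # ['www.apple.com', 'go-sharp.ai', 'http.titos.com.br']
```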