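【Posted】: 2018-02-28 17:37:15
【Question】:
The link below is a sample of the data source I am trying to parse.
http://www.mediafire.com/file/wfri4idoxszqixs/sampleWordData.xlsx
I have a column of words, each row of which carries a value. I want to split each row into its individual words and attach that row's amount to every word. For example: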
Original DataFrame
Words (Col 1), Amount (Col 2)
Words = ['Google', 'Google is awesome', 'Hi Google']
Amount = [5, 10, 5]
New DataFrame
Word1 (Col 1), Word2 (Col 2), Word3 (Col 3), Amount (Col 4)
Word1 = ['Google', 'Google', 'Hi']
Word2 = ['', 'is', 'Google']
Word3 = ['', 'awesome', '']
Amount = [5, 10, 5]
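A minimal sketch of this intermediate step, assuming the sample rows above are typed in by hand rather than loaded from the xlsx file. The lowercase column names `words`, `amount`, and `word1` to `word3` are illustrative choices, not taken from the workbook. `Series.str.split(expand=True)` produces one column per word position, and the missing positions are filled with empty strings to match the layout shown above.

```python
import pandas as pd

# Sample rows from the example (assumption: the real data comes from the xlsx file).
df = pd.DataFrame({
    "words": ["Google", "Google is awesome", "Hi Google"],
    "amount": [5, 10, 5],
})

# Split on whitespace; expand=True returns one column per word position.
wide = df["words"].str.split(expand=True)

# Rename the positional columns 0, 1, 2 to word1, word2, word3.
wide.columns = [f"word{i + 1}" for i in range(wide.shape[1])]

# Rows with fewer words have missing values; show them as empty strings.
wide = wide.fillna("")

# Carry the amount of each original row alongside its words.
wide["amount"] = df["amount"]

print(wide)
```

The number of word columns follows the longest row, so nothing has to be hard-coded for rows with more than three words.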
Final DataFrame
Word = ['Google', 'is', 'awesome', 'Hi']
Amount = [20, 10, 10, 5]
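One possible way to reach this final per-word total, sketched under a few assumptions: the sample rows are typed in by hand, the column names `words`, `amount`, and `word` are illustrative, and the installed pandas provides `DataFrame.explode` (version 0.25 or later). Each row's word list is expanded to one row per word, and the amounts are then summed per word. Since 'Google' appears in all three sample rows, its total works out to 5 + 10 + 5 = 20.

```python
import pandas as pd

# Sample rows from the example (assumption: the real data comes from the xlsx file).
df = pd.DataFrame({
    "words": ["Google", "Google is awesome", "Hi Google"],
    "amount": [5, 10, 5],
})

totals = (
    df.assign(word=df["words"].str.split())  # list of words per row
      .explode("word")                       # one row per (word, amount) pair
      .groupby("word", sort=False)["amount"] # sort=False keeps first-seen order
      .sum()
      .reset_index()
)

print(totals)
```

This skips the wide Word1/Word2/Word3 layout entirely, which avoids having to know the longest row in advance. The grouping is case-sensitive, so 'google' and 'Google' would be counted separately unless the words are normalized first.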
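I have explained this as best I can, since it is hard to get Markdown to work well with column formatting. The xlsx file shows each step of how I am trying to transform the data.
My attempt at the code so far: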
import pandas as pd
# load the dataset
df = pd.read_csv('myfile.csv')
df.columns = ['words', 'amount']
df.head()
# get rid of nulls
df.dropna(subset=['words'], inplace=True)
# shows how many columns are needed in total to encompass the longest line
print(df.words.str.split(expand=True).head())
# attempt to split the first word off from the rest of the words in each row
df2 = pd.DataFrame(df.words.str.split(' ', n=1).tolist(),
                   columns=['word1', 'word2'])
Any help or guidance would be greatly appreciated!
【Comments】:
Tags: python excel pandas parsing keyword