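【Posted】: 2021-08-18 21:21:47
【Question】:
This is a pet project in which I parse a CSV file into a human-readable format such as *.txt or CSV (the CSV may contain 100k+ rows).
The input looks like this: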
name,type,start_time,duration,ack,address,read,data
I2C,start,23.6799126,8.00E-09,,,,
I2C,address,23.6799138,8.40E-06,TRUE,0x74,FALSE,
I2C,data,23.6799239,8.40E-06,TRUE,,,0x02
I2C,start,23.6799367,8.00E-09,,,,
I2C,address,23.6799409,8.40E-06,TRUE,0x74,TRUE,
I2C,data,23.6799509,8.40E-06,FALSE,,,0xB2
I2C,stop,23.6799619,8.00E-09,,,,
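For each row I decode whether its type is start, address, data, or stop, and then parse the remaining values accordingly. I iterate over the whole DataFrame with itertuples. I also read a JSON file (sample entry below) and correlate it with the CSV.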
{
    "name":"IOUT_LIMIT",
    "address":"0x02",
    "Formulate":"1",
    "Data_Width":"1",
    "Mask":"0x7F",
    "Weightage":"50",
    "Offset":"0",
    "Units":"mA",
    "BitFields":[
        {
            "name":"",
            "start":0,
            "end":7
        }
    ]
},
and the output looks like this:
Transaction started for: Read Data From IOUT_LIMIT 3100mA
Transaction started for: Write Data To IOUT_LIMIT 3250mA
Sample code
# Per-transaction state (assumed to be initialised like this before the loop)
Transaction_Started = Repeated_Start = DataWrite = 0
String, String_Units, Data = "", "", []

for row in df.itertuples():
    # An I2C Start begins a transaction; a Repeated Start means it is a read.
    if row.type == 'start':
        if Transaction_Started == 0:
            # Fresh Start encountered
            String = "Transaction started for : "
            Transaction_Started = 1
        else:
            # Repeated Start encountered
            Repeated_Start = 1
    elif row.type == 'address':
        if Repeated_Start == 0:
            Slave_address = row.address
    elif row.type == 'data':
        # Append the data byte to the list "Data"
        Data.append(row.data)
        if Repeated_Start:
            String += " Read Data From"
        else:
            if DataWrite == 1:
                String += " Write Data To"
            DataWrite += 1
    elif row.type == 'stop':
        # Iterate over Regcmd_List in the JSON
        for i in DataSheet_Data['Regcmd_List']:
            # On an address hit, store the register name in Output_Data.
            # Data[0] holds the address byte.
            if i['address'] == Data[0]:
                Output_Data = i['name']
                # Get the data width
                Datawidth = int(i['Data_Width'], 0)
                Temp_Data = ""
                # Join Data[Datawidth] down to Data[1] into a single value.
                # The address byte in Data[0] is excluded.
                for j in range(Datawidth, 0, -1):
                    Data[j] = Data[j].replace("0x", "")
                    Temp_Data += Data[j]
                # Check for a formula and apply it to Temp_Data
                if i['Formulate'] == '1':
                    # Mask, multiply by Weightage, and add Offset if the JSON specifies one
                    Temp_Data = int(Temp_Data, 16)
                    Temp_Data &= int(i['Mask'], 0)
                    Temp_Data *= int(i['Weightage'])
                    if i['Offset'] != "0" and Temp_Data != 0:
                        Temp_Data += int(i['Offset'])
                    Temp_Data = str(Temp_Data)
                    # Output_Data becomes register name + computed value
                    Output_Data += " " + Temp_Data
                    String_Units = i['Units']
                else:
                    # Output_Data becomes register name + raw bytes
                    Output_Data += " " + "0x" + Temp_Data
                print(String + " " + " " + Output_Data + String_Units, file=output_file)
        # Clear all context when an I2C Stop is encountered
        Transaction_Started = 0
        Slave_address = 0
        Repeated_Start = 0
        DataWrite = 0
        String_Units = ""
        Output_Data = ""
        Temp_Data = ""
        Datawidth = 0
        Data.clear()
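The register decoding done at the stop row can be pulled out into a small function, which also makes it easy to replace the scan over Regcmd_List with a dictionary lookup. The following is only a sketch under assumptions: `build_lookup` and `decode_register` are hypothetical names, the register entry is copied from the JSON sample, and the raw byte 0x3E is an invented example value, not taken from the CSV shown.

```python
def build_lookup(registers):
    """Index register entries by their address string for constant-time lookup."""
    return {reg["address"]: reg for reg in registers}


def decode_register(reg, payload):
    """Format one register value.

    reg     -- one entry shaped like the JSON sample
    payload -- data bytes as hex strings, in wire order, without the register address
    """
    width = int(reg["Data_Width"], 0)
    # Same byte order as the question's loop: the last byte received is most significant.
    hex_str = "".join(b.replace("0x", "") for b in reversed(payload[:width]))
    if reg["Formulate"] != "1":
        return f"{reg['name']} 0x{hex_str}"
    # value = (raw & Mask) * Weightage, plus Offset when both are non-zero
    value = (int(hex_str, 16) & int(reg["Mask"], 0)) * int(reg["Weightage"])
    offset = int(reg["Offset"])
    if offset and value:
        value += offset
    return f"{reg['name']} {value}{reg['Units']}"


# Register entry copied from the JSON sample (BitFields omitted, it is not used here).
registers = [{
    "name": "IOUT_LIMIT", "address": "0x02", "Formulate": "1",
    "Data_Width": "1", "Mask": "0x7F", "Weightage": "50",
    "Offset": "0", "Units": "mA",
}]
lookup = build_lookup(registers)

# 0x3E is a made-up raw byte: (0x3E & 0x7F) * 50 = 62 * 50 = 3100
print(decode_register(lookup["0x02"], ["0x3E"]))  # IOUT_LIMIT 3100mA
```

Building the lookup once before the loop means each stop row costs one dictionary access instead of a pass over every register entry.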
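For this application, should I use iterrows or itertuples? Which is more efficient?
One efficient approach would be to read the N rows from row.type == 'start' through row.type == 'stop' and process them as one frame. Even then, I would still have to iterate until I reach the stop and do most of the processing there. If there is a design with better performance, please let me know.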
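The frame idea described above could be sketched as a generator that collects rows from the first start through the stop, so that all decoding happens once per frame. This is an illustration under assumptions, not a measured improvement: `Row`, `iter_frames`, and `summarize` are hypothetical names, and `Row` is a minimal stand-in for the tuples produced by itertuples (only the `type`, `address`, and `data` fields are used).

```python
from collections import namedtuple

# Hypothetical minimal row shape; df.itertuples() rows expose the same attributes.
Row = namedtuple("Row", "type address data")


def iter_frames(rows):
    """Yield one list of rows per I2C transaction (first 'start' through 'stop')."""
    frame = []
    for row in rows:
        if not frame and row.type != "start":
            continue  # skip rows that fall outside any transaction
        frame.append(row)
        if row.type == "stop":
            yield frame
            frame = []


def summarize(frame):
    """Return (is_read, register_address, payload), or None if the frame has no data."""
    starts = sum(1 for r in frame if r.type == "start")
    data = [r.data for r in frame if r.type == "data"]
    if not data:
        return None
    # A repeated start inside the frame marks a read; data[0] is the register address.
    return starts > 1, data[0], data[1:]


# The last five rows of the CSV sample, preceded by the write of register 0x02.
rows = [
    Row("start", "", ""), Row("address", "0x74", ""), Row("data", "", "0x02"),
    Row("start", "", ""), Row("address", "0x74", ""), Row("data", "", "0xB2"),
    Row("stop", "", ""),
]
for frame in iter_frames(rows):
    print(summarize(frame))  # (True, '0x02', ['0xB2'])
```

The rows are still visited exactly once, so this mainly separates framing from decoding rather than reducing the amount of iteration.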
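【Comments】:
-
Please provide your data as text, not as images.
-
I guess "L" is an abbreviation of an Indian word. Please don't use such abbreviations here; we are not all from India.
-
itertuples is always faster than iterrows. Both are relatively inefficient. You should probably use the csv module for this.
-
Perhaps more importantly, String += "some other string" is inefficient.
-
By the way, can you post your expected output for the given dataframe?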
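The two suggestions in the comments (use the csv module, avoid repeated string concatenation) might look roughly like the sketch below. It is an assumption-laden illustration: `transaction_lines` is a hypothetical name, the output wording is simplified to show raw addresses and bytes instead of register names, and the CSV text is the sample from the question held in an in-memory buffer.

```python
import csv
import io


def transaction_lines(csv_file):
    """Yield one summary line per transaction, reading rows with csv.DictReader."""
    data, starts = [], 0
    for row in csv.DictReader(csv_file):
        kind = row["type"]
        if kind == "start":
            starts += 1
        elif kind == "data":
            data.append(row["data"])
        elif kind == "stop":
            if data:
                direction = "Read Data From" if starts > 1 else "Write Data To"
                # Collect the pieces in a list and join once instead of using +=.
                parts = ["Transaction started for:", direction,
                         "register", data[0], "data", *data[1:]]
                yield " ".join(parts)
            data, starts = [], 0


SAMPLE = """name,type,start_time,duration,ack,address,read,data
I2C,start,23.6799126,8.00E-09,,,,
I2C,address,23.6799138,8.40E-06,TRUE,0x74,FALSE,
I2C,data,23.6799239,8.40E-06,TRUE,,,0x02
I2C,start,23.6799367,8.00E-09,,,,
I2C,address,23.6799409,8.40E-06,TRUE,0x74,TRUE,
I2C,data,23.6799509,8.40E-06,FALSE,,,0xB2
I2C,stop,23.6799619,8.00E-09,,,,
"""

for line in transaction_lines(io.StringIO(SAMPLE)):
    print(line)  # Transaction started for: Read Data From register 0x02 data 0xB2
```

For a real file, the buffer would be replaced by something like `open(path, newline="")`, which is the form the csv module documentation recommends.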
Tags: python json pandas csv parsing