[Posted]: 2020-07-30 07:57:08
[Question]:
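I have a Python script that builds a list of EC2 instances across all of our AWS accounts (about 150) and stores the results in MongoDB.
I'm using the Python pandas module to export the MongoDB collection to a CSV file. It works, except that the headers are out of order, and I don't want to print the MongoDB index.
In the original version of the script (before I added the database), I used the csv module to write the file, and the headers were in the correct order.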
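The original csv-module code is not shown here, so the following is only a minimal sketch of how that approach usually keeps headers in order: `csv.DictWriter` writes columns in the order given by `fieldnames`. The `FIELDNAMES` subset and the `write_instances_csv` helper are hypothetical names chosen for illustration; the field names themselves come from the documents shown in this question.

```python
import csv
import io

# Hypothetical column order; the names are taken from the documents in this question.
FIELDNAMES = ['AWS Account', 'Account Number', 'Name', 'Instance ID', 'State']


def write_instances_csv(rows, out):
    """Write dict rows to a file-like object; header order follows FIELDNAMES."""
    # extrasaction='ignore' drops keys that are not listed, such as '_id'.
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES, extrasaction='ignore')
    writer.writeheader()
    writer.writerows(rows)


# The keys are deliberately out of order to show that fieldnames decides the layout.
rows = [{
    '_id': 'abc',
    'State': 'running',
    'Name': 'usawsweb001',
    'AWS Account': 'jf-master-pd',
    'Account Number': '123456789101',
    'Instance ID': 'i-01e5e920b4d3d5dcb',
}]
buffer = io.StringIO()
write_instances_csv(rows, buffer)
print(buffer.getvalue())
```

Writing to an `io.StringIO` buffer keeps the sketch free of file paths; a real file opened with `newline=''` would work the same way.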
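I added the database both as a learning exercise and because it makes it easier to handle all the Amazon accounts we have.
If I look at the JSON of the collection I'm printing in the mongo database, all the fields are in the correct order: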
{'_id': ObjectId('5f14f9ffa40de31278dade03'), 'AWS Account': 'jf-master-pd', 'Account Number': '123456789101', 'Name': 'usawsweb001', 'Instance ID': 'i-01e5e920b4d3d5dcb', 'AMI ID': 'ami-006219aba10688d0b', 'Volumes': 'vol-0ce8db4e071bc7229, vol-099f6d212a91121d0, vol-0bb36e343e9c01374, vol-05610645edfd02253, vol-05adc01d70d75d649', 'Private IP': '172.31.62.168', 'Public IP': 'xx.xx.xx.xx', 'Private DNS': 'ip-172-31-62-168.ec2.internal', 'Availability Zone': 'us-east-1e', 'VPC ID': 'vpc-68b1ff12', 'Type': 't2.micro', 'Key Pair Name': 'jf-timd', 'State': 'running', 'Launch Date': 'July 20 2020'}
{'_id': ObjectId('5f14f9ffa40de31278dade05'), 'AWS Account': 'jf-master-pd', 'Account Number': '123456789101', 'Name': 'usawsweb002', 'Instance ID': 'i-0b7db2bcab853ef96', 'AMI ID': 'ami-006219aba10688d0b', 'Volumes': 'vol-095a9dcf54ca97c0e, vol-0c8e96b71fbb7dfcf, vol-070c16c457f91c54e, vol-0dc1eaf2e826fa3a6, vol-0f0f157a8489ab939', 'Private IP': '172.31.63.131', 'Public IP': 'xx.xx.xx.xx', 'Private DNS': 'ip-172-31-63-131.ec2.internal', 'Availability Zone': 'us-east-1e', 'VPC ID': 'vpc-68b1ff12', 'Type': 't2.micro', 'Key Pair Name': 'jf-timd', 'State': 'running', 'Launch Date': 'July 20 2020'}
{'_id': ObjectId('5f14f9ffa40de31278dade07'), 'AWS Account': 'jf-master-pd', 'Account Number': '123456789101', 'Name': 'usawsweb003', 'Instance ID': 'i-0611acf4b6cc53b61', 'AMI ID': 'ami-006219aba10688d0b', 'Volumes': 'vol-0aa28f89f6ce50577, vol-0e37ff844e8b9c47a, vol-0d54c713ae231739c, vol-0e29df46edc814619, vol-07e0c40a8913b1d31', 'Private IP': '172.31.52.44', 'Public IP': 'xx.xx.xx.xx', 'Private DNS': 'ip-172-31-52-44.ec2.internal', 'Availability Zone': 'us-east-1e', 'VPC ID': 'vpc-68b1ff12', 'Type': 't2.micro', 'Key Pair Name': 'jf-timd', 'State': 'running', 'Launch Date': 'July 20 2020'}
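But when I export from the mongo database with Python pandas, the headers come out in the wrong order. Each value still sits under its correct header, but the columns are completely out of order.
In my code, I build a dictionary with the server information and then pass that dictionary to the function that prints the Mongo collection: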
def list_instances(aws_account,aws_account_number, interactive, regions, show_details, instance_col):
    for region in regions:
        if 'gov' in aws_account and not 'admin' in aws_account:
            try:
                session = boto3.Session(profile_name=aws_account, region_name=region)
            except botocore.exceptions.ProfileNotFound as e:
                profile_missing_message = f"An exception has occurred: {e}"
                account_found = 'no'
                raise
        else:
            try:
                session = boto3.Session(profile_name=aws_account, region_name=region)
                account_found = 'yes'
            except botocore.exceptions.ProfileNotFound as e:
                profile_missing_message = f"An exception has occurred: {e}"
                raise
        try:
            ec2 = session.client("ec2")
        except Exception as e:
            print(f"An exception has occurred: {e}")
        message = f" Region: {region} in {aws_account}: ({aws_account_number}) "
        banner(message)
        print(Fore.RESET)
        # Loop through the instances
        try:
            instance_list = ec2.describe_instances()
        except Exception as e:
            print(f"An exception has occurred: {e}")
        for reservation in instance_list["Reservations"]:
            for instance in reservation.get("Instances", []):
                instance_count = instance_count + 1
                launch_time = instance["LaunchTime"]
                launch_time_friendly = launch_time.strftime("%B %d %Y")
                tree = objectpath.Tree(instance)
                block_devices = set(tree.execute('$..BlockDeviceMappings[\'Ebs\'][\'VolumeId\']'))
                if block_devices:
                    block_devices = list(block_devices)
                    block_devices = str(block_devices).replace('[','').replace(']','').replace('\'','')
                else:
                    block_devices = None
                private_ips = set(tree.execute('$..PrivateIpAddress'))
                if private_ips:
                    private_ips_list = list(private_ips)
                    private_ips_list = str(private_ips_list).replace('[','').replace(']','').replace('\'','')
                else:
                    private_ips_list = None
                public_ips = set(tree.execute('$..PublicIp'))
                if len(public_ips) == 0:
                    public_ips = None
                if public_ips:
                    public_ips_list = list(public_ips)
                    public_ips_list = str(public_ips_list).replace('[','').replace(']','').replace('\'','')
                else:
                    public_ips_list = None
                name = None
                if 'Tags' in instance:
                    try:
                        tags = instance['Tags']
                        name = None
                        for tag in tags:
                            if tag["Key"] == "Name":
                                name = tag["Value"]
                            if tag["Key"] == "Engagement" or tag["Key"] == "Engagement Code":
                                engagement = tag["Value"]
                    except ValueError:
                        # print("Instance: %s has no tags" % instance_id)
                        raise
                key_name = instance['KeyName'] if instance['KeyName'] else None
                vpc_id = instance.get('VpcId') if instance.get('VpcId') else None
                private_dns = instance['PrivateDnsName'] if instance['PrivateDnsName'] else None
                ec2info[instance['InstanceId']] = {
                    'AWS Account': aws_account,
                    'Account Number': aws_account_number,
                    'Name': name,
                    'Instance ID': instance['InstanceId'],
                    'AMI ID': instance['ImageId'],
                    'Volumes': block_devices,
                    'Private IP': private_ips_list,
                    'Public IP': public_ips_list,
                    'Private DNS': private_dns,
                    'Availability Zone': instance['Placement']['AvailabilityZone'],
                    'VPC ID': vpc_id,
                    'Type': instance['InstanceType'],
                    'Key Pair Name': key_name,
                    'State': instance['State']['Name'],
                    'Launch Date': launch_time_friendly
                }
                mongo_instance_dict = {
                    '_id': '',
                    'AWS Account': aws_account,
                    "Account Number": aws_account_number,
                    'Name': name,
                    'Instance ID': instance["InstanceId"],
                    'AMI ID': instance['ImageId'],
                    'Volumes': block_devices,
                    'Private IP': private_ips_list,
                    'Public IP': public_ips_list,
                    'Private DNS': private_dns,
                    'Availability Zone': instance['Placement']['AvailabilityZone'],
                    'VPC ID': vpc_id,
                    'Type': instance["InstanceType"],
                    'Key Pair Name': key_name,
                    'State': instance["State"]["Name"],
                    'Launch Date': launch_time_friendly
                }
                insert_doc(mongo_instance_dict)
    mongo_export_to_file(interactive, aws_account)
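As a side note on the listing above, the three blocks that turn a set into a comma-separated string by calling `str()` on a list and stripping brackets and quotes could be collapsed into one helper. This is a sketch only: `join_values` is a hypothetical name, and unlike the original it sorts the values (so the output is deterministic) and skips empty entries.

```python
def join_values(values):
    """Return a sorted, comma-separated string, or None when nothing is left."""
    # Assumption: the values are strings such as volume IDs or IP addresses.
    items = sorted(v for v in values if v)
    return ", ".join(items) if items else None


print(join_values({'vol-0ce8db4e071bc7229', 'vol-099f6d212a91121d0'}))
print(join_values(set()))
```

With such a helper, each field would reduce to one line, for example `block_devices = join_values(tree.execute(...))` with the same objectpath query as above.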
This is the function that inserts the dictionary into MongoDB:
def insert_doc(mydict):
    mydb, mydb_name, instance_col = set_db()
    mydict['_id'] = ObjectId()
    instance_doc = instance_col.insert_one(mydict)
    return instance_doc
This is the function that writes the MongoDB collection to a file:
def mongo_export_to_file():
    aws_account = 'jf-master-pd'
    today = datetime.today()
    today = today.strftime("%m-%d-%Y")
    mydb, mydb_name, instance_col = set_db()
    # make an API call to the MongoDB server
    cursor = instance_col.find()
    # extract the list of documents from cursor obj
    mongo_docs = list(cursor)
    # create an empty DataFrame for storing documents
    docs = pandas.DataFrame(columns=[])
    # iterate over the list of MongoDB dict documents
    for num, doc in enumerate(mongo_docs):
        # convert ObjectId() to str
        doc["_id"] = str(doc["_id"])
        # get document _id from dict
        doc_id = doc["_id"]
        # create a Series obj from the MongoDB dict
        series_obj = pandas.Series( doc, name=doc_id )
        # append the MongoDB Series obj to the DataFrame obj
        docs = docs.append(series_obj)
        # get document _id from dict
        doc_id = doc["_id"]
    # Set the output file
    output_dir = os.path.join('..', '..', 'output_files', 'aws_instance_list', 'csv', '')
    output_file = os.path.join(output_dir, 'aws-instance-master-list-' + today +'.csv')
    # export MongoDB documents to a CSV file
    docs.to_csv(output_file, ",") # CSV delimited by commas
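Here is a link to the original code directory on GitHub. The files we want are aws_ec2_list_instances.py and ec2_mongo.py.
Why are the columns and headers out of order in the MongoDB version? And when printing to a file from pandas, how do I get rid of the extra column that mongo adds for the ID?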
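One possible direction, offered as a sketch rather than a confirmed diagnosis: appending one Series at a time to an empty DataFrame leaves the column order up to pandas, and `DataFrame.append` was later removed in pandas 2.0. Building the frame in a single call with an explicit `columns` list fixes the order and leaves out `_id`, and `index=False` keeps the row index out of the file. The `COLUMNS` subset and the `docs_to_csv` helper are hypothetical names, and plain dicts stand in for the documents that `instance_col.find()` would return.

```python
import io

import pandas

# Hypothetical column order; the names are taken from the documents in this question.
COLUMNS = ['AWS Account', 'Account Number', 'Name', 'Instance ID', 'State']


def docs_to_csv(mongo_docs, out, columns=COLUMNS):
    """Write a list of dict documents to `out` as CSV with a fixed column order."""
    # Only the listed columns are kept, in this order, so '_id' is left out.
    frame = pandas.DataFrame(mongo_docs, columns=columns)
    # index=False stops pandas from writing the row index as an extra first column.
    frame.to_csv(out, index=False)


# Stand-in for list(instance_col.find()); the keys are deliberately out of order.
sample_docs = [{
    '_id': '5f14f9ffa40de31278dade03',
    'State': 'running',
    'Name': 'usawsweb001',
    'Instance ID': 'i-01e5e920b4d3d5dcb',
    'Account Number': '123456789101',
    'AWS Account': 'jf-master-pd',
}]
buffer = io.StringIO()
docs_to_csv(sample_docs, buffer)
print(buffer.getvalue())
```

In `mongo_export_to_file`, the same idea would mean passing `mongo_docs` and `output_file` to such a helper in place of the append loop, with the full list of field names.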
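[Comments]:
-
Try using OrderedDict from the collections package instead of a dict.
-
Do you have a test bed somewhere that we can use? I tried running your code; pandas was hard to install, and after that I couldn't be sure my mongodb collection was set up the same way as yours. Your create_mongodb is not defined in the repo you posted.
-
That's odd. I've re-added the create_mongodb definition. Not sure why it disappeared. The script is working now; please see my answer. If you look at the repo again, note that the drop_mongodb function isn't fully in place yet; it's still a work in progress. Thanks!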
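On the OrderedDict suggestion in the comments: from Python 3.7 onward a plain dict already preserves insertion order, so swapping in OrderedDict would probably not change the result on a current interpreter. A small illustration, using three of the field names from this question:

```python
from collections import OrderedDict

plain = {'Name': 'usawsweb001', 'Instance ID': 'i-01e5e920b4d3d5dcb', 'AMI ID': 'ami-006219aba10688d0b'}
ordered = OrderedDict([
    ('Name', 'usawsweb001'),
    ('Instance ID', 'i-01e5e920b4d3d5dcb'),
    ('AMI ID', 'ami-006219aba10688d0b'),
])

# Both containers keep their keys in insertion order.
print(list(plain))
print(list(ordered))
# Alphabetical order only appears once something sorts the labels.
print(sorted(plain))
```

If the dictionaries keep their order on the way in and out of MongoDB, the reordering is more likely to happen at the pandas step than in the dictionaries themselves.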
Tags: python mongodb amazon-web-services