Extract annotations from a COCO dataset annotation file (answers)

【Question title】：Extract annotations from COCO Dataset annotation file
【Posted】：2021-12-11 19:02:32
【Question】：

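I want to train on a subset of the COCO dataset. For the images, I created a folder containing the first 30k images from the train2017 folder. Now I need the annotations for those 30k images (extracted from instances_train2017.json) in a separate JSON file so that I can train on them.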

How can I do this?

【Comments】：

Tags: json object-detection coco

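【Answer 1】：

There is no simple way to do this, because the annotations for all images are stored in one long JSON file. I am developing a Python package that helps with dataset-preparation tasks, including this one.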
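The reason a subset cannot simply be cut out of the file is that, in the standard COCO instances format, images and annotations live in two separate top-level lists (`images` and `annotations`) and are linked only through `annotations[*].image_id` pointing at `images[*].id`. A minimal sketch of that join, assuming the standard layout; `annotations_by_file` is a hypothetical helper name, not part of any library:

```python
from collections import defaultdict


def annotations_by_file(coco: dict) -> dict:
    """Group COCO annotations by the file name of the image they belong to."""
    # Map each image id to its file name, e.g. 139 -> "000000000139.jpg"
    id_to_file = {img["id"]: img["file_name"] for img in coco["images"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        # An annotation only stores the numeric id of its image, not the file name
        grouped[id_to_file[ann["image_id"]]].append(ann)
    return dict(grouped)


if __name__ == "__main__":
    # Tiny hand-made example in COCO layout (not real COCO data)
    toy = {
        "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
        "annotations": [
            {"id": 10, "image_id": 1, "category_id": 1},
            {"id": 11, "image_id": 1, "category_id": 2},
        ],
        "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "dog"}],
    }
    print({name: len(anns) for name, anns in annotations_by_file(toy).items()})
    # prints {'a.jpg': 2}
```

Images without any annotations (like `b.jpg` here) do not appear in the result, so any subset tool has to handle the `images` list separately from the `annotations` list.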

I created a reproducible example in this notebook: https://github.com/pylabel-project/samples/blob/main/coco_extract_subset.ipynb
You can open it directly in Google Colab with this link.

In general, the package works like this:

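import os
from pylabel import importer

# Hypothetical paths: point these at your own annotation file and image subset folder
path_to_annotations = "annotations/instances_train2017.json"
path_to_images = "train2017_30k"
files = os.listdir(path_to_images)  # file names of the images to keep

dataset = importer.ImportCoco(path_to_annotations)
# Now the annotations are stored in a dataframe
# that you can query and manipulate like any other pandas dataframe.
# In this case we keep only the rows whose image is in the list of files.
dataset.df = dataset.df[dataset.df.img_filename.isin(files)].reset_index()
dataset.export.ExportToCoco()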
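For comparison, the same filtering can be sketched with only the Python standard library, assuming the annotation file follows the standard COCO instances layout (`images`, `annotations`, `categories`, plus optional `info` and `licenses`). The function names `subset_coco` and `write_subset` are hypothetical and not part of pylabel:

```python
import json
import os


def subset_coco(coco: dict, keep_files: set) -> dict:
    """Return a copy of a COCO dict restricted to images whose file_name is in keep_files."""
    images = [img for img in coco["images"] if img["file_name"] in keep_files]
    kept_ids = {img["id"] for img in images}
    # Keep only annotations that point at one of the retained images
    annotations = [ann for ann in coco["annotations"] if ann["image_id"] in kept_ids]
    subset = dict(coco)  # shallow copy: "info", "licenses", "categories" carry over unchanged
    subset["images"] = images
    subset["annotations"] = annotations
    return subset


def write_subset(src_json: str, image_dir: str, dst_json: str) -> None:
    """Write a COCO file containing only the images present in image_dir."""
    keep = set(os.listdir(image_dir))  # file names of the copied subset images
    with open(src_json, encoding="utf-8") as f:
        coco = json.load(f)
    subset = subset_coco(coco, keep)
    with open(dst_json, "w", encoding="utf-8") as f:
        json.dump(subset, f)
```

A call such as `write_subset("annotations/instances_train2017.json", "train2017_30k", "annotations/instances_train2017_30k.json")` would produce the separate file; the folder and output names are placeholders. Because `categories` is copied unchanged, category ids stay the same as in the full dataset. The full train2017 annotation file is large, so loading it needs a fair amount of RAM.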

I hope it works for you. Let me know if you have any feedback.

【Discussion】：