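【Question title】: Core Data bulk insert suddenly slows to 1/10th the speed
【Posted】: 2012-11-27 02:09:24
【Question description】:

I am doing a bulk insert into Core Data. I have a Person object, and it has a relationship called "otherPeople", which is an NSSet of Person objects. When bulk inserting data from a download, everything is fine until about 10,000 people have been read in, at which point the insert slows to a crawl. I save and reset my NSManagedObjectContext after every 500 inserts.

If I comment out the part that inserts the "otherPerson" relationship, the bulk insert stays fast for the whole download. parseJSON is called in batches of 500 JSONKit dictionaries.

Any ideas what might be causing this? Possible solutions?

Code: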

- (NSArray*) getPeople:(NSArray*)ids
{
    NSFetchRequest* request = [[[NSFetchRequest alloc] init] autorelease];
    NSEntityDescription* entityDescription = [NSEntityDescription entityForName:@"Person" inManagedObjectContext:context];
    [request setEntity:entityDescription];
    [request setFetchBatchSize:ids.count];

    //Filter by array of ids
    NSPredicate* predicate = [NSPredicate predicateWithFormat:@"externalId IN %@", ids];
    [request setPredicate:predicate];

    NSError* _error = nil;
    NSArray* people = [context executeFetchRequest:request error:&_error];

    return people;
}

- (void) parseJSON:(NSArray*)people
{
    NSAutoreleasePool* pool = [[NSAutoreleasePool alloc] init];
    NSMutableArray* idsToFetch = [NSMutableArray arrayWithCapacity:CHUNK_SIZE * 3];
    NSMutableDictionary* existingPeople = [NSMutableDictionary dictionaryWithCapacity:CHUNK_SIZE * 3];

    // populate the existing people dictionary first, that way we know who is already in the context without having to do a fetch for each person in the array (externalId IS indexed)
    for (NSDictionary* personDictionary in people)
    {
        // uses JSON kit to parse out all the external ids...
        [PersonJSON addExternalIdsToArray:idsToFetch fromDictionary:personDictionary];
    }

    // see above code for getPeople implementation...
    NSArray* existingPeopleArray = [self getPeople:idsToFetch];
    for (Person* p in existingPeopleArray)
    {
        [existingPeople setObject:p forKey:p.externalId];
    }

    for (NSDictionary* personDictionary in people)
    {
        NSString* externalId = [personDictionary objectForKey:@"PersonId"];
        Person* person = [existingPeople objectForKey:externalId];

        if (person == nil)
        {
            // the person was not in the context, make a new person in the context
            person = [[self newPerson] autorelease];
            person.externalId = externalId;
            [existingPeople setObject:person forKey:person.externalId];
        }

        // use JSON kit to populate the core data object...
        [PersonJSON populatePerson:person withDictionary:personDictionary inContext:[self context]];

        // these are just objects that contain an externalId, showing that the link hasn't been setup yet
        for (UnresolvedOtherPerson* other in person.unresolvedOtherPeople)
        {
            Person* relatedPerson = [existingPeople objectForKey:other.externalId];

            if (relatedPerson == nil)
            {
                relatedPerson = [[self newPerson] autorelease];
                relatedPerson.externalId = other.externalId;
                [existingPeople setObject:relatedPerson forKey:relatedPerson.externalId];
            }

            // add link - if I comment out this line, everything runs very fast
            // if I don't comment out, things slow down gradually and then exponentially
            [person addOtherPersonsObject:relatedPerson];
        }

        self.downloaded++;
    }

    [pool drain];
}

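【Question comments】:

Instruments has a Core Data instrument that can run against the iOS Simulator. I suggest you use it to track what Core Data is doing.
Do you have two entities, i.e. a Person entity and a People entity, or just a single Person entity with a to-many relationship to other Person entities?
The code that sets the otherPeople relationship, the code you commented out, might be helpful.
It is a many-to-many relationship. The code just loops over an array of data structures, creates new Person objects, and calls addOtherPeopleObject to add them.
Please answer TechZen's question: one entity or two different ones? Code?

Tags: objective-c ios core-data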

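【Solution 1】:

Adding an object to a relationship causes the relationship on both sides to fire. So if you have A > B and you try to add a newly created A object to a B object that already has relationships to 100,000 A objects, Core Data will fetch those 100,000 objects from the store to fulfill the existing relationship before adding the new one.

The fact that you frequently clear the managed object context means that all 100,000 objects now have to be reloaded by Core Data to fulfill the relationship, which makes the process very slow.

One way to solve this is a two-step import. First load all the objects into the database without setting up any relationships, but keep track of the relationships that need to be added. Once that fast import is done, go back through the database and add the relationships, clearing the context in a way that keeps Core Data from having to reload relationships too often. As a concrete example, if you need to import 1 million As that relate to 100 Bs, first import all the As. Then, for each of the 100 Bs, load the relationship once, add all of its As to it, clear the context, move on to the next B, and so on. The key is to keep the context from resetting the 100k records it has just painfully loaded.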
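The second pass of such a two-step import might look roughly like the sketch below. It is only an illustration under stated assumptions: `LinkPeopleInSecondPass` and `pendingLinks` are hypothetical names, `pendingLinks` is assumed to map an owner's externalId to an NSArray of related externalIds recorded during the first pass, and the entity, attribute, and relationship names ("Person", "externalId", "otherPeople") are taken from the question.

```objc
#import <CoreData/CoreData.h>

// Hypothetical second pass: every Person already exists in the store,
// and pendingLinks maps ownerExternalId -> NSArray of related externalIds.
static BOOL LinkPeopleInSecondPass(NSManagedObjectContext *context,
                                   NSDictionary *pendingLinks)
{
    BOOL ok = YES;
    for (NSString *ownerId in pendingLinks) {
        @autoreleasepool {
            NSError *error = nil;
            NSArray *relatedIds = [pendingLinks objectForKey:ownerId];
            NSArray *allIds = [relatedIds arrayByAddingObject:ownerId];

            // One fetch loads the owner and everything it should link to.
            NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Person"];
            [request setPredicate:[NSPredicate predicateWithFormat:@"externalId IN %@", allIds]];
            NSArray *people = [context executeFetchRequest:request error:&error];

            if (people == nil) {
                NSLog(@"Fetch failed: %@", error);
                ok = NO;
            } else {
                NSManagedObject *owner = nil;
                NSMutableSet *related = [NSMutableSet set];
                for (NSManagedObject *p in people) {
                    if ([[p valueForKey:@"externalId"] isEqual:ownerId]) {
                        owner = p;
                    } else {
                        [related addObject:p];
                    }
                }
                if (owner != nil) {
                    // Touch the owner's relationship once, adding all links together.
                    [[owner mutableSetValueForKey:@"otherPeople"] unionSet:related];
                }
                if (![context save:&error]) {
                    NSLog(@"Save failed: %@", error);
                    ok = NO;
                }
                // Reset only after this owner's links are complete.
                [context reset];
            }
        }
        if (!ok) break;
    }
    return ok;
}
```

The point of the design is that the context is reset once per owner, after all of that owner's links have been added, rather than after an arbitrary number of inserts.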

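Another way to work around this is to stop resetting the whole context periodically, and instead refresh only the objects you want to get rid of.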
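A minimal sketch of that idea, assuming the import keeps a list of objects it is finished with (`SaveAndRefault` and `finished` are hypothetical names). `refreshObject:mergeChanges:` with `NO` turns an object back into a fault and discards its unsaved changes, so the sketch saves first.

```objc
#import <CoreData/CoreData.h>

// Saves the context, then turns only the given objects back into faults
// instead of resetting the whole context.
static BOOL SaveAndRefault(NSManagedObjectContext *context,
                           NSArray *finished,
                           NSError **error)
{
    // Save first: refreshing with mergeChanges:NO throws away unsaved changes.
    if (![context save:error]) {
        return NO;
    }
    for (NSManagedObject *object in finished) {
        [context refreshObject:object mergeChanges:NO];
    }
    return YES;
}
```

Objects that are not in `finished`, such as a heavily linked object whose relationship has already been loaded, stay realized in the context.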

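Oh, and one more thing: you could also consider making the relationship one-way in Core Data and using an explicit fetch to get the other side of the relationship.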
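With a one-way relationship, the "other side" could be looked up with a fetch such as the following sketch. It assumes the question's "Person" entity keeps a to-many "otherPeople" relationship with no inverse defined in the model; `PeopleLinkingTo` is a hypothetical helper name.

```objc
#import <CoreData/CoreData.h>

// Returns every Person whose otherPeople relationship contains `person`.
// This replaces the inverse relationship with an explicit query.
static NSArray *PeopleLinkingTo(NSManagedObject *person, NSError **error)
{
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Person"];
    [request setPredicate:[NSPredicate predicateWithFormat:@"ANY otherPeople == %@", person]];
    return [[person managedObjectContext] executeFetchRequest:request error:error];
}
```

The trade-off is that Core Data no longer keeps the two sides consistent for you, so the fetch has to be run whenever the reverse direction is needed.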

Edit:

I think I found a workaround. You need to call the primitive accessors. So something like

        [self.primitiveTags addObject:tag];

Preliminary testing seems to indicate that this does not force the other side of the relationship to fire.

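【Discussion】:

If that is true, it seems like bad design. All I want to do is add a new relationship, and I don't care about any of the existing ones... yikes...
It may be bad from a database point of view, but Core Data is an object graph manager, so it has to maintain integrity in the object graph. I got bitten by this and had to debug the underlying SQL queries to figure it out.
I do think that firing the fault on the inverse side of the relationship could be deferred until later.
I take it you don't know of a way to tell it not to load all the other relationships? :)
Lol, if you figure out a way to do that, let me know, because I am trying to find out how to do it too. That said, I edited the answer to post a potential workaround.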

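【Solution 2】:

This was a result of the inverse relationship. We have a parent object that can have tens of thousands of child objects. By eliminating the inverse relationship between parent and child and maintaining it manually, performance is now constant!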
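The answer does not say how the link is maintained manually. One possible approach, shown here purely as an assumption, is to store the parent's identifier as a plain attribute on each child and fetch children on demand. The "Child" entity, the "parentExternalId" attribute, and `ChildrenOfParent` are all hypothetical names.

```objc
#import <CoreData/CoreData.h>

// Looks up a parent's children through a plain attribute instead of a
// modeled relationship, so inserting a child never touches the parent.
static NSArray *ChildrenOfParent(NSManagedObjectContext *context,
                                 NSString *parentExternalId,
                                 NSError **error)
{
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Child"];
    [request setPredicate:[NSPredicate predicateWithFormat:@"parentExternalId == %@", parentExternalId]];
    return [context executeFetchRequest:request error:error];
}
```

During import, each new child would only need its `parentExternalId` set, which costs the same no matter how many children the parent already has.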

【Discussion】: