[Question Title]: Spring Data JPA big list insert
[Posted]: 2019-04-24 19:32:07
[Question]:

For two days I have been trying to store an ArrayList with about six million entries in my Postgres database using Spring Data JPA. The whole thing works, but it is very slow: it takes about 27 minutes. I have experimented with the batch size, but without much success. I have also noticed that the larger the table gets, the longer saving takes. Is there a way to speed this up? I previously did the same thing with SQLite, where the same amount of data took only about 15 seconds.

My entity:

@Data
@NoArgsConstructor // JPA requires a no-arg constructor; @Data does not generate one when another constructor is present
@Entity
@Table(name = "commodity_prices")
public class CommodityPrice {

    @Id
    @Column( name = "id" )
    @GeneratedValue( strategy = GenerationType.SEQUENCE )
    private long id;

    @Column(name = "station_id")
    private int station_id;

    @Column(name = "commodity_id")
    private int commodity_id;

    @Column(name = "supply")
    private long supply;

    @Column(name = "buy_price")
    private int buy_price;

    @Column(name = "sell_price")
    private int sell_price;

    @Column(name = "demand")
    private long demand;

    @Column(name = "collected_at")
    private long collected_at;


    public CommodityPrice( int station_id, int commodity_id, long supply, int buy_price, int sell_price, long demand,
            long collected_at ) {
        this.station_id = station_id;
        this.commodity_id = commodity_id;
        this.supply = supply;
        this.buy_price = buy_price;
        this.sell_price = sell_price;
        this.demand = demand;
        this.collected_at = collected_at;
    }
}

My insert class:

@Slf4j
@Component
public class CommodityPriceHandler {

    @Autowired
    CommodityPriceRepository commodityPriceRepository;

    @Autowired
    private EntityManager entityManager;

    public void insertIntoDB() {

        int lineCount = 0;
        List<CommodityPrice> commodityPrices = new ArrayList<>(  );
        StopWatch stopWatch = new StopWatch();
        stopWatch.start();


        try( Reader reader = new FileReader( DOWNLOAD_SAVE_PATH + FILE_NAME_COMMODITY_PRICES ) ) {
            Iterable<CSVRecord> records = CSVFormat.EXCEL.withFirstRecordAsHeader().parse( reader );
            for( CSVRecord record : records ) {
                int station_id = Integer.parseInt( record.get( "station_id" ) );
                int commodity_id = Integer.parseInt( record.get( "commodity_id" ) );
                long supply = Long.parseLong( record.get( "supply" ) );
                int buy_price = Integer.parseInt( record.get( "buy_price" ) );
                int sell_price = Integer.parseInt( record.get( "sell_price" ) );
                long demand = Long.parseLong( record.get( "demand" ) );
                long collected_at = Long.parseLong( record.get( "collected_at" ) );

                CommodityPrice commodityPrice = new CommodityPrice(station_id, commodity_id, supply, buy_price, sell_price, demand, collected_at);
                commodityPrices.add( commodityPrice );

                if( commodityPrices.size() == 1000 ) {
                    commodityPriceRepository.saveAll( commodityPrices );
                    commodityPriceRepository.flush();
                    entityManager.clear();
                    commodityPrices.clear();
                    log.info( "Inserted {} lines so far", lineCount );
                }

                lineCount ++;
            }
        }
        catch( IOException e ) {
            log.error( e.getLocalizedMessage() );
        }

        // save the remaining partial batch
        commodityPriceRepository.saveAll( commodityPrices );


        stopWatch.stop();

        log.info( "Successfully inserted " + lineCount + " lines in " + stopWatch.getTotalTimeSeconds() + " seconds." );
    }
}
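As a side note, the chunk-then-flush pattern in the insert method above is easy to get subtly wrong: the persistence context must be cleared between chunks or it grows without bound, and the final partial batch must still be saved after the loop. A minimal, database-free sketch of just the chunking logic (class and method names are my own, not from the original code; the sink stands in for `saveAll` + `flush` + `clear`):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchChunker {

    /**
     * Feeds items to the sink in chunks of at most batchSize,
     * including the final partial chunk. Returns the total item count.
     */
    public static <T> int forEachChunk(Iterable<T> items, int batchSize, Consumer<List<T>> sink) {
        List<T> buffer = new ArrayList<>(batchSize);
        int total = 0;
        for (T item : items) {
            buffer.add(item);
            total++;
            if (buffer.size() == batchSize) {
                sink.accept(buffer);            // e.g. saveAll + flush + entityManager.clear()
                buffer = new ArrayList<>(batchSize);
            }
        }
        if (!buffer.isEmpty()) {
            sink.accept(buffer);                // do not forget the last partial batch
        }
        return total;
    }
}
```

In the original code the sink would call `saveAll`, `flush`, and `entityManager.clear()`; keeping that boundary logic in one place makes the "forgot the last chunk" bug harder to reintroduce.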

My application.properties:

# HIBERNATE
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.PostgreSQLDialect
spring.jpa.properties.hibernate.jdbc.lob.non_contextual_creation=true
spring.jpa.hibernate.ddl-auto = update

spring.jpa.properties.hibernate.jdbc.batch_size=1000
spring.jpa.properties.hibernate.order_inserts=true
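One driver-level setting that often helps with bulk inserts on Postgres (my addition, not part of the original configuration) is the PostgreSQL JDBC driver's `reWriteBatchedInserts` flag, which rewrites a batch of single-row INSERTs into multi-row INSERT statements. The host and database name below are placeholders:

```properties
# PostgreSQL JDBC: collapse batched single-row INSERTs into multi-row INSERTs
spring.datasource.url=jdbc:postgresql://localhost:5432/mydb?reWriteBatchedInserts=true
```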

[Comments]:

  • If you have already set the batch size in application.properties, you do not need the commodityPrices.size() == 1000 check. When you call saveAll(), Hibernate batches the inserts automatically according to the configured batch size.

Tags: java spring postgresql spring-data-jpa bulkinsert


[Solution 1]:

Even when you insert in batches, your current sequence generation strategy still requires one extra statement per inserted record to fetch an id. So for a batch size of 1000 records you actually issue 1001 statements, which is clearly not what you expect.

My suggestions:

  • Enable SQL logging to see which statements are actually sent to your database. I personally use datasource-proxy, but use whatever you like.

  • Change your sequence generator. At a minimum, use:

@Id
@Column( name = "id" )
@GeneratedValue(generator = "com_pr_generator", strategy = GenerationType.SEQUENCE )
@SequenceGenerator(name="com_pr_generator", sequenceName = "book_seq", allocationSize=50)
private long id;
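To put rough numbers on why `allocationSize` matters: with the default generator, each insert needs its own `select nextval` round trip, while `allocationSize=50` lets Hibernate reserve 50 ids per sequence call. A small sketch of that arithmetic (the class name is mine and the figures are illustrative, not measured):

```java
public class SequenceMath {

    /**
     * Number of "select nextval" round trips needed to obtain `rows` ids
     * when each sequence call reserves `allocationSize` ids (ceiling division).
     */
    public static long sequenceCalls(long rows, int allocationSize) {
        return (rows + allocationSize - 1) / allocationSize;
    }
}
```

For the six million rows in the question, that is 6,000,000 sequence round trips with the default allocator versus 120,000 with `allocationSize=50`.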

[Discussion]:
