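Answers: Node embedded databases with faster search speed

【Question title】: Node embedded databases with faster search speed
【Posted】: 2017-09-22 08:18:45
【Question description】:

Currently I use SQLite to store a list of "files" with some properties. There is no relational data, just file records with several fields each, so I think any relational or NoSQL database would work for me. The problem now is search speed. I need a Node embedded database that stores its file in the project folder. I currently use node sqlite3, but yesterday I tested the better-sqlite3 module and the results were similar.

My table structure looks like this: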

╔════╦══════════════╦══════════╦════════════════╦═════════╦═══════════════╦══════════════════╗
║ id ║     hash     ║   name   ║   description  ║   date  ║      tags     ║    languages     ║
╠════╬══════════════╬══════════╬════════════════╬═════════╬═══════════════╬══════════════════╣
║INT ║     TEXT     ║   TEXT   ║     TEXT       ║  NUMBER ║     JSON      ║       JSON       ║
║  2 ║ b2b22b2b2bb2 ║ two test ║  lorem ipsum b ║ 1233123 ║ ["d","e","f"] ║ ["ko","en","tk"] ║
║  3 ║ asdasdasdsad ║ 333 test ║  lorem ipsum c ║ 1233123 ║ ["a","d","c"] ║ ["es","de","fr"] ║
║  4 ║ 4s342s423424 ║ 444 test ║  lorem ipsum d ║ 1233123 ║ ["a","b","g"] ║ ["es","pt","fr"] ║
╚════╩══════════════╩══════════╩════════════════╩═════════╩═══════════════╩══════════════════╝

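With roughly 300,000 rows, the results are:

Select * from files WHERE name LIKE "%string%": 300 ms

select * from files WHERE (tags LIKE '"music"' OR tags LIKE '"banana"') AND (languages LIKE '"esp"' OR languages LIKE '"ger"'): 400 ms

select id from files: 130 ms (I also tried "select count(id) as counter FROM files", which is slower; counting the results takes around 30 ms vs 150 ms)

The results are not bad... but this is a single search operation, and my program allows multiple users to search at the same time, so the search time becomes unacceptable (10 clients, about 4 seconds per response). I ran the tests on a Core i7 4820K with a 500 GB SSD (550R/450W); moving to an HDD RAID0 increases query times a lot.

I tried creating an index on each searched column. Inserts in this project are occasional, so I don't care much about insert speed. Strangely, putting an index on the name, tags, or languages field barely affects search speed (only about 50 ms) while noticeably increasing the table size.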
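
A likely reason the indexes barely help: an ordinary B-tree index cannot serve a LIKE pattern that starts with a wildcard, and tags stored as JSON text cannot be looked up by individual tag, so each query still scans every row. As a minimal sketch (not something from the tests above), the idea of indexing each tag separately can be shown in plain JavaScript. The names buildTagIndex and searchAnyTag are hypothetical, and the rows come from the sample table.

```javascript
// Hypothetical helper: map each tag to the set of row ids that carry it.
function buildTagIndex(rows) {
  const index = new Map()
  for (const row of rows) {
    for (const tag of row.tags) {
      if (!index.has(tag)) index.set(tag, new Set())
      index.get(tag).add(row.id)
    }
  }
  return index
}

// Hypothetical helper: ids of rows that have at least one of the given tags.
function searchAnyTag(index, tags) {
  const ids = new Set()
  for (const tag of tags) {
    for (const id of index.get(tag) || []) ids.add(id)
  }
  return [...ids].sort((a, b) => a - b)
}

// Rows taken from the sample table (only id and tags are needed here).
const rows = [
  { id: 2, tags: ['d', 'e', 'f'] },
  { id: 3, tags: ['a', 'd', 'c'] },
  { id: 4, tags: ['a', 'b', 'g'] }
]
const tagIndex = buildTagIndex(rows)
console.log(searchAnyTag(tagIndex, ['a', 'e']))
```

Inside SQLite, the equivalent idea would be a separate table with one (file id, tag) row per tag and an index on the tag column, so a tag lookup becomes an exact match instead of a text scan.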

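So... I'm looking for alternatives. I need a Node embedded database with very high search speed and no database locking (I think the database could grow to 2M rows over time) that doesn't consume a lot of memory. I don't care whether it is relational or not.

Edit: I did a lot of tests, and these are my results:

With node-lmdb, creation is very fast, like using memcache: 100,000 inserts in 4 seconds. Reading data works well, but because it is a key-value database, I need to parse each stored value from JSON and then run the "search" logic myself, which hurts the results a lot. Here is the sample code: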

const crypto = require('crypto')
const lmdb = require('node-lmdb')

const env = new lmdb.Env()

env.open({
    path: __dirname + "/mydata",
    mapSize: 2*1024*1024*1024, // maximum database size 
    maxDbs: 3
})

var dbi = env.openDbi({
    name: "myPrettyDatabase",
    create: true // will create if database did not exist 
})

// Begin transaction
var txn = env.beginTxn()

let t0 = new Date().getTime()


// Create cursor
let cursor = new lmdb.Cursor(txn, dbi)
let counter = 0
let find = 0

for (var found = cursor.goToFirst(); found !== null; found = cursor.goToNext()) {
    cursor.getCurrentString(function(key, data) {

        let js
        try {
            js = JSON.parse(data)
            counter++
        } catch (e) { js = null }

        if (js && String(js.name).indexOf('Lorem') !== -1) {
            find++
        }
    })
}

console.log('counter: ' + counter)
console.log('find: ' + find)

// Close cursor
cursor.close();


let t1 = new Date().getTime()
console.log('time: ' + (t1-t0))


// Commit transaction
txn.commit()

dbi.close()

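The result is:

$ node index.js
counter: 215548
find: 113073
time: 1516

Just listing the entries takes about 200 ms, but the JSON conversion and the small "search" logic make it slower than sqlite (or I'm doing something wrong).

I did another experiment with Tingodb, an embedded database that works like MongoDB. I inserted 200K objects like this: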

{ hash: '3736b5da857a4c7b9b046f326004803a',
  name: 'inia, looked up one of the more obscure Latin words, consectetur, from a Lorem I',
  description: ', looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus Bonorum et Malorum" (The Extremes of ',
  tags: [ 'pc', 'pc', 'hd', 'mp4' ],
  languages: [ 'fre', 'jap', 'deu' ] }

Inserts are incredibly fast, about 100K in 2 secs, but... here is the experiment:

const Db = require('tingodb')({cacheSize: 60000, cacheMaxObjSize: 4096}).Db
const assert = require('assert')
const crypto = require('crypto')
var db = new Db('./', {})

// Fetch a collection to insert document into 
var collection = db.collection("batch_document_insert_collection_safe")

let t0 = new Date().getTime()
collection.find({ tags: 'video' }).toArray(function(err, docs) {
    console.log(err)  
    console.log(docs)  
    let t1 = new Date().getTime()
    console.log('time: ' + (t1-t0))
})

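Running this against the 200K DB took 38 SECONDS in total. I don't know if that is normal...

As for aladb, I tested it and it works well. I did another experiment (which I no longer have) and the performance was good, similar to sqlite3, with some nice features, but in some searches it is 2x slower than sqlite (using LIKE %string% kills the engine).

Edit 2: After a lot of research and testing on a Linux machine, using the ab command (ab -n 10000 -c 50 http://machine.tst:13375/library/search?tags=lorem) to simulate multiple requests, I finally stayed with the sqlite3 library, but at startup I create an additional in-memory table and store the processed request responses in it, with columns (id(INT), hash(VARCHAR), object(TEXT), last(NUMBER)).

The first time a request arrives, I create a unique hash from the request data ("GET" + "/a/b/c" + JSON(requestData)) and store the JSON-encoded response. The first query still returns at normal speed, but subsequent ones behave like memcache or a similar database. I went from 10 requests/sec to ~450 requests/sec with 10% CPU usage.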
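
The caching scheme described here can be sketched roughly as follows. This is only an illustration under assumptions: a Map stands in for the in-memory sqlite table, sha1 is an assumed hash function (the post does not name one), and cacheKey, cachedSearch and slowSearch are hypothetical names.

```javascript
const crypto = require('crypto')

// Build a cache key from method, path and request data, as described above.
// sha1 is an assumption; the post does not say which hash is used.
function cacheKey(method, path, requestData) {
  return crypto
    .createHash('sha1')
    .update(method + path + JSON.stringify(requestData))
    .digest('hex')
}

// In-memory stand-in for the cache table: hash -> { object, last }.
const cache = new Map()

// Hypothetical wrapper: run `search` only when the response is not cached.
function cachedSearch(method, path, requestData, search) {
  const key = cacheKey(method, path, requestData)
  const hit = cache.get(key)
  if (hit) {
    hit.last = Date.now() // remember when the entry was last used
    return JSON.parse(hit.object)
  }
  const result = search(requestData)
  cache.set(key, { object: JSON.stringify(result), last: Date.now() })
  return result
}

// Usage: the second identical request is served from the cache.
let calls = 0
const slowSearch = (q) => { calls++; return [{ id: 4, tags: q.tags }] }
const first = cachedSearch('GET', '/library/search', { tags: 'lorem' }, slowSearch)
const second = cachedSearch('GET', '/library/search', { tags: 'lorem' }, slowSearch)
console.log(calls, first, second)
```

One caveat with this kind of key: JSON.stringify depends on property order, so two requests with the same parameters in a different order would get different hashes unless the parameters are sorted first.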

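In any case, I will add a watcher event that checks the "last" column of the cache rows to delete old requests and prevent memory problems. I checked, and only one request has parameters that change often; all the other requests are always the same, so I don't think memory usage will grow much.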
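
A minimal sketch of such a cleanup pass, assuming the cache is held as a Map of hash -> { object, last } rather than a sqlite table; evictOld and the sample entries are hypothetical and only illustrate the "delete rows whose last is too old" idea.

```javascript
// Hypothetical helper: delete cache entries whose `last` timestamp is older
// than maxAgeMs. Returns how many entries were removed.
function evictOld(cache, maxAgeMs, now = Date.now()) {
  let removed = 0
  for (const [hash, entry] of cache) {
    if (now - entry.last > maxAgeMs) {
      cache.delete(hash) // deleting during Map iteration is safe in JavaScript
      removed++
    }
  }
  return removed
}

// Usage with fixed timestamps so the outcome does not depend on the clock.
const demoCache = new Map([
  ['aaa', { object: '[]', last: 1000 }],
  ['bbb', { object: '[]', last: 9000 }]
])
const removedCount = evictOld(demoCache, 5000, 10000)
console.log(removedCount, [...demoCache.keys()])
```

In a running server this could be called periodically, for example from a setInterval timer; the interval and the maximum age would need tuning against real traffic.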

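If I find an embedded option better than sqlite3 in the future, I will try changing the database engine.

【Question discussion】:

How about AlaSQL (github.com/agershun/alasql)?

Tags: sql node.js database nosql

【Solution 1】:

Try LMDB and Node-LMDB

The performance is very good, and it looks like an ideal choice for your use case. I can reach 1,000,000 rows/second per client.

【Discussion】: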