【Question title】: RavenDB: How can I properly index a cartesian product in a map-reduce?
【Posted】: 2017-05-27 14:37:54
【Question】:

This question is a spin-off of RavenDB: Why do I get null-values for fields in this multi-map/reduce index?, but I realized that the actual problem is a different one.

Consider my extremely simplified domain, rewritten as a movie rental store scenario for the sake of abstraction:

public class User
{
    public string Id { get; set; }
}

public class Movie
{
    public string Id { get; set; }
}

public class MovieRental
{
    public string Id { get; set; }
    public string MovieId { get; set; }
    public string UserId { get; set; }
}

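This is a textbook many-to-many example.

The index I want to create is this:

For a given user, give me a list of every movie in the database (filtering/searching left out for now), along with an integer describing how many times the user has rented that movie (possibly zero).

Basically, like this:

Users: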

| Id     |
|--------|
| John   |
| Lizzie |
| Albert |

Movies:

| Id           |
|--------------|
| Robocop      |
| Notting Hill |
| Inception    |

MovieRentals:

| Id        | UserId | MovieId      |
|-----------|--------|--------------|
| rental-00 | John   | Robocop      |
| rental-01 | John   | Notting Hill |
| rental-02 | John   | Notting Hill |
| rental-03 | Lizzie | Robocop      |
| rental-04 | Lizzie | Robocop      |
| rental-05 | Lizzie | Inception    |

Ideally, I want an index I can query that looks like this:

| UserId | MovieId      | RentalCount |
|--------|--------------|-------------|
| John   | Robocop      | 1           |
| John   | Notting Hill | 2           |
| John   | Inception    | 0           |
| Lizzie | Robocop      | 2           |
| Lizzie | Notting Hill | 0           |
| Lizzie | Inception    | 1           |
| Albert | Robocop      | 0           |
| Albert | Notting Hill | 0           |
| Albert | Inception    | 0           |

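Or, declaratively:

  • I always want the complete list of all movies (eventually I will add filtering/searching), even for a user who has never rented a movie
  • I want the number of rentals per user as a plain integer
  • I want to be able to sort by rental count, i.e. show the movies the given user has rented most at the top of the list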
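To pin down the intended semantics, here is an in-memory LINQ-to-objects sketch (plain C#, not RavenDB index code) that builds the cartesian product of users and movies, attaches each pair's rental count, and sorts one user's rows by that count. The names `UserMovieCount`, `CrossJoinSketch`, `Build` and `ForUser` are hypothetical, chosen only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public class MovieRental
{
    public string Id { get; set; }
    public string MovieId { get; set; }
    public string UserId { get; set; }
}

// Hypothetical row type: one entry per (user, movie) pair, like the desired index table.
public class UserMovieCount
{
    public string UserId { get; set; }
    public string MovieId { get; set; }
    public int RentalCount { get; set; }
}

public static class CrossJoinSketch
{
    // Pairs every user with every movie and attaches the number of matching rentals (0 if none).
    public static List<UserMovieCount> Build(
        IEnumerable<string> userIds, IEnumerable<string> movieIds, IEnumerable<MovieRental> rentals)
    {
        // Count rentals per (UserId, MovieId); anonymous types compare by value.
        var counts = rentals
            .GroupBy(r => new { r.UserId, r.MovieId })
            .ToDictionary(g => g.Key, g => g.Count());

        var movies = movieIds.ToList(); // materialize once, it is enumerated per user
        return (from u in userIds
                from m in movies
                let key = new { UserId = u, MovieId = m }
                select new UserMovieCount
                {
                    UserId = u,
                    MovieId = m,
                    RentalCount = counts.ContainsKey(key) ? counts[key] : 0
                }).ToList();
    }

    // All movies for one user, most-rented first (LINQ's OrderByDescending is a stable sort).
    public static List<UserMovieCount> ForUser(IEnumerable<UserMovieCount> rows, string userId)
    {
        return rows.Where(r => r.UserId == userId)
                   .OrderByDescending(r => r.RentalCount)
                   .ToList();
    }
}
```

With the sample data above, `Build` yields the nine rows of the desired table, and `ForUser(rows, "John")` returns Notting Hill, Robocop, Inception in that order, matching the expectation in the sorting test further down.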

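However, I can't find a way to build the "cross join" above and store it in an index. Instead, I initially thought I had it right with the attempt below, but it doesn't let me sort (see the failing test):

{"Computation is not supported: x.UserRentalCounts.SingleOrDefault(rentalCount => (rentalCount.UserId == value(UnitTestProject2.MovieRentalTests+c__DisplayClass0_0).user_john.Id)).Count. You cannot use computation in RavenDB queries (only simple member expressions are allowed)."}

My question is basically: how can I build an index that satisfies my requirements, if it is possible at all?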


Below is the example I mentioned. It doesn't meet my requirements, but it's where I am right now. It uses the following packages (VS2015):

packages.config

<?xml version="1.0" encoding="utf-8"?>
<packages>
  <package id="Microsoft.Owin.Host.HttpListener" version="3.0.1" targetFramework="net461" />
  <package id="NUnit" version="3.5.0" targetFramework="net461" />
  <package id="RavenDB.Client" version="3.5.2" targetFramework="net461" />
  <package id="RavenDB.Database" version="3.5.2" targetFramework="net461" />
  <package id="RavenDB.Tests.Helpers" version="3.5.2" targetFramework="net461" />
</packages>

MovieRentalTests.cs

using System.Collections.Generic;
using System.Linq;
using NUnit.Framework;
using Raven.Client.Indexes;
using Raven.Client.Linq;
using Raven.Tests.Helpers;

namespace UnitTestProject2
{
    [TestFixture]
    public class MovieRentalTests : RavenTestBase
    {
        [Test]
        public void DoSomeTests()
        {
            using (var server = GetNewServer())
            using (var store = NewRemoteDocumentStore(ravenDbServer: server))
            {
                //Test-data
                var user_john = new User { Id = "John" };
                var user_lizzie = new User { Id = "Lizzie" };
                var user_albert = new User { Id = "Albert" };


                var movie_robocop = new Movie { Id = "Robocop" };
                var movie_nottingHill = new Movie { Id = "Notting Hill" };
                var movie_inception = new Movie { Id = "Inception" };

                var rentals = new List<MovieRental>
                {
                    new MovieRental {Id = "rental-00", UserId = user_john.Id, MovieId = movie_robocop.Id},
                    new MovieRental {Id = "rental-01", UserId = user_john.Id, MovieId = movie_nottingHill.Id},
                    new MovieRental {Id = "rental-02", UserId = user_john.Id, MovieId = movie_nottingHill.Id},
                    new MovieRental {Id = "rental-03", UserId = user_lizzie.Id, MovieId = movie_robocop.Id},
                    new MovieRental {Id = "rental-04", UserId = user_lizzie.Id, MovieId = movie_robocop.Id},
                    new MovieRental {Id = "rental-05", UserId = user_lizzie.Id, MovieId = movie_inception.Id}
                };

                //Init index
                new Movies_WithRentalsByUsersCount().Execute(store);

                //Insert test-data in db
                using (var session = store.OpenSession())
                {
                    session.Store(user_john);
                    session.Store(user_lizzie);
                    session.Store(user_albert);

                    session.Store(movie_robocop);
                    session.Store(movie_nottingHill);
                    session.Store(movie_inception);

                    foreach (var rental in rentals)
                    {
                        session.Store(rental);
                    }

                    session.SaveChanges();

                    WaitForAllRequestsToComplete(server);
                    WaitForIndexing(store);
                }

                //Test of correct rental-counts for users
                using (var session = store.OpenSession())
                {
                    var allMoviesWithRentalCounts =
                        session.Query<Movies_WithRentalsByUsersCount.ReducedResult, Movies_WithRentalsByUsersCount>()
                            .ToList();

                    var robocopWithRentalsCounts = allMoviesWithRentalCounts.Single(m => m.MovieId == movie_robocop.Id);
                    Assert.AreEqual(1, robocopWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_john.Id)?.Count ?? 0);
                    Assert.AreEqual(2, robocopWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_lizzie.Id)?.Count ?? 0);
                    Assert.AreEqual(0, robocopWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_albert.Id)?.Count ?? 0);

                    var nottingHillWithRentalsCounts = allMoviesWithRentalCounts.Single(m => m.MovieId == movie_nottingHill.Id);
                    Assert.AreEqual(2, nottingHillWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_john.Id)?.Count ?? 0);
                    Assert.AreEqual(0, nottingHillWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_lizzie.Id)?.Count ?? 0);
                    Assert.AreEqual(0, nottingHillWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_albert.Id)?.Count ?? 0);
                }

                // Test that, for a given user, the movies can be sorted by rental count
                using (var session = store.OpenSession())
                {
                    var allMoviesWithRentalCounts =
                        session.Query<Movies_WithRentalsByUsersCount.ReducedResult, Movies_WithRentalsByUsersCount>()
                            .OrderByDescending(x => x.UserRentalCounts.SingleOrDefault(rentalCount => rentalCount.UserId == user_john.Id).Count)
                            .ToList();

                    Assert.AreEqual(movie_nottingHill.Id, allMoviesWithRentalCounts[0].MovieId);
                    Assert.AreEqual(movie_robocop.Id, allMoviesWithRentalCounts[1].MovieId);
                    Assert.AreEqual(movie_inception.Id, allMoviesWithRentalCounts[2].MovieId);
                }
            }
        }

        public class Movies_WithRentalsByUsersCount :
            AbstractMultiMapIndexCreationTask<Movies_WithRentalsByUsersCount.ReducedResult>
        {
            public Movies_WithRentalsByUsersCount()
            {
                AddMap<MovieRental>(rentals =>
                    from r in rentals
                    select new ReducedResult
                    {
                        MovieId = r.MovieId,
                        UserRentalCounts = new[] { new UserRentalCount { UserId = r.UserId, Count = 1 } }
                    });

                AddMap<Movie>(movies =>
                    from m in movies
                    select new ReducedResult
                    {
                        MovieId = m.Id,
                        UserRentalCounts = new[] { new UserRentalCount { UserId = null, Count = 0 } }
                    });

                Reduce = results =>
                    from result in results
                    group result by result.MovieId
                    into g
                    select new
                    {
                        MovieId = g.Key,
                        UserRentalCounts = (
                                from userRentalCount in g.SelectMany(x => x.UserRentalCounts)
                                group userRentalCount by userRentalCount.UserId
                                into subGroup
                                select new UserRentalCount { UserId = subGroup.Key, Count = subGroup.Sum(b => b.Count) })
                            .ToArray()
                    };
            }

            public class ReducedResult
            {
                public string MovieId { get; set; }
                public UserRentalCount[] UserRentalCounts { get; set; }
            }

            public class UserRentalCount
            {
                public string UserId { get; set; }
                public int Count { get; set; }
            }
        }

        public class User
        {
            public string Id { get; set; }
        }

        public class Movie
        {
            public string Id { get; set; }
        }

        public class MovieRental
        {
            public string Id { get; set; }
            public string MovieId { get; set; }
            public string UserId { get; set; }
        }
    }
}

【Question discussion】:

    Tags: ravendb cartesian-product cross-join


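    【Solution 1】:

    Since your requirement is "for a given user", if you really only need this for a single user, you can do it with a Multi-Map index. Use the Movies table itself to produce baseline zero-count records, then map the user's actual MovieRentals records on top of those.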
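As a rough illustration of that idea, the sketch below simulates the two map functions and the reduce step with plain LINQ to objects; it is not RavenDB index code. The first "map" emits a zero-count entry for every movie (the baseline), the second emits a count of 1 for each rental by the chosen user, and the "reduce" groups by `MovieId` and sums, so every movie survives with a count of 0 or more. Filtering rentals by a single `userId` inside the map is an assumption standing in for "a single user"; `MapEntry` and `MultiMapSketch` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public class MovieRental
{
    public string Id { get; set; }
    public string MovieId { get; set; }
    public string UserId { get; set; }
}

// Hypothetical shape shared by both map outputs and the reduce output.
public class MapEntry
{
    public string MovieId { get; set; }
    public int Count { get; set; }
}

public static class MultiMapSketch
{
    public static List<MapEntry> Run(IEnumerable<string> movieIds, IEnumerable<MovieRental> rentals, string userId)
    {
        // "Map" over movies: a baseline entry with Count = 0 for every movie.
        var fromMovies = movieIds.Select(m => new MapEntry { MovieId = m, Count = 0 });

        // "Map" over rentals: Count = 1 per rental, restricted to one user (assumption).
        var fromRentals = rentals
            .Where(r => r.UserId == userId)
            .Select(r => new MapEntry { MovieId = r.MovieId, Count = 1 });

        // "Reduce": group by MovieId and sum; the baseline guarantees unrented movies show up as 0.
        return fromMovies.Concat(fromRentals)
            .GroupBy(e => e.MovieId)
            .Select(g => new MapEntry { MovieId = g.Key, Count = g.Sum(e => e.Count) })
            .OrderByDescending(e => e.Count)
            .ToList();
    }
}
```

Because each reduced entry has `Count` as a plain top-level field, sorting on it needs no computation over a nested collection.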

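    If you really need it for all users across all movies, I don't believe there is a clean way to do this with RavenDB, since it would count as reporting which is noted as one of the sour spots for RavenDB.

    If you really want to try doing this with RavenDB, here are some options:

    1) Create dummy records in the database for every user/movie combination and use them in the index with a count of 0. Whenever a movie or user is added, updated or deleted, update the dummy records accordingly.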

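    2) Generate the zero-count records yourself, in memory, per request, and merge them with the non-zero-count data RavenDB returns. Query all users, query all movies, create the baseline zero-count records, then run the actual query for the non-zero counts and layer those on top. Finally, apply the paging/filtering/sorting logic.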
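A minimal sketch of option 2 for one user, assuming the non-zero counts have already been fetched (for example from a map-reduce index over MovieRentals) into a dictionary keyed by movie id. `MergeSketch`, `Page` and its parameters are hypothetical. Note that the complete list of movie ids must be in memory before sorting and paging can happen.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class MergeSketch
{
    // Layers non-zero counts over a zero-count baseline, sorts by count (descending), then pages.
    public static List<KeyValuePair<string, int>> Page(
        IEnumerable<string> allMovieIds,        // baseline: every movie, loaded in full
        IDictionary<string, int> nonZeroCounts, // movieId -> rentals by this user (hypothetical input)
        int pageIndex, int pageSize)
    {
        return allMovieIds
            .Select(id =>
            {
                int count;
                nonZeroCounts.TryGetValue(id, out count); // count is 0 when the movie was never rented
                return new KeyValuePair<string, int>(id, count);
            })
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key) // deterministic order for ties
            .Skip(pageIndex * pageSize)
            .Take(pageSize)
            .ToList();
    }
}
```

For John's counts (Robocop 1, Notting Hill 2), page 0 with size 2 contains Notting Hill and Robocop, and page 1 contains only Inception with a count of 0.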

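    3) Use the SQL Replication bundle to replicate the Users, Movies and MovieRental tables to SQL, and run this "reporting" query in SQL.

    【Discussion】:

    • Thank you, David. I don't like option 2, because when sorting by count with paging, I would have to load "everything" into memory. Option 3 is definitely the way to go, and I realize that the limits on aggregation etc. are the trade-off you accept when choosing RavenDB. In this particular case, while waiting for answers, I ended up separating a "favorite movies" list from the "all movies" list (so a simple index on MovieRentals with grouping and a sum), which in my opinion even gives the end user a better UX. Thanks again, here's your bounty :)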
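The "favorite movies" approach described in the comment, a map-reduce over MovieRentals alone, can be approximated in memory as below. This is a LINQ-to-objects simulation of the map and reduce steps, not RavenDB API code, and `RentalCount`/`FavoritesSketch` are hypothetical names. Each reduced entry carries `UserId` and `Count` as flat top-level fields, so filtering by user and sorting by count should only need simple member expressions. That is exactly what the failing query in the question lacked.

```csharp
using System.Collections.Generic;
using System.Linq;

public class MovieRental
{
    public string Id { get; set; }
    public string MovieId { get; set; }
    public string UserId { get; set; }
}

// Hypothetical reduce-result shape: flat fields only, one entry per (UserId, MovieId).
public class RentalCount
{
    public string UserId { get; set; }
    public string MovieId { get; set; }
    public int Count { get; set; }
}

public static class FavoritesSketch
{
    public static List<RentalCount> Reduce(IEnumerable<MovieRental> rentals)
    {
        // "Map": one entry with Count = 1 per rental document.
        var mapped = rentals.Select(r => new RentalCount { UserId = r.UserId, MovieId = r.MovieId, Count = 1 });

        // "Reduce": group on the composite (UserId, MovieId) key and sum the counts.
        return mapped
            .GroupBy(e => new { e.UserId, e.MovieId })
            .Select(g => new RentalCount { UserId = g.Key.UserId, MovieId = g.Key.MovieId, Count = g.Sum(e => e.Count) })
            .ToList();
    }

    // Favorites for one user: only movies actually rented, most-rented first.
    public static List<RentalCount> Favorites(IEnumerable<RentalCount> reduced, string userId)
    {
        return reduced.Where(e => e.UserId == userId)
                      .OrderByDescending(e => e.Count)
                      .ToList();
    }
}
```

Unlike the cross-join table, a user who never rented anything (Albert in the sample data) simply has no entries, which fits a separate "favorites" list.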