Varnish with MPEG-TS Traffic: answers

【Question title】: Varnish with MPEG-TS Traffic
【Posted】: 2021-02-18 10:06:27
【Question description】:

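We are trying to use Varnish as a proxy/cache in front of a media server. Our streams are MPEG-TS (H.264/H.265) over HTTP. The media server carries 1000 live streams, and each stream has multiple connections. We configured Varnish as shown below, but we are running into the following problems: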

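Streams close after a short time.
Sometimes we cannot connect to a stream at all; the client hangs while connecting...
These errors appear in varnishlog: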

-   FetchError     Could not get storage
-   FetchError     Could not get storage
-   FetchError     Could not get storage
-   FetchError     Could not get storage
-   FetchError     Resource temporarily unavailable
-   FetchError     eof socket fail
-   FetchError     Resource temporarily unavailable
-   FetchError     eof socket fail
-   FetchError     Could not get storage
-   FetchError     Could not get storage
-   FetchError     Could not get storage
-   FetchError     Could not get storage
-   FetchError     Could not get storage
-   FetchError     Could not get storage
-   FetchError     Could not get storage
-   FetchError     Could not get storage
-   FetchError     Could not get storage
-   FetchError     Resource temporarily unavailable
-   FetchError     eof socket fail
-   FetchError     Resource temporarily unavailable
-   FetchError     eof socket fail
-   FetchError     Resource temporarily unavailable
-   FetchError     eof socket fail
-   FetchError     Could not get storage
-   FetchError     Could not get storage

My configuration:

vcl 4.0;

import directors;


backend s6855 {
    .host = "127.0.0.1";
    .port = "6855";
    .first_byte_timeout     = 10s;   # How long to wait before we receive a first byte from our backend?
    .connect_timeout        = 5s;     # How long to wait for a backend connection?
    .between_bytes_timeout  = 30s;     # How long to wait between bytes received from our backend?
}

backend s6866 {
    .host = "127.0.0.1";
    .port = "6866";
    .first_byte_timeout     = 10s;   # How long to wait before we receive a first byte from our backend?
    .connect_timeout        = 5s;     # How long to wait for a backend connection?
    .between_bytes_timeout  = 30s;     # How long to wait between bytes received from our backend?
    }

backend s6877 {
    .host = "127.0.0.1";
    .port = "6877";
    .first_byte_timeout     = 10s;   # How long to wait before we receive a first byte from our backend?
    .connect_timeout        = 5s;     # How long to wait for a backend connection?
    .between_bytes_timeout  = 30s;     # How long to wait between bytes received from our backend?
}

backend s6888 {
    .host = "127.0.0.1";
    .port = "6888";
    .first_byte_timeout     = 10s;   # How long to wait before we receive a first byte from our backend?
    .connect_timeout        = 5s;     # How long to wait for a backend connection?
    .between_bytes_timeout  = 30s;     # How long to wait between bytes received from our backend?
}

backend s6899 {
    .host = "127.0.0.1";
    .port = "6899";
    .first_byte_timeout     = 10s;   # How long to wait before we receive a first byte from our backend?
    .connect_timeout        = 5s;     # How long to wait for a backend connection?
    .between_bytes_timeout  = 30s;     # How long to wait between bytes received from our backend?
}


sub vcl_init {
    new fb = directors.round_robin();
    fb.add_backend(s6855);
    fb.add_backend(s6866);
    fb.add_backend(s6877);
    fb.add_backend(s6888);
    fb.add_backend(s6899);

}


sub vcl_recv {

    set req.grace = 120s;

    set req.backend_hint = fb.backend();

    if (req.url ~ "(\.ts)" ) {
    unset req.http.Range;
    }
    if (req.http.cookie) {
        unset req.http.cookie;
    }

    if (req.method != "GET" && req.method != "HEAD") {
    return (pipe);
    }

    if (req.method == "GET" && req.url ~ "(\.ts)"  ) {
        unset req.http.Accept-Encoding;
        return(hash);
    }
return(hash);
}

sub vcl_hash {
    hash_data(req.url);
    return(lookup);
}

sub vcl_backend_response {
    set beresp.grace = 2m; 
    set beresp.ttl = 120s;
    set beresp.do_gunzip = false;
    set beresp.do_gzip = false;

    if (bereq.url ~ "(\.ts)") {
    set beresp.ttl = 60s;
    set beresp.http.X-Cacheable = "YES";
    }

                else    {
    set beresp.ttl = 10m;
    set beresp.http.X-Cacheable = "NO";
    }

    if ( beresp.status == 404 ) {
    set beresp.ttl = 5m;
    }
 
    return(deliver);
}


sub vcl_hit {
    if (obj.ttl == 0s) {
    return(pass);
    }

    return(deliver);
}

sub vcl_miss {
}

sub vcl_deliver {
    set resp.http.X-Served-By = "For Test";

    if (obj.hits > 0) {
    set resp.http.X-Cache = "HIT";
    set resp.http.X-Cache-Hits = obj.hits;

    } else {
    set resp.http.X-Cache = "MISS";
    }



    if(resp.http.magicmarker) {
    unset resp.http.magicmarker; 
    set resp.http.Age="0";
    }

    unset resp.http.Via;
    unset resp.http.X-Varnish;

}

We are new to Varnish and do not know how to debug this, so any help would be appreciated.

Thanks

【Question discussion】:

Tags: varnish varnish-vcl

【Solution 1】:

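The problem you are running into is not merely a shortage of object storage: your largest HTTP response is bigger than the total size of the object storage.

This means that Varnish cannot free enough space through LRU eviction to fit the object in the cache.

Could not get storage is the error that is typically returned when this happens.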
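
To get a feel for the sizes involved, here is a rough back-of-the-envelope sketch. The 256 MiB store and the 5 Mbit/s bitrate are illustrative assumptions, not values taken from the question. It also assumes that a never-ending HTTP response is stored as one object that keeps growing while the fetch is in progress.

```shell
#!/bin/sh
# Rough estimate: how many seconds until a single never-ending response
# has consumed the whole cache storage?
# Both arguments are illustrative assumptions, not measured values.
seconds_until_full() {
    storage_bytes=$1      # total size of the object storage, in bytes
    bitrate_bps=$2        # stream bitrate, in bits per second
    echo $(( storage_bytes / (bitrate_bps / 8) ))
}

# Hypothetical example: 256 MiB of storage, one 5 Mbit/s stream.
seconds_until_full $(( 256 * 1024 * 1024 )) 5000000
```

Under these assumptions a single stream would fill the store in roughly seven minutes, and with many concurrent streams the time shrinks accordingly.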

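Checking the sizes

It is important to find out how big your cache is, and how big the failing objects are.

Your varnishd runtime settings tell you how big your object storage is: the -s malloc,<size> argument contains this value.

You can also use varnishstat to check the size and usage of the in-memory cache and the transient storage:

varnishstat -f SMA.*.g* -f MAIN.n_lru_nuked

This command also includes the MAIN.n_lru_nuked counter, which indicates how many objects Varnish has forcibly removed from the cache to make room for new ones.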
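
If you prefer a single number over the interactive view, the following sketch turns the g_bytes and g_space counters into a usage percentage. It assumes the one-line-per-counter layout of `varnishstat -1` (name, value, rate, description) and a storage backend named s0; the function name storage_usage_percent is made up for this example.

```shell
#!/bin/sh
# Compute how full one storage backend is, from `varnishstat -1` output on stdin.
# Assumes one counter per line: name, value, rate, description.
# The storage name defaults to "s0", which is an assumption about your setup.
storage_usage_percent() {
    awk -v name="${1:-s0}" '
        $1 == "SMA." name ".g_bytes" { used = $2 }   # bytes currently allocated
        $1 == "SMA." name ".g_space" { free = $2 }   # bytes still available
        END {
            if (used + free > 0) printf "%d\n", used * 100 / (used + free)
        }'
}

# Usage on a live system (not run here):
#   varnishstat -1 -f 'SMA.*.g_bytes' -f 'SMA.*.g_space' | storage_usage_percent s0
```

A value that stays close to 100 while MAIN.n_lru_nuked keeps climbing suggests the store is too small for the working set.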

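Fixing the problem

The easiest way to fix this is to assign more memory to Varnish through -s malloc,<size>. Don't forget to restart Varnish after changing these settings.

Afterwards, the following command will help you determine whether there is enough storage, and whether Varnish still needs to forcibly remove objects from the cache to free up space:

varnishstat -f SMA.*.g* -f MAIN.n_lru_nuked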

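A more sustainable plan

Another plan is to rely on the Massive Storage Engine (MSE). This is a storage engine that is part of Varnish Enterprise.

It combines memory and disk storage and is optimized for handling large volumes of data. It avoids fragmentation, and it is architected so that it does not suffer from the typical latency of disk access.

There are official machine images for AWS, Azure, and Google Cloud that let you experiment with this storage engine without buying a license up front.

A killer MSE feature is the memory governor. It is a mechanism that dynamically resizes the cache's memory storage based on what requests and responses need.

If storage memory runs short and thread handling does not need much memory, the memory governor automatically assigns more memory to the storage engine.

If you use MSE's persistence layer, you can host multiple terabytes of data on a single machine without running into these issues.

At Varnish Software, the company that builds Varnish Enterprise, we see MSE as the primary feature that OTT video streaming companies use to accelerate video delivery.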

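What if my assessment is completely wrong

Although the Could not get storage error usually appears when Varnish tries to store a large object in a cache that is too small, I could also be wrong.

In that case, I suggest running varnishlog and looking at the full trace of that specific transaction:

varnishlog -g request -q "ReqUrl eq '/my-url'"

This example gets all the details for requests to /my-url. Change it to the URL you want to monitor.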
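
When many transactions fail, it can also help to see which FetchError reasons dominate before reading individual traces. The sketch below tallies them from varnishlog output on stdin; the helper name fetch_error_summary is made up for this example, and it assumes that the reason text follows the FetchError tag on the same line, as in the excerpt from the question.

```shell
#!/bin/sh
# Count how often each FetchError reason occurs in varnishlog output read from stdin.
fetch_error_summary() {
    # Keep only FetchError lines, strip everything up to the reason text,
    # then count identical reasons and list the most frequent first.
    sed -n 's/^.*FetchError[[:space:]]*//p' | sort | uniq -c | sort -rn
}

# Usage on a live system (not run here):
#   varnishlog -d -g request -q 'FetchError' | fetch_error_summary
```

Applied to the excerpt in the question, this kind of tally would show Could not get storage as the dominant reason.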

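The output usually gives you a better understanding of how Varnish is behaving. If my initial assessment was wrong, it can help us figure out how to solve the problem.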

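【Discussion】:

Thank you very much for your answer, Thisj. I particularly suspect my timing settings, the TTLs and graces. I don't understand why my objects are not expiring from the cache and being refreshed in time. The file in this case is an MPEG-TS stream, an uninterrupted data transfer that grows and flows indefinitely, so Varnish should take a portion of it and serve it until its TTL expires, after which I think fresh data should be served. MSE is actually how we would solve this project, but unfortunately it is not an option for us at the moment. My systemd file: pastebin.com/tT3X50Cz . 32GB RAM / 32C CPU
@Talion I would like to see the value of the n_lru_limited counter in varnishstat at the time the errors occur. The errors point to Varnish failing to free up storage space.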
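
A minimal sketch of how that counter could be read while the errors are occurring. It assumes the one-line-per-counter layout of `varnishstat -1`; the helper names counter_value and lru_verdict are made up for this example, and the wording of the verdict is only an interpretation of the counter.

```shell
#!/bin/sh
# Read `varnishstat -1` output on stdin and print the value of one counter.
counter_value() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Report whether the LRU nuke limit was ever hit, given the counter value.
lru_verdict() {
    if [ "${1:-0}" -gt 0 ]; then
        echo "n_lru_limited=$1: eviction gave up before enough space was freed"
    else
        echo "n_lru_limited=0: the nuke limit was not reached"
    fi
}

# Usage on a live system (not run here):
#   v=$(varnishstat -1 -f MAIN.n_lru_limited | counter_value MAIN.n_lru_limited)
#   lru_verdict "$v"
```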