Masking parts of a tensor in TensorFlow for data augmentation

【Question title】: Mask tensors parts in tensorflow, data augmentation
【Posted】: 2019-04-25 16:20:05
【Question】:

I am trying to implement the paper at https://arxiv.org/abs/1904.08779 to get better results in speech-to-text.
I am trying to implement it in the mozilla DeepSpeech repo, which uses the TensorFlow Dataset API to load the data.

dataset = (tf.data.Dataset.from_generator(generate_values,
                                          output_types=(tf.string, (tf.int64, tf.int32, tf.int64), tf.int64))
           .map(entry_to_features, num_parallel_calls=tf.data.experimental.AUTOTUNE)
           .cache(cache_path)
           .map(augment_spec, num_parallel_calls=tf.data.experimental.AUTOTUNE)
           .window(batch_size, drop_remainder=True).flat_map(batch_fn)
           .prefetch(num_gpus))

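The audio is converted to a spectrogram and the MFCCs are computed, so by the time the data reaches the augment_spec function it has shape (?, 26), where ? comes from the variable audio length. I want to mask parts of this image, and my idea is to multiply the features by a mask tensor of ones and zeros, using code like this: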

def augment_spec(features, features_len, transcript):
    # print("\n\n\n\n duration", duration.eval())
    sample_rate = 8000

    mask = np.ones_like(features)

    temp = tf.Variable(tf.ones_like(features))
    print(temp)

    time_len = features_len.shape[0]
    features_len = features_len

    n_time_masks = np.random.randint(0, 4)
    n_freq_masks = np.random.randint(0, 3)

    for _ in range(n_time_masks):
        time_delta = np.random.randint(int(sample_rate / 10), int(sample_rate / 2))
        time_start = np.random.randint(0, time_len - time_delta)
        print(time_start, time_delta)
        mask[time_start:time_start + time_delta] = 0

    for _ in range(n_freq_masks):
        freq_delta = np.random.randint(1, 4)
        freq_start = np.random.randint(0, features_len - freq_delta)
        print(freq_start, freq_delta)
        mask[:, freq_start:freq_start + freq_delta] = 0

    mask = tf.convert_to_tensor(mask, dtype=tf.float32)
    return tf.math.multiply(features, mask),  features_len, transcript

The problem is with these statements:

    mask = np.ones_like(features)  

    time_len = features_len.shape[0]

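These do not work because the tensor has no defined shape while the graph is being built, so I don't know how to implement this. Can you help me solve this? Thanks a lot!

Update: after @kempy's answer, my code now looks like this: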

def augment_spec(features, features_len, transcript):

    # print("\n\n\n\n duration", duration.eval())
    sample_rate = 8000

    mask = tf.Variable(tf.ones_like(features),validate_shape=False)

    time_len = tf.shape(features)[0]

    n_time_masks = np.random.randint(0, 4)
    n_freq_masks = np.random.randint(0, 3)
    # n_time_masks = tf.random.uniform(
    #         shape=(), minval=0, maxval=4, dtype=tf.int32)
    # n_freq_masks = tf.random.uniform(
    #         shape=(), minval=0, maxval=3, dtype=tf.int32)

    for _ in range(n_time_masks):

        time_delta = tf.random.uniform(
            shape=(), minval=int(sample_rate / 10), maxval=int(sample_rate / 2), dtype=tf.int32)
        time_start = tf.random.uniform(
            shape=(), minval=0, maxval=time_len-time_delta, dtype=tf.int32)

        # indexes = list(range(time_start,time_start+time_delta))
        indexes = tf.range(time_start, time_start+time_delta, delta=1, dtype=tf.int32, name='range')

        tf.scatter_update(mask, indexes, 0)

    mask = tf.transpose(mask,(1,0))
    for _ in range(n_freq_masks):
        # freq_delta = np.random.randint(1, 4)
        # freq_start = np.random.randint(0, features_len - freq_delta)

        freq_delta = tf.random.uniform(
            shape=(), minval=1, maxval=4, dtype=tf.int32)
        freq_start = tf.random.uniform(
            shape=(), minval=0, maxval=(features_len - freq_delta), dtype=tf.int32)


        # indexes = list(range(freq_start,freq_start+freq_delta))
        indexes = tf.range(freq_start, freq_start+freq_delta, delta=1, dtype=tf.int32, name='range')

        tf.scatter_update(mask, indexes, 0)


    mask = tf.transpose(mask,(1,0))
    mask = tf.convert_to_tensor(mask, dtype=tf.float32)
    masked = tf.multiply(features, mask)
    return masked,  features_len, transcript

But now I get this error:

ValueError: Tensor("Variable:0", dtype=float32_ref) must be from the same graph as Tensor("tower_0/Mean:0", shape=(), dtype=float32, device=/device:GPU:0).

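I don't know how to fix this. Thanks for your help.

【Comments】:

Which version of TF are you using? Are you running in eager mode or graph mode?
TF version 1.13, running in graph mode. The function has to run inside the dataset generator.

Tags: python tensorflow speech-recognition tensor data-augmentation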

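【Solution 1】:

Short answer

Use the tf versions of the functions instead of the np ones. tf.ones_like should work fine with an input of shape (?, 26), and you can get the shape of the features dynamically with tf.shape(features)[0]. Further down, you should use something like tf.random.uniform.

Long answer

When running TF in graph mode (the default in TF 1.X), you cannot have Python code depend on the output of a tensor, because the tensor has not been executed yet. You should therefore use TF ops rather than Python/NumPy code.

We can build a graph with a dynamic first dimension: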

import numpy as np
import tensorflow as tf

# Feature dimensions
unknown_size = 3
feature_dim = 26

tf.reset_default_graph()

# features_input has dynamic first dimension
features_input = tf.placeholder(tf.int32, shape=(None, feature_dim))

# ones_like should work fine with argument of shape (?, 26)
batched_ones = tf.ones_like(features_input)

# dynamically get the shape of the features_input
time_len = tf.shape(features_input)[0]
time_start = tf.random.uniform(
    shape=(), minval=0, maxval=time_len, dtype=tf.int32)

and print the following:

print('features_input.shape:')
print(features_input.shape)
print('batched_ones.shape:')
print(batched_ones.shape)
print('time_start.shape:')
print(time_start.shape)

The output we see is:

features_input.shape:
(?, 26)
batched_ones.shape:
(?, 26)
time_start.shape:
()

If we then execute the graph:

with tf.Session() as sess:
  # Create some input data
  features = np.arange(feature_dim)
  batched_features = np.tile(features, (unknown_size, 1))

  # Evaluate the tensors
  features_out, ones_out, time_start_out = sess.run(
      [features_input, batched_ones, time_start],
      feed_dict={features_input: batched_features})

and print the outputs:

# Print out what the output looks like
print('\nOutput:')
print('\nFeatures:')

print(features_out)
print('shape:', features_out.shape)

print('\nOnes:')
print(ones_out)
print('shape:', ones_out.shape)

print('\nRandom between 0 and unknown_size:')
print(time_start_out)
print('shape:', time_start_out.shape)

We can see that it works!

Output:

Features:
[[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
  24 25]
 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
  24 25]
 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
  24 25]]
shape: (3, 26)

Ones:
[[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]]
shape: (3, 26)

Random between 0 and unknown_size:
0
shape: ()
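
Going one step further toward the masking the question asks about, a run of zeros can be produced without assigning into a tensor: compare tf.range over the dynamic length against a start and a width. This is a minimal sketch, not code from DeepSpeech; span_mask and mask_time_span are hypothetical helper names, and the features are assumed to be a float tensor of shape (time, n_features).

```python
import tensorflow as tf


def span_mask(length, start, width, dtype=tf.float32):
    """Return a 1-D mask of `length` ones with zeros in [start, start + width)."""
    positions = tf.range(length)
    inside = tf.logical_and(positions >= start, positions < start + width)
    return tf.cast(tf.logical_not(inside), dtype)


def mask_time_span(features, start, width):
    """Zero out rows [start, start + width) of a (time, n_features) tensor."""
    # tf.shape gives the run-time length, even when the static shape is (?, 26).
    time_len = tf.shape(features)[0]
    mask = span_mask(time_len, start, width, features.dtype)
    # Shape (time, 1) broadcasts across the feature axis.
    return features * tf.expand_dims(mask, 1)
```

With a (5, 3) tensor of ones, mask_time_span(x, 1, 2) should zero rows 1 and 2 and leave the others unchanged. start and width may also be scalar int32 tensors, for example values drawn with tf.random.uniform.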

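【Comments】:

Doing it this way I can get time_len and time_start. The problem is that I cannot assign the value 0 into the tensor, because it is a TensorFlow tensor and that is not allowed.
You can use tf.scatter_update to update by index. You should generate the list of indices you want to update, then use tf.scatter_update to set them all to 0.
I have edited the question with the new code. Now I get a new error saying that the tensors come from different graphs.
It looks like the error is related to ops being added to different graphs. It references a tensor named tower_0/Mean, which looks like part of the library you are using. You should make sure that your code and the library code that adds ops to the graph run on the same graph, so make sure you are not calling tf.reset_default_graph() in between.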
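
One way to sidestep both tf.scatter_update and the graph error is to avoid tf.Variable altogether. A variable created inside a function passed to Dataset.map is a plausible source of the "must be from the same graph" message, although that is not confirmed in this thread. The sketch below builds the mask from plain ops only. It rests on several assumptions that are not in the original post: the MAX_* constants are illustrative, mask widths are counted in feature frames rather than audio samples, and random_span_mask is a hypothetical helper. Since Python-side calls such as np.random.randint run only once while the map function is traced, the number of masks is fixed here and each mask may have width 0, so the effective number of masks still varies per example.

```python
import tensorflow as tf

# Illustrative limits (assumptions, not values from the original post).
MAX_TIME_MASKS = 3
MAX_FREQ_MASKS = 2
MAX_TIME_WIDTH = 50   # in feature frames
MAX_FREQ_WIDTH = 3    # in feature channels


def random_span_mask(length, max_width, dtype):
    """1-D mask of `length` ones with one random, possibly empty, run of zeros."""
    width = tf.random.uniform((), minval=0, maxval=max_width + 1, dtype=tf.int32)
    width = tf.minimum(width, length)  # never mask more than the axis holds
    start = tf.random.uniform((), minval=0, maxval=length - width + 1, dtype=tf.int32)
    positions = tf.range(length)
    inside = tf.logical_and(positions >= start, positions < start + width)
    return tf.cast(tf.logical_not(inside), dtype)


def augment_spec(features, features_len, transcript):
    """Apply random time and frequency masks to (time, n_features) features."""
    shape = tf.shape(features)  # dynamic shape, valid for (?, 26) inputs
    time_len, n_features = shape[0], shape[1]

    time_mask = tf.ones([time_len], dtype=features.dtype)
    for _ in range(MAX_TIME_MASKS):
        time_mask *= random_span_mask(time_len, MAX_TIME_WIDTH, features.dtype)

    freq_mask = tf.ones([n_features], dtype=features.dtype)
    for _ in range(MAX_FREQ_MASKS):
        freq_mask *= random_span_mask(n_features, MAX_FREQ_WIDTH, features.dtype)

    # Outer product of the two 1-D masks gives the full (time, n_features) mask.
    mask = tf.expand_dims(time_mask, 1) * tf.expand_dims(freq_mask, 0)
    return features * mask, features_len, transcript
```

The function keeps the signature used in the question, so it should be usable in the same place in the pipeline, for example .map(augment_spec, num_parallel_calls=tf.data.experimental.AUTOTUNE).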