【Question Title】: Delete All Azure Table Records
【Posted】: 2014-12-07 05:11:54
【Question Description】:

I have an Azure Storage table with 3k+ records.

What is the most efficient way to delete all the rows in the table?

【Discussion】:

    Tags: c# azure azure-storage azure-table-storage


    【Solution 1】:

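    For 3,000 records, the simplest approach is to delete the table. Note, however, that a deleted table is not removed immediately: it is marked for deletion and actually removed some time later. How long that takes depends on the load on the system and on the number of entities in the table. Until then you can neither recreate the table nor use it.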
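
If you delete the table and then need it back, the recreate call can keep failing until the service has finished the deletion. One way to cope is to retry with exponential backoff. The sketch below is SDK-independent and assumes nothing about your storage client: RetryWithBackoffAsync and its parameters are hypothetical names, and you supply the create call plus a predicate deciding which failures are worth retrying (with the classic SDK this is typically an HTTP 409 Conflict while the table is still being deleted; check what your SDK version actually reports).

```csharp
using System;
using System.Threading.Tasks;

public static class TableRecreate
{
    // Hypothetical helper: calls `attempt` until it succeeds, waiting between
    // tries and doubling the wait each time. Exceptions for which
    // `isTransient` returns false, and the failure on the last allowed try,
    // propagate to the caller. Returns the number of attempts made.
    public static async Task<int> RetryWithBackoffAsync(
        Func<Task> attempt,
        Func<Exception, bool> isTransient,
        int maxAttempts = 10,
        TimeSpan? initialDelay = null)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        TimeSpan delay = initialDelay ?? TimeSpan.FromSeconds(1);

        for (int attemptNumber = 1; ; attemptNumber++)
        {
            try
            {
                await attempt();
                return attemptNumber;
            }
            catch (Exception ex) when (attemptNumber < maxAttempts && isTransient(ex))
            {
                // Transient failure with attempts left: back off, then retry.
                await Task.Delay(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```

With the classic SDK, a call might look like await TableRecreate.RetryWithBackoffAsync(() => table.CreateIfNotExistsAsync(), ex => ex is StorageException); in practice you would narrow the predicate to the conflict status rather than retrying every storage error.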

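    If you need to keep using the table, the only other option is to delete the entities. To delete faster, look at deleting them with Entity Batch Transactions (a batch can hold at most 100 entities, all with the same PartitionKey). To delete entities, however, you first have to fetch them. You can speed up the fetch by retrieving only each entity's PartitionKey and RowKey properties instead of all properties, because those two are all a delete needs.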
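
Batch transactions impose two constraints that shape the delete loop: every operation in a batch must share the same PartitionKey, and a batch holds at most 100 operations. The SDK-independent sketch below shows only the grouping step; DeleteBatching, ToDeleteBatches and the (PartitionKey, RowKey) tuple shape are illustrative names, not part of any Azure SDK. Each inner list would then become one TableBatchOperation of TableOperation.Delete calls, built from entities fetched with only PartitionKey and RowKey projected.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeleteBatching
{
    // Maximum number of operations in one entity group transaction.
    public const int MaxBatchSize = 100;

    // Hypothetical helper: splits keys into batches that each contain a single
    // PartitionKey and no more than maxBatchSize entries. Partitions keep the
    // order in which they first appear in the input.
    public static List<List<(string PartitionKey, string RowKey)>> ToDeleteBatches(
        IEnumerable<(string PartitionKey, string RowKey)> keys,
        int maxBatchSize = MaxBatchSize)
    {
        if (maxBatchSize < 1) throw new ArgumentOutOfRangeException(nameof(maxBatchSize));
        var batches = new List<List<(string PartitionKey, string RowKey)>>();

        foreach (var partition in keys.GroupBy(k => k.PartitionKey))
        {
            var current = new List<(string PartitionKey, string RowKey)>();
            foreach (var key in partition)
            {
                current.Add(key);
                if (current.Count == maxBatchSize)
                {
                    // Batch is full: close it and start a new one.
                    batches.Add(current);
                    current = new List<(string PartitionKey, string RowKey)>();
                }
            }
            // Flush the partially filled last batch of this partition.
            if (current.Count > 0) batches.Add(current);
        }
        return batches;
    }
}
```

For example, 250 keys in partition "A" and 5 in partition "B" produce four batches of 100, 100, 50 and 5 entries.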

    【Discussion】:

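    • Hi Gaurav - do you know how long it takes for the table to actually be deleted?
    • stackoverflow.com/questions/15508517/… - that post says deleting a table takes at least 40 seconds, but also suggests it can take longer if the table is large :-/
    【Solution 2】: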

    I use something like this. We partition our keys by date; your case may differ:

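    // LINQPad-style entry point using the classic table SDK
    // (WindowsAzure.Storage; Microsoft.Azure.Cosmos.Table exposes the same types).
    async Task Main()
    {
        var startDate = new DateTime(2011, 1, 1);
        var endDate = new DateTime(2012, 1, 1);
    
        var account = CloudStorageAccount.Parse("connString");
        var client = account.CreateCloudTableClient();
        var table = client.GetTableReference("TableName");
    
        // One entry per month in [startDate, endDate).
        var dates = Enumerable.Range(0, Math.Abs((startDate.Month - endDate.Month) + 12 * (startDate.Year - endDate.Year)))
            .Select(offset => startDate.AddMonths(offset))
            .ToList();
    
        foreach (var date in dates)
        {
            // Must match the format used when the rows were written;
            // ToShortDateString() depends on the current culture.
            var key = date.ToShortDateString();
    
            var query = $"(PartitionKey eq '{key}')";
            var rangeQuery = new TableQuery<TableEntity>().Where(query);
    
            // Materialize once: the result of ExecuteQuery is lazy, so enumerating
            // it twice (Count() and then Select) may run the query twice.
            var result = table.ExecuteQuery<TableEntity>(rangeQuery).ToList();
            Console.WriteLine($"Deleting data from {date.ToShortDateString()}, key {key}, has {result.Count} records.");
    
            // One delete request per entity, all started concurrently.
            var allTasks = result.Select(async r =>
            {
                try
                {
                    await table.ExecuteAsync(TableOperation.Delete(r));
                }
                catch (Exception e) { Console.WriteLine($"{r.RowKey} - {e}"); }
            });
            await Task.WhenAll(allTasks);
        }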
    }
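
The loop above starts one delete request for every entity in the partition at the same time. For large partitions you may want to cap how many requests are in flight. Below is a minimal sketch using SemaphoreSlim; Throttled.ForEachAsync is a hypothetical helper name and the limit is yours to choose. On .NET 6 or later, Parallel.ForEachAsync with ParallelOptions.MaxDegreeOfParallelism serves the same purpose.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttled
{
    // Hypothetical helper: runs `action` for every item while allowing at most
    // `maxConcurrency` invocations to be in progress at the same time.
    // Completes when all items are processed; if any action throws, the
    // exception surfaces from the returned task after the rest have finished.
    public static async Task ForEachAsync<T>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task> action)
    {
        if (maxConcurrency < 1) throw new ArgumentOutOfRangeException(nameof(maxConcurrency));
        using var gate = new SemaphoreSlim(maxConcurrency);

        var tasks = items.Select(async item =>
        {
            await gate.WaitAsync();   // wait for a free slot
            try
            {
                await action(item);
            }
            finally
            {
                gate.Release();       // free the slot even if the action failed
            }
        }).ToList();

        await Task.WhenAll(tasks);
    }
}
```

In the loop above, await Throttled.ForEachAsync(result, 20, r => table.ExecuteAsync(TableOperation.Delete(r))); could take the place of the Select/WhenAll pair (20 is an arbitrary example value); keep the per-row try/catch inside the lambda if you still want failures logged per entity.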
    

    【Discussion】:

      【Solution 3】:

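      It depends on your data structure, but if you can write a query that returns all the records, you can add each one to a TableBatchOperation and execute them in one go. Keep in mind that a batch can contain at most 100 operations, all with the same PartitionKey, so this only works as-is for small single-partition sets.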

      Here is an example, adapted from How to get started with Azure Table storage and Visual Studio connected services, that only works when all results share the same partition key:

      // query all rows (only the first segment, up to 1,000 entities, is returned)
      CloudTable table = tableClient.GetTableReference("myTableName");
      var query = new TableQuery<MyTableEntity>();
      var result = await table.ExecuteQuerySegmentedAsync(query, null);
      
      // Create the batch operation.
      // A batch may hold at most 100 operations, all with the same PartitionKey.
      TableBatchOperation batchDeleteOperation = new TableBatchOperation();
      
      foreach (var row in result)
      {
          batchDeleteOperation.Delete(row);
      }
      
      // Execute the batch operation.
      await table.ExecuteBatchAsync(batchDeleteOperation);
      

      【Discussion】:

      • I use something similar to KyleMit's answer, but a TableBatchOperation can contain at most 100 items, so inside the foreach loop I check the count of batchDeleteOperation and call ExecuteBatchAsync for every batch of 100 items.
      【Solution 4】:

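      I use the code below: it first puts all the partition keys into a queue, then loops over the keys and deletes all rows of each partition in batches of 100.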

      // myTable is an existing CloudTable reference.
      var queue = new Queue<string>();
      queue.Enqueue("PartitionKeyTodelete1");
      queue.Enqueue("PartitionKeyTodelete2");
      queue.Enqueue("PartitionKeyTodelete3");
      
      while (queue.Count > 0)
      {
          string partitionToDelete = queue.Dequeue();
      
          // Fetch only PartitionKey and RowKey; that is all a delete needs.
          TableQuery<TableEntity> deleteQuery = new TableQuery<TableEntity>()
              .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionToDelete))
              .Select(new string[] { "PartitionKey", "RowKey" });
      
          TableContinuationToken continuationToken = null;
      
          do
          {
              var tableQueryResult = await myTable.ExecuteQuerySegmentedAsync(deleteQuery, continuationToken);
      
              continuationToken = tableQueryResult.ContinuationToken;
      
              // Split into chunks of 100 for batching
              List<List<TableEntity>> rowsChunked = tableQueryResult.Select((x, index) => new { Index = index, Value = x })
                  .Where(x => x.Value != null)
                  .GroupBy(x => x.Index / 100)
                  .Select(x => x.Select(v => v.Value).ToList())
                  .ToList();
      
              // Delete each chunk of 100 in a batch (all rows share one partition key)
              foreach (List<TableEntity> rows in rowsChunked)
              {
                  TableBatchOperation tableBatchOperation = new TableBatchOperation();
                  rows.ForEach(x => tableBatchOperation.Add(TableOperation.Delete(x)));
      
                  await myTable.ExecuteBatchAsync(tableBatchOperation);
              }
          }
          while (continuationToken != null);
      }
      

      【Discussion】:

        【Solution 5】:

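        For anyone who finds this later: the problem with the accepted answer, "just delete the table", is that although it works fine in the storage emulator, it fails randomly in production. If your app or service needs to regenerate the table regularly, you will see failures caused by conflicts or by a deletion that is still in progress.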

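        Instead, I found that the fastest and most error-proof, EF-friendly approach is to delete all rows with a segmented query. Below is a simple drop-in example I'm using. Pass in the table name, your client, and a type that implements ITableEntity.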

        private async Task DeleteAllRows<T>(string table, CloudTableClient client) where T : ITableEntity, new()
        {
            // query all rows
            CloudTable tableref = client.GetTableReference(table);
            var query = new TableQuery<T>();
            TableContinuationToken token = null;

            do
            {
                var result = await tableref.ExecuteQuerySegmentedAsync(query, token);
                foreach (var row in result)
                {
                    var op = TableOperation.Delete(row);
                    // Await each delete so failures surface and the method
                    // does not return before the rows are actually gone.
                    await tableref.ExecuteAsync(op);
                }
                token = result.ContinuationToken;
            } while (token != null);
        }
        

        Example usage:

        var table = client.GetTableReference("TodayPerformanceSnapshot");
        bool created = await table.CreateIfNotExistsAsync();
        
        if (!created)
        {
            // not created, table already existed, delete all content
            await DeleteAllRows<TodayPerformanceContainer>("TodayPerformanceSnapshot", client);
            log.Information("Azure Table:{Table} Purged", table.Name);
        }
        

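        The batch approach takes more effort, because you have to handle both the "only one partition key per batch" and the "at most 100 rows per batch" constraints. The following version of DeleteAllRows does this.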

        private async Task DeleteAllRows<T>(string table, CloudTableClient client) where T : ITableEntity, new()
        {
            // query all rows
            CloudTable tableref = client.GetTableReference(table);
            var query = new TableQuery<T>();
            TableContinuationToken token = null;
            TableBatchOperation batchops = new TableBatchOperation();
            // Pending delete operations grouped by partition key,
            // because a batch may only contain one partition key.
            Dictionary<string, Stack<TableOperation>> pendingOperations = new Dictionary<string, Stack<TableOperation>>();

            do
            {
                var result = await tableref.ExecuteQuerySegmentedAsync(query, token);
                foreach (var row in result)
                {
                    var op = TableOperation.Delete(row);
                    if (!pendingOperations.TryGetValue(row.PartitionKey, out var stack))
                    {
                        stack = new Stack<TableOperation>();
                        pendingOperations.Add(row.PartitionKey, stack);
                    }
                    stack.Push(op);
                }
                token = result.ContinuationToken;
            } while (token != null);

            // Process one partition key at a time (log is a logger field, e.g. Serilog)
            foreach (var key in pendingOperations.Keys)
            {
                log.Information($"Deleting:{key}");
                var rowStack = pendingOperations[key];
                int max = 100;
                int current = 0;

                while (rowStack.Count != 0)
                {
                    // take up to 100 operations (the batch size limit)
                    while (current < max && rowStack.Count > 0)
                    {
                        var op = rowStack.Pop();
                        batchops.Add(op);
                        current++;
                    }

                    // execute and reset
                    _ = await tableref.ExecuteBatchAsync(batchops);
                    log.Information($"Deleted batch of size:{batchops.Count}");
                    current = 0;
                    batchops.Clear();
                }
            }
        }
        

        【Discussion】:

          【Solution 6】:

          I recently wrote a library that can do this.

          Source/documentation: https://github.com/pflajszer/AzureTablesLifecycleManager

          For your use case, the code would look like this:

          // inject ITableManager in the constructor:
          
          private readonly ITableManager _api;
          
          public MyClass(ITableManager api)
          {
              _api = api;
          }
          
          /// <summary>
          /// Delete all data from a single table
          /// </summary>
          /// <typeparam name="T"></typeparam>
          /// <param name="tableName"></param>
          /// <returns></returns>
          public Task<DataTransferResponse<T>> DeleteTableDataAsync<T>(string tableName) where T : class, ITableEntity, new()
          {
              // this query will return a single table with a given name:
              Expression<Func<TableItem, bool>> tableQuery = x => x.Name == tableName;
          
              // this query will return all the data from the table:
              Expression<Func<T, bool>> dataQuery = x => true;
                       
              // ... but you can use LINQ to filter results too, like:
              // Expression<Func<T, bool>> anotherExampleOfdataQuery = x => x.Timestamp < DateTime.Now.AddYears(-1);
          
              return _api.DeleteDataFromTablesAsync<T>(tableQuery, dataQuery);
          }
          

          ...or, as Gaurav Mantri suggested, you can delete the table itself:

          /// <summary>
          /// Delete a single table
          /// </summary>
          /// <param name="tableName"></param>
          /// <returns></returns>
          public Task<DataTransferResponse<TableItem>> DeleteTableAsync(string tableName)
          {
              // this query will return a single table with a given name:
              Expression<Func<TableItem, bool>> tableQuery = x => x.Name == tableName;
          
              return _api.DeleteTablesAsync(tableQuery);
          }
          

          【Discussion】:

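          • Wow, I've been waiting a year for a tool like this! Deleting data from tables based on LINQ, and deleting tables, are much-needed features I've been missing.
          • Thanks for the kind words @shelbaz. I'm glad you find it useful. Feel free to flag any issues you run into!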