【Question title】: How to list all AWS S3 objects in a bucket using Java
【Posted】: 2011-12-23 01:24:19
【Question】:

What is the simplest way to get a list of all items in an S3 bucket using Java?

List<S3ObjectSummary> s3objects = s3.listObjects(bucketName,prefix).getObjectSummaries();

This example returns only 1000 items.

【Question comments】:

Tags: java amazon-web-services amazon-s3


【Solution 1】:

This may be a workaround, but it solved my problem:

ObjectListing listing = s3.listObjects( bucketName, prefix );
List<S3ObjectSummary> summaries = listing.getObjectSummaries();

while (listing.isTruncated()) {
   listing = s3.listNextBatchOfObjects (listing);
   summaries.addAll (listing.getObjectSummaries());
}

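【Comments】:

  • That doesn't look like a workaround to me; it looks like the intended use of the API.
  • Someone suggested this edit to your answer, in case you are interested.
  • s3.listObjects has a default limit of 1000 elements per listing, so, as @JoachimSauer said, this is the intended use of the API.
  • It is a dangerous assumption that the List returned by getObjectSummaries() is mutable.
  • May I ask what the prefix is here?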
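On the mutability comment: a minimal sketch of accumulating the pages into a list you own, so the loop does not depend on what kind of List the SDK returns. To keep it runnable without the AWS SDK, `Page`, `firstPage` and `nextPage` are hypothetical stand-ins for ObjectListing, s3.listObjects and s3.listNextBatchOfObjects; they are not SDK types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CollectAllPages {

    // Hypothetical stand-in for ObjectListing: one page of keys plus a truncation flag.
    static final class Page {
        final List<String> keys;
        final boolean truncated;
        final int index;

        Page(List<String> keys, boolean truncated, int index) {
            // Deliberately unmodifiable, to model the case the comment warns about.
            this.keys = Collections.unmodifiableList(keys);
            this.truncated = truncated;
            this.index = index;
        }
    }

    // Hypothetical fake bucket: three pages, the last one not truncated.
    static final List<List<String>> DATA = Arrays.asList(
            Arrays.asList("a", "b"),
            Arrays.asList("c", "d"),
            Arrays.asList("e"));

    static Page firstPage() {
        return page(0);
    }

    static Page nextPage(Page previous) {
        return page(previous.index + 1);
    }

    private static Page page(int i) {
        return new Page(DATA.get(i), i < DATA.size() - 1, i);
    }

    static List<String> collectAll() {
        Page page = firstPage();
        // Copy into a list we own instead of calling addAll on the returned list.
        List<String> all = new ArrayList<>(page.keys);
        while (page.truncated) {
            page = nextPage(page);
            all.addAll(page.keys);
        }
        return all;
    }

    public static void main(String[] args) {
        System.out.println(collectAll()); // [a, b, c, d, e]
    }
}
```

With the real SDK, the corresponding change to the answer's loop would be to start from `new ArrayList<>(listing.getObjectSummaries())` rather than from the returned list itself.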
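【Solution 2】:

For those reading this in 2018 and beyond: two new pagination-free APIs are available, one in AWS SDK for Java 1.x and another in 2.x.

1.x

The Java SDK has a new API that lets you iterate over the objects in an S3 bucket without dealing with pagination: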

AmazonS3 s3 = AmazonS3ClientBuilder.standard().build();

S3Objects.inBucket(s3, "the-bucket").forEach((S3ObjectSummary objectSummary) -> {
    // TODO: Consume `objectSummary` the way you need
    System.out.println(objectSummary.getKey());
});

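The iteration is lazy:

The list of S3ObjectSummarys will be fetched lazily, a page at a time, as they are needed. The size of the page can be controlled with the withBatchSize(int) method.

2.x

The API changed, so here is the SDK 2.x version: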

S3Client client = S3Client.builder().region(Region.US_EAST_1).build();
ListObjectsV2Request request = ListObjectsV2Request.builder().bucket("the-bucket").prefix("the-prefix").build();
ListObjectsV2Iterable response = client.listObjectsV2Paginator(request);

for (ListObjectsV2Response page : response) {
    page.contents().forEach((S3Object object) -> {
        // TODO: Consume `object` the way you need
        System.out.println(object.key());
    });
}

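ListObjectsV2Iterable is lazy as well:

When the operation is called, an instance of this class is returned. At this point, no service calls are made yet and so there is no guarantee that the request is valid. As you iterate through the iterable, SDK will start lazily loading response pages by making service calls until there are no pages left or your iteration stops. If there are errors in your request, you will see the failures only after you start iterating through the iterable.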
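The lazy behaviour described in that quote can be illustrated without the SDK. The following is only a sketch of the idea, not how the SDK implements it: `LazyPages`, `fetch` and `fetchCount` are hypothetical names, and the in-memory list of pages stands in for the remote service.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class LazyPages implements Iterable<String> {

    // Hypothetical fake service data; each inner list is one response page.
    private final List<List<String>> pages;
    // Counts simulated service calls so the laziness is observable.
    int fetchCount = 0;

    LazyPages(List<List<String>> pages) {
        this.pages = pages;
    }

    // Simulated service call: returns page i.
    private List<String> fetch(int i) {
        fetchCount++;
        return pages.get(i);
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int nextPage = 0;
            private Iterator<String> current = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Fetch a further page only when the current one is exhausted.
                while (!current.hasNext() && nextPage < pages.size()) {
                    current = fetch(nextPage++).iterator();
                }
                return current.hasNext();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }

    public static void main(String[] args) {
        LazyPages keys = new LazyPages(Arrays.asList(
                Arrays.asList("a", "b"), Arrays.asList("c")));
        Iterator<String> it = keys.iterator();
        System.out.println(keys.fetchCount); // 0: nothing fetched yet
        it.next();
        System.out.println(keys.fetchCount); // 1: first page fetched on demand
        it.next();
        it.next();
        System.out.println(keys.fetchCount); // 2: second page fetched only when reached
    }
}
```

The practical consequence is the one the quote points out: a bad request would only surface once iteration starts, so error handling belongs around the loop rather than around the call that creates the iterable.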

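【Comments】:

  • Great answer, it helped a lot, but I'd like to ask for more detail. I want to iterate page by page, like Spring's Pageable: for example, request the first 20 objects and then, if needed, request the second page with the next 20. Is that possible?
【Solution 3】:

This is straight from the AWS documentation: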

AmazonS3 s3client = new AmazonS3Client(new ProfileCredentialsProvider());        

ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
    .withBucketName(bucketName)
    .withPrefix("m");
ObjectListing objectListing;

do {
        objectListing = s3client.listObjects(listObjectsRequest);
        for (S3ObjectSummary objectSummary : 
            objectListing.getObjectSummaries()) {
            System.out.println( " - " + objectSummary.getKey() + "  " +
                    "(size = " + objectSummary.getSize() + 
                    ")");
        }
        listObjectsRequest.setMarker(objectListing.getNextMarker());
} while (objectListing.isTruncated());

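【Comments】:

    【Solution 4】:

    I am processing a large collection of objects generated by our system; we changed the format of the stored data and needed to check each file, determine which ones were in the old format, and convert them. There are other ways to do this, but this one relates to your question.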

        ObjectListing list = amazonS3Client.listObjects(contentBucketName, contentKeyPrefix);

        while (true) {
            List<S3ObjectSummary> summaries = list.getObjectSummaries();

            for (S3ObjectSummary summary : summaries) {
                String summaryKey = summary.getKey();

                /* Retrieve object */

                /* Process it */
            }

            // Stop only after the current page has been processed; checking
            // isTruncated() on a freshly fetched page would skip the last page.
            if (!list.isTruncated()) {
                break;
            }
            list = amazonS3Client.listNextBatchOfObjects(list);
        }
    

    【Comments】:

      【Solution 5】:

      Listing Keys Using the AWS SDK for Java

      http://docs.aws.amazon.com/AmazonS3/latest/dev/ListingObjectKeysUsingJava.html

      import java.io.IOException;
      import com.amazonaws.AmazonClientException;
      import com.amazonaws.AmazonServiceException;
      import com.amazonaws.auth.profile.ProfileCredentialsProvider;
      import com.amazonaws.services.s3.AmazonS3;
      import com.amazonaws.services.s3.AmazonS3Client;
      import com.amazonaws.services.s3.model.ListObjectsRequest;
      import com.amazonaws.services.s3.model.ListObjectsV2Request;
      import com.amazonaws.services.s3.model.ListObjectsV2Result;
      import com.amazonaws.services.s3.model.ObjectListing;
      import com.amazonaws.services.s3.model.S3ObjectSummary;
      
      public class ListKeys {
          private static String bucketName = "***bucket name***";
      
          public static void main(String[] args) throws IOException {
              AmazonS3 s3client = new AmazonS3Client(new ProfileCredentialsProvider());
              try {
                  System.out.println("Listing objects");
                  final ListObjectsV2Request req = new ListObjectsV2Request().withBucketName(bucketName);
                  ListObjectsV2Result result;
                  do {               
                     result = s3client.listObjectsV2(req);
      
                     for (S3ObjectSummary objectSummary : 
                         result.getObjectSummaries()) {
                         System.out.println(" - " + objectSummary.getKey() + "  " +
                                 "(size = " + objectSummary.getSize() + 
                                 ")");
                     }
                     System.out.println("Next Continuation Token : " + result.getNextContinuationToken());
                     req.setContinuationToken(result.getNextContinuationToken());
                  } while(result.isTruncated() == true ); 
      
               } catch (AmazonServiceException ase) {
                  System.out.println("Caught an AmazonServiceException, " +
                          "which means your request made it " +
                          "to Amazon S3, but was rejected with an error response " +
                          "for some reason.");
                  System.out.println("Error Message:    " + ase.getMessage());
                  System.out.println("HTTP Status Code: " + ase.getStatusCode());
                  System.out.println("AWS Error Code:   " + ase.getErrorCode());
                  System.out.println("Error Type:       " + ase.getErrorType());
                  System.out.println("Request ID:       " + ase.getRequestId());
              } catch (AmazonClientException ace) {
                  System.out.println("Caught an AmazonClientException, " +
                          "which means the client encountered " +
                          "an internal error while trying to communicate" +
                          " with S3, " +
                          "such as not being able to access the network.");
                  System.out.println("Error Message: " + ace.getMessage());
              }
          }
      }
      

      【Comments】:

        【Solution 6】:

        A more concise way to list S3 objects when the listing may be truncated:

        ListObjectsRequest request = new ListObjectsRequest().withBucketName(bucketName);
        ObjectListing listing = null;
        
        while((listing == null) || (request.getMarker() != null)) {
          listing = s3Client.listObjects(request);
          // do stuff with listing
          request.setMarker(listing.getNextMarker());
        }
        

        【Comments】:

          【Solution 7】:

          Gray, your solution is odd, but you seem like a nice guy.

          AmazonS3Client s3Client = new AmazonS3Client(new BasicAWSCredentials( ....
          
          ObjectListing images = s3Client.listObjects(bucketName); 
          
          List<S3ObjectSummary> list = images.getObjectSummaries();
          for(S3ObjectSummary image: list) {
              S3Object obj = s3Client.getObject(bucketName, image.getKey());
              writeToFile(obj.getObjectContent());
          }
          

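          【Comments】:

          • As far as I can tell, this solution only fetches the first 1000 keys/files and prints them; it does not iterate over any further files.
          【Solution 8】:

          I know this is an old post, but it may still be useful to someone: version 2.1 of the Java/Android SDK provides a method called setMaxKeys. Like this: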

          s3objects.setMaxKeys(arg0)
          

          You have probably found a solution by now, but please mark an answer as accepted so that it helps others in the future.
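A note on what setMaxKeys can and cannot do, with a sketch. As far as I know, in the 1.x SDK it is a setter on the request object (for example ListObjectsRequest) rather than on the returned list, and it only caps the number of keys per response; S3 returns at most 1000 keys per response, so a larger value does not remove the need to page. The simulation below uses no SDK types: `listPage` and `responsesNeeded` are hypothetical helpers that model that cap.

```java
import java.util.ArrayList;
import java.util.List;

public class MaxKeysDemo {

    static final int SERVICE_CAP = 1000; // S3 returns at most 1000 keys per response

    // Hypothetical fake service: returns object indexes [start, end) for one
    // response, honouring maxKeys but never exceeding the service-side cap.
    static List<Integer> listPage(int totalObjects, int start, int maxKeys) {
        if (maxKeys <= 0) {
            throw new IllegalArgumentException("maxKeys must be positive");
        }
        int pageSize = Math.min(maxKeys, SERVICE_CAP);
        int end = Math.min(totalObjects, start + pageSize);
        List<Integer> page = new ArrayList<>();
        for (int i = start; i < end; i++) {
            page.add(i);
        }
        return page;
    }

    // Counts how many responses are needed to see every object.
    static int responsesNeeded(int totalObjects, int maxKeys) {
        int seen = 0;
        int responses = 0;
        do {
            seen += listPage(totalObjects, seen, maxKeys).size();
            responses++;
        } while (seen < totalObjects);
        return responses;
    }

    public static void main(String[] args) {
        System.out.println(responsesNeeded(2500, 10));   // 250
        System.out.println(responsesNeeded(2500, 1000)); // 3
        System.out.println(responsesNeeded(2500, 5000)); // 3: the cap still applies
    }
}
```

So setMaxKeys is useful for choosing a smaller page size, but listing a whole bucket still needs one of the paging loops from the other answers.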

          【Comments】:

            【Solution 9】:

            This worked for me.

            Thread thread = new Thread(new Runnable(){
                @Override
                public void run() {
                    try {
                        List<String> listing = getObjectNamesForBucket(bucket, s3Client);
                        Log.e(TAG, "listing "+ listing);
            
                    }
                    catch (Exception e) {
                        e.printStackTrace();
                        Log.e(TAG, "Exception found while listing "+ e);
                    }
                }
            });
            
            thread.start();
            
            
            
              private List<String> getObjectNamesForBucket(String bucket, AmazonS3 s3Client) {
                    ObjectListing objects=s3Client.listObjects(bucket);
                    List<String> objectNames=new ArrayList<String>(objects.getObjectSummaries().size());
                    Iterator<S3ObjectSummary> oIter=objects.getObjectSummaries().iterator();
                    while (oIter.hasNext()) {
                        objectNames.add(oIter.next().getKey());
                    }
                    while (objects.isTruncated()) {
                        objects=s3Client.listNextBatchOfObjects(objects);
                        oIter=objects.getObjectSummaries().iterator();
                        while (oIter.hasNext()) {
                            objectNames.add(oIter.next().getKey());
                        }
                    }
                    return objectNames;
            }
            

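            【Comments】:

              【Solution 10】:

              You don't want to list all 1000 objects in the bucket at once. A more robust solution is to fetch at most 10 objects at a time. You can do that with the withMaxKeys method.

              The following code creates an S3 client, fetches 10 or fewer objects at a time, filters them by prefix, and generates a pre-signed URL for each fetched object: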

              import com.amazonaws.HttpMethod;
              import com.amazonaws.SdkClientException;
              import com.amazonaws.auth.AWSStaticCredentialsProvider;
              import com.amazonaws.auth.BasicAWSCredentials;
              import com.amazonaws.regions.Regions;
              import com.amazonaws.services.s3.AmazonS3;
              import com.amazonaws.services.s3.AmazonS3ClientBuilder;
              import com.amazonaws.services.s3.model.*;
              
              import java.net.URL;
              import java.util.Date;
              
              /**
               * @author shabab
               * @since 21 Sep, 2020
               */
              public class AwsMain {
              
                  static final String ACCESS_KEY = "";
                  static final String SECRET = "";
                  static final Regions BUCKET_REGION = Regions.DEFAULT_REGION;
                  static final String BUCKET_NAME = "";
              
                  public static void main(String[] args) {
                      BasicAWSCredentials awsCreds = new BasicAWSCredentials(ACCESS_KEY, SECRET);
              
                      try {
                          final AmazonS3 s3Client = AmazonS3ClientBuilder
                                  .standard()
                                  .withRegion(BUCKET_REGION)
                                  .withCredentials(new AWSStaticCredentialsProvider(awsCreds))
                                  .build();
              
                          ListObjectsV2Request req = new ListObjectsV2Request().withBucketName(BUCKET_NAME).withMaxKeys(10);
                          ListObjectsV2Result result;
              
                          do {
                              result = s3Client.listObjectsV2(req);
              
                              result.getObjectSummaries()
                                      .stream()
                                      .filter(s3ObjectSummary -> {
                                          return s3ObjectSummary.getKey().contains("Market-subscriptions/")
                                                  && !s3ObjectSummary.getKey().equals("Market-subscriptions/");
                                      })
                                      .forEach(s3ObjectSummary -> {
              
                                          GeneratePresignedUrlRequest generatePresignedUrlRequest =
                                                  new GeneratePresignedUrlRequest(BUCKET_NAME, s3ObjectSummary.getKey())
                                                          .withMethod(HttpMethod.GET)
                                                          .withExpiration(getExpirationDate());
              
                                          URL url = s3Client.generatePresignedUrl(generatePresignedUrlRequest);
              
                                          System.out.println(s3ObjectSummary.getKey() + " Pre-Signed URL: " + url.toString());
                                      });
              
                              String token = result.getNextContinuationToken();
                              req.setContinuationToken(token);
              
                          } while (result.isTruncated());
                      } catch (SdkClientException e) {
                          e.printStackTrace();
                      }
              
                  }
              
                  private static Date getExpirationDate() {
                      Date expiration = new java.util.Date();
                      long expTimeMillis = expiration.getTime();
                      expTimeMillis += 1000 * 60 * 60;
                      expiration.setTime(expTimeMillis);
              
                      return expiration;
                  }
              }
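The getExpirationDate helper builds the one-hour expiry by mutating a Date. One possible java.time variant, as a sketch only: `Expiry` and `expiresIn` are illustrative names that are not part of the answer, and the result is still a java.util.Date because that is what the answer's code passes to withExpiration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Date;

public class Expiry {

    // Returns a Date that lies `ttl` after `now`.
    static Date expiresIn(Instant now, Duration ttl) {
        return Date.from(now.plus(ttl));
    }

    public static void main(String[] args) {
        // One hour from now, matching the 1000 * 60 * 60 ms used in the answer.
        Date expiration = expiresIn(Instant.now(), Duration.ofHours(1));
        System.out.println(expiration);
    }
}
```

Passing the clock value in as a parameter keeps the helper easy to test with a fixed Instant.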
              

              【Comments】:

                【Solution 11】:

                Try this:

                public void getObjectList(){
                        System.out.println("Listing objects");
                        ObjectListing objectListing = s3.listObjects(new ListObjectsRequest()
                                .withBucketName(bucketName)
                                .withPrefix("ads"));
                        for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
                            System.out.println(" - " + objectSummary.getKey() + "  " +
                                               "(size = " + objectSummary.getSize() + ")");
                        }
                    }
                

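                This gets all objects in the bucket that have a specific prefix.

                【Comments】:

                • No, it doesn't; there is a 1000-file limit. Didn't you read the answers above? Your solution has the same problem.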