Iterating over files in AWS S3

The following example is written in plain JavaScript. It will work in AngularJS or any other JavaScript framework.

A year ago I wrote about reading from an S3 bucket here. That method worked but came with a shortfall: it made a single call to Amazon to import S3 files, and each call to Amazon only pulls 1,000 files at a time. To get more, you need to call again using the continuation marker provided in the previous response.

The issue I am addressing today is how to do this without causing the browser to freeze.

How not to iterate AWS S3 files

Looking at the previous example code, if you tried to run this example:

var params = {
    Bucket: "[YOUR-BUCKET]",
    Delimiter: "/",
    EncodingType: "url"
};

s3.listObjects(params, function (err, response) {
    if (err) { console.error(err); return; }
    var data = JSON.parse(JSON.stringify(response.Contents));
});

You would only get 1,000 records. If you then attempted to wrap the call in, say, a for loop or while loop, your browser would freeze up. I am not providing that code because it would waste both of our time to try it. It would also be bad practice to use a for loop, making perhaps ten calls to ensure you imported all 6,000 guesstimated records; you would have to make additional calls if you were unsure of the number of records, or if that number changed continuously. A while loop would be a better option.

Using a while loop, you could check for the marker (response.Marker, or response.NextMarker when a Delimiter is specified) in the data, writing a loop such as:

while(data.Marker != undefined){ /* do something */ }

However, the AWS calls would still freeze up the browser, and the result would be that your users received no records before their browsers crashed.

How do we call AWS for lots of records?

We will use callbacks. A callback is a function that is executed after another function has finished executing. This is the advantage over for and while loops: a callback lets the code wait for the response before executing the next step. That matters because the AWS calls are asynchronous operations, while for and while loops are synchronous, and that mismatch is what causes our problem. Here is an example of how we would call AWS S3 using callbacks to get all of our records.
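To make the pattern concrete before the full example, here is a minimal sketch of the callback idea. fetchPage is a hypothetical stand-in for an AWS call that fakes three "pages" of results; the next page is requested only from inside the previous page's callback, so nothing blocks:

```javascript
// Minimal sketch of the callback pattern. fetchPage is a hypothetical
// stand-in for s3.listObjects; here it fakes three "pages" of results.
function fetchPage(marker, callback) {
    var nextMarker = marker < 2 ? marker + 1 : undefined;
    callback({ items: ["item-" + marker], NextMarker: nextMarker });
}

var all = [];
function collect(marker) {
    fetchPage(marker, function (response) {
        // This body runs only after the "response" is available.
        all = all.concat(response.items);
        if (response.NextMarker !== undefined) {
            collect(response.NextMarker); // ask for the next page
        }
    });
}
collect(0);
console.log(all); // the three pages, gathered one callback at a time
```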

<script src="https://cdnjs.cloudflare.com/ajax/libs/angular.js/1.5.5/angular.min.js"></script>
<script src="https://sdk.amazonaws.com/js/aws-sdk-2.19.0.min.js"></script>
  
  <script>


    let FilterSet = [];
    let BigData = [];

    function init(callback) {
        let data = [];
        let rNextMarker = "";
        let busy = false; // guard so we never issue overlapping requests
        let accessKeyId = ""; //[-- your accessKeyId --]
        let secretAccessKey = ""; //[-- your secretAccessKey --]
        let region = ""; //[-- your region --]
        let bucket = ""; //[-- your bucket --]

        if (accessKeyId != "" && secretAccessKey != "") {
            AWS.config.update({ "accessKeyId": accessKeyId, "secretAccessKey": secretAccessKey, "region": region });
            let s3 = new AWS.S3();
            var out = setInterval(function () {
                if (rNextMarker != undefined) {
                    if (busy) { return; } // previous call still in flight
                    busy = true;
                    s3.listObjects({ Bucket: bucket, Delimiter: "/", EncodingType: "url", Marker: rNextMarker }
                        , function (err, response) {
                            busy = false;
                            if (err) { console.error(err); clearInterval(out); return; }
                            data = JSON.parse(JSON.stringify(response.Contents));
                            rNextMarker = response.NextMarker;
                            for (var a = 0; a < data.length; a++) {
                                BigData.push(data[a]);
                                FilterSet.push(data[a]);
                            }
                            callback(BigData);
                        });
                }
                else {
                    clearInterval(out); // no NextMarker: all records received
                }
            }, 500);
        }
    }
    init(function (x) {
        DrawDocList(BigData);
    });
    function DrawDocList(docList) {
        //write out the results
        console.log(docList);
    }
  </script>

Here is a codepen for you to play with.

Let’s walk through the code and go over what we’ve done here.

First we need a couple of libraries, loaded in the script tags above: AngularJS (optional here, since the example is plain JavaScript) and the Amazon SDK for JavaScript. Next, we will need variables for our records, access keys, region, and target bucket.

The example comprises two methods.

The first method, function init(callback) { /* code */ }, handles making the calls to AWS; it takes a callback to execute once complete. When this first method completes, the second method, DrawDocList, executes. This is where we would handle drawing out our records for the user, but for the example we will simply output them to the console.

We kick the entire thing off by calling our init() method, passing a callback [x] that in turn calls DrawDocList(). In this example, when the page is loaded, the code executes and calls are made to AWS S3 until each set of 1,000 is received. We know that no more records exist when response.NextMarker comes back undefined.
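As a variation on the setInterval approach, you can issue the next request from inside the previous request's callback, which removes the timer and any risk of overlapping calls. This is a sketch under the same assumptions as the example above (credentials configured, s3 client created); listAllObjects is a hypothetical helper name:

```javascript
// Hypothetical variation: chain the calls from inside the callback
// instead of polling on a timer. Each page is requested only after the
// previous response arrives; done() fires once NextMarker is undefined.
function listAllObjects(s3, bucket, done) {
    var all = [];
    function page(marker) {
        s3.listObjects(
            { Bucket: bucket, Delimiter: "/", EncodingType: "url", Marker: marker },
            function (err, response) {
                if (err) { return done(err, all); }
                all = all.concat(response.Contents);
                if (response.NextMarker !== undefined) {
                    page(response.NextMarker); // fetch the next 1,000 records
                } else {
                    done(null, all);           // no more pages
                }
            });
    }
    page("");
}
```

It would be used the same way as init(): listAllObjects(new AWS.S3(), bucket, function (err, all) { DrawDocList(all); });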

In order to run the example, you will need to get your own Amazon keys and set up a region and bucket; if you want to see it iterate over more than 1,000 records, the bucket will need to contain more than 1,000 objects. Hope this helps.

Happy Coding!
