Validating Data Uploaded To Amazon S3 In Node.js

DECEMBER 18th, 2017

This post uses a number of libraries:

Axios - a promise based HTTP client
Crypto - to create an MD5 hash
File System - to work with the file system on a local machine
Funky - a helper library that provides a function which takes a node-style callback and returns a promise

The application that I'm writing test automation for contains media assets like images, video and audio. These assets are uploaded to Amazon S3 cloud storage when one of our users adds media to their projects. Users can then send duplicate copies of their projects containing these assets to other users.

I needed a way to validate that after the assets were uploaded, the user's project pointed to the proper assets in S3. In addition, I needed a way to validate that when a user sent a copy of their projects to someone else that the original project and the duplicate project each pointed to their own copies of the assets.

The first step I needed to accomplish was to validate that the file on the user's machine and the file uploaded to S3 were identical.

When data is uploaded to S3, an MD5 hash is calculated from the file and a URL is generated so the app can point to that asset in the cloud. To validate that the uploaded asset and the asset on the user's local machine are the same, I first calculated the MD5 hash of the local file, then compared it to the hash of the version uploaded to S3. They should be the same.

To generate the MD5 hash of the local file I pulled in Node's Crypto module, then wrote a function that accepts a string and returns an MD5 hash. I yanked this function from Tom's Blog on calculating checksums in Node.js.

const crypto = require('crypto')

const MD5Hash = (str) =>
  crypto
    .createHash('MD5')
    .update(str, 'utf8')
    .digest('hex')

Next I used the readFile function in Node's File System module to read the contents of the file. Instead of using the function as is, though, I passed it into the promisify function from Articulate's funky library, as I am much more comfortable with the .then() syntax of promises than I am with callbacks.

const fs = require('fs') // Node's File System
const { promisify, } = require('@articulate/funky') // Articulate's promisify function
const readFileP = promisify(fs.readFile, fs) // Creates a promisified version of fs.readFile

Now that I've got a version of File System's readFile function, readFileP, set up to return a promise, I can use it to read the local file and pass its contents into MD5Hash. The resulting promise resolves to the MD5 hash, which is stored in a variable to be used in our comparison.

const localFileMD5 = readFileP('/some-file.mp4')
  .then(MD5Hash)

After uploading the file to S3, we can make an HTTP HEAD request to the file's URL and pull the MD5 hash created by S3 off the response headers. The ETag (entity tag) header is the hash of the uploaded file. Note that the ETag value arrives wrapped in double quotes, so those need to be stripped before comparing.

const axios = require('axios')

const uploadedFileMD5 = await axios.head(uploadedFileUrl)
  .then(res => res.headers.etag.replace(/"/g, '')) // strip the surrounding quotes

Now we can compare localFileMD5 and uploadedFileMD5 to validate that they are identical.

In the final step I needed to validate the duplication of the asset when a user sends another user a copy of their project. This is essentially a repeat of the above with one addition: while the MD5 hash of the duplicate should still match that of the local file, the URL to the duplicated asset should be different.

Note: S3 supports multipart uploads. When multipart uploading is used, the calculation of the MD5 hash is more involved and this particular implementation will not work, as your local hash will not match the ETag from S3. Digging around, I found this blog post, which links to a bash script that splits your file up the same way Amazon does and returns the correct ETag hash.