
Optimising performance for Amazon S3

  • Writer: Rebecca Boardman
  • Dec 14, 2020
  • 2 min read

Updated: Jan 25, 2021

What is Amazon S3? Amazon S3 (Amazon Simple Storage Service) is an object storage service that is great for storing static data. It offers pay-as-you-go pricing with unlimited storage, so it scales with your business needs.

S3 performance: what can we do to scale it up?

It’s worth noting that S3 automatically scales to high request rates, and latency will typically be 100–200 ms to first byte. Your application can achieve at least 3,500 PUT/COPY/POST/DELETE requests and 5,500 GET/HEAD requests per second per prefix in a bucket.

Prefixes

Prefixes can be used on files to scale out performance, because the request limits above apply per prefix, which lets you parallelise reads. For example, you can achieve 5,500 reads per second per prefix, so with 10 prefixes you could scale out to 55,000 reads per second! There is no limit on the number of prefixes you can have within a bucket. (A sketch of parallel reads across prefixes follows the examples below.)

What is a prefix?

The easiest way to explain what a prefix is, is just to show you. Below are example paths to an S3 file, with the prefix pulled out of each:

bucket/folder1/sub1/file.jpg => folder1/sub1/

bucket/folder1/sub2/file.jpg => folder1/sub2/
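
To make the parallel-read idea concrete, here is a minimal sketch using boto3 and a thread pool; the bucket name and keys are hypothetical. The point is simply that GETs against different prefixes can be issued concurrently, with each prefix drawing on its own per-prefix request limit.

import boto3
from concurrent.futures import ThreadPoolExecutor

s3 = boto3.client("s3")
BUCKET = "my-example-bucket"  # hypothetical bucket name

# Hypothetical keys spread across two prefixes (folder1/sub1/ and folder1/sub2/),
# so each prefix contributes its own 5,500 GET/s allowance.
keys = [
    "folder1/sub1/file1.jpg",
    "folder1/sub1/file2.jpg",
    "folder1/sub2/file1.jpg",
    "folder1/sub2/file2.jpg",
]

def fetch(key):
    # Each GET is independent, so the requests can run in parallel threads.
    return key, s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()

with ThreadPoolExecutor(max_workers=4) as pool:
    for key, body in pool.map(fetch, keys):
        print(f"{key}: {len(body)} bytes")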

Utilising CloudFront with S3

CloudFront is a content delivery network and is great for caching static data. It serves content through a worldwide network of data centres called edge locations. Caching and serving content from these edge servers improves performance by bringing content closer to where viewers are located.

S3 transfer acceleration (uploads)

Transfer Acceleration speeds up uploads over long distances by routing them through the nearest CloudFront edge location and then across AWS’s optimised network to your bucket. It is enabled per bucket and used through a dedicated accelerate endpoint.
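
As a rough sketch of what this looks like with boto3 (the bucket and file names are hypothetical), you enable acceleration once on the bucket and then point a client at the accelerate endpoint:

import boto3
from botocore.config import Config

BUCKET = "my-example-bucket"  # hypothetical bucket name

# One-off: enable Transfer Acceleration on the bucket.
boto3.client("s3").put_bucket_accelerate_configuration(
    Bucket=BUCKET,
    AccelerateConfiguration={"Status": "Enabled"},
)

# Route requests through the accelerate endpoint for faster long-distance uploads.
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("local-file.bin", BUCKET, "uploads/local-file.bin")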
Multi-part upload

Multi-part upload is recommended for files over 100 MB and must be used for files over 5 GB. By splitting a file into multiple parts you speed up the upload by parallelising the part uploads. If one part fails to upload, only that part is retried rather than restarting the whole process. S3 then reassembles the parts into one large object.
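
With boto3, the high-level transfer manager handles multi-part uploads automatically; here is a minimal sketch, assuming a hypothetical bucket and file:

import boto3
from boto3.s3.transfer import TransferConfig

# Split anything over 100 MB into 100 MB parts, uploading up to 10 parts in parallel.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    multipart_chunksize=100 * 1024 * 1024,
    max_concurrency=10,
)

s3 = boto3.client("s3")
# upload_file switches to multi-part (with per-part retries) past the threshold.
s3.upload_file("big-file.bin", "my-example-bucket", "uploads/big-file.bin", Config=config)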





S3 byte-range fetches (downloads)

Byte-range fetches parallelise GETs by requesting specific byte ranges of an object. If a fetch fails, only the failed byte range is retried. This is a great way to speed up downloads, and it can also be useful if you’re only interested in a subset of a file (e.g. the head of a file), which saves on bandwidth.
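
A minimal byte-range fetch with boto3 might look like this (bucket and key are hypothetical), grabbing just the first 1 MB of an object:

import boto3

s3 = boto3.client("s3")

# Fetch only bytes 0-1048575 (the first 1 MB) of the object.
# The Range parameter uses standard HTTP byte-range syntax.
resp = s3.get_object(
    Bucket="my-example-bucket",
    Key="logs/big-log-file.log",
    Range="bytes=0-1048575",
)
head = resp["Body"].read()
print(f"Fetched {len(head)} bytes from the head of the file")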


S3 Select and Glacier Select

At the time of writing this is a relatively new feature for Amazon S3. It reduces the amount of data retrieved by letting you pull a subset of an object using SQL, with the filtering performed server-side. By using S3 Select to retrieve only the data needed by your application, you can achieve drastic performance increases. It is key to note that objects must be in CSV, JSON, or Parquet format.
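
As a minimal sketch (bucket, key, and column names are all hypothetical), here is S3 Select filtering a CSV server-side with boto3:

import boto3

s3 = boto3.client("s3")

# Run the SQL filter server-side; only matching rows come back over the wire.
resp = s3.select_object_content(
    Bucket="my-example-bucket",
    Key="data/orders.csv",
    ExpressionType="SQL",
    Expression="SELECT s.order_id, s.total FROM s3object s WHERE CAST(s.total AS FLOAT) > 100",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The response is an event stream; Records events carry the filtered rows.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"), end="")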




Want to see the above in action? Be A Better Dev has a quick demo that talks you through how this works.


That’s it for this week’s blog post. I do have a question for you, the reader:

Do you use any of the above day to day? Have you come across any headaches while using them? Feel free to email me at beckydevops@hotmail.com if you have any questions or just fancy a chat!



