By: Daniel Marquard, Cloud Engineer III
Although Amazon does not guarantee bandwidth or latency for Amazon Web Services’ S3 offering, there are several steps you can take to speed up data transfers to and from S3 buckets.
S3 Transfer Acceleration
Transfer Acceleration is a premium S3 feature that speeds up transfers to and from S3 buckets by routing data through AWS edge locations. It costs from $0.04 per gigabyte in addition to standard S3 data transfer charges.
This makes it useful for serving content to, and accepting content from, clients over the Internet, but it is less effective at speeding up transfers between S3 buckets.
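A minimal shell sketch of turning Transfer Acceleration on for a bucket and then uploading through the accelerated endpoint. The bucket and file names are placeholders, the helper functions (`run`, `enable_acceleration`, `accelerated_upload`) are hypothetical, and the `DRY_RUN` switch is an assumption added so the commands are printed for review by default; set `DRY_RUN=0` to actually call the AWS CLI.

```shell
#!/bin/sh
# Sketch only: bucket and file names are placeholders, and the helper
# function names are hypothetical. Commands are printed, not run, unless
# DRY_RUN=0 is set in the environment.
set -eu

# Print the command when DRY_RUN is 1 (the default); otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# One-time switch per bucket. Buckets used with Transfer Acceleration must
# have DNS-compliant names that contain no periods.
enable_acceleration() {
  run aws s3api put-bucket-accelerate-configuration \
    --bucket "$1" --accelerate-configuration Status=Enabled
}

# Upload local file $1 into bucket $2 through the accelerated endpoint.
accelerated_upload() {
  run aws s3 cp "$1" "s3://$2/" --endpoint-url https://s3-accelerate.amazonaws.com
}
```

For example, run `enable_acceleration srcbucket` once, then `accelerated_upload ./archive.tar srcbucket`. As an alternative to passing `--endpoint-url` on every call, `aws configure set default.s3.use_accelerate_endpoint true` makes `aws s3` commands in the default profile use the accelerated endpoint.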
Parallel Transfers with the AWS Command Line Interface
Using the AWS Command Line Interface (CLI), you can run multiple cp, mv, or sync operations in parallel. There are several ways to divide the work; in this example, one command copies files beginning with lowercase “a” through “n”, and a second command copies files beginning with lowercase “o” through “z”.
aws s3 cp s3://srcbucket/ s3://destbucket/ \
  --recursive --exclude "o*" --exclude "p*" --exclude "q*" --exclude "r*" --exclude "s*" \
  --exclude "t*" --exclude "u*" --exclude "v*" --exclude "w*" --exclude "x*" \
  --exclude "y*" --exclude "z*"
aws s3 cp s3://srcbucket/ s3://destbucket/ \
  --recursive --exclude "a*" --exclude "b*" --exclude "c*" --exclude "d*" --exclude "e*" \
  --exclude "f*" --exclude "g*" --exclude "h*" --exclude "i*" --exclude "j*" --exclude "k*" \
  --exclude "l*" --exclude "m*" --exclude "n*"
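When the two commands are run concurrently (for example, in separate terminal sessions), the bucket-to-bucket copy is split between two jobs, reducing the time it takes to copy the data from one S3 bucket to the other. Note that the exclude filters only skip keys beginning with the listed lowercase letters, so keys beginning with an uppercase letter, a digit, or any other character are copied by both commands.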
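A minimal shell sketch of the same idea, assuming a POSIX shell: each key range is copied by a background job, and the script waits for all of them. It swaps the long `--exclude` lists for the AWS CLI's bracket-expression filters (`[a-n]*`), which is an alternative formulation rather than the author's commands, and adds a third job, `[!a-z]*`, so keys that do not start with a lowercase letter are copied exactly once. Bucket names are placeholders, the function names are hypothetical, and `DRY_RUN` (on by default) prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: bucket names are placeholders and the helper names are
# hypothetical. Commands are printed, not run, unless DRY_RUN=0 is set.
set -eu

SRC="s3://srcbucket/"
DST="s3://destbucket/"

# Print the command when DRY_RUN is 1 (the default); otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Copy only the keys whose first character matches the bracket expression $1
# (for example "a-n" or "!a-z"). Later filters take precedence, so every key
# is excluded first and the matching keys are then re-included.
copy_range() {
  run aws s3 cp "$SRC" "$DST" --recursive --exclude "*" --include "[$1]*"
}

# Start one background job per range, wait for each, and return non-zero
# if any of the jobs failed.
parallel_copy() {
  copy_range a-n & pid1=$!
  copy_range o-z & pid2=$!
  copy_range '!a-z' & pid3=$!
  status=0
  wait "$pid1" || status=1
  wait "$pid2" || status=1
  wait "$pid3" || status=1
  return "$status"
}
```

Running `DRY_RUN=0 parallel_copy` performs the copies. Splitting on the first character only balances the load if keys are spread fairly evenly across the ranges; a bucket dominated by one prefix gains little from this split.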
AWS Import/Export
For transfers exceeding 1 TB, Amazon Web Services’ Snowball offering can be used. Snowball is a petabyte-scale data transport solution that uses physical storage devices to move large amounts of data into and out of AWS securely.
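To put the 1 TB figure in perspective, the time needed to send 1 TB (taken here as 10^12 bytes) over a network link works out as follows, assuming a sustained 100 Mbit/s with no protocol overhead; the link speed is an illustrative assumption, not a figure from the original:

```latex
t = \frac{10^{12}\,\text{B} \times 8\,\text{bit/B}}{10^{8}\,\text{bit/s}}
  = 8 \times 10^{4}\,\text{s} \approx 22.2\,\text{h}
```

At 1 Gbit/s the same transfer still takes about 8,000 s, or roughly 2.2 hours, and sustained real-world throughput is usually below the nominal link rate.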
S3DistCp with Amazon Elastic MapReduce
When speed is the top priority for moving data between S3 buckets, S3DistCp can be used with Amazon Web Services’ Elastic MapReduce (EMR). This approach requires running an EMR cluster and therefore adds cost, but because S3DistCp distributes the copy across the cluster’s nodes, it offers high throughput and fault tolerance.
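A minimal sketch of submitting S3DistCp as a step on an already running EMR cluster from a POSIX shell. It assumes an EMR release that provides `command-runner.jar`; the cluster ID, bucket names, and helper names are placeholders, and `DRY_RUN` (on by default) prints the command instead of running it. The step syntax should be checked against the EMR documentation for your release.

```shell
#!/bin/sh
# Sketch only: the cluster ID and bucket names are placeholders and the
# helper names are hypothetical. Commands are printed, not run, unless
# DRY_RUN=0 is set.
set -eu

# Print the command when DRY_RUN is 1 (the default); otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Build the --steps shorthand value for an S3DistCp copy from $1 to $2,
# invoked through command-runner.jar.
s3distcp_step() {
  printf 'Type=CUSTOM_JAR,Name=S3DistCp,ActionOnFailure=CONTINUE,Jar=command-runner.jar,Args=[s3-dist-cp,--src=%s,--dest=%s]\n' "$1" "$2"
}

# Add the copy step to cluster $1 (an EMR cluster ID of the form j-...),
# copying from $2 to $3.
submit_copy() {
  run aws emr add-steps --cluster-id "$1" --steps "$(s3distcp_step "$2" "$3")"
}
```

For example, `DRY_RUN=0 submit_copy j-XXXXXXXXXXXXX s3://srcbucket/ s3://destbucket/` queues the copy on the cluster. When logged into the cluster’s primary node, the same copy can also be started directly with `s3-dist-cp --src=s3://srcbucket/ --dest=s3://destbucket/`.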