3 S3 (Storage)

package: aws

S3 provides a fairly simple and REST-ful interface. Uploading an object to S3 is an HTTP PUT request. Downloading an object is a GET request. And so on. As a result, you may feel you don’t need a lot of “wrapper” around this.

Where you definitely will want help is constructing the Authorization header S3 uses to authenticate requests. Doing so requires making a string out of specific elements of your request and “signing” it with your AWS private key. Even a small discrepancy will cause the request to fail authentication. As a result, aws/s3 makes it easy for you to create the authentication header correctly and successfully.

Plus, aws/s3 does provide wrappers and tries to help with some wrinkles. For example, S3 may give you a 302 redirect when you do a PUT or POST. You don’t want to transmit the entire body, only to have S3 ignore it so that you must transmit it all over again. Instead, you want to supply the request header Expect: 100-continue, which lets S3 respond before you transmit the body.

3.1 Request Method

parameter
(s3-path-requests?) → boolean?
(s3-path-requests? v) → void?
v : boolean?

The default value #f means "Virtual Hosted" style and #t means "Path Style" as described here. "Virtual Hosted" is preferred. (Use "Path Style" only if you have a legacy US Standard bucket with a name that doesn’t meet the restrictions for DNS – for which (valid-bucket-name? name #t) returns #f.)
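
To make the difference between the two styles concrete, here is a minimal sketch in plain Racket. The name sketch-uri and its keyword arguments are hypothetical; the library's own bucket&path->uri reads the s3-scheme, s3-host and s3-path-requests? parameters instead, and may apply escaping that this sketch ignores.

```racket
#lang racket/base

;; Hypothetical illustration, not part of aws/s3.
;; "Virtual Hosted" style puts the bucket in the hostname;
;; "Path Style" puts it at the start of the URI path.
(define (sketch-uri bucket path
                    #:scheme [scheme "http"]
                    #:host [host "s3.amazonaws.com"]
                    #:path-style? [path-style? #f])
  (if path-style?
      (string-append scheme "://" host "/" bucket "/" path)
      (string-append scheme "://" bucket "." host "/" path)))
```

With the defaults, (sketch-uri "bucket" "path/to/file") gives the Virtual Hosted form; passing #:path-style? #t moves the bucket name into the path.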

3.2 Endpoint

parameter
(s3-host) → string?
(s3-host v) → void?
v : string?
= "s3.amazonaws.com"

The hostname used for the S3 REST API. May be any value from the Endpoint column.

parameter
(s3-region) → string?
(s3-region v) → void?
v : string?
= "us-east-1"

The region used for the S3 REST API. Should be a value from the Region column of the same row as the value for s3-host.

Added in version 1.1 of package aws.

parameter
(s3-scheme) → (or/c "http" "https")
(s3-scheme v) → void?
v : (or/c "http" "https")
= "http"

The scheme used for the S3 REST API.

parameter
(s3-max-tries) → exact-positive-integer?
(s3-max-tries v) → void?
v : exact-positive-integer?
= 5

The number of attempts made for each HTTP request to S3.

When S3 returns certain 50x response codes, an additional (sub1 (s3-max-tries)) attempts will be made. If none succeed, then an exn:fail:aws exception is raised.

A value of 1 means try just once, in other words do not retry.

Added in version 1.7 of package aws.
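
The retry count can be pictured with a small, generic loop. This is only a sketch of the "try up to N times" idea: call-with-tries and its retry? argument are hypothetical names, and the library decides for itself which responses are retried.

```racket
#lang racket/base

;; Hypothetical helper, not part of aws/s3: call `thunk` up to
;; `max-tries` times, retrying only when `retry?` accepts the exception.
;; With max-tries = 1 the thunk is called once and any exception propagates.
(define (call-with-tries thunk max-tries [retry? exn:fail?])
  (let loop ([tries-left max-tries])
    (with-handlers ([(lambda (e) (and (> tries-left 1) (retry? e)))
                     (lambda (e) (loop (sub1 tries-left)))])
      (thunk))))
```

When setting the parameter itself, something like (parameterize ([s3-max-tries 1]) ...) disables retries for the requests made inside the body.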

3.3 Authentication signatures

procedure
(bucket&path->uri bucket path-to-resource) → string?
bucket : string?
path-to-resource : string?

Given bucket and path (neither of which should start with a leading "/"), use s3-path-requests?, s3-scheme and s3-host to make the URI for the resource.

Example:

> (bucket&path->uri "bucket" "path/to/file")
"http://bucket.s3.amazonaws.com/path/to/file"

procedure
(bucket+path->bucket&path&uri b+p) →
string? string? string?
b+p : string?

Given a combined bucket+path string such as "bucket/path/to/resource", return the bucket portion, path portion and URI.

Example:

> (bucket+path->bucket&path&uri "bucket/path/to/file")
"bucket"
"path/to/file"
"http://bucket.s3.amazonaws.com/path/to/file"

procedure
(uri&headers b+p method [headers] body) →
string? dict?
  b+p : string?
  method : string?
  headers : dict? = '()
  body : #""

Return the URI and headers for which to make an HTTP request to S3. Constructs an Authorization header based on the inputs.

procedure
(sign-uri b+p method expires headers) → string?
  b+p : string?
  method : string?
  expires : (and/c exact-positive-integer? (between/c 1 604800))
  headers : dict?

Return a pre-signed URI valid for expires seconds.

Example:

(require aws)
(require aws/s3)

(public-key "akid")

(private-key "secret")

(s3-host "nyc3.digitaloceanspaces.com")
(s3-region "nyc3")
(s3-scheme "https")

(define bucket+path "unixcasts/10.mp4")
(define method "GET")
(define expires 900)
(sign-uri bucket+path method expires '())

3.4 Conveniences

procedure
(create-bucket bucket-name [location]) → void?
bucket-name : string?
location : (or/c #f string?) = #f

Create a bucket named bucket-name in location. For valid values of location see the Region column. Omitting or supplying #f for location means the US Standard region.

Keep in mind that bucket names on S3 are global—shared among all users of S3. You may want to make your bucket names include a domain name that you own.

If you try to create a bucket with a name that is already used by another AWS account, you will get a 409 Conflict response.

If you create a bucket that already exists under your own account, this operation is idempotent (it’s not an error, it’s simply a no-op).

Use valid-bucket-name? to check the validity of the bucket-name you want to use.

procedure
(valid-bucket-name? bucket-name
[ dns-compliant?]) → boolean?
bucket-name : string?
dns-compliant? : boolean? = #t

Checks whether a bucket name meets the criteria described here. The dns-compliant? argument corresponds to the more-restrictive rules required for non-US Standard buckets and required to use the so-called "Virtual Hosted" request method corresponding to the default value #f of the s3-path-requests? parameter.

procedure
(delete-bucket bucket-name) → void?
bucket-name : string?

Delete a bucket named bucket-name.

This operation is idempotent (it is a no-op to delete a bucket that has already been deleted).

procedure
(list-buckets) → (listof string?)

List all the buckets belonging to your AWS account.

procedure
(bucket-location bucket [default]) → string?
bucket : string?
default : string? = "us-east-1"

Return bucket’s LocationConstraint value, if any, else default.

When dealing with arbitrary buckets, you might need to parameterize s3-region to this value because AWS Signature v4 requires a region to be specified. For example:

(parameterize ([s3-region (bucket-location "my-bucket")])
  (ls "my-bucket/"))

Added in version 1.6 of package aws.

procedure
(ls/proc bucket+path
proc
init
[ max-each
#:delimiter delimiter]) → any/c
  bucket+path : string?
  proc : (any/c (listof xexpr?) . -> . any/c)
  init : any/c
  max-each : (and/c integer? (between/c 1 1000)) = 1000
  delimiter : (or/c #f string?) = #f

List objects whose names start with the bucket+path (which is the form "bucket/path/to/resource"). S3 is queried to return results for at most max-each objects at a time.

For each such batch, proc is called. The first time proc is called, its first argument is the init value; subsequent times it’s given the previous value that it returned.

The second argument to proc is a (listof xexpr?), where each xexpr? represents XML returned by S3 for each object. The XML is the Contents or CommonPrefixes portions of the ListBucketResults XML response, where CommonPrefixes are produced only for a non-#f delimiter.

The return value of ls/proc is the final return value of proc.

For example, ls is implemented as simply:

(map (λ (x) (se-path* '(Key) x))
     (ls/proc b+p append '()))
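
As a further illustration of how the accumulator is threaded through proc, here is a sketch that sums object sizes. It runs without S3: fold-batches is a hypothetical stand-in for the batching that ls/proc performs, and fake-batches is made-up data shaped like Contents elements (real responses carry more child elements).

```racket
#lang racket/base
(require xml/path)

;; A proc in the shape ls/proc expects: (accumulator, batch) -> accumulator.
(define (add-sizes total batch)
  (for/fold ([sum total]) ([x (in-list batch)])
    (+ sum (string->number (se-path* '(Size) x)))))

;; Hypothetical stand-in for ls/proc's batching: the first call to proc
;; gets init, each later call gets the previous return value.
(define (fold-batches proc init batches)
  (for/fold ([acc init]) ([batch (in-list batches)])
    (proc acc batch)))

;; Made-up data: two batches of Contents-like xexprs.
(define fake-batches
  '(((Contents (Key "a.txt") (Size "3"))
     (Contents (Key "b.txt") (Size "4")))
    ((Contents (Key "c.txt") (Size "5")))))
```

Against a real bucket, the same proc would be passed as (ls/proc "my-bucket/" add-sizes 0).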

procedure
(ls bucket+path) → (listof string?)
bucket+path : string?

List the names of objects whose names start with bucket+path (which is the form "bucket/path/to/resource").

procedure
(ll* bucket+path) → (listof (list/c xexpr? string? xexpr?))
bucket+path : string?

List objects whose names start with bucket+path (which is the form "bucket/path/to/resource"). Return a list, each item of which is a list consisting of:

an xexpr (as with ls/proc)
response headers from a HEAD request (as with head)
an xexpr representing the ACL (as with get-acl)

procedure
(ll bucket+path) → (listof (list/c string? string? xexpr?))
bucket+path : string?

List objects whose names start with the path in bucket+path (which is the form "bucket/path/to/resource"). Return a list, each item of which is a list consisting of:

name (as with ls)
response headers from a HEAD request (as with head)
an xexpr representing the ACL (as with get-acl)

procedure
(head bucket+path) → string?
bucket+path : string?

Make a HEAD request for bucket+path (which is the form "bucket/path/to/resource") and return the headers as a string in net/head format. You can provide this string to heads-string->dict from http/head.

procedure
(delete bucket+path) → string?
bucket+path : string?

Make a DELETE request to delete bucket+path (which is the form "bucket/path/to/resource").

procedure
(delete-multiple bucket paths) → string?
bucket : string?
paths : (listof string?)

Make a request to delete all paths (each of which is of the form "path/to/resource") in bucket. The paths list must have no more than 1000 elements.
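
If you have more than 1000 paths, one approach is to split the list into batches first. The following is a minimal sketch; chunk-paths is a hypothetical helper, not something aws/s3 provides.

```racket
#lang racket/base
(require racket/list)

;; Hypothetical helper: split `paths` into consecutive batches of at
;; most `limit` elements, preserving order.
(define (chunk-paths paths [limit 1000])
  (let loop ([remaining paths] [batches '()])
    (cond
      [(null? remaining) (reverse batches)]
      [(<= (length remaining) limit) (reverse (cons remaining batches))]
      [else
       (define-values (batch rest-paths) (split-at remaining limit))
       (loop rest-paths (cons batch batches))])))
```

Each batch can then be handed to delete-multiple in turn, for example (for ([batch (chunk-paths all-paths)]) (delete-multiple bucket batch)), where all-paths and bucket are your own values.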

procedure
(copy bucket+path/from bucket+path/to [heads]) → string?
  bucket+path/from : string?
  bucket+path/to : string?
  heads : dict? = '()

Tip: To rename an object, copy it then delete the original.

Copy an existing S3 object bucket+path/from to bucket+path/to, including its metadata. Both names are of the form "bucket/path/to/resource".

It is not an error to copy to an existing object (it will be replaced). It is even OK to copy an existing object to itself, as long as heads implies metadata changes.

Changed in version 1.8 of package aws: Added the heads argument.

procedure
(get-acl bucket+path [heads]) → xexpr?
bucket+path : string?
heads : dict? = '()

Make a GET request for the ACL of the object bucket+path (which is the form "bucket/path/to/resource").

S3 responds with an XML representation of the ACL, which is returned as an xexpr?.

procedure
(put-acl bucket+path acl [heads]) → void
  bucket+path : string?
  acl : (or/c xexpr? #f)
  heads : dict? = '()

Make a PUT request to set the ACL of the object bucket+path to acl.

If acl is #f, then the request body is empty and ACL changes must be provided by heads (e.g., as a canned ACL using 'x-amz-acl).

Changed in version 1.8 of package aws: Added the heads argument and allow #f for acl.

procedure
(get/proc bucket+path
reader
[ heads
range-begin
range-end]) → any/c
  bucket+path : string?
  reader : (input-port? string? -> any/c)
  heads : dict? = '()
  range-begin : (or/c #f exact-nonnegative-integer?) = #f
  range-end : (or/c #f exact-nonnegative-integer?) = #f

Although you may use get/proc directly, it is also a building block for other procedures that you may find more convenient, such as get/bytes and get/file.

Make a GET request for bucket+path (which is the form "bucket/path/to/resource").

The reader procedure is called with an input-port? and a string? representing the response headers. The reader should read the response body from the port, being careful to read the exact number of bytes as specified in the response header’s Content-Length field. The return value of reader is the return value of get/proc.

You may pass request headers in the optional heads argument.

The optional arguments range-begin and range-end are used to supply an HTTP Range request header. This header, which Amazon S3 supports, enables getting only a subset of the bytes. Note that range-end is exclusive to be consistent with the Racket convention, e.g. subbytes. (The HTTP Range header specifies the end as inclusive, so your range-end argument is decremented to make the value for the header.)
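
The exclusive-to-inclusive conversion can be sketched as follows. range-header-value is a hypothetical name used only for illustration; the library builds the header for you and its handling of #f arguments is not shown here.

```racket
#lang racket/base

;; Hypothetical illustration: turn a Racket-style half-open interval
;; [start, end) into the inclusive form used by the HTTP Range header.
(define (range-header-value start end)
  (format "bytes=~a-~a" start (sub1 end)))

;; The same interval applied to a local byte string, for comparison.
(define first-five (subbytes #"Hello, world." 0 5))
```

So a range-begin of 0 and a range-end of 5 select the same five bytes that (subbytes bstr 0 5) would, while the header on the wire reads bytes=0-4.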

procedure
(get/bytes bucket+path
[ heads
range-begin
range-end]) → bytes?
  bucket+path : string?
  heads : dict? = '()
  range-begin : (or/c #f exact-nonnegative-integer?) = #f
  range-end : (or/c #f exact-nonnegative-integer?) = #f

Make a GET request for bucket+path (which is the form "bucket/path/to/resource") and return the response body as bytes?.

You may pass request headers in the optional heads argument.

The optional arguments range-begin and range-end are used to supply an optional Range request header. This header, which Amazon S3 supports, enables getting only a subset of the bytes. Note that range-end is exclusive to be consistent with the Racket convention, e.g. subbytes. (The HTTP Range header specifies the end as inclusive, so your range-end argument is decremented to make the value for the header.)

The response body is held in memory; if it is very large and you want to "stream" it instead, consider using get/proc.
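
A streaming reader for get/proc might look like the sketch below. It assumes the headers string contains a Content-Length line, and the names content-length and make-streaming-reader are hypothetical. The reader copies exactly that many bytes to an output port in fixed-size chunks, so the whole body is never held in memory at once.

```racket
#lang racket/base

;; Hypothetical helper: extract Content-Length from a headers string.
(define (content-length headers)
  (define m (regexp-match #rx"(?i:content-length): *([0-9]+)" headers))
  (and m (string->number (cadr m))))

;; Returns a reader of the shape get/proc expects: (input-port, headers).
;; The reader copies the body to `out` and returns the byte count copied.
(define (make-streaming-reader out [chunk-size 4096])
  (lambda (in headers)
    (define total (or (content-length headers) 0))
    (let loop ([copied 0])
      (define want (min chunk-size (- total copied)))
      (define bstr (if (zero? want) eof (read-bytes want in)))
      (cond
        [(eof-object? bstr) copied]
        [else
         (write-bytes bstr out)
         (loop (+ copied (bytes-length bstr)))]))))
```

It would be used as (get/proc b+p (make-streaming-reader out)), where out is any output port you have opened, such as a file port.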

procedure
(get/file bucket+path
file
[ heads
#:mode mode-flag
#:exists exists-flag]) → void?
  bucket+path : string?
  file : path-string?
  heads : dict? = '()
  mode-flag : (or/c 'binary 'text) = 'binary
   exists-flag : (or/c 'error 'append 'update 'replace 'truncate 'truncate/replace)
= 'error

Make a GET request for bucket+path (which is the form "bucket/path/to/resource") and copy the response body directly to file. The keyword arguments #:mode and #:exists are identical to those for call-with-output-file*.

You may pass request headers in the optional heads argument.

procedure
(put bucket+path
writer
data-length
mime-type
reader
[ heads
#:chunk-len chunk-len]) → void?
  bucket+path : string?
  writer : (output-port . -> . void?)
  data-length : exact-nonnegative-integer?
  mime-type : string?
  reader : (input-port? string? . -> . any/c)
  heads : dict? = '()
  chunk-len : aws-chunk-len/c = aws-chunk-len-default
value
aws-chunk-len-minimum : (* 8 1024)
value
aws-chunk-len-default : (* 64 1024)
value
aws-chunk-len/c :
(and/c exact-nonnegative-integer?
       (>=/c aws-chunk-len-minimum))

Although you may use put directly, it is also a building block for other procedures that you may find more convenient, such as put/bytes and put/file.
To upload more than about 100 MB, prefer multipart-put.

Makes a PUT request for bucket+path (which is the form "bucket/path/to/resource"), using the writer procedure to write the request body and the reader procedure to read the response body. Returns the response header (unless it raises exn:fail:aws).

The writer procedure is given an output-port? and should write the request body to it. The amount written should be exactly the same as data-length, which is used to create a Content-Length request header. You must also supply mime-type (for example "text/plain") which is used to create a Content-Type request header.

The reader procedure is the same as for get/proc. The response body for a PUT request usually isn’t interesting, but you should read it anyway.

Note: If you want a Content-MD5 request header, you must calculate and supply it yourself in heads. Supplying this allows S3 to verify the upload integrity.

chunk-len determines the length of the Content-Encoding: aws-chunked chunks used to perform AWS Signature v4 chunked uploads.

To use reduced redundancy storage, supply (hasheq 'x-amz-storage-class "REDUCED_REDUNDANCY") for heads.
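
For the Content-MD5 note above, the header value is the Base64 encoding of the raw (not hex) MD5 digest of the body. Here is a minimal sketch using Racket's file/md5 and net/base64 libraries. The helper names are hypothetical, and using the symbol 'Content-MD5 as the key is an assumption modeled on the 'x-amz-storage-class example.

```racket
#lang racket/base
(require file/md5 net/base64)

;; Hypothetical helper: Base64 of the raw 16-byte MD5 digest of `data`.
;; Passing #f to md5 asks for raw bytes rather than hex digits; passing
;; #"" to base64-encode suppresses the trailing line break.
(define (content-md5 data)
  (bytes->string/latin-1 (base64-encode (md5 data #f) #"")))

;; Assumed key spelling; adjust to whatever your heads dict uses.
(define (heads-with-md5 data)
  (hasheq 'Content-MD5 (content-md5 data)))
```

The resulting dict could be passed as the heads argument to put, alongside a writer that writes the same data.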

procedure
(put/bytes bucket+path data mime-type [heads]) → void?
  bucket+path : string?
  data : bytes?
  mime-type : string?
  heads : dict? = '()

To upload more than about 100 MB, prefer multipart-put.

Makes a PUT request for bucket+path (which is the form "bucket/path/to/resource"), sending data as the request body and creating a Content-Type header from mime-type. Returns the response header (unless it raises exn:fail:aws).

A Content-MD5 request header is automatically created from data. To ensure data integrity, S3 will reject the request if the bytes it receives do not match the MD5 checksum.

To use reduced redundancy storage, supply (hasheq 'x-amz-storage-class "REDUCED_REDUNDANCY") for heads.

procedure
(put/file bucket+path
file
[ #:mime-type mime-type
#:inline? inline?
#:mode mode-flag]) → void?
  bucket+path : string?
  file : path-string?
  mime-type : (or/c #f string?) = #f
  inline? : boolean? = #f
  mode-flag : (or/c 'binary 'text) = 'binary

For files larger than about 100 MB, prefer multipart-put/file.

Upload file to bucket+path and return the response header (or raise exn:fail:aws).

The #:mode argument is identical to that of call-with-input-file*.

If #:mime-type is #f, then the Content-Type header is guessed from the file extension, using a (very short!) list of common extensions. If no match is found, then "application/x-unknown-content-type" is used. You can customize the MIME type guessing by setting the path->mime-proc parameter to your own procedure.

A Content-MD5 request header is automatically created from the contents of file. To ensure data integrity, S3 will reject the request if the bytes it receives do not match the MD5 checksum.

If #:inline? is #f, then a Content-Disposition request header is automatically created from file. For example if file is "/foo/bar/test.txt" or "c:\\foo\\bar\\test.txt" then the header "Content-Disposition:attachment; filename=\"test.txt\"" is created. This is helpful because a web browser that is given the URI for the object will prompt the user to download it as a file. If #:inline? is #t, then the header "Content-Disposition:inline" is created instead, allowing the user to view the object from their web browser.

To use reduced redundancy storage, supply (hasheq 'x-amz-storage-class "REDUCED_REDUNDANCY") for heads.

parameter
(path->mime-proc) → procedure?
(path->mime-proc proc) → void?
proc : procedure?

A procedure which takes a path-string? and returns a string? with a MIME type.
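
A custom procedure for this parameter might look like the following sketch. The extension table and the name my-path->mime are hypothetical; only the fallback string comes from the put/file description above.

```racket
#lang racket/base

;; Hypothetical extension table; extend it to suit your files.
(define extension->mime
  (hash "txt"  "text/plain"
        "html" "text/html"
        "json" "application/json"
        "png"  "image/png"))

;; Takes a path-string? and returns a MIME type string, falling back
;; to the same default that put/file is documented to use.
(define (my-path->mime p)
  (define s (if (path? p) (path->string p) p))
  (define m (regexp-match #rx"[.]([^./]+)$" s))
  (hash-ref extension->mime
            (and m (string-downcase (cadr m)))
            "application/x-unknown-content-type"))
```

It can be installed for a region of code with (parameterize ([path->mime-proc my-path->mime]) ...).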

3.5 Multipart uploads

In addition to uploading an entire object in a single PUT request, S3 lets you upload it in multiple 5 MB or larger chunks, using the multipart upload API. Amazon recommends using this when the total data to upload is bigger than about 100 MB.

3.5.1 Convenience

procedure
(multipart-put bucket+path
num-parts
get-part
[ mime-type
heads]) → string?
  bucket+path : string?
  num-parts : s3-multipart-number/c
  get-part : (exact-nonnegative-integer? . -> . bytes?)
  mime-type : string? = "application/x-unknown-content-type"
  heads : dict? = '()

Upload num-parts parts, where the data for each part is returned by the get-part procedure you supply. In other words, your get-part procedure is called num-parts times, with values (in-range num-parts).

Each part must be at least s3-multipart-size-minimum, except the last part.

The parts are uploaded using a small number of worker threads, to get some parallelism and probably better performance.

Changed in version 1.7 of package aws: Worker threads handle exceptions by returning work to the end of the to-do list to retry later, but no sooner than a delay that increases after each such retry.
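
As an illustration of the get-part contract, the sketch below slices an in-memory byte string into parts. The names part-size, count-parts and make-get-part are hypothetical, and a real use would more likely read each part from a file or generate it on demand.

```racket
#lang racket/base

;; Same value as s3-multipart-size-minimum (5 MB).
(define part-size (* 5 1024 1024))

;; Number of parts needed for `total-length` bytes (at least one).
(define (count-parts total-length)
  (max 1 (quotient (+ total-length part-size -1) part-size)))

;; Returns a get-part procedure: given a zero-based index n, it returns
;; the n-th slice of `data`. Only the last slice may be shorter.
(define (make-get-part data)
  (lambda (n)
    (define start (* n part-size))
    (subbytes data start (min (bytes-length data) (+ start part-size)))))
```

These would be combined as (multipart-put b+p (count-parts (bytes-length data)) (make-get-part data)).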

procedure
(multipart-put/file bucket+path
file
[ #:mime-type mime-type
#:inline? inline?
#:mode mode-flag]
#:part-size part-size) → string?
  bucket+path : string?
  file : path?
  mime-type : string? = #f
  inline? : boolean? = #f
  mode-flag : (or/c 'binary 'text) = 'binary
  part-size : (or/c #f s3-multipart-size/c)
value
s3-multipart-size-minimum : (* 5 1024 1024)
value
s3-multipart-size-default : (* 5 1024 1024)
value
s3-multipart-size/c :
(and/c exact-positive-integer?
       (>=/c s3-multipart-size-minimum))

Like put/file but uses multipart upload.

The parts are uploaded using a small number of worker threads, to get some parallelism and probably better performance.

Although it’s usually desirable for part-size to be as small as possible, it must be at least 5 MB, and large enough that no more than 10,000 parts are required. When part-size is #f, the default, a suitable minimal size is calculated based on the file-size of file.
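
The two constraints on part-size can be expressed as a small calculation. This is a sketch of the idea only: suitable-part-size is a hypothetical name and the library's own calculation may differ in detail.

```racket
#lang racket/base

(define minimum-part-size (* 5 1024 1024)) ; 5 MB, as s3-multipart-size-minimum
(define maximum-parts 10000)

;; Smallest part size that is at least the minimum and needs no more
;; than `maximum-parts` parts for a file of `file-size` bytes.
(define (suitable-part-size file-size)
  (max minimum-part-size
       ;; ceiling of file-size / maximum-parts, in exact integers
       (quotient (+ file-size maximum-parts -1) maximum-parts)))
```

For most files the minimum wins; only files larger than 10,000 times 5 MB (roughly 48.8 GiB) force a larger part size.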

procedure
(incomplete-multipart-put/file bucket+path
file
#:mode mode
#:part-size part-size)
→
(or/c #f
      (cons/c string?
              (listof (cons/c s3-multipart-number/c string?))))
  bucket+path : string?
  file : path?
  mode : 'binary
  part-size : #f

EXPERIMENTAL. Use at your own risk. Subject to change or removal.

Use list-multipart-uploads to look for a multipart-put/file of bucket+path and file that was interrupted (neither complete-multipart-upload nor abort-multipart-upload was called and succeeded). If such an upload is found, use list-multipart-upload-parts to determine which of the previously uploaded parts have MD5 checksums that match the corresponding parts of file (that is, parts that do not need to be uploaded again). If any do, return the upload ID and a list of those parts. Otherwise return #f.

You may call this to determine whether resume-multipart-put/file would attempt to do anything, for example if you want to get user confirmation.

Added in version 1.5 of package aws.

procedure
(resume-multipart-put/file bucket+path
file
#:mode mode
#:part-size part-size)
→ (or/c #f string?)
  bucket+path : string?
  file : path?
  mode : 'binary
  part-size : #f

EXPERIMENTAL. Use at your own risk. Subject to change or removal.

If incomplete-multipart-put/file returns a non-#f value, use the information to resume the upload by upload-part-ing the remaining parts, calling complete-multipart-upload, and returning the upload ID. Otherwise return #f.

Added in version 1.5 of package aws.

3.5.2 Building blocks

Use these if the data you’re uploading is computed on the fly and you don’t know the total size in advance. Otherwise you may simply use multipart-put or multipart-put/file.

procedure
(initiate-multipart-upload bucket+path
mime-type
heads) → string?
  bucket+path : string?
  mime-type : string?
  heads : dict?

Initiate a multipart upload and return an upload ID.

procedure
(upload-part bucket+path
upload-id
part-number
bstr)
→ (cons/c s3-multipart-number/c string?)
  bucket+path : string?
  upload-id : string?
  part-number : s3-multipart-number/c
  bstr : bytes?
value
s3-multipart-number/c
: (and/c exact-integer? (between/c 1 10000))

Upload one part for the multipart upload specified by the upload-id returned from initiate-multipart-upload.

Note that S3 part numbers start with 1 (not 0).

bstr must be at least s3-multipart-size-minimum, unless it’s the last part.

Returns a cons of part-number and the ETag for the part. You will need to supply a list of these, one for each part, to complete-multipart-upload.

procedure
(list-multipart-uploads bucket) → xexpr?
bucket : string?

Get information about multipart uploads that haven’t been ended with complete-multipart-upload or abort-multipart-upload.

procedure
(list-multipart-upload-parts bucket+path
upload-id) → (listof xexpr?)
bucket+path : string?
upload-id : string?

Get a list of already-uploaded parts for a multipart upload that hasn’t been ended with complete-multipart-upload or abort-multipart-upload.

Added in version 1.3 of package aws.

procedure
(complete-multipart-upload bucket+path
upload-id
parts-list) → xexpr?
  bucket+path : string?
  upload-id : string?
  parts-list : (listof (cons/c s3-multipart-number/c string?))

Complete the multipart upload specified by the upload-id returned from initiate-multipart-upload, using a parts-list of the values returned from each upload-part. The parts-list does not need to be in any particular order; it will be sorted for you.

Returns S3’s XML response in the form of an xexpr?.

procedure
(abort-multipart-upload bucket+path
upload-id) → void?
bucket+path : string?
upload-id : string?

Abort the multipart upload specified by the upload-id returned from initiate-multipart-upload.
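
Putting the building blocks together, the overall flow is: initiate, upload each part (numbered from 1), then complete, aborting if anything fails. The sketch below shows that flow with the four operations passed in as arguments so that it can be run without S3. The helper upload-chunks and its keyword arguments are hypothetical; in real use you would pass initiate-multipart-upload, upload-part, complete-multipart-upload and abort-multipart-upload.

```racket
#lang racket/base

;; Hypothetical helper: upload a list of byte strings as one object.
;; The four operations are parameters whose argument orders follow the
;; signatures documented above.
(define (upload-chunks b+p chunks
                       #:initiate initiate
                       #:upload upload
                       #:complete complete
                       #:abort abort
                       #:mime-type [mime-type "application/x-unknown-content-type"])
  (define id (initiate b+p mime-type '()))
  ;; On any failure, abort the upload, then re-raise the exception.
  (with-handlers ([exn:fail? (lambda (e) (abort b+p id) (raise e))])
    (let ([parts (for/list ([chunk (in-list chunks)]
                            [n (in-naturals 1)]) ; part numbers start at 1
                   (upload b+p id n chunk))])
      (complete b+p id parts))))
```

Every chunk except the last must still be at least s3-multipart-size-minimum bytes; the sketch does not check this.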

3.6 S3 examples

(require aws/keys
         aws/s3)

(define (member? x xs)
  (not (not (member x xs))))

;; Make a random name for the bucket. Remember bucket names are a
;; global space shared by all AWS accounts. In a real-world app, if
;; you have a domain name, you probably want to include that as part
;; of your name.
(define test-bucket
  (for/fold ([s "test.bucket."])
            ([x (in-range 32)])
    (string-append s
                   (number->string (truncate (random 15)) 16))))

(credentials-from-file!)

(create-bucket test-bucket)
(member? test-bucket (list-buckets))

(define test-pathname "path/to/file")
(define b+p (string-append test-bucket "/" test-pathname))

(define data #"Hello, world.")
(put/bytes b+p data "text/plain")
(get/bytes b+p)
(get/bytes b+p '() 0 5)
(head b+p)

(ls (string-append test-bucket "/"))
(ls (string-append test-bucket "/" test-pathname))
(ls (string-append test-bucket "/" (substring test-pathname 0 2)))

(define p (build-path 'same
                      "tests"
                      "s3-test-file-to-get-and-put.txt"))
(put/file b+p p #:mime-type "text/plain")
(get/file b+p p #:exists 'replace)
(head b+p)
(member? test-pathname (ls b+p))

(define b+p/copy (string-append b+p "-copy"))
(copy b+p b+p/copy)
(ls (string-append test-bucket "/"))
(head b+p/copy)
(delete b+p/copy)

(delete b+p)
(delete-bucket test-bucket)
