AWS Learn In Public Week 6, Advanced S3, Glacier And Athena

Photo by Alto Crew

Week 6 of my AWS learning journey. This week we're diving deeper into what S3 can provide us, plus a quick overview of a couple of new services: Glacier and Athena. Let's go into the details.

S3 Replication

Let's talk about AWS S3 Replication

  • To do that we must enable versioning in both the source and destination buckets
  • Cross Region Replication (CRR) is ideal for compliance, lower latency access and replicating across accounts
  • Same Region Replication (SRR) can be used for log aggregation or live replication between production and test accounts
  • In either case, the buckets can be in different accounts
  • Copying is asynchronous, and we must give S3 the proper IAM permissions to replicate on our behalf

After enabling S3 Replication, only new objects are replicated; objects that already existed in the bucket are not.

  • Deleting without a version ID adds a delete marker, which is not replicated by default.
  • Deleting with a version ID permanently deletes that version in the source only; the delete is not replicated.
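
The setup described above (versioning on both buckets, an IAM role, delete markers left unreplicated) can be sketched in Python. This is only a rough illustration: the rule ID and the function names are made up for this example, and it assumes a boto3-style S3 client whose credentials can administer both buckets. `put_bucket_versioning` and `put_bucket_replication` are the boto3 calls I'd expect to use here.

```python
def build_replication_config(role_arn, destination_bucket, replicate_delete_markers=False):
    """Build a ReplicationConfiguration dict with one rule covering the whole bucket."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy objects
        "Rules": [
            {
                "ID": "replicate-everything",  # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix matches every object
                "DeleteMarkerReplication": {
                    "Status": "Enabled" if replicate_delete_markers else "Disabled"
                },
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


def enable_replication(s3_client, source_bucket, destination_bucket, role_arn):
    """Turn on versioning for both buckets, then attach the replication rule."""
    for bucket in (source_bucket, destination_bucket):
        s3_client.put_bucket_versioning(
            Bucket=bucket, VersioningConfiguration={"Status": "Enabled"}
        )
    s3_client.put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration=build_replication_config(role_arn, destination_bucket),
    )


# Usage (assumes boto3 is installed and credentials are configured):
# import boto3
# enable_replication(boto3.client("s3"), "my-source", "my-destination",
#                    "arn:aws:iam::123456789012:role/my-replication-role")
```

Keeping the config builder separate from the API calls makes it easy to inspect the rule before sending anything to AWS.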

We cannot do "chaining" of replication.

  • That means that if Bucket A has a replication into Bucket B which then has a replication into Bucket C and we add an object into bucket A, it won't make it all the way to Bucket C

S3 pre-signed URLs

Let's talk about AWS S3 pre-signed URLs

  • We can generate links that carry the same permissions on the file as the user who generated them.
  • Such links can be generated using the CLI (downloads) or the SDK (downloads and uploads)
  • By default they are valid for 3600 seconds, but we can change the timeout with the --expires-in argument, which takes a number of seconds
  • We can use them for cases like

    • Share link to content only with logged in users
    • Allow temporary actions for users

Here's the CLI command for it, shown with a placeholder bucket and key:

aws s3 presign s3://my-bucket/my-object --expires-in 3600
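
For the SDK side (which also covers uploads), a minimal Python sketch might look like the following. It assumes a boto3 S3 client is passed in; the function name, bucket and key are placeholders I made up. `generate_presigned_url` is the boto3 client method, and `ExpiresIn` plays the same role as `--expires-in` on the CLI.

```python
def presigned_url(s3_client, bucket, key, upload=False, expires_in=3600):
    """Return a temporary URL for downloading (GET) or uploading (PUT) one object."""
    client_method = "put_object" if upload else "get_object"
    return s3_client.generate_presigned_url(
        client_method,
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # lifetime in seconds, 3600 by default
    )


# Usage (assumes boto3 is installed and credentials are configured):
# import boto3
# s3 = boto3.client("s3")
# download_link = presigned_url(s3, "my-bucket", "reports/q1.csv")
# upload_link = presigned_url(s3, "my-bucket", "uploads/photo.jpg", upload=True, expires_in=300)
```

The URL is signed with the credentials of whoever created the client, so it can never grant more access than that identity already has.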

S3 Storage classes and Glacier

We're going to do a quick overview of AWS S3 Storage classes and Glacier!

1. S3 Standard, the general-purpose class with high durability across multiple AZs.

  • 99.99% availability throughout the year.
  • S3 Standard is great for data analytics, mobile & gaming applications and more

2. S3 Standard-Infrequent Access (IA)

  • For data that is less frequently accessed but requires almost instant access.
  • It also has high durability across multiple AZs
  • It has 99.9% availability
  • IA is good for disaster recovery, backups etc

3. S3 One Zone-Infrequent Access

  • The same principle as Standard-IA, but data is stored in a single AZ and it is 20% cheaper
  • It has 99.5% availability
  • Supports SSL for data in transit and encryption at rest
  • Good for secondary backups and other data you can recreate

4. S3 Intelligent tiering

  • Similar to S3 standard but automatically moves objects between tiers based on access patterns
  • It has a small monthly monitoring and auto-tiering fee

5. Glacier

  • Low cost and meant for archiving or backups
  • Good for long term retention, like tens of years
  • Archives are stored in vaults
  • There is a cost to retrieve, which gets more expensive for faster retrievals (anywhere from 1 minute to 12 hours)
  • The minimum storage duration is 90 days.

6. Glacier Deep Archive

  • For looong term storage and much cheaper than all the other options
  • However, the fastest retrieval of archives takes up to 12 hours
  • The minimum storage duration is 180 days (half a year)

To give you an idea of the costs, S3 Standard is $0.023 per GB per month, while Glacier Deep Archive is $0.00099 per GB per month.

https://aws.amazon.com/s3/pricing/
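
To make that difference concrete, here is a small back-of-the-envelope calculation in Python. It uses only the two per-GB monthly prices quoted above. The 1000 GB figure is just an example, and the sketch deliberately ignores retrieval fees, request fees and minimum storage durations, so treat it as a rough comparison rather than a bill estimate.

```python
# Storage prices quoted above, in USD per GB per month.
STANDARD_PER_GB_MONTH = 0.023
DEEP_ARCHIVE_PER_GB_MONTH = 0.00099


def monthly_storage_cost(gigabytes, price_per_gb_month):
    """Storage-only cost for one month, rounded to cents."""
    return round(gigabytes * price_per_gb_month, 2)


standard = monthly_storage_cost(1000, STANDARD_PER_GB_MONTH)
deep_archive = monthly_storage_cost(1000, DEEP_ARCHIVE_PER_GB_MONTH)

print(f"1000 GB in S3 Standard:          ${standard:.2f} per month")      # $23.00
print(f"1000 GB in Glacier Deep Archive: ${deep_archive:.2f} per month")  # $0.99
```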

S3 lifecycle rules

Let's talk about AWS S3 lifecycle rules

  • We can move between storage classes based on how often we access our objects.
  • For infrequently accessed objects, move them to Standard IA
  • For archives and objects we don't need instantly, we can move them to Glacier or Deep Archive
  • We can even automate that with transition actions, which define when objects are transitioned to another storage class

    • Example: Move objects to Standard IA class 60 days after creation. Then archive them to Glacier after 6 months
  • We can also set expiration actions, where we configure objects to be deleted after a set amount of time

    • Example: Access log files can be set to delete after 365 days. Expiration actions can also clean up old versions or incomplete multi-part uploads
  • We can create these rules based on prefixes like s3://somebucket/archives/*
  • We can also add rules based on certain object tags like Department: Sales
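
The examples above can be written down as a lifecycle configuration. Below is a minimal Python sketch that builds the dictionary boto3's `put_bucket_lifecycle_configuration` expects. The rule IDs, the `logs/` prefix, the 7-day multipart cleanup and the schedule on the tag-based rule are my own assumptions for illustration; only the 60 days, 6 months (taken as 180 days), 365 days, the `archives/` prefix and the `Department: Sales` tag come from the notes above.

```python
def build_lifecycle_config():
    """Return a LifecycleConfiguration dict with prefix-based and tag-based rules."""
    return {
        "Rules": [
            {
                # Transition action: Standard -> Standard IA -> Glacier
                "ID": "archive-old-objects",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": "archives/"},
                "Transitions": [
                    {"Days": 60, "StorageClass": "STANDARD_IA"},
                    {"Days": 180, "StorageClass": "GLACIER"},
                ],
            },
            {
                # Expiration action: delete access logs after a year
                "ID": "expire-access-logs",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},  # assumed log location
                "Expiration": {"Days": 365},
                # Also clean up uploads that never finished (7 days is an assumption)
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            },
            {
                # Tag-based rule: only objects tagged Department=Sales
                "ID": "sales-to-glacier",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Tag": {"Key": "Department", "Value": "Sales"}},
                "Transitions": [{"Days": 180, "StorageClass": "GLACIER"}],
            },
        ]
    }


def apply_lifecycle(s3_client, bucket):
    """Attach the rules to a bucket using a boto3-style S3 client."""
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=build_lifecycle_config()
    )


# Usage (assumes boto3 is installed and credentials are configured):
# import boto3
# apply_lifecycle(boto3.client("s3"), "somebucket")
```

Note that this call replaces whatever lifecycle configuration the bucket already has, so all the rules need to be sent together.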

AWS Athena

Today let's talk about AWS Athena

  • It is a serverless service that performs analytics directly against files in S3, using standard SQL to query them
  • It supports CSV, JSON and more.
  • Athena is quite common for cases like BI, analytics, reporting, and analyzing ELB and CloudTrail logs
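
To give a feel for what querying S3 with SQL looks like from code, here is a minimal Python sketch that starts an Athena query. It assumes a boto3 Athena client is passed in; the function name, database, table, column and output location are all placeholders I made up. `start_query_execution` is the boto3 call, and it returns as soon as the query is submitted rather than waiting for results.

```python
def start_athena_query(athena_client, sql, database, output_location):
    """Submit a SQL query to Athena and return its execution ID."""
    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes the result files to this S3 location
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


# Usage (assumes boto3 is installed, credentials are configured, and a
# hypothetical "elb_logs" table already exists in the "logs_db" database):
# import boto3
# query_id = start_athena_query(
#     boto3.client("athena"),
#     "SELECT elb_status_code, COUNT(*) AS requests FROM elb_logs GROUP BY elb_status_code",
#     "logs_db",
#     "s3://my-athena-results/",
# )
```

The returned ID can later be passed to `get_query_execution` to check the status and to `get_query_results` to read the rows.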

Summary

I have to admit, most of these features are not really that common out there. In my last 2-3 jobs I have only seen very basic usage of S3 compared to what we discussed here. However, judging by the examples, these features must be used by plenty of companies, especially when there are so many different storage tiers.

Next week we are going to talk about the service that made me want to learn more about how AWS works: ECS. We'll cover managing Docker containers, ECR and Fargate.
