AWS EC2 storage

AMI

  • Build AMI from an EC2 instance
    • start an EC2 instance and customize it
    • stop the instance (for data integrity)
    • build an AMI - this will also create EBS snapshots
    • launch other EC2 instances from the AMI
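
The steps above can be sketched in Python. This is a minimal sketch, not a definitive implementation: it assumes `ec2` is a boto3 EC2 client (for example `boto3.client("ec2")`), and the function names, the default instance type and the image name are hypothetical choices made for illustration.

```python
def build_ami(ec2, instance_id, name):
    """Stop an instance, create an AMI from it, and return the new image ID.

    `ec2` is assumed to be a boto3 EC2 client, or any object exposing
    the same methods (which makes the function easy to test with a fake).
    """
    # Stop the instance first so the disk is in a consistent state.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # Creating the image also creates snapshots of the attached EBS volumes.
    image_id = ec2.create_image(InstanceId=instance_id, Name=name)["ImageId"]
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])
    return image_id


def launch_from_ami(ec2, image_id, instance_type="t3.micro", count=1):
    """Launch `count` new instances from the AMI and return their IDs."""
    response = ec2.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,  # hypothetical default, pick your own
        MinCount=count,
        MaxCount=count,
    )
    return [instance["InstanceId"] for instance in response["Instances"]]
```

Passing the client in as an argument keeps the sketch free of credentials and Region settings, which depend on your environment.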

EC2 instance store

  • EBS volumes are network drives with good but limited performance
  • if you need a high-performance hardware disk, use EC2 instance store
    • better I/O performance
    • data is lost if the instance is stopped or terminated (ephemeral storage)
    • good for buffer/cache/scratch data/temporary content
    • risk of data loss if hardware fails
    • backups and replication are your responsibility
  • storage optimised instance types (e.g. i2, i3, i4i) come with instance store

EBS

  • An EBS (Elastic Block Store) volume is a network drive that you can attach to your instances while they run
    • it's a network drive, not a physical drive, which means there might be a bit of latency
    • it can be detached from an EC2 instance and attached to another one quickly
    • some EBS can be attached to multiple instances (multi-attach)
  • Availability: when you create an EBS volume, it is automatically replicated within its AZ to prevent data loss due to failure of any single hardware component
  • Persistence: an EBS volume is off-instance storage that can persist independently from the life of an instance
    • it has a provisioned capacity (size in GBs and IOPS)
    • you can increase the capacity of the drive over time
    • Delete on termination
      • root volume: default is true which means it will delete the root volume of the instance when the instance terminates; the default value can be changed when launching the instance
      • non-root volume: default is false which means it will preserve the volumes; you can take a snapshot of the preserved volume or attach it to another instance
  • AZ: they're bound to a specific availability zone
    • you can attach an EBS volume to any EC2 instance in the same AZ
    • but an EBS volume in one AZ cannot be attached to an instance in another AZ
  • Flexibility: EBS volumes support live configuration changes while in production, which means you can modify volume types, volume size and IOPS capacity without service interruptions.
  • Analogy: think of them as a network USB stick
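
The delete-on-termination defaults described above can be modelled in a few lines. This is an illustrative sketch only: the `AttachedVolume` class and helper names are hypothetical, and the device name `/dev/xvda` is an assumption that depends on the AMI. The dictionary returned by `keep_root_volume_mapping` follows the shape of the `BlockDeviceMappings` parameter accepted by boto3's `run_instances`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AttachedVolume:
    volume_id: str
    is_root: bool
    # None means "use the default for this kind of volume".
    delete_on_termination: Optional[bool] = None

    def deleted_when_instance_terminates(self) -> bool:
        if self.delete_on_termination is not None:
            return self.delete_on_termination
        # Defaults from the notes: True for the root volume, False otherwise.
        return self.is_root


def surviving_volumes(volumes: List[AttachedVolume]) -> List[str]:
    """IDs of the volumes that are preserved after the instance terminates."""
    return [v.volume_id for v in volumes if not v.deleted_when_instance_terminates()]


def keep_root_volume_mapping(root_device_name: str = "/dev/xvda") -> list:
    """BlockDeviceMappings value that overrides the root-volume default at launch."""
    return [{"DeviceName": root_device_name, "Ebs": {"DeleteOnTermination": False}}]
```

For example, an instance with a default root volume and a default data volume keeps only the data volume after termination.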

EBS snapshots

  • Make a backup (snapshot) of your EBS volume at a point in time
  • Not necessary to detach volume to do snapshot, but recommended
  • A snapshot can be copied to another Region and restored into any AZ
  • EBS snapshot archive
    • move a snapshot to an archive tier that is 75% cheaper
    • restoring from the archive takes 24 to 72 hours
  • Recycle bin for EBS snapshots
    • set up rules to retain deleted snapshots so you can recover them after an accidental deletion
    • specify the retention period (from 1 day to 1 year)
  • Fast snapshot restore (FSR)
    • forces full initialisation of the snapshot so that volumes created from it have no latency on first use (incurs charges)
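
The archive trade-off above (75% cheaper, but 24 to 72 hours to restore) can be turned into a quick back-of-the-envelope check. This is a rough sketch under stated assumptions: the price per GiB-month used in the example is a made-up placeholder rather than an AWS price, and the stored size is assumed to be the same in both tiers.

```python
ARCHIVE_DISCOUNT = 0.75   # from the notes: the archive tier is 75% cheaper
MAX_RESTORE_HOURS = 72    # from the notes: restoring takes 24 to 72 hours


def monthly_cost(size_gib, price_per_gib_month, archived=False):
    """Monthly storage cost of a snapshot in the standard or archive tier."""
    rate = price_per_gib_month
    if archived:
        rate = rate * (1 - ARCHIVE_DISCOUNT)
    return size_gib * rate


def can_archive(acceptable_restore_hours):
    """True if the worst-case restore delay is still acceptable."""
    return acceptable_restore_hours >= MAX_RESTORE_HOURS


# Hypothetical price of 0.05 per GiB-month, for illustration only.
standard = monthly_cost(100, 0.05)
archived = monthly_cost(100, 0.05, archived=True)
print(f"standard: {standard:.2f}, archived: {archived:.2f}")
```

A snapshot that may be needed within a day fails the `can_archive` check even though the archive tier is cheaper.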

EBS volume types

  • General Purpose SSD (gp2/gp3): balance price and performance for a wide variety of workloads

    • cost effective storage, low latency
    • volume size: 1GiB - 16TiB
    • max throughput 250MiB/s (gp2), 1000MiB/s (gp3)
    • max IOPS: 16000
    • for gp3: IOPS (up to 16000) and throughput (up to 1000MiB/s) can be provisioned independently of the volume size
    • use cases
      • transactional workloads
      • medium-sized, single-instance databases
      • boot volumes, development and test environments
  • Provisioned IOPS SSD (io1, io2 block express): high performance SSD volume for mission-critical low-latency or high-throughput workloads

    • volume size: 4GiB - 64TiB (io2 block express), 4GiB - 16TiB (io1)
    • max throughput: 4000MiB/s (io2 block express), 1000MiB/s (io1)
    • max IOPS: 256000 (io2 block express), 64000 (io1)
    • io2 block express use cases
      • sub-millisecond latency
      • sustained IOPS performance
      • more than 64000 IOPS or 1000 MiB/s of throughput
    • io1 use cases
      • workloads that require sustained IOPS performance or more than 16000 IOPS
      • I/O intensive database workloads
  • Throughput Optimised HDD: low cost HDD volume designed for frequently accessed, throughput-intensive workloads

    • max IOPS: 500, max throughput: 500MiB/s
    • use cases:
      • big data
      • data warehouses
      • log processing
  • Cold HDD: lowest cost HDD volume designed for less frequently accessed workloads

    • max IOPS: 250, max throughput: 250MiB/s
    • use cases:
      • throughput-oriented storage for data that is infrequently accessed
      • scenarios where the lowest storage cost is important
  • Can be used as boot volumes: gp2/gp3, io1/io2 block express

  • Supports EBS multi-attach: io1/io2 block express
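
The limits listed above can be collected into a small lookup that filters volume types by requirement. This is a sketch for revision purposes, not a sizing tool: the numbers are copied from these notes and may lag behind current AWS limits, the function name is hypothetical, and `st1` and `sc1` are the API names for Throughput Optimised HDD and Cold HDD.

```python
# (max IOPS, max throughput in MiB/s), copied from the notes above.
LIMITS = {
    "sc1": (250, 250),        # Cold HDD
    "st1": (500, 500),        # Throughput Optimised HDD
    "gp2": (16_000, 250),
    "gp3": (16_000, 1_000),
    "io1": (64_000, 1_000),
    "io2": (256_000, 4_000),  # io2 Block Express
}
BOOTABLE = {"gp2", "gp3", "io1", "io2"}
MULTI_ATTACH = {"io1", "io2"}


def candidate_volume_types(iops, throughput_mib_s, boot=False, multi_attach=False):
    """Volume types whose limits cover the requirement, in the order listed above."""
    result = []
    for name, (max_iops, max_throughput) in LIMITS.items():
        if iops > max_iops or throughput_mib_s > max_throughput:
            continue
        if boot and name not in BOOTABLE:
            continue
        if multi_attach and name not in MULTI_ATTACH:
            continue
        result.append(name)
    return result


print(candidate_volume_types(20_000, 500))          # ['io1', 'io2']
print(candidate_volume_types(3_000, 125, boot=True))  # ['gp2', 'gp3', 'io1', 'io2']
```

The function only checks hard limits. Choosing among the remaining candidates is still a price and workload decision.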

Amazon Data Lifecycle Manager (DLM)

  • automate the creation, retention, copy and deletion of snapshots and AMIs
    • fast snapshot restore integration
    • built-in cross-region copy
    • automated cross-account snapshots copy

EBS multi-attach

  • attach the same EBS volume to up to 16 EC2 instances built on Nitro System in the same AZ
  • each instance has full read & write permissions to the high-performance volume
  • only Provisioned IOPS SSD (io1 and io2) volume supports multi-attach
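
The three multi-attach constraints above can be expressed as a pre-flight check. This is a hypothetical helper for illustration and is not part of any AWS SDK. It assumes instances are given as `(instance_id, az, is_nitro)` tuples.

```python
MAX_INSTANCES = 16
MULTI_ATTACH_TYPES = {"io1", "io2"}


def multi_attach_problems(volume_type, volume_az, instances):
    """Return a list of reasons why multi-attach would fail (empty list means OK).

    `instances` is a list of (instance_id, az, is_nitro) tuples.
    """
    problems = []
    if volume_type not in MULTI_ATTACH_TYPES:
        problems.append(f"{volume_type} does not support multi-attach")
    if len(instances) > MAX_INSTANCES:
        problems.append(
            f"{len(instances)} instances exceeds the limit of {MAX_INSTANCES}"
        )
    for instance_id, az, is_nitro in instances:
        if az != volume_az:
            problems.append(f"{instance_id} is in {az}, volume is in {volume_az}")
        if not is_nitro:
            problems.append(f"{instance_id} is not a Nitro-based instance")
    return problems
```

Returning every problem at once, instead of failing on the first one, makes it easier to see all the constraints a setup violates.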

EBS encryption

  • You can create an encrypted EBS volume:
    • data at rest is encrypted inside the volume
    • all the data in flight moving between the instance and the volume is encrypted
    • all snapshots taken of the volume are encrypted
    • all volumes created from the encrypted snapshots are encrypted
  • Encryption and decryption are handled transparently using the keys from KMS (you have nothing to do)
  • Encryption has a minimal impact on latency
  • You can enable encryption when copying an unencrypted snapshot
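
Enabling encryption while copying an unencrypted snapshot could look like the sketch below. It only builds the keyword arguments for boto3's `copy_snapshot` call. The helper name is hypothetical, and the snapshot ID and Region in the usage comment are placeholders.

```python
def encrypted_copy_params(snapshot_id, source_region, kms_key_id=None):
    """Keyword arguments for an EC2 copy_snapshot call that encrypts the copy."""
    params = {
        "SourceSnapshotId": snapshot_id,
        "SourceRegion": source_region,
        "Encrypted": True,
    }
    # Without an explicit key, the account's default KMS key for EBS is used.
    if kms_key_id is not None:
        params["KmsKeyId"] = kms_key_id
    return params


# Usage, assuming `ec2` is a boto3 EC2 client in the destination Region:
#   ec2.copy_snapshot(**encrypted_copy_params("snap-0123example", "eu-west-1"))
```

Volumes created from the encrypted copy are then encrypted as well, which is the usual route for encrypting an existing unencrypted volume.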

EFS

  • Elastic File System: serverless, fully elastic file storage that can be mounted on many EC2 instances
  • EFS works with EC2 instances across multiple AZs and is compatible only with Linux-based AMIs
  • Use security groups to control access to EFS, encryption at rest using KMS
  • Performance mode
    • general purpose (recommended): high performance and latency-sensitive applications
    • max I/O: highly parallelised workloads that can tolerate higher latencies
  • Throughput mode
    • elastic: for workloads with unpredictable I/O; performance scales automatically with your workload
    • provisioned: set your throughput
    • bursting: throughput scales with the amount of storage for workloads with basic performance
  • Highly scalable, highly available and highly durable, pay-per-use, no capacity planning
    • regional (recommended): store data redundantly across multiple AZ within one region
    • one zone: store data within a single AZ in one region
  • Storage classes
    • Standard: for frequently accessed files
    • IA (infrequent access): lower price to store, but a cost to retrieve files
    • Archive: cost optimised for data that is accessed only a few times each year or less
  • The EFS lifecycle policies
    • transition to IA: by default, files that are not accessed in Standard storage for 30 days are transitioned into IA.
      • from 1 day to 365 days since last access
    • transition to Archive: by default, files that are not accessed in Standard storage for 90 days are transitioned into Archive.
      • from 60 days to 365 days since last access
    • Transition into Standard: by default, files are not moved back to Standard storage, and they remain in the IA or Archive storage class when they are accessed. For performance-sensitive use cases that demand the fastest latency performance, choose to transition files into Standard storage on first access.
      • None, or On first access
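
The lifecycle rules above can be modelled as a small function that returns the storage class of a file. This is a simplified sketch of the behaviour described in these notes, not the EFS API: the class and function names are hypothetical, and a file is assumed to transition as soon as it reaches the configured number of days without access.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LifecyclePolicy:
    to_ia_days: Optional[int] = 30        # default from the notes
    to_archive_days: Optional[int] = 90   # default from the notes
    back_to_standard_on_access: bool = False  # default: files stay where they are


def storage_class(policy: LifecyclePolicy, days_since_last_access: int) -> str:
    """Storage class of a file left unaccessed for the given number of days."""
    if policy.to_archive_days is not None and days_since_last_access >= policy.to_archive_days:
        return "Archive"
    if policy.to_ia_days is not None and days_since_last_access >= policy.to_ia_days:
        return "IA"
    return "Standard"


def storage_class_after_access(policy: LifecyclePolicy, current_class: str) -> str:
    """Storage class of a file right after it has been read or written."""
    if policy.back_to_standard_on_access:
        return "Standard"
    return current_class
```

With the default policy, a file untouched for 45 days sits in IA and stays there when it is read, unless "On first access" is enabled.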

EFS vs EBS

  • EBS volumes
    • attach to one instance (except multi-attach io1/io2)
    • are locked at the AZ level
    • to migrate an EBS volume across AZ
      • take a snapshot
      • restore the snapshot to another AZ
  • EFS
    • mount on hundreds of instances across AZs
    • only for Linux instances
    • has a higher price point than EBS, can leverage EFS-IA/archive for cost savings
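
The two-step migration of an EBS volume across AZs listed above (take a snapshot, restore it in another AZ) might be sketched as follows. It assumes `ec2` is a boto3 EC2 client. The function name is hypothetical, and error handling and cleanup of the intermediate snapshot are left out.

```python
def migrate_volume_to_az(ec2, volume_id, target_az):
    """Copy an EBS volume into another AZ via a snapshot; return the new volume ID.

    `ec2` is assumed to be a boto3 EC2 client, or an object with the same methods.
    """
    # Step 1: take a snapshot of the source volume and wait for it to complete.
    snapshot_id = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"migrate {volume_id} to {target_az}",
    )["SnapshotId"]
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    # Step 2: restore the snapshot as a new volume in the target AZ.
    new_volume_id = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=target_az,
    )["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[new_volume_id])
    return new_volume_id
```

The original volume is left untouched, so it has to be detached and deleted separately once the new one is in use.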