AWS

Created: May 01, 2020

EC2

Limits: default limits for accounts. Can be changed by contacting AWS.
Reserved instances: purchase a reserved instance for 1y or 3y. In return, you get a discount.
Spot requests: purchase spare capacity at a discounted rate. The caveat is AWS can shut down the instances (with a bit of notice). State should be saved elsewhere.
Savings plan: similar to reserved instances. Compute Savings Plans and EC2 Instance Savings Plans are available. Option to choose different models.
Scheduled instances: purchase capacity on a recurring schedule.
Capacity reservations: reserve capacity to ensure it is available.
Static (Elastic) IPs: you are charged while one is allocated but not associated with a running instance.

Launching EC2 instances

Public subnet: has Internet gateway, has entry in route table pointing to Internet gateway and EC2 instances have public IP addresses.
The .pem key must be chmod 400 before use with ssh; otherwise ssh refuses the key because its permissions are too open.

EC2 metadata

To view all metadata categories, from an EC2 instance:

curl http://169.254.169.254/latest/meta-data

The IP is always the same, regardless of EC2 instance.
To retrieve information from a category:

curl http://169.254.169.254/latest/meta-data/[CATEGORY]
curl http://169.254.169.254/latest/meta-data/public-hostname

EC2 user data

Shell commands that can be specified at instance creation.
To retrieve user data from inside an EC2 instance:

curl http://169.254.169.254/latest/user-data

Status checks and monitoring

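System Status Checks: problems with the underlying AWS infrastructure (AWS’s fault, AWS’s fix).
Instance Status Checks: problems with the instance’s own software or network configuration (the admin’s to fix).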
Monitoring (CloudWatch): basic is free. Detailed is not.
Basic ticks at 5 minutes and is an average.
Alarms: can take action and/or create notification.

IP addresses

Name	Description
Public IP address	Lost when the instance is stopped (dynamic)
	Used in Public Subnets
	No charge
	Associated with a private IP address of the instance
	Cannot be moved between instances
Private IP address	Retained when the instance is stopped
	Used in Public and Private Subnets
Elastic IP address	Static Public IP address
	You are charged if not used
	Associated with a private IP address on the instance
	Can be moved between instances and Elastic Network Interfaces

Internet gateway performs NAT.
Associating an elastic IP address dissociates a dynamic IP address.

NAT Gateway

NAT Instance (older)	NAT Gateway
Managed by you (e.g. software updates)	Managed by AWS
Scale up manually (instance type) and use enhanced networking	Elastic scalability up to 45 Gbps
No HA – scripted/auto-scaled HA possible using multiple NATs in multiple subnets	Provides automatic high availability within an AZ and can be placed in multiple AZs
Need to assign Security Groups	No Security Groups
Can use as a bastion host	Cannot access through SSH
Use an Elastic IP address or a public IP address with a NAT instance	Choose the Elastic IP address to associate with a NAT gateway at creation
Can implement port forwarding through manual customisation	Does not support port forwarding

NAT Instance is an EC2 instance.
NAT Gateways must be in a public subnet.
Set the NAT device as the 0.0.0.0/0 route in a route table. Associate a subnet with the route table.
A pre-configured image for a NAT Instance is available by searching “amzn-ami-vpc-nat” in Community AMIs.
In the EC2 network settings for the NAT Instance, Source/Destination Check must be disabled so that the instance can forward traffic that is not addressed to or from itself.

EC2 Placement Groups

EC2 instances can be placed in Placement Groups when they are launched. This can be done retroactively, however, it isn’t recommended.
Placement Groups determine which physical hardware instances are on.
There are three types:
- Cluster: pack instances close together inside an AZ. This helps achieve low latency networking.
- Partition: spreads your instances across logical partitions such that groups of instances in one partition do not share the underlying hardware with groups of instances in different partitions. Used for large distributed and replicated workloads. Each partition in one rack. Partitions can be in multiple AZs.
- Spread: strictly places a small group of instances across distinct underlying hardware to reduce correlated failures. Each instance is in a separate rack.
There are requirements on what instances can go into each type of PG.

	Clustered	Spread	Partition
What	Instances are placed into a low-latency group within a single AZ.	Instances are spread across underlying hardware.	Instances are grouped into logical segments which use distinct hardware.
When	Need low network latency and/or high network throughput.	Reduce the risk of simultaneous instance failure if underlying hardware fails.	Need control and visibility into instance placement.
Pros	Get the most out of enhanced networking instances.	Can span multiple AZs.	Reduce likelihood of correlated failures for large workloads.
Cons	Finite capacity: recommend launching all you might need up-front.	Maximum of 7 instances running per group, per AZ.	Partition placement groups are not supported for dedicated hosts.

Networking Interfaces

Elastic Network Interface (ENI): virtual NIC.
- Can have IP address, security group, MAC, source/destination check flag and description.
- ENI is bound to an AZ.
- Can be attached/detached from different instances in an AZ.
- eth0 cannot be moved.
Elastic Network Adapter (ENA):
- Used for Enhanced Networking.
- Provides higher bandwidth, higher packet-per-second (PPS) performance, and lower latency.
- Must launch an HVM AMI.
- Only available for certain instance types.
Elastic Fabric Adapter (EFA):
- An ENA with more capabilities.
- Enables customers to run applications requiring high levels of inter-node communications at scale on AWS.
- With EFA, High Performance Computing (HPC) applications using Message Passing Interface (MPI) and Machine Learning (ML) applications using NVIDIA Collective Communications Library (NCCL) can scale to thousands of CPUs or GPUs.

Using S3 in EC2

The bad way:

You need an access key from IAM > Users > Security credentials.
When inside an EC2 virtual terminal, type aws configure.
Enter the details from the access key.
S3 can now be accessed via commands such as aws s3 ls.
This is insecure because there is a ~/.aws/credentials file with the creds stored in plain text.

The good way:

Create a role in IAM and assign it permission to access S3.
Assign the role to the EC2 instance.
S3 can now be accessed via commands such as aws s3 ls.

EC2 Auto Scaling Groups (ASG)

Enables automatic launching and termination of instances.
Metrics define elasticity.
Metrics can be fed into CloudWatch, which can trigger the Auto Scaling group to create or destroy instances.
Auto scaling groups can also be used with Elastic Load Balancers, rather than CloudWatch.
When creating an ELB for use with auto scaling groups, avoid registering instances as targets. This will be controlled by the auto scaling group.
There are three types of scaling policy: simple, step scaling and target tracking scaling.
With step scaling, each step adjustment specifies the following:
- A lower bound for the metric value.
- An upper bound for the metric value.
- The amount by which to scale, based on the scaling adjustment type.
Target tracking scaling: set a target and the ASG attempts to meet it.
Launch Configurations allow us to specify settings for instances launched by an ASG.
Launch Configuration cannot be modified, only copied and deleted.
Launch Templates are a new alternative to Launch Configuration.
Launch Templates have versioning.
Launch Templates have more options than Launch Configuration.
In EC2, templates can be created from existing instances.
When using a Launch Template in an ASG, there is a Fleet Composition option. Fleet Composition allows both On-Demand and Spot instances to be requested (as opposed to one or the other). There is an Instance Type option, to be used with Fleet Composition, whereby multiple types of acceptable instances can be specified.
Once created, there are more configuration options for Fleet Composition. For example, the ability to specify a percentage of Spot and a percentage of On-Demand.
By default, ASG uses EC2 Status Checks.
ELB Health Checks are an optional (recommended) setting in ASG.
If ELB Health Checks are enabled in ASG, both types are used.
If both are enabled, both checks need to pass.
Default termination policy:
1. Determine which AZ has the most instances.
2. (Launch Templates only) attempts to meet the target we have set for On-Demand or Spot instances.
3. Determines which instance uses the oldest launch template.
4. Determines which instance uses the oldest launch configuration.
5. If several candidates remain, it terminates the instance closest to the next billing hour.
There are various other termination policies.
We can protect instances from termination with “Instance Protection: Protect From Scale In”.
With Suspended Processes, we can specify processes that an ASG should not perform. For example, specifying terminate means that an ASG will not terminate instances. Good for troubleshooting instances.
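
The step adjustments described in this section can be sketched in Python. This is a minimal model, not the ASG implementation: it assumes each bound is an offset from the CloudWatch alarm threshold, that the lower bound is inclusive and the upper bound exclusive, and that the adjustment type is a simple change in capacity. StepAdjustment and capacity_change are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StepAdjustment:
    lower_bound: Optional[float]  # offset from the threshold; None = no lower limit
    upper_bound: Optional[float]  # offset from the threshold; None = no upper limit
    adjustment: int               # instances to add (assumed "change in capacity")


def capacity_change(steps, metric_value, threshold):
    """Return the adjustment of the step whose interval contains the breach size."""
    breach = metric_value - threshold
    for step in steps:
        above_lower = step.lower_bound is None or breach >= step.lower_bound
        below_upper = step.upper_bound is None or breach < step.upper_bound
        if above_lower and below_upper:
            return step.adjustment
    return 0  # no step matched, so capacity is left alone


# Hypothetical policy: alarm threshold of 50% CPU, scaling harder as the breach grows.
steps = [
    StepAdjustment(0, 10, 1),
    StepAdjustment(10, 20, 2),
    StepAdjustment(20, None, 3),
]
print(capacity_change(steps, metric_value=62, threshold=50))
```

Target tracking needs no steps: only the target value is supplied and the ASG works out the adjustments itself.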

Storage

Instance store volumes are ephemeral local disks that offer very high performance.

IAM

Only one IAM role can be attached to an EC2 instance at a time.
IAM roles are universal and region-less.
Preferable to storing private keys on instances.

Roles

An identity to which you can assign policies. Policies have permissions.
Can be temporarily assigned to EC2 Instances.
Avoids storing secrets on instances.

Security groups

A firewall applied at the instance level.
Multiple security groups can be applied to an instance.
Source/destination can be a security group as well as IP and subnet.

Elastic Load Balancing (ELB) and Load Balancing

You cannot load balance between regions.
High availability and fault tolerance.
Health checks used to determine if an instance is available.
Application and network load balancers have target groups.
An instance can be in multiple target groups.

ELB Types

Application Load Balancer

Operates at layer 7.
Path-based routing, host-based routing, query string parameter-based routing, and source IP address-based routing.
Path-based routing is based on the part of the URL after the domain name.
Host-based routing is based on the Host field in the HTTP header.
Supports IP addresses, Lambda Functions and containers as targets.
Instance Protocol and Load Balancer Protocol: HTTP and HTTPS.
Can specify rules.
You can only have one listener per protocol.

Network Load Balancer

Operates at layer 4.
Very high performance, low latency and TLS offloading at scale.
Can have static IP/Elastic IP.
Supports UDP and static IP addresses as targets.
Instance Protocol: TCP, and TCP_UDP.
Load Balancer Protocol: TCP, TLS, UDP, and TCP_UDP.

Classic Load Balancer

Old generation; not recommended for new applications.
Performs routing at layer 4 and layer 7.
Use for existing applications running in EC2-Classic.
Instance Protocol and Load Balancer Protocol: TCP, SSL, HTTP, and HTTPS.
Addressed using a domain name only. IPs are kept private.
CLB can be Internet-facing, Private or Multi-tier (hybrid).
CLBs use EC2 instances and take up IP addresses. AWS recommend at least a /27 mask on the subnet and at least 8 IP addresses are reserved. When the load balancer needs to scale, these IP addresses are used.
Instance Protocol and Load Balancer Protocol must be on the same layer.

Cross-Zone Load Balancing

When AZs contain different numbers of nodes and Cross-Zone Load Balancing is disabled, each AZ receives an equal share of the traffic, so nodes in different AZs receive different percentages. For example, with two AZs, three nodes in Zone A each get about 16.7%, and two nodes in Zone B each get 25%.
With Cross-Zone Load Balancing enabled, all nodes get an equal amount of traffic.
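
The arithmetic behind those percentages can be sketched in Python. This assumes traffic is first split evenly between AZs and that every AZ has at least one node; traffic_share_per_node is a hypothetical helper name.

```python
def traffic_share_per_node(nodes_per_az, cross_zone):
    """Return the fraction of total traffic each node receives, keyed by AZ."""
    total_nodes = sum(nodes_per_az.values())
    az_count = len(nodes_per_az)
    shares = {}
    for az, nodes in nodes_per_az.items():
        if cross_zone:
            # Every node in every AZ gets the same slice.
            shares[az] = 1 / total_nodes
        else:
            # Each AZ gets an equal slice, divided among its own nodes.
            shares[az] = (1 / az_count) / nodes
    return shares


zones = {"zone-a": 3, "zone-b": 2}
print(traffic_share_per_node(zones, cross_zone=False))
print(traffic_share_per_node(zones, cross_zone=True))
```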

Different types of LB have different options and defaults:

Name	Created through Console	Created through CLI/API	Can be enabled/disabled?
ALB	Enabled	Enabled	No
NLB	Disabled	Disabled	Yes
CLB	Enabled	Disabled	Yes

ELB Sticky Sessions

Stick a session for a certain period, such that the client isn’t directed to another server, potentially breaking their session.
Uses cookies.
ALBs support load balancer generated cookies (name: AWSALB) but do not support application generated cookies.
NLBs do not support cookies.
CLBs support both load balancer generated cookies and application generated cookies.

Load Balancing Instances in Private Subnets

If target group instances are in private subnets, an Internet-facing ELB must be in a public subnet in the same AZ.

ELB Connections and Logging

ELBs can be configured to store logs in S3 buckets.
X-Forwarded-For headers and Proxy Protocol are mechanisms that proxies can use to ensure that the correct source address is forwarded to endpoints.
X-Forwarded-For operates on layer 7.
Proxy Protocol operates on layer 4.
There are some requirements when SSL is involved, outlined here.
When there is full end-to-end SSL encryption, Proxy Protocol is not supported.
NLBs send IPs by default and can be configured to also use Proxy Protocol.
If targets are specified by IP address in an NLB, source IPs won’t be present and use of Proxy Protocol is necessary.
ALBs send X-Forwarded-For by default.
Apache needs configuring to log X-Forwarded-For: edit /etc/httpd/conf/httpd.conf.

The LogFormat section needs changing to something like this:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" proxy
SetEnvIf X-Forwarded-For "^.*\..*\..*\..*" forwarded
CustomLog "logs/access_log" combined env=!forwarded
CustomLog "logs/access_log" proxy env=forwarded

Source: https://www.loadbalancer.org/blog/apache-and-x-forwarded-for-headers/
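
On the application side, the header is a comma-separated list with the original client first and each proxy appended after it. A minimal Python sketch of reading it follows; the function names are hypothetical, and the left-most entry is only the claimed client address, since a client can send its own X-Forwarded-For value.

```python
import ipaddress


def parse_x_forwarded_for(header_value):
    """Split an X-Forwarded-For value into a list of valid IP address strings."""
    addresses = []
    for part in header_value.split(","):
        candidate = part.strip()
        try:
            ipaddress.ip_address(candidate)
        except ValueError:
            continue  # skip empty or malformed entries
        addresses.append(candidate)
    return addresses


def original_client(header_value):
    """Return the left-most valid address (the claimed client), or None."""
    addresses = parse_x_forwarded_for(header_value)
    return addresses[0] if addresses else None


print(original_client("203.0.113.7, 10.0.1.25"))
```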

Storage

S3 is object storage and is accessed via a REST API.
Elastic Block Store (EBS) is block storage for OSes and applications with high performance requirements. An EBS volume is only available within the AZ in which it was created. EBS can only be mounted to a single EC2 instance.
Elastic File System (EFS) is file storage. It can be mounted in multiple AZs using NFSv4. Multiple endpoints can mount one filesystem. EFS filesystems can be accessed outside of AWS using a VPN.

S3

S3 is object storage.
Sits outside the VPC.
You can access buckets using URLs.
- http://bucket.s3.aws-region.amazonaws.com
- http://s3.aws-region.amazonaws.com/bucket
- There is some variation in the format of aws-region
Objects have a key, version ID, value, metadata, sub-resources and access control info.
An S3 Gateway Endpoint provides a private connection to S3 and avoids routing over the public Internet.

S3 Gateway Endpoint

We choose which route table we want it in when it’s created.
Creates a routing table entry automatically when created.

VPC

IPv6 can be enabled at VPC creation.
When creating a VPC, there is a tenancy option. The options are default or dedicated hardware. Dedicated hardware might be used for regulatory reasons.
By default a VPC is created with DNS hostnames disabled, which means a DNS name will not be assigned by default. This can be enabled through actions.
Amazon reserves the first four IP addresses and the last IP address in each subnet (five in total).
VPC is created with a master address range; subnets use addresses in this range. This master range is impossible to change; the VPC needs to be deleted and re-created.
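
The reserved-address rule can be checked with Python's standard ipaddress module. A minimal sketch, assuming an IPv4 subnet of /28 or larger; the function names are hypothetical.

```python
import ipaddress


def aws_reserved_addresses(cidr):
    """Return the first four addresses and the last address of the subnet."""
    subnet = ipaddress.ip_network(cidr)
    first_four = [subnet.network_address + offset for offset in range(4)]
    return [str(address) for address in first_four] + [str(subnet.broadcast_address)]


def aws_usable_address_count(cidr):
    """Return how many addresses remain for instances after the five reserved ones."""
    return ipaddress.ip_network(cidr).num_addresses - 5


print(aws_reserved_addresses("10.0.0.0/24"))
print(aws_usable_address_count("10.0.0.0/24"))
```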

Security groups

Security groups exist within a VPC.
Security groups are essentially instance-level firewalls on a NIC.
Any new security groups have no inbound allow by default.
Up to 5 security groups can be attached to an instance.
There are only permits. No denies.
All groups attached to an instance are evaluated for allows.
If there are no allows, there is an implicit deny at the end.
A /32 netmask implies a specific IP address and no other.
Security groups are stateful: return traffic for a connection that was allowed in is automatically allowed back out.
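
The allow-only evaluation can be sketched in Python. This is a simplified model with hypothetical names (AllowRule, is_inbound_allowed): it only handles CIDR sources, not security-group sources, and ignores statefulness.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowRule:
    protocol: str
    from_port: int
    to_port: int
    source_cidr: str


def is_inbound_allowed(security_groups, protocol, port, source_ip):
    """Return True if any rule in any attached group allows the traffic."""
    source = ipaddress.ip_address(source_ip)
    for group in security_groups:
        for rule in group:
            if (
                rule.protocol == protocol
                and rule.from_port <= port <= rule.to_port
                and source in ipaddress.ip_network(rule.source_cidr)
            ):
                return True
    return False  # implicit deny: nothing matched


web = [AllowRule("tcp", 443, 443, "0.0.0.0/0")]
admin = [AllowRule("tcp", 22, 22, "203.0.113.10/32")]  # /32 = exactly one address
print(is_inbound_allowed([web, admin], "tcp", 22, "203.0.113.10"))
```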

Network ACLs (NACLs)

A NACL is a firewall at the subnet level.
The NACL is actually on the VPC router.
A NACL can have allow or deny rules.
There is an implicit deny at the end of a NACL.
NACL rules are numbered and processed in order, lowest number first.
A subnet can only have one NACL.
Default NACL = allow everything inbound and outbound.
Newly created custom NACL = deny everything inbound and outbound.
NACLs are stateless: traffic that is allowed in must also be explicitly allowed back out, if that is the desired behaviour.
The gotcha with allowing traffic back out is that replies go to the client’s ephemeral port, which can be anywhere in 1024 - 65535. For example, responses to inbound HTTP requests.
Amazon suggests large gaps are left between each NACL rule number (e.g. 100, 200).
The IPv6 rule number should be the IPv4 rule number + 1.
The first allow or deny is processed and nothing is processed after that. For a deny to be effective, it needs to be a lower numbered rule than an allow.
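
First-match processing can be sketched in Python. This is a simplified, hypothetical model (NaclRule, evaluate_nacl) for one direction of traffic, with the implicit deny represented by the final return.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int
    action: str  # "allow" or "deny"
    protocol: str
    from_port: int
    to_port: int
    cidr: str


def evaluate_nacl(rules, protocol, port, source_ip):
    """Return the action of the lowest-numbered matching rule, else "deny"."""
    source = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if (
            rule.protocol == protocol
            and rule.from_port <= port <= rule.to_port
            and source in ipaddress.ip_network(rule.cidr)
        ):
            return rule.action  # first match wins; later rules are ignored
    return "deny"  # implicit deny at the end


rules = [
    NaclRule(200, "allow", "tcp", 80, 80, "0.0.0.0/0"),
    NaclRule(100, "deny", "tcp", 80, 80, "198.51.100.0/24"),
]
print(evaluate_nacl(rules, "tcp", 80, "198.51.100.5"))
```

The deny at rule 100 takes effect only because it is numbered lower than the allow at rule 200.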

VPC Peering Connections

VPC peering allows traffic to be sent between multiple different VPCs.
The VPCs can be in different accounts and different regions.
VPCs in different regions are connected via the AWS backbone, rather than the Internet.
VPCs can be redundantly connected.
Each VPC must have a different, non-overlapping CIDR block.
To route between VPCs, the route tables must have an entry with the peering connection ID.
When there are more than two VPCs, there must be peering between each pair of them for everything to be routable. Traffic cannot be routed through a VPC with peering, to get to a VPC without peering. However, this is possible with a Transit Gateway.
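
The non-transitive rule can be sketched in Python with peerings modelled as unordered pairs. The VPC names and helper functions are hypothetical.

```python
from itertools import combinations


def can_route(peerings, source_vpc, destination_vpc):
    """Traffic flows only across a direct peering; there are no transit hops."""
    direct = {frozenset(pair) for pair in peerings}
    return frozenset((source_vpc, destination_vpc)) in direct


def full_mesh_peerings(vpcs):
    """Return every pair that must be peered for all VPCs to reach each other."""
    return list(combinations(vpcs, 2))


peerings = [("vpc-a", "vpc-b"), ("vpc-b", "vpc-c")]
print(can_route(peerings, "vpc-a", "vpc-c"))
print(len(full_mesh_peerings(["vpc-a", "vpc-b", "vpc-c", "vpc-d"])))
```

The number of peerings grows quadratically with the number of VPCs, which is where a Transit Gateway becomes attractive.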

Transit Gateways

Can connect VPCs together.
Can also connect VPCs to on-prem infrastructure.

VPC Endpoint Services and Endpoints

VPC endpoints enable you to connect to services that are shared in other accounts or from other VPCs. Those services can be private services or they can be AWS services such as S3.
Endpoint services allow routing over the AWS PrivateLink backbone rather than the Internet.
NLBs are necessary for Endpoint Services.
An interface endpoint uses an ENI as an entry point.
A Gateway Endpoint serves as a target for a route in the routing table.
VPC Gateway Endpoints currently support the S3 and DynamoDB services.
Endpoint Services may need endpoint requests accepted, if enabled.
Services include some AWS services (e.g. CloudTrail, CloudWatch), services hosted by other AWS customers and partners in their own VPCs (referred to as endpoint services), and supported AWS Marketplace partner services.

VPC Flow Logs

VPC Flow Logs allow logging of ingress and egress traffic.
Data is logged either to CloudWatch or S3.
Flow Logs can be enabled at either the VPC, subnet or instance (ENI) level.
When adding a Flow Log to an interface, we add it to an ENI.
Can’t modify flow logs. Delete and re-create.
Good way to debug security groups as traffic rejects show up.

Publishing to CloudWatch

An IAM role must include the following to enable this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}   

This is the least privilege necessary.

After role creation, the following needs to be pasted for the role trust relationship:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "vpc-flow-logs.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
} 

Users must also have permissions to use the iam:PassRole action for the IAM role that’s associated with the flow log.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["iam:PassRole"],
      "Resource": "arn:aws:iam::account-id:role/flow-log-role-name"
    }
  ]
}

Source: https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs-cwl.html

VPN

Site-to-site VPN (IPsec):
Create a Customer Gateway (on-prem/non-Amazon DC).
Create a Virtual Private Gateway (VPG).
Create a VPN connection.
Attach the VPG to a VPC.
There needs to be an entry in the route table for the subnets at the other end of the VPN, pointing at the VPN Gateway.
There is also Client VPN Endpoint which allows devices to be connected to a VPC without a site-to-site VPN.

AWS Direct Connect

(Menu outside of VPC.)

Allows direct connection of an organisation’s network to AWS for low latency, high bandwidth, consistency, and the ability to route with private IPs.
There are direct connect locations scattered around the world.
The customer purchases a cage (in a colo) next to an AWS cage and plugs into the AWS router (AWS Direct Connect endpoint).
Virtual interfaces (VIF) are used to route between public or private services.
A public VIF routes to S3, EC2, etc; anything with a public IP.
A private VIF routes to a VPN gateway connected to a VPC.
1 Gbps or 10 Gbps bandwidth on VIFs.
By default, Direct Connect only routes within a single region.
To route to VPCs in other regions, a Direct Connect Gateway is needed.
When a Direct Connect Gateway is implemented it needs to be connected to the Direct Connect endpoint using a private VIF.

VPC Wizard

Creates various pre-configured VPCs.
Various options for the Solutions Architect exam.

Route 53

Route 53 can do health checks on instances, web servers and devices outside of AWS.
The results of the health checks can determine the DNS responses from Route 53.
Hosted zones can either be public or private.
Private hosted zones are tied to a VPC.
enableDnsHostnames and enableDnsSupport must be enabled in a VPC for private hosted zones to work. Both are set under Actions for the VPC.
Routing policy is defined when records are created.
Make sure TTL is low otherwise failover will not occur.

DNS Record types

A
AAAA
CNAME (canonical name record). Resolve one domain to another.
Alias; like a CNAME. Route 53 specific.
CAA (certification authority authorization).
MX (mail exchange record).
NAPTR (name authority pointer record).
NS (name server).
PTR (pointer record).
SOA (start of authority).
SPF (sender policy framework).
SRV (service locator).
TXT (text record).

Aliases vs CNAMES (key differences)

CNAME	ALIAS
Route53 charges for CNAME queries	Route53 doesn’t charge for alias queries to AWS resources
You can’t create a CNAME record at the top node of a DNS namespace (zone apex)	You can create an alias record at the zone apex (however you can’t route to a CNAME at the zone apex)
A CNAME can point to any DNS record that is hosted anywhere	An alias record can only point to a CloudFront distribution, Elastic Beanstalk environment, ELB, S3 bucket as a static website, or to another record in the same hosted zone that you’re creating the alias record in.

Simple routing policy

An A record is associated with one or more IP addresses.
Cycles through the IP addresses using round robin.
Does not support health checks.

Weighted routing policy

Specify a weight for each record. Each record receives a share of traffic equal to its weight divided by the sum of all weights.
Optional health checks.
Not optional when pointing at a load balancer.
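
The share each weighted record receives can be sketched in Python: a record's weight divided by the sum of all weights. The record names are hypothetical, and the all-zero case is deliberately not modelled.

```python
def traffic_fractions(weights):
    """Map each record name to its expected fraction of DNS responses."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("all weights are zero; that case is not modelled here")
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical blue/green split: 3 parts to blue, 1 part to green.
print(traffic_fractions({"blue": 3, "green": 1}))
```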

Latency-based routing

Optional health checks.

Fail-over routing

Health-check on the primary resource.
Fail over to the secondary resource.

Geolocation Routing Policy

Similar to latency routing, however, routes can be locked by geolocation.
Country, continent or state (in the US).

Multivalue routing

Multiple IP addresses are returned.
Client-side load balancing.
If a health check is not configured, a record is always returned.
If a health check is failed, a record is not returned.
If there are eight or fewer healthy records, Route 53 responds with all healthy records.
If all records are unhealthy, Route 53 responds with up to eight unhealthy records.
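
The selection rules above can be sketched in Python. This is a hypothetical model, not Route 53's implementation: records maps each IP to a health flag, a record without a health check is simply marked healthy, and random sampling stands in for however Route 53 picks when more than eight candidates exist.

```python
import random


def multivalue_answer(records, max_records=8, rng=random):
    """Return the IPs for one DNS response under the multivalue rules."""
    healthy = [ip for ip, is_healthy in records.items() if is_healthy]
    # Fall back to every record only when nothing at all is healthy.
    pool = healthy if healthy else list(records)
    if len(pool) <= max_records:
        return pool
    return rng.sample(pool, max_records)


records = {"192.0.2.1": True, "192.0.2.2": False, "192.0.2.3": True}
print(multivalue_answer(records))
```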

Traffic flow

Use a visual editor to create a sophisticated sequence of rules that can be applied to a hosted zone or namespace.

Route 53 Resolver

Enable seamless DNS resolution across hybrid cloud environments.
Create inbound and outbound endpoints in AZs.
Create rules for what goes where.

AWS Global Accelerator

You get given two static IPs.
They are anycast IPs, such that they can point to multiple endpoints seamlessly.
Client-side DNS caching is therefore not a problem.
Can failover to instances in different regions if an instance is unhealthy.
Can specify a traffic dial per region to determine where traffic goes. 100 means traffic goes to the closest region.
Within a region, weights for individual endpoints can also be specified.
Endpoints can be either an ALB, NLB, EC2 instance or elastic IP.
Routed over Amazon backbone.
Both the IPs and data transfer are charged.

EBS

Instance stores are a type of ephemeral storage, based on NVMe, that are directly attached to EC2 instances.

Volume types

General Purpose SSD (gp2): default. Burst up to 3000 IOPS per volume. Baseline of 3 IOPS/GB. Recommended for most workloads. System boot volumes. Virtual desktops. Low-latency interactive apps. Dev and test environments.
Provisioned IOPS SSD (io1): up to 64,000 IOPS on Nitro-based instances. Up to 50 IOPS/GB (configurable). Critical business applications that require sustained IOPS performance, or more than 16,000 IOPS or 250 MiB/s of throughput per volume. Large database workloads.
Cold HDD (sc1): baseline throughput - 12 MB/s per TB. Throughput-orientated storage for large volumes of data that is infrequently accessed. Scenarios where low storage cost is important. Cannot be a boot volume.
Throughput Optimized HDD (st1): baseline throughput - 40 MB/s per TB. Streaming workloads requiring consistent, fast throughput at a low price. Big data. Data warehouses. Log processing. Cannot be a boot volume.
Magnetic (standard): old default for root volumes.
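
The gp2 figures can be turned into a small Python calculation. The 3 IOPS/GB baseline and 3000 IOPS burst come from the notes above; the 100 IOPS floor and 16,000 IOPS ceiling are assumptions taken from the AWS documentation of the time, and the function names are hypothetical.

```python
def gp2_baseline_iops(size_gib):
    """Baseline is 3 IOPS per GiB, clamped to an assumed 100..16000 range."""
    return max(100, min(3 * size_gib, 16000))


def gp2_peak_iops(size_gib):
    """Small volumes can burst to 3000 IOPS; larger ones already exceed that."""
    return max(gp2_baseline_iops(size_gib), 3000)


for size in (20, 100, 1000, 6000):
    print(size, gp2_baseline_iops(size), gp2_peak_iops(size))
```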

Delete on termination

A flag for EBS volumes that will determine whether volumes are deleted with instances.

Migrate EBS volumes between AZs

Take a snapshot and then restore it within the other AZ.

AMIs can be created from an EBS volume

EBS snapshots

Stored on S3.
Snapshots are incremental.
Only the last snapshot in a chain is necessary for a restore.
Lifecycle manager: automate creation and deletion of snapshots on a schedule.
Volume should not be writable when snapshots are taken.

CloudFront

There are two types of edge cache: regional edge caches and edge locations.
Edge locations are far more numerous, however, have smaller caches.
Regional edge caches have larger caches.
Caches have a TTL.

Origin

Where the data is coming from, e.g., S3 or a custom origin such as an S3 static website or EC2 instance.

Distributions

Web distribution: static or dynamic content. HTTP or HTTPS. Add/update/delete object + webforms. Real-time live streaming.
RTMP distribution: uses Adobe Flash Media RTMP protocol, can play media files before they are downloaded, and must use an S3 bucket as the origin.

Static websites

S3 bucket names must be the same as the target domain name.
Restricting Bucket Access - stop users accessing a bucket directly.
Restricting Viewer Access - force users to use CloudFront.

Setup

Alias to the CloudFront distribution.
CloudFront adds policies, it doesn’t delete them. Public access ACLs may need to be removed explicitly.

Notes

Possible to remove individual items from the cache. This incurs a charge.

IAM

Authentication methods

Access keys: access key ID and secret access key. Used for programmatic access. API. Optional MFA on API calls. Create, modify, view or rotate access keys. Secret access keys are only returned on access key creation. Users can be given access to change their own keys through IAM policy (not from the console). Access keys can be disabled.
IAM user: has a password. Logs into the AWS Management Console. Users can be allowed to change their own passwords. Password changes can be allowed selectively by disabling this option for all users, then adding an IAM policy to grant permission for selected users.
Signing cert: an alternative for some AWS services. SSL/TLS cert. Recommended to use AWS Certificate Manager (ACM) to provision, manage, and deploy these certs. Use IAM only when you must support HTTPS connections in a region that is not supported by ACM.

S3

Theory

S3 is object storage.
There is no concept of file hierarchy.
There is an analogy to directories. However, under the hood, directories don’t exist.
S3 is external to VPCs.
S3 is accessed via a RESTful API or via the AWS CLI.

URLs for accessing buckets

http://BUCKET.s3.AWS-REGION.amazonaws.com
http://s3.AWS-REGION.amazonaws.com/BUCKET

There is some variation in the syntax of AWS-REGION.
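
Both URL styles can be built with a short Python sketch. The function names and the key parameter are hypothetical, HTTPS is assumed in place of the plain HTTP shown above, and the region is taken to be in the common form such as eu-west-1.

```python
from urllib.parse import quote


def virtual_hosted_url(bucket, region, key=""):
    """Bucket name in the hostname."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


def path_style_url(bucket, region, key=""):
    """Bucket name as the first path segment."""
    return f"https://s3.{region}.amazonaws.com/{bucket}/{quote(key)}"


print(virtual_hosted_url("my-bucket", "eu-west-1", "reports/2020 q1.csv"))
print(path_style_url("my-bucket", "eu-west-1", "reports/2020 q1.csv"))
```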

S3 object fields

Key
Value
Sub-resources
Version ID
Metadata
Access control information

S3 Gateway Endpoint

An S3 Gateway Endpoint can be used to connect a VPC to S3 directly from within AWS.
Gateway Endpoints are created in the Endpoint section of the VPC page.
Gateways need to be added to a specific route table to function.
However, by default a routing table entry is added for S3 Gateway Endpoints.

Access controls

Access can be limited to S3 Buckets using either policies or ACLs.
Policies are either identity-based or resource-based.
An Amazon Resource Name (ARN) is a unique identifier for Amazon Resources.
Buckets have ARNs; these are needed to identify a bucket when implementing access controls.
Use identity-based policies if the same access needs to be granted across multiple buckets.
Resources can be locked such that they can only be accessed via a specific web page using a referrer condition. This is useful for ensuring public S3 assets can only be accessed from a specific web page.
There is a policy simulator to test policies.
There is an access advisor in IAM.
There is a policy generator at awspolicygen.s3.amazonaws.com.

Identity-based policies

Apply to users, groups, or roles.
In-line policies are attached directly to a role or user; they cannot be re-used.
Standalone policies exist independently and can be applied to one or more groups.

Resource-based policies

Apply to S3 Buckets.

Example policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "SeeBucketListInTheConsole",
            "Action": ["s3:ListAllMyBuckets"],
            "Effect": "Allow",
            "Resource": ["arn:aws:s3:::*"]
        }
    ]
}

Allow (effect) listing of all buckets (action) for any resource on S3 (resource).

ACL

Somewhat legacy. Only use ACLs if functionality cannot be provided by IAM.
ACLs can be applied both at the bucket and object level.
ACLs apply to AWS accounts or S3 predefined groups. Groups are authenticated users, all users, or log delivery group.

Permissions	When granted on a bucket	When granted on an object
READ	Listing the objects in the bucket	Read the object data and metadata
WRITE	Create, overwrite, and delete child objects	N/A
READ_ACP	Read the bucket ACL	Read the object ACL
WRITE_ACP	Write the bucket ACL	Write the object ACL
FULL_CONTROL	All of the above	All of the above (except N/A)

When to use Object ACLs

When managing access for objects when you’re not the Bucket owner.
When permissions need to be managed at the object level.
When each object has different permissions (PoLP).

When to use Bucket ACLs

To grant write permission to the S3 Log Delivery group.

Multi-part upload

Can be done manually or automatically through the AWS CLI.
Recommended for 100 MB+ files.
A single PUT can upload objects of up to 5 GB; multi-part upload supports objects of up to 5 TB.
Benefits: improves throughput; pause/resume support. An upload can begin before the final object size is known.
Use aws s3 where possible rather than lower-level commands.
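
How an object is cut into parts can be sketched in Python. This only plans the byte ranges and does not call S3. The 5 MiB minimum part size and 10,000-part limit are S3 limits not stated in the notes above, the 8 MiB default part size is an assumption, and plan_parts is a hypothetical name.

```python
import math

MIN_PART_SIZE = 5 * 1024 * 1024  # minimum for every part except the last
MAX_PARTS = 10_000               # maximum number of parts in one upload


def plan_parts(object_size, part_size=8 * 1024 * 1024):
    """Return (part_number, offset, length) tuples covering object_size bytes."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    part_total = math.ceil(object_size / part_size)
    if part_total > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    parts = []
    for index in range(part_total):
        offset = index * part_size
        length = min(part_size, object_size - offset)  # the last part may be short
        parts.append((index + 1, offset, length))      # part numbers start at 1
    return parts


MIB = 1024 * 1024
for part in plan_parts(20 * MIB):
    print(part)
```

Each part can be uploaded and retried independently, which is where the throughput and pause/resume benefits come from.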

Query string authentication

Generate a URL that allows the download of an S3 object.
URLs can be generated manually or programmatically.
A URL lifetime can be specified. The maximum lifetime is seven days.
aws s3 presign FILE --expires-in SECONDS can be used to generate the URL, where FILE is the S3 URI of the object.

Transfer acceleration

Upload to a CloudFront edge location rather than straight to S3.
Can help in high latency situations. The benefits can be variable.
There is a cost to using transfer acceleration.
Enabled per bucket.
To upload using transfer acceleration, different URLs are required:

http://BUCKETNAME.s3-accelerate.amazonaws.com (IPv4)
http://BUCKETNAME.s3-accelerate.dualstack.amazonaws.com (dual-stack: IPv4 and IPv6)

Versioning and MFA Delete

By default, a bucket does not have versioning enabled.
Once enabled, versioning cannot be disabled, only suspended.
MFA delete adds additional security such that MFA is required for deletion, reversion, and toggling of versioning on buckets.

Cross region replication

Asynchronously sync buckets in different regions.
S3 stores data across multiple geographically distant AZs by default.
Why: compliance requirements, latency, operational reasons (e.g. clusters in different regions), and to maintain object copies under different ownership.
Requirements: versioning enabled, source and destination must be in different regions, and permissions must be granted.
Objects uploaded before CRR is enabled are not replicated.

Lifecycle management

Lifecycle management can automatically move objects between storage classes or delete them.
Transition actions: move between storage classes.
Expiration actions: delete objects.

Encryption

Can be specified per bucket.
Object headers can override bucket encryption settings.

SSE-S3	Server-side encryption with S3 managed keys.
	S3 manages keys.
	Unique key per object.
	AES256.
	Master key encrypts the object key.
	Encryption/decryption occurs server-side.
SSE-KMS	Server-side encryption with AWS KMS managed keys.
	KMS manages keys.
	Can use customer supplied keys.
	Encryption/decryption occurs server-side.
SSE-C	Server-side encryption with client provided keys.
	Encryption/decryption occurs server-side.
	Keys are completely managed by the client.
Client-side encryption

Events

Notifications when specific events occur.
PUT, POST, COPY, DELETE, etc.
Send to SNS Topic, SQS Queue, or Lambda Function.

Requester pays

(For data transfer and request charges.)

The requester must have an AWS account and be authenticated.

Server access logs

Recommended to deliver logs to a different bucket from the one being logged.
Enabled programmatically or through the AWS console.
Best effort.

Object lock

Prevent objects from being deleted.
Can only be enabled when a bucket is created.
Requires versioning.
In compliance mode, objects cannot be deleted even by root users.

Select and Glacier Select

A SQL query can be used to look inside an archive, to extract only necessary data.

Security Token Service

Enables the request and provisioning of temporary, limited-privilege credentials for IAM users, or for authenticated users (i.e. federated users).
By default, STS is a global service and all requests go to a single endpoint at https://sts.amazonaws.com.
All regions are enabled by default but can be disabled.
The region in which temporary credentials are requested must be enabled.
Credentials will always work globally.