AWS Stateful vs Stateless

Stateful vs Stateless

AWS Security Groups are stateful and network ACLs are stateless. When you open a port in a Security Group's inbound rules, return traffic for that connection is automatically allowed outbound (and vice versa). The same is not true for an ACL: even when you open a port inbound, you must explicitly open it outbound as well, which is why ACLs are stateless.
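The difference shows up directly in the AWS CLI: opening port 80 in a security group takes a single rule, while a network ACL needs matching inbound and outbound entries. This is a sketch only; the group and ACL IDs below are placeholders.

```shell
# Security group: one ingress rule is enough -- return traffic is allowed
# automatically because security groups are stateful.
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 80 --cidr 0.0.0.0/0

# Network ACL: stateless, so inbound AND outbound entries are both needed.
aws ec2 create-network-acl-entry \
    --network-acl-id acl-0123456789abcdef0 \
    --ingress --rule-number 100 --protocol tcp \
    --port-range From=80,To=80 --cidr-block 0.0.0.0/0 --rule-action allow

# Return traffic leaves on an ephemeral port, so allow 1024-65535 outbound.
aws ec2 create-network-acl-entry \
    --network-acl-id acl-0123456789abcdef0 \
    --egress --rule-number 100 --protocol tcp \
    --port-range From=1024,To=65535 --cidr-block 0.0.0.0/0 --rule-action allow
```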

EC2 – SSH access – Permission denied (publickey)

Error ec2-user@10.0.0.10: Permission denied (publickey).


Issue: We recently hit an issue where we were not able to access an EC2 instance through SSH and got a "Permission denied (publickey)" error.

The permissions of the .pem file were correct (600). On further investigation, we found the problem was with the ownership of the /home/ec2-user/.ssh/authorized_keys file. By default this file should be owned by ec2-user:ec2-user or ubuntu:ubuntu, depending on the OS you are using. In our case the ownership of the file had been changed, which blocked SSH access to the instance.

Solution: There are multiple fixes for such issues:

  • Access the instance through SSM Session Manager in the Amazon console and update the ownership.
  • Run an SSM command on the instance to update the ownership of the impacted file.

Both of the above solutions require the SSM agent to be installed on the impacted instance; in our case the impacted instance didn't have the SSM agent installed.
Since we could not use an SSM command or Session Manager on the impacted instance, we used the approach below.

  • We took a snapshot of the impacted instance's volume.
  • We stopped the instance and updated its user_data with the content below:

Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0

--//
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cloud-config.txt"

#cloud-config
cloud_final_modules:
- [scripts-user, always]

--//
Content-Type: text/x-shellscript; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="userdata.txt"

#!/bin/bash

chown root:root /home
chmod 755 /home
chmod 600 /home/ec2-user/.ssh/authorized_keys
chown ec2-user:ec2-user /home/ec2-user -R

--//--

  • The above content updates the user_data as well as the cloud-config file.
  • The cloud-config file is located at /var/lib/cloud/instance/cloud-config.txt.
  • Make sure any existing user data is deleted or moved aside first: the config update above may execute the existing user_data along with the new one, which could corrupt your application.
  • The old user data can be found under /var/lib/cloud/instance/scripts/.
  • On some instances you may see symlinks with the instance ID at that location. The user-data scripts are stored in files named part-001 and so on.
  • If you don't have any existing user data, just start the instance again: the new user_data kicks in, fixes the ownership of /home/ec2-user/.ssh/authorized_keys, and SSH access is restored.

Follow the steps below only when you have existing user data that would impact your application.

  • To remove the old user data, create another instance and attach the impacted instance's volume to it.
  • Note: the new instance and the volume must be in the same Availability Zone.
  • Access the new instance through SSH and mount the volume using the commands below:

lsblk (lists all attached block devices)
mkdir -p /mnt
mount /dev/xvdf /mnt

  • Access the existing user_data under /mnt/var/lib/cloud/instance/scripts/ on the mounted volume and delete it or move it to another location.
  • Unmount and detach the volume, reattach it to the impacted instance, and start the instance.
  • The new user_data will kick in, fix the ownership of /home/ec2-user/.ssh/authorized_keys, and allow you to log in to the impacted instance over SSH.
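Put together, the rescue sequence on the helper instance looks roughly like this. The device name and script filename are assumptions for illustration; check lsblk for the actual device on your instance.

```shell
# On the rescue instance, after attaching the impacted volume (same AZ)
lsblk                              # find the attached device (assumed /dev/xvdf here)
sudo mkdir -p /mnt/rescue
sudo mount /dev/xvdf /mnt/rescue   # mount the impacted root volume

# Move the old user-data scripts aside so only the ownership fix runs on boot
sudo mv /mnt/rescue/var/lib/cloud/instance/scripts/part-001 /tmp/old-user-data

sudo umount /mnt/rescue            # then detach and reattach to the impacted instance
```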

ElastiCache

  • ElastiCache is a web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud.
  • ElastiCache improves application performance by retrieving information from a fast, managed in-memory cache rather than a slower disk-based database.
  • ElastiCache can improve latency and throughput for read-heavy or compute-intensive application workloads.
  • Caching improves application performance by storing critical pieces of data in memory for low-latency access.
  • ElastiCache is commonly used in front of RDS, though it can cache data from any backing store.
  • ElastiCache is a very good choice if your database is particularly read-heavy and not prone to frequent change.
  • Redshift, by contrast, is the better fit for OLAP workloads.

Types of ElastiCache

  • Memcached
    • A widely adopted memory-object caching system. ElastiCache is protocol-compliant with Memcached.
    • All tools that work with existing Memcached environments will work with ElastiCache.
    • Memcached supports multi-threading.
    • Memcached does not have Multi-AZ capability.
  • Redis
    • A popular open-source in-memory key-value store that supports data structures such as sorted sets and lists.
    • ElastiCache for Redis supports master-replica replication and Multi-AZ, which can be used to achieve cross-AZ redundancy.

Caching Strategy

Two strategies are available:

  • Lazy loading
  • Write-through

Lazy Loading

  • Loads data into the cache only when it is required.
  • If the data is in the cache, ElastiCache returns it; otherwise it returns null and the application loads the data from the database and writes it to the cache.
  • Lazy loading with TTL
    • Specify the number of seconds until the key expires, to avoid keeping stale data in the cache.
    • Lazy loading treats an expired key as a cache miss: the application retrieves the data from the database and writes it back to the cache with a new TTL.
    • A TTL does not eliminate stale data entirely, but it limits how long stale data can live in the cache.

Write Through Cache

A write-through cache adds or updates the entry in the cache whenever data is written to the database, so the cache is never stale.
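A minimal sketch of both strategies, using shell associative arrays as a stand-in for the cache and the database. This is illustrative only; a real setup would issue GET/SETEX commands against a Redis endpoint via redis-cli.

```shell
#!/bin/bash
# Mock stores: "cache" and "db" are plain associative arrays here.
declare -A cache db
db[user:1]="alice"

# Lazy loading: check the cache first; on a miss, load from the database
# and populate the cache (in Redis you would attach a TTL via SETEX).
lazy_get() {
  local key=$1
  if [[ -n ${cache[$key]+set} ]]; then
    echo "${cache[$key]}"            # cache hit
  else
    cache[$key]=${db[$key]}          # cache miss: read DB, then cache it
    echo "${cache[$key]}"
  fi
}

# Write-through: every write updates the database AND the cache together,
# so reads never see stale data (at the cost of caching data never read).
write_through() {
  local key=$1 value=$2
  db[$key]=$value
  cache[$key]=$value
}

lazy_get user:1             # first read: cache miss, loaded from the database
write_through user:1 "bob"  # update goes to both stores
lazy_get user:1             # second read: cache hit, already up to date
```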

DynamoDB

  • DynamoDB is a fast and flexible NoSQL database. It can be used for any application that needs consistent, single-digit-millisecond latency at any scale. It is a fully managed database and supports both document and key-value data models.
  • Data is stored on SSD storage.
  • Data is spread across three geographically distinct data centers.
  • Eventually consistent reads (default).
    • Consistency across all copies of the data is usually reached within one second. Repeating a read after a short time should return the updated data.
  • Strongly consistent reads.
    • Return a result that reflects all writes that received a successful response prior to the read.
  • DynamoDB pricing
    • Provisioned throughput capacity:
      • Write throughput: $0.0065 per hour for every 10 units.
      • Read throughput: $0.0065 per hour for every 50 units.
    • Storage cost of $0.25 per GB per month.
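The read model is selectable per request. For example, with the CLI (the table name and key below are hypothetical):

```shell
# Default: eventually consistent read
aws dynamodb get-item \
    --table-name Users \
    --key '{"UserId": {"S": "u-123"}}'

# Strongly consistent read: reflects all prior successful writes,
# but consumes twice the read capacity
aws dynamodb get-item \
    --table-name Users \
    --key '{"UserId": {"S": "u-123"}}' \
    --consistent-read
```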


Aurora

  • Aurora runs only on AWS infrastructure.
  • Aurora is MySQL-compatible and can provide up to 5x better performance than MySQL.
  • Aurora delivers high availability and performance.
  • Aurora provides better scaling.
  • Aurora storage starts at 10 GB and scales in 10 GB increments up to 64 TB (Storage Auto Scaling).
  • Compute resources can scale up to 32 vCPUs and 244 GB of memory.
  • Aurora maintains two copies of the data in each of three AZs, for a total of six copies.
  • Aurora is designed to transparently handle the loss of up to two copies of data without impacting write availability, and up to three copies without impacting read availability.
  • Aurora is self-healing: data blocks and disks are continuously scanned for errors and repaired automatically.
  • Two types of replicas for Aurora:
    • Aurora Replicas (currently up to 15)
    • MySQL Read Replicas (currently up to 5)
  • Replica failover priority works in tiers: Tier 0 > Tier 1 > … > Tier 15.
  • The DB cluster identifier forms the DNS endpoint (cluster endpoint) for the database.
  • Each replica instance also has its own DNS endpoint. On failure of the primary, Aurora automatically promotes a replica and the cluster endpoint follows it, so there is no need to update the connection string.
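You can inspect both endpoints with the CLI (the cluster identifier below is a placeholder); applications typically send writes to Endpoint and reads to ReaderEndpoint:

```shell
aws rds describe-db-clusters \
    --db-cluster-identifier my-aurora-cluster \
    --query 'DBClusters[0].{Writer:Endpoint,Reader:ReaderEndpoint}'
```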

AWS CLI Paging

  • Paging controls how many items are fetched per service API call when a CLI command lists resources.
  • The default page size is 1000.
  • If you run a command that lists 2,500 objects, the CLI makes three API calls behind the scenes but displays the output as if it came from a single call. Depending on the command, you may need to pass the NextToken value to continue.
  • In some cases, fetching a full 1000-item page can cause a timeout error.
  • To fix this, use the --page-size flag and provide a number below the limit.
  • The CLI will still fetch all records, just with more API calls returning fewer records each.
  • Use the --max-items flag to limit the number of items in the CLI output.

CLI paging commands

In the example below, we use the paging flags while listing the objects in an S3 bucket.

  • aws s3api list-objects --bucket <YOUR_BUCKET_NAME>
  • aws s3api list-objects --bucket <YOUR_BUCKET_NAME> --page-size 5
  • aws s3api list-objects --bucket <YOUR_BUCKET_NAME> --max-items 1

VPC – ACL (Access Control List)

A VPC comes with one default network ACL, which allows all inbound and outbound traffic.

You can create a custom network ACL; by default, a custom network ACL blocks all inbound and outbound traffic until you add rules.

Each subnet in a VPC must be associated with a network ACL; if you don't explicitly associate a subnet with one, the subnet is automatically associated with the default network ACL.

You can associate a network ACL with multiple subnets; however, a subnet can be associated with only one network ACL at a time. When you associate a network ACL with a subnet, the previous association is removed.

A network ACL contains a numbered list of rules that are evaluated in order, starting with the lowest-numbered rule.
ACLs have separate inbound and outbound rules, and each rule can either allow or deny traffic.

Network ACLs are stateless: responses to allowed inbound traffic are subject to the rules for outbound traffic, and vice versa.

Note: ephemeral ports explain why you may be unable to browse an application even when inbound and outbound traffic is enabled for ports 80/443: the response traffic leaves on a high-numbered ephemeral port (1024-65535), which must also be allowed in the outbound rules.

Rules are evaluated in numeric order, so rule 100 is evaluated before rule 110; the number is simply the precedence assigned to the rule.

Terraform Module

Modules are defined with module blocks. They are used for code reusability: suppose you have a stack that needs to be reused. In such a case, instead of copy-pasting the same resource code again and again, we define a module.

Any folder containing a configuration file is by default considered a module in Terraform.

These modules are referenced in a module code block.

Modules can declare inputs and outputs.

A module block looks similar to a resource block, it just doesn't have a type. Each module must have a unique name in the configuration. Modules have only one required argument: source.

Modules can be stored in local as well as remote locations,

  • GitHub
  • Local
  • Terraform Registry
  • Bitbucket
  • HTTP URL
  • S3 Bucket
  • GCS Bucket
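A minimal sketch of a module call from a local path (the module path, input, and output names are hypothetical):

```hcl
module "web" {
  source = "./modules/ec2-instance"  # the only required argument

  # inputs map to variables declared inside the module
  instance_type = "t3.micro"
}

# consuming a value the module exposes as an output
output "web_private_ip" {
  value = module.web.private_ip
}
```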

Note: you need to run the terraform init command to fetch the module. If the module source is updated, you will need to run terraform init again to get those updates reflected in your Terraform configuration; failing to do so results in an error telling you to re-run init.

$ terraform init

Terraform Destroy

  • The destroy command is used to destroy the resources in the configuration.
  • You can use the -target flag to specify which resource you want to delete.
  • $ terraform destroy -target <resource_address>
  • You can also save a destroy plan to a file using the plan command's -out flag.
  • $ terraform plan -destroy -out <filename.tfplan>
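For example, to destroy just one resource after reviewing a saved plan (the resource address below is a placeholder):

```shell
# Plan the destroy of a single resource and save the plan
terraform plan -destroy -target=aws_instance.web -out=destroy.tfplan

# Apply the saved destroy plan
terraform apply destroy.tfplan
```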

Terraform Graph

  • Terraform builds a dependency graph when we execute the plan command; you can view it using the graph command.
  • $ terraform graph
  • You can export the graph data to a file; these graphs can be visualised using tools such as Graphviz.
  • $ terraform graph > <filename>.dot
  • The output is in the DOT graph-description format.
  • You can also pipe the output through the dot command (part of Graphviz) to render it, for example as SVG:
  • $ terraform graph | dot -Tsvg > graph.svg