
Recap of AWS re:Invent 2021: An Honest Review

Every year, we look forward to the AWS re:Invent keynotes and announcements. If you run on AWS, every announcement might be a new way to reduce operational burden, cut costs, or delete some code. The cloud was never just about running virtual machines on someone else's computers; it's about using building blocks, high-level or low-level, to deliver business value whenever it makes sense. Every line of code you write and every asset you host is a liability to your business, so you need an objective, systematic assessment of the pros and cons of using or not using cloud services. This is why we are writing this blog post: we want to see and share what is new, in detail, so we can make use of it to improve our business and services.

Computation

Announcing new Amazon EC2 C7g instances powered by AWS Graviton3 processors

The ARM architecture has been making a splash these last two years! If you have had the chance to work with an Apple M1 computer, you were probably shocked by how silent, cool, and fast it is. The same goes for ARM in the cloud. Amazon acquired Annapurna Labs in 2015, which is now the source of the Graviton chips. The first Graviton chip was announced in 2018 but was not a big hit. Then came Graviton2, which improved pricing and performance for many workloads worldwide. If you can move your workload to ARM, you will probably see immediate benefits: Go and Rust can cross-compile to ARM, ARM JDKs exist, and even .NET supports ARM. Maybe the trend of switching to ARM picked up speed with the general availability of Raspberry Pis, but I'm glad we're here now!

Now Amazon has announced Graviton3 chips and new instance types based on them. Compared to the previous generation, they are 25% faster and support DDR5 memory, which has 50% more bandwidth than DDR4. The new chips are also 2x quicker for cryptographic operations and 3x faster for CPU-based ML operations! You need to request access to use the new C7g instances; I'm sure M7g and R7g will follow next year.

Announcing new Amazon EC2 Im4gn and Is4gen instances powered by AWS Graviton2 processors

On the other hand, there are also Im4gn and Is4gen instances (the instance family names are getting weirder, right?) based on Graviton2, which are high-density storage instances. They support up to 30 TB of local NVMe SSD, up to 38 Gbps of EBS bandwidth when local storage is not enough, and up to 100 Gbps of network bandwidth. If you have specialized applications that can benefit from those specs, the new instances are welcome improvements over I3 instances, with considerable price-per-TB and CPU gains.

Announcing data tiering for Amazon ElastiCache for Redis

Many people use ElastiCache for Redis with replicas for availability and data partitioning for scalability. However, it can be costly: two replicas, with the keyspace divided into 4 shards, means 3 x 4 = 12 instances. Your monthly bill can be a big cost item if your data footprint is significant. This new feature transparently uses the local SSD attached to r6gd instances, with very low access latency. The SSD-to-memory ratio is around 7.4x, so if you can tolerate the latency penalty and usually access only a small portion of your data, it can be a considerable improvement to your bill.
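To make the sizing math above concrete, here is a small back-of-the-envelope sketch. The node memory size is a hypothetical number for illustration, not a real ElastiCache spec; only the shard/replica arithmetic and the 7.4x ratio come from the text above.

```python
# Cluster sizing sketch for the example above: 4 shards, each with a
# primary and two replicas, means 12 nodes billed every hour.
shards = 4
replicas_per_shard = 2
nodes_per_shard = 1 + replicas_per_shard  # primary + replicas
total_nodes = shards * nodes_per_shard
print(total_nodes)  # 12

# With data tiering (SSD roughly 7.4x memory, the ratio quoted above),
# each node can address far more data. Hypothetical 32 GB memory node:
memory_gb = 32
addressable_gb = memory_gb * (1 + 7.4)  # memory plus tiered SSD
print(f"~{addressable_gb:.0f} GB addressable per node")  # ~269 GB
```

If most keys are cold, that extra addressable space can let you shrink the shard count, which is where the bill improvement comes from.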

Announcing Amazon EC2 M1 Mac instances for macOS

AWS announced Mac instances last year; now they have M1 instances. They are physical Mac minis in data centers, connected to the AWS Nitro System to provide 10 Gbps of network and 8 Gbps of EBS bandwidth, and they behave like regular EC2 instances. However, there are two essential bits to factor in before clicking the Launch Instance button. First, Apple's licensing restricts the allocation of a Mac to a minimum of 24 hours; so, if you need a Mac for iOS builds during an 8-hour workday, you still pay for 24 hours. Second, M1-based Macs are priced at $0.65/hour, which comes to about $15.60 per day. A 16GB M1 Mac mini costs $899, which is equivalent to around 60 days of EC2 Mac usage. There are cheaper alternatives on the internet, such as Scaleway or MacStadium, but they don't have EBS, VPC, or IAM support. You need to assess whether these features are worth the price.
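The break-even arithmetic above can be sketched as follows, using the prices mentioned in this section (assumed on-demand rates; check the EC2 pricing page for your region):

```python
# Back-of-the-envelope break-even for EC2 M1 Mac instances vs buying
# the hardware outright, using the prices quoted above.
HOURLY_RATE = 0.65         # $/hour for M1 Mac instances (as quoted)
MIN_ALLOCATION_HOURS = 24  # Apple licensing: 24-hour minimum
MAC_MINI_PRICE = 899       # $ for a 16GB M1 Mac mini

daily_cost = HOURLY_RATE * MIN_ALLOCATION_HOURS
break_even_days = MAC_MINI_PRICE / daily_cost

print(f"Minimum daily cost: ${daily_cost:.2f}")          # $15.60
print(f"Break-even vs buying: {break_even_days:.0f} days")  # ~58 days
```

Roughly two months of continuous usage buys the machine, so the EBS, VPC, and IAM integration really has to earn its keep.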

New – AWS Outposts Servers in Two Form Factors

If you cannot go to the cloud for some reason (compliance, latency), AWS can bring it to you with Outposts. The original release offered industry-standard 42U racks with a lot of capacity. Now there are 1U and 2U configurations that you can put in an existing rack in your branch offices. The new servers also support Graviton2 and can go as low as $547 per month. Of course, this is not for everyone, but the seamless connection to AWS, including VPC, IAM, KMS, and EBS, combined with running locally at extremely low latency, can be beneficial compared to maintaining custom hardware and software.

Announcing Pull Through Cache Repositories for Amazon Elastic Container Registry

As Docker Hub became popular, it started serving millions of images, so it was no surprise that Docker wanted to restrict free usage: it's a bandwidth hog. Additionally, relying on external services from within your AWS account to download software artifacts has always bothered me. We favor replicating Docker Hub images to our own ECR so we can vet the images and use them reliably without external access, even in private VPCs. To address this common need, the ECR team announced pull-through cache repositories that let ECR cache public images for you instead. However, at launch it only supports Quay.io and ECR Public. I hope to see Docker Hub here, but it might be tricky due to licensing.

Update: I have been informed by r/aws that official Docker images (the ones under library/) are replicated on ECR Public, which addresses most of the issues, as they are probably the ones mostly used.

Introducing Karpenter – An Open-Source High-Performance Kubernetes Cluster Autoscaler

If you are running EKS or a custom Kubernetes cluster on AWS, you probably run cluster-autoscaler to accommodate spikes in your workload. However, it's a restrictive tool and does not add instances intelligently. Karpenter evaluates scheduling constraints and adds and removes instances more intelligently and faster. See this comparison video by Justin Garrison to understand how it differs from cluster-autoscaler.

Storage

Announcing the new Amazon S3 Glacier Instant Retrieval storage class, the lowest cost archive storage with milliseconds retrieval
Amazon S3 Glacier storage class is now Amazon S3 Glacier Flexible Retrieval; storage price reduced by 10%, and bulk retrievals are now free

For many reasons, we generate bazillions of pieces of data, and S3 is a great place to store them. But depending on your needs, it might be expensive. That's why there are many storage classes for different access patterns. S3 Glacier is dirt cheap, but it charges a significant sum for retrievals, and the data arrives late. Now there is another tier, Glacier Instant Retrieval, which is a bit more expensive but has millisecond access times. It can be desirable if you have ambitious disaster recovery objectives; head over to the S3 pricing page to calculate. There has also been a 31% price reduction for S3 Infrequent Access and S3 One Zone-Infrequent Access, if you are interested.

Amazon S3 console now reports security warnings, errors, and suggestions from IAM Access Analyzer as you author your S3 policies.

S3 policies are extensible and flexible; however, they can be hard to write. This new console and API feature guides you toward better policies and validates them as you write.

Serverless

Announcing AWS Graviton2 Support for AWS Fargate – Get up to 40% Better Price-Performance for Your Serverless Containers.

We already emphasized Graviton's great benefits, and today it's available for Fargate. If you can rebuild your container for ARM, or even better, produce multi-arch builds, you can compare whether the switch is worth it. There is a high chance you'll get more performance per dollar, but of course, YMMV. You can even build ARM images on a non-ARM computer with docker buildx.

AWS Lambda now supports event filtering for Amazon SQS, Amazon DynamoDB, and Amazon Kinesis as event sources

With a syntax very similar to SNS event subscription filters, we can now filter the payloads of DynamoDB, SQS, and Kinesis event sources of a Lambda function. This filtering feature can simplify your code and operations and save you the costs of unnecessarily invoking your Lambda functions.
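To give a feel for the semantics, here is a minimal, simplified sketch of how such a pattern matches a payload. This is my own toy matcher, not AWS code; it only handles exact-value matching (the real syntax also supports prefix, numeric-range, and exists matchers), and the filter and event below are hypothetical.

```python
# Toy matcher for EventBridge-style filter patterns, the syntax Lambda
# event source mappings use. Leaf values are lists of allowed values.
def matches(pattern: dict, payload: dict) -> bool:
    for key, condition in pattern.items():
        if key not in payload:
            return False
        if isinstance(condition, dict):
            # Nested pattern: recurse into the corresponding sub-object.
            if not isinstance(payload[key], dict) or not matches(condition, payload[key]):
                return False
        elif payload[key] not in condition:
            # List of allowed values: any one of them may match.
            return False
    return True

# Hypothetical filter: only invoke the function for "order_placed" events.
pattern = {"body": {"type": ["order_placed"]}}
print(matches(pattern, {"body": {"type": "order_placed", "orderId": 42}}))  # True
print(matches(pattern, {"body": {"type": "order_cancelled"}}))              # False
```

The filtering happens inside the event source mapping, so the second message above would never invoke (or bill) your function at all.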

Announcing Amazon Kinesis Data Streams On-Demand
Introducing Amazon MSK Serverless in public preview

MSK and Kinesis now have on-demand or serverless modes. At first sight, these announcements seem very important, but we need to address the elephant in the room: although they are now labeled serverless, both still carry a constant cluster fee. The on-demand pricing is unlike Lambda, DynamoDB, or Firehose, where your cost exactly tracks your throughput and storage. For instance, Kinesis Data Streams on-demand mode does not charge $0.15/hr per shard but charges a flat $0.40/hr. Serverless MSK (Managed Streaming for Apache Kafka) charges a flat $0.75/hr cluster fee instead of making you select how many and what type of instances you need. So beware: it's not possible to scale to zero with on-demand Kinesis or serverless MSK. Still, they are significant improvements. Now you don't need to calculate how many shards you need and scale them manually, or automate that on your own with CloudWatch metrics. And for MSK, you don't need to select an instance type or the number of instances; you only worry about how many partitions your topic has.

Announcing Amazon Redshift Serverless (Preview)
Introducing Amazon EMR Serverless in preview

On the other hand, the Redshift and EMR Serverless announcements are better. Redshift capacity is charged per hour as you need it and can be paused and resumed as you wish, meaning only at-rest storage for the backups is charged while paused. Although it's not as easy to use as BigQuery, it's a considerable billing and operational improvement for many teams out there. Similarly, EMR now charges per vCPU and memory used throughout the execution. You can configure the CPU/memory ratio per worker and let EMR autoscale your cluster to the needs of the job.

Amazon DynamoDB announces the new Amazon DynamoDB Standard-Infrequent Access table class, which helps you reduce your DynamoDB costs by up to 60 percent.

Well, it might be a surprise, but I consider DynamoDB one of the best serverless services, along with SQS. It's dirt cheap and straightforward for storing simple information in on-demand mode, and if you are proficient in data modeling, it can scale to TBs of data for many applications. This announcement introduces a new pricing scheme for infrequently accessed data: cheaper to store but more expensive to query. So you need to do some math to assess whether it's for you.
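That math might look something like this sketch. The prices here are my rough assumptions for illustration (Standard-IA storage around 60% cheaper, request costs around 25% higher); check the DynamoDB pricing page before deciding.

```python
# Hedged break-even sketch for the Standard-IA table class.
# All prices are assumed, illustrative numbers, not official AWS rates.
STD_STORAGE = 0.25    # $/GB-month, Standard (assumed)
IA_STORAGE = 0.10     # $/GB-month, Standard-IA (~60% cheaper, assumed)
ACCESS_MARKUP = 1.25  # IA request units ~25% pricier (assumed)

def monthly_cost(gb: float, access_cost: float, table_class: str) -> float:
    """Storage plus request cost for a month, by table class."""
    if table_class == "STANDARD":
        return gb * STD_STORAGE + access_cost
    return gb * IA_STORAGE + access_cost * ACCESS_MARKUP

# 500 GB table with $100/month of request traffic:
std = monthly_cost(500, 100, "STANDARD")
ia = monthly_cost(500, 100, "STANDARD_IA")
print(f"Standard: ${std:.2f}, Standard-IA: ${ia:.2f}")
# Standard: $225.00, Standard-IA: $175.00
```

The takeaway: the bigger the storage-to-traffic ratio, the more Standard-IA pays off; access-heavy tables should stay on Standard.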

Amazon SQS Enhances Dead-letter Queue Management Experience For Standard Queues

We love SQS at Resmo. A DLQ (dead-letter queue) is useful when a specific message is unprocessable for the time being but eventually needs to be reprocessed, after a bug fix or once an intermittent error is gone. However, redriving a message (sending it from the DLQ back to the original queue) was a manual process, and most of us had custom code to read from the DLQ and push messages back to SQS. Now the AWS console has a built-in feature for this purpose, which can efficiently redrive the messages in your DLQ. It always feels good to delete some code thanks to a new announcement.

AWS Lambda now supports partial batch response for SQS as an event source

You can now tell Lambda's internal SQS poller that only some of the messages in a batch failed, instead of failing them all and causing unnecessary duplicate receives. Since an invocation can be configured to receive up to 10,000 messages, failing the whole batch just because a few messages did not succeed never made sense and made the SQS event source in Lambda harder to use.
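A handler using this feature returns the IDs of only the failed messages. Here is a minimal sketch (the `process` business logic is hypothetical); note that the event source mapping must be configured with `FunctionResponseTypes=["ReportBatchItemFailures"]` for Lambda to honor the response.

```python
# Sketch of a Lambda handler using partial batch response for SQS.
def process(message_body: str) -> None:
    # Hypothetical business logic; raise to signal a failure.
    if "poison" in message_body:
        raise ValueError("cannot process this message")

def handler(event, context):
    failures = []
    for record in event["Records"]:
        try:
            process(record["body"])
        except Exception:
            # Report only this message as failed, not the whole batch.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}

# Local smoke test with a fake SQS event:
fake_event = {"Records": [
    {"messageId": "1", "body": "ok"},
    {"messageId": "2", "body": "poison pill"},
]}
print(handler(fake_event, None))
# {'batchItemFailures': [{'itemIdentifier': '2'}]}
```

Message 1 is deleted from the queue as usual; only message 2 becomes visible again for a retry.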

Security, Compliance

Amazon DynamoDB now helps you meet regulatory compliance and business continuity requirements through enhanced backup features in AWS Backup.

Compliance requirements seem hard, but all of them have value. What happens if your account is hacked or becomes inaccessible, and all your data is lost? The best practice is to move your data to another, more protected place. DynamoDB backups are now compatible with AWS Backup and can be replicated to dedicated backup (data vault) accounts. Most people routinely export tables and streams to S3 and manually copy them to other buckets and accounts. I suspect this implementation is similar, because the instantaneous backup capability is lost when cross-account backup is enabled.

Amazon SQS Announces Server-Side Encryption with Amazon SQS-managed encryption keys (SSE-SQS)

SQS now supports server-side encryption with SQS-managed keys, just like S3. This eliminates the need to use KMS, which, although highly reliable, is still an external and costly service. SQS queues with KMS encryption have a key reuse period configuration to avoid making a KMS API call for every message, reducing cost and complexity. Now even that is unnecessary, and the data can be encrypted at rest with no KMS setup at all!

AWS announces the new Amazon Inspector for continual vulnerability management

Scanning workloads for software vulnerabilities and network exposure is a must for your security posture and compliance requirements. Improvements to the Amazon Inspector service include AWS Organizations support for central management, scanning of ECR images, replacement of the Inspector agent with the SSM agent, and a new way to calculate risk scores using contextual information.

Amazon S3 Object Ownership can now disable access control lists to simplify access management for data in S3.

ACLs in S3 predate the IAM launch in 2011. If you have stumbled upon them, you may have noticed they are dated technology. Combined with bucket policies, IAM policies, and VPC endpoint policies, there was overlap and frequent confusion. Bucket owners can now completely turn off the ACL feature, so the bucket owner owns every object in the bucket, and all the other policies apply instead of ACLs.

Announcing preview of AWS Backup for Amazon S3

AWS Backup has become the go-to place for all backup needs because it abstracts backup plans across 12 services, including S3, EC2, RDS, and DynamoDB; if you have data on AWS, Backup can probably handle it now. We used to do manual work to back up data, replicate it across regions and accounts, and ensure the contents cannot be tampered with; AWS Backup simplifies all of that. It's also a great single resource for your compliance needs! And today, it also supports S3, a vital data asset for many companies.

Network

Amazon CloudFront now supports configurable CORS, security, and custom HTTP response headers.

CloudFront is an excellent service; however, for years it lacked the essential ability to set custom response headers! It seems pretty straightforward, but if you were serving static files from S3 and wanted to set CORS headers, you had to add another layer (API Gateway, ALB, EC2...), eliminating some of the benefits of CloudFront. Glad to see this feature land at last!

Application Load Balancer and Network Load Balancer end-to-end IPv6 support
Amazon Virtual Private Cloud (VPC) customers can now create IPv6-only subnets and EC2 instances
AWS launches NAT64 and DNS64 capabilities to enable communication between IPv6 and IPv4 services

There have been several IPv6-focused announcements around re:Invent this year. Why the world can't abandon IPv4 in favor of IPv6 is a deep discussion for another day, but these new features will help IPv6 adoption in AWS and beyond. For instance, ALBs could already be configured in dual-stack mode to receive IPv6 traffic, but now you can configure IPv6 target groups and let traffic flow over IPv6 inside your internal network as well.

Announcing preview of AWS Private 5G

While this announcement is not for everyone, it's fascinating. AWS can now deliver radio units, 5G access software, and SIM cards, and maintain them for you, so you can set up your own private 5G network that scales to additional devices and throughput. Next-gen IoT applications and colossal deployments can make use of Private 5G.

Introducing AWS Cloud WAN Preview

When it was announced, Transit Gateway simplified a lot of complex networking. To simplify things even more, AWS released Cloud WAN, which helps you build global networks without diving into low-level details. It provides a central dashboard and lets you impose policies from one place, select core edge locations and segments, and make your VPCs routable through them. Check this blog post for technical details!

Amazon Virtual Private Cloud (VPC) announces Network Access Analyzer to help you easily identify unintended network access.

Last year AWS announced VPC Reachability Analyzer to test whether two network endpoints can communicate. Network Access Analyzer takes this capability much further. It allows you to ask higher-level questions to prove whether your requirements are met, such as "databases should not be exposed to the internet" or "production and staging segmentation is in place." Considering how many network services AWS has, and thus how many ways there are to connect things, automated reasoning is a gift.

Amazon Virtual Private Cloud (VPC) announces IP Address Manager (IPAM) to help simplify IP address management on AWS.

If you have massive networks spanning regions, accounts, and offices, managing the network address space becomes an issue. External vendors exist for this, and now AWS has announced its own IPAM solution for VPCs. See this blog post for how it looks; it can simplify managing massive networks.

Development

Kotlin, Swift, and Rust SDK previews for AWS

AWS SDKs for these languages were released in preview. Don't miss them!

Announcing General Availability of Construct Hub and AWS Cloud Development Kit (AWS CDK) Version 2

I'll be honest here. We are not big CloudFormation fans, primarily because of unacceptable delays in support for some resources, which would result in patches we would not want to maintain ourselves. We use Terraform to provision our resources. However, the CDK has been helpful in bringing CloudFormation to the masses, letting developers provision resources with the languages they already know, and the extensibility of programming languages paved the way for more innovative approaches. Version 2 of the CDK is an improvement in developer productivity: AWS libraries are no longer separate modules, making dependency management much more manageable; you don't need to install S3 and KMS packages separately. Construct Hub is a place for the community to share and discover reusable CDK libraries and modules.

Machine-Learning

Announcing Amazon SageMaker Canvas – a Visual, No Code Machine Learning Capability for Business Analysts

Machine learning is everywhere, and SageMaker keeps making it easier. SageMaker Canvas is a visual service for business analysts, who can upload their data, follow the directions, build ML models, and create predictions. It's a great starting point if you are new to ML!

Amazon SageMaker Studio Lab (currently in preview), a free, no-configuration ML service

An alternative to Google Colab, SageMaker Studio Lab is based on JupyterLab, is an open environment where you can install whatever additional libraries you require, and it's free! To try it, head to its home page!

Announcing Amazon DevOps Guru for RDS, an ML-powered capability that automatically detects and diagnoses performance and operational issues within Amazon Aurora

DevOps Guru is a machine-learning-assisted service that examines AWS Config, CloudWatch, CloudFormation, and X-Ray to discover operational issues and give you insights into abnormal activity. Today, it also supports RDS: it can explain why your database has performance issues and how queries hurt performance by analyzing database logs, metrics, and traces. Running relational databases can be an operational burden and a source of performance bottlenecks, even with Aurora, and this feature can help identify issues.

Monitoring

Introducing Amazon CloudWatch Metrics Insights (Preview)

If you have many resources, you have just as many metrics in CloudWatch. Not everything requires an alarm, and alarms can be costly. Occasionally, though, you need a quick answer to a metric question, but investigating thousands of metrics is not feasible in CloudWatch; even the UI allows you to draw at most 100 metrics simultaneously. One option is to export your metrics to another system such as Datadog or SignalFx; CloudWatch Metric Streams can make that a bit more cost-efficient and less laggy.

This new feature allows you to query up to 10,000 different CloudWatch metrics with a SQL-based query engine; you can run top-N queries to find the top 10 queues with the oldest messages, the longest-running Lambda functions, or the busiest EC2 instances with a straightforward query. There are many more examples in the documentation.
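As a flavor of the syntax, a top-10 query for the queues with the oldest messages might look like the sketch below (using the standard `ApproximateAgeOfOldestMessage` metric from the `AWS/SQS` namespace; double-check the exact grammar against the Metrics Insights documentation):

```sql
-- Top 10 SQS queues by age of their oldest message
SELECT MAX(ApproximateAgeOfOldestMessage)
FROM SCHEMA("AWS/SQS", QueueName)
GROUP BY QueueName
ORDER BY MAX() DESC
LIMIT 10
```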

Amazon CloudWatch Evidently – Experiments and Feature Management

This service feels like a SaaS product: it lets you run experiments and do feature management by hosting features, aka feature flags, so that you can run A/B tests, experiments, and gradual rollouts without an external service like LaunchDarkly.

Real-User Monitoring (RUM) for Amazon CloudWatch

RUM is another SaaS-like service from CloudWatch that monitors and collects web page load and render performance. The biggest reason to use this service instead of commercial providers would be X-Ray integration to correlate the issues with actual backend operations. It also allows sampling in case cost becomes an issue.

Nice!

This category was hard to classify, but we wanted to shout out some practical developments!

AWS Amplify Studio

Amplify Studio is a huge announcement. Although it might be a bit complex for some, AWS Amplify has made building full-stack web and mobile apps more accessible. It helps create a backend and a GraphQL API, handles user management, and ships UI libraries for React and Flutter. Amplify Studio takes it one step further and presents a visual builder for all stages, from importing designs from Figma to exporting extensible code. It's a good service for folks who want to build full-stack applications on AWS visually and quickly.

Amazon OpenSearch Service (successor to Amazon Elasticsearch Service) now supports checking for blue/green deployment when making configuration changes.

I remember that in 2017, changing the IAM settings of an Amazon ES cluster required a complete blue/green deployment, which shuffles all the nodes and the data along with them. Nowadays it's better, but you never know. Even the documentation is unsure about what triggers one: "Changes that usually cause blue/green deployments."

Although blue/green works reliably, depending on your data size it can take a very long time, and the performance impact may not be negligible. This new feature lets you know whether an API call to update your domain configuration would cause a blue/green deployment, so you can plan the operation.

Announcing Amazon Athena ACID transactions, powered by Apache Iceberg (Preview)

To no one's surprise, we love SQL at Resmo! Athena makes querying arbitrary data on S3 much easier. It is based on Presto, a distributed SQL engine that works on files in S3 (and more sources with connectors). Until today, these files were read-only; Athena could only write query results to another path in S3 with CTAS. This new development lets you issue INSERT, UPDATE, and DELETE commands for row-level operations concurrently, with ACID transactions! This is a surprise, but a welcome one. Apache Iceberg powers the new feature, which also supports time travel and recovery of deleted rows. As the underlying data in S3 is immutable, Iceberg tracks data and schema changes separately; in short, you can think of it as querying the WAL (write-ahead log) of a traditional database.
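As a sketch of what this enables, the DDL and DML might look like the following (the table, columns, and bucket here are hypothetical; the `table_type = 'ICEBERG'` property is what opts a table into the new behavior):

```sql
-- Create an Iceberg-backed table, then mutate rows in place.
CREATE TABLE events (
  id bigint,
  status string,
  created_at timestamp
)
LOCATION 's3://my-bucket/events/'
TBLPROPERTIES ('table_type' = 'ICEBERG');

INSERT INTO events VALUES (1, 'open', current_timestamp);
UPDATE events SET status = 'closed' WHERE id = 1;
DELETE FROM events WHERE status = 'closed';
```

Under the hood, none of these statements rewrite objects in place; Iceberg writes new data and delete files and updates the table metadata, which is also what makes time travel possible.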

Amazon Athena accelerates queries with AWS Glue Data Catalog partition indexes.

If you have thousands of partitions in Athena, you will see a performance penalty. If you use hour-based or other high-cardinality fields as partitions, this new feature allows filtering partitions via indexes in the Glue Data Catalog to make query planning and execution faster.

AWS Identity and Access Management now makes it more efficient to troubleshoot access denied errors in AWS

Access Denied! Why? There can be many reasons in AWS. With this new feature, error messages include additional details to help you discover which policy is responsible for the deny, whether it's an SCP, an attached policy, or an inline one. One caveat, though: the feature launched with a limited set of services (SageMaker, CodeCommit, and Secrets Manager); I hope it reaches the more popular services soon!

AWS CloudTrail announces ErrorRate Insights

As your AWS footprint grows, keeping track of all your running software can be tricky, especially without good observability. Unless you have exported your CloudTrail logs to a logging pipeline such as Splunk or Elasticsearch and queried for errors, you might have no idea. Lingering errors cause both instability and, sometimes, cost due to endless retries. This feature helps you discover error-rate trends among API calls so that you can take action.
