Rafael Salerno - DevOps Engineering: June 2019

Thursday, June 13, 2019

What should we know about AWS Networking?

I put together this summary, made of bullet points and excerpts from AWS papers and from online courses such as A Cloud Guru, Linux Academy, and Udemy, to study AWS networking concepts.

I hope it could be useful to someone.

OSI Model:

AWS Responsibility:

Physical (CAT5, fiber-optic cable)
Data Link (MAC)

Customer Responsibility:

Network (IP, ARP)
Transport (TCP)
Session (setup, negotiation, teardown)
Application (web browser)

Unicast: Communication in which a frame is sent from a host and addressed to a specific destination.

Multicast: Communication in which a frame is sent to a specific group of devices or clients.

Broadcast: Communication in which a frame is sent from one address to all other addresses.

TCP:

Connection-based, stateful, acknowledged
After everything I say, I want you to confirm that you received it
Ex: web, email, file transfer

UDP:

Connectionless, stateless, simple, no retransmission delays.
I’m going to start talking and it’s OK if you miss some words
Ex: Streaming media, DNS

ICMP:

Used by network devices to exchange info
We routers can keep in touch about the health of the network using our own language
Ex: traceroute, ping

Ephemeral Ports:

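Short-lived transport protocol ports used in IP communications

Above the "well-known" IP ports (1024 and above)

Also called dynamic ports

Suggested range is 49152 to 65535, but

Linux kernels generally use 32768 to 61000

Older Windows platforms (through Windows Server 2003) default to a range starting at 1025

This has implications for NACLs and security groups: a stateless NACL must explicitly allow the ephemeral range used by return traffic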
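The NACL implication can be illustrated with a small sketch. It is not an AWS API: `reply_allowed`, `IANA_DYNAMIC`, and `WIDE_RULE` are hypothetical names, and the example only checks whether a port range in a rule would cover the source port a client happened to pick.

```python
from typing import Tuple

PortRange = Tuple[int, int]  # inclusive (low, high)

IANA_DYNAMIC: PortRange = (49152, 65535)  # suggested ephemeral range
WIDE_RULE: PortRange = (1024, 65535)      # hypothetical rule covering every client OS


def in_range(port: int, port_range: PortRange) -> bool:
    """Return True when the port falls inside the inclusive range."""
    low, high = port_range
    return low <= port <= high


def reply_allowed(client_port: int, outbound_rule: PortRange) -> bool:
    """A stateless NACL passes the response only if the rule covers the client's source port."""
    return in_range(client_port, outbound_rule)


client_port = 40000  # a port a Linux client might pick, below the IANA range
print(reply_allowed(client_port, IANA_DYNAMIC))  # False
print(reply_allowed(client_port, WIDE_RULE))     # True
```

A rule that only opens the IANA range would silently drop replies to clients that choose lower ephemeral ports, which is why a wider range is often used in practice.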

AWS Managed VPN

AWS-managed IPsec VPN connection over your existing internet connection
Quick and usually simple way to establish a secure tunnelled connection to a VPC
Redundant link for Direct Connect or another VPC VPN
Supports static routes or BGP peering and routing
Dependent on your internet connection

Limitations:

Network latency, variability, and availability are dependent on internet conditions
Customer managed endpoint is responsible for implementing redundancy and failover (if required)
Customer device must support single-hop BGP (when leveraging BGP for dynamic routing)

Direct Connect

Dedicated network connection over private lines straight into the AWS backbone
Used when you require a big pipe into AWS, with lots of resources and services on AWS being provided to your corporate users
More predictable network performance, potential bandwidth cost reduction, up to 10 Gbps provisioned connections, supports BGP peering and routing.
Work with your existing data network provider; create virtual interfaces (VIFs) to connect to VPCs (private VIFs) or to other AWS services such as S3 or Glacier (public VIFs)

Limitations:

May require additional telecom and hosting provider relationships and/or new network circuits.

VPN CloudHub

Connects locations in a hub-and-spoke manner using AWS’s virtual private gateway.
Used when it is necessary to link remote offices for backup or primary WAN access to AWS resources or to each other.
Reuses existing internet connections; supports BGP routes to direct traffic (for example, use MPLS first, then the CloudHub VPN as backup).
Depends on the internet connection; no inherent redundancy.
Assign multiple customer gateways to a virtual private gateway, each with its own BGP ASN and unique IP range.

Limitations:

Network latency, variability, and availability are dependent on the internet
User managed branch office endpoints are responsible for implementing redundancy and failover (if required)

AWS Direct Connect + VPN

Use Case: IPsec VPN connection over private lines
Advantages are the same as the previous option with the addition of a secure IPsec VPN connection

Limitations:

Same as the previous option with a little additional VPN complexity

Software VPN

You provide your own VPN endpoint and software.
Used when you must manage both ends of the VPN connection for compliance reasons, or when you want to use a VPN option not supported by AWS.
Ultimate flexibility and manageability.
You must design for any needed redundancy across the whole chain.
Install VPN via Marketplace appliance or on an EC2 instance.

Limitations:

Customer is responsible for implementing HA (high availability) solutions for all VPN endpoints (if required)

Transit VPC

Software appliance-based VPN connection with a hub VPC

AWS-managed IPsec VPN connection for spoke VPC connections

Common strategy for connecting geographically dispersed VPCs and locations in order to create a global network transit center.

Used when locations and VPC-deployed assets across multiple regions need to communicate with one another.

Ultimate flexibility and manageability, but also an AWS-managed VPN hub-and-spoke between VPCs.

You must design for any needed redundancy across the whole chain.

Providers like Cisco, Juniper Networks, and Riverbed have offerings that work with their equipment and AWS VPC.

VPC peering

AWS-provided network connectivity between two VPCs.

Used when multiple VPCs need to communicate or access each other's resources.

Uses AWS backbone without touching the internet.

If A is connected to B and B is connected to C, A cannot talk with C via B. (Transitive peering not supported)

A VPC peering request is made and the accepter accepts the request (either within an account or across accounts).

Leverages AWS networking infrastructure

Does not rely on VPN instances or a separate piece of physical hardware

No single point of failure and no bandwidth bottleneck
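The non-transitive rule for peering can be shown with a minimal sketch. The VPC names and the `can_communicate` helper are hypothetical; the point is only that reachability requires a direct peering connection, not a path through a third VPC.

```python
from typing import FrozenSet, Set


def can_communicate(a: str, b: str, peerings: Set[FrozenSet[str]]) -> bool:
    """True only when a direct peering connection exists between the two VPCs."""
    return frozenset((a, b)) in peerings


# A <-> B and B <-> C are peered; there is no A <-> C connection.
peerings = {frozenset(("A", "B")), frozenset(("B", "C"))}

print(can_communicate("A", "B", peerings))  # True
print(can_communicate("A", "C", peerings))  # False: B does not forward traffic for A
```

To let A reach C, a third peering connection between A and C would have to be created and accepted.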

PrivateLink

AWS PrivateLink provides network connectivity between VPCs and/or AWS services using interface endpoints.

Used when it is necessary to keep a private subnet truly private by using the AWS backbone, rather than the public internet, to reach other services.

Redundant: uses AWS backbone

As of October 2018, they can be accessed over inter-region VPC peering.

Create an endpoint for the needed AWS or Marketplace service in all needed subnets; access it via the provided DNS hostname.

Limitation:

VPC endpoint services are only available in the AWS region in which they are created.

VPC Endpoint

It can be divided into two groups:

Interface Endpoint:

Elastic network interface with a private IP

Uses DNS entries to redirect traffic.

Can be used with API Gateway, CloudFormation, CloudWatch, etc.

Secured with security groups.

Gateway Endpoint:

A gateway that is a target for a specific route.

Uses a prefix list in the route table to redirect traffic.

Can be used with S3 and DynamoDB.

Secured with VPC endpoint policies.

[Figures: architecture without a VPC Endpoint, and with a VPC Endpoint]

Internet Gateways:

Internet gateway
Egress-Only Internet Gateway
NAT Instance
NAT Gateway

Internet gateway

Horizontally scaled, redundant and highly available component that allows communication between your VPC and the internet.

No availability risk or bandwidth constraints.

If your subnet is associated with a route to the internet, then it is a public subnet.

Supports IPv4 and IPv6

Purpose 1: Provide a route table target for internet-bound traffic.

Purpose 2: Perform NAT for instances with public IP addresses.

Does not perform NAT for instances with private IPs only.

Egress-Only Internet Gateway

IPv6 addresses are globally unique and are therefore public by default.

Provides outbound internet access for instances with IPv6 addresses.

Prevents inbound access to those IPv6 instances.

Stateful: forwards traffic from instances to the internet and then sends back the response.

Must create a custom route for ::/0 to the egress-only internet gateway.

Use an egress-only internet gateway instead of NAT for IPv6.

Allows IPv6-based traffic within a VPC to reach the internet, while denying internet-based resources the possibility of initiating a connection back into the VPC.

NAT Instance

EC2 instance launched from a special AWS-provided AMI

Translates traffic from many private-IP instances to a single public IP and back.

Doesn’t allow the public internet to initiate connections into private instances.

Not supported for IPv6 (use an egress-only internet gateway)

NAT instances must live in a public subnet with a route to the internet gateway.

Private instances in private subnets must have a route to the NAT instance, usually as the target of the default route 0.0.0.0/0

NAT Gateway

Fully managed NAT service that replaces the need for NAT instances on EC2.

Must be created in a public subnet.

Uses an Elastic IP as its public IP for the life of the gateway.

Private instances in private subnets must have a route to the NAT gateway, usually as the target of the default route 0.0.0.0/0

Created in a specific AZ; for redundancy, create a NAT gateway in each AZ, with routes so that each private subnet uses its local gateway.

Up to 5 Gbps of bandwidth, which can scale up to 45 Gbps.

Can’t use a NAT gateway to access VPC peering, VPN or Direct Connect, so be sure to include specific routes to those in your route table

Advice:

Only two components allow VPC-to-internet communication using IPv6 addresses: the internet gateway (inbound) and the egress-only internet gateway (outbound). NAT instances and NAT gateways explicitly don’t support IPv6 traffic, and a Direct Connect link carries data between a data center and an AWS VPC but doesn’t travel over the internet.

NAT Gateway vs NAT Instance

Availability :

NAT Gateway -> High availability within AZ

NAT Instance -> On your own

Bandwidth:

NAT Gateway -> Up to 45 Gbps

NAT Instance -> Depends on bandwidth of instance type

Maintenance:

NAT Gateway -> Managed by AWS

NAT Instance -> On your own

Performance:

NAT Gateway -> Optimised for NAT

NAT Instance -> Amazon Linux AMI configured to perform NAT

Public IP:

NAT Gateway -> Elastic IP that cannot be detached

NAT Instance -> Elastic IP that can be detached.

Security Groups:

NAT Gateway -> Cannot be associated with security groups

NAT Instance -> Can use Security Groups

Bastion Server:

NAT Gateway -> Not supported

NAT Instance -> Can be used as bastion server

Routing Table

VPCs have an implicit router and a main route table

You can modify the main route table or create new tables

Each route table contains a local route for the VPC CIDR block

Most specific route for an address wins
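The "most specific route wins" rule is longest-prefix matching. A minimal sketch using Python's standard `ipaddress` module follows; the route targets (`local`, `igw-example`, `pcx-example`) are made-up placeholders, and real VPC route evaluation has more cases than this.

```python
import ipaddress
from typing import Dict, Optional


def pick_route(destination: str, routes: Dict[str, str]) -> Optional[str]:
    """Return the target of the most specific (longest-prefix) matching route."""
    address = ipaddress.ip_address(destination)
    best_target = None
    best_prefix = -1
    for cidr, target in routes.items():
        network = ipaddress.ip_network(cidr)
        # Keep the match whose prefix length is the longest seen so far.
        if address in network and network.prefixlen > best_prefix:
            best_prefix = network.prefixlen
            best_target = target
    return best_target


routes = {
    "10.0.0.0/16": "local",          # the VPC CIDR block
    "172.16.0.0/12": "pcx-example",  # hypothetical peering connection
    "0.0.0.0/0": "igw-example",      # hypothetical internet gateway
}

print(pick_route("10.0.1.5", routes))    # local
print(pick_route("172.16.5.1", routes))  # pcx-example
print(pick_route("8.8.8.8", routes))     # igw-example
```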

Border Gateway Protocol

Popular routing protocol for the internet

Propagates information about the network to allow for dynamic routing

Required for Direct Connect and optional for VPN

The alternative to BGP with an AWS VPC is static routes

AWS supports BGP community tagging as a way to control traffic scope and route preference

Requires TCP port 179 plus ephemeral ports

Autonomous System Number (ASN) = Unique endpoint identifier.

Weighting is local to the router, and the higher weight is the preferred path for outbound traffic.

Enhanced Networking

Generally used for high-performance computing use cases.

Uses single root I/O virtualisation (SR-IOV) to deliver higher performance than traditional virtualised network interfaces.

Might have to install a driver if using anything other than an Amazon Linux HVM AMI

Intel 82599 VF interface (10 Gbps)

Elastic Network Adapter (25 Gbps)

Placement Groups:

Clustered
Spread

Clustered:

Instances are placed into a low-latency group within a single AZ

Used when you need low network latency and/or high network throughput

Get the most out of Enhanced Networking instances

Finite capacity; launching everything you might need up front is recommended.

Spread:

Instances are spread across distinct underlying hardware.

Reduces the risk of simultaneous failure if the underlying hardware fails.

Can span multiple AZs

Max of 7 running instances per group per AZ.

Route 53

Check the health of your domain resources.

Routes internet traffic for your domain.

What is DNS?

The Domain Name System (DNS) is the phonebook of the Internet. Humans access information online through domain names, like google.com or yahoo.com.

DNS record types (A, CNAME, MX, TXT, etc)

A-Records (Host address)

The A-record is the most basic and the most commonly used DNS record type.

It is used to translate human friendly domain names such as "www.example.com" into IP-addresses such as 23.211.43.53 (machine friendly numbers).

A-records are the DNS server equivalent of the hosts file - a simple domain name to IP-address mapping.

CNAME-records are domain name aliases.

Computers on the Internet often perform multiple roles such as web-server, ftp-server, chat-server etc.

To mask this, CNAME-records can be used to give a single computer multiple names (aliases).

For example, the computer "computer1.xyz.com" may be both a web-server and an ftp-server, so two CNAME-records are defined:

"www.xyz.com" = "computer1.xyz.com" and "ftp.xyz.com" = "computer1.xyz.com".

TXT-records are used to hold descriptive text.

They are often used to hold general information about a domain name such as who is hosting it, contact person, phone numbers, etc.

One common use of TXT-records is for SPF (see http://www.openspf.org).

ALIAS-Records (Auto Resolved Alias)

ALIAS-records are virtual alias records resolved by Simple DNS Plus at the time of each request, providing "flattened" (no CNAME-record chain) synthesized records with data from a hidden source name.

This can be used for different purposes - including solving the classic problem with CNAME-records at the domain apex (for the zone name / for "the naked domain").

Route 53 Concepts (alias, hosted zone, etc)

MX-Records (Mail exchange)

MX-records are used to specify the e-mail server(s) responsible for a domain name.

Each MX-record points to the name of an e-mail server and holds a preference number for that server.

When sending an e-mail to "user@example.com", your e-mail server must first look up any MX-records for "example.com" to see which e-mail servers handle incoming e-mail for "example.com".

Route 53 Routing Policies

Simple: Simple routing is the most basic and common DNS policy, which can accommodate a single FQDN (fully qualified domain name) or IP address. For an A record, you enter the IP address as the value. For load balancers, you use the CNAME type.

Failover: Failover routing allows you to route traffic to a resource when the resource is healthy and to another resource when the first one is unhealthy.

Geolocation: Geolocation routing lets you serve resources based on the geographic location of the users or clients.

Latency: Use it when you want to return to the client the IP address of the website endpoint with the lowest latency among identical peers hosted in different AWS Regions.

Multivalue Answer: Multivalue answer Routing Policy is like Simple Routing Policy but it can return multiple IP addresses associated with an FQDN.

Weighted: Result is returned based on a weight of the DNS record. This is used for distributing the number of sessions equally or unequally among the servers.
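The weighted policy can be reasoned about with a little arithmetic: each record receives a share of responses equal to its weight divided by the sum of all weights. The sketch below is only a simulation with hypothetical record names (`blue`, `green`); it does not call Route 53.

```python
import random
from typing import Dict


def traffic_share(weights: Dict[str, int]) -> Dict[str, float]:
    """Expected fraction of responses per record: weight / sum of weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one record needs a positive weight")
    return {name: weight / total for name, weight in weights.items()}


def pick_record(weights: Dict[str, int], rng: random.Random) -> str:
    """Pick one record at random, in proportion to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


weights = {"blue": 3, "green": 1}
print(traffic_share(weights))  # {'blue': 0.75, 'green': 0.25}
print(pick_record(weights, random.Random()))  # 'blue' about three times out of four
```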

CloudFront:

Distributed content delivery service for anything from simple static asset caching up to 4K live and on-demand video streaming.

CloudFront is integrated with AWS, both with physical locations that are directly connected to the AWS global infrastructure and with other AWS services. CloudFront works seamlessly with services including AWS Shield for DDoS mitigation, Amazon S3, Elastic Load Balancing or Amazon EC2 as origins for your applications, and Lambda@Edge to run custom code closer to customers’ users and to customize the user experience. Lastly, if you use AWS origins such as Amazon S3, Amazon EC2 or Elastic Load Balancing, you don’t pay for any data transferred between these services and CloudFront.

Elastic Load Balancing:

Distributes inbound connections to backend endpoints.

Three different options:

Application Load Balancer (Layer 7)
Network Load Balancer (Layer 4)
Classic Load Balancer (Layer 4 or Layer 7)

Can be used for public or private networks.

Protocols:

ALB: HTTPS, HTTP

NLB: TCP

Classic LB: TCP, SSL, HTTP, HTTPS

Path or Host-based Routing:

ALB: YES

NLB: NO

Classic LB: NO

SSL Offloading:

ALB: YES

NLB: NO

Classic LB: YES

Server Name Indication (SNI):

ALB: YES

NLB: NO

Classic LB: NO

Sticky Session:

ALB: YES

NLB: NO

Classic LB: YES

Static IP, Elastic IP:

ALB: NO

NLB: YES

Classic LB: NO

User Authentication:

ALB: YES

NLB: NO

Classic LB: NO

[Figures: Application Load Balancer, Classic Load Balancer, Network Load Balancer, and the concept of sticky sessions]

MPLS

MPLS is an encapsulation protocol used in many service provider and large-scale enterprise networks. Instead of relying on IP lookups to discover a viable "next-hop" at every single router within a path (as in traditional IP networking), MPLS predetermines the path and uses a label-swapping method (push, pop, and swap) to direct the traffic to its destination. This gives the operator significantly more flexibility and enables users to experience a greater SLA by reducing latency and jitter.

Customer Gateway

A customer gateway (CGW) is the anchor on your side of the connection between your network and your Amazon VPC. In an MPLS scenario, the CGW can be a customer edge (CE) device located at a Direct Connect location, or it can be a provider edge (PE) device in an MPLS VPN network.

CIDR = Classless Inter-Domain Routing

WAN = Wide Area Network

MPLS = Multiprotocol Label Switching


Sunday, June 2, 2019

What should we know about AWS Data Store?

I hope it could be useful to someone.

Data store

Persistent data store: durable data that survives a restart (RDS, Glacier)
Transient data store: temporary data that will be passed to another database or process (SQS, SNS)
Ephemeral data store: data can be lost on stop (EC2 instance store, Memcached)

IOPS vs Throughput

IOPS (input/output operations per second): measures how fast reads and writes can be performed
Throughput: measures the amount of data that can be moved over a period of time
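The two metrics are linked by the size of each I/O operation: throughput is roughly IOPS multiplied by the I/O size. The sketch below works through that arithmetic; the function name and the sample numbers are illustrative, not figures for any particular AWS volume type.

```python
def throughput_mib_per_s(iops: int, io_size_kib: int) -> float:
    """Throughput in MiB/s for a given IOPS rate and I/O size in KiB."""
    return iops * io_size_kib / 1024


# Many small operations: high IOPS, modest throughput.
print(throughput_mib_per_s(3000, 16))    # 46.875

# Fewer large operations: low IOPS, high throughput.
print(throughput_mib_per_s(500, 1024))   # 500.0
```

This is why small random reads (databases) are usually sized by IOPS, while large sequential reads (log processing, data warehouses) are sized by throughput.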

Consistency is far better than rare moments of greatness.

Availability is considered a higher priority in the BASE model but not in the ACID model, which values consistency over availability. Lazy writes are typical of eventual consistency rather than perpetual consistency.

ACID

Atomic -> transactions are all or nothing
Consistent -> transactions must be valid
Isolated -> transactions can't interfere with one another
Durable -> completed transactions must stick around
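Atomicity is the easiest of these properties to demonstrate. The following sketch uses Python's built-in `sqlite3` module with an in-memory database; the `accounts` table and the `transfer` helper are hypothetical and only illustrate the all-or-nothing behaviour.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move funds atomically: both updates commit, or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

try:
    transfer(conn, "alice", "bob", 500)  # fails halfway through
except ValueError:
    pass

# The debit that already ran was rolled back together with the failed transfer.
print(dict(conn.execute("SELECT name, balance FROM accounts")))  # {'alice': 100, 'bob': 0}
```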

BASE

Basic Availability -> values availability even if stale
Soft State -> might not be instantly consistent across stores
Eventual Consistency -> will achieve consistency at some point

Database Options:

Database on EC2: ultimate control over the database; use when your preferred DB is not available on RDS

RDS: traditional database for OLTP; data is well structured and ACID compliant

DynamoDB: name/value or unpredictable data structure, in-memory performance with persistence, scales dynamically

Redshift: massive amounts of data, primarily OLAP workloads

Neptune: relationships between objects are a major portion of the data's value

ElastiCache: fast temporary storage for small amounts of data

SaaS partitioning models:

Silo: separate database for each tenant
Bridge: single database, multiple schemas
Pool: Shared database, single schema

[Figures: Silo model, Bridge model, Pool model]

RDS:

Managed database option for MySQL, MariaDB, PostgreSQL, SQL Server, Oracle, and Aurora

Best for structured, relational data store needs.

Aims to be a drop-in replacement for existing on-prem instances of the same databases

Automated backups and patching in customer-defined maintenance windows.

Push-button scaling, replication and redundancy.

Multi-AZ

Read replicas serve regional users

For MySQL, non-transactional storage engines like MyISAM don’t support replication; you must use InnoDB (or XtraDB on MariaDB)

MariaDB is an open source fork of MySQL

RDS Anti-patterns:

Lots of large binary objects (BLOBs) -> use S3
Automated scalability -> use DynamoDB
Name/value data structure -> use DynamoDB
Data is not well structured or is unpredictable -> use DynamoDB
Other database platforms such as DB2 or HANA, or complete control over the database -> use EC2

S3

The maximum size of an object in S3 is 5 TB

The largest object that can be uploaded in a single PUT is 5 GB

Multipart upload is recommended for objects larger than 100 MB

The file name is better thought of as a key than as a path

Amazon S3 is designed for 99.999999999 percent (11 nines) durability per object

99.99 percent availability over a one-year period.
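The size limits above lend themselves to a quick planning calculation. This is a rough sketch, not the S3 API: `plan_upload` is a hypothetical helper, binary units are assumed, the 100 MB multipart threshold is the commonly cited recommendation, and the 10,000-part cap is taken from the S3 documentation rather than from these notes.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MAX_OBJECT_BYTES = 5 * TIB       # largest object S3 stores
MULTIPART_THRESHOLD = 100 * MIB  # size above which multipart upload is recommended
MAX_PARTS = 10000                # assumption: part-count limit from the S3 docs


def plan_upload(size_bytes: int, part_size: int = 100 * MIB) -> str:
    """Describe how an object of the given size would be uploaded."""
    if size_bytes > MAX_OBJECT_BYTES:
        raise ValueError("object exceeds the 5 TB S3 limit")
    if size_bytes <= MULTIPART_THRESHOLD:
        return "single PUT"
    parts = math.ceil(size_bytes / part_size)
    if parts > MAX_PARTS:
        raise ValueError("part size too small: more than 10,000 parts needed")
    return f"multipart upload in {parts} parts"


print(plan_upload(50 * MIB))                     # single PUT
print(plan_upload(1 * GIB))                      # multipart upload in 11 parts
print(plan_upload(5 * TIB, part_size=1 * GIB))   # multipart upload in 5120 parts
```

Note that the largest objects need a part size well above the default used here, otherwise the part count runs past the limit.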

Consistency:

S3 provides read-after-write consistency for PUTs of new objects

HEAD or GET requests for a key before the object exists result in eventual consistency

S3 offers eventual consistency for overwrite PUTs and DELETEs

Updates to a single key are atomic
If you make a HEAD or GET request for the S3 key name before creating the object, S3 provides eventual consistency for read-after-write. As a result, you may get a 404 Not Found error until the upload is fully replicated.

Security:

Resource-based -> object ACL, bucket policy
User-based -> IAM policy
Optional Multi-factor Authentication before delete

Data protection

New version with each write

Enable roll-back and un-delete capabilities

Old version count as billable size until they are permanently deleted.

Integrated with lifecycle management

Cross-Region Replication

Lifecycle Management

Optimize storage cost

Adhere to data retention policies

Keep S3 volumes well maintained

Analytics

Data lake concepts -> with Athena, Redshift Spectrum, QuickSight

IoT streaming data repository -> Kinesis Firehose

Machine learning and AI storage -> Rekognition, Lex, MXNet

Storage Class Analysis -> S3 Management Analytics

Encryption at Rest

SSE-S3 -> Use S3's existing encryption keys for AES-256
SSE-C -> Upload your own AES-256 encryption key, which S3 will use when it writes the object
SSE-KMS -> Use a key generated and managed by AWS Key Management Service
Client-side -> Encrypt objects using your own local encryption process before uploading to S3

Tricks:

Transfer Acceleration -> speed up data uploads using CloudFront in reverse
Requester Pays -> the requester rather than the bucket owner pays for requests and data transfer
Tags -> assign tags to objects for use in costing, billing, security
Events -> trigger notifications to SNS, SQS or Lambda when certain events happen in your bucket
Static web hosting -> simple and massively scalable static website hosting
BitTorrent -> use the BitTorrent protocol to retrieve any publicly available object by automatically generating a .torrent file

Glacier

Cheap, slow to respond, seldom accessed

Cold storage

Used by AWS Storage Gateway Virtual tape library

Integrated with aws s3 via lifecycle management

Faster retrieval speed options if you pay more

Amazon Glacier is designed to provide average annual durability of 99.999999999 percent (11 nines) for an archive.
Glacier Vault Lock is an immutable way to set policies on a Glacier vault such as retention or enforcing MFA before delete.

S3 for blobs

EFS: provides scalable network file storage for EC2 instances

EBS: provides block storage volumes for EC2

EC2 instance storage: temporary block storage volumes for EC2

Storage Gateway: an on-premises storage appliance that integrates with cloud storage

Snowball: transports large amounts of data to and from the cloud

Amazon S3 is designed for 99.999999999 percent (11 nines) durability and 99.99 percent availability

Each AWS Snowball appliance is capable of storing 50 TB or 80 TB of data

For Amazon S3, individual files are loaded as objects and can range up to 5 TB in size

CloudFront: provides a global content delivery network

Amazon CloudFront is designed for low-latency and high-bandwidth delivery of content

As an edge cache, Amazon CloudFront does not provide durable storage. Durable storage is provided by the origin server, such as Amazon S3 or a web server running on Amazon EC2

Amazon S3 offers a range of storage classes designed for different use cases, including:

S3 Standard: for general-purpose storage of frequently accessed data.

S3 Standard-Infrequent Access (Standard-IA): for long-lived but less frequently accessed data

Glacier: for low-cost archival data; retrieval jobs typically complete in 3 to 5 hours. You can improve the upload experience for large archives by using multipart upload for archives up to about 40 TB (the single archive limit)

S3 Usage Patterns:

First: used to store and distribute static web content and media
Second: used to host entire static websites
Third: used as a data store for computation and large-scale analytics, such as financial transactional analysis, clickstream analytics, and media transcoding.

Elastic Block Store

Virtual hard drives

Can only be used with EC2

Tied to a single AZ

Variety of optimised choices for IOPS, throughput and cost

Instance Store

Temporary

Ideal for caches, buffers, work areas

Data goes away when the EC2 instance is stopped or terminated

EBS Snapshots

Cost-effective and easy backup strategy

Share data sets with other users or accounts

Migrate a system to a new AZ or Region

Amazon EBS provides a range of volume types that are divided into two major categories: SSD-backed storage volumes and HDD-backed storage volumes.

Volume types:

SSD-Backed Provisioned IOPS (io1): I/O-intensive NoSQL and relational databases
SSD-Backed General Purpose (gp2) : Boot volumes, low-latency interactive apps, dev & test
HDD-Backed Throughput Optimized (st1) : Big data, data warehouse, log processing
HDD-Backed Cold (sc1) : Colder data requiring fewer scans per day

Elastic File System

Implementation of an NFS file share

Elastic storage capacity; pay only for what you use

Multi-AZ metadata and data storage

Configure mount points in one, or many, AZs.

Can be mounted from on-premises systems ONLY if using Direct Connect

Alternatively, use the EFS File Sync agent

If your overall Amazon EFS workload will exceed 7,000 file operations per second per file system, we recommend the file system use Max I/O performance mode.

Security: IAM permissions for API calls; security groups for EC2 instances and mount targets; and Network File System-level users, groups, and permissions.

Storage Gateway

Virtual machine that you run on-premises with VMware or Hyper-V

Provides local storage resources backed by S3 and Glacier

Often used in disaster recovery preparedness to sync data to AWS

Useful in cloud migrations

File gateway: allows on-prem or EC2 instances to store objects in S3 via NFS or SMB mount points

Volume gateway stored mode: async replication of on-prem data to S3

Volume gateway cached mode: primary data stored in S3, with frequently accessed data cached locally on-prem

Tape gateway: virtual media changer and tape library for use with existing backup software

WorkDocs

Secure, fully managed file collaboration service

Can integrate with AD for SSO

Web, mobile and native clients (no Linux client)

HIPAA, PCI DSS and ISO compliance requirements

SDK available for creating complementary apps

Database on EC2:

Run any database with full control and ultimate flexibility.

Must manage everything yourself: backups, redundancy, patching, scaling

Good option when you require a database not yet supported by RDS, such as IBM DB2 or SAP HANA

Good option if it is not feasible to migrate to an AWS-managed database

AWS offers two EC2 instance families that are purpose-built for storage-centric workloads:

- SSD-Backed Storage-Optimized (i2): NoSQL databases like Cassandra and MongoDB, scale-out transactional databases, data warehousing, Hadoop, and cluster file systems.

- HDD-Backed Dense-Storage (d2): Massively Parallel Processing (MPP) data warehousing, MapReduce and Hadoop distributed computing, distributed file systems, network file systems, log or data-processing applications.

DynamoDB

Managed, multi-AZ NoSQL data store with a cross-region replication option

Defaults to eventually consistent reads, but you can request strongly consistent reads via an SDK parameter

Priced on throughput, rather than compute

Provision read and write capacity in anticipation of need

Autoscale capacity adjusts per configured min/max levels

On-demand capacity for flexible capacity at a small premium cost

Achieve ACID compliance with DynamoDB transactions

Secondary index

Global Secondary index

-> Partition key and sort key can be different from those on the table

-> Use when you want a fast query of attributes outside the primary key without having to do a table scan (read everything sequentially)

Local Secondary index

-> Same partition key as the table but different sort key

-> Use when you already know the partition key and want to quickly query on some other attributes

Max 5 local and 5 global secondary indexes

Max 20 attributes across all indexes

Indexes take up storage space

Example:

Suppose we created a global secondary index on customerNum; we could then query by this field very quickly.
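To make the customerNum example concrete, here is a pure-Python simulation of the idea. It does not use DynamoDB or any AWS SDK; the `orders` items, `scan_by_customer`, and `build_index` are hypothetical and only contrast a full table scan with a lookup through an index keyed on another attribute.

```python
from collections import defaultdict
from typing import Dict, List

Item = Dict[str, object]

orders: List[Item] = [
    {"orderId": "o-1", "customerNum": "c-42", "total": 30},
    {"orderId": "o-2", "customerNum": "c-7", "total": 12},
    {"orderId": "o-3", "customerNum": "c-42", "total": 55},
]


def scan_by_customer(table: List[Item], customer_num: str) -> List[Item]:
    """Table scan: reads every item to find the matches."""
    return [item for item in table if item["customerNum"] == customer_num]


def build_index(table: List[Item], attribute: str) -> Dict[object, List[Item]]:
    """Group items by an attribute, the way a secondary index keys them."""
    index: Dict[object, List[Item]] = defaultdict(list)
    for item in table:
        index[item[attribute]].append(item)
    return index


by_customer = build_index(orders, "customerNum")

# Same result as the scan, but found with a single key lookup.
print([o["orderId"] for o in by_customer["c-42"]])  # ['o-1', 'o-3']
```

The trade-off is the one noted above: the index answers the query quickly but takes up additional storage and must be maintained on every write.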

If you need to:

-> access just a few attributes the fastest way possible: consider projecting just those few attributes in a global secondary index. Benefit: lowest possible latency access for non-key items.

-> frequently access some non-key attributes: consider projecting those attributes in a global secondary index. Benefit: lowest possible latency access for non-key items.

-> frequently access most non-key attributes: consider projecting those attributes or even the entire table in a global secondary index. Benefit: maximum flexibility.

-> rarely query but write or update frequently: consider projecting keys only for the global secondary index. Benefit: very fast writes or updates for non-partition-key items.

Redshift:

Fully managed, clustered, petabyte-scale data warehouse

Extremely cost-effective compared to some on-premises data warehouse platforms

PostgreSQL compatible, with JDBC and ODBC drivers available; compatible with most BI tools out of the box

Features parallel processing and a columnar data store, which are optimised for complex queries

Option to query directly from data files on S3 via Redshift Spectrum

Multitenancy on Amazon Redshift

Amazon Redshift also places some limits on the constructs that you can create within each cluster. Consider the following limits:

* 60 databases per cluster

* 256 schemas per database

* 500 concurrent connections per database

* 50 concurrent queries

* Access to a cluster enables access to all databases in the cluster

Data lake:

Query raw data without extensive pre-processing

Lessen time from data collection to data value

Identify correlations between disparate data sets

Neptune:

Fully managed graph database

Supports open graph APIs for both Gremlin and SPARQL

ElastiCache

Fully managed implementation of two popular in-memory data stores: Redis and Memcached

Push-button scalability for memory, writes and reads

In-memory key/value store - not persistent in the traditional sense

Billed by node size and hours of use

Web session store - with load-balanced web servers, store web session information in Redis so that if a server is lost, the session info is not lost and another web server can pick it up

Database caching - use Memcached in front of AWS RDS to cache popular queries, offloading work from RDS and returning results faster to users.

Leaderboards - use Redis to provide a live leaderboard for millions of users of your mobile app.

Streaming data dashboards - provide a landing spot for streaming sensor data on the factory floor, providing live real-time dashboard displays.
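The database caching use case is commonly implemented with a cache-aside pattern: look in the cache first and only query the database on a miss. The sketch below uses an in-process dictionary as a stand-in for Memcached or Redis; `CacheAside` and `slow_query` are hypothetical names, and a real deployment would call a cache client instead.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Serve values from the cache, loading from the backing store on a miss."""

    def __init__(self, load: Callable[[str], str], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[str, float]] = {}  # key -> (value, expiry)
        self.misses = 0

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                      # cache hit
        self.misses += 1
        value = self._load(key)                  # cache miss: go to the database
        self._store[key] = (value, now + self._ttl)
        return value


def slow_query(key: str) -> str:
    """Stand-in for an expensive database query."""
    return f"result for {key}"


cache = CacheAside(slow_query)
cache.get("top-products")
cache.get("top-products")
print(cache.misses)  # 1
```

The second call is answered from the cache, so the database is queried only once until the entry expires.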

Memcached: if you need a simple cache, for example for database queries

Redis

If you need encryption, HIPAA compliance, or clustering support.

High availability

Pub/sub capability

Geospatial indexing

Backup and restore

Alternatives to ElastiCache:

Amazon CloudFront content delivery network (CDN): this approach is used to cache web pages, image assets, videos, and other static data at the edge, as close to end users as possible
Amazon RDS Read Replicas: some database engines, such as MySQL, support the ability to attach asynchronous read replicas.
On-host caching: a simplistic approach to caching is to store data on each Amazon EC2 application instance, so that it's local to the server for fast lookup

When deciding between Memcached and Redis, here are a few questions to consider:

Is object caching your primary goal, for example to offload your database? If so, use Memcached.
Are you interested in as simple a caching model as possible? If so, use Memcached.
Are you planning on running large cache nodes, and require multithreaded performance with utilization of multiple cores? If so, use Memcached.
Do you want the ability to scale your cache horizontally as you grow? If so, use Memcached.
Does your app need to atomically increment or decrement counters? If so, use either Redis or Memcached.
Are you looking for more advanced data types, such as lists, hashes, and sets? If so, use Redis.
Does sorting and ranking datasets in memory help you, such as with leaderboards? If so, use Redis.
Are publish and subscribe (pub/sub) capabilities of use to your application? If so, use Redis.
Is persistence of your key store important? If so, use Redis.
Do you want to run in multiple AWS Availability Zones (Multi-AZ) with failover? If so, use Redis.

References:

https://d1.awsstatic.com/whitepapers/Storage/AWS%20Storage%20Services%20Whitepaper-v9.pdf

https://d1.awsstatic.com/whitepapers/Multi_Tenant_SaaS_Storage_Strategies.pdf

https://www.youtube.com/watch?v=TJxC-B9Q9tQ

https://www.youtube.com/watch?v=_YYBdsuUq2M

https://www.youtube.com/watch?v=9wgaV70FeaM