Sunday, August 25, 2019

AWS Architecting to Scale in a nutshell

I made this summary, with bullets and fragments of text extracted from AWS papers and from some online courses (A Cloud Guru, Linux Academy, Udemy), to study concepts about AWS Architecting to Scale.
I hope it could be useful to someone.
Scaling in the cloud is closely related to microservice architectures: an approach to software development that speeds up deployment cycles, fosters innovation and ownership, improves maintainability and scalability of software applications, and scales organizations delivering software and services by using an agile approach that helps teams work independently from each other.

Microservices architectures are not a completely new approach to software engineering, but rather a combination of various successful and proven concepts such as:
  • Agile software development
  • Service-oriented architectures
  • API-first design
  • Continuous Integration/Continuous Delivery (CI/CD)

In many cases, design patterns of the Twelve-Factor App are leveraged for microservices.

Distributed Data Management

Monolithic applications are typically backed by a large relational database, which defines a single data model common to all application components. In a microservices approach, such a central database would prevent the goal of building decentralized and independent components. Each microservice component should have its own data persistence layer.

Distributed data management, however, raises new challenges. As a consequence of the CAP Theorem, distributed microservices architectures inherently trade off consistency for performance and need to embrace eventual consistency.

In a distributed system, business transactions can span multiple microservices. Because they cannot leverage a single ACID transaction, you can end up with partial executions. In this case, we would need some control logic to redo the already processed transactions. For this purpose, the distributed Saga pattern is commonly used. In the case of a failed business transaction, Saga orchestrates a series of compensating transactions that undo the changes that were made by the preceding transactions.
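As an illustration, here is a minimal, hypothetical sketch of the Saga idea in plain Python. The step and compensation functions are placeholders invented for this example, not a real AWS API:

# Hypothetical saga orchestrator: each step has a compensating action
# that undoes it if a later step fails (all names are illustrative only).

def reserve_inventory(order):
    print("inventory reserved for", order["id"])

def release_inventory(order):
    print("inventory released for", order["id"])

def charge_payment(order):
    raise RuntimeError("payment declined")   # simulate a failed step

def refund_payment(order):
    print("payment refunded for", order["id"])

SAGA_STEPS = [
    (reserve_inventory, release_inventory),
    (charge_payment, refund_payment),
]

def run_saga(order):
    completed = []
    try:
        for action, compensation in SAGA_STEPS:
            action(order)
            completed.append(compensation)
    except Exception as error:
        # Undo the already-processed steps in reverse order.
        for compensation in reversed(completed):
            compensation(order)
        print("saga rolled back:", error)

run_saga({"id": "order-42"})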


Saga execution coordinator:

Loosely Coupled Architecture 

Loosely coupled architectures have several benefits, but the main benefit in terms of scalability is atomic functional units. These discrete units of work can scale independently.

  • Layers of abstraction
  • Permits more flexibility
  • Interchangeable components
  • More atomic functional units
  • Can scale components independently

Horizontal vs Vertical Scaling

Horizontal Scaling:
  • Add more instances as demand increases
  • No downtime required to scale up or scale down
  • Automatic using Auto Scaling Groups
  • (Theoretically) unlimited

Vertical Scaling:
  • Add more CPU and/or more RAM to the existing instance as demand increases
  • Requires a restart to scale up or down
  • Would require scripting to automate
  • Limited by available instance sizes

Auto-Scaling Groups 

If your scaling is not picking up the load fast enough to maintain a good service level, reducing the cooldown can make scaling more aggressive and responsive.

  • Automatically provides horizontal scaling for your landscape.
  • Triggered by an event or scaling actions to either launch or terminate instances.
  • Availability, Cost and System Metrics can all factor into scaling.

Four Scaling options:
  • Maintain - Keep a specific or minimum number of instances running
  • Manual - Use a maximum, minimum, or specific number of instances
  • Scheduled - Increase or decrease instances based on a schedule
  • Dynamic - Scale based on real-time metrics of the systems

Launch Configuration:
  • Specify VPC and subnets for scaled instances
  • Attach to an ELB
  • Define a Health Check Grace Period
  • Define the size of the group to stay at the initial size
  • Use a scaling policy, which can be based on metrics


Scaling Types:
  • Maintain - Hands-off way to maintain X number of instances. "I need 3 instances always."
  • Manual - Manually change desired capacity via console or CLI. "My needs change so rarely that I can just manually add and remove."
  • Scheduled - Adjust min/max instances based on specific times. "Every Monday morning, we get a rush on our website."
  • Dynamic - Scale in response to the behaviour of elements in the environment. "When CPU utilization gets to 70% on current instances, scale up."
Scaling Policies:
  • Target Tracking Policy - Scales based on a predefined or custom metric in relation to a target value.
  • Simple Scaling Policy - Waits until the health check and cooldown period expire before evaluating new needs.
  • Step Scaling Policy - Responds to scaling needs with more sophistication and logic.

Scaling Cooldowns

  • Configurable duration that gives your scaling a chance to "come up to speed" and absorb load.
  • Default cooldown period is 300 seconds.
  • Automatically applies to dynamic scaling and optionally to manual scaling, but not supported by scheduled scaling.
  • Can override the default cooldown via a scaling-specific cooldown (see the sketch below).
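A minimal sketch of overriding the default cooldown with boto3, assuming an Auto Scaling Group named "web-asg" already exists (the group name, policy name and numbers are illustrative):

import boto3

autoscaling = boto3.client("autoscaling")

# Simple scaling policy that adds one instance and enforces a
# policy-specific cooldown of 120 seconds instead of the 300-second default.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",          # assumed existing group
    PolicyName="scale-out-on-high-cpu",
    PolicyType="SimpleScaling",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=1,
    Cooldown=120,
)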

Kinesis

Kinesis Data Streams can accept data as soon as it is produced and pushed into a stream, without batching it first, which minimises the chance of data loss at the producer stage.
Consumers can also extract metrics, generate reports and perform analytics on the data in real time.
Kinesis Data Streams is not a long-term storage solution, as the data can only be stored within the shards for a maximum of 7 days. It also does not load the streamed data directly into data stores such as S3 (that is the role of Kinesis Data Firehose).

Although data can be read (or consumed) from shards within Kinesis Data Streams using either the Kinesis Data Streams API or the Kinesis Client Library (KCL), AWS recommends using the KCL. The KPL (Kinesis Producer Library) only allows writing to Kinesis Data Streams, not reading from them. You cannot interact with Kinesis Data Streams via SSH.

  • A collection of services for processing streams of various data.
  • Data is processed in "shards" - with each shard able to ingest 1,000 records per second (up to 1 MB/s).
  • A default limit of 500 shards, but you can request an increase to unlimited shards.
  • A record consists of a Partition Key, Sequence Number and Data Blob (up to 1 MB) - see the producer sketch below.
  • Transient Data Store - default retention of 24 hours, but can be configured for up to 7 days.
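A minimal producer sketch with boto3, assuming a stream named "sensor-stream" already exists (the stream name and payload are illustrative):

import json
import boto3

kinesis = boto3.client("kinesis")

# Push one record into the stream; records with the same partition key
# are routed to the same shard.
kinesis.put_record(
    StreamName="sensor-stream",               # assumed existing stream
    Data=json.dumps({"sensor": "sensor-42", "reading": 21.7}).encode("utf-8"),
    PartitionKey="sensor-42",
)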

Kinesis Video Streams


Kinesis Data Streams



Kinesis Data Analytics 


Kinesis Data Streams Key Concepts



DynamoDB

Throughput :
  • Read Capacity Units
  • Write Capacity Units
Max item size is 400KB

Terminology:
  • Partition: A physical space where DynamoDB data is stored.
  • Partition key: A unique identifier for each record, sometimes called a Hash Key.
  • Sort Key: In combination with a partition key, optional second part of a composite key that defines storage order, sometimes called a Range Key.

To determine the number of partitions, we need to know the table size, the RCUs and the WCUs. For example, a 25 GB table will have at least 3 partitions by size alone (25 / 10 = 2.5, rounded up to 3).

Partition Calculation (worked example below):
  • By Capacity: Total RCU / 3,000 + Total WCU / 1,000
  • By Size: Total Size / 10 GB
  • Total Partitions: round up the MAX (By Capacity, By Size)
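A quick worked example in Python, using assumed numbers (5,000 RCU, 2,000 WCU, 25 GB) just to illustrate the formulas above:

import math

rcu, wcu, size_gb = 5000, 2000, 25        # assumed workload figures

by_capacity = rcu / 3000 + wcu / 1000      # 1.67 + 2.0 = 3.67
by_size = size_gb / 10                     # 2.5

partitions = math.ceil(max(by_capacity, by_size))
print(partitions)                          # 4 partitions for this example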





Wrong way: use the date as the partition key and the ID as the sort key.

When asking for all the sensor readings for 2018-01-01, the query hits a single ("hot") partition.

The right way: use the ID as the partition key and the date as the sort key.

When asking for all the sensor readings for 2018-01-01, the reads are spread across all partitions.
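A sketch of the "right way" table definition with boto3; the table and attribute names are assumptions for illustration:

import boto3

dynamodb = boto3.client("dynamodb")

# The sensor ID as partition key spreads writes across partitions;
# the reading date as sort key keeps each sensor's readings ordered.
dynamodb.create_table(
    TableName="SensorReadings",
    AttributeDefinitions=[
        {"AttributeName": "SensorId", "AttributeType": "S"},
        {"AttributeName": "ReadingDate", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "SensorId", "KeyType": "HASH"},      # partition key
        {"AttributeName": "ReadingDate", "KeyType": "RANGE"},  # sort key
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)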

Auto Scaling for DynamoDB:

  • Uses the target tracking method to try to stay close to the target utilisation.
  • Currently does not scale down if the table's consumption drops to zero.
  • Workaround 1: Send requests to the table until auto scaling scales it down.
  • Workaround 2: Manually reduce the max capacity to be the same as the minimum capacity.
  • Also supports Global Secondary Indexes - think of them like a copy of the table.

CloudFront

Behaviors allow us to define different origins depending on the URL path. This is useful when we want to serve up static content from S3 and dynamic content from an EC2 fleet, for example, for the same website.

  • Can deliver content to your users faster by caching static and dynamic content at edge locations.
  • Dynamic content delivery is achieved using HTTP cookies forwarded from your origin.
  • Supports Adobe Flash Media Server's RTMP protocol, but you have to choose the RTMP delivery method.
  • Web distributions also support media streaming and live streaming, but use HTTP or HTTPS.
  • Origins can be S3, EC2, ELB or another web server.
  • Multiple origins can be configured.
  • Use behaviours to configure serving up origin content based on URL paths.

Invalidation Requests
  1. Simply delete the file from the origin and wait for the TTL to expire.
  2. Use the AWS Console to request invalidation for all content or a specific path such as /images/*
  3. Use the CloudFront API to submit an invalidation request (see the sketch below).
  4. Use third-party tools to perform CloudFront invalidation (CloudBerry, Ylastic, CDN Planet, CloudFront Purge Tool).
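A minimal invalidation sketch with boto3; the distribution ID is a placeholder:

import time
import boto3

cloudfront = boto3.client("cloudfront")

# Invalidate every object under /images/ on a distribution.
cloudfront.create_invalidation(
    DistributionId="E1ABCDEXAMPLE",            # placeholder distribution ID
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/images/*"]},
        "CallerReference": str(time.time()),   # must be unique per request
    },
)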

Simple Notification Service (SNS)
  • Enables a Publish/Subscribe design pattern.
  • Topic = A channel for publishing a notification
  • Subscription = Configuring an endpoint to receive messages published on the topic
  • Endpoint protocols include HTTP(S), Email, SMS, SQS, Amazon Device Messaging (push notifications) and Lambda (see the sketch below)
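A small publish/subscribe sketch with boto3; the topic name and e-mail address are illustrative:

import boto3

sns = boto3.client("sns")

# Create a topic (the channel) and subscribe an endpoint to it.
topic_arn = sns.create_topic(Name="order-events")["TopicArn"]
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="ops@example.com")

# Publishing fans the message out to every confirmed subscription.
sns.publish(TopicArn=topic_arn, Subject="New order", Message="Order 42 received")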
 



Simple Queue Service (SQS)

  • Reliable, highly scalable, hosted message queue service.
  • Available integration with KMS for encrypted messaging.
  • Transient storage: default 4 days, max 14 days.
  • Optionally supports First-In First-Out queue ordering.
  • Maximum message size of 256 KB, but using a special Java SQS SDK you can have messages as large as 2 GB (see the sketch below).
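A send/receive sketch with boto3, assuming a queue named "image-resize-jobs" (the name is illustrative):

import boto3

sqs = boto3.client("sqs")

# Create the queue with the maximum 14-day retention (in seconds).
queue_url = sqs.create_queue(
    QueueName="image-resize-jobs",
    Attributes={"MessageRetentionPeriod": "1209600"},
)["QueueUrl"]

sqs.send_message(QueueUrl=queue_url, MessageBody="resize photo-42.jpg")

# Long-poll for a message, process it, then delete it from the queue.
messages = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1,
                               WaitTimeSeconds=10).get("Messages", [])
for message in messages:
    print("processing:", message["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])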

Amazon MQ

  • Managed implementation of Apache ActiveMQ
  • Fully managed and highly available within a region.
  • ActiveMQ API with support for JMS, NMS, MQTT and WebSocket.
  • Designed as a drop-in replacement for on-premises message brokers.
  • Use SQS if you are creating a new application from scratch.
  • Use MQ if you want an easy low-hassle path to migrate from existing message brokers to AWS.

Lambda

  • Allows you to run code on-demand without the need for infrastructure.
  • Supports Node.js, Python, Java, Go and C#.
  • Extremely useful option for creating serverless architectures.
  • Code is stateless and executes on an event basis (SNS, SQS, S3, DynamoDB Streams, etc.) - a minimal handler sketch follows below.
  • No fundamental limits to scaling a function since AWS dynamically allocates capacity in relation to events.
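A minimal, hypothetical Python handler for an SQS-triggered function; the event shape shown is the standard SQS record batch:

import json

def lambda_handler(event, context):
    """Process each SQS record delivered in the event batch (stateless)."""
    for record in event.get("Records", []):
        body = json.loads(record["body"])
        print("processing message:", body)
    return {"processed": len(event.get("Records", []))}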

Simple Workflow Service (AWS SWF)

  • Create distributed asynchronous systems as workflows.
  • Supports both sequential and parallel processing.
  • Tracks the state of your workflow, which you interact with and update via the API.
  • Best suited for human-enabled workflows, like order fulfilment or procedural requests.
  • AWS recommends that new applications look at Step Functions over SWF.
 
 
Example:

Step Functions

  • Managed workflow and orchestration platform 
  • Scalable and highly available
  • Define your app as state machine
  • Create tasks, sequential steps, parallel steps, branching paths or timers.
  • Amazon States Language: declarative JSON.
  • Apps can interact with and update the state machine via the Step Functions API (see the sketch below).
  • Visual interface describes the flow and real-time status.
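A small sketch of starting an execution with boto3; the state machine ARN and input are placeholders:

import json
import boto3

stepfunctions = boto3.client("stepfunctions")

# Kick off one execution of an existing state machine with a JSON input.
stepfunctions.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:OrderFlow",
    name="order-42",                      # execution name must be unique
    input=json.dumps({"orderId": "42"}),
)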

AWS Batch

Management tools for creating, managing and executing batch-oriented tasks using EC2 instances.
  1. Create a Compute Environment: Managed or Unmanaged, Spot or On-Demand, vCPUs.
  2. Create a Job Queue with a priority, assigned to a Compute Environment.
  3. Create a Job Definition: script or JSON, environment variables, mount points, IAM role, container images, etc.
  4. Schedule the Job (see the sketch below).
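A sketch of submitting a job with boto3, assuming a job queue and job definition already exist (all names are illustrative):

import boto3

batch = boto3.client("batch")

# Submit a job to an existing queue using an existing job definition.
batch.submit_job(
    jobName="rotate-firewall-logs",
    jobQueue="nightly-maintenance",        # assumed existing job queue
    jobDefinition="rotate-logs:1",         # assumed existing job definition
    containerOverrides={
        "environment": [{"name": "LOG_DATE", "value": "2019-08-25"}],
    },
)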


Service - When - Use Case
  • Step Functions - Out-of-the-box coordination of AWS service components - Order processing flow
  • Simple Workflow Service - Need to support external processes or specialized execution logic - Loan application process with manual review steps
  • Simple Queue Service - Messaging queue; store-and-forward patterns - Image resize process
  • AWS Batch - Scheduled or recurring tasks that do not require heavy logic - Rotate logs daily on a firewall appliance

Elastic Map Reduce (EMR)

The Zoo


 

  • Managed Hadoop framework for processing huge amounts of data.
  • Also supports Apache Spark, HBase, Presto and Flink.
  • Most commonly used for log analysis, financial analysis, or extract, transform, and load (ETL) activity.
  • A Step is a programmatic task for performing some process on the data.
  • A cluster is a collection of EC2 instances provisioned by EMR to run your Steps (see the sketch below).
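A sketch of adding a step to a running cluster with boto3; the cluster ID and script location are placeholders:

import boto3

emr = boto3.client("emr")

# Add a Spark step to an existing cluster; EMR runs steps in order.
emr.add_job_flow_steps(
    JobFlowId="j-2EXAMPLE12345",             # placeholder cluster ID
    Steps=[{
        "Name": "nightly-etl",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://my-bucket/jobs/etl.py"],  # assumed script
        },
    }],
)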

Components of AWS EMR


AWS EMR Process



An Overview of Traditional Web Hosting


The same kind of application on AWS 

 
Security groups in a web application

Memcached vs. Redis

Memcached—a widely adopted in-memory key store, and historically the gold standard of web caching. ElastiCache is protocol-compliant with Memcached, so popular tools that you use today with existing Memcached environments will work seamlessly with the service. Memcached is also multithreaded, meaning it makes good use of larger Amazon EC2 instance sizes with multiple cores.

Redis—an increasingly popular open-source key-value store that supports more advanced data structures such as sorted sets, hashes, and lists. Unlike Memcached, Redis has disk persistence built in, meaning that you can use it for long-lived data. Redis also supports replication, which can be used to achieve Multi-AZ redundancy, similar to Amazon RDS.

Architecture with ElastiCache for Memcached


Architecture with ElastiCache for Redis


Saturday, August 10, 2019

What should we know about AWS Migration?

I made this summary, with bullets and fragments of text extracted from AWS papers and from some online courses (A Cloud Guru, Linux Academy, Udemy), to study concepts about AWS Migration.
I hope it could be useful to someone.
Cloud Adoption Framework
Business:
Creation of a strong business case for cloud adoption.
Business goals are congruent with cloud objectives.
Ability to measure benefits (TCO, ROI).

People:
Evaluate organizational roles and structure, new skill and process needs, and identify gaps.
Incentives and Career Management aligned with evolving roles.
Training options appropriate for learning styles.

Platform:
Resource provisioning can happen with standardisation.
Architecture patterns adjusted to leverage cloud-native services.
New application development skills and processes enable more agility.

Security:
Identity and Access Management models change.
Logging and audit capabilities will evolve.
The Shared Responsibility Model removes some facets and adds others.

Operations: 
Service monitoring has the potential to be highly automated.
Performance management can scale as needed.
Business continuity and disaster recovery take on new methods in the cloud.


Cloud Adoption Phases





Hybrid Architecture

Hybrid Architectures make use of cloud resources along with on-premises resources.
Very common first step as a pilot for cloud migrations.
Infrastructure can augment or simply be an extension of the on-premises platform (VMware, for example).
Ideally, integrations are loosely coupled, meaning each end can exist without extensive knowledge of the other side.

 

Storage Gateway creates a bridge between on-premises and AWS.
Seamless to end-users.
Common first step into the cloud due to low risk and appealing economics.

 


Middleware is often a great way to leverage cloud services.
Loosely coupled and based on a canonical message format.

 

The VMware vCenter plugin allows transparent migration of VMs to and from AWS.
VMware Cloud on AWS furthers this concept with more cloud-native features.

Migration tools 

AWS Server Migration Service

Automates migration of on-premises VMWare vSphere or Microsoft Hyper-V/SCVMM virtual machines to AWS.
Replicates VMs to AWS, syncing volumes and creating periodic AMIs.
Minimizes cutover downtime by syncing VMs incrementally.
Supports Windows and Linux VMs only.
The Server Migration Connector is downloaded as a virtual appliance into your on-premises vSphere or Hyper-V setup.

Database Migration Service

DMS, along with the Schema Conversion Tool, helps customers migrate databases to AWS RDS or EC2-based databases.
The Schema Conversion Tool (SCT) can copy database schemas for homogeneous migrations (same database engine) and convert schemas for heterogeneous migrations (different database engines).
DMS is used for smaller, simpler conversions, and also supports MongoDB and DynamoDB.
The Schema Conversion Tool (SCT) is used for larger, more complex datasets like data warehouses.
DMS has a replication function for on-premises to AWS, or to Snowball or S3.

 

Application Discovery Service

Gathers information about on-premises data centers to help in cloud migration planning.
Often customers don't know the full inventory or status of all their data center assets, so this tool helps with that inventory.
Collects config, usage and behaviour data from your servers to help in estimating the TCO (Total Cost of Ownership) of running on AWS.
Can run agent-less (VMware environments) or agent-based (non-VMware environments).
Only supports the OSes that AWS supports (Linux and Windows).

AWS Migration Hub

Migration Hub simplifies and accelerates discovery and migration from your data centers to the AWS Cloud.

CIDR Reservation

Ensure your IP addresses will not overlap between the VPC and on-premises networks.
VPCs support IPv4 netmasks ranging from /16 to /28.
  • /16 = 255.255.0.0 = 65,536 addresses
  • /28 = 255.255.255.240 = 16 addresses

5 IPs are reserved in every VPC subnet (example: 10.0.0.0/24):
  • 10.0.0.0 Network address
  • 10.0.0.1 Reserved by AWS for the VPC router
  • 10.0.0.2 Reserved by AWS for DNS
  • 10.0.0.3 Reserved by AWS for future use
  • 10.0.0.255 VPCs don't support broadcast, so AWS reserves this address (see the sketch below).
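A quick check of those numbers with Python's ipaddress module, using the example 10.0.0.0/24 subnet:

import ipaddress

subnet = ipaddress.ip_network("10.0.0.0/24")

# AWS reserves 5 addresses per subnet: the network address, the next
# three (router, DNS, future use) and the last (no broadcast support).
reserved = [subnet.network_address + offset for offset in range(4)]
reserved.append(subnet.broadcast_address)

print(subnet.num_addresses)                     # 256 total addresses
print(subnet.num_addresses - len(reserved))     # 251 usable by instances
print([str(address) for address in reserved])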

Network Migration

Most organisations start with a VPN connection to AWS.
As usage grows, they might choose Direct Connect but keep the VPN as a backup.
The transition from VPN to Direct Connect can be relatively seamless using BGP.
Once Direct Connect is set up, configure both the VPN and Direct Connect within the same BGP prefix.
From the AWS side, the Direct Connect path is always preferred.

Amazon Snow Family

Evolution of AWS Import/Export process.
Move massive amounts of data to and from AWS.
Data transfer is as fast or as slow as you're willing to pay a common carrier.
Encrypted at rest.

AWS Import/Export: Ship an external hard drive to AWS. Someone at AWS plugs it in and copies your data to S3.
AWS Snowball: Ruggedized NAS in a box that AWS ships to you. You copy over up to 80 TB of your data and ship it back to AWS. They copy the data over to S3.
AWS Snowball Edge: Same as Snowball, but with onboard Lambda and clustering.
AWS Snowmobile: A literal shipping container full of storage (up to 100PB) and a truck to transport it.


AWS CAF perspectives


Organizational change management to accelerate your cloud transformation


Business Drivers

The number one reason customers choose to move to the cloud is for the agility they gain. The AWS Cloud provides more than 90 services including everything from compute, storage, and databases, to continuous integration, data analytics, and artificial intelligence.

Common drivers that apply when migrating to the cloud are:

  • Operational Costs
  • Workforce Productivity
  • Cost Avoidance
  • Operational Resilience
  • Business Agility

Migration Strategy

The "6 R's": 6 Application Migration Strategies


Migration Pattern - Transformation Impact - Complexity
  • Refactoring: Rearchitecting and recoding require investment in new capabilities, delivery of complex programs and projects, and potentially significant business disruption. Optimization for the cloud should be realized. Transformation impact: High. Complexity: High.
  • Replatforming: Amortization of transformation costs is maximized over larger migrations. Opportunities to address significant infrastructure upgrades can be realized. This has a positive impact on compliance, regulatory, and obsolescence drivers. Opportunities to optimize in the cloud should be realized. Transformation impact: High. Complexity: High.
  • Repurchasing: A replacement through either procurement or upgrade. Disposal, commissioning, and decommissioning costs may be significant. Transformation impact: Medium. Complexity: Medium.
  • Rehosting: Typically referred to as lift and shift or forklifting. Automated and scripted migrations are highly effective. Transformation impact: Medium. Complexity: Medium.
  • Retiring: Decommission and archive data as necessary. Transformation impact: Low. Complexity: Low.
  • Retaining: This is the do-nothing option. Legacy costs remain and obsolescence costs typically increase over time. Transformation impact: Low. Complexity: Low.

  • 1. Re-host (Referred to as a “lift and shift.”)
    • Move applications without changes
  • 2. Re-platform (Referred to as “lift, tinker, and shift.”)
    • Make a few cloud optimizations to achieve a tangible benefit.
  • 3. Re-factor / Re-architect  
    • Re-imagine how the application is architected and developed using cloud-native features.
  • 4. Re-purchase  
    • Move from perpetual licenses to a software-as-a-service model.
  • 5. Retire
    • Remove applications that are no longer needed.
  • 6. Retain (Referred to as re-visit.)
    • Keep applications that are critical for the business but that require major refactoring before they can be migrated.


Comparison of cloud migration strategies



Your migration strategy  should address the following questions:
  • Is there a time sensitivity to the business case or business driver, for example, a data center shutdown or contract expiration?
  • Who will operate your AWS environment and your applications? Do you use an outsourced provider today? What operating model would you like to have long-term?
  • What standards are critical to impose on all applications that you migrate?
  • What automation requirements will you impose on applications as a starting point for cloud operations, flexibility, and speed? Will these requirements be imposed on all applications or a defined subset? How will you impose these standards?

Building a Business Case for Migration

A migration business case has four categories:
1) run cost analysis
2) cost of change
3) labor productivity
4) business value.

A business case for migration addresses the following questions:

  • What is the future expected IT cost on AWS versus the existing (base) cost?
  • What are the estimated migration investment costs?
  • What is the expected ROI, and when will the project be cash flow positive?
  • What are the business benefits beyond cost savings?
  • How will using AWS improve your ability to respond to business changes?

The data from each value category shown in the following table provides a compelling case for migration.

 

The following are key elements of the platform work stream:

AWS landing zone – provides an initial structure and pre-defined configurations for AWS accounts, networks, identity and billing frameworks, and customer-selectable optional packages.

Account structure – defines an initial multi-account structure and pre-configured baseline security that can be easily adopted into your organizational model.

Network structure – provides baseline network configurations that support the most common patterns for network isolation, implements baseline network connectivity between AWS and on-premises networks, and provides user-configurable options for network access and administration.

Pre-defined identity and billing frameworks – provide frameworks for cross-account user identity and access management (based on Microsoft Active Directory) and centralized cost management and reporting.

Pre-defined user-selectable packages – provide a series of user-selectable packages to integrate AWS-related logs into popular reporting tools, integrate with the AWS Service Catalog, and automate infrastructure.

Application Migration Process


Migration Steps & Tools

Application migration to AWS involves multiple steps, regardless of the database engine:
1. Migration assessment analysis
2. Schema conversion to a target database platform
3. SQL statement and application code conversion
4. Data migration
5. Testing of converted database and application code
6. Setting up replication and failover scenarios for data migration to the target platform
7. Setting up monitoring for a new production environment and go live with the target environment

 

Each application is different and may require extra attention to one or more of these steps:

 

Tools to automate migration

AWS Schema Conversion Tool (AWS SCT) – a desktop tool that automates conversion of database objects from different source database systems (Oracle, SQL Server, MySQL, PostgreSQL) to different RDS database targets (Aurora, PostgreSQL, Oracle, MySQL, SQL Server).

AWS Database Migration Service (DMS) – a service for data migration to and from AWS database targets. 


AWS SCT and AWS DMS can be used independently. For example, AWS DMS can be used to synchronize homogeneous databases between environments, such as refreshing a test environment with production data. However, the tools are integrated so that the schema conversion and data migration steps can be used in any order. Later in this guide we will look into specific scenarios of integrating these tools.

AWS Database Migration Service

You can migrate data in two ways:
  • As a full load of existing data
  • As a full load of existing data, followed by continuous replication of data changes to the target

CDC offers two ways to implement ongoing replication:

  • Migrate existing data and replicate ongoing changes - implements ongoing replication by:
        a. (Optional) Creating the target schema.
        b. Migrating existing data and caching changes to existing data as it is migrated.
        c. Applying those cached data changes until the database reaches a steady state.
        d. Lastly, applying current data changes to the target as soon as they are received by the replication instance.

  • Replicate data changes only – replicates only data changes (no schema) from a specified point in time (see the sketch below).
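A sketch of creating such a task with boto3; all ARNs and the table mapping are placeholders, and the endpoints and replication instance are assumed to exist already:

import json
import boto3

dms = boto3.client("dms")

# Full load of existing data followed by ongoing change data capture (CDC).
dms.create_replication_task(
    ReplicationTaskIdentifier="orders-full-load-and-cdc",
    SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:SOURCE",   # placeholder
    TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:TARGET",   # placeholder
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:INSTANCE", # placeholder
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection", "rule-id": "1", "rule-name": "1",
            "object-locator": {"schema-name": "orders", "table-name": "%"},
            "rule-action": "include",
        }],
    }),
)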

Challenges and Barriers

Your organization needs to overcome the following key challenges and barriers during this stage of the transformation:

• Limited knowledge and training
• Executive support and funding
• Purchasing public cloud services
• IT ownership and direction
