AWS – CloudTrail

AWS CloudTrail is a web service that records AWS API calls for your account and delivers log files to you. The recorded information includes the identity of the API caller, the time of the API call, the source IP address of the API caller, the request parameters, and the response elements returned by the AWS service.

AWS CloudTrail provides a record of your AWS API calls. You can use this data to gain visibility into user activity, troubleshoot operational and security incidents, or help demonstrate compliance with internal policies or regulatory standards.

This information is collected and written to log files that are stored in an Amazon S3 bucket that you specify.
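As a rough illustration of the fields described above, here is a minimal sketch that parses one log file and pulls out the caller, time, source IP, request parameters, and response elements. The sample record is hand-written for this example and the helper name `summarize_events` is hypothetical; only the field names (`Records`, `userIdentity`, `eventTime`, `sourceIPAddress`, `requestParameters`, `responseElements`) follow the CloudTrail record format.

```python
import json

# Hand-written sample in the shape of a CloudTrail log file.
# All values are made up for illustration.
SAMPLE_LOG = """
{
  "Records": [
    {
      "eventTime": "2017-01-15T10:32:07Z",
      "eventSource": "ec2.amazonaws.com",
      "eventName": "StopInstances",
      "awsRegion": "us-east-1",
      "sourceIPAddress": "203.0.113.10",
      "userIdentity": {"type": "IAMUser", "userName": "alice"},
      "requestParameters": {
        "instancesSet": {"items": [{"instanceId": "i-0123456789abcdef0"}]}
      },
      "responseElements": null
    }
  ]
}
"""


def summarize_events(log_text):
    """Return one small dict per record: who called what, when, and from where."""
    summaries = []
    for record in json.loads(log_text).get("Records", []):
        identity = record.get("userIdentity") or {}
        summaries.append({
            # Not every identity type has a userName, so fall back to the type.
            "who": identity.get("userName") or identity.get("type"),
            "when": record.get("eventTime"),
            "source_ip": record.get("sourceIPAddress"),
            "api_call": f'{record.get("eventSource")}:{record.get("eventName")}',
            "request": record.get("requestParameters"),
            "response": record.get("responseElements"),
        })
    return summaries


events = summarize_events(SAMPLE_LOG)
for event in events:
    print(event["when"], event["who"], event["api_call"], event["source_ip"])
```

Log files in the S3 bucket are delivered as gzip-compressed JSON, so a real script would decompress each object before passing the text to a function like this.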

– Once you have enabled CloudTrail, log files are typically delivered about every 5 minutes. You can configure CloudTrail to aggregate log files from multiple regions into a single Amazon S3 bucket.
– In addition to CloudTrail’s user activity logs, you can use the Amazon CloudWatch Logs feature to collect and monitor system, application, and custom log files from your EC2 instances and other sources in near real time.

Use Cases

  • Security analysis
  • Track changes to AWS resources
  • Compliance aid
  • Troubleshoot operational issues

– By default, CloudTrail log files are encrypted using Amazon S3 server-side encryption (SSE) and placed in your S3 bucket.

– You can turn on Amazon SNS notifications so that you can take immediate action when new logs are delivered.

AWS – CloudWatch

Amazon CloudWatch is a monitoring service for AWS cloud resources and the applications you run on AWS. You can use Amazon CloudWatch to collect and track metrics, collect and monitor log files, set alarms, and automatically react to changes in your AWS resources. Amazon CloudWatch can monitor AWS resources such as Amazon EC2 instances, Amazon DynamoDB tables, and Amazon RDS DB instances, as well as custom metrics generated by your applications and services, and any log files your applications generate. You can use Amazon CloudWatch to gain system-wide visibility into resource utilization, application performance, and operational health. You can use these insights to react and keep your application running smoothly.

 

– Many metrics are received and aggregated at 1-minute intervals. Some are at 3-minute or 5-minute intervals.

  • Before November 2016, metric data was available for only 2 weeks
  • Metrics cannot be deleted; they expire automatically at the end of the retention period

Metrics Retention

Since November 2016, CloudWatch stores all metrics for 15 months at no extra charge. To keep the overall volume of data reasonable, historical data is stored at a lower level of granularity, as follows:

  • One minute data points are available for 15 days.
  • Five minute data points are available for 63 days.
  • One hour data points are available for 455 days (15 months).
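The retention tiers above can be read as a lookup: given the age of the data, which is the finest granularity still available? A minimal sketch, using only the figures listed in these notes (the function name `finest_period` is made up for this example):

```python
from datetime import timedelta

# (data point period, how long that period is kept), as listed above.
RETENTION_TIERS = [
    (timedelta(minutes=1), timedelta(days=15)),
    (timedelta(minutes=5), timedelta(days=63)),
    (timedelta(hours=1), timedelta(days=455)),
]


def finest_period(age):
    """Return the finest data point period still kept for data of this age.

    Returns None when the data is older than the longest retention tier.
    """
    for period, kept_for in RETENTION_TIERS:
        if age <= kept_for:
            return period
    return None


for days in (10, 30, 100, 500):
    print(days, "days old ->", finest_period(timedelta(days=days)))
```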

The following metrics are not collected by default and require a custom monitoring script to populate them:

  • Swap Usage
  • Available Disk Space

Aggregation

  • CloudWatch does not aggregate data across regions
  • Aggregated statistics are only available when using detailed monitoring.

CloudWatch

– does not provide detailed monitoring for EMR

  • by default, detailed monitoring is enabled for Auto Scaling

– provides free detailed monitoring for:

  • Amazon Route 53
  • Amazon RDS
  • ELB
  • AWS OpsWorks

 

  • To upload custom metrics, you can use the AWS CLI or the API
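A minimal sketch of what a custom metric upload through the API might look like, assuming the boto3 SDK. The block only builds the arguments for the CloudWatch `PutMetricData` call so it runs without AWS access; the namespace, metric name, and instance ID are hypothetical placeholders.

```python
import shutil


def disk_space_metric(path="/", instance_id="i-0123456789abcdef0"):
    """Build PutMetricData arguments reporting free disk space for `path`."""
    free_bytes = shutil.disk_usage(path).free
    return {
        "Namespace": "Custom/System",  # hypothetical namespace
        "MetricData": [{
            "MetricName": "AvailableDiskSpace",  # hypothetical metric name
            "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
            "Unit": "Bytes",
            "Value": float(free_bytes),
        }],
    }


payload = disk_space_metric()
print(payload)

# With boto3 installed and credentials configured, the payload could be sent with:
#   import boto3
#   boto3.client("cloudwatch").put_metric_data(**payload)
```

Running a script like this on a schedule (for example from cron) is one way to populate metrics such as available disk space that CloudWatch does not collect on its own.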


HDP – Data workflow

Sqoop

Apache Sqoop efficiently transfers bulk data between Apache Hadoop and structured datastores such as relational databases. Sqoop helps offload certain tasks (such as ETL processing) from the EDW to Hadoop for efficient execution at a much lower cost. Sqoop can also be used to extract data from Hadoop and export it into external structured datastores.

Sqoop works with relational databases such as Teradata, Netezza, Oracle, MySQL, PostgreSQL, and HSQLDB.

Flume

A service for streaming logs into Hadoop

Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming data into the Hadoop Distributed File System (HDFS). It has a simple and flexible architecture based on streaming data flows; and is robust and fault tolerant with tunable reliability mechanisms for failover and recovery.

YARN coordinates data ingest from Apache Flume and other services that deliver raw data into an Enterprise Hadoop cluster

Use Flume if you have non-relational data sources, such as log files, that you want to stream into Hadoop.

Use Kafka if you need a highly reliable and scalable enterprise messaging system to connect multiple systems, one of which is Hadoop.

Kafka

NFS

WebHDFS

 

 

 

AWS – SQS

Amazon Simple Queue Service (SQS) and Amazon SNS are both messaging services within AWS, which provide different benefits for developers. Amazon SNS allows applications to send time-critical messages to multiple subscribers through a “push” mechanism, eliminating the need to periodically check or “poll” for updates.

Amazon SQS is a message queue service used by distributed applications to exchange messages through a polling model, and can be used to decouple sending and receiving components. Amazon SQS provides flexibility for distributed components of applications to send and receive messages without requiring each component to be concurrently available.

Amazon Simple Queue Service (SQS) is a fast, reliable, scalable, fully managed message queuing service.

You can use SQS to transmit any volume of data, at any level of throughput, without losing messages or requiring other services to be always available.

Each queue starts with a default visibility timeout of 30 seconds.

You can change this setting for the entire queue.

You can also change the timeout for an individual message by specifying a new value with the ChangeMessageVisibility action.

  • Messages can be retained in queues for up to 14 days.
  • The maximum visibility timeout of an SQS message in a queue is 12 hours (the default is 30 seconds).
  • A message can contain up to 256 KB of text, billed in 64 KB chunks.
  • The maximum long polling timeout is 20 seconds.

The first 1 million requests each month are free, then the price is $0.50 per million requests.
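The billing arithmetic can be sketched as follows. The figures (64 KB chunks, 256 KB maximum, 1 million free requests, $0.50 per million) are the ones quoted in these notes and may not match current pricing; the function names are made up for this example, and the free allowance is assumed to apply per month.

```python
import math

CHUNK_BYTES = 64 * 1024         # billing chunk size quoted in the notes
MAX_MESSAGE_BYTES = 256 * 1024  # maximum message size quoted in the notes
FREE_REQUESTS = 1_000_000       # free requests (assumed to be per month)
PRICE_PER_MILLION = 0.50        # USD, figure quoted in the notes


def billed_requests(payload_bytes):
    """Number of requests one message of this size is billed as."""
    if not 0 < payload_bytes <= MAX_MESSAGE_BYTES:
        raise ValueError("payload must be between 1 byte and 256 KB")
    return math.ceil(payload_bytes / CHUNK_BYTES)


def monthly_cost(total_billed_requests):
    """Cost in USD after subtracting the free allowance."""
    chargeable = max(0, total_billed_requests - FREE_REQUESTS)
    return chargeable / 1_000_000 * PRICE_PER_MILLION


print(billed_requests(100 * 1024))  # a 100 KB message spans two 64 KB chunks
print(monthly_cost(3_000_000))      # 2 million chargeable requests
```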

No ordering guarantee – standard queue messages can be delivered more than once and in any order.

Amazon SQS uses short polling by default, querying only a subset of the servers to determine whether any messages are available for inclusion in the response.

To set up long polling, set Receive Message Wait Time to a value from 1 s to 20 s (for example, 20 s).

Benefits of long polling

Long polling helps reduce your cost of using Amazon SQS by reducing the number of empty responses and eliminating false empty responses.

  • Long polling reduces the number of empty responses by allowing SQS to wait until a message is available in the queue before sending a response
  • Long polling eliminates false empty responses by querying all of the servers
  • Long polling returns messages as soon as a message becomes available
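A minimal sketch of a long-polling receive, written against any object that exposes a boto3-style `receive_message` method. The `QueueUrl`, `MaxNumberOfMessages`, and `WaitTimeSeconds` parameters mirror the boto3 SQS client; the helper name, the fake client, and the queue URL are hypothetical and exist only so the example runs without AWS access.

```python
def receive_with_long_polling(sqs_client, queue_url, wait_seconds=20):
    """Fetch up to 10 messages, waiting up to `wait_seconds` for one to arrive."""
    if not 1 <= wait_seconds <= 20:
        raise ValueError("long polling wait time must be between 1 and 20 seconds")
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=wait_seconds,
    )
    # The "Messages" key is absent when the wait expires with nothing to return.
    return response.get("Messages", [])


class FakeSqsClient:
    """Stand-in for a real SQS client; it only records the call it receives."""

    def __init__(self, messages):
        self.messages = messages
        self.last_call = None

    def receive_message(self, **kwargs):
        self.last_call = kwargs
        batch = self.messages[:kwargs["MaxNumberOfMessages"]]
        return {"Messages": batch} if batch else {}


fake = FakeSqsClient([{"MessageId": "1", "Body": "hello"}])
messages = receive_with_long_polling(fake, "https://example.invalid/my-queue")
print([m["Body"] for m in messages])
```

With a real client, `boto3.client("sqs")` could be passed in place of the fake. After processing, each message would still need to be deleted before its visibility timeout expires.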

 

FIFO queues are designed to enhance messaging between applications when the order of operations and events is critical, for example:

  • Ensure that user-entered commands are executed in the right order.
  • Display the correct product price by sending price modifications in the right order.
  • Prevent a student from enrolling in a course before registering for an account.

Note

The name of a FIFO queue must end with the .fifo suffix. The suffix counts towards the 80-character queue name limit. To determine whether a queue is FIFO, you can check whether the queue name ends with the suffix.
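The naming rule above can be checked with a couple of small helpers. This is a sketch: the 80-character limit and the .fifo suffix come from the note, while the allowed character set (letters, digits, hyphens, underscores) is an assumption not stated in these notes.

```python
import re

MAX_NAME_LENGTH = 80
FIFO_SUFFIX = ".fifo"
# Assumed character set: letters, digits, hyphens and underscores,
# optionally followed by the .fifo suffix.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+(\.fifo)?")


def is_fifo_queue_name(name):
    """A queue is FIFO if its name ends with the .fifo suffix."""
    return name.endswith(FIFO_SUFFIX)


def is_valid_queue_name(name):
    """The suffix counts towards the 80-character limit."""
    return len(name) <= MAX_NAME_LENGTH and _NAME_PATTERN.fullmatch(name) is not None


for candidate in ("orders", "orders.fifo", "a" * 76 + ".fifo"):
    print(candidate[:20], is_fifo_queue_name(candidate), is_valid_queue_name(candidate))
```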

Reference

http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/FIFO-queues.html

AWS – SNS

Amazon Simple Notification Service (SNS) is a simple, fully-managed “push” messaging service that allows users to push texts, alerts or notifications, like an auto-reply message, or a notification that a package has shipped.

Amazon Simple Notification Service (Amazon SNS) is a web service that coordinates and manages the delivery or sending of messages to subscribing endpoints or clients. In Amazon SNS, there are two types of clients, publishers and subscribers, also referred to as producers and consumers.

 

AWS – SWF

SWF – actors

  • workflow starters – a human interaction or a collection of services that starts a workflow, for example to complete an order
  • deciders – programs that coordinate the tasks
  • activity workers – interact with SWF to get a task, process the received task, and return the results

 

Amazon SWF is useful for automating workflows that include long-running human tasks.

AWS – CloudFormation

AWS CloudFormation gives developers and systems administrators an easy way to create and manage a collection of related AWS resources, provisioning and updating them in an orderly and predictable fashion.

When you use AWS CloudFormation, you work with templates and stacks. You create a template that describes your AWS resources and their properties.

CloudFormation has two parts: templates and stacks. A template is a text file in JavaScript Object Notation (JSON) or YAML format. The file, which is declarative and not scripted, defines what AWS resources or non-AWS resources are required to run the application.
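To make "declarative" concrete, here is a minimal sketch of a template in JSON form, built as a Python dictionary and serialized. The logical name `LogBucket` and the description are made up for this example; `AWSTemplateFormatVersion`, `Resources`, and the `AWS::S3::Bucket` resource type are standard template elements.

```python
import json

# The template only states WHAT should exist (one S3 bucket),
# not the steps needed to create it.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Minimal example: a single S3 bucket",
    "Resources": {
        "LogBucket": {  # hypothetical logical name
            "Type": "AWS::S3::Bucket",
        },
    },
}

template_body = json.dumps(template, indent=2)
print(template_body)

# With boto3 installed and credentials configured, a stack could be created with:
#   import boto3
#   boto3.client("cloudformation").create_stack(
#       StackName="example-stack", TemplateBody=template_body)
```

Because the template is plain text, it can be committed next to the application code and reviewed like any other change.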

Your CloudFormation templates can live with your application in your version control repository, allowing architectures to be reused.

 

 

 

Amazon – EBS

– Data that is stored on an Amazon EBS volume persists independently of the life of the instance.

– If you use an Amazon EBS volume as the root partition, you need to set the Delete on Termination flag to “N” if you want the volume to persist beyond the life of the instance.
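In API terms, the Delete on Termination flag is a boolean inside a block device mapping. A minimal sketch, assuming boto3-style argument names; the device name `/dev/xvda` and the instance ID in the comment are placeholders that depend on the AMI and instance.

```python
def keep_volume_mapping(device_name="/dev/xvda"):
    """Block device mapping that keeps the volume after the instance is terminated."""
    return [{
        "DeviceName": device_name,               # placeholder; depends on the AMI
        "Ebs": {"DeleteOnTermination": False},   # the "N" setting from the note above
    }]


mapping = keep_volume_mapping()
print(mapping)

# With boto3 and credentials available, this could be applied to an existing instance:
#   import boto3
#   boto3.client("ec2").modify_instance_attribute(
#       InstanceId="i-0123456789abcdef0", BlockDeviceMappings=mapping)
```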

Snapshots 

You need to retain only the most recent snapshot in order to restore the volume.

 

  • Snapshots that are taken from encrypted volumes are automatically encrypted. Volumes that are created from encrypted snapshots are also automatically encrypted
  • By default, only you can create volumes from snapshots that you own
  • You cannot enable encryption for an existing EBS volume

– You can take a snapshot of an attached volume that is in use. However, snapshots only capture data that has been written to your Amazon EBS volume at the time the snapshot command is issued.

  • To create a snapshot of an Amazon EBS volume that serves as a root device, you should stop the instance before taking the snapshot
  • Snapshots that you take of an encrypted volume are also encrypted and can be moved between AWS regions as needed
  • You cannot make encrypted snapshots public, and you can share them with other AWS accounts only if they are encrypted with a custom CMK

 

– The EBS encryption feature is only available on the more powerful EC2 instance types (e.g. M3, C3, R3, CR1, G2, and I2 instances).

You cannot attach an encrypted EBS volume to instance types that do not support EBS encryption.

With Amazon EBS encryption, you can now create an encrypted EBS volume and attach it to a supported instance type. Data on the volume, disk I/O, and snapshots created from the volume are then all encrypted. The encryption occurs on the servers that host the EC2 instances, providing encryption of data as it moves between EC2 instances and EBS storage. EBS encryption is based on the industry standard AES-256 cryptographic algorithm.

Public snapshots of encrypted volumes are not supported, but you can share an encrypted snapshot with specific accounts if you take the following steps:

– Use a custom CMK, not your default CMK, to encrypt your volume.
– Give the specific accounts access to the custom CMK.
– Create the snapshot.
– Give the specific accounts access to the snapshot.
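The steps above might be scripted roughly as follows. This is a sketch under several assumptions: the volume is already encrypted with the custom CMK (step 1), access to the key is given through a KMS grant (editing the key policy is another route), and the list of grant operations is an illustrative guess rather than a verified minimum. The method and parameter names mirror the boto3 EC2 and KMS clients; `RecordingClient`, the IDs, and the key alias are made up so the example runs without AWS access.

```python
def share_encrypted_snapshot(ec2_client, kms_client, volume_id, custom_key_id, account_id):
    """Create a snapshot of an encrypted volume and share it with one account."""
    # Step 2: give the other account access to the custom CMK.
    kms_client.create_grant(
        KeyId=custom_key_id,
        GranteePrincipal=f"arn:aws:iam::{account_id}:root",
        Operations=["Decrypt", "DescribeKey", "CreateGrant"],  # assumed set
    )
    # Step 3: create the snapshot.
    snapshot = ec2_client.create_snapshot(
        VolumeId=volume_id, Description="snapshot shared with another account"
    )
    # Step 4: give the other account permission to create volumes from it.
    ec2_client.modify_snapshot_attribute(
        SnapshotId=snapshot["SnapshotId"],
        Attribute="createVolumePermission",
        OperationType="add",
        UserIds=[account_id],
    )
    return snapshot["SnapshotId"]


class RecordingClient:
    """Stand-in for the EC2 and KMS clients; it records calls instead of making them."""

    def __init__(self):
        self.calls = []

    def create_grant(self, **kwargs):
        self.calls.append(("create_grant", kwargs))
        return {"GrantId": "fake-grant"}

    def create_snapshot(self, **kwargs):
        self.calls.append(("create_snapshot", kwargs))
        return {"SnapshotId": "snap-fake"}

    def modify_snapshot_attribute(self, **kwargs):
        self.calls.append(("modify_snapshot_attribute", kwargs))
        return {}


client = RecordingClient()
snapshot_id = share_encrypted_snapshot(
    client, client, "vol-0123456789abcdef0", "alias/my-custom-key", "111122223333"
)
print(snapshot_id, [name for name, _ in client.calls])
```

A production script would probably also wait for the snapshot to reach the completed state before relying on the shared copy.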

 

Amazon EBS provides three volume types: General Purpose (SSD) volumes, Provisioned IOPS (SSD) volumes, and Magnetic volumes.

 

Snapshot on RAID Volumes

Migrate data between encrypted and unencrypted volume

 

Warning
On an EBS-backed instance, the default action is for the root EBS volume to be deleted when the instance is terminated.
Storage on any local drives will be lost.

That means the root EBS volume is deleted when you terminate the instance!

 

Notes:

M – General purpose

C – Compute optimized

R – Memory optimized (for memory-intensive workloads)

G – GPU

Amazon – WorkSpaces

Amazon WorkSpaces is a fully managed, secure desktop computing service that runs on the AWS cloud. Amazon WorkSpaces allows you to easily provision cloud-based virtual desktops and provide your users access to the documents, applications, and resources they need from any supported device, including Windows and Mac computers.