GCP

Zones, Regions, Dual-Regions, and Multi-Regions

Subnets are regional resources

– Because subnets are regional objects, the region you select for a resource determines the subnets it can use.

–  multi-regions and dual-regions are geo-redundant

  • Regions are independent geographic areas that consist of zones
  • A dual-region is a specific pair of regions

-Cloud KMS resources can be created in dual-regional locations
-Objects stored in a multi-region or dual-region are geo-redundant

– Data that is geo-redundant is stored redundantly in at least two separate geographic places separated by at least 100 miles
-Geo-redundancy occurs asynchronously

Currently, nam4 and eur4 are the only Dual-Regions available.
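As a sketch, a dual-region bucket can be created by passing one of these locations to gsutil (the bucket name below is hypothetical; requires an authenticated gsutil):

```shell
# Create a geo-redundant bucket in the nam4 dual-region
# (bucket name is a hypothetical example)
gsutil mb -l nam4 gs://my-dual-region-bucket
```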


A GCP organization’s combined IAM policy at any level of the Cloud Resource Hierarchy is a combination of the policies at that level, plus any policies inherited from higher levels.

 

Cloud Spanner – Global replication of relational data

For a BigQuery dataset, the available Location Types are: Regional and Multi-Regional

 

Billing

Billing accounts can contain billing subaccounts

Billing Accounts are connected to a Payments Profile

Billing Account user – Link projects to billing accounts

Export billing options

Export Cloud Billing to :

  • BigQuery
  • Cloud Storage

Billing for resources that participate in a Shared VPC network is attributed to the service project where the resource is located

 

Cloud IAM 

An Organization contains one or more Folders. A Folder contains one or more Projects. A Project contains one or more Resources.

-A Role is a collection of permissions

-An IAM Policy object consists of a list of bindings

  • Projects can contain resources in different regions
  • Projects are configured with a default region and zone
  • You don’t assign permissions to users directly. Instead, you assign them a Role which contains one or more permissions
  • Members can be of the following types: Google account, Service account, Google group, G Suite domain, Cloud Identity domain
  • A Binding binds a list of members to a role.
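As a rough illustration of the bullets above, a binding ties a list of members to one role; all identities and the role in this sketch are hypothetical, and in practice a binding is usually added with `gcloud projects add-iam-policy-binding`:

```shell
# A policy object is a list of bindings; each binding ties members to one role.
# All identities and the role below are hypothetical examples.
cat > policy-snippet.json <<'EOF'
{
  "bindings": [
    {
      "role": "roles/storage.objectViewer",
      "members": [
        "user:jane@example.com",
        "group:devs@example.com"
      ]
    }
  ]
}
EOF
# Sanity-check that the snippet is well-formed JSON
python3 -m json.tool policy-snippet.json > /dev/null && echo "valid JSON"
```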

Each GCP project can contain only a single App Engine application, and once created you cannot change the location of your App Engine application

MFA stands for Multi-Factor Authentication, and it is a best practice to use this to secure accounts.

IAM roles can be assigned per bucket.

Predefined roles are granular and are granted at the service level for much more fine-grained access.

Primitive roles are broad, project-wide roles granted at the project level.

 

Cloud Identity and G Suite are the two ways to centrally manage Google accounts.

KMS stands for Key Management Service (as in Cloud KMS).

 

Service Accounts

Resources not hosted on GCP should use a service account key for authentication.

 

Cloud SDK 

The gcloud alpha and gcloud beta command groups are additional Cloud SDK release tracks that you can install as gcloud components.

 

 

-Maximum Size of a Cloud Storage Bucket – unlimited

-Cloud Storage offers unlimited object storage, and individual objects can be as large as 5 TB

-Versioning can be enabled on a Cloud Storage Bucket.

 

gsutil – This is a Cloud SDK component used to interact with Cloud Storage.

To view which project is the default, run the gcloud config list command. This lists the properties of the active configuration, including the default project.

 

$ gcloud compute instances list

$ gcloud compute ssh

$ gcloud compute ssh ovi@server --dry-run

 

*** Snapshots ***

$ gcloud compute snapshots list

$ gcloud compute disks list

$ gcloud compute disks snapshot development-server

$ gcloud compute images list

$ gcloud container clusters list

$ gcloud config list

$ gcloud app versions list

 

gcloud config configurations create

gcloud config configurations activate

gcloud config set project [PROJECT_ID]

gcloud logging read "login_name"

gcloud logging read "login_name" --limit 15

 

DISK

Create disk:

gcloud compute disks create (DISK_NAME) --type=(DISK_TYPE) --size=(SIZE) --zone=(ZONE)

gcloud compute disks create disk-1 --size=50GB --zone=us-east1-b

Resize disk:

gcloud compute disks resize (DISK_NAME) --size=(SIZE) --zone=(ZONE)

gcloud compute disks resize disk-1 --size=150 --zone=us-east1-b

Attach disk:

gcloud compute instances attach-disk (INSTANCE_NAME) --disk=(DISK_NAME) --zone=(ZONE)

 

snapshot

gcloud compute disks snapshot web1 --snapshot-names web1-backup-v1 --zone us-central1-a
gcloud compute snapshots list
gcloud compute snapshots describe web1-backup-v1

 

-Persistent disks are not deleted when an instance is stopped.

– Persistent disk performance is based on the total persistent disk capacity attached to an instance and the number of vCPUs that the instance has. Increasing the persistent disk capacity increases its throughput and IOPS

 

Video for reference: Installing the Cloud SDK

View default cloud configuration

gcloud config list

gcloud container clusters get-credentials —> to authenticate and configure kubectl  

 

Preemptible Virtual Machines

Affordable, short-lived compute instances suitable for batch jobs and fault-tolerant workloads.
Go to console

Preemptible VMs are highly affordable, short-lived compute instances suitable for batch jobs and fault-tolerant workloads. Preemptible VMs offer the same machine types and options as regular compute instances and last for up to 24 hours

# enable the preemptible option at instance creation
gcloud compute instances create my-vm --zone us-central1-b --preemptible

 

App Engine 

  • web based workloads, high availability, no ops

Flexible environments are able to use a Dockerfile to create custom runtimes

-App Engine is regional

-App Engine traffic can be split by cookie, by IP address, and at random. We cannot split traffic by zone.

App Engine Standard Environment.

  • The default timeout setting for a Service Instance deployed to the App Engine Standard Environment is 60 s
  • The App Engine Standard environment does not allow Instance Runtimes to be modified
  • The App Engine Standard Environment does scale down to zero when not in use

 

App Engine Flexible Environment

  • Runtime modifications are allowed for instances running in the App Engine Flexible environment.

In App Engine Flex the connection to Stackdriver (i.e. agent installation and configuration) is handled automatically for you

App Engine Flexible Environment does not scale down to zero

 

Deploying and Manipulating Multiple App Engine Versions

gcloud app deploy --version 1

canary test 

gcloud app deploy --no-promote --version 2
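After a --no-promote deploy, traffic can be shifted gradually between versions; a sketch, where the service name "default", the version IDs, and the percentages are all illustrative:

```shell
# Send 10% of traffic to version 2, splitting by cookie
# (service name and version IDs are hypothetical examples)
gcloud app services set-traffic default --splits v1=0.9,v2=0.1 --split-by cookie
```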

 

 

Compute Engine 

Managed Instance Group

Unmanaged Instance Group

Unmanaged instance groups do not offer multi-zonal support.

Maximum total size of Compute Engine local SSD storage per instance – 3 TB

 

Cloud Functions

-Billing interval is 100 ms

-Horizontal Scaling

– Microservices Architecture

-Cloud Functions does scale down to zero when not in use

Cloud Run

  • Uses Stateless HTTP containers
  • Scalability
  • Built on Knative

 

Cloud Storage 

-Cloud Storage allows Organizations to use CSEKs (Customer Supplied Encryption Keys).

-Data in a regional location operates in a multi-zone replicated configuration

 

*** create a bucket

$ gsutil mb -c regional -l us-east1 gs://ovi

$ gsutil versioning get gs://ovi

$ gsutil versioning set on gs://ovi

$ gsutil ls -a gs://ovi

$ gsutil cp <file> gs://ovi

 

ovi_p_eb632cd8@cloudshell:~ (ovi-24-565a3874)$ gsutil ls gs://ovi11
gs://ovi11/IMG_2759.jpg
gs://ovi11/IMG_2770.jpg

 

ovi@cloudshell:~ (ovi-24-565a3874)$ touch ovi_file
ovi@cloudshell:~ (ovi-24-565a3874)$ gsutil cp ovi_file gs://ovi11
Copying file://ovi_file [Content-Type=application/octet-stream]...
/ [1 files][    0.0 B/    0.0 B]
Operation completed over 1 objects.

ovi@cloudshell:~ (ovi-24-565a3874)$ gsutil ls gs://ovi11
gs://ovi11/IMG_2759.jpg
gs://ovi11/IMG_2770.jpg
gs://ovi11/ovi_file

 

Pub/Sub is a messaging service for exchanging event data among applications and services. A producer of data publishes messages to a Pub/Sub topic. A consumer creates a subscription to that topic. Subscribers either pull messages from a subscription or are configured as webhooks for push subscriptions. Every subscriber must acknowledge each message within a configurable window of time.

Cloud Pub/Sub as the messaging service to capture real time data ( ex:  IoT )
– is designed to provide reliable, many-to-many, asynchronous messaging between applications (real time IoT data capture)

-Cloud Pub/Sub is designed to handle infinitely-scalable streaming data ingest

Pub/Sub

1. Create a topic.

2. Subscribe to the topic.

3. Publish a message to the topic.

4. Receive the message.

gcloud init
gcloud pubsub topics create ovi-topic
gcloud pubsub subscriptions create ovi-sub --topic ovi-topic
gcloud pubsub topics publish ovi-topic --message "hello"
gcloud pubsub subscriptions pull --auto-ack ovi-sub

 

 

gcloud config configurations activate — activate an existing configuration

gcloud config list — list the settings for the active configuration

App Engine is a Platform as a Service – It is a fully managed solution.

gcloud container clusters resize — this command is used to resize a Kubernetes cluster

ex:

gcloud container clusters resize oviproject --node-pool 'primary-node-pool' --num-nodes 25

 

gcloud config configurations create — create and activate a new configuration

 

 

Log sinks can be exported to Cloud Pub/Sub.

 

 

Storage Option

  • Multi-Regional – Data accessed frequently with highest availability / Geo-redundant
  • Regional – Data accessed frequently within a region / Redundant across availability zones
  • Nearline – Data accessed less than once per month / Regional / Store infrequently accessed content
  • Coldline – Data accessed less than once per year / Regional / Archive storage, backup, disaster recovery

 

Coldline Storage is the best choice for data that you plan to access at most once a year, due to its slightly lower availability, 90-day minimum storage duration, costs for data access, and higher per-operation costs

-Lifecycle management policies can be submitted via JSON format.
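A minimal sketch of such a JSON policy, assuming a single rule that deletes objects once they are older than 365 days (the bucket name is hypothetical):

```shell
# Lifecycle rule: delete objects older than 365 days
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 365}
    }
  ]
}
EOF
# Sanity-check the JSON, then apply it to a (hypothetical) bucket:
python3 -m json.tool lifecycle.json > /dev/null && echo "valid JSON"
# gsutil lifecycle set lifecycle.json gs://my-bucket
```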

 

Cloud SQL

– Read replicas and failover replicas are charged at the same rate as stand-alone instances

-Cloud SQL for PostgreSQL does not yet support replication from an external master or external replicas for Cloud SQL instances (“This functionality is not yet supported for PostgreSQL instances”)

-GCP Cloud SQL provides the following backup types: automated backups, on-demand backups

-Cloud SQL read replicas and failover replicas must be in the same region as the primary. The failover replica must be in a different zone in that region.
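Creating a read replica can be sketched with one gcloud command; the instance names below are hypothetical, and the replica inherits the primary's region:

```shell
# Create a read replica of an existing primary instance
# (instance names are hypothetical examples)
gcloud sql instances create replica-1 --master-instance-name=primary-1
```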

-Cloud SQL is a relational database and not the best fit for time-series log data formats

 

Cloud Spanner

Cloud Spanner scales horizontally and serves data with low latency while maintaining transactional consistency

After you create an instance, you cannot change the configuration of that instance later

Cloud Spanner Instance Configuration can be set to which of the following Location: regional, multi-regional

Cloud Spanner is a SQL/relational database.

Cloud Spanner is a SQL database that is horizontally scalable for cross-region support and can host large datasets.

 

BigQuery – Calculating cost 

UI: query validator

CLI: --dry_run

REST: dryRun Property

-BigQuery is the only one of these Google products that supports an SQL interface

-BigQuery is billed based on the amount of data read. The --dry_run flag is used to determine how many bytes would be read.
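A dry run from the CLI can be sketched like this; the project, dataset, and table names are hypothetical:

```shell
# Report how many bytes the query would read, without running it
# (project, dataset, and table names are hypothetical examples)
bq query --use_legacy_sql=false --dry_run \
  'SELECT name FROM `my-project.my_dataset.my_table` LIMIT 10'
```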

-Analytics data warehouse

-Use a BigQuery with table partitioning

-BigQuery is the best choice for data warehousing

-BigQuery does not offer low latency and millisecond response time

-The BigQuery instance Labels and Display Name can be modified without any downtime

BigQuery is a serverless warehouse for analytics and supports the volume and analytics requirement

– To move large datasets directly to BigQuery, consider the BigQuery Data Transfer Service, which automates data movement from SaaS applications to BigQuery on a scheduled, managed basis

 

Cloud Bigtable

A petabyte-scale, fully managed NoSQL database service for large analytical and operational workloads.

  • Bigtable is priced by provisioned node
  • Bigtable does not autoscale
  • Bigtable does not store data in GCS
  • Bigtable is not made for storing large objects

Use Cloud Bigtable as the storage engine for large-scale, low-latency applications as well as throughput-intensive data processing and analytics.

Apache HBase is the open-source version of Bigtable

-Each cluster is located in a single zone

-Maximum number of Clusters for a Cloud Bigtable Instance is – 4

After creating a Cloud Bigtable instance, any of the following settings can be updated without any downtime:

– The application profiles for the instance, which contain replication settings
– Upgrade a development instance to a production instance

– The number of nodes in each cluster
– The number of clusters in the instance
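Scaling the node count, for example, can be sketched with a single gcloud command; the instance and cluster names are hypothetical:

```shell
# Scale a cluster to 5 nodes with no downtime
# (instance and cluster names are hypothetical examples)
gcloud bigtable clusters update my-cluster --instance=my-instance --num-nodes=5
```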

Cloud BigTable

– Service is ideal for Time-Series data

– ideal for applications requiring very high read/write throughput and can store Petabytes of unstructured data

– can be deployed zonal

-Bigtable is not a relational database.

-Cloud Bigtable provides the ability to isolate workloads by allowing applications to connect to specific Clusters

-Cloud Bigtable is optimized for time-series data. It is cost-efficient, highly available, and low-latency

 

Cloud Datastore

-Datastore can be queried, it’s fully managed, and is a great option for catalog based applications. Datastore also supports a basic query/filter syntax.

-Datastore is a managed NoSQL database well suited to mobile applications

– Cloud Datastore queries can deliver their results at either of two consistency levels:

-Strongly consistent queries guarantee the freshest results, but may take longer to complete.
-Eventually consistent queries generally run faster, but may occasionally return stale results.

-You can store your Datastore mode data in either a multi-region location or a regional location

 

Cloud Firestore is the next generation of Cloud Datastore

Firestore
Easily develop rich applications using a fully managed, scalable, and serverless document database

Cloud Dataflow – service for processing large volumes of data

  • Cloud Dataflow provides you with a place to run Apache Beam based jobs on GCP
  • Cloud Dataflow provides for both streaming and batch pipelines
  • use cases

( Serverless ETL, processing data from IoT Devices, processing Data from POS systems)

– a fully managed ETL/ELT service for transforming, transporting, and enriching data

– Dataflow is built on top of Apache Beam and is ideal for new, cloud-native batch and streaming data processing

 

Cloud Dataproc – to handle existing Hadoop/Spark jobs (use it to replace existing Hadoop infrastructure)

Dataproc should be used if the processing has any dependencies to tools in the Hadoop ecosystem.

Cloud Dataproc can leverage preemptible Compute Engine VMs

Dataproc is a fast, easy-to-use, fully managed cloud service for running Apache Spark and Apache Hadoop clusters in a simpler, more cost-efficient way.

Dataproc is for managed Hadoop/Spark workflows

Cloud Dataproc has built-in integration with other Google Cloud Platform services, such as BigQuery, Cloud Storage, Cloud Bigtable, Stackdriver Logging, and Stackdriver Monitoring, so you have more than just a Spark or Hadoop cluster, you have a complete data platform

Cloud Dataproc and Cloud Dataflow can both be used for data processing, and there’s overlap in their batch and streaming capabilities
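Creating a Dataproc cluster with preemptible capacity can be sketched as follows; the cluster name, region, and worker counts are illustrative, and on older SDK versions the flag was --num-preemptible-workers:

```shell
# A cluster with 2 regular workers plus 2 preemptible (secondary) workers
# (cluster name, region, and sizes are illustrative)
gcloud dataproc clusters create my-cluster \
  --region=us-central1 \
  --num-workers=2 \
  --num-secondary-workers=2
```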

 


 

Cloud Composer

Cloud Composer is a fully managed workflow orchestration service that empowers you to author, schedule, and monitor pipelines that span across clouds and on-premises data centers

A fully managed workflow orchestration service built on Apache Airflow.

 

Preemptible instances are short-lived instances ( 24 hours maximum )

-A static website can be hosted with cloud storage for very little money.

 

 

Cloud Functions 

billing interval for Cloud Functions is 100 ms

Apigee – Design, Secure, Publish, Analyze, Monitor, and Monetize APIs

Cloud Functions supports: Go, Node.js, Python

 

Data Studio ( similar to Tableau, Power BI )

Data Studio is able to easily create useful charts from live BigQuery data to get insight.

Security 

Cloud Audit Log

GCP Service that maintains audit logs for each GCP Project, Folder, and Organization

 

Cloud Security Scanner

 

Cloud Armor

works with Global HTTP(S) Load Balancers to deliver defense against DDoS (Distributed Denial of Service) attacks

Data Loss Prevention API

-Use the Data Loss Prevention API to automatically detect and redact sensitive data

-Fully managed service designed to help you discover, classify, and protect your most sensitive data

Trusted Platform Module (TPM)

 

Cloud Code 

– provides everything you need to write, debug, and deploy Kubernetes applications

Cloud Source Repositories

– is the GCP Service used for Code Version Control

Cloud TPU 

GCP Service that provides a custom-designed family of ASIC (Application-Specific Integrated Circuit) hardware accelerators, built specifically for machine learning

 

Cloud Data Fusion ( similar to Cloud Dataflow )

 

Cloud Data Catalog 

provides Organizations with a central location to discover, manage, and understand all their data in the Google Cloud

Cloud Memorystore

 

Cloud IoT Core 

provides the ability to securely connect, manage, and ingest data from globally dispersed devices

MQTT stands for MQ Telemetry Transport. It is a publish/subscribe, extremely simple and lightweight messaging protocol, designed for constrained devices and low-bandwidth, high-latency or unreliable networks.


Cloud Build 

Cloud Source Repositories

Cloud Dataprep 

  • provides features to visually explore, scrub, clean, and prepare structured and unstructured data
  • Dataprep cleans data in a web interface using data from Cloud Storage or BigQuery.
  • Dataprep is a UI-driven data preparation service that runs on top of Cloud Dataflow

 

Cloud Datalab

-is a data exploration tool which provides an intuitive notebook format to combine code, results, and visualizations

– is most useful for Data Scientists

 

StackDriver 

-Once logs are past their retention period and are deleted, they are permanently gone. Export logs to Cloud Storage or BigQuery for long-term retention

-Performance statistics would be best served viewing in Stackdriver Monitoring using custom metrics.

Stackdriver has an integrated service to export logs for Analysis to: BigQuery, Pub/Sub, Storage
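Creating such an export sink can be sketched with gcloud; the sink, project, and dataset names below are hypothetical:

```shell
# Route Compute Engine instance logs to a BigQuery dataset via a sink
# (sink, project, and dataset names are hypothetical examples)
gcloud logging sinks create my-sink \
  bigquery.googleapis.com/projects/my-project/datasets/my_dataset \
  --log-filter='resource.type="gce_instance"'
```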

 

Cloud Endpoints

-GCP Service provides API Management by using either Frameworks for App Engine, OAS (OpenAPI Specification), or gRPC

-Develop, deploy, protect, and monitor your APIs with Cloud Endpoints

Apigee

provides the ability to Design, Secure, Publish, Analyze, Monitor, and Monetize APIs?

 

Deployment manager 

gsutil -m cp -r gs://ovi/deployment-manager/* .

gcloud deployment-manager deployments create my-vm --config vm-web.yaml

gcloud deployment-manager deployments create vpcs --config vpc-dependencies.yaml

gcloud deployment-manager deployments describe vpcs

gcloud deployment-manager deployments delete vpcs
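The config files passed via --config are YAML resource descriptions; a minimal single-VM sketch of the kind a vm-web.yaml might contain (all names, the zone, and the image are illustrative assumptions):

```shell
# Write a minimal Deployment Manager config for one VM
# (resource name, zone, machine type, and image are hypothetical examples)
cat > vm-web.yaml <<'EOF'
resources:
- name: web-vm
  type: compute.v1.instance
  properties:
    zone: us-central1-a
    machineType: zones/us-central1-a/machineTypes/f1-micro
    disks:
    - boot: true
      autoDelete: true
      initializeParams:
        sourceImage: projects/debian-cloud/global/images/family/debian-11
    networkInterfaces:
    - network: global/networks/default
EOF
```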

 

Machine Types:
General-purpose: n1
n1-standard
n1-highcpu
n1-highmem

Compute-optimized: c2
c2-standard

Memory-optimized: n1, m2
n1-ultramem
n1-megamem
m2-ultramem

Shared-core:
f1-micro
g1-small

To initialize gcloud, simply run `gcloud init` and follow the prompts. This also configures `gsutil` and `bq`.

 

Networking 

VPC

GCP VPCs are global

-GCP Resources within a single VPC Subnet must be within same region (Subnets are regional resources)

-VPC network peering provides cross-project VPC communication within the same or different organizations

-VPC Network Peering and Shared VPC are methods for connecting two GCP VPC, not for connecting an On-Prem network to GCP Cloud Services

Shared VPC ( two main components )

  • Host Project
  • Service Project

Billing for resources that participate in a Shared VPC network is attributed to the service project where the resource is located

– VPC Network Peering is only between two Google Cloud

  • Each Cloud VPN tunnel can support up to 3 Gbps. Actual bandwidth depends on several factors

 

Direct Peering exists outside of Google Cloud Platform

-Direct Peering can be used with GCP, but GCP does not require it.

-Direct Peering can be used for G Suite Platform, existing outside of GCP

-You can’t use Google Cloud VPN in combination with Dedicated Interconnect, but you can use your own VPN solution.

-You can’t use Google Cloud VPN in combination with Partner Interconnect, but you can use your own VPN solution.

 

Dedicated Interconnect

  • Find a colocation facility
  • Connect your on-premises network to the colocation facility
  • Order a LOA-CFA ( Letter of Authorization and Connecting Facility Assignment )

 

Partner Interconnect 

 

Cloud VPN 

 

 

Cloud load balancer  

Global HTTP(S) – Cloud Load Balancer offers cookie-based Session Affinity

Global HTTP(S) can be configured for use as a CDN (Content Delivery Network)

Global SSL Proxy – type of Cloud Load Balancer intended for global SSL-encrypted traffic that is not HTTP(S)

Global TCP Proxy – type of Cloud Load Balancer intended for global traffic that is not HTTP(S) and not SSL-encrypted

Global HTTP(S) – type of Cloud Load Balancer intended to provide global URL routing

The HTTP(S) load balancer in GCP handles WebSocket traffic natively. Backends that use WebSocket to communicate with clients can use the HTTP(S) load balancer as a front end for scale and availability.

Network load balancers only distribute traffic within a single region. For global load balancing to multiple regions, use an HTTP(S) load balancer or a TCP/SSL Proxy load balancer.

– Network Load Balancers are not proxies
– Responses go directly to clients – direct server return
– Source IP address not modified – the LB preserves the source IP addresses of packets

-Network tags allow more granular access based on individually tagged instances.

LOGS/Monitoring

GCP projects store logs in:

-Default bucket
-Required bucket

The default retention period for logs stored in the default bucket is 30 days.

Storage Transfer Service

Transfer Appliance

Transfer Appliance is a high-capacity storage device that enables you to transfer and securely ship your data to a Google upload facility, where the data is uploaded to Google Cloud Storage. For Transfer Appliance capacities and requirements, see Specifications.

Other data transfer options

Case Study

TerramEarth

  • Cloud IoT Core
  • Cloud Dataflow
  • BigQuery
  • Cloud ML Engine
  • Cloud Datalab
  • Datastudio

 

signed URL 

  • Allows timed access with a URL link.
  • Allows someone to access an object without requiring them to have a GCP account.
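Generating a signed URL can be sketched with gsutil; the key file, bucket, and object names below are hypothetical, and the key must belong to a service account with access to the object:

```shell
# URL valid for 10 minutes, signed with a service-account key file
# (key file, bucket, and object names are hypothetical examples)
gsutil signurl -d 10m sa-key.json gs://my-bucket/report.pdf
```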

Security

-Forseti security

https://forsetisecurity.org/

 

 
