Zones, Regions, Dual-Regions, and Multi-Regions
– Subnets are regional resources
– Because subnets are regional objects, the region you select for a resource determines the subnets it can use.
– multi-regions and dual-regions are geo-redundant
- Regions are independent geographic areas that consist of zones
- A dual-region is a specific pair of regions
-Cloud KMS resources can be created in the following dual-regional locations
-Objects stored in a multi-region or dual-region are geo-redundant
– Data that is geo-redundant is stored redundantly in at least two separate geographic places separated by at least 100 miles
-Geo-redundancy occurs asynchronously
Currently, nam4 and eur4 are the only Dual-Regions available.
A GCP organization’s combined IAM policy at any level of the Cloud Resource Hierarchy is a combination of the policies at that level, plus any policies inherited from higher levels.
Cloud Spanner – Global replication of relational data
For a BigQuery dataset, the available location types are: regional and multi-regional
Billing
Billing accounts can contain billing subaccounts
Billing accounts are connected to a payments profile
Billing Account User role – can link projects to billing accounts
Export billing options
Export Cloud Billing to :
- BigQuery
- Cloud Storage
Billing for resources that participate in a Shared VPC network is attributed to the service project where the resource is located
Cloud IAM
An Organization contains one or more Folders. A Folder contains one or more Projects. A Project contains one or more Resources.
-A Role is a collection of permissions
-An IAM Policy object consists of a list of bindings
- Projects can contain resources in different regions
- Projects are configured with default Region and Zone
- You don’t assign permissions to users directly. Instead, you assign them a Role which contains one or more permissions
- Members can be of the following types: Google account, Service account, Google group, G Suite domain, Cloud Identity domain
- A Binding binds a list of members to a role.
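The policy/binding structure above can be sketched as a minimal IAM policy object (the emails and project name below are made-up examples):

```python
import json

# A minimal sketch of the IAM Policy object shape described above:
# a policy holds a list of bindings, and each binding ties one role
# (a collection of permissions) to a list of members.
policy = {
    "bindings": [
        {
            "role": "roles/storage.objectViewer",
            "members": [
                "user:alice@example.com",  # Google account (example)
                "serviceAccount:app@my-proj.iam.gserviceaccount.com",
                "group:devs@example.com",  # Google group (example)
            ],
        },
        {
            "role": "roles/editor",
            "members": ["user:bob@example.com"],
        },
    ]
}

print(json.dumps(policy, indent=2))
```

Note that permissions never appear in the policy itself: members only ever receive roles, which is why you can't assign a permission to a user directly.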
Each GCP project can contain only a single App Engine application, and once created you cannot change the location of your App Engine application
MFA stands for Multi-Factor Authentication, and it is a best practice to use this to secure accounts.
IAM roles can be assigned per bucket.
–Predefined roles are granular and granted at the service level for much more fine-tuned access
–Primitive roles are broad, project-wide roles assigned to the project level.
Cloud Identity and G Suite are the two ways to centrally manage Google accounts.
KMS in Cloud KMS stands for Key Management Service
Service Accounts
–Resources not hosted on GCP should use a custom service account key for authentication.
Cloud SDK
The gcloud alpha and gcloud beta command groups are additional Cloud SDK components that you can install to extend gcloud.
-Maximum Size of a Cloud Storage Bucket – unlimited
-Cloud Storage offers unlimited object storage and individual objects can be as large as 5TB
-Versioning can be enabled on a Cloud Storage Bucket.
gsutil – This is a Cloud SDK component used to interact with Cloud Storage.
To view which project is the default, run the gcloud config list command. This lists the properties of the active configuration, including the default project.
$ gcloud compute instances list
$ gcloud compute ssh
$ gcloud compute ssh ovi@server --dry-run
*** Snapshot ***
$ gcloud compute snapshots list
$ gcloud compute disks list
$ gcloud compute disks snapshot development-server
$ gcloud compute images list
$ gcloud container clusters list
$ gcloud config list
$ gcloud app versions list
gcloud config configurations create
gcloud config configurations activate
gcloud config set project [ Project_ID]
gcloud logging read "login_name"
gcloud logging read "login_name" --limit 15
DISK
Create disk:
gcloud compute disks create DISK_NAME --type=DISK_TYPE --size=SIZE --zone=ZONE
gcloud compute disks create disk-1 --size=50GB --zone=us-east1-b
Resize disk:
gcloud compute disks resize DISK_NAME --size=SIZE --zone=ZONE
gcloud compute disks resize disk-1 --size=150GB --zone=us-east1-b
Attach disk:
gcloud compute instances attach-disk INSTANCE_NAME --disk=DISK_NAME --zone=ZONE
snapshot
gcloud compute disks snapshot web1 --snapshot-names=web1-backup-v1 --zone=us-central1-a
gcloud compute snapshots list
gcloud compute snapshots describe web1-backup-v1
-persistent disks will not be deleted when an instance is stopped.
– persistent disk performance is based on the total persistent disk capacity attached to an instance and the number of vCPUs the instance has. Increasing the persistent disk capacity increases its throughput and IOPS
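The capacity/performance relationship above can be illustrated with a small sketch. The per-GB rates below are assumptions drawn from the published pd-standard/pd-ssd baseline figures at the time; check the current GCP docs for real numbers, and remember that actual performance is also capped by machine type and vCPU count.

```python
# Illustrative sketch of how persistent disk performance scales linearly
# with provisioned capacity. The per-GB rates are assumptions (historical
# baseline figures), not authoritative values.
READ_IOPS_PER_GB = {"pd-standard": 0.75, "pd-ssd": 30.0}

def estimated_read_iops(disk_type: str, size_gb: int) -> float:
    """Baseline read IOPS grow linearly with provisioned capacity."""
    return READ_IOPS_PER_GB[disk_type] * size_gb

print(estimated_read_iops("pd-ssd", 100))  # 100 GB pd-ssd
print(estimated_read_iops("pd-ssd", 500))  # 5x the capacity -> 5x the IOPS
```

This is why resizing a disk upward (gcloud compute disks resize) is a common way to buy more IOPS, not just more space.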
Video for reference: Installing the Cloud SDK
View default cloud configuration
gcloud config list
gcloud container clusters get-credentials —> to authenticate and configure kubectl
Preemptible Virtual Machines
Affordable, short-lived compute instances suitable for batch jobs and fault-tolerant workloads.
Preemptible VMs are highly affordable, short-lived compute instances suitable for batch jobs and fault-tolerant workloads. Preemptible VMs offer the same machine types and options as regular compute instances and last for up to 24 hours
// ENABLE PREEMPTIBLE OPTION
gcloud compute instances create my-vm --zone us-central1-b --preemptible
App Engine
- web based workloads, high availability, no ops
Flexible environments are able to use a Dockerfile to create custom runtimes
-App Engine is regional
-App Engine traffic can be split by cookie, by IP address, and at random. We cannot split traffic by zone.
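The cookie-based splitting above can be sketched with a toy version router. This is an illustration of the idea, not App Engine's actual algorithm; the GOOGAPPUID cookie name is the one App Engine uses, but the hashing scheme here is made up.

```python
import hashlib

# Sketch of cookie-based traffic splitting: hashing a stable cookie
# keeps each user pinned to the same version across requests, unlike
# random splitting. Not App Engine's real implementation.
def pick_version(cookie: str, splits: dict) -> str:
    """splits maps version name -> fraction of traffic (fractions sum to 1)."""
    bucket = int(hashlib.sha256(cookie.encode()).hexdigest(), 16) % 1000 / 1000
    cumulative = 0.0
    for version, fraction in splits.items():
        cumulative += fraction
        if bucket < cumulative:
            return version
    return version  # fall through to the last version on rounding edges

splits = {"v1": 0.7, "v2": 0.3}
# The same cookie always routes to the same version:
assert pick_version("GOOGAPPUID=abc123", splits) == pick_version("GOOGAPPUID=abc123", splits)
```

IP-based splitting works the same way but hashes the client IP, which is why cookie splitting is preferred for users whose IP changes between requests.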
App Engine Standard Environment.
- the default timeout for a service instance deployed to the App Engine Standard Environment is 60 seconds
- The App Engine Standard environment does not allow instance runtimes to be modified
- App Engine Standard Environment does scale down to zero when not in use
App Engine Flexible Environment
- Runtime modifications are allowed for instances running in the App Engine Flexible environment.
In App Engine Flex the connection to Stackdriver (i.e. agent installation and configuration) is handled automatically for you
App Engine Flexible Environment does not scale down to zero
Deploying and Manipulating Multiple App Engine Versions
gcloud app deploy --version 1
canary test
gcloud app deploy --no-promote --version 2
Compute Engine
Managed Instance Group
Unmanaged Instance Group
Unmanaged instance groups do not offer multi-zone support
Maximum total size of Compute Engine local SSD disks per instance – 3 TB (8 disks of 375 GB each)
Cloud Functions
-billing interval is 100 ms
-Horizontal Scaling
– Microservices Architecture
-Cloud Functions does scale down to zero when not in use
Cloud Run
- Uses Stateless HTTP containers
- Scalability
- Built on Knative
Cloud Storage
-Cloud Storage allows Organizations to use CSEKs (Customer Supplied Encryption Keys).
-Data in a regional location operates in a multi-zone replicated configuration
*** create a bucket
$ gsutil mb -c regional -l us-east1 gs://ovi
$ gsutil versioning get gs://ovi
$ gsutil versioning set on gs://ovi
$ gsutil ls -a gs://ovi
$ gsutil cp <file> gs://ovi
$ gsutil ls gs://ovi11
gs://ovi11/IMG_2759.jpg
gs://ovi11/IMG_2770.jpg
$ touch ovi_file
$ gsutil cp ovi_file gs://ovi11
Copying file://ovi_file [Content-Type=application/octet-stream]...
/ [1 files][ 0.0 B/ 0.0 B]
Operation completed over 1 objects.
$ gsutil ls gs://ovi11
gs://ovi11/IMG_2759.jpg
gs://ovi11/IMG_2770.jpg
gs://ovi11/ovi_file
Pub/Sub is a messaging service for exchanging event data among applications and services. A producer of data publishes messages to a Pub/Sub topic. A consumer creates a subscription to that topic. Subscribers either pull messages from a subscription or are configured as webhooks for push subscriptions. Every subscriber must acknowledge each message within a configurable window of time.
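The topic/subscription model above can be sketched as a toy in-memory fan-out (this is an illustration of the semantics, not the real Pub/Sub API):

```python
from collections import defaultdict

# Toy model of Pub/Sub fan-out: each subscription gets its own copy of
# every message published to the topic, and a subscriber removes a
# message from its queue only by acknowledging it.
topics = defaultdict(list)   # topic name -> list of subscription names
queues = defaultdict(list)   # subscription name -> undelivered messages

def create_subscription(topic, sub):
    topics[topic].append(sub)

def publish(topic, message):
    for sub in topics[topic]:        # fan-out: one copy per subscription
        queues[sub].append(message)

def pull_and_ack(sub):
    return queues[sub].pop(0)        # ack = remove from the queue

create_subscription("ovi-topic", "ovi-sub-a")
create_subscription("ovi-topic", "ovi-sub-b")
publish("ovi-topic", "hello")
print(pull_and_ack("ovi-sub-a"))     # both subscriptions receive "hello"
print(pull_and_ack("ovi-sub-b"))
```

In the real service, an unacknowledged message is redelivered after the ack deadline expires, which is what the "configurable window of time" refers to.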
Cloud Pub/Sub as the messaging service to capture real time data ( ex: IoT )
– is designed to provide reliable, many-to-many, asynchronous messaging between applications (real time IoT data capture)
-Cloud Pub/Sub is designed to handle infinitely-scalable streaming data ingest
Pub/Sub
1. Create a topic.
2. Subscribe to the topic.
3. Publish a message to the topic.
4. Receive the message.
gcloud init
gcloud pubsub topics create ovi-topic
gcloud pubsub subscriptions create ovi-sub --topic ovi-topic
gcloud pubsub topics publish ovi-topic --message "hello"
gcloud pubsub subscriptions pull ovi-sub --auto-ack
gcloud config configurations activate — Activate an existing configuration
gcloud config list — list the settings for the active configuration
App Engine is a Platform as a Service – It is a fully managed solution.
gcloud container clusters resize — this command is used to resize a Kubernetes cluster
ex:
gcloud container clusters resize oviproject --node-pool primary-node-pool --num-nodes 25
gcloud config configurations create — create and activate a new configuration
Log sinks can be exported to Cloud Pub/Sub.
Storage Option
- Multi-Regional – Data accessed frequently with highest availability / Geo-redundant
- Regional – Data accessed frequently within region / Regional, redundant across availability zones
- Nearline – Data accessed less than once per month / Regional / Store infrequently accessed content
- Coldline – Data accessed less than once per year / Regional / Archive storage, backup, Disaster recovery
Coldline Storage is the best choice for data that you plan to access at most once a year, due to its slightly lower availability, 90-day minimum storage duration, costs for data access, and higher per-operation costs
-Lifecycle management policies can be submitted via JSON format.
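An example of the JSON lifecycle policy format mentioned above, built with plain Python: move objects to Nearline after 30 days, then delete them after a year. The bucket name and thresholds are made-up examples; apply such a file with `gsutil lifecycle set lifecycle.json gs://BUCKET`.

```python
import json

# Lifecycle policy: each rule pairs an action with a condition.
lifecycle = {
    "lifecycle": {
        "rule": [
            {   # after 30 days, demote to the cheaper Nearline class
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {"age": 30},
            },
            {   # after a year, delete the object entirely
                "action": {"type": "Delete"},
                "condition": {"age": 365},
            },
        ]
    }
}

with open("lifecycle.json", "w") as f:
    json.dump(lifecycle, f, indent=2)
```

This pattern (demote then delete) matches the storage-class table above: frequently accessed data starts in Regional/Multi-Regional and ages into Nearline/Coldline.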
Cloud SQL
– Read replicas and failover replicas are charged at the same rate as stand-alone instances
-Cloud SQL for PostgreSQL does not yet support replication from an external master or external replicas for Cloud SQL instances ("This functionality is not yet supported for PostgreSQL instances")
-GCP Cloud SQL provides the following backup types: automated backups and on-demand backups
-Cloud SQL read replicas and failovers must be in the same region. The failover must be in a different zone in the same region.
-Cloud SQL is a relational database and not the best fit for time-series log data formats
Cloud Spanner
Cloud Spanner scales horizontally and serves data with low latency while maintaining transactional consistency
After you create an instance, you cannot change the configuration of that instance later
Cloud Spanner instance configurations can be set to either of the following location types: regional, multi-regional
Cloud Spanner is a SQL/relational database.
Cloud Spanner is a SQL database that is horizontally scalable for cross-region support and can host large datasets.
BigQuery – Calculating cost
UI: query validator
CLI: --dry_run flag (bq)
REST: dryRun Property
-BigQuery is the only one of these Google products that supports an SQL interface
-BigQuery is billed based on the amount of data read. The dry-run flag is used to determine how many bytes are going to be read.
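A rough sketch of turning a dry run's "bytes processed" figure into a cost estimate. The $5-per-TB rate was the on-demand price at the time these notes were written; treat it as an assumption and check current pricing docs.

```python
# Estimate on-demand query cost from the bytes a dry run reports.
# PRICE_PER_TB is an assumption (historical on-demand rate), not a
# guaranteed current price.
PRICE_PER_TB = 5.00

def query_cost_usd(bytes_processed: int) -> float:
    tib = bytes_processed / (1024 ** 4)   # bytes -> TiB
    return round(tib * PRICE_PER_TB, 4)

# A dry run reporting 1 TiB processed costs about $5:
print(query_cost_usd(1024 ** 4))
```

This is why partitioning matters: a query that prunes partitions reads fewer bytes and therefore costs proportionally less.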
-Analytics data warehouse
-Use BigQuery with table partitioning
-BigQuery is the best choice for data warehousing
-BigQuery does not offer low latency and millisecond response time
-The BigQuery instance's labels and display name can be modified without any downtime
–BigQuery is a serverless warehouse for analytics and supports the volume and analytics requirement
– To move large datasets directly to BigQuery, consider the BigQuery Data Transfer Service, which automates data movement from SaaS applications to BigQuery on a scheduled, managed basis
Cloud Bigtable
A petabyte-scale, fully managed NoSQL database service for large analytical and operational workloads.
- Bigtable is priced by provisioned node
- Bigtable does not autoscale
- Bigtable does not store data in GCS
- Bigtable is not made for storing large objects
Use Cloud Bigtable as the storage engine for large-scale, low-latency applications as well as throughput-intensive data processing and analytics.
Apache HBase is an open-source database based on the Bigtable design (Bigtable supports the HBase API)
-Each cluster is located in a single zone
-Maximum number of Clusters for a Cloud Bigtable Instance is – 4
After creating a Cloud Bigtable instance, any of the following settings can be updated without any downtime:
– The application profiles for the instance, which contain replication settings
– Upgrade a development instance to a production instance
– The number of nodes in each cluster
– The number of clusters in the instance
Cloud Bigtable
– Service is ideal for Time-Series data
– ideal for applications requiring very high read/write throughput and can store petabytes of unstructured data
– can be deployed zonal
-Bigtable is not a relational database.
-Cloud Bigtable provides the ability to isolate workloads by allowing applications to connect to specific Clusters
-Cloud Bigtable is optimized for time-series data. It is cost-efficient, highly available, and low-latency
Cloud Datastore
-Datastore can be queried, it’s fully managed, and is a great option for catalog based applications. Datastore also supports a basic query/filter syntax.
-Datastore is a managed NoSQL database well suited to mobile applications
– Cloud Datastore queries can deliver their results at either of two consistency levels:
-Strongly consistent queries guarantee the freshest results, but may take longer to complete.
-Eventually consistent queries generally run faster, but may occasionally return stale results.
-You can store your Datastore mode data in either a multi-region location or a regional location
Cloud Firestore is the next generation of Cloud Datastore
Firestore
Easily develop rich applications using a fully managed, scalable, and serverless document database
Cloud Dataflow – a service for processing large volumes of data
- Cloud Dataflow provides you with a place to run Apache Beam based jobs, on GCP
- Cloud Dataflow provides for both streaming and batch pipelines
- use cases: serverless ETL, processing data from IoT devices, processing data from POS systems
– a fully managed ETL/ELT service for transforming, transporting, and enriching data
– Dataflow is built on top of Apache Beam and is ideal for new, cloud-native batch and streaming data processing
Cloud Dataproc – to handle existing Hadoop/Spark jobs (use it to replace existing Hadoop infrastructure)
Dataproc should be used if the processing has any dependencies to tools in the Hadoop ecosystem.
Cloud Dataproc can leverage preemptible Compute Engine VMs
Dataproc is a fast, easy-to-use, fully managed cloud service for running Apache Spark and Apache Hadoop clusters in a simpler, more cost-efficient way.
Dataproc is for managed Hadoop/Spark workflows
–Cloud Dataproc has built-in integration with other Google Cloud Platform services, such as BigQuery, Cloud Storage, Cloud Bigtable, Stackdriver Logging, and Stackdriver Monitoring, so you have more than just a Spark or Hadoop cluster, you have a complete data platform
Cloud Dataproc and Cloud Dataflow can both be used for data processing, and there’s overlap in their batch and streaming capabilities
Cloud Composer
Cloud Composer is a fully managed workflow orchestration service that empowers you to author, schedule, and monitor pipelines that span across clouds and on-premises data centers
A fully managed workflow orchestration service built on Apache Airflow.
Preemptible instances are short-lived instances (24 hours maximum)
-A static website can be hosted with cloud storage for very little money.
Cloud Functions
billing interval for Cloud Functions is 100 ms
Apigee – Design, Secure, Publish, Analyze, Monitor, and Monetize APIs
Cloud Functions supports: Go, Node.js, Python
Data Studio (similar to Tableau, Power BI)
Data Studio is able to easily create useful charts from live BigQuery data to get insight.
Security
Cloud Audit Log
GCP maintains audit logs for each GCP Project, Folder, and Organization
Cloud Security Scanner
Cloud Armor
works with Global HTTP(S) Load Balancers to deliver defense against DDoS (Distributed Denial of Service) attacks
Data Loss Prevention API
-Use the Data Loss Prevention API to automatically detect and redact sensitive data
-Fully managed service designed to help you discover, classify, and protect your most sensitive data
Trusted Platform Module (TPM)
Cloud Code
– provides everything you need to write, debug, and deploy Kubernetes applications
Cloud Source Repositories
– a GCP service that is used for code version control
Cloud TPU
A GCP service that provides a custom-designed family of ASIC (Application-Specific Integrated Circuit) hardware accelerators built specifically for machine learning
Cloud Data Fusion (similar to Cloud Dataflow)
Cloud Data Catalog
provides Organizations with a central location to discover, manage, and understand all their data in the Google Cloud
Cloud Memorystore
Cloud IoT Core
provides the ability to securely connect, manage, and ingest data from globally dispersed devices
MQTT stands for MQ Telemetry Transport. It is a publish/subscribe, extremely simple and lightweight messaging protocol, designed for constrained devices and low-bandwidth, high-latency or unreliable networks.
Cloud Build
Cloud Source Repositories
Cloud Dataprep
- provides features to visually explore, scrub, clean, and prepare structured and unstructured data
- Dataprep cleans data in a web interface using data from Cloud Storage or BigQuery.
- Dataprep is a UI driven data preparation service that runs on top of Cloud Dataflow
Cloud Datalab
-is a data exploration tool which provides an intuitive notebook format to combine code, results, and visualizations
– is most useful for Data Scientists
Stackdriver
-Once logs are past their retention period and are deleted, they are permanently gone. Export logs to Cloud Storage or BigQuery for long-term retention
-Performance statistics would be best served viewing in Stackdriver Monitoring using custom metrics.
Stackdriver has an integrated service to export logs for Analysis to: BigQuery, Pub/Sub, Storage
Cloud Endpoints
-A GCP service that provides API management using either frameworks for App Engine, OAS (OpenAPI Specification), or gRPC
-Develop, deploy, protect, and monitor your APIs with Cloud Endpoints
Apigee
provides the ability to design, secure, publish, analyze, monitor, and monetize APIs
Deployment manager
gsutil -m cp -r gs://ovi/deployment-manager/* .
gcloud deployment-manager deployments create my-vm --config vm-web.yaml
gcloud deployment-manager deployments create vpcs --config vpc-dependencies.yaml
gcloud deployment-manager deployments describe vpcs
gcloud deployment-manager deployments delete vpcs
Machine Types:
General-purpose: n1
n1-standard
n1-highcpu
n1-highmem
Compute-optimized: c2
c2-standard
Memory-optimized: n1, m2
n1-ultramem
n1-megamem
m2-ultramem
Shared-core:
f1-micro
g1-small
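The naming scheme in the list above (family, type, and usually a vCPU count) can be captured in a small helper. This parser is an illustration of the convention, not an official API:

```python
# Machine type names follow <family>-<type>[-<vCPUs>], e.g.
# n1-standard-4 is an n1-family, standard-type machine with 4 vCPUs.
# Shared-core types (f1-micro, g1-small) carry no vCPU suffix.
def parse_machine_type(name: str) -> dict:
    parts = name.split("-")
    family, kind = parts[0], parts[1]
    vcpus = int(parts[2]) if len(parts) > 2 else None
    return {"family": family, "type": kind, "vcpus": vcpus}

print(parse_machine_type("n1-standard-4"))
print(parse_machine_type("m2-ultramem-416"))
print(parse_machine_type("f1-micro"))
```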
To initialize gcloud: run `gcloud init` and follow the prompts. This also configures `gsutil` and `bq`.
Networking
VPC
–GCP VPC networks are global
-GCP Resources within a single VPC Subnet must be within same region (Subnets are regional resources)
-VPC network peering provides cross-project VPC communication within the same or different organizations
-VPC Network Peering and Shared VPC are methods for connecting two GCP VPC networks, not for connecting an on-prem network to GCP cloud services
Shared VPC ( two main components )
- Host Project
- Service Project
Billing for resources that participate in a Shared VPC network is attributed to the service project where the resource is located
– VPC Network Peering is only between two Google Cloud VPC networks
- Each Cloud VPN tunnel can support up to 3 Gbps. Actual bandwidth depends on several factors
– Direct Peering exists outside of Google Cloud Platform
-(Direct Peering) can be used by GCP, but does not require it.
-Direct Peering can be used for G Suite Platform, existing outside of GCP
-You can’t use Google Cloud VPN in combination with Dedicated Interconnect, but you can use your own VPN solution.
-You can’t use Google Cloud VPN in combination with Partner Interconnect, but you can use your own VPN solution.
Dedicated Interconnect
- Find a colocation facility
- Connect on-premises network to the colocation facility
- Order the LOA-CFA (Letter of Authorization and Connecting Facility Assignment)
Partner Interconnect
Cloud VPN
Cloud load balancer
Global HTTP(S) – Cloud Load Balancer offers cookie-based session affinity
Global HTTP(S) – can be configured for use as a CDN (Content Delivery Network)?
Global SSL proxy – type of Cloud Load Balancer is intended for Global SSL Encrypted Traffic that is not HTTP(S)
Global TCP proxy – type of Cloud Load Balancer is intended for Global Traffic that is not HTTP(S) and not SSL Encrypted
Other data transfer options
- Cloud Storage Transfer Service: Quickly imports online data into Google Cloud Storage.
- Google BigQuery Data Transfer Service: Automates data movement from Software as a Service (SaaS) applications such as Google Ads and Google Ad Manager on a scheduled, managed basis.
Case Study
TerramEarth
- Cloud IoT Core
- Cloud Dataflow
- BigQuery
- Cloud ML Engine
- Cloud Datalab
- Datastudio
signed URL
- Allows timed access with a URL link.
- Gives someone access to an object without requiring them to have a GCP account.
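The signed-URL idea can be sketched with stdlib HMAC signing. This is an illustration of the concept, NOT Cloud Storage's actual V4 signing algorithm; the host, secret key, and parameter names below are all made up.

```python
import hashlib, hmac, time
from urllib.parse import urlencode

# Toy signed URL: the link embeds an expiry time plus an HMAC over the
# path and expiry, so anyone holding the link gets timed access without
# an account. SECRET is a placeholder demo key.
SECRET = b"demo-signing-key"

def sign_url(bucket: str, obj: str, expires_at: int) -> str:
    payload = f"/{bucket}/{obj}?expires={expires_at}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"https://example.com{payload}&{urlencode({'sig': sig})}"

def is_valid(url: str, now: int) -> bool:
    base, _, query = url.partition("?")
    params = dict(kv.split("=") for kv in query.split("&"))
    payload = f"{base.removeprefix('https://example.com')}?expires={params['expires']}"
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, params["sig"]) and now < int(params["expires"])

url = sign_url("ovi", "report.pdf", expires_at=2000000000)
print(is_valid(url, now=int(time.time())))
```

Tampering with the expiry in the URL invalidates the signature, which is the property that makes "timed access via a link" safe to hand out. In practice you would generate real signed URLs with `gsutil signurl` or the client libraries.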
Security
-Forseti Security