This is the note of the course Cloud Engineering with GCP . Go to this article to see how I prepared for GCP ACE exam.
Mind Map
- GCP
- Network Resources
- VPC
- Network
- Subnet
- IP
- Load Balancer
- HTTP(S)
- SSL Proxy
- TCP Proxy
- Regional
- Multi-Regional
- VPN & Peering & Interconnect
- VPC
- Compute Services
- Compute Engine (GCE)
- Disk type
- persistent
- HDD/SSD
- Local SSD
- persistent
- Instance template
- Instance group
- Disk type
- App Engine (GAE)
- Standard
- Flexible
- Kubernetes Engine (GKE)
- Cloud Function
- Compute Engine (GCE)
- Storage
- Data warehouse
- BigQuery
- Database
- NoSQL
- BigTable
- Cloud Firestore
- Datastore mode
- Native mode
- Relational
- Cloud SQL
- Cloud Spanner
- Redis/Memcache
- Cloud Memorystore
- NoSQL
- Data warehouse
- SDK
- Stackdriver
- Monitoring
- Logging
- Debug
- Error reporting
- Trace
- Big data services
- BigQuery
- Cloud Dataproc
- Cloud Dataflow
- Cloud Pub/Sub
- Cloud Datalab
- Network Resources
SDK
- Cloud SDK installation and quick start
- gcloud tool guide
- <code>gcloud</code> command
- <code>gsutil</code> command
- <code>bq</code> command
Resource Management
- Resource Hierarchy
- Global
- Images
- Snapshots
- Networks
- Regional
- External IP addr.
- Zonal
- Instances
- Disks
- Global
- label vs tag
label tag user-defined kv pair user-defined string propagated through billing firewall rules
Billing
- Docs - Billing
- Pricing Calculator
- Four tools
- Budget & Alerts
- Billing export -> to BQ or Cloud Storage
- Reports
- Quotas
- types
- Rate Quotas: e.g. GKE API: 1000 requests per 100 sec
- Allocation Quotas: e.g. 5 networks per project
- can be canged on quota page
- types
GCP Hiearchy
- Organization (-Folder (- Folder)) - Project - Resource
- Policies are inherited downwards
Project ID | Project name | Project number |
---|---|---|
globally unique | not need to be unique | globally unique |
user defined | user defined | assigned by GCP |
immutable | mutable | immutable |
# set default project
gcloud config set project <project id>
IAM
Who
can do what
on which resource
- 3 types of IAM Roles:
- Primitive
- Owner: invite/remove members, delete project
- Editor: deploy/modified/configure
- Viewer: read-only
- Billing Administrator: control billing but not affect projects
- Predefined
- Custom
- Primitive
- Services Account
- Authentication for services
- Use email addr. & cryptographic keys
- commands
# add IAM policy binding for a project gcloud projects add-iam-policy-binding --member=(user|group|serviceAccount:email) --role=ROLE
- IAM policy (doc
)
# sample IAM policy { "bindings": [ { "role": "roles/resourcemanager.organizationAdmin", "members": [ "user:[email protected]", "group:[email protected]", "domain:google.com", "serviceAccount:[email protected]" ] }, { "role": "roles/resourcemanager.organizationViewer", "members": [ "user:[email protected]" ], "condition": { "title": "expirable access", "description": "Does not grant access after Sep 2020", "expression": "request.time < timestamp('2020-10-01T00:00:00.000Z')", } } ], "etag": "BwWWja0YfJA=", "version": 3 }
- audit log: logs of Admin Activity, Data Access, and System Event
# read audit log, `project` can be changed to `folder` or `organization` gcloud logging read "logName : projects/project-id/logs/cloudaudit.googleapis.com" --project=project-id
- Resource manager roles
- organization
- Admin: full control over all resources
- Viewer: view access to all resources
- folder
- Admin: full control over folders
- Creator: browse hierarchy and create folder
- Viewer: view folders and projects below a resources
- project
- Creator: create projects
- Deleter: delete projects
- organization
- Docs
- Docs - Understanding IAM roles
- Best Practices
- Google Cloud Identity
- Understanding Service Accounts
- Cloud Audit Logging overview
- Security and Identity Fundamentals Quest
Compute Resources
- Brief Comparison: App Engine, Compute Engine, GKE
- Related Blogpost from GCP
- Product page - hosting options
GCE | GKE | GAE-standard | GAE-flexible | Cloud Functions | |
---|---|---|---|---|---|
Lang | Any | Any | Python, Node.js, Go, Java, PHP | Python, Node.js, Go, Java, PHP, Ruby, .NET, Custom Runtime | Python, Node.js, Go |
Scaling | server autoscaling | cluster | autoscaling managed servers | autoscaling managed servers | serverless |
Use case | general | container workloads | scalable web app, mobile backend app | scalable web app, mobile backend app | light-weightevent actions |
App Engine (GAE)
Standard | Flexible | |
---|---|---|
Language | Java, Python, Go, PHP | any |
Instance startup | milli-secs | minutes |
SSH | N | Y |
3rd-party lib | N | Y |
Pricing model | after free daily use, pay per instance class, w/ auto-shutdown | Pay for resource alloc. per hour, no auto-shutdown |
other | all requests timeout is 60 sec |
# migrate the service to new version gcloud app versions migrate v2 --service="s1" # split traffic evenly between 'v1' and 'v2' of service 's1', run: gcloud app services set-traffic s1 --splits v2=.5,v1=.5 # deploy but send no traffic gcloud app deploy --no-promote
Compute Engine (GCE)
- Docs
- Features
- machine types
- predefined
standard high-memory high-CPU memory-optimized compute-optimized shared-core name n1-standard-<N>
n1-highmem-<N>
n1-highcpu-<N>
n1-ultramem-<N>
c2-standard-<N>
f1-micro
,g1-small
mem/vCPU ratio 3.75GB 6.5GB 0.9GB ~24GB 4GB new platform for small intensive app - (
N
indicates # of vCPUs)
- custom
- number of vCPU = $2^n$ ($n = 0, 1, 2,…$)
- memory must in (0.9, 6.5) GB per vCPU
- total memory must be multiple of 256MB
- extend memory -> provide higher memory/vCPU ratio with additional cost
- predefined
- disk:
- persistent: up to 128 disks x 0.5TB (64TB in total)
- HDD (standard) & SSD
- resize while running
- encryption keys
- snapshot for backup (doc )
- local SSD: up to 8disk x 375GB (3TB in total)
HDD SSD local SSD RAM Disk data redundency Y Y N N encryption at rest Y Y Y N/A snapshotting Y Y N N bootable Y Y N N use case general, bulk file store random IOPS high IOPS, low latency low latency & risk of data loss
- persistent: up to 128 disks x 0.5TB (64TB in total)
- Billing
- per-second billing (with minimum of 1-min cost)
- each vCPU & GB of memory is billed separately
- discount
- sustained use (monthly-based)
- committed use (1-3 years)
- preemptible
- cost reduced up to 80%
- live up to 24 hour
- no auto-restart
# create an instance
gcloud compute instances create --machine-type=<machine-type> <instance_name>
# move an instance to another zone within same region
gcloud compute instances move <instance_name>
- move instance to another region
- make snapshot of persistent disk
- create new persistent disk in target zones by restoring from snapshot
- create VMs in target zones
- assign static IP, update the VM’s references, and then delete the original VM
Instance Group
- Docs - Instance Group
- Docs - Instance Template
- Docs - Create VM from Template
- Docs - Create Groups of Managed Instances
Autoscaling
Kubernetes Engine (GKE)
# create cluster
gcloud container clusters create <name>
# Add node to node pool (scale)
gcloud container clusters resize --num-nodes=3
# scale a deployment
kubectl scale <service> --replica=3
# auto scale
kubectl autoscale <service> --min=10 --max=15 --cpu=80
Storage
Cloud Storage (GCS)
Fully-managed scalable -> no manually provisioning is needed
Not a filesystem
Always encrypted with HTTPS
gsutil
is frequently usedCan enable versioning
gsutil versioning set (on|off) gs://<bucket_name>
Data transfer
ACL (Access Control Lists)
- e.g.
allUsers
allAuthenticatedUsers
# set ACL gsutil acl set private gs://bucket # Change ACL for all users to have read access (O for owner, W for write, R for read) gsutil acl ch -u AllUsers:R gs://example-bucket/example-object
- e.g.
signed URL
# duration = 10 minutes gsutil signurl -d 10m /path/to/key gs://bucket/object
Storage Class
- Docs - Storage Classes
- Regional or Multi-regional can be changed to Nearline or Coldline
- Regional cannot be changed to Multi-regional and vice versa
Multi-Regional | Regional | Nearline | Coldline | |
---|---|---|---|---|
Data that is | Frequently access | Freqenctly access within region | Access less than once per month | Access less than once per year |
SLA | 99.95% | 99.90% | 99% | 99% |
Storage Price | $\star\star\star\star$ | $\star\star\star$ | $\star\star$ | $\star$ |
Retrieval Price | $\star$ | $\star$ | $\star\star\star$ | $\star\star\star$ |
Use cases | Content storage & delivery | In-region analytics, GCE/GKE-related data | Long-tail content, backup | Archiving, disaster recovery |
minimum duration | - | - | 30 days | 90 days |
Lifecycle
- Docs - Lifecycle
- e.g.
- delete objects older than XXX days
- delete objects created before <Date>
- keep N latest version only
gsutil lifecycle set <config-json-file> gs://bucket
- config file example
# Move object from multi regional to COLDLINE in 30 days and delete after 300 days after creation { "lifecycle": { "rule": [ { "action": { "type": "SetStorageClass", "storageClass": "COLDLINE" }, { "condition": { "age": 30, "matchesStorageClass": ["MULTI_REGIONAL", "STANDARD", "DURABLE_REDUCED_AVAILABILITY"] } } }, { "action": { "type": "Delete" }, "condition": { "age": 270, "storageClass": "COLDLINE" } } ] } }
Database Services
- Data storage comparison
Cloud Datastore Bigtable Cloud Storage Cloud SQL Cloud Spanner BigQuery Type NoSQL Document NoSQL wide-column Blobstore Relational SQL for OLTP Relational SQL for OLTP Relational SQL for OLAP Transaction Y Single-row N Y Y N Complex query N N N Y Y Y Capacity TB+ Pb+ Pb+ TB Pb Pb+ Unit Size 1Mb/entity ~10Mb/cell, ~100Mb/row 5TB/object determined by engine 10240MiB/row 10Mb/row Best for Semi-structured app data analytical data, heavy read/write Web app Large-scale app interactive querying, offline analytics Use Case App IoT, AddTech Images, media files High I/O, global consistency is needed Data warehouse SQL-like query, free daily quota Same API with HBase, user can increase # of instance
- Decision chart
- data structured?
- No -> Cloud Storage
- Yes -> Analytic workload?
- Yes -> need updates or low latency
- Yes -> BigTable
- No -> BigQuery
- No -> Relational data?
- No -> Cloud Firestore
- Yes -> Need horizontal scale
- Yes -> Cloud Spanner
- No -> Cloud SQL
- Yes -> need updates or low latency
- data structured?
Bigtable
- Feats
- scale to PB
- Low latency (sub 10ms)
- Auto update index for frequently accessed data -> balance workload between nodes
Cloud Spanner
- Feats
- scale to PB
- Strong consistency
- HA
Cloud SQL
- MySQL/PostgreSQL
- Auto replication
- Vertical scaling (machine type)
- (weak) Horizontal scaling (# of instaces)
- <code>gcloud sql</code>
Cloud Firestore
- 2 modes:
- Datastore mode -> server
- Native mode -> mobile & web app
Network Resources
VPC
- Docs
- Contained in a project
- VPC has global scope but subnets are regional (and cross zones)
- Important features
- built-in route table
- built-in global distributed firewall
Netowrks
- Default quota: 5 networks in a project
- Has no IP range
- Global and span all regions
- Contains subnetworks
- Types:
- Default
- one subnet per region
- default firewall rules: allow all ingress from ICMP, RDP & SSH and all internal traffic
- auto-mode
- one subnet per region
- regional IP alloc. (cannot overlapped with other subnets in the same network)
- /20 mask (can be expanded up to /16)
- all in 10.128.0.0/9
- default firewall rules: deny all ingress, allow all egress
- custom-mode
- no default subnets
- full control of IP ranges
- regional IP alloc.
- Default
- Conversion
- OK: default/auto -> custom
- No: custom -> default/auto
- Communication
- VMs in same network: using internal IP
- VMs in different network (even in same region): using external IP by default
- 4 reserved IP in each subnet
- 1st: network gateway
- 2nd: subnet gateway
- 2nd-to-last & last: for broadcast
- example:
- VMs in same subnet but different zones
- still communicate with same subnet IP
- single firewall rule can apply to both VMs
- VMs in same subnet but different zones
- IP range of a subnet cannot be shrinked
IP
- Each VM has internal & external IPs
- Internal
- Allocated from subnet to VM with DHCP
- DHCP lease renew every 24h
- Register to network-scope DNS with (VM name, IP)
- External
- Can be
- Ephemeral (assiged from pool)
- Reserved
- VM does not know its external IP (mapped to internal IP)
- Handled by using internal DNS resolver
- Can be
- Internal
Routes & Firewall Rules
- Every network has
- routes: let instances in a network send traffic directly
- s default route: direct packets to dest. outside the network
Pricing
- Network
- Ingress: no charge
- Egress with no charge:
- to same zone via internal IP
- to Google products (YouTube, map, etc.)
- to different GCP services within same region
- Egress with chareg ($0.01 per GB):
- to instances in another zone in the same region
- to same zone via external IP
- between regions in US & Canada
- Egress between regions outside US & Canada: varies by region
- External IP address
- static IP (assigned but unused): $0.01 per hour
- static/ephemeral IP used on standard VM: $0.004 per hour
- static/ephemeral IP used on preemptible VM: $0.002 per hour
- static/ephemeral IP attached to forward rules: no charge
Cloud Load Balancing
HTTP(S) | SSL Proxy | TCP Proxy | Network UDP/TCP | Internal UDP/TCP | Internal HTTPS | |
---|---|---|---|---|---|---|
types | HTTP/HTTPS | TCP with SSL load | TCP without SSL load / doesn’t preserver client IP | TCP/UDP without SSL load / preserver client IP | TCP or UDP | HTTP or HTTPS |
scope | global, IPv4/IPv6 | global, IPv4/IPv6 | global, IPv4/IPv6 | regional, IPv4 | regional, IPv4 | regional, IPv4 |
ex/in | external | external | external | external | internal | internal |
port for LB | HTTP:80/8080, HTTPS:443 | specific | specific | Any | Any | HTTP:80/8080, HTTPS:443 |
Interconnections
Connection | Provides | Capacity | Requirements | Access Types |
---|---|---|---|---|
IPsec VPN tunnel | encrypted tunnel to VPC networks throughpublic internet | 1.5-3Gbps per tunnel | on-prem VPN-gateway | internal IP |
Dedicated Interconnect | dedicated, direct connection to VPC netwroks | 10Gbps per link (up to 8 links) | connection in colocation facility | internal IP |
Partner Interconnect | dedicated connection to VPC through a service provider | 50Mbps-10Gbps per connection | service provider | internal IP |
Direct Peering | dedicated, direct connection to Google’s netwroks | 10 Gbps per link | Connection in GCP PoPs | public IP |
Carrier Peering | Peering through Google’s public network through provider | varies | service provider | public IP |
- VPN
- low volume data connections (MTU < 1460 bytes)
- static routes & dynamic routes (Cloud Router, border gateway protocol (BGP))
- Interconnect
dedicated shared L3 (via public IP) Direct Peering Carrier Peering L2 (via VLAN) Dedicated Interconnect Partner Interconnect - Dedicated Interconnect: Need to provision an cross-connect between Google network & own router, & establish BGP session between Cloud Router & on-premises router
- Partner Interconnect: Need to connect to supported provider
- Peering
- No SLA
- Reach all Google services (GSuite, YouTube)
- Google’s edge points of presence (PoPs)
- Shared VPCs
- Multiple projects share one VPC
- One host project
- Multiple service projects
- Centralized
- Multiple projects share one VPC
- VPC Network Peering
- Connection accross two VPC networks (no matter they are within same project/organization or not)
- Decentralized
Stackdriver
- Monitoring
- Logging
- Debug
- Error reporting
- Trace
Big Data Platform
Cloud Dataproc
- Based on Hadoop
- Scale without terminating jobs
- Reduce cost with preemptible instances
Cloud Dataflow
- ETL pipeline
- Data analysis
- batch computation
- continuous computation using streaming
BigQuery
- data warehouse
- no cluster maintenance is required
- fast query
- pay separately for storage & queries
- reduce cost automatically if sustained usage
# show process bytes -> check price with price calculator bq query --dry_run
Cloud Pub/Sub
- “At least once” delivery
Cloud Datalab
- Built on Python Jupyter
Cloud Marketplace
- Tool for fast deploying functional packages, e.g. LAMP, Wordpress
- GCP doesn’t update/fix the services that are already deployed