Course Note: Cloud Engineering with GCP on Coursera

Posted by Matt Wang on Monday, June 15, 2020

These are my notes for the Coursera course Cloud Engineering with GCP. See this article for how I prepared for the GCP Associate Cloud Engineer (ACE) exam.

Mind Map

  • GCP
    • Network Resources
      • VPC
        • Network
        • Subnet
        • IP
      • Load Balancer
        • HTTP(S)
        • SSL Proxy
        • TCP Proxy
        • Regional
        • Multi-Regional
      • VPN & Peering & Interconnect
    • Compute Services
      • Compute Engine (GCE)
        • Disk type
          • persistent
            • HDD/SSD
          • Local SSD
        • Instance template
        • Instance group
      • App Engine (GAE)
        • Standard
        • Flexible
      • Kubernetes Engine (GKE)
      • Cloud Functions
    • Storage
      • Data warehouse
        • BigQuery
      • Database
        • NoSQL
          • Bigtable
          • Cloud Firestore
            • Datastore mode
            • Native mode
        • Relational
          • Cloud SQL
          • Cloud Spanner
        • Redis/Memcached
          • Cloud Memorystore
    • SDK
    • Stackdriver
      • Monitoring
      • Logging
      • Debug
      • Error reporting
      • Trace
    • Big data services
      • BigQuery
      • Cloud Dataproc
      • Cloud Dataflow
      • Cloud Pub/Sub
      • Cloud Datalab

SDK

Resource Management

  • Resource scope
    • Global
      • Images
      • Snapshots
      • Networks
    • Regional
      • External IP addr.
    • Zonal
      • Instances
      • Disks
  • label vs tag
    • label: user-defined key-value pair; propagated through billing
    • tag: user-defined string; used by firewall rules

Billing

  • Docs - Billing
  • Pricing Calculator
  • Four tools
    • Budget & Alerts
    • Billing export -> to BQ or Cloud Storage
    • Reports
    • Quotas
      • types
        • Rate Quotas: e.g. GKE API: 1000 requests per 100 sec
        • Allocation Quotas: e.g. 5 networks per project
      • can be changed on the Quotas page
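To make the rate-quota idea concrete, here is a minimal Python sketch of a cap such as "1000 requests per 100 sec". It assumes a simple fixed-window counter; the class `FixedWindowQuota` is hypothetical and is not how GCP enforces quotas internally.

```python
class FixedWindowQuota:
    """Hypothetical rate quota: at most `limit` requests per `window_s`-second window."""

    def __init__(self, limit, window_s):
        self.limit = limit
        self.window_s = window_s
        self.window = None  # index of the window currently being counted
        self.count = 0

    def allow(self, now):
        """Return True if a request at time `now` (seconds) fits in the quota."""
        window = int(now // self.window_s)
        if window != self.window:
            # A new window has started: reset the counter.
            self.window = window
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # over quota until the next window starts


quota = FixedWindowQuota(limit=1000, window_s=100)
results = [quota.allow(now=i * 0.05) for i in range(1001)]  # 1001 requests within ~50 s
print(sum(results), results[-1])  # 1000 allowed, the 1001st rejected
```

A fixed window lets a burst straddling a window boundary briefly exceed the average rate; sliding windows or token buckets smooth that out.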

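GCP Hierarchy

  • Organization > Folder (optional, can be nested) > Project > Resource
  • Policies are inherited downwards

|            | Project ID      | Project name       | Project number  |
|------------|-----------------|--------------------|-----------------|
| Uniqueness | globally unique | need not be unique | globally unique |
| Set by     | user            | user               | assigned by GCP |
| Mutability | immutable       | mutable            | immutable       |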
# set default project
gcloud config set project <project id>

IAM

  • Who can do what on which resource
  • 3 types of IAM Roles:
    • Primitive
      • Owner: invite/remove members, delete projects (plus all Editor permissions)
      • Editor: deploy apps, modify code, configure services (plus all Viewer permissions)
      • Viewer: read-only access
      • Billing Administrator: manage billing without changing project resources
    • Predefined
    • Custom
  • Service accounts
    • Identities used by services and applications (not people) to authenticate
    • Identified by an email address; authenticate with cryptographic keys
  • commands
    # add IAM policy binding for a project
    gcloud projects add-iam-policy-binding <project-id> --member=(user|group|serviceAccount):<email> --role=<role>
    
  • IAM policy (doc)
    # sample IAM policy
    {
      "bindings": [
        {
          "role": "roles/resourcemanager.organizationAdmin",
          "members": [
            "user:mike@example.com",
            "group:admins@example.com",
            "domain:google.com",
            "serviceAccount:my-project-id@appspot.gserviceaccount.com"
          ]
        },
        {
          "role": "roles/resourcemanager.organizationViewer",
          "members": [
            "user:eve@example.com"
          ],
          "condition": {
            "title": "expirable access",
            "description": "Does not grant access after Sep 2020",
            "expression": "request.time < timestamp('2020-10-01T00:00:00.000Z')"
          }
        }
      ],
      "etag": "BwWWja0YfJA=",
      "version": 3
    }
    
  • Cloud Audit Logs: Admin Activity, Data Access, and System Event logs
    # read audit log, `project` can be changed to `folder` or `organization`
    gcloud logging read "logName : projects/project-id/logs/cloudaudit.googleapis.com" --project=project-id
    
  • Resource manager roles
    • organization
      • Admin: full control over all resources
      • Viewer: view access to all resources
    • folder
      • Admin: full control over folders
      • Creator: browse the hierarchy and create folders
      • Viewer: view folders and projects below a resource
    • project
      • Creator: create projects
      • Deleter: delete projects
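Conditional role bindings, like the "expirable access" one in the sample policy, are evaluated at request time. The Python sketch below shows the idea under heavy simplification: it understands only the `request.time < timestamp(...)` form, whereas real IAM conditions are full CEL expressions. `roles_for` and `condition_holds` are hypothetical helpers, not part of any Google library.

```python
import json
import re
from datetime import datetime, timezone

# A trimmed copy of the sample policy, written as valid JSON.
POLICY = json.loads("""
{
  "bindings": [
    {"role": "roles/resourcemanager.organizationAdmin",
     "members": ["user:mike@example.com", "group:admins@example.com"]},
    {"role": "roles/resourcemanager.organizationViewer",
     "members": ["user:eve@example.com"],
     "condition": {"title": "expirable access",
                   "expression": "request.time < timestamp('2020-10-01T00:00:00.000Z')"}}
  ],
  "version": 3
}
""")

# The only expression shape this sketch understands.
EXPIRY = re.compile(r"request\.time < timestamp\('([^']+)'\)")


def condition_holds(condition, now):
    """True if the binding applies at `now`; a binding without a condition always applies."""
    if condition is None:
        return True
    match = EXPIRY.fullmatch(condition["expression"].strip())
    if match is None:
        raise ValueError("unsupported condition: " + condition["expression"])
    expiry = datetime.fromisoformat(match.group(1).replace("Z", "+00:00"))
    return now < expiry


def roles_for(member, now, policy=POLICY):
    """Roles granted to `member` at time `now` (a timezone-aware datetime)."""
    return [
        binding["role"]
        for binding in policy["bindings"]
        if member in binding["members"] and condition_holds(binding.get("condition"), now)
    ]


if __name__ == "__main__":
    print(roles_for("user:eve@example.com", datetime(2020, 9, 1, tzinfo=timezone.utc)))
```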

Compute Resources

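|           | GCE                | GKE                 | GAE standard                       | GAE flexible                                               | Cloud Functions           |
|-----------|--------------------|---------------------|------------------------------------|------------------------------------------------------------|---------------------------|
| Languages | any                | any                 | Python, Node.js, Go, Java, PHP     | Python, Node.js, Go, Java, PHP, Ruby, .NET, custom runtime | Python, Node.js, Go       |
| Scaling   | server autoscaling | cluster autoscaling | autoscaling managed servers        | autoscaling managed servers                                | serverless                |
| Use case  | general            | container workloads | scalable web apps, mobile backends | scalable web apps, mobile backends                         | lightweight event actions |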

App Engine (GAE)

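|                            | Standard                                                             | Flexible                                                   |
|----------------------------|----------------------------------------------------------------------|------------------------------------------------------------|
| Languages                  | Java, Python, Go, PHP                                                | any                                                        |
| Instance startup           | milliseconds                                                         | minutes                                                    |
| SSH access                 | no                                                                   | yes                                                        |
| Install 3rd-party binaries | no                                                                   | yes                                                        |
| Pricing model              | after free daily use, pay per instance class, with automatic shutdown | pay for allocated resources per hour, no automatic shutdown |
| Other                      | all requests time out after 60 seconds                               |                                                            |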
  • # migrate all traffic of service s1 to version v2
    gcloud app versions migrate v2 --service="s1"
    
    # split traffic evenly between 'v1' and 'v2' of service 's1', run:
    gcloud app services set-traffic s1 --splits v2=.5,v1=.5
    
    # deploy but send no traffic
    gcloud app deploy --no-promote
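With `--splits v2=.5,v1=.5`, App Engine sends roughly half of the traffic to each version, and it can split by IP address, cookie, or randomly. A minimal Python sketch of a sticky, weighted split, assuming a hash of a client key such as a cookie value; `pick_version` is a hypothetical illustration, not App Engine's actual routing algorithm.

```python
import hashlib


def pick_version(client_key, splits):
    """Map a client key (e.g. a cookie value) to a version according to weighted splits.

    `splits` maps version -> weight, e.g. {"v1": 0.5, "v2": 0.5}; weights are normalized.
    The same key always maps to the same version, so routing is sticky.
    """
    digest = hashlib.sha256(client_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the hash as a number in [0, 1].
    point = int.from_bytes(digest[:8], "big") / 2**64
    total = sum(splits.values())
    cumulative = 0.0
    for version in sorted(splits):
        cumulative += splits[version] / total
        if point < cumulative:
            return version
    return sorted(splits)[-1]  # guards against floating-point round-off at the top end


splits = {"v1": 0.5, "v2": 0.5}
share_v1 = sum(pick_version(f"user-{i}", splits) == "v1" for i in range(10_000)) / 10_000
print(pick_version("user-42", splits), round(share_v1, 2))
```

Hashing a stable key keeps each user on one version during a rollout, which is why cookie- or IP-based splitting is usually preferred over purely random splitting when versions hold different session state.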
    

Compute Engine (GCE)

  • Docs
  • Features
    • choose vCPUs & memory
    • Networking
    • Access
      • Linux: SSH (tcp:22)
      • Windows: RDP (tcp:3389)
    • metadata server
      • e.g. startup-script-url & shutdown-script-url
      • fetch instance metadata from an application (doc); requests must include the Metadata-Flavor: Google header
        curl -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/

    • OS images (doc)
      • Choose between
        • public base images
        • custom images
  • machine types
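    • predefined (N = number of vCPUs)
      |                 | standard        | high-memory    | high-CPU       | memory-optimized | compute-optimized | shared-core                   |
      |-----------------|-----------------|----------------|----------------|------------------|-------------------|-------------------------------|
      | name            | n1-standard-<N> | n1-highmem-<N> | n1-highcpu-<N> | n1-ultramem-<N>  | c2-standard-<N>   | f1-micro, g1-small            |
      | memory per vCPU | 3.75 GB         | 6.5 GB         | 0.9 GB         | ~24 GB           | 4 GB              |                               |
      | note            |                 |                |                |                  | new platform      | for small, non-intensive apps |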
    • custom (N1)
      • number of vCPUs: 1 or an even number
      • memory: 0.9 to 6.5 GB per vCPU
      • total memory must be a multiple of 256 MB
      • extended memory: allows a higher memory/vCPU ratio at additional cost
  • disk:
    • persistent: up to 128 disks per VM, up to 64 TB in total
      • HDD (standard) & SSD
      • resize while running
      • encryption keys
      • snapshot for backup (doc )
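    • local SSD: up to 8 disks x 375 GB (3 TB in total)
    • comparison
      |                    | HDD                        | SSD         | local SSD              | RAM disk                       |
      |--------------------|----------------------------|-------------|------------------------|--------------------------------|
      | data redundancy    | Y                          | Y           | N                      | N                              |
      | encryption at rest | Y                          | Y           | Y                      | N/A                            |
      | snapshotting       | Y                          | Y           | N                      | N                              |
      | bootable           | Y                          | Y           | N                      | N                              |
      | use case           | general, bulk file storage | random IOPS | high IOPS, low latency | low latency, risk of data loss |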
  • Billing
    • per-second billing (with minimum of 1-min cost)
    • each vCPU & GB of memory is billed separately
    • discount
      • sustained use (applied automatically, based on monthly usage)
      • committed use (1- or 3-year terms)
      • preemptible
        • cost reduced by up to 80%
        • run for at most 24 hours
        • no automatic restart
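The custom machine type rules and the per-second billing rule are easy to check in code. A minimal Python sketch, assuming the N1 limits as of 2020 (1 or an even number of vCPUs, 0.9 to 6.5 GB of memory per vCPU without extended memory, total memory a multiple of 256 MB) and assuming partial seconds round up; both function names are hypothetical.

```python
import math


def is_valid_n1_custom(vcpus, memory_mb):
    """Check a custom N1 shape (no extended memory) against the assumed 2020 rules."""
    # vCPUs: exactly 1, or an even number.
    if vcpus < 1 or (vcpus > 1 and vcpus % 2 != 0):
        return False
    # Total memory must be a multiple of 256 MB.
    if memory_mb % 256 != 0:
        return False
    # Memory per vCPU must be between 0.9 GB and 6.5 GB (1 GB = 1024 MB here).
    gb_per_vcpu = memory_mb / 1024 / vcpus
    return 0.9 <= gb_per_vcpu <= 6.5


def billed_seconds(runtime_s):
    """Per-second billing with a 1-minute minimum (partial seconds rounded up, an assumption)."""
    return max(60, math.ceil(runtime_s))


print(is_valid_n1_custom(6, 6 * 1024))  # True: 6 vCPUs, 1 GB each
print(is_valid_n1_custom(3, 3 * 1024))  # False: odd vCPU count
print(billed_seconds(42), billed_seconds(90.5))  # 60 91
```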
# create an instance
gcloud compute instances create --machine-type=<machine-type> <instance_name>

# move an instance to another zone within same region
gcloud compute instances move <instance_name> --zone=<source_zone> --destination-zone=<destination_zone>
  • move an instance to another region
    • create a snapshot of its persistent disk(s)
    • create new persistent disks in the target zone from the snapshots
    • create a VM in the target zone using the new disks
    • assign a static IP, update references to the VM, and then delete the original VM

Instance Group

Autoscaling

Kubernetes Engine (GKE)

# create cluster
gcloud container clusters create <name>

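# resize a node pool (add or remove nodes)
gcloud container clusters resize <name> --node-pool=<pool_name> --num-nodes=3

# scale a deployment
kubectl scale deployment <deployment_name> --replicas=3

# autoscale between 10 and 15 replicas, targeting 80% CPU
kubectl autoscale deployment <deployment_name> --min=10 --max=15 --cpu-percent=80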
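`kubectl autoscale` creates a Horizontal Pod Autoscaler. Per the Kubernetes documentation, its core rule is desired = ceil(current replicas * current metric / target metric), kept within the min and max. A simplified Python sketch that ignores the HPA's tolerance band, pod readiness, and scale-down stabilization; `desired_replicas` is a hypothetical name.

```python
import math


def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct, min_replicas, max_replicas):
    """Simplified HPA rule: scale proportionally to CPU usage, clamped to [min, max]."""
    raw = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, raw))


# With --min=10 --max=15 --cpu-percent=80:
print(desired_replicas(10, 96, 80, 10, 15))   # 12
print(desired_replicas(12, 200, 80, 10, 15))  # 15 (capped at the max)
print(desired_replicas(12, 20, 80, 10, 15))   # 10 (floored at the min)
```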

Storage

Cloud Storage (GCS)

  • Fully managed and scalable: no manual provisioning is needed

  • Not a filesystem

  • Data is always encrypted at rest, and in transit over HTTPS

  • gsutil is frequently used

  • Can enable versioning

    • gsutil versioning set (on|off) gs://<bucket_name>
      
  • Data transfer

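    • transfers of large files are resumable by default
    • -m for transferring in parallel (-p preserves ACLs):
      gsutil -m cp -p file gs://bucket/object

    • Use the BOTO configuration file (.boto) for additional settings - doc
    • Read the list of files to copy from stdin with the -I option (doc):
      some_program | gsutil -m cp -I gs://my-bucket
    • Streaming upload: use - as the source, e.g. some_program | gsutil cp - gs://my-bucket/object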
      
  • ACL (Access Control Lists)

    • e.g.
      • e-mail
      • allUsers
      • allAuthenticatedUsers
    • # set ACL
      gsutil acl set private gs://bucket
      
      # Change ACL for all users to have read access (O for owner, W for write, R for read)
      gsutil acl ch -u AllUsers:R gs://example-bucket/example-object
      
  • signed URL

    # duration = 10 minutes
    gsutil signurl -d 10m /path/to/key gs://bucket/object
    

Storage Class

  • Docs - Storage Classes
  • Regional or Multi-regional can be changed to Nearline or Coldline
  • Regional cannot be changed to Multi-regional and vice versa
|                       | Multi-Regional             | Regional                                  | Nearline                          | Coldline                         |
|-----------------------|----------------------------|-------------------------------------------|-----------------------------------|----------------------------------|
| Best for data that is | frequently accessed        | frequently accessed within a region       | accessed less than once per month | accessed less than once per year |
| SLA                   | 99.95%                     | 99.90%                                    | 99%                               | 99%                              |
| Storage price         | ★★★★                       | ★★★                                       | ★★                                | ★                                |
| Retrieval price       | ★                          | ★                                         | ★★★                               | ★★★                              |
| Use cases             | content storage & delivery | in-region analytics, GCE/GKE-related data | long-tail content, backups        | archiving, disaster recovery     |
| Minimum duration      | -                          | -                                         | 30 days                           | 90 days                          |
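The minimum duration matters when objects are removed early: Cloud Storage charges an object deleted (or replaced) before its class's minimum duration as if it had been stored for the full minimum. A small Python sketch of that rule, using the durations from the table; `billed_storage_days` is a hypothetical name.

```python
# Minimum storage durations in days, from the table above.
MIN_DURATION_DAYS = {"MULTI_REGIONAL": 0, "REGIONAL": 0, "NEARLINE": 30, "COLDLINE": 90}


def billed_storage_days(storage_class, days_stored):
    """Days of storage charged for an object removed after `days_stored` days."""
    return max(days_stored, MIN_DURATION_DAYS[storage_class])


print(billed_storage_days("NEARLINE", 10))   # 30: early deletion is billed up to the minimum
print(billed_storage_days("COLDLINE", 120))  # 120: past the minimum, no extra charge
```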

Lifecycle

  • Docs - Lifecycle
  • e.g.
    • delete objects older than XXX days
    • delete objects created before <Date>
    • keep N latest version only
  • gsutil lifecycle set <config-json-file> gs://bucket
    
  • config file example
    • # Move objects from Multi-Regional/Standard to COLDLINE after 30 days, and delete them 300 days after creation (age counts from creation)
      {
        "lifecycle": {
          "rule": [
            {
              "action": {
                "type": "SetStorageClass",
                "storageClass": "COLDLINE"
              },
              "condition": {
                "age": 30,
                "matchesStorageClass": ["MULTI_REGIONAL", "STANDARD", "DURABLE_REDUCED_AVAILABILITY"]
              }
            },
            {
              "action": {
                "type": "Delete"
              },
              "condition": {
                "age": 300,
                "matchesStorageClass": ["COLDLINE"]
              }
            }
          ]
        }
      }
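Two details of lifecycle rules are easy to get wrong: `age` counts days since the object was created (not since its last class change), and when several rules match, Delete takes precedence over SetStorageClass. A simplified Python evaluator illustrating both. It assumes a version of the config above whose Delete rule uses age 300 (to delete 300 days after creation) and models only the `age` and `matchesStorageClass` conditions; `lifecycle_action` is hypothetical, not a Cloud Storage API.

```python
import json

# A version of the lifecycle config above as valid JSON, deleting at age 300.
CONFIG = json.loads("""
{
  "lifecycle": {
    "rule": [
      {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
       "condition": {"age": 30,
                     "matchesStorageClass": ["MULTI_REGIONAL", "STANDARD", "DURABLE_REDUCED_AVAILABILITY"]}},
      {"action": {"type": "Delete"},
       "condition": {"age": 300, "matchesStorageClass": ["COLDLINE"]}}
    ]
  }
}
""")


def matches(condition, age_days, storage_class):
    """Model only the `age` and `matchesStorageClass` conditions."""
    if "age" in condition and age_days < condition["age"]:
        return False  # `age` is satisfied once the object is at least that many days old
    classes = condition.get("matchesStorageClass")
    return classes is None or storage_class in classes


def lifecycle_action(age_days, storage_class, config=CONFIG):
    """Return the action that would apply to an object, or None (simplified model)."""
    hits = [rule["action"] for rule in config["lifecycle"]["rule"]
            if matches(rule["condition"], age_days, storage_class)]
    # If several rules match, Delete takes precedence over SetStorageClass.
    for action in hits:
        if action["type"] == "Delete":
            return action
    return hits[0] if hits else None


print(lifecycle_action(45, "MULTI_REGIONAL"))  # {'type': 'SetStorageClass', 'storageClass': 'COLDLINE'}
print(lifecycle_action(300, "COLDLINE"))       # {'type': 'Delete'}
```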
      

Database Services

  • Data storage comparison
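    |                 | Cloud Datastore                    | Bigtable                                | Cloud Storage       | Cloud SQL               | Cloud Spanner                       | BigQuery                                |
    |-----------------|------------------------------------|-----------------------------------------|---------------------|-------------------------|-------------------------------------|-----------------------------------------|
    | Type            | NoSQL document                     | NoSQL wide-column                       | blobstore           | relational SQL for OLTP | relational SQL for OLTP             | relational SQL for OLAP                 |
    | Transactions    | Y                                  | single-row                              | N                   | Y                       | Y                                   | N                                       |
    | Complex queries | N                                  | N                                       | N                   | Y                       | Y                                   | Y                                       |
    | Capacity        | TB+                                | PB+                                     | PB+                 | TB                      | PB                                  | PB+                                     |
    | Unit size       | 1 MB/entity                        | ~10 MB/cell, ~100 MB/row                | 5 TB/object         | determined by DB engine | 10,240 MiB/row                      | 10 MB/row                               |
    | Best for        | semi-structured app data           | analytical data, heavy read/write       |                     | web apps                | large-scale apps                    | interactive querying, offline analytics |
    | Use cases       | apps                               | IoT, AdTech                             | images, media files |                         | high I/O, global consistency needed | data warehouse                          |
    | Notes           | SQL-like queries, free daily quota | same API as HBase; can add nodes to scale |                   |                         |                                     |                                         |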
  • Decision chart
    • Is the data structured?
      • No -> Cloud Storage
      • Yes -> Analytic workload?
        • Yes -> Need updates or low latency?
          • Yes -> Bigtable
          • No -> BigQuery
        • No -> Relational data?
          • No -> Cloud Firestore
          • Yes -> Need horizontal scaling?
            • Yes -> Cloud Spanner
            • No -> Cloud SQL
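The decision chart translates directly into a function. A minimal Python sketch; the parameter names are hypothetical labels for the chart's questions.

```python
def choose_storage(structured, analytic=False, needs_updates_or_low_latency=False,
                   relational=False, needs_horizontal_scale=False):
    """Return the service suggested by the decision chart for the given answers."""
    if not structured:
        return "Cloud Storage"
    if analytic:
        return "Bigtable" if needs_updates_or_low_latency else "BigQuery"
    if not relational:
        return "Cloud Firestore"
    return "Cloud Spanner" if needs_horizontal_scale else "Cloud SQL"


print(choose_storage(structured=True, relational=True))  # Cloud SQL
```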

Bigtable

  • Features
    • scales to PB
    • low latency (under 10 ms)
    • automatically rebalances data across nodes based on access patterns to spread the workload

Cloud Spanner

  • Features
    • scale to PB
    • Strong consistency
    • HA

Cloud SQL

  • MySQL/PostgreSQL
  • Automatic replication
  • Scaling
    • vertical scaling (machine type)
    • limited horizontal scaling (number of instances, e.g. read replicas)
  • gcloud sql

Cloud Firestore

  • 2 modes:
    • Datastore mode: for server-side apps
    • Native mode: for mobile & web apps

Network Resources

VPC

  • Docs
  • Contained in a project
  • VPC networks are global, but subnets are regional (each subnet spans all zones in its region)
  • Important features
    • built-in route table
    • built-in global distributed firewall

Networks

  • Default quota: 5 networks in a project
  • Has no IP range
  • Global and span all regions
  • Contains subnetworks
  • Types:
    • Default
      • one subnet per region
      • default firewall rules: allow ingress ICMP, RDP & SSH from anywhere, and all internal traffic
    • auto-mode
      • one subnet per region
      • regional IP allocation (cannot overlap with other subnets in the same network)
      • /20 mask (can be expanded up to /16)
      • all in 10.128.0.0/9
      • default firewall rules: deny all ingress, allow all egress
    • custom-mode
      • no default subnets
      • full control of IP ranges
      • regional IP alloc.
  • Conversion
    • OK: default/auto -> custom
    • No: custom -> default/auto
  • Communication
    • VMs in the same network: communicate via internal IP
    • VMs in different networks (even in the same region): communicate via external IP by default
  • 4 reserved IPs in each subnet
    • first: network address
    • second: default gateway
    • second-to-last: reserved by Google for future use
    • last: broadcast
  • example:
    • VMs in same subnet but different zones
      • still communicate with same subnet IP
      • single firewall rule can apply to both VMs
  • The IP range of a subnet cannot be shrunk
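The reserved addresses and the effect of expanding a subnet can be checked with Python's standard `ipaddress` module. A sketch assuming GCP's documented layout for a primary range: the first address is the network, the second the default gateway, the second-to-last is reserved by Google for future use, and the last is broadcast. The function names are hypothetical.

```python
import ipaddress


def reserved_addresses(cidr):
    """The four addresses GCP reserves in a subnet's primary IPv4 range."""
    net = ipaddress.ip_network(cidr)
    return {
        "network": net.network_address,
        "default_gateway": net.network_address + 1,
        "reserved_by_google": net.broadcast_address - 1,
        "broadcast": net.broadcast_address,
    }


def usable_addresses(cidr):
    """Addresses left for VMs once the four reserved ones are excluded."""
    return ipaddress.ip_network(cidr).num_addresses - 4


for name, addr in reserved_addresses("10.128.0.0/20").items():
    print(name, addr)
# Expanding an auto-mode subnet from /20 to /16 grows the usable pool:
print(usable_addresses("10.128.0.0/20"), usable_addresses("10.128.0.0/16"))  # 4092 65532
```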

IP

  • Each VM has internal & external IPs
    • Internal
      • Allocated to the VM from the subnet range via DHCP
      • DHCP lease renewed every 24 hours
      • VM name and IP are registered with the network-scoped internal DNS
    • External
      • Can be
        • Ephemeral (assigned from a pool)
        • Reserved (static)
      • The VM does not know its external IP; it is mapped to the internal IP
        • Handled by the internal DNS resolver

Routes & Firewall Rules

  • Every network has
    • routes that let instances in the network send traffic directly to each other
    • a default route that directs packets to destinations outside the network

Pricing

  • Network
    • Ingress: no charge
    • Egress with no charge:
      • to same zone via internal IP
      • to Google products (YouTube, Maps, etc.)
      • to different GCP services within the same region
    • Egress with charge ($0.01 per GB):
      • to instances in another zone in the same region
      • to the same zone via external IP
      • between regions in US & Canada
    • Egress between regions outside US & Canada: varies by region
  • External IP address
    • static IP (reserved but unused): $0.01 per hour
    • static/ephemeral IP used on standard VM: $0.004 per hour
    • static/ephemeral IP used on preemptible VM: $0.002 per hour
    • static/ephemeral IP attached to forwarding rules: no charge
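The egress rules above reduce to a per-GB lookup. A minimal Python sketch using only the 2020 prices listed in these notes (routes whose price varies by region are deliberately not modeled); the route names are hypothetical labels, and `Decimal` avoids floating-point rounding in money arithmetic.

```python
from decimal import Decimal

# USD per GB, from the 2020 prices in these notes. Ingress is free and not listed.
EGRESS_USD_PER_GB = {
    "same_zone_internal_ip": Decimal("0"),
    "to_google_products": Decimal("0"),
    "gcp_service_same_region": Decimal("0"),
    "other_zone_same_region": Decimal("0.01"),
    "same_zone_external_ip": Decimal("0.01"),
    "between_regions_us_canada": Decimal("0.01"),
}


def egress_cost(transfers):
    """Total egress charge in USD for (route, gigabytes) pairs."""
    total = Decimal("0")
    for route, gigabytes in transfers:
        total += EGRESS_USD_PER_GB[route] * Decimal(str(gigabytes))
    return total


monthly = [
    ("same_zone_internal_ip", 500),     # free
    ("other_zone_same_region", 100),    # 100 GB * $0.01
    ("between_regions_us_canada", 50),  # 50 GB * $0.01
]
print(egress_cost(monthly))  # 1.50
```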

Cloud Load Balancing

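|                     | HTTP(S)                   | SSL Proxy            | TCP Proxy                                        | Network UDP/TCP                                  | Internal UDP/TCP | Internal HTTP(S)          |
|---------------------|---------------------------|----------------------|--------------------------------------------------|--------------------------------------------------|------------------|---------------------------|
| Traffic type        | HTTP/HTTPS                | TCP with SSL offload | TCP without SSL offload, client IP not preserved | TCP/UDP without SSL offload, client IP preserved | TCP or UDP       | HTTP or HTTPS             |
| Scope               | global, IPv4/IPv6         | global, IPv4/IPv6    | global, IPv4/IPv6                                | regional, IPv4                                   | regional, IPv4   | regional, IPv4            |
| External/internal   | external                  | external             | external                                         | external                                         | internal         | internal                  |
| Load-balanced ports | HTTP: 80/8080, HTTPS: 443 | specific ports       | specific ports                                   | any                                              | any              | HTTP: 80/8080, HTTPS: 443 |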

Interconnections

| Connection             | Provides                                                        | Capacity                         | Requirements                        | Access type |
|------------------------|-----------------------------------------------------------------|----------------------------------|-------------------------------------|-------------|
| IPsec VPN tunnel       | encrypted tunnel to VPC networks through the public internet    | 1.5-3 Gbps per tunnel            | on-premises VPN gateway             | internal IP |
| Dedicated Interconnect | dedicated, direct connection to VPC networks                    | 10 Gbps per link (up to 8 links) | connection in a colocation facility | internal IP |
| Partner Interconnect   | dedicated connection to VPC networks through a service provider | 50 Mbps-10 Gbps per connection   | service provider                    | internal IP |
| Direct Peering         | dedicated, direct connection to Google's network                | 10 Gbps per link                 | connection in a Google PoP          | public IP   |
| Carrier Peering        | peering with Google's public network through a service provider | varies                           | service provider                    | public IP   |
  • VPN
    • low-volume data connections (MTU of at most 1460 bytes)
    • static routes & dynamic routes (Cloud Router, border gateway protocol (BGP))
  • Interconnect
    • types
      |                    | dedicated              | shared               |
      |--------------------|------------------------|----------------------|
      | L3 (via public IP) | Direct Peering         | Carrier Peering      |
      | L2 (via VLAN)      | Dedicated Interconnect | Partner Interconnect |
    • Dedicated Interconnect: provision a cross-connect between Google's network and your own router, and establish a BGP session between Cloud Router and the on-premises router
    • Partner Interconnect: Need to connect to supported provider
  • Peering
    • No SLA
    • Reach all Google services (G Suite, YouTube)
    • Google’s edge points of presence (PoPs)
  • Shared VPCs
    • Multiple projects share one VPC
      • One host project
      • Multiple service projects
    • Centralized
  • VPC Network Peering
    • Connects two VPC networks, whether or not they are in the same project or organization
    • Decentralized

Stackdriver

  • Monitoring
  • Logging
  • Debug
  • Error reporting
  • Trace

Big Data Platform

Cloud Dataproc

  • Based on Hadoop
  • Scale without terminating jobs
  • Reduce cost with preemptible instances

Cloud Dataflow

  • ETL pipeline
  • Data analysis
    • batch computation
    • continuous computation using streaming

BigQuery

  • data warehouse
  • no cluster maintenance is required
  • fast query
  • pay separately for storage & queries
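  • storage price drops automatically for tables not modified for 90 consecutive days (long-term storage)
  • # show bytes processed without running the query -> check the price with the pricing calculator
    bq query --use_legacy_sql=false --dry_run '<query>'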
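A dry run reports how many bytes a query would process, and on-demand pricing is charged per TB scanned. A small Python sketch of the conversion, assuming 1 TB here means 2**40 bytes and a default of $5 per TB as an assumed 2020 on-demand rate; pass the current price for real estimates. `estimate_query_cost` is a hypothetical helper.

```python
def estimate_query_cost(bytes_processed, usd_per_tb=5.0):
    """Estimated on-demand cost in USD for the byte count reported by a dry run.

    Assumes 1 TB = 2**40 bytes and an on-demand price of `usd_per_tb`; the $5 default is
    an assumed 2020 rate, not a current quote.
    """
    return bytes_processed / 2**40 * usd_per_tb


print(estimate_query_cost(2**40))        # 5.0
print(estimate_query_cost(512 * 2**30))  # 2.5 (512 GiB at the assumed $5/TB)
```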
    

Cloud Pub/Sub

  • “At least once” delivery
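Because Pub/Sub delivers at least once, a subscriber may receive the same message more than once, so handlers should be idempotent. A minimal Python sketch of deduplicating by message ID; the class `IdempotentHandler` is hypothetical, and a real subscriber would keep the seen IDs in durable storage rather than in memory.

```python
class IdempotentHandler:
    """Process each message ID at most once, even if it is delivered several times."""

    def __init__(self):
        self.seen = set()    # IDs already processed (in memory only in this sketch)
        self.processed = []  # payloads that were actually handled

    def handle(self, message_id, payload):
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self.seen:
            return False  # duplicate delivery: safe to acknowledge again, no side effects
        self.seen.add(message_id)
        self.processed.append(payload)
        return True


handler = IdempotentHandler()
deliveries = [("m1", "order-1"), ("m2", "order-2"), ("m1", "order-1")]  # m1 redelivered
print([handler.handle(mid, data) for mid, data in deliveries])  # [True, True, False]
print(handler.processed)  # ['order-1', 'order-2']
```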

Cloud Datalab

  • Built on Jupyter (Python notebooks)

Cloud Marketplace

  • Deploy ready-made software packages quickly, e.g. LAMP, WordPress
  • GCP does not update or patch packages after they are deployed

Deployment Manager