Managing PostgreSQL on a VM gives you full control, but it also gives you full responsibility: patching, backups, failover, connection pooling, and the 3 AM pager when replication breaks. Cloud SQL takes those off your plate. You get a managed PostgreSQL instance with automated backups, point-in-time recovery, and optional high availability, all provisioned through Terraform so the configuration lives in version control where it belongs.
This guide covers creating a Cloud SQL PostgreSQL 17 instance using both gcloud and Terraform, configuring private IP networking via VPC peering, enabling IAM database authentication, setting up read replicas, and connecting from GKE using the Cloud SQL Auth Proxy. For securing database credentials in GCP, see the Secret Manager tutorial. If your workloads run on GKE, the Workload Identity guide explains how the Auth Proxy authenticates without exporting service account keys.
Verified working: April 2026. Cloud SQL PostgreSQL 17, Enterprise edition, Terraform google provider 6.x
Cloud SQL vs Self-Managed PostgreSQL
The trade-off is cost versus operational burden. Cloud SQL costs more per vCPU-hour than a Compute Engine VM running PostgreSQL, but you do not spend engineering time on patching, backup validation, or failover testing.
| Feature | Cloud SQL (Enterprise) | Self-Managed on GCE |
|---|---|---|
| Patching | Automated (maintenance window) | Manual, your responsibility |
| Backups | Automated daily + PITR, 7-day default retention | pg_dump / pgBackRest, self-managed |
| High Availability | One checkbox (regional HA with automatic failover) | Patroni/repmgr + load balancer, significant setup |
| Read Replicas | API call or Terraform resource | Manual streaming replication config |
| Connection Pooling | Managed connection pooling, or self-deployed PgBouncer | PgBouncer/Pgpool, self-managed |
| IAM Auth | Native (no passwords in connection strings) | Not available (password/cert auth only) |
| Scale to Zero | Not supported (minimum 1 vCPU always running) | Stop the VM manually when idle |
| Max Storage | 64 TB | Limited by disk size |
| PostgreSQL Versions | 14, 15, 16, 17 | Any version you compile |
Pricing
Cloud SQL Enterprise edition charges per vCPU-hour and per GiB-hour of memory. There is no free tier for production instances (the free trial gives $300 in credits). Here are the real numbers for us-central1 as of April 2026:
| Resource | Rate | Minimal Instance (2 vCPU, 8 GiB) |
|---|---|---|
| vCPU | $0.0413/hr | $60.30/month |
| Memory | $0.007/GiB-hr | $40.88/month |
| SSD Storage | $0.170/GiB-month | $1.70/month (10 GiB) |
| Total (single zone) | | ~$103/month |
| Total (HA, 2 zones) | | ~$206/month |
HA roughly doubles the bill: GCP runs a synchronous standby instance in another zone and charges for its vCPUs, memory, and replicated storage. For a full breakdown of how GCP services accumulate cost, the GCP costs guide covers all the common gotchas.
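The table's single-zone total can be sanity-checked with a quick shell calculation, using the example rates above and 730 hours per month (these are illustrative figures, not live pricing):

```shell
# Sanity-check the single-zone estimate from the table (730 hrs/month).
total=$(awk 'BEGIN {
  vcpu = 2 * 0.0413 * 730    # 2 vCPUs, per vCPU-hour
  mem  = 8 * 0.007  * 730    # 8 GiB, per GiB-hour
  disk = 10 * 0.170          # 10 GiB SSD, per GiB-month
  printf "%.2f", vcpu + mem + disk
}')
echo "Single-zone estimate: \$${total}/month"   # → Single-zone estimate: $102.88/month
```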
Prerequisites
- GCP project with billing enabled
- APIs enabled: `sqladmin.googleapis.com`, `compute.googleapis.com`, `servicenetworking.googleapis.com`
- `gcloud` CLI authenticated (`gcloud auth application-default login`)
- Terraform 1.5+ with `google` provider 6.x
- A VPC network (default or custom)
Enable the required APIs:
gcloud services enable sqladmin.googleapis.com \
compute.googleapis.com \
servicenetworking.googleapis.com
Create an Instance with gcloud
For a quick test or one-off instance, gcloud is the fastest path.
gcloud sql instances create pg-demo \
--database-version=POSTGRES_17 \
--tier=db-custom-2-8192 \
--region=us-central1 \
--storage-size=10GB \
--storage-type=SSD \
--storage-auto-increase \
--backup-start-time=03:00 \
--enable-point-in-time-recovery \
--maintenance-window-day=SUN \
--maintenance-window-hour=4 \
--deletion-protection
Instance creation takes 3-5 minutes. Once ready, set the postgres user password:
gcloud sql users set-password postgres \
--instance=pg-demo \
--password='YourSecurePassword2026!'
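Rather than inventing a password by hand, you can generate one; a minimal sketch using `openssl` (the `gcloud` line is shown commented so you can review the value first):

```shell
# Generate a random 32-character password instead of hand-typing a weak one.
PW=$(openssl rand -base64 24)
echo "${#PW}"   # 32 characters
# gcloud sql users set-password postgres --instance=pg-demo --password="$PW"
```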
Create a database:
gcloud sql databases create appdb --instance=pg-demo
Create with Terraform
Terraform gives you reproducible, version-controlled infrastructure. The configuration below creates the instance, database, and user.
resource "google_sql_database_instance" "postgres" {
name = "pg-demo"
database_version = "POSTGRES_17"
region = "us-central1"
project = "PROJECT_ID" # replace with your project ID
deletion_protection = false # Set true in production
settings {
tier = "db-custom-2-8192"
disk_size = 10
disk_type = "PD_SSD"
disk_autoresize = true
availability_type = "ZONAL" # "REGIONAL" for HA
backup_configuration {
enabled = true
start_time = "03:00"
point_in_time_recovery_enabled = true
transaction_log_retention_days = 7
backup_retention_settings {
retained_backups = 7
}
}
maintenance_window {
day = 7 # Sunday
hour = 4
update_track = "stable"
}
ip_configuration {
ipv4_enabled = false
private_network = google_compute_network.vpc.id
}
database_flags {
name = "cloudsql.iam_authentication"
value = "on"
}
}
depends_on = [google_service_networking_connection.private_vpc_connection]
}
resource "google_sql_database" "appdb" {
name = "appdb"
instance = google_sql_database_instance.postgres.name
}
resource "google_sql_user" "app_user" {
name = "appuser"
instance = google_sql_database_instance.postgres.name
password = var.db_password
}
Store the db_password variable in Secret Manager or a terraform.tfvars file excluded from version control. Never hardcode passwords in Terraform configs.
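The variable itself can be declared with `sensitive = true`, which redacts the value from `plan` and `apply` output (a sketch; the description text is my own):

```hcl
variable "db_password" {
  description = "Password for the application database user"
  type        = string
  sensitive   = true # redacted from `terraform plan` / `terraform apply` output
}
```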
Private IP Networking
By default, Cloud SQL gets a public IP. For production, disable the public IP and use private IP via VPC peering. This keeps database traffic off the internet entirely.
The private IP setup requires three resources: a reserved IP range, a VPC peering connection to Google’s service networking, and the Cloud SQL instance configured with ipv4_enabled = false.
resource "google_compute_global_address" "private_ip_range" {
name = "cloudsql-private-ip"
purpose = "VPC_PEERING"
address_type = "INTERNAL"
prefix_length = 16
network = google_compute_network.vpc.id
}
resource "google_service_networking_connection" "private_vpc_connection" {
network = google_compute_network.vpc.id
service = "servicenetworking.googleapis.com"
reserved_peering_ranges = [google_compute_global_address.private_ip_range.name]
}
With this in place, the Cloud SQL instance gets an internal IP from your VPC range. GKE pods and Compute Engine VMs in the same VPC can reach it directly. No public endpoint, no Cloud SQL Auth Proxy needed for basic connectivity (though the proxy still adds IAM authentication and automatic TLS, which are valuable).
IAM Database Authentication
IAM auth eliminates passwords for database connections. Instead, the connecting service account gets a short-lived OAuth2 token that Cloud SQL validates. This is the recommended approach for GKE workloads using Workload Identity.
Enable IAM auth on the instance (we already set the database flag in Terraform). Create an IAM database user:
resource "google_sql_user" "iam_user" {
name = "app-sa@PROJECT_ID.iam" # for PostgreSQL, the email with ".gserviceaccount.com" trimmed
instance = google_sql_database_instance.postgres.name
type = "CLOUD_IAM_SERVICE_ACCOUNT"
}
Grant the service account the roles/cloudsql.instanceUser role and roles/cloudsql.client role. The instanceUser role allows login, while client allows connecting via the Auth Proxy.
gcloud projects add-iam-policy-binding PROJECT_ID \
--member="serviceAccount:app-sa@PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/cloudsql.instanceUser"
gcloud projects add-iam-policy-binding PROJECT_ID \
--member="serviceAccount:app-sa@PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/cloudsql.client"
Then grant database-level permissions inside PostgreSQL:
GRANT ALL PRIVILEGES ON DATABASE appdb TO "app-sa@PROJECT_ID.iam";
Backups and Point-in-Time Recovery
Cloud SQL automated backups are enabled by default with a 7-day retention window. The backup runs daily at the time you specify (03:00 in our config). Point-in-time recovery uses the WAL (write-ahead log) to restore to any second within the retention window.
Restore to a specific timestamp:
gcloud sql instances clone pg-demo pg-demo-restored \
--point-in-time="2026-04-10T14:30:00.000Z"
This creates a new instance from the backup. Cloud SQL does not support in-place PITR because that would require downtime on the running instance. The clone approach lets you validate the restore before switching traffic.
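The `--point-in-time` flag takes an RFC 3339 UTC timestamp. A sketch that computes one five minutes in the past, using GNU `date` as found on most Linux VMs (the clone command is shown commented):

```shell
# Build an RFC 3339 UTC timestamp for --point-in-time (GNU date syntax).
TS=$(date -u -d '5 minutes ago' +%Y-%m-%dT%H:%M:%S.000Z)
echo "$TS"   # e.g. 2026-04-10T14:25:00.000Z
# gcloud sql instances clone pg-demo pg-demo-restored --point-in-time="$TS"
```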
Read Replicas
Read replicas use PostgreSQL streaming replication under the hood. They are eventually consistent (replication lag is typically under 1 second) and can be in the same region or a different one for disaster recovery.
resource "google_sql_database_instance" "read_replica" {
name = "pg-demo-replica"
master_instance_name = google_sql_database_instance.postgres.name
region = "us-central1"
database_version = "POSTGRES_17"
replica_configuration {
failover_target = false
}
settings {
tier = "db-custom-2-8192"
disk_autoresize = true
disk_type = "PD_SSD"
ip_configuration {
ipv4_enabled = false
private_network = google_compute_network.vpc.id
}
}
}
Point read-heavy application queries at the replica’s IP to offload the primary. Connection strings in your application should distinguish between write (primary) and read (replica) endpoints.
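One way to keep the two endpoints straight is to export both from Terraform; a sketch using the provider's `private_ip_address` attribute (the output names are my own):

```hcl
output "primary_private_ip" {
  description = "Write endpoint (primary)"
  value       = google_sql_database_instance.postgres.private_ip_address
}

output "replica_private_ip" {
  description = "Read endpoint (replica)"
  value       = google_sql_database_instance.read_replica.private_ip_address
}
```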
Connect from GKE with the Auth Proxy
The Cloud SQL Auth Proxy handles encryption, IAM auth, and connection management. On GKE, run it as a sidecar container in the same pod as your application. This pattern means your app connects to localhost:5432 and the proxy handles everything else.
containers:
- name: app
  image: your-app:latest
  env:
  - name: DB_HOST
    value: "127.0.0.1"
  - name: DB_PORT
    value: "5432"
  - name: DB_NAME
    value: "appdb"
- name: cloud-sql-proxy
  image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.14.3
  args:
  - "--structured-logs"
  - "--auto-iam-authn"
  - "PROJECT_ID:us-central1:pg-demo"
  securityContext:
    runAsNonRoot: true
The --auto-iam-authn flag enables automatic IAM authentication. Combined with GKE Workload Identity, no service account key file is needed. The Kubernetes service account maps to a GCP service account that has roles/cloudsql.client, and the proxy uses it transparently.
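That mapping lives on the Kubernetes ServiceAccount. A minimal sketch, assuming a KSA named `app-ksa` bound to the `app-sa` GCP service account (see the Workload Identity guide for the full binding):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-ksa # reference this via serviceAccountName in the pod spec
  annotations:
    # Maps the KSA to the GCP service account that holds roles/cloudsql.client
    iam.gke.io/gcp-service-account: app-sa@PROJECT_ID.iam.gserviceaccount.com
```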
Connect from Compute Engine
If your application runs on a Compute Engine VM in the same VPC, you can connect directly to the private IP without the Auth Proxy. Install the PostgreSQL client:
sudo apt install -y postgresql-client
Connect using the private IP (find it in the Cloud SQL instance details or Terraform output):
psql -h 10.0.1.50 -U appuser -d appdb
For production, still use the Auth Proxy even on Compute Engine. It enforces TLS encryption, supports IAM authentication, and handles certificate rotation automatically.
Monitoring
Cloud SQL exposes metrics natively in Cloud Monitoring (no agent installation needed). The most important metrics to watch:
- database/cpu/utilization: sustained usage above 80% means it is time to scale up vCPUs
- database/memory/utilization: PostgreSQL uses shared_buffers aggressively, so 70-80% usage is normal
- database/disk/utilization: enable auto-resize and alert at 85%
- database/postgresql/num_backends: connection count approaching `max_connections` means you need connection pooling
- database/replication/replica_byte_lag: for read replicas, sustained lag above 1 MB indicates the replica cannot keep up
Create an alert policy for high CPU:
gcloud alpha monitoring policies create \
--notification-channels=CHANNEL_ID \
--display-name="Cloud SQL CPU > 80%" \
--condition-display-name="High CPU" \
--condition-filter='resource.type="cloudsql_database" AND metric.type="cloudsql.googleapis.com/database/cpu/utilization"' \
--condition-threshold-value=0.8 \
--condition-threshold-comparison=COMPARISON_GT \
--condition-threshold-duration=300s
Production Checklist
Before going live, verify these settings:
- High Availability: set `availability_type = "REGIONAL"` in Terraform. This creates a standby in another zone with automatic failover (doubles compute cost)
- Maintenance window: schedule during lowest-traffic hours. Maintenance can cause a brief restart
- Storage auto-resize: enabled by default, but set a storage auto-resize limit to prevent runaway growth from a bug flooding the database
- Deletion protection: set `deletion_protection = true` in Terraform. Without it, a `terraform destroy` deletes the database with no confirmation
- Private IP only: disable the public IP (`ipv4_enabled = false`). If you need occasional public access for debugging, use the Auth Proxy from your local machine instead
- Database flags: tune `shared_buffers` (25% of RAM), `work_mem`, and `max_connections` based on workload. Cloud SQL exposes these as database flags
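As a sketch, flag tuning for the 2 vCPU / 8 GiB instance above might look like the fragment below. The values are illustrative starting points, and the units follow PostgreSQL conventions (`shared_buffers` in 8 kB pages, `work_mem` in kB); verify both against the Cloud SQL flags reference before applying:

```hcl
settings {
  # Illustrative starting points for a 2 vCPU / 8 GiB instance; benchmark first.
  database_flags {
    name  = "shared_buffers"
    value = "262144" # 262144 x 8 kB pages = 2 GiB (~25% of RAM)
  }
  database_flags {
    name  = "work_mem"
    value = "16384" # kB per sort/hash operation
  }
  database_flags {
    name  = "max_connections"
    value = "200"
  }
}
```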
Cloud SQL vs AWS RDS PostgreSQL
If you are evaluating both clouds, this comparison covers the differences that actually matter in practice.
| Feature | GCP Cloud SQL | AWS RDS PostgreSQL |
|---|---|---|
| Pricing model | Per vCPU-hour + per GiB-hour | Per instance-hour (fixed tiers) |
| Minimal instance | ~$103/month (2 vCPU, 8 GiB) | ~$49/month (db.t4g.medium, 2 vCPU, 4 GiB) |
| HA architecture | Regional (standby in another zone) | Multi-AZ (synchronous standby) |
| Read replicas | Same or cross-region | Same or cross-region, up to 15 |
| Serverless scale-to-zero | Not supported | Aurora Serverless v2 (scales to 0.5 ACU) |
| Private networking | VPC peering (google_service_networking) | Subnet placement (no peering needed) |
| IAM auth | Native (IAM database users) | Supported (RDS IAM auth tokens) |
| Connection proxy | Cloud SQL Auth Proxy (sidecar) | RDS Proxy (managed, $$$) |
| Backup retention | 1-365 days | 0-35 days (automated), manual snapshots unlimited |
| Max storage | 64 TB | 64 TB (128 TB with io2) |
RDS wins on entry-level pricing because of smaller instance types (t4g.micro starts at ~$12/month). Cloud SQL wins on IAM integration depth and the Auth Proxy’s zero-config connection handling. Both are solid choices; pick the one that fits your existing cloud footprint.
Troubleshooting
Error: “Failed to create subnetwork. Couldn’t find free blocks in allocated IP ranges”
The reserved IP range for VPC peering is exhausted. Either the allocated range is too small or another Cloud SQL instance already consumed it. Allocate a larger block by lowering the prefix_length (a /16 holds more addresses than a /24) or create an additional reserved range.
Error: “Connection timed out” from GKE pods
The most common cause is a missing VPC peering route. Verify the peering connection is active:
gcloud compute networks peerings list --network=NETWORK_NAME
If the peering shows ACTIVE but connections still time out, check that the GKE cluster’s node network can route to the Cloud SQL private IP range. On Shared VPC setups, the host project must have the peering, not the service project.
Error: “FATAL: Cloud SQL IAM user authentication failed”
The IAM database user does not match the connecting service account. For service accounts, the PostgreSQL username is the IAM email with the .gserviceaccount.com suffix removed (app-sa@PROJECT_ID.iam); human IAM users keep their full email address. Double-check the google_sql_user resource type: it should be CLOUD_IAM_SERVICE_ACCOUNT for service accounts and CLOUD_IAM_USER for human users.
Terraform destroy fails with “deletion_protection is enabled”
Set deletion_protection = false in the Terraform config, run terraform apply to update the instance, then run terraform destroy. This two-step process is intentional: it prevents accidental destruction of production databases.
Cleanup
Remove all resources. If using Terraform:
terraform destroy
If the instance has deletion_protection enabled, disable it first:
gcloud sql instances patch pg-demo --no-deletion-protection
gcloud sql instances delete pg-demo
Delete the read replica separately (replicas must be deleted before the primary if using gcloud):
gcloud sql instances delete pg-demo-replica
FAQ
Can Cloud SQL PostgreSQL scale to zero?
No. Cloud SQL always runs at least one instance with the configured vCPU and memory. There is no serverless mode that scales to zero. If you need scale-to-zero for development databases, consider AlloyDB Omni (self-hosted) or Aurora Serverless v2 on AWS.
What PostgreSQL versions does Cloud SQL support?
As of April 2026, Cloud SQL supports PostgreSQL 14, 15, 16, and 17. Version 17 is the latest available. Major version upgrades are supported in-place, but test the upgrade on a clone first because some extensions may need recompilation.
How does high availability work?
Regional HA creates a standby instance in a different zone within the same region. Replication is synchronous: every write is confirmed on both the primary and standby before being acknowledged to the client. Failover is automatic and typically completes in under 60 seconds. The IP address stays the same after failover.
Is the Cloud SQL Auth Proxy required?
Not required, but strongly recommended. Without the proxy, you connect directly to the private IP using a password. The proxy adds automatic SSL/TLS encryption, IAM-based authentication (no passwords), and connection health checks. On GKE, run it as a sidecar. On Compute Engine, run it as a systemd service.
How do I migrate from self-managed PostgreSQL to Cloud SQL?
Use the Database Migration Service (DMS). Create a migration job with the source as your self-managed instance and the destination as a new Cloud SQL instance. DMS handles the initial full dump and then continuous replication via logical decoding until you are ready to cut over. Test the migration with a dry run first.