GCP Dataproc Cluster
Google Cloud Dataproc is a managed Spark / Hadoop service that provisions clusters of Compute Engine VMs on demand, runs distributed data-processing jobs, and tears the infrastructure down again when it is no longer required. A Dataproc Cluster represents that fleet of VMs together with their configuration (networking, IAM, autoscaling rules, encryption settings, initialisation actions, etc.).
For full details see the official documentation: https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.clusters#Cluster
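The provision-run-tear-down lifecycle described above can also be driven programmatically. The following is a minimal sketch using the google-cloud-dataproc Python client; the project ID, region, cluster name and machine types are placeholder values, not part of this resource definition.

```python
from google.cloud import dataproc_v1

project_id = "my-project"   # placeholder project ID
region = "us-central1"      # placeholder region

# Dataproc uses regional endpoints, so the client is pointed at the region.
client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# A minimal cluster definition: one master node and two workers.
cluster = {
    "project_id": project_id,
    "cluster_name": "example-cluster",
    "config": {
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
    },
}

# Provision the cluster (a long-running operation)...
operation = client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
)
print(f"Created {operation.result().cluster_name}")

# ...and tear it down again once it is no longer required.
client.delete_cluster(
    request={
        "project_id": project_id,
        "region": region,
        "cluster_name": "example-cluster",
    }
).result()
```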
Terraform Mappings:
google_dataproc_cluster.name
Supported Methods
GET
: Get a gcp-dataproc-cluster by its "name"
LIST
: List all gcp-dataproc-cluster
SEARCH
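The GET and LIST methods map naturally onto the Dataproc API's GetCluster and ListClusters calls (an assumption about how lookups are resolved, not a statement about the adapter's internals). A sketch with placeholder project and region values:

```python
from google.cloud import dataproc_v1

project_id = "my-project"   # placeholder
region = "us-central1"      # placeholder

client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# GET: fetch a single gcp-dataproc-cluster by its name.
cluster = client.get_cluster(
    project_id=project_id, region=region, cluster_name="example-cluster"
)
print(cluster.cluster_name, cluster.status.state.name)

# LIST: enumerate every gcp-dataproc-cluster in the project and region.
for c in client.list_clusters(project_id=project_id, region=region):
    print(c.cluster_name)
```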
Possible Links
gcp-compute-network
Every Dataproc Cluster is launched inside a specific VPC network (and usually a sub-network) which controls its private IP range, routing and firewall behaviour.
gcp-storage-bucket
A Dataproc Cluster references one or more Cloud Storage buckets, e.g. the optional “cluster staging bucket” used for job jars and logs, or user-provided buckets mounted via Hadoop/Spark connectors.
gcp-compute-instance-group-manager
Each node pool (master, worker, secondary worker) in a Dataproc Cluster is implemented as a managed instance group created and controlled on the cluster’s behalf.
gcp-dataproc-autoscaling-policy
Clusters can be attached to an autoscaling policy that automatically adds or removes workers based on YARN metrics; the policy resource is linked to the cluster.
gcp-compute-node-group
If a cluster is deployed on sole-tenant nodes, the underlying VMs belong to a Compute Node Group which is referenced in the cluster specification.
gcp-iam-service-account
VMs in a Dataproc Cluster run under a default or user-supplied service account that grants them access to Storage, BigQuery, Pub/Sub and other Google Cloud APIs.
gcp-cloud-kms-crypto-key
Customer-managed encryption keys (CMEK) from Cloud KMS can be configured to encrypt the cluster’s persistent disks and in-cluster Storage buckets, creating a dependency on the Crypto Key.
gcp-compute-image
A Dataproc Cluster can use a custom or publicly available Compute Engine image for its node VMs (referenced through the image URI in its instance group configuration), linking it to the corresponding Image resource.
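Each of the links above is derived from one or more fields on the Cluster resource returned by the Dataproc API (see the REST reference linked earlier). The sketch below, assuming the same placeholder project, region and cluster name as the earlier examples, shows where those fields surface:

```python
from google.cloud import dataproc_v1

project_id = "my-project"   # placeholder
region = "us-central1"      # placeholder

client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)
cfg = client.get_cluster(
    project_id=project_id, region=region, cluster_name="example-cluster"
).config

# gcp-compute-network: VPC network / sub-network the node VMs are attached to.
print(cfg.gce_cluster_config.network_uri, cfg.gce_cluster_config.subnetwork_uri)

# gcp-storage-bucket: staging and temp buckets for job dependencies and job output.
print(cfg.config_bucket, cfg.temp_bucket)

# gcp-compute-instance-group-manager: managed instance group behind the worker pool.
print(cfg.worker_config.managed_group_config.instance_group_manager_name)

# gcp-dataproc-autoscaling-policy: attached autoscaling policy, if any.
print(cfg.autoscaling_config.policy_uri)

# gcp-compute-node-group: sole-tenant node group affinity, if configured.
print(cfg.gce_cluster_config.node_group_affinity.node_group_uri)

# gcp-iam-service-account: service account the cluster VMs run as.
print(cfg.gce_cluster_config.service_account)

# gcp-cloud-kms-crypto-key: CMEK key protecting the cluster's persistent disks.
print(cfg.encryption_config.gce_pd_kms_key_name)

# gcp-compute-image: custom image used for the node VMs, if one is set.
print(cfg.master_config.image_uri)
```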