Dataproc Serverless service account

This document describes how to view and manage Identity and Access Management service account roles. A Dataproc Serverless batch workload or interactive session runs as the Compute Engine default service account, unless you specify a custom service account when you submit a batch workload, create a session, or create a session runtime template.

Required Dataproc Worker role

The Dataproc Serverless workload service account must have the Identity and Access Management Dataproc Worker role. The Compute Engine default service account (project_number-compute@developer.gserviceaccount.com) that Dataproc Serverless uses has this role by default. If you specify your own service account for your batch workload, session, or session template, you must grant the Dataproc Worker role to your service account. Additional roles may be necessary for other operations, such as reading data from and writing data to BigQuery.
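
For a custom service account, the grant described above might look like the following sketch. The project ID and service account email are hypothetical placeholders, and the run helper is an illustrative wrapper (not part of the gcloud CLI) that only prints the command unless RUN=1 is set.

```shell
# Hypothetical placeholder values; replace them with your own.
PROJECT_ID="${PROJECT_ID:-my-batch-project}"
SERVICE_ACCOUNT_EMAIL="${SERVICE_ACCOUNT_EMAIL:-my-sa@my-batch-project.iam.gserviceaccount.com}"

# Illustrative helper: print the command (without shell quoting),
# and execute it only when RUN=1.
run() {
  echo "$*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Grant the Dataproc Worker role to the custom service account.
run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member="serviceAccount:$SERVICE_ACCOUNT_EMAIL" \
    --role="roles/dataproc.worker"
```

Run the script once to review the printed command, then rerun it with RUN=1 to apply the binding.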

View and manage IAM service account roles

To view and manage the roles granted to the Dataproc Serverless workload service account, do the following:

  1. In the Google Cloud console, go to the IAM page.

  2. Click Include Google-provided role grants.

  3. View the roles listed for the workload service account. The required Dataproc Worker role should be listed for the Compute Engine default service account (project_number-compute@developer.gserviceaccount.com), which Dataproc Serverless uses by default as the workload service account.

  4. To grant or remove service account roles, click the pencil icon displayed on the service account row.
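
As an alternative to the console steps above, the roles granted to a service account can be listed with the gcloud CLI. This is a minimal sketch: the project ID and project number are hypothetical placeholders, and the run helper is an illustrative wrapper that only prints the command unless RUN=1 is set.

```shell
# Hypothetical placeholder values; replace them with your own.
PROJECT_ID="${PROJECT_ID:-my-batch-project}"
SERVICE_ACCOUNT_EMAIL="${SERVICE_ACCOUNT_EMAIL:-123456789012-compute@developer.gserviceaccount.com}"

# Illustrative helper: print the command (without shell quoting),
# and execute it only when RUN=1.
run() {
  echo "$*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# List the project-level roles bound to the workload service account.
run gcloud projects get-iam-policy "$PROJECT_ID" \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:$SERVICE_ACCOUNT_EMAIL" \
    --format="table(bindings.role)"
```

When executed with RUN=1, the output should include roles/dataproc.worker if the required role has been granted at the project level.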

Cross-project service account

You can submit a Dataproc Serverless batch workload that uses a service account from a project other than the batch workload project (the project where the batch is submitted). In this section, the project where the service account is located is called the service account project, and the project where the batch is submitted is called the batch project.

Why use a cross-project service account to run a batch workload? One possible reason is that the service account in the other project has been assigned IAM roles that provide fine-grained access to the resources in that project.

Setup steps

  1. In the service account project:

    1. Enable service accounts to be attached across projects.

    2. Enable the Dataproc API.

    3. Grant your user account (the user who submits the batch workload) the Service Account User role on either the service account project or, for more granular control, the service account in the service account project.

      For more information, see Manage access to projects, folders, and organizations to grant roles at the project level, and Manage access to service accounts to grant roles at the service account level.

      gcloud CLI examples:

      The following sample command grants to the user the Service Account User role at the project level:

      gcloud projects add-iam-policy-binding SERVICE_ACCOUNT_PROJECT_ID \
          --member=USER_EMAIL \
          --role="roles/iam.serviceAccountUser"
      

      Notes:

      • USER_EMAIL: Provide your user account email address, in the format: user:user-name@example.com.

      The following sample command grants to the user the Service Account User role at the service account level:

      gcloud iam service-accounts add-iam-policy-binding VM_SERVICE_ACCOUNT_EMAIL \
          --member=USER_EMAIL \
          --role="roles/iam.serviceAccountUser"
      

      Notes:

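      • USER_EMAIL: Provide your user account email address, in the format: user:user-name@example.com.

      • VM_SERVICE_ACCOUNT_EMAIL: Provide the email address of the service account in the service account project.
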
    4. Grant the service account the Dataproc Worker role on the batch project.

      gcloud CLI example:

      gcloud projects add-iam-policy-binding BATCH_PROJECT_ID \
          --member=serviceAccount:SERVICE_ACCOUNT_NAME@SERVICE_ACCOUNT_PROJECT_ID.iam.gserviceaccount.com \
          --role="roles/dataproc.worker"
      
  2. In the batch project:

    1. Grant the Dataproc service agent service account the Service Account User and the Service Account Token Creator roles on either the service account project or, for more granular control, the service account in the service account project. By doing this, you allow the Dataproc service agent service account in the batch project to create tokens for the service account in the service account project.

      For more information, see Manage access to projects, folders, and organizations to grant roles at the project level, and Manage access to service accounts to grant roles at the service account level.

      gcloud CLI examples:

      The following commands grant the Dataproc service agent service account in the batch project the Service Account User and Service Account Token Creator roles at the project level:

      gcloud projects add-iam-policy-binding SERVICE_ACCOUNT_PROJECT_ID \
          --member=serviceAccount:service-BATCH_PROJECT_NUMBER@dataproc-accounts.iam.gserviceaccount.com \
          --role="roles/iam.serviceAccountUser"
      
      gcloud projects add-iam-policy-binding SERVICE_ACCOUNT_PROJECT_ID \
          --member=serviceAccount:service-BATCH_PROJECT_NUMBER@dataproc-accounts.iam.gserviceaccount.com \
          --role="roles/iam.serviceAccountTokenCreator"
      

      The following sample commands grant the Dataproc Service Agent service account in the batch project the Service Account User and Service Account Token Creator roles at the service account level:

      gcloud iam service-accounts add-iam-policy-binding VM_SERVICE_ACCOUNT_EMAIL \
          --member=serviceAccount:service-BATCH_PROJECT_NUMBER@dataproc-accounts.iam.gserviceaccount.com \
          --role="roles/iam.serviceAccountUser"
      
      gcloud iam service-accounts add-iam-policy-binding VM_SERVICE_ACCOUNT_EMAIL \
          --member=serviceAccount:service-BATCH_PROJECT_NUMBER@dataproc-accounts.iam.gserviceaccount.com \
          --role="roles/iam.serviceAccountTokenCreator"
      
    2. Grant the Compute Engine Service Agent service account in the batch project the Service Account Token Creator role on either the service account project or, for more granular control, the service account in the service account project. By doing this, you allow the Compute Engine Service Agent service account in the batch project to create tokens for the service account in the service account project.

      For more information, see Manage access to projects, folders, and organizations to grant roles at the project level, and Manage access to service accounts to grant roles at the service account level.

      gcloud CLI examples:

      The following sample command grants the Compute Engine Service Agent service account in the batch project the Service Account Token Creator role at the project level:

      gcloud projects add-iam-policy-binding SERVICE_ACCOUNT_PROJECT_ID \
          --member=serviceAccount:service-BATCH_PROJECT_NUMBER@compute-system.iam.gserviceaccount.com \
          --role="roles/iam.serviceAccountTokenCreator"
      

      The following sample command grants the Compute Engine Service Agent service account in the batch project the Service Account Token Creator role at the service account level:

      gcloud iam service-accounts add-iam-policy-binding VM_SERVICE_ACCOUNT_EMAIL \
          --member=serviceAccount:service-BATCH_PROJECT_NUMBER@compute-system.iam.gserviceaccount.com \
          --role="roles/iam.serviceAccountTokenCreator"
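
The first two setup steps in the service account project (allowing cross-project attachment and enabling the Dataproc API) have no gcloud CLI example above. The following is a possible sketch, assuming that cross-project attachment is controlled by the iam.disableCrossProjectServiceAccountUsage organization policy constraint and that you have permission to change it. The project ID is a hypothetical placeholder, and the run helper is an illustrative wrapper that only prints each command unless RUN=1 is set.

```shell
# Hypothetical placeholder value; replace it with your own.
SERVICE_ACCOUNT_PROJECT_ID="${SERVICE_ACCOUNT_PROJECT_ID:-my-sa-project}"

# Illustrative helper: print the command (without shell quoting),
# and execute it only when RUN=1.
run() {
  echo "$*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Assumption: service accounts can be attached across projects when this
# organization policy constraint is not enforced on the service account project.
run gcloud resource-manager org-policies disable-enforce \
    iam.disableCrossProjectServiceAccountUsage \
    --project="$SERVICE_ACCOUNT_PROJECT_ID"

# Enable the Dataproc API in the service account project.
run gcloud services enable dataproc.googleapis.com \
    --project="$SERVICE_ACCOUNT_PROJECT_ID"
```

Changing an organization policy typically requires an administrator role, so review the printed commands with your organization's policy owner before rerunning with RUN=1.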
      

Submit the batch workload

After you complete the setup steps, you can submit a batch workload. Make sure to specify the service account in the service account project as the service account to use for the batch workload.
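
A submission that uses the cross-project service account might look like the following sketch. The project ID, region, and service account email are hypothetical placeholders, the SparkPi job is only a stand-in workload, and the run helper is an illustrative wrapper that only prints the command unless RUN=1 is set.

```shell
# Hypothetical placeholder values; replace them with your own.
BATCH_PROJECT_ID="${BATCH_PROJECT_ID:-my-batch-project}"
REGION="${REGION:-us-central1}"
SERVICE_ACCOUNT_EMAIL="${SERVICE_ACCOUNT_EMAIL:-my-sa@my-sa-project.iam.gserviceaccount.com}"

# Illustrative helper: print the command (without shell quoting),
# and execute it only when RUN=1.
run() {
  echo "$*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Submit a Spark batch workload in the batch project that runs as the
# service account located in the service account project.
run gcloud dataproc batches submit spark \
    --project="$BATCH_PROJECT_ID" \
    --region="$REGION" \
    --service-account="$SERVICE_ACCOUNT_EMAIL" \
    --class=org.apache.spark.examples.SparkPi \
    --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
    -- 1000
```

The --service-account value is the part that matters here: it names the service account in the service account project rather than one in the batch project.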