GCPBATCH issue with private VPC network #7500

Open
yihming opened this issue Aug 20, 2024 · 10 comments

@yihming

yihming commented Aug 20, 2024

Hello,

I'm working on making our cromwell server work with GCP Batch and running in our private VPC network.

However, after following this tutorial, I encounter the following error:

com.google.api.gax.rpc.InvalidArgumentException: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: network field is invalid. network: projects/${project_id}/global/networks/${network_id}/ is not matching the expected format: global/networks/([a-z]([-a-z0-9]*[a-z0-9])?)$
	at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:92)
	at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:41)
	at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:86)
	at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:66)
	at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97)
	at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:84)
	at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1133)
	at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31)
	at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1277)
	at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1038)
	at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:808)
	at io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:574)
	at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:544)
	at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
	at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
	at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
	at com.google.api.gax.grpc.ChannelPool$ReleasingClientCall$1.onClose(ChannelPool.java:541)
	at io.grpc.internal.DelayedClientCall$DelayedListener$3.run(DelayedClientCall.java:489)
	at io.grpc.internal.DelayedClientCall$DelayedListener.delayOrExecute(DelayedClientCall.java:453)
	at io.grpc.internal.DelayedClientCall$DelayedListener.onClose(DelayedClientCall.java:486)
	at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:576)
	at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:757)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:736)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
	Suppressed: com.google.api.gax.rpc.AsyncTaskException: Asynchronous task failed
		at com.google.api.gax.rpc.ApiExceptions.callAndTranslateApiException(ApiExceptions.java:57)
		at com.google.api.gax.rpc.UnaryCallable.call(UnaryCallable.java:112)
		at cromwell.backend.google.batch.api.GcpBatchApiRequestHandler.$anonfun$submit$1(GcpBatchApiRequestHandler.scala:11)
		at cromwell.backend.google.batch.api.GcpBatchApiRequestHandler.withClient(GcpBatchApiRequestHandler.scala:29)
		at cromwell.backend.google.batch.api.GcpBatchApiRequestHandler.submit(GcpBatchApiRequestHandler.scala:9)
		at cromwell.backend.google.batch.actors.GcpBatchBackendSingletonActor$$anonfun$normalReceive$1.$anonfun$applyOrElse$1(GcpBatchBackendSingletonActor.scala:65)
		at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:678)
		at scala.concurrent.impl.Promise$Transformation.run(Promise.scala:467)
		at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
		at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:49)
		at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
		at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
		at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
		at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

It seems that Cromwell only accepts public VPC networks with names starting with global/networks/..., while my actual network name was automatically prefixed with projects/${projectId}/global/networks/ (as shown in the first line of the error message above).

I wonder whether something is wrong in my conf file, or whether I missed some setup on the GCP Batch side. Thanks!

I'm using Cromwell v87, and my conf file is:

...
backend {
    ...
    providers {
        GCPBATCH {
            actor-factory = "cromwell.backend.google.batch.GcpBatchBackendLifecycleActorFactory"
            config {
                ...
                virtual-private-cloud {
                    network-label-key = "my-private-network"
                    subnetwork-label-key = "my-private-subnetwork"
                    auth = "application-default"
                }
                ...
            }
        }
    }
}

where my-private-network and my-private-subnetwork are GCP project labels.

@dspeck1
Collaborator

dspeck1 commented Aug 20, 2024

Hi @yihming - thanks for providing the detail and log message. Please try removing the trailing / from the network URL, i.e. use projects/gred-cumulus-sb-01-991a49c4/global/networks/vpc-cumulus-sb-01 instead.

@yihming
Author

yihming commented Aug 20, 2024

Hi @dspeck1 ,

Thank you for your immediate help!

I checked the my-private-network and my-private-subnetwork labels in my project (by running the gcloud projects describe command), and neither of them has the trailing / (please see the attached screenshot).

In fact, these same settings in the virtual-private-cloud stanza worked with the Genomics API for the past 3 years. They broke only recently, when I migrated to GCP Batch.

[Attached screenshot: Screenshot 2024-08-20 at 14 21 36]

@dspeck1
Collaborator

dspeck1 commented Aug 20, 2024

Thanks! Sorry, I was looking at it incorrectly. The GCP Batch backend adds the trailing slash. The Genomics API backend added a trailing slash as well, so Google must have changed the validation of the format. We will push a change that fixes it. In the interim, setting the network via the literal option instead of the label should fix it.
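
As an illustration of why the trailing slash matters, here is a minimal Python sketch that applies the pattern quoted in the error message to a network URL with and without the trailing /. It assumes the service searches for the pattern anywhere in the string, with only the end anchored (which is how the message reads); the real server-side check may differ. The project and network names are placeholders.

```python
import re

# Pattern copied verbatim from the INVALID_ARGUMENT message in this issue.
NETWORK_PATTERN = re.compile(r"global/networks/([a-z]([-a-z0-9]*[a-z0-9])?)$")


def matches_expected_format(network: str) -> bool:
    """Return True if `network` ends with global/networks/<valid-name>.

    Assumption: the pattern is applied as a search anchored only at the end;
    this is a guess at the server-side behaviour, not a confirmed fact.
    """
    return NETWORK_PATTERN.search(network) is not None


# Placeholder names, not taken from any real project.
with_slash = "projects/my-project/global/networks/my-vpc/"
without_slash = with_slash.rstrip("/")

print(matches_expected_format(with_slash))
print(matches_expected_format(without_slash))
```

Under that assumption the first value is rejected and the second accepted, which is consistent with the projects/... prefix being fine and only the trailing / tripping the validation.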

@yihming
Author

yihming commented Aug 20, 2024

Thanks! I did see the trailing / is added by Cromwell: https://github.com/broadinstitute/cromwell/blob/develop/supportedBackends/google/batch/src/main/scala/cromwell/backend/google/batch/models/VpcAndSubnetworkProjectLabelValues.scala#L15.

I tried setting the network by literals as follows:

virtual-private-cloud {
    network-name = "$NETWORK-NAME"
    subnetwork-name = "$SUBNETWORK-NAME"
    auth = "application-default"
}

where $NETWORK-NAME and $SUBNETWORK-NAME stand for the values of the my-private-network and my-private-subnetwork labels (hidden here).

However, my server failed immediately on startup:

2024-08-20 21:43:02 main WARN  - Failed to build GcpBatchConfigurationAttributes on attempt 1 of 3, retrying.
cromwell.backend.google.batch.models.GcpBatchConfigurationAttributes$$anon$1: Google Cloud Batch configuration is not valid: Errors:
Virtual Private Cloud configuration is invalid. Missing keys: `network-label-key`.

It looks like the GCP Batch config treats network-label-key as required rather than optional...

@yihming
Author

yihming commented Aug 20, 2024

I then set network-label-key to a non-existent label name, hoping that Cromwell would fall back to using the literals at runtime:

virtual-private-cloud {
    network-name = "projects/.../global/networks/$NETWORK-NAME"
    subnetwork-name = "regions/.../subnetworks/$SUBNETWORK-NAME"
    network-label-key = "dummy"
    auth = "application-default"
}

It did: Cromwell fell back to the literal values.

@yihming
Author

yihming commented Aug 20, 2024

@dspeck1 Can you confirm whether the subnetwork name specified in subnetwork-name should follow the regions/${region-name}/subnetworks/${subnetwork-name} pattern? I cannot find where Cromwell adds a prefix to the subnetwork name in the source code. Thanks!

@yihming
Author

yihming commented Aug 21, 2024

I can confirm that using the literal approach instead of project labels works in this case. One just needs to:

  • Specify the full name of the private VPC network (i.e. the full path containing /) in network-name. The format is projects/${project_id}/global/networks/${network_name}.
  • Specify the full name of the subnetwork (i.e. the full path containing /) in subnetwork-name. The format is regions/${region_name}/subnetworks/${subnetwork_name}.
  • Specify network-label-key, since the GCP Batch backend requires it. Give it a non-existent project label; if Cromwell cannot find that label in your project, it falls back to the literals at runtime.
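
The three points above can be sketched as a small Python helper that assembles the fully qualified names and renders the stanza. The function name build_vpc_stanza, its parameters, and the sample values are hypothetical and only serve to show the name formats reported in this thread; this is not part of Cromwell.

```python
def build_vpc_stanza(project_id: str, network_name: str,
                     region_name: str, subnetwork_name: str,
                     placeholder_label_key: str = "dummy") -> str:
    """Render a virtual-private-cloud stanza that uses fully qualified literals."""
    # Full resource names in the formats reported in this thread, with no trailing "/".
    network = f"projects/{project_id}/global/networks/{network_name}"
    subnetwork = f"regions/{region_name}/subnetworks/{subnetwork_name}"
    lines = [
        "virtual-private-cloud {",
        f'    network-name = "{network}"',
        f'    subnetwork-name = "{subnetwork}"',
        "    # Placeholder key: it must not match any real project label.",
        f'    network-label-key = "{placeholder_label_key}"',
        '    auth = "application-default"',
        "}",
    ]
    return "\n".join(lines)


# Sample values are placeholders, not taken from any real project.
print(build_vpc_stanza("my-project", "my-vpc", "us-central1", "my-subnet"))
```

The placeholder label key only works as a workaround if no project label with that name exists, so pick something unlikely to collide.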

If the Cromwell team can confirm that this is an inconsistency or bug in the GCP Batch backend, I hope it can be fixed so that:

  1. Users don't always have to specify the full names of the private VPC network and subnetwork. Namely, remove the trailing / when Cromwell automatically attaches the prefixes.
  2. network-label-key is not required when using the literal approach.
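
To make the two requests concrete, here is a rough Python sketch of the resolution behaviour being asked for. It is not Cromwell's actual code (the backend is written in Scala); the function resolve_network and its parameters are hypothetical, and the label-first precedence is assumed from the fallback behaviour observed earlier in this thread.

```python
from typing import Mapping, Optional


def resolve_network(project_id: str,
                    project_labels: Mapping[str, str],
                    network_label_key: Optional[str] = None,
                    network_name: Optional[str] = None) -> Optional[str]:
    """Pick the network URL from a project label or a literal (illustrative only)."""
    # Request 1: when the prefix is attached automatically, emit no trailing "/".
    if network_label_key is not None and network_label_key in project_labels:
        label_value = project_labels[network_label_key]
        return f"projects/{project_id}/global/networks/{label_value}"
    # Request 2: the literal works on its own, without any label key configured.
    if network_name is not None:
        return network_name.rstrip("/")
    # Neither a matching label nor a literal was supplied.
    return None


labels = {"my-private-network": "my-vpc"}  # placeholder project labels
print(resolve_network("my-project", labels, network_label_key="my-private-network"))
print(resolve_network("my-project", {}, network_name="projects/my-project/global/networks/my-vpc"))
```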

Thanks!

@dspeck1
Collaborator

dspeck1 commented Aug 21, 2024

We are working on updating the code to fix the bugs described above and will provide an update when complete.

@yihming
Author

yihming commented Aug 21, 2024

Thank you @dspeck1 so much for your help!

@dspeck1
Collaborator

dspeck1 commented Aug 21, 2024

Adding notes to the issue regarding PAPIv2 behavior:

  • Does not require a network label when the virtual-private-cloud stanza is omitted
  • Appends a trailing / to the subnetwork, and the Genomics API accepts it
  • Requires the network label key when the virtual-private-cloud stanza is added, even if the network name is defined
