ONNX models crash when they are used in Colab's T4 GPU runtime #14109

@maziyarpanahi

Is there an existing issue for this?

  • I have searched the existing issues and did not find a match.

Who can help?

@danilojsl

What are you working on?

Downloading and loading ONNX models on GPU devices crashes (at least on a T4 GPU on Colab).

Current Behavior

Crashes with:

An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel.
: ai.onnxruntime.OrtException: Error code - ORT_RUNTIME_EXCEPTION - message: /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1193 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory

	at ai.onnxruntime.providers.OrtCUDAProviderOptions.add(Native Method)
	at ai.onnxruntime.providers.OrtCUDAProviderOptions.<init>(OrtCUDAProviderOptions.java:44)
	at com.johnsnowlabs.ml.onnx.OnnxWrapper$.mapToCUDASessionConfig(OnnxWrapper.scala:152)
	at com.johnsnowlabs.ml.onnx.OnnxWrapper$.mapToSessionOptionsObject(OnnxWrapper.scala:136)
	at com.johnsnowlabs.ml.onnx.OnnxWrapper$.com$johnsnowlabs$ml$onnx$OnnxWrapper$$withSafeOnnxModelLoader(OnnxWrapper.scala:90)
	at com.johnsnowlabs.ml.onnx.OnnxWrapper$.read(OnnxWrapper.scala:122)
	at com.johnsnowlabs.ml.onnx.ReadOnnxModel.readOnnxModel(OnnxSerializeModel.scala:98)
	at com.johnsnowlabs.ml.onnx.ReadOnnxModel.readOnnxModel$(OnnxSerializeModel.scala:75)
	at com.johnsnowlabs.nlp.embeddings.MPNetEmbeddings$.readOnnxModel(MPNetEmbeddings.scala:471)
	at com.johnsnowlabs.nlp.embeddings.ReadMPNetDLModel.readModel(MPNetEmbeddings.scala:416)
	at com.johnsnowlabs.nlp.embeddings.ReadMPNetDLModel.readModel$(MPNetEmbeddings.scala:407)
	at com.johnsnowlabs.nlp.embeddings.MPNetEmbeddings$.readModel(MPNetEmbeddings.scala:471)
	at com.johnsnowlabs.nlp.embeddings.ReadMPNetDLModel.$anonfun$$init$$1(MPNetEmbeddings.scala:424)
	at com.johnsnowlabs.nlp.embeddings.ReadMPNetDLModel.$anonfun$$init$$1$adapted(MPNetEmbeddings.scala:424)
	at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1(ParamsAndFeaturesReadable.scala:50)
	at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1$adapted(ParamsAndFeaturesReadable.scala:49)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.onRead(ParamsAndFeaturesReadable.scala:49)
	at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1(ParamsAndFeaturesReadable.scala:61)
	at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1$adapted(ParamsAndFeaturesReadable.scala:61)
	at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:38)
	at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:24)
	at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:513)
	at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:505)
	at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.downloadModel(ResourceDownloader.scala:705)
	at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel(ResourceDownloader.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.base/java.lang.Thread.run(Thread.java:829)
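
The root cause reported in the trace is that `libcublasLt.so.11` could not be opened when ONNX Runtime tried to load its CUDA provider. A minimal sketch for checking, from a notebook cell, whether the dynamic loader can find that library is shown below. It uses only Python's standard `ctypes` module. The helper names `can_load` and `report` are hypothetical, and the `libcublasLt.so.12` entry is an assumption added only for comparison; only `libcublasLt.so.11` comes from the error message. The JVM may resolve libraries through a different search path than the Python process, so treat the result as a hint rather than a definitive diagnosis.

```python
import ctypes


def can_load(library_name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the library is not on the loader's search path
        # or one of its own dependencies is missing.
        return False


# "libcublasLt.so.11" is taken from the error message above.
# "libcublasLt.so.12" is an assumption, listed only to show whether a
# different CUDA major version is installed instead.
CANDIDATES = ["libcublasLt.so.11", "libcublasLt.so.12"]


def report(names=CANDIDATES):
    """Map each library name to whether it could be loaded."""
    return {name: can_load(name) for name in names}


if __name__ == "__main__":
    for name, ok in report().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

If the `.so.11` entry is reported missing while another version is found, that would be consistent with a CUDA version mismatch between the runtime image and the ONNX Runtime GPU build, though this check alone does not confirm it.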

Expected Behavior

Loading the model should work as it did before upgrading to the newer version of Spark NLP.

Steps To Reproduce

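!pip install spark-nlp pyspark

import sparknlp
from sparknlp.annotator import MPNetEmbeddings

# Assumption: the session was started with GPU support; the original report omits this step.
spark = sparknlp.start(gpu=True)

embeddings = MPNetEmbeddings.pretrained() \
    .setInputCols(["document"]) \
    .setOutputCol("embeddings")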

Spark NLP version and Apache Spark

Spark NLP version 5.2.0
Apache Spark version: 3.5.0

Type of Spark Application

Python Application

Java Version

11

Java Home Directory

No response

Setup and installation

No response

Operating System and Version

No response

Link to your project (if available)

No response

Additional Information

No response
