Commit 5fbfbb3

[FLINK-20086][docs] Override open() in UserDefinedFunction to load resources
1 parent 42f9d6e commit 5fbfbb3

2 files changed: +40 -0 lines changed

docs/content.zh/docs/dev/python/table/udfs/python_udfs.md

Lines changed: 20 additions & 0 deletions
@@ -558,3 +558,23 @@ class ListViewConcatTableAggregateFunction(TableAggregateFunction):
 
 If you run Python UDFs and Pandas UDFs in non-local mode, and the Python UDFs are not defined in the main Python file that contains the `main()` entry point, it is strongly recommended that you specify the Python UDF definitions via the [`python-files`]({{< ref "docs/dev/python/python_config" >}}#python-files) config option.
 Otherwise, if you define Python UDFs in a file named `my_udf.py`, you may run into an error such as `ModuleNotFoundError: No module named 'my_udf'`.
+
+## Loading resources in UDFs
+
+Sometimes you want to load a resource in a UDF only once and then reuse it for repeated computation. For example, you may want to first load a large deep learning model in a UDF and then use that model to run prediction many times.
+
+What you need to do is override the `open` method of the `UserDefinedFunction` class.
+
+```
+class Predict(ScalarFunction):
+    def open(self, function_context):
+        import pickle
+
+        with open("resources.zip/resources/model.pkl", "rb") as f:
+            self.model = pickle.load(f)
+
+    def eval(self, x):
+        return self.model.predict(x)
+
+predict = udf(Predict(), result_type=DataTypes.DOUBLE(), func_type="pandas")
+```

docs/content/docs/dev/python/table/udfs/python_udfs.md

Lines changed: 20 additions & 0 deletions
@@ -557,3 +557,23 @@ class ListViewConcatTableAggregateFunction(TableAggregateFunction):
 
 To run Python UDFs (as well as Pandas UDFs) in any non-local mode, it is strongly recommended to bundle your Python UDF definitions using the config option [`python-files`]({{< ref "docs/dev/python/python_config" >}}#python-files), if your Python UDFs live outside of the file where the `main()` function is defined.
 Otherwise, you may run into `ModuleNotFoundError: No module named 'my_udf'` if you define Python UDFs in a file called `my_udf.py`.
+
+## Loading resources in UDFs
+
+In some scenarios you want to load resources in a UDF once and then run the computation (i.e., `eval`) repeatedly without reloading them. For example, you may want to load a large deep learning model only once and then run batch prediction against it multiple times.
+
+To do this, override the `open` method of `UserDefinedFunction`.
+
+```python
+class Predict(ScalarFunction):
+    def open(self, function_context):
+        import pickle
+
+        with open("resources.zip/resources/model.pkl", "rb") as f:
+            self.model = pickle.load(f)
+
+    def eval(self, x):
+        return self.model.predict(x)
+
+predict = udf(Predict(), result_type=DataTypes.DOUBLE(), func_type="pandas")
+```
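
The load-once, evaluate-many lifecycle that the added section describes can be tried without a Flink runtime. The sketch below is a plain-Python stand-in under stated assumptions, not PyFlink code: `Predict` here does not subclass `ScalarFunction`, the pickled "model" is a hypothetical dict of linear coefficients, and `run` is a hypothetical driver that plays the role of the runtime by calling `open` once and `eval` once per input.

```python
import os
import pickle
import tempfile


class Predict:
    """Plain-Python stand-in for a ScalarFunction subclass (hypothetical)."""

    def __init__(self, model_path):
        self.model_path = model_path
        self.model = None
        self.load_count = 0  # tracks how often the resource is loaded

    def open(self, function_context):
        # Inside a method body the bare name `open` still resolves to the
        # builtin, so defining a method called `open` does not shadow it.
        with open(self.model_path, "rb") as f:
            self.model = pickle.load(f)
        self.load_count += 1

    def eval(self, x):
        # Reuses the already-loaded coefficients on every call.
        return self.model["scale"] * x + self.model["bias"]


def run(values):
    """Simulate the lifecycle: open() once, then eval() for each value."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.pkl")
        # Assumption: the "model" is just a dict of coefficients.
        with open(path, "wb") as f:
            pickle.dump({"scale": 2.0, "bias": 1.0}, f)

        fn = Predict(path)
        fn.open(None)  # a real runtime would pass a function context here
        results = [fn.eval(v) for v in values]
        return results, fn.load_count


if __name__ == "__main__":
    print(run([1.0, 2.0, 3.0]))
```

Calling `run([1.0, 2.0, 3.0])` returns `([3.0, 5.0, 7.0], 1)`: three evaluations, but the pickle file is read only once.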
