🌍 Environment
- Your operating system and version: Windows 10 20H2
- Your python version: python 3.8.7
- How did you install python (e.g. apt or pyenv)? Did you use a virtualenv?: from exe file / no.
- Your Rust version (`rustc --version`): 1.51.0 (beta)
- Your PyO3 version: 0.13.2
- Have you tried using latest PyO3 master (replace `version = "0.x.y"` with `git = "https://github.com/PyO3/pyo3"`)?: no
💥 Description
Hi everyone, I recently migrated some of my algorithms (around 5,000 lines in total) from Python to Rust. The bad news is that, despite the excellent single-threaded execution efficiency, the PyO3 extension does not perform well in multi-threaded mode. After releasing the GIL and running on my 8-core CPU, I was expecting a 4-8x speedup, but the actual speedup was only about 2x.
I cautiously assume this is because the type conversions between Python and Rust (converting Python lists into Rust vectors and back) are all executed while holding the GIL; this may be related to the fact that the data I pass in consists of some relatively long two-dimensional Python lists. I would like to ask whether this situation can be improved (the low efficiency may be caused by my calling the extension incorrectly), or whether my requirements make it impossible to improve at all.
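My assumption can be sanity-checked without the extension at all: if the per-call conversion is GIL-bound, then a pure-Python stand-in that touches every element once (roughly what the list-to-`Vec<[f64; N]>` conversion has to do) should show no overlap between threads. A minimal sketch; `convert` is a hypothetical stand-in, not PyO3's actual conversion code:

```python
import threading
import time

def convert(matrix):
    # Hypothetical stand-in for the list -> Vec<[f64; N]> conversion:
    # touch every element once while holding the GIL.
    return [[float(x) + 1.0 for x in row] for row in matrix]

matrix = [list(range(32)) for _ in range(2000)]

start = time.time()
for _ in range(20):
    convert(matrix)
single = time.time() - start

# Four threads each doing the SAME amount of work; under the GIL this
# CPU-bound work cannot overlap, so the wall time grows ~linearly.
threads = [threading.Thread(target=lambda: [convert(matrix) for _ in range(20)])
           for _ in range(4)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
multi = time.time() - start

print(f"single: {single:.3f} s, 4 threads x same work: {multi:.3f} s")
```

If `multi` comes out near `4 * single`, that is consistent with the conversion (not the Rust computation) dominating and being serialized by the GIL.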
Minimum Implementation
lib.rs:
It accepts an M-by-N matrix and returns it with 1 added to each item. The algorithm is much more complex in real production.
```rust
use pyo3::prelude::*;
use pyo3::wrap_pyfunction;

fn multithread_logic<const N: usize>(matrix: Vec<[f64; N]>) -> Vec<Vec<f64>> {
    let height = matrix.len();
    let width = N;
    let mut result = Vec::new();
    for i in 0..height {
        let mut row: Vec<f64> = Vec::new();
        for j in 0..width {
            row.push(matrix[i][j] + 1.0);
        }
        result.push(row);
    }
    result
}

#[pyfunction]
fn multithread(py: Python, matrix: Vec<[f64; 32]>) -> Vec<Vec<f64>> {
    py.allow_threads(|| multithread_logic(matrix))
}

#[pymodule]
fn testlib(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(multithread, m)?)?;
    Ok(())
}
```

call.py:
Uses a simple method to compare single-threaded and multi-threaded execution speed. In the actual run the time increases roughly linearly with the number of threads; I'd like to know if this is due to my mismanagement of the GIL.
```python
import testlib
import time
import threading

matrix = [list(range(32)) for _ in range(2000)]

def single_thread(matrix):
    for i in range(1000):
        testlib.multithread(matrix)

st_time = time.time()
single_thread(matrix)
print(f"Single thread time: {time.time() - st_time} s")

st_time = time.time()
threads = []
for _ in range(8):
    threads.append(threading.Thread(target=single_thread, args=(matrix,)))
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"Multi-threaded time: {time.time() - st_time} s")
```
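Note that the benchmark above gives 8 threads 8x the total work of the single-threaded run, so the two wall times are not directly comparable. A fairer comparison splits the same total workload across the threads; any remaining gap then points at the GIL-held portion of each call. A sketch, where `work` is a hypothetical pure-Python stand-in for `testlib.multithread` (swap in the real call when testing the extension):

```python
import threading
import time

def work(matrix):
    # Hypothetical stand-in for testlib.multithread.
    return [[x + 1.0 for x in row] for row in matrix]

def bench(n_threads, total_iters, matrix):
    # Each thread performs an equal share of the SAME total workload.
    def worker(iters):
        for _ in range(iters):
            work(matrix)

    threads = [threading.Thread(target=worker, args=(total_iters // n_threads,))
               for _ in range(n_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

matrix = [list(range(32)) for _ in range(200)]
t1 = bench(1, 200, matrix)
t8 = bench(8, 200, matrix)
print(f"1 thread: {t1:.3f} s, 8 threads (same total work): {t8:.3f} s")
```

With the real extension, the 8-thread time shrinking only modestly below the 1-thread time would indicate that most of each call (the conversions) still runs under the GIL, while only `multithread_logic` overlaps.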