Skip to content

Conversation

@1tgr
Copy link
Contributor

@1tgr 1tgr commented May 28, 2021

This change replaces slice1.iter().chain(slice2)....chain(sliceN).for_each(f) with f(slice1); f(slice2); ... f(sliceN);.

On a real-world project it significantly reduces number of lines of LLVM code generated by the compiler, with a similar improvement in compilation time.

I adapted the word_count example to add a #[pyclass]:

#[pyclass]
struct WordCount {
    contents: String,
}

#[pymethods]
impl WordCount {
    #[new]
    fn new(contents: String) -> Self {
        Self { contents }
    }

    fn search(&self, needle: &str) -> usize {
        search(&self.contents, needle)
    }

    fn search_sequential(&self, needle: &str) -> usize {
        search_sequential(&self.contents, needle)
    }

    fn search_sequential_allow_threads(&self, py: Python, needle: &str) -> usize {
        search_sequential_allow_threads(py, &self.contents, needle)
    }
}
...
m.add_class::<WordCount>()?;

According to cargo llvm-lines --release, before this change the top 10 functions in the word_count example, by number of LLVM lines, are:

  Lines         Copies       Function name
  -----         ------       -------------
  50562 (100%)  2257 (100%)  (TOTAL)
   2323 (4.6%)    16 (0.7%)  <core::iter::adapters::chain::Chain<A,B> as core::iter::traits::iterator::Iterator>::fold
   2125 (4.2%)    24 (1.1%)  core::iter::traits::iterator::Iterator::fold
   1255 (2.5%)    21 (0.9%)  alloc::alloc::box_free
   1026 (2.0%)    16 (0.7%)  core::result::Result<T,E>::map_err
    844 (1.7%)    12 (0.5%)  std::panicking::try
    789 (1.6%)     8 (0.4%)  pyo3::callback::handle_panic
    753 (1.5%)     3 (0.1%)  word_count::<impl pyo3::class::impl_::PyMethods<word_count::WordCount> for pyo3::class::impl_::PyClassImplCollector<word_count::WordCount>>::py_methods::METHODS::__wrap::{{closure}}
    665 (1.3%)    16 (0.7%)  core::iter::adapters::chain::Chain<A,B>::new
    648 (1.3%)    16 (0.7%)  core::iter::traits::iterator::Iterator::chain
    598 (1.2%)    30 (1.3%)  core::ptr::read

And after:

  Lines         Copies       Function name
  -----         ------       -------------
  45552 (100%)  2168 (100%)  (TOTAL)
   1255 (2.8%)    21 (1.0%)  alloc::alloc::box_free
   1026 (2.3%)    16 (0.7%)  core::result::Result<T,E>::map_err
    844 (1.9%)    12 (0.6%)  std::panicking::try
    789 (1.7%)     8 (0.4%)  pyo3::callback::handle_panic
    753 (1.7%)     3 (0.1%)  word_count::<impl pyo3::class::impl_::PyMethods<word_count::WordCount> for pyo3::class::impl_::PyClassImplCollector<word_count::WordCount>>::py_methods::METHODS::__wrap::{{closure}}
    657 (1.4%)     7 (0.3%)  core::iter::traits::iterator::Iterator::fold
    598 (1.3%)    30 (1.4%)  core::ptr::read
    570 (1.3%)    10 (0.5%)  alloc::raw_vec::RawVec<T,A>::current_memory
    558 (1.2%)     8 (0.4%)  alloc::boxed::Box<T,A>::into_unique
    546 (1.2%)     7 (0.3%)  std::thread::local::LocalKey<T>::try_with

@1tgr
Copy link
Contributor Author

1tgr commented May 28, 2021

I expect the actual machine code to be the same before and after, since slice's iterator doesn't do anything special in its for_each, meaning it's equivalent to this for loop.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the slice approach looks good to me!

@davidhewitt davidhewitt merged commit 8bf3ade into PyO3:main May 29, 2021
@1tgr 1tgr deleted the for-each branch May 29, 2021 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants