Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify code generated for for_each_method_def and for_each_proto_slot #1641

Merged
merged 2 commits into from
May 29, 2021

Conversation

1tgr
Copy link
Contributor

@1tgr 1tgr commented May 28, 2021

This change replaces slice1.iter().chain(slice2)....chain(sliceN).for_each(f) with f(slice1); f(slice2); ... f(sliceN);.

On a real-world project it significantly reduces number of lines of LLVM code generated by the compiler, with a similar improvement in compilation time.

I adapted the word_count example to add a #[pyclass]:

#[pyclass]
struct WordCount {
    contents: String,
}

#[pymethods]
impl WordCount {
    #[new]
    fn new(contents: String) -> Self {
        Self { contents }
    }

    fn search(&self, needle: &str) -> usize {
        search(&self.contents, needle)
    }

    fn search_sequential(&self, needle: &str) -> usize {
        search_sequential(&self.contents, needle)
    }

    fn search_sequential_allow_threads(&self, py: Python, needle: &str) -> usize {
        search_sequential_allow_threads(py, &self.contents, needle)
    }
}
...
m.add_class::<WordCount>()?;

According to cargo llvm-lines --release, before this change the top 10 functions in the word_count example, by number of LLVM lines, are:

  Lines         Copies       Function name
  -----         ------       -------------
  50562 (100%)  2257 (100%)  (TOTAL)
   2323 (4.6%)    16 (0.7%)  <core::iter::adapters::chain::Chain<A,B> as core::iter::traits::iterator::Iterator>::fold
   2125 (4.2%)    24 (1.1%)  core::iter::traits::iterator::Iterator::fold
   1255 (2.5%)    21 (0.9%)  alloc::alloc::box_free
   1026 (2.0%)    16 (0.7%)  core::result::Result<T,E>::map_err
    844 (1.7%)    12 (0.5%)  std::panicking::try
    789 (1.6%)     8 (0.4%)  pyo3::callback::handle_panic
    753 (1.5%)     3 (0.1%)  word_count::<impl pyo3::class::impl_::PyMethods<word_count::WordCount> for pyo3::class::impl_::PyClassImplCollector<word_count::WordCount>>::py_methods::METHODS::__wrap::{{closure}}
    665 (1.3%)    16 (0.7%)  core::iter::adapters::chain::Chain<A,B>::new
    648 (1.3%)    16 (0.7%)  core::iter::traits::iterator::Iterator::chain
    598 (1.2%)    30 (1.3%)  core::ptr::read

And after:

  Lines         Copies       Function name
  -----         ------       -------------
  45552 (100%)  2168 (100%)  (TOTAL)
   1255 (2.8%)    21 (1.0%)  alloc::alloc::box_free
   1026 (2.3%)    16 (0.7%)  core::result::Result<T,E>::map_err
    844 (1.9%)    12 (0.6%)  std::panicking::try
    789 (1.7%)     8 (0.4%)  pyo3::callback::handle_panic
    753 (1.7%)     3 (0.1%)  word_count::<impl pyo3::class::impl_::PyMethods<word_count::WordCount> for pyo3::class::impl_::PyClassImplCollector<word_count::WordCount>>::py_methods::METHODS::__wrap::{{closure}}
    657 (1.4%)     7 (0.3%)  core::iter::traits::iterator::Iterator::fold
    598 (1.3%)    30 (1.4%)  core::ptr::read
    570 (1.3%)    10 (0.5%)  alloc::raw_vec::RawVec<T,A>::current_memory
    558 (1.2%)     8 (0.4%)  alloc::boxed::Box<T,A>::into_unique
    546 (1.2%)     7 (0.3%)  std::thread::local::LocalKey<T>::try_with

@1tgr
Copy link
Contributor Author

1tgr commented May 28, 2021

I expect the actual machine code to be the same before and after, since slice's iterator doesn't do anything special in its for_each, meaning it's equivalent to this for loop.

pyo3-macros-backend/src/pyclass.rs Outdated Show resolved Hide resolved
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the slice approach looks good to me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants