Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-97982: Reuse PyUnicode_Count in unicode_count #98025

Merged
merged 10 commits into from
Oct 12, 2022

Conversation

sobolevn
Copy link
Member

@sobolevn sobolevn commented Oct 7, 2022

This now makes unicode_count as optimized as PyUnicode_Count. There was a missing optimization, see c3cec78

However, I've decided not to touch anylib_count, because of (const Py_UCS1*)buf1 (and similar) casts and function ordering. We cannot directly call anylib_count from PyUnicode_Count. I don't think it is worth creating forward declarations / moving things just for that.

Objects/unicodeobject.c Outdated Show resolved Hide resolved
@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@vstinner
Copy link
Member

vstinner commented Oct 7, 2022

cc @serhiy-storchaka

@sobolevn
Copy link
Member Author

sobolevn commented Oct 7, 2022

To make bot happy:

I have made the requested changes; please review again

@bedevere-bot
Copy link

Thanks for making the requested changes!

@vstinner: please review the changes made to this pull request.

@sobolevn
Copy link
Member Author

sobolevn commented Oct 7, 2022

Retriggering, this failure is listed in #97983

@sobolevn sobolevn closed this Oct 7, 2022
@sobolevn sobolevn reopened this Oct 7, 2022
Objects/unicodeobject.c Outdated Show resolved Hide resolved
Objects/unicodeobject.c Outdated Show resolved Hide resolved
Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general LGTM. Please test in microbenchmarks that it does not cause regressions.

@sobolevn
Copy link
Member Author

sobolevn commented Oct 8, 2022

Benches

Old Version (main)

» pyperf timeit '"abcabd".count("ab")'
.....................
Mean +- std dev: 495 ns +- 7 ns

» pyperf timeit '":) тест 漢字".count("т")'
.....................
Mean +- std dev: 453 ns +- 13 ns

New version (issue-97982)

New optimization:

» pyperf timeit '"abcabd".count("ab")'
.....................
Mean +- std dev: 459 ns +- 9 ns

Old path:

» pyperf timeit '":) тест 漢字".count("т")'
.....................
Mean +- std dev: 457 ns +- 16 ns

@vstinner
Copy link
Member

Merging two functions which are basically the same is a good thing.

This now makes unicode_count as optimized as PyUnicode_Count. There was a missing optimization, see c3cec78

But I'm not convinced that asciilib_count() is different than ucs1lib_count(). It seems like the code is exactly the same, it doesn't see to use the property that the string is ASCII only.

» pyperf timeit '"abcabd".count("ab")'

  • before: Mean +- std dev: 495 ns +- 7 ns
  • after: Mean +- std dev: 459 ns +- 9 ns

I'm not convinced. You didn't explain how you built Python. Did you enable LTO and/or PGO? Which is your OS? Which compiler?

@vstinner
Copy link
Member

But I'm not convinced that asciilib_count() is different than ucs1lib_count().

I ran a benchmark: there is no difference. I would suggest to remove asciilib_count(). But merging the PyUnicode_Count() and unicode_count() is still interesting.

Micro-benchmark:

Mean +- std dev: [ref] 64.1 ns +- 0.1 ns -> [change] 62.3 ns +- 0.2 ns: 1.03x faster

Command:

$ ./configure --with-lto && make clean && make 
$ ./python -m venv env && ./env/bin/python -m pip install pyperf
$ ./env/bin/python -m pyperf timeit -o ../change.json -v '"abcabd".count("ab")'

Python built with LTO (without PGO), on Linux without CPU pinning. The change:

diff --git a/Objects/unicodeobject.c b/Objects/unicodeobject.c
index bd169ed714..fd786fa3db 100644
--- a/Objects/unicodeobject.c
+++ b/Objects/unicodeobject.c
@@ -10885,10 +10885,16 @@ unicode_count(PyObject *self, PyObject *args)
     }
     switch (kind1) {
     case PyUnicode_1BYTE_KIND:
-        iresult = ucs1lib_count(
-            ((const Py_UCS1*)buf1) + start, end - start,
-            buf2, len2, PY_SSIZE_T_MAX
-            );
+        if (PyUnicode_IS_ASCII(self) && PyUnicode_IS_ASCII(substring))
+            iresult = asciilib_count(
+                ((const Py_UCS1*)buf1) + start, end - start,
+                buf2, len2, PY_SSIZE_T_MAX
+                );
+        else
+            iresult = ucs1lib_count(
+                ((const Py_UCS1*)buf1) + start, end - start,
+                buf2, len2, PY_SSIZE_T_MAX
+                );
         break;
     case PyUnicode_2BYTE_KIND:
         iresult = ucs2lib_count(

@vstinner
Copy link
Member

I ran a benchmark: there is no difference. I would suggest to remove asciilib_count().

I wrote PR #98164 for that.

@serhiy-storchaka
Copy link
Member

If asciilib_count and ucs1lib_count are the same, why there is a difference in #98025 (comment)?

@vstinner
Copy link
Member

If asciilib_count and ucs1lib_count are the same, why there is a difference in #98025 (comment)?

These timings are about 500 ns, whereas I got timings about 60 ns. Maybe the benchmark with 500 ns was run on Python built in debug mode? As I wrote, I would like to know more about this benchmark: OS name and version, compiler name and options, how was Python built (debug mode or not? etc.), ...

@vstinner
Copy link
Member

I remove asciilib_count() function in commit df3a6d9.

@sobolevn
Copy link
Member Author

Sorry for the long reply. Here's my initial setup:

  • MacOS X 10.14,
  • Python was built with ./configure --with-pydebug && make -j
  • CC.version: Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/4.2.1

Now, let's redo that once again with the exact same steps as @vstinner did.

New pyinfo CC.version: Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/4.2.1 _decimal.__libmpdec_version__: 2.5.1 build.NDEBUG: ignore assertions (macro defined) build.Py_DEBUG: No (sys.gettotalrefcount() missing) build.Py_TRACE_REFS: No (sys.getobjects() missing) build.WITH_DOC_STRINGS: Yes build.WITH_DTRACE: No build.WITH_FREELISTS: Yes build.WITH_PYMALLOC: Yes build.WITH_VALGRIND: No builtins.float.double_format: IEEE, little-endian builtins.float.float_format: IEEE, little-endian config[_config_init]: 2 config[_init_main]: 1 config[_install_importlib]: 1 config[_is_python_build]: 1 config[_isolated_interpreter]: 0 config[argv]: ['-m'] config[base_exec_prefix]: '/usr/local' config[base_executable]: '/Users/sobolev/Desktop/cpython/python.exe' config[base_prefix]: '/usr/local' config[buffered_stdio]: 1 config[bytes_warning]: 0 config[check_hash_pycs_mode]: 'default' config[code_debug_ranges]: 1 config[configure_c_stdio]: 1 config[dev_mode]: 0 config[dump_refs]: 0 config[exec_prefix]: '/usr/local' config[executable]: '/Users/sobolev/Desktop/cpython/python.exe' config[faulthandler]: 0 config[filesystem_encoding]: 'utf-8' config[filesystem_errors]: 'surrogateescape' config[hash_seed]: 0 config[home]: None config[import_time]: 0 config[inspect]: 0 config[install_signal_handlers]: 1 config[int_max_str_digits]: 4300 config[interactive]: 0 config[isolated]: 0 config[malloc_stats]: 0 config[module_search_paths]: ['/usr/local/lib/python312.zip', '/Users/sobolev/Desktop/cpython/Lib', '/Users/sobolev/Desktop/cpython/build/lib.macosx-10.14-x86_64-3.12'] config[module_search_paths_set]: 1 config[optimization_level]: 0 config[orig_argv]: ['./python.exe', '-m', 'test.pythoninfo'] config[parse_argv]: 2 config[parser_debug]: 0 config[pathconfig_warnings]: 1 config[perf_profiling]: 0 config[platlibdir]: 'lib' config[prefix]: '/usr/local' config[program_name]: './python.exe' config[pycache_prefix]: None config[pythonpath_env]: None config[quiet]: 0 config[run_command]: None config[run_filename]: None config[run_module]: 'test.pythoninfo' config[safe_path]: 0 config[show_ref_count]: 0 config[site_import]: 1 config[skip_source_first_line]: 0 config[stdio_encoding]: 'utf-8' config[stdio_errors]: 'strict' config[stdlib_dir]: '/Users/sobolev/Desktop/cpython/Lib' config[tracemalloc]: 0 config[use_environment]: 1 config[use_frozen_modules]: 1 config[use_hash_seed]: 0 config[user_site_directory]: 1 config[verbose]: 0 config[warn_default_encoding]: 0 config[warnoptions]: [] config[write_bytecode]: 1 config[xoptions]: [] curses.ncurses_version: curses.ncurses_version(major=5, minor=7, patch=20081102) datetime.datetime.now: 2022-10-11 22:10:09.223993 expat.EXPAT_VERSION: expat_2.4.9 fips.openssl_fips_mode: 0 gdbm.GDBM_VERSION: 1.22.0 global_config[Py_BytesWarningFlag]: 0 global_config[Py_DebugFlag]: 0 global_config[Py_DontWriteBytecodeFlag]: 0 global_config[Py_FileSystemDefaultEncodeErrors]: 'surrogateescape' global_config[Py_FileSystemDefaultEncoding]: 'utf-8' global_config[Py_FrozenFlag]: 0 global_config[Py_HasFileSystemDefaultEncoding]: 0 global_config[Py_HashRandomizationFlag]: 1 global_config[Py_IgnoreEnvironmentFlag]: 0 global_config[Py_InspectFlag]: 0 global_config[Py_InteractiveFlag]: 0 global_config[Py_IsolatedFlag]: 0 global_config[Py_NoSiteFlag]: 0 global_config[Py_NoUserSiteDirectory]: 0 global_config[Py_OptimizeFlag]: 0 global_config[Py_QuietFlag]: 0 global_config[Py_UTF8Mode]: 0 global_config[Py_UnbufferedStdioFlag]: 0 global_config[Py_VerboseFlag]: 0 global_config[_Py_HasFileSystemDefaultEncodeErrors]: 0 locale.getencoding: UTF-8 os.cpu_count: 4 os.environ[CFLAGS]: -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include os.environ[CPPFLAGS]: -I/usr/local/opt/openssl/include os.environ[HOME]: /Users/sobolev os.environ[LANG]: ru_RU.UTF-8 os.environ[LC_ALL]: en_us.UTF-8 os.environ[LDFLAGS]: -L/usr/local/opt/openssl/lib os.environ[MACOSX_DEPLOYMENT_TARGET]: 10.14 os.environ[PATH]: /Users/sobolev/.cargo/bin:/Users/sobolev/.rbenv/shims:/Users/sobolev/.pyenv/shims:/Users/sobolev/.nvm/versions/node/v10.16.0/bin:/usr/local/share/npm/bin:/usr/local/opt/[email protected]/bin:/Users/sobolev/.poetry/bin:/Users/sobolev/.local/bin:/usr/local/opt/gpg-agent/bin:/usr/local/Cellar/zplug/2.4.2/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/opt/fzf/bin os.environ[PYTHONIOENCODING]: UTF-8 os.environ[SHELL]: /usr/local/bin/zsh os.environ[TERM]: xterm-256color os.environ[TMPDIR]: /var/folders/qn/2gssw9hx48g81chw0398hlrr0000gn/T/ os.getcwd: /Users/sobolev/Desktop/cpython os.getegid: 20 os.geteuid: 501 os.getgid: 20 os.getgrouplist: 20, 12, 61, 79, 80, 81, 98, 701, 33, 100, 204, 250, 395, 398, 399 os.getgroups: 20, 12, 61, 79, 80, 81, 98, 701, 33, 100, 204, 250, 395, 398, 399 os.getloadavg: (2.78564453125, 2.2626953125, 2.16162109375) os.getuid: 501 os.login: sobolev os.name: posix os.supports_bytes_environ: True os.supports_effective_ids: ['access'] os.supports_fd: ['chdir', 'chmod', 'chown', 'listdir', 'pathconf', 'scandir', 'stat', 'statvfs', 'truncate', 'utime'] os.supports_follow_symlinks: ['access', 'chflags', 'chmod', 'chown', 'link', 'stat', 'utime'] os.umask: 0o022 os.uname: posix.uname_result(sysname='Darwin', nodename='Mackbook-Nikita', release='18.7.0', version='Darwin Kernel Version 18.7.0: Tue Aug 20 16:57:14 PDT 2019; root:xnu-4903.271.2~2/RELEASE_X86_64', machine='x86_64') platform.architecture: 64bit platform.platform: macOS-10.14.6-x86_64-i386-64bit platform.python_implementation: CPython pre_config[_config_init]: 2 pre_config[allocator]: 0 pre_config[coerce_c_locale]: 0 pre_config[coerce_c_locale_warn]: 0 pre_config[configure_locale]: 1 pre_config[dev_mode]: 0 pre_config[isolated]: 0 pre_config[parse_argv]: 1 pre_config[use_environment]: 1 pre_config[utf8_mode]: 0 pwd.getpwuid(501): pwd.struct_passwd(pw_name='sobolev', pw_passwd='********', pw_uid=501, pw_gid=20, pw_gecos='Nikita Sobolev', pw_dir='/Users/sobolev', pw_shell='/usr/local/bin/zsh') pymem.allocator: pymalloc readline._READLINE_LIBRARY_VERSION: EditLine wrapper readline._READLINE_RUNTIME_VERSION: 0x402 readline._READLINE_VERSION: 0x402 resource.RLIMIT_AS: (9223372036854775807, 9223372036854775807) resource.RLIMIT_CORE: (0, 9223372036854775807) resource.RLIMIT_CPU: (9223372036854775807, 9223372036854775807) resource.RLIMIT_DATA: (9223372036854775807, 9223372036854775807) resource.RLIMIT_FSIZE: (9223372036854775807, 9223372036854775807) resource.RLIMIT_MEMLOCK: (9223372036854775807, 9223372036854775807) resource.RLIMIT_NOFILE: (10240, 9223372036854775807) resource.RLIMIT_NPROC: (709, 1064) resource.RLIMIT_RSS: (9223372036854775807, 9223372036854775807) resource.RLIMIT_STACK: (67104768, 67104768) resource.pagesize: 4096 socket.hostname: Mackbook-Nikita sqlite3.sqlite_version: 3.24.0 ssl.HAS_SNI: True ssl.OPENSSL_VERSION: OpenSSL 1.1.1m 14 Dec 2021 ssl.OPENSSL_VERSION_INFO: (1, 1, 1, 13, 15) ssl.OP_ALL: 0x80000054 ssl.OP_NO_TLSv1_1: 0x10000000 ssl.SSLContext.maximum_version: -1 ssl.SSLContext.minimum_version: 771 ssl.SSLContext.options: 2186412116 ssl.SSLContext.protocol: 16 ssl.SSLContext.verify_mode: 2 ssl.default_https_context.maximum_version: -1 ssl.default_https_context.minimum_version: 771 ssl.default_https_context.options: 2186412116 ssl.default_https_context.protocol: 16 ssl.default_https_context.verify_mode: 2 ssl.stdlib_context.maximum_version: -1 ssl.stdlib_context.minimum_version: 771 ssl.stdlib_context.options: 2186412116 ssl.stdlib_context.protocol: 16 ssl.stdlib_context.verify_mode: 0 subprocess._USE_POSIX_SPAWN: True sys.api_version: 1013 sys.builtin_module_names: ('_abc', '_ast', '_codecs', '_collections', '_functools', '_imp', '_io', '_locale', '_operator', '_signal', '_sre', '_stat', '_string', '_symtable', '_thread', '_tokenize', '_tracemalloc', '_warnings', '_weakref', 'atexit', 'builtins', 'errno', 'faulthandler', 'gc', 'itertools', 'marshal', 'posix', 'pwd', 'sys', 'time') sys.byteorder: little sys.dont_write_bytecode: False sys.executable: /Users/sobolev/Desktop/cpython/python.exe sys.filesystem_encoding: utf-8/surrogateescape sys.flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0, hash_randomization=1, isolated=0, dev_mode=False, utf8_mode=0, warn_default_encoding=0, safe_path=False, int_max_str_digits=4300) sys.float_info: sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1) sys.float_repr_style: short sys.hash_info: sys.hash_info(width=64, modulus=2305843009213693951, inf=314159, nan=0, imag=1000003, algorithm='siphash13', hash_bits=64, seed_bits=128, cutoff=0) sys.hexversion: 51118240 sys.implementation: namespace(name='cpython', cache_tag='cpython-312', version=sys.version_info(major=3, minor=12, micro=0, releaselevel='alpha', serial=0), hexversion=51118240, _multiarch='darwin') sys.int_info: sys.int_info(bits_per_digit=30, sizeof_digit=4, default_max_str_digits=4300, str_digits_check_threshold=640) sys.maxsize: 9223372036854775807 sys.maxunicode: 1114111 sys.path: ['/Users/sobolev/Desktop/cpython', '/usr/local/lib/python312.zip', '/Users/sobolev/Desktop/cpython/Lib', '/Users/sobolev/Desktop/cpython/build/lib.macosx-10.14-x86_64-3.12'] sys.platform: darwin sys.platlibdir: lib sys.prefix: /usr/local sys.stderr.encoding: utf-8/backslashreplace sys.stdin.encoding: utf-8/strict sys.stdout.encoding: utf-8/strict sys.thread_info: sys.thread_info(name='pthread', lock='mutex+cond', version=None) sys.version: 3.12.0a0 (heads/main:5ecf961640, Oct 11 2022, 21:43:44) [Clang 11.0.0 (clang-1100.0.33.16)] sys.version_info: sys.version_info(major=3, minor=12, micro=0, releaselevel='alpha', serial=0) sysconfig[CC]: gcc sysconfig[CFLAGS]: -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include sysconfig[CONFIG_ARGS]: '--with-lto' 'PKG_CONFIG_PATH=/usr/local/opt/openssl/lib/pkgconfig' 'CFLAGS=-I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include' 'LDFLAGS=-L/usr/local/opt/openssl/lib' 'CPPFLAGS=-I/usr/local/opt/openssl/include' sysconfig[HOST_GNU_TYPE]: x86_64-apple-darwin18.7.0 sysconfig[MACHDEP]: darwin sysconfig[MULTIARCH]: darwin sysconfig[OPT]: -DNDEBUG -g -fwrapv -O3 -Wall sysconfig[PY_CFLAGS]: -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include sysconfig[PY_CFLAGS_NODIST]: -flto -std=c11 -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -Wstrict-prototypes -Werror=implicit-function-declaration -fvisibility=hidden -I./Include/internal sysconfig[PY_CORE_LDFLAGS]: -L/usr/local/opt/openssl/lib -L/usr/local/opt/openssl/lib -flto -Wl,-export_dynamic -Wl,-object_path_lto,"$@".lto -g sysconfig[PY_LDFLAGS]: -L/usr/local/opt/openssl/lib -L/usr/local/opt/openssl/lib sysconfig[PY_LDFLAGS_NODIST]: -flto -Wl,-export_dynamic -Wl,-object_path_lto,"$@".lto -g sysconfig[PY_STDMODULE_CFLAGS]: -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include -flto -std=c11 -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -Wstrict-prototypes -Werror=implicit-function-declaration -fvisibility=hidden -I./Include/internal -I. -I./Include -I/usr/local/opt/openssl/include -I/usr/local/opt/openssl/include sysconfig[Py_DEBUG]: 0 sysconfig[Py_ENABLE_SHARED]: 0 sysconfig[SHELL]: /bin/sh sysconfig[SOABI]: cpython-312-darwin sysconfig[prefix]: /usr/local test_socket.HAVE_SOCKET_ALG: False test_socket.HAVE_SOCKET_BLUETOOTH: False test_socket.HAVE_SOCKET_CAN: False test_socket.HAVE_SOCKET_CAN_ISOTP: False test_socket.HAVE_SOCKET_CAN_J1939: False test_socket.HAVE_SOCKET_HYPERV: False test_socket.HAVE_SOCKET_QIPCRTR: False test_socket.HAVE_SOCKET_RDS: False test_socket.HAVE_SOCKET_UDPLITE: False test_socket.HAVE_SOCKET_VSOCK: False test_support._is_gui_available: False test_support.check_sanitizer(address=True): False test_support.check_sanitizer(memory=True): False test_support.check_sanitizer(ub=True): False test_support.python_is_optimized: True time.altzone: -10800 time.daylight: 0 time.get_clock_info(monotonic): namespace(implementation='mach_absolute_time()', monotonic=True, adjustable=False, resolution=1e-09) time.get_clock_info(perf_counter): namespace(implementation='mach_absolute_time()', monotonic=True, adjustable=False, resolution=1e-09) time.get_clock_info(process_time): namespace(implementation='clock_gettime(CLOCK_PROCESS_CPUTIME_ID)', monotonic=True, adjustable=False, resolution=1.0000000000000002e-06) time.get_clock_info(thread_time): namespace(implementation='clock_gettime(CLOCK_THREAD_CPUTIME_ID)', monotonic=True, adjustable=False, resolution=1e-09) time.get_clock_info(time): namespace(implementation='clock_gettime(CLOCK_REALTIME)', monotonic=False, adjustable=True, resolution=1.0000000000000002e-06) time.time: 1665515409.3162298 time.timezone: -10800 time.tzname: ('MSK', 'MSK') zlib.ZLIB_RUNTIME_VERSION: 1.2.11 zlib.ZLIB_VERSION: 1.2.11

commit before df3a6d9

The actual commit I am testing is: 7ec2e27

Steps:

  1. ./configure --with-lto && make clean && make
  2. ./python.exe -m venv env && ./env/bin/python.exe -m pip install pyperf
  3. ./env/bin/python.exe -m pyperf timeit -v '"abcabd".count("ab")'

Output:

Mean +- std dev: 156 ns +- 6 ns

Changed

Change is taken from #98025 (comment)

diff --git a/Objects/unicodeobject.c b/Objects/unicodeobject.c
index bd169ed714..fd786fa3db 100644
--- a/Objects/unicodeobject.c
+++ b/Objects/unicodeobject.c
@@ -10885,10 +10885,16 @@ unicode_count(PyObject *self, PyObject *args)
     }
     switch (kind1) {
     case PyUnicode_1BYTE_KIND:
-        iresult = ucs1lib_count(
-            ((const Py_UCS1*)buf1) + start, end - start,
-            buf2, len2, PY_SSIZE_T_MAX
-            );
+        if (PyUnicode_IS_ASCII(self) && PyUnicode_IS_ASCII(substring))
+            iresult = asciilib_count(
+                ((const Py_UCS1*)buf1) + start, end - start,
+                buf2, len2, PY_SSIZE_T_MAX
+                );
+        else
+            iresult = ucs1lib_count(
+                ((const Py_UCS1*)buf1) + start, end - start,
+                buf2, len2, PY_SSIZE_T_MAX
+                );
         break;
     case PyUnicode_2BYTE_KIND:
         iresult = ucs2lib_count(

Exactly the same commands:

Mean +- std dev: 155 ns +- 8 ns

@vstinner
Copy link
Member

On your new benchmark, there is now basically the same performance:

  • Before: Mean +- std dev: 156 ns +- 6 ns
  • After: Mean +- std dev: 155 ns +- 8 ns

For me, the difference of 1 ns is just pure noise. It confirms that the change has no impact on performance. That's why I removed asciilib_count().

But again, please rebase your PR on the main branch and update it to move code which is in a common in unicode_count_impl() and try to use anylib_count(). Also, your tests are valuable! Just omit any mention on performance impact ;-)

@sobolevn
Copy link
Member Author

Will do, thank you :)

Py_UNREACHABLE();
}
result = anylib_count(kind1,
str, buf1 + start, end - start,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please double check that we don't need to use any casts here? I don't quite understand what (const Py_UCS1*)buf1) was used for.

Copy link
Member Author

@sobolevn sobolevn Oct 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is needed: D:\a\cpython\cpython\Objects\unicodeobject.c(9016,44): error C2036: 'const void *': unknown size

It works on MacOS without it though.

Copy link
Member Author

@sobolevn sobolevn Oct 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vstinner which solution is better:

  • Using anylib_count with explicit casts?
const void* new_buf = NULL;

    switch (kind1) {
    case PyUnicode_1BYTE_KIND:
        new_buf = (const Py_UCS1*)buf1 + start;
        break;
    case PyUnicode_2BYTE_KIND:
        new_buf = (const Py_UCS2*)buf1 + start;
        break;
    case PyUnicode_4BYTE_KIND:
        new_buf = (const Py_UCS4*)buf1 + start;
        break;
    default:
        Py_UNREACHABLE();
    }

    result = anylib_count(kind1,
                          str, new_buf, end - start,
                          substr, buf2, len2,
                          PY_SSIZE_T_MAX);
  • or just removing anylib_count call and keeping things as-is?

The first one removes the duplication, but complicates code a bit more.

The second one is the opposite.

I'm happy with both.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only use anylib_count() if it makes the code simpler. If calling it requires to add a switch, it's not worth it.

@vstinner vstinner merged commit ccab67b into python:main Oct 12, 2022
@vstinner
Copy link
Member

Merged, thank you :-)

@sobolevn
Copy link
Member Author

Thank you, I've learned quite a lot while working on this :)

@vstinner
Copy link
Member

unicodeobject.c is a giant beast who deserves some love.

carljm added a commit to carljm/cpython that referenced this pull request Oct 14, 2022
* main: (38 commits)
  pythongh-98251: Allow venv to pass along PYTHON* variables to pip and ensurepip when they do not impact path resolution (pythonGH-98259)
  Bpo-41246: IOCP Proactor avoid callback code duplication (python#21399)
  bpo-46364: Use sockets for stdin of asyncio only on AIX (python#30596)
  pythongh-98178: syslog() is not thread-safe on macOS (python#98213)
  Mark all targets in `Doc/Makefile` as `PHONY` (pythonGH-98189)
  pythongh-97982: Factorize PyUnicode_Count() and unicode_count() code (python#98025)
  pythongh-96265: Formatting changes for faq/general (python#98129)
  tutorial: remove "with single quotes" (python#98204)
  pythongh-97669: Remove Tools/scripts/startuptime.py (python#98214)
  signalmodule.c uses _PyErr_WriteUnraisableMsg() (python#98217)
  pythongh-97669: Fix test_tools reference leak (python#98216)
  pythongh-97669: Create Tools/patchcheck/ directory (python#98186)
  pythongh-65046: Link to logging cookbook from asyncio docs (python#98207)
  Formatting fixes in contextlib docs (python#98111)
  pythongh-95276: Add callable entry to the glossary (python#95738)
  pythongh-96130: Rephrase use of "typecheck" verb for clarity (python#98144)
  Fix some incorrect indentation around the main switch (python#98177)
  pythongh-98172: Fix formatting in `except*` docs (python#98173)
  pythongh-97982: Remove asciilib_count() (python#98164)
  pythongh-95756: Free and NULL-out code caches when needed (pythonGH-98181)
  ...
carljm added a commit to carljm/cpython that referenced this pull request Oct 14, 2022
* main: (37 commits)
  pythongh-98251: Allow venv to pass along PYTHON* variables to pip and ensurepip when they do not impact path resolution (pythonGH-98259)
  Bpo-41246: IOCP Proactor avoid callback code duplication (python#21399)
  bpo-46364: Use sockets for stdin of asyncio only on AIX (python#30596)
  pythongh-98178: syslog() is not thread-safe on macOS (python#98213)
  Mark all targets in `Doc/Makefile` as `PHONY` (pythonGH-98189)
  pythongh-97982: Factorize PyUnicode_Count() and unicode_count() code (python#98025)
  pythongh-96265: Formatting changes for faq/general (python#98129)
  tutorial: remove "with single quotes" (python#98204)
  pythongh-97669: Remove Tools/scripts/startuptime.py (python#98214)
  signalmodule.c uses _PyErr_WriteUnraisableMsg() (python#98217)
  pythongh-97669: Fix test_tools reference leak (python#98216)
  pythongh-97669: Create Tools/patchcheck/ directory (python#98186)
  pythongh-65046: Link to logging cookbook from asyncio docs (python#98207)
  Formatting fixes in contextlib docs (python#98111)
  pythongh-95276: Add callable entry to the glossary (python#95738)
  pythongh-96130: Rephrase use of "typecheck" verb for clarity (python#98144)
  Fix some incorrect indentation around the main switch (python#98177)
  pythongh-98172: Fix formatting in `except*` docs (python#98173)
  pythongh-97982: Remove asciilib_count() (python#98164)
  pythongh-95756: Free and NULL-out code caches when needed (pythonGH-98181)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants