Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-44895: Introduce PYTHONDUMPREFSFILE variable for refcount dumping #27767

Merged
merged 9 commits into from
Aug 17, 2021

Conversation

corona10
Copy link
Member

@corona10 corona10 commented Aug 14, 2021

@corona10 corona10 changed the title bpo-44895: Introduce PYTHONDUMPFILE variable for refcount dumping [WIP] bpo-44895: Introduce PYTHONDUMPFILE variable for refcount dumping Aug 14, 2021
@corona10 corona10 changed the title [WIP] bpo-44895: Introduce PYTHONDUMPFILE variable for refcount dumping bpo-44895: Introduce PYTHONDUMPFILE variable for refcount dumping Aug 14, 2021
@corona10 corona10 requested a review from vstinner August 14, 2021 11:15
@corona10 corona10 requested a review from pablogsal August 14, 2021 11:38
@corona10
Copy link
Member Author

corona10 commented Aug 14, 2021

@vstinner @pablogsal
if you have better ideas about variable naming, please let me know.
Another option might be providing option as PYTHONDUMPREFS=2.

@pablogsal
Copy link
Member

Hummm, I would prefer not to have more environment variables. In this case is very easy to just dump the contents to a file via shell redirection ( 2> my_file). Maybe @vstinner thinks otherwise :)

@corona10
Copy link
Member Author

corona10 commented Aug 14, 2021

shell redirection ( 2> my_file)

Hmm, I feel inconvenient while investigating this issue lol (It can be relative to each person. It can be only me feeling ;))
I got the idea from the JVM heap dump extracting feature and I thought that it can be convenient to people who want to extract refcount information likewise JVM people doing.

Out of curiosity, how do you extract the dump when you have to execute the python multiple times with repeat command.
For example


 repeat 10 ./python.exe -m test -R 3:3 test_exceptions -m test_no_hang_on_context_chain_cycle2 -m test_recursion_normalizing_infinite_exception -m test_recursion_in_except_handler -m test_recursion_normalizing_with_no_memory

Sorry I am not shell expert :(

@corona10
Copy link
Member Author

corona10 commented Aug 14, 2021

And if we feel the adding variable is a burden how about separating PYTHONDUMPREFS option?

PYTHONDUMPREFS=1 -> stderr
PYTHONDUMPREFS=2 -> local file dump

But if we feel a burden about codebases, it's another discussion :)

if (dump_file) {
char dump_file_name[255];
time_t now = time(NULL);
sprintf(dump_file_name, "cpython-tracerefs-%ld.dump", now);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to use the pid rather than the timestamp. An alternative is that the environment variable contains the filename, so the user handles filename conflict. The problem is that sometimes, the correct directory is removed once the process completes. It is the case when a regrtest worker process completes.

PYTHONDUMPFILE=/path/to/dump.txt

Copy link
Member Author

@corona10 corona10 Aug 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pid looks better, I would like to use PYTHONDUMPDIR instead of PYTHONDUMPFILE.
So the file will be created as PYTHONDUMPDIR/cpython-tracerefs-<:pid>.txt

char dump_file_name[255];
time_t now = time(NULL);
sprintf(dump_file_name, "cpython-tracerefs-%ld.dump", now);
fp = fopen(dump_file_name,"w");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how dump_refs is set to non-zero.

I would prefer to call _Py_PrintReferences(fp); fclose(fp); here. So both options are not exclusive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we have to call _Py_PrintReferenceAddresses(fp) also.
And python_dump_dir is optional.

@pablogsal
Copy link
Member

Out of curiosity, how do you extract the dump when you have to execute the python multiple times with repeat command.

I use a shell for loop:

for i in seq 10; do ./python.exe -m test -R 3:3 test_exceptions -m test_no_hang_on_context_chain_cycle2 2>./dump_$i; done

@corona10 corona10 changed the title bpo-44895: Introduce PYTHONDUMPFILE variable for refcount dumping bpo-44895: Introduce PYTHONDUMPDIR variable for refcount dumping Aug 14, 2021
@pablogsal
Copy link
Member

And if we feel the adding variable is a burden how about separating PYTHONDUMPREFS option?

How do you specify the name of the file then?

I am not fully opposed to this, but I see the dumping to file option much more constrained, specially given that this is a very niche option that is not used a lot. An example is precisely what you pointed out. What is an easy way to dump to different files in a repeat or loop run? You need to modify the environment variable every time, which feels a bit odd to me.

On the other hand, if others find this useful, then is fine to me.

@corona10
Copy link
Member Author

corona10 commented Aug 14, 2021

How do you specify the name of the file then?

My first implementation was creating a single file based on timestamp, which now is based on PID and I intended to create a file on where the interpreter run. This was what I intended when I suggested separating PYTHONDUMPREFS.
Since Victor suggests providing a directory option, now the file will be created on the designated directory.

IMHO, I don't prefer to designate a specific file name because the user feels inconvenient to set the name.
So I prefer generating names automatically which can distinguish the purpose.

@corona10 corona10 requested a review from vstinner August 14, 2021 16:42
@vstinner
Copy link
Member

What is an easy way to dump to different files in a repeat or loop run? You need to modify the environment variable every time, which feels a bit odd to me.

I don't see what's wrong with setting an env var to invoke a command. You can reuse shell variable, like the loop counter. You cannot do that if the filename is generated.

for i in seq 10; do PYTHONDUMPREFS=loop-$i.txt ./python.exe -m test -R 3:3 test_exceptions -m test_no_hang_on_context_chain_cycle2; done

If the filename is generated, how do you use it in your loop? Example:

for i in seq 10; do
  output=/tmp/loop-$i.txt
  PYTHONDUMPREFS=$output ./python.exe -m test -R 3:3 test_exceptions -m test_no_hang_on_context_chain_cycle2
  grep "leak" $output && break
done

@corona10
Copy link
Member Author

corona10 commented Aug 16, 2021

I don't see what's wrong with setting an env var to invoke a command. You can reuse shell variable, like the loop counter. You cannot do that if the filename is generated.

Hmm, I try to reduce bash level manipulation.

If the filename is generated, how do you use it in your loop? Example:

Yeah, in that case, we can not grep the file name. but we may be able to print filename through stderr instead in that case we can grep it :)


Now dumps the file .... $(PYTHONDUMPDIR)/cpython-tracerefs-<pid>.txt

@corona10
Copy link
Member Author

corona10 commented Aug 16, 2021

My intention is like this

export PYTHONDUMPREFS=2 # if we do not use PYTHONDUMPDIR variable.
for i in seq 10; do
  ./python.exe -m test -R 3:3 test_exceptions -m test_no_hang_on_context_chain_cycle2
done

@corona10
Copy link
Member Author

corona10 commented Aug 16, 2021

bash

#!/bin/bash
export PYTHONDUMPREFS=1 # if we do not use PYTHONDUMPDIR variable.
export PYTHONDUMPDIR="./log"
for i in 1 2 3 4 5 6 7 8 9 10
  do
    ./python.exe -m test -R 3:3 test_exceptions -m test_no_hang_on_context_chain_cycle2
  done

output

0:00:00 load avg: 2.56 Run tests sequentially
0:00:00 load avg: 2.56 [1/1] test_exceptions
beginning 6 repetitions
123456
......

== Tests result: SUCCESS ==

1 test OK.

Total duration: 419 ms
Tests result: SUCCESS
Now dumps the file: ./log/python-tracerefs-37503.txt
0:00:00 load avg: 2.56 Run tests sequentially
0:00:00 load avg: 2.56 [1/1] test_exceptions
beginning 6 repetitions
123456
......

== Tests result: SUCCESS ==

1 test OK.

Total duration: 342 ms
Tests result: SUCCESS
Now dumps the file: ./log/python-tracerefs-37523.txt
0:00:00 load avg: 2.56 Run tests sequentially
0:00:00 load avg: 2.56 [1/1] test_exceptions
beginning 6 repetitions
123456
......

== Tests result: SUCCESS ==

1 test OK.

Total duration: 342 ms
Tests result: SUCCESS
Now dumps the file: ./log/python-tracerefs-37529.txt
0:00:00 load avg: 2.56 Run tests sequentially
0:00:00 load avg: 2.56 [1/1] test_exceptions
beginning 6 repetitions
123456
......

== Tests result: SUCCESS ==

1 test OK.

Total duration: 345 ms
Tests result: SUCCESS
Now dumps the file: ./log/python-tracerefs-37530.txt
0:00:00 load avg: 2.56 Run tests sequentially
0:00:00 load avg: 2.56 [1/1] test_exceptions
beginning 6 repetitions
123456
......

== Tests result: SUCCESS ==

1 test OK.

Total duration: 365 ms
Tests result: SUCCESS
Now dumps the file: ./log/python-tracerefs-37531.txt
0:00:00 load avg: 2.56 Run tests sequentially
0:00:00 load avg: 2.56 [1/1] test_exceptions
beginning 6 repetitions
123456
......

== Tests result: SUCCESS ==

1 test OK.

Total duration: 345 ms
Tests result: SUCCESS
Now dumps the file: ./log/python-tracerefs-37533.txt
0:00:00 load avg: 2.51 Run tests sequentially
0:00:00 load avg: 2.51 [1/1] test_exceptions
beginning 6 repetitions
123456
......

== Tests result: SUCCESS ==

1 test OK.

Total duration: 381 ms
Tests result: SUCCESS
Now dumps the file: ./log/python-tracerefs-37534.txt
0:00:00 load avg: 2.51 Run tests sequentially
0:00:00 load avg: 2.51 [1/1] test_exceptions
beginning 6 repetitions
123456
......

== Tests result: SUCCESS ==

1 test OK.

Total duration: 403 ms
Tests result: SUCCESS
Now dumps the file: ./log/python-tracerefs-37535.txt
0:00:00 load avg: 2.51 Run tests sequentially
0:00:00 load avg: 2.51 [1/1] test_exceptions
beginning 6 repetitions
123456
......

== Tests result: SUCCESS ==

1 test OK.

Total duration: 371 ms
Tests result: SUCCESS
Now dumps the file: ./log/python-tracerefs-37545.txt
0:00:00 load avg: 2.51 Run tests sequentially
0:00:00 load avg: 2.51 [1/1] test_exceptions
beginning 6 repetitions
123456
......

== Tests result: SUCCESS ==

1 test OK.

Total duration: 367 ms
Tests result: SUCCESS
Now dumps the file: ./log/python-tracerefs-37546.txt

@vstinner
Copy link
Member

I don't know any program which gives the choice where a generated filename is created. Either the program requires a filename, or it generates a filename.

Example: valgrind --log-file=<filename> expects a filename. By default, output is written into stdout (or stderr, I don't recall).

@vstinner
Copy link
Member

Example: strace -o FILENAME.

Hum, I don't recall which programs generate filenames for the output. It is not common on Unix. Maybe there is a reason? ;-)

@corona10
Copy link
Member Author

corona10 commented Aug 16, 2021

It is not common on Unix.

Hmm okay, let's change it to use designated file name :)

@corona10 corona10 changed the title bpo-44895: Introduce PYTHONDUMPDIR variable for refcount dumping bpo-44895: Introduce PYTHONDUMPFILE variable for refcount dumping Aug 16, 2021
Doc/using/cmdline.rst Outdated Show resolved Hide resolved
Include/cpython/initconfig.h Show resolved Hide resolved
Python/pylifecycle.c Outdated Show resolved Hide resolved
@corona10 corona10 changed the title bpo-44895: Introduce PYTHONDUMPFILE variable for refcount dumping bpo-44895: Introduce PYTHONDUMPREFSFILE variable for refcount dumping Aug 17, 2021
Doc/using/cmdline.rst Outdated Show resolved Hide resolved
Python/pylifecycle.c Outdated Show resolved Hide resolved
Python/pylifecycle.c Outdated Show resolved Hide resolved
Python/pylifecycle.c Outdated Show resolved Hide resolved
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I just left a minor suggestion for the doc.

Doc/using/cmdline.rst Outdated Show resolved Hide resolved
Co-authored-by: Victor Stinner <[email protected]>
@corona10 corona10 merged commit c2c857b into python:main Aug 17, 2021
@corona10 corona10 deleted the bpo-44895-dump branch August 17, 2021 15:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants