Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP657 Column position of raised exceptions #1099

Closed
tonybaloney opened this issue Oct 25, 2022 · 20 comments
Closed

PEP657 Column position of raised exceptions #1099

tonybaloney opened this issue Oct 25, 2022 · 20 comments
Labels
enhancement New feature or request

Comments

@tonybaloney
Copy link

In Debugpy, exceptions raised in Python 3.11 are missing the fine-grained position:

For example, in the following code:

d = {
    "x": {
        "y": {
            "z": {
                "theta": 1
            }
        }
    }
}

result = d["x"]["y"]["z"]["beta"]

On the CLI I get the position of the KeyError offset in the line underlined using PEP657 API:

$ python exploding_kitten.py
...
  File "/Users/anthonyshaw/tmp/exploding_kitten.py", line 11, in <module>
    result = d["x"]["y"]["z"]["beta"]
             ~~~~~~~~~~~~~~~~^^^^^^^^
KeyError: 'beta'

But in VS Code it only highlights the line and not the position of the exception, making it harder to work out where the error came from.

screenshot 2022-10-25 at 13 33 54

Please can PEP657 be implemented in debugpy so that the Stopped event sent back via DAP include the position (start col, end col) of the raised exception.

@int19h int19h added the enhancement New feature or request label Oct 25, 2022
@pablogsal
Copy link

pablogsal commented Oct 25, 2022

Notice that the traceback module implements the highlighting so you can use it if you don't want to make a custom highlighter:

import traceback

d = {
    "x": {
        "y": {
            "z": {
                "theta": 1
            }
        }
    }
}

try:
    result = d["x"]["y"]["z"]["beta"]
except Exception as e:
    f = e

print("\n".join(traceback.extract_tb(f.__traceback__).format()))

This prints:

  File "example.py", line 14, in <module>
    result = d["x"]["y"]["z"]["beta"]
             ~~~~~~~~~~~~~~~~^^^^^^^^

Note: this needs to be executed from a file as source is not displayed in the REPL

@tonybaloney
Copy link
Author

screenshot 2022-11-04 at 07 56 27

screenshot 2022-11-04 at 07 56 44

Thanks for the update. Is there a way to propagate the col offset through DAP to the stop-point message? That way, the arrow inside the editor will point to the column and not highlight the whole line

@fabioz
Copy link
Collaborator

fabioz commented Nov 3, 2022

It seems that the stack trace from the DAP does indeed have a column. I'll reopen to investigate further.

@fabioz fabioz reopened this Nov 3, 2022
@fabioz
Copy link
Collaborator

fabioz commented Nov 4, 2022

Getting the actual column where the frame is stopped seems to have a number of intricacies related to it...

@pablogsal I see that there's a way to format the traceback, but following the code it seems there's a lot of internal things to map the bytecode offset to the character (going from bytes -> char, parsing using the ast to find a BinOp or Subscript and then computing anchors based on that...).

I guess I can copy all that code to get the start column for the current pos in a frame or analyze the formatted string looking for ^, but I was wondering if there was a better API to deal with that somewhere...

@pablogsal
Copy link

I am not sure I follow... why you need to copy that code when you can just execute it? If you format the trace back with the traceback module, you will get all those things computed for you.

I am clearly missing something here. Could you please explain in detail what is what you are trying to do and why you cannot leverage existing APIs?

@fabioz
Copy link
Collaborator

fabioz commented Nov 4, 2022

I'd like to get the column (because I want to show the executing column in the editor and not just in the traceback string).

@fabioz
Copy link
Collaborator

fabioz commented Nov 4, 2022

@pablogsal the image below shows the use-case (the traceback module does many additional operations to infer the final position during the traceback printing).

image

@pablogsal
Copy link

I am afraid you are going to need to explain the problem with more details because I am missing a ton of context here. What exactly are you trying to achieve?

Precisely because the traceback module does all these extra things, I am asking you why you cannot just use the APIs provided in the traceback module and just print the traceback?

@pablogsal
Copy link

If you are just trying to get the column you can just use the co_positions attribute in the code object in combination with the instruction offset from the exception or the frame.

@pablogsal
Copy link

So basically this:

https://github.com/python/cpython/blob/2844aa6a8eb1d486b5c432f0ed33a2082998f41e/Lib/traceback.py#L367

The instruction index is taken from the frame or the traceback.

Check the PEP for details but basically co_positions return you a generator of 4 tuples representing start and end line numbers and start and end column offsets for every instruction. Once you know the instruction that failed you can get the position easily from there.

Is this what you need ?

1 similar comment
@pablogsal
Copy link

So basically this:

https://github.com/python/cpython/blob/2844aa6a8eb1d486b5c432f0ed33a2082998f41e/Lib/traceback.py#L367

The instruction index is taken from the frame or the traceback.

Check the PEP for details but basically co_positions return you a generator of 4 tuples representing start and end line numbers and start and end column offsets for every instruction. Once you know the instruction that failed you can get the position easily from there.

Is this what you need ?

@fabioz
Copy link
Collaborator

fabioz commented Nov 4, 2022

Humm, ok. The debugger does 2 things:

  1. Print the traceback and show it to the user (this works well with the current APIs).
  2. Show the line and column where the code is stopped. In this case the debugger needs the actual column where it's stopped.

The debugger is using co_positions with the instruction offset from the exception of the frame to get the current columns:

a2a3328#diff-6458f9e28ef139ef8e23d8fff5936d26eebe5ee46a5b095c100f35f6ccc882a3R240

-- that is later on passed on to the traceback.FrameSummary

a2a3328#diff-096f2487391f489de3955c8d78e1506c3d655de5267ed8ec9300a321c7882b41R1472

Now, using that column in the example:

d = {
    "x": {
        "y": {
            "z": {
                "theta": 1
            }
        }
    }
}

result = d["x"]["y"]["z"]["beta"]

Actually provides a column pointing to = d (column 9) when the column wanted is ["beta"]. The endcol seems fine though.

So, after that, to get to the actual start column, the traceback.FrameSummary needs to do many additional operations when formatting (because the endcol is correct but the start col is not), so, the mapping ends up mapping that col 9 end col 43 to the following anchor:

line: result = d["x"]["y"]["z"]["beta"]
frame_summary.colno 9
frame_summary.end_colno 43
anchors _Anchors(left_end_offset=16, right_start_offset=24, primary_char='~', secondary_char='^')

i.e.: it converts from utf-8 to chars (https://github.com/python/cpython/blob/2844aa6a8eb1d486b5c432f0ed33a2082998f41e/Lib/traceback.py#L479) and checks whether it's in a BinOp or Subscript to get the actual columns (https://github.com/python/cpython/blob/2844aa6a8eb1d486b5c432f0ed33a2082998f41e/Lib/traceback.py#L590).

So, my question is whether this additional processing to compute the anchors based on the actual code is available in an easier to use API.

@pablogsal
Copy link

So, after that, to get to the actual start column, the traceback.FrameSummary needs to do many additional operations when formatting (because the endcol is correct but the start col is not), so, the mapping ends up mapping that col 9 end col 43 to the following anchor:

But the start column is correct because that instruction is operating over the result of the previous getitems so the span of the instruction starts at "d" and ends at the corresponding dictionary access.

This is the same thing when you do x + b + c. If the last addition fails then the span will be the whole line because the operation is adding (a+b)+c and as the left operand is (a+b) it needs to highlight the whole line.

@pablogsal
Copy link

pablogsal commented Nov 4, 2022

So, my question is whether this additional processing to compute the anchors based on the actual code is available in an easier to use API.

No, and is unlikely that there will be. The anchors are just a way to offer better indications but they are not part of the public API. Indeed, the way we display these anchors and how many of them can and will change between versions of the interpreter. The only public apis are the one's provided by the traceback module and the ones described in the PEP.

Obviously you are welcomed to re implement that based on the co_positions output but copy paste that code is something I would not recommend because the internals can and will change.

@fabioz
Copy link
Collaborator

fabioz commented Nov 4, 2022

But the start column is correct because that instruction is operating over the result of the previous getitems so the span of the instruction starts at "d" and ends at the corresponding dictionary access.

I get that it's semantically correct, but it's the same for all the accesses at that line, so, it's not that helpful (showing the range could be better, but the debug adapter protocol can only show one column not really a range and I think showing only the end column may be a bit strange for users).

Edit: actually, it seems I didn't read the DAP properly. It can show a range (but VSCode doesn't really act on it, it just shows the start col, so, not that helpful right now).

No, and is unlikely that there will be. The anchors are just a way to offer better indications but they are not part of the public API. Indeed, the way we display these anchors and how many of them can and will change between versions of the interpreter. The only public apis are the one's provided by the traceback module and the ones described in the PEP.

Roger that ;)

Obviously you are welcomed to re implement that based on the co_positions output but copy paste that code is something I would not recommend because the internals can and will change.

Maybe using the end column and just trying to back up based on some word heuristics would be reasonable -- I also have the feeling that the current implementation used on the traceback will change... maybe it'd be interesting to have some API which could provide those anchors even if the anchors themselves change and heuristics get better, but yeah, I get that right now this isn't an option, so, off to doing some other code to handle that ;)

@pablogsal
Copy link

pablogsal commented Nov 4, 2022

I will discuss this with other core devs to see if we can consider a better APIs for the future, but as you can imagine a lot of users won't be happy if we allow the API to change between versions. The problem of offering apis in the standard library is that the amount of consumers is really high and every one has different stability requirements, constraints and needs so even if your this use case will be fine, the maintenance cost for us can be unbounded once other consumers are involved so we need to be very conservative and offer building blocks.

I understand this answer is somewhat disappointing and I apologise but unfortunately we need to ensure the maintenance cost on our side stays controlled because we have seen too many instances of how hard is to change or even fix anything once is exposed as public.

Thanks for your understanding and you fantastic work with debugpy!

@fabioz
Copy link
Collaborator

fabioz commented Nov 4, 2022

I understand this answer is somewhat disappointing and I apologise but unfortunately we need to ensure the maintenance cost on our side stays controlled because we have seen too many instances of how hard is to change or even fix anything once is exposed as public.

Oh, no need to apologize here. I understand well the constraints in which CPython devs must operate on (I can even copy the whole internal code related to that -- the only downside is that if it gets better I won't get the upgrades for free).

Thanks for the great work in CPython ;)

@fabioz
Copy link
Collaborator

fabioz commented Nov 4, 2022

@pablogsal I actually created a copy and started doing some tests on my side to make sure the integration is ok, but it seems I found an issue related to unicode in the code in traceback.py: python/cpython#99103.

@pablogsal
Copy link

This is for the report, we will take a look soon 👍

fabioz added a commit to fabioz/debugpy that referenced this issue Nov 10, 2022
@fabioz fabioz closed this as completed in 04403dd Nov 11, 2022
@am1ter
Copy link

am1ter commented Jun 13, 2023

@fabioz, Hello!
I have encountered a very similar bug, but unfortunately, I cannot reproduce it in a new environment. I would like to report it just in case someone else encounters the same issue in the future. I only get an error in the case where the first exception is raised, but all subsequent exceptions work as expected..

Traceback (most recent call last):
  File "/home/amiter/.vscode-server/extensions/ms-python.python-2023.8.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 252, in _on_run
    self.process_net_command_json(self.py_db, json_contents)
  File "/home/amiter/.vscode-server/extensions/ms-python.python-2023.8.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_process_net_command_json.py", line 193, in process_net_command_json
    cmd = on_request(py_db, request)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amiter/.vscode-server/extensions/ms-python.python-2023.8.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_process_net_command_json.py", line 981, in on_stacktrace_request
    self.api.request_stack(py_db, request.seq, thread_id, fmt=fmt, start_frame=start_frame, levels=levels)
  File "/home/amiter/.vscode-server/extensions/ms-python.python-2023.8.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_api.py", line 207, in request_stack
    if internal_get_thread_stack.can_be_executed_by(get_current_thread_id(threading.current_thread())):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amiter/.vscode-server/extensions/ms-python.python-2023.8.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 655, in can_be_executed_by
    self._cmd = py_db.cmd_factory.make_get_thread_stack_message(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amiter/.vscode-server/extensions/ms-python.python-2023.8.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_net_command_factory_json.py", line 349, in make_get_thread_stack_message
    colno, endcolno = line_col_info.map_columns_to_line(line_text)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amiter/.vscode-server/extensions/ms-python.python-2023.8.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_frame_utils.py", line 93, in map_columns_to_line
    colno = _utf8_byte_offset_to_character_offset(original_line, self.colno)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amiter/.vscode-server/extensions/ms-python.python-2023.8.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_frame_utils.py", line 135, in _utf8_byte_offset_to_character_offset
    if byte_offset > offset:
       ^^^^^^^^^^^^^^^^^^^^
TypeError: '>' not supported between instances of 'int' and 'NoneType'

A little bit information from the last calls of the _LineColInfo.map_columns_to_line() function

original_line="            raise EmailValidationError from e", colno=12, end_colno=45
original_line="            parse_obj_as(EmailStr, value)", colno=12, end_colno=41
original_line="    return model_type(__root__=obj).__root__, colno=None, end_colno=None

Information about my env:

  • Python: 3.11.2
  • VSC: 1.79.0

Please let me know if I could give you any other information

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants