-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pydocstyle] Escaped docstring in docstring (D301 ) #12192
[pydocstyle] Escaped docstring in docstring (D301 ) #12192
Conversation
@@ -69,8 +69,35 @@ pub(crate) fn backslashes(checker: &mut Checker, docstring: &Docstring) { | |||
// Docstring contains at least one backslash. | |||
let body = docstring.body(); | |||
let bytes = body.as_bytes(); | |||
let mut backslash_index = 0; | |||
let escaped_docstring_backslashes_pattern = b"\"\\\"\\\""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We likely also need to handle single quotes here (i.e., escaped single quotes within single-quote docstrings).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But D301 is based on double quotes, do we need to cover single-quote docstring here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't verified this myself but the way I read the code is that docstrings are extracted from any string literal
ruff/crates/ruff_linter/src/docstrings/extraction.rs
Lines 6 to 15 in 7d16f83
/// Extract a docstring from a function or class body. | |
pub(crate) fn docstring_from(suite: &[Stmt]) -> Option<&ast::ExprStringLiteral> { | |
let stmt = suite.first()?; | |
// Require the docstring to be a standalone expression. | |
let Stmt::Expr(ast::StmtExpr { value, range: _ }) = stmt else { | |
return None; | |
}; | |
// Only match strings. | |
value.as_string_literal_expr() | |
} |
84ba7f5
to
0d94f8e
Compare
let escaped_triple_quotes = | ||
&bytes[position.saturating_add(1)..position.saturating_add(4)]; | ||
if escaped_triple_quotes == b"\"\"\"" || escaped_triple_quotes == b"\'\'\'" { | ||
return false; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will panic if what comes after the \
is shorter than 3 characters. I would rewrite this to something like
let escaped_triple_quotes = | |
&bytes[position.saturating_add(1)..position.saturating_add(4)]; | |
if escaped_triple_quotes == b"\"\"\"" || escaped_triple_quotes == b"\'\'\'" { | |
return false; | |
} | |
let after_first_backslash = &bytes[position + 1..]; | |
let is_escaped_triple = after_first_backslash.starts_with(b"\"\"\"") | |
|| after_first_backslash.starts_with(b"\'\'\'"); | |
if is_escaped_triple { | |
return false; | |
} |
// For the `"\"\"` pattern, each iteration advances by 2 characters. | ||
// For example, the sequence progresses from `"\"\"` to `"\"` and then to `"`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this assumption is correct and this might actually a bug in the existing implementation. For example, the function passed to any
will be called twice for \\
, once for each backslash position but the offsets aren't to indices apart.
What I understand is that you want to track if you're at the beginning of an escape sequence.
This is not fully fledged out, but I think we may have to rewrite the entire loop
while let Some(position) = memchr::memchr(b'\\', &bytes[offset..]) {
let after_escape = &body[position + 1..];
let next_char_len = after_escape.chars().next().unwrap_or_default();
let Some(escaped_char) = &after_escape.chars().next() else {
break;
};
if matches!(escaped_char, '"' | '\'') {
let is_escaped_triple =
after_escape.starts_with("\"\"\"") || after_escape.starts_with("\'\'\'");
if is_escaped_triple {
// don't add a diagnostic
}
if position != 0 && offset == position {
// An escape sequence, e.g. `\a\b`
}
}
offset = position + escaped_char.len_utf8();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. This helps a lot!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Summary
This PR updates D301 rule to allow inclduing escaped docstring, e.g.
\"""Foo.\"""
or\"\"\"Bar.\"\"\"
, within a docstring.Related issue: #12152
Test Plan
Add more test cases to D301.py and update the snapshot file.