Tag string use-case: language marker for syntax highlight #18
Typically I think this sort of thing ends up being handled by plugins. For example, out of the box VSCode does not highlight JavaScript template literals. Instead people tend to use plugins like lit-html which, in addition to syntax highlighting, provide IntelliSense. With that said, it would be beneficial to supply plugins for popular editors (probably VSCode and PyCharm) by the time the Python version containing these changes is released. Ultimately, if those plugins became popular enough, it could make sense for the behavior to be built into PyCharm or the Python extension for VSCode. Before then though, it seems like the main action we would take with respect to this PEP would be to mention that editors could supply features of this nature. |
I'm with PyCharm and I'm hopeful that (a) this all lands and (b) it makes it easier to support something basic out-of-the-box for IDEs and (c) more custom usages through plugins. For my component-thingy, not sure if the proposal would fit exactly, as it isn't anything on the list, as @rmorshea hints at. But likely that could be fixed. |
I'm in a bit of a lull at work right now so I think I may have some time to work on this over the holidays. |
@rmorshea Do you mean time to land tagstr in cpython main? That would be awesome :) |
Unfortunately no, this is very much a work in progress. At the moment, we don't have a complete draft proposal to share and get feedback on. I'm not super familiar with CPython's release timelines, but my guess is that we'd be shooting to get this into CPython 3.13. |
Does that mean there's a group of people working on this atm? I see no activity in this repo. |
At the moment it's @jimbaker and myself with support from @gvanrossum who, in addition to providing feedback and ideas, has contributed a branch of CPython with an initial implementation of tag-strings that we've been using to test out our work. With that said, both Jim and myself have gotten busy as of late. Right now we each have draft PRs up that respectively, contain an initial specification, and a tutorial on how tag strings could be used to render HTML templates. There's a lot more work to be done though. |
@rmorshea Thanks for following up, I was out of the loop during the holidays. @EmilStenstrom It's a really good idea, I'm just trying to think how a tag function could declare itself as supporting a specific syntax. We all "know" that Related is that we will be working in the context of https://peps.python.org/pep-0701/ - work that also was discussed at PyCon back in April. The formalization of f-strings is very much related to the formalization we need to do for tag strings, and consequently any general syntax support. |
@jimbaker I opened an issue with the vscode folks here for their opinions: microsoft/pylance-release#3874 - including some different ways this could work. I think the most straightforward way of doing this is to simply use the language identifiers that editors already use, and automatically apply syntax highlighting in that language when a tag string with that name is used. This would make the feature discoverable in a way where things just work when you use a tag string, which would be wonderful. If you would like to avoid mistakes, I think using annotations for this is nice. Annotated[str, "html"] could be specified in the tag function to make it highlight like you want. |
Speaking for PyCharm, I'm also interested. I sponsored a PyCharm plugin to explore the kinds of things I wanted from my htm.py-based component stuff. I could share some opinions but it's likely bike shedding. |
I think we don't even have the available colors picked out, so it cannot be bikeshedding just yet. 😁 One eventual possibility is a DSL registry, similar to https://www.schemastore.org/json/ Maybe that's easier because of the standard JSON Schema, but it does seem to solve the bootstrapping problem - how to connect a tag function with the syntax it supports. In the interim, something similar could work using a setting with a plugin: this qualified name, which can be imported from some Python package (private or on PyPI), corresponds to this DSL, such as HTML or SQL. On the JSON schema side, I have used this setup for internally developed JSON schemas:
Note that while DSLs may vary, the interpolations themselves do not change except with respect to their syntactic placement and of course semantic meaning. See @rmorshea's recent additions in https://github.com/jimbaker/tagstr/blob/main/tutorial.rst for some more discussion, where we would want to have some strictness about placement, in this case raising |
I think a great solution for this would be if all editors by default shipped with a list like Linguist's registry, or PyCharm's language identifiers, and automatically applied syntax highlighting to tag strings based on the tag name. Optionally there could be settings to say "in this project, I want this tag to highlight in this language". Advantages:
Disadvantages:
|
Let's assume a registry like this was supported by editors. Let's recall that tag names are just standard names in Python, bound (presumably) to some callable that supports the tag function protocol. Such callables can be imported and defined as usual. Python IDEs like PyCharm and VSCode can readily track their use and definition, much like other object usage. So then it just becomes a question of registering the callable as supporting a specific DSL, like HTML or SQL. Especially in the case of SQL this also applies to dialect, such as SQLite vs Postgres. We just have to figure out how to do this registration.
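To make the registration idea concrete, here is a minimal sketch of a registry keyed by a callable's qualified name. Everything in it is an assumption for illustration: the names `DSL_REGISTRY`, `register`, and `language_for`, the `"sql:postgres"` dialect notation, and the stand-in tag functions are hypothetical, and nothing like this has been agreed on in this thread.

```python
from typing import Callable, Optional

# Hypothetical registry: qualified name of a tag function -> DSL identifier.
DSL_REGISTRY: dict[str, str] = {}


def qualified_name(func: Callable) -> str:
    """Build the lookup key from where the callable is defined."""
    return f"{func.__module__}.{func.__qualname__}"


def register(func: Callable, language: str) -> None:
    """Record that a tag function supports a DSL (and optionally a dialect)."""
    DSL_REGISTRY[qualified_name(func)] = language


def language_for(func: Callable) -> Optional[str]:
    """What an editor might ask once it has resolved a tag name to a callable."""
    return DSL_REGISTRY.get(qualified_name(func))


# Stand-in tag functions; real ones would follow the tag function protocol.
def html(*args):
    return args


def pg_sql(*args):
    return args


register(html, "html")
register(pg_sql, "sql:postgres")  # the dialect suffix is an invented convention
```

Keying on the qualified name rather than the bare tag name means an alias such as `from mypkg import html as h` would still resolve, since the IDE tracks the definition and not the local name.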
Yes, this would be ideal.
There are two possible ways for the IDE to not properly recognize the desired language, assuming this registry:
|
Some updates: The Python extension people with VSCode are currently thinking about adding support for highlighting of Python strings with other languages in them. The exact syntax is not decided, but there is technical triage going on about how to embed other languages inside Python. Great news, and something I think would align nicely with this proposal. Actually, I think this would greatly enhance working with Python overall, in the world's most popular code editor. I'm excited! :) It seems that to get the ball rolling they need to see that people are excited about this. This is measured by the number of upvotes on this ticket: https://www.github.com/microsoft/pylance-release/issues/3874 - can I hope for some upvotes from you and some close friends of yours? :) |
My suggestion would be to map the tag name directly to the language with that name. But maybe that would be too aggressive? And yeah, that prohibits the use of r, since that's reserved as you say. Something that's come up from some people I've talked to is using the typing system for this, so that if you type the callable as, e.g., Annotated[str, "html"], editors would know what you mean. What are your thoughts on that? I think using the language identifiers as defined by PyCharm would make this really easy to use. |
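As a rough sketch of how tooling could read such an annotation at runtime, the following uses only `typing.Annotated`, `get_type_hints`, `get_origin`, and `get_args` from the standard library (Python 3.9+). The helper name `declared_language`, the choice to annotate the first parameter, and the convention that a plain string in the metadata names the language are all assumptions, not something this thread has settled.

```python
from typing import Annotated, Optional, get_args, get_origin, get_type_hints


def html(template: Annotated[str, "html"]) -> str:
    """Stand-in tag function; only the annotation matters for this sketch."""
    return template


def plain(template: str) -> str:
    """A function without language metadata, for comparison."""
    return template


def declared_language(func) -> Optional[str]:
    """Return the first string found in Annotated metadata on a parameter."""
    # include_extras=True keeps the Annotated wrapper instead of stripping it.
    hints = get_type_hints(func, include_extras=True)
    for name, hint in hints.items():
        if name == "return":
            continue
        if get_origin(hint) is Annotated:
            # get_args gives (underlying_type, metadata1, metadata2, ...).
            for meta in get_args(hint)[1:]:
                if isinstance(meta, str):
                    return meta
    return None
```

An editor would do the equivalent statically, without importing the module, but the information it needs is the same metadata tuple.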
I think the use of PEP 593's There are still conventions we need to use here for the metadata, but this approach seems to be the best. Thanks for the suggestion! |
Hmm... one issue with using Annotated is that it's not backwards compatible. There's still a lot of code out there that's on 3.6, and I'd guess editors would like to support it too. Annotated is from 3.9 if I understand the PEP above correctly. |
@EmilStenstrom as new language syntax, this functionality is not backwards compatible. Of course there are workarounds, similar to polyfills in JavaScript. @rmorshea mentions one, a transpiler - #20 - and as @gvanrossum pointed out there, this can be implemented with codecs, which allow for arbitrary source code rewriting. There are a number of packages that use this approach. See for example https://github.com/pyxl4/pyxl4 and its predecessor packages. |
There's always the old type comments: my_html = "<h1>Hello, World!</h1>" # type: Annotated[str, "html"] Not quite as slick, but I'm pretty sure editors will treat this the same as a normal type annotation. |
@rmorshea excellent workaround. I’m sold on using Annotated :) |
I'm currently missing how IDEs would apply (presumably existing) highlighting for html/sql/js/css/whatever if those contain any thunks:
Additional complexity is multiple extensions like htm |
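@arogozhnikov I'm with PyCharm so I can only comment on it. First, I can "language inject" a string and tell it that it has HTML inside of it. I can prefix f to make it an f-string with HTML. This would work with IDEs that support this PEP aka Python 3.13. The ** part isn't part of Python f-strings. You want a spread syntax as in your ...${attributes}. In the implementation it appears as a dict with ** finishing the previous string. So it is something you could implement in your html tag function. Lots of alternatives that people could investigate, if you didn't want the magical ... spread syntax. FWIW my ViewDOM package (actually, the underlying htm.py package) does a spread approach and it is very convenient. |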
I think you missed my point. It is not about how to implement this syntax
› Lots of alternatives that people could investigate...
And they likely will. And there will be several dialects of tag strings for
html/SQL/PRQL/yaml/whatever, as it is currently in JS. And each syntax
needs syntax check and highlighting, because they are not compatible.
I see no way for IDEs to support all of that without a standard way to
expose syntax/highlight functions by tags themselves. Having a separate
extension for every IDE (or even just two of them) for every package is a
ton of work and maintenance - extension maintainer depends both on IDE and
package, and two languages, plus he still needs to implement syntax
analysis. With rare heroic exceptions it will not be maintained for long.
Tag's statically inferable type should provide an additional function to
run check/highlight, and IDE extension should invoke it.
…On Sun, 14 May 2023, 05:25 Paul Everitt, ***@***.***> wrote:
@arogozhnikov <https://github.com/arogozhnikov> I'm with PyCharm so I can
only comment on it.
First, I can "language inject" a string and tell it that it has HTML
inside of it. I can prefix f to make it an f-string with HTML. This would
work with IDEs that support this PEP aka Python 3.13.
The ** part isn't part of Python f-strings. You want a spread syntax as
in your ...${attributes}. In the implementation it appears as a dict with
** finishing the previous string. So it is something you could implement
in your html tag function.
Lots of alternatives that people could investigate, if you didn't want the
magical ... spread syntax.
FWIW my ViewDOM package (actually, the underlying htm.py package) does a
spread approach
<https://viewdom.readthedocs.io/en/latest/examples/components.html?highlight=spread#spread-props>
and it is very convenient.
|
True that every-possible-idea-not-yet-conceived will be hard to keep up with, as in JS. But a big chunk gets transferred out of custom-template-language (the Python status quo) into normal Python f-string and expression semantics and thus supported. IMO, a REALLY big chunk. But you're right, there's room for some |
@arogozhnikov you raise some very good points. Lit HTML (https://lit.dev/docs/templates/expressions/) and React JSX (https://react.dev/learn/javascript-in-jsx-with-curly-braces) take different approaches here. (Technically JSX doesn't build on tagged template literals, because it predates that adoption; for our purposes, let's just assume it does.) Lit has a richer syntax, not that different from your use of splat, with If we simply use the interpolation type, then we can assume that for at least a target DSL like HTML, per the Lit doc,
and further simplified by the fact that there's no context sigil for the expression. If that's the case, we can syntax highlight HTML in a standard way across implementing tags; and even typecheck standard DOM elements and their attributes (style should be a dict, hidden or checked should be a boolean, etc). Let me think about other DSLs, but even then I think this convention can hold - if the interpolating expressions were replaced by an empty value, and the DSL is still well-formed, then the syntax highlighter should still work as expected. I found https://code.visualstudio.com/api/language-extensions/embedded-languages interesting, along with https://www.jetbrains.com/help/idea/using-language-injections.html (which uses Java annotations to mark embedded usage). I'm sure there's a lot more we can look at. |
Interesting idea.
Agree, need to check how that works for DSLs.
<{tagname} {attr}={value} >content</>
# simplest
select a from table where c = {c_value}
# with dynamic columns
select a, {field}, b from table where c = {c_value} order by {order_col} |
Let's try it out. I find returning the args with an identity function useful for thinking about what tag functions do:
So we get an alternation of the (raw) text and thunks for each expression. Let's substitute in for each thunk the text
Interestingly, we take this approach in this example when actually parsing the HTML template, to simplify working with the underlying the use of
So we can try this as well. This is well-formed SQL:
and likewise true with this example:
So from a straightforward syntactic analysis, we should be able to preserve the well-formed quality of the input text with suitably chosen placeholders, possibly as simple as the one I used. There's a separate question: is it possible to do deeper type checking on this? For example, could we type check that the interpolations would produce valid values for expressions in something like |
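The placeholder substitution described above can be sketched without tag string syntax by modelling the arguments a tag function would receive. This is only an approximation under stated assumptions: `Thunk` here is a hypothetical stand-in that carries just the raw expression text, the placeholder `"x"` is an arbitrary choice, and `identity_tag` is called as an ordinary function rather than as a tag.

```python
from typing import NamedTuple


class Thunk(NamedTuple):
    """Hypothetical stand-in for an interpolation; only the raw source is kept."""
    raw: str


def identity_tag(*args):
    """Return the args unchanged, to inspect what a tag function would see."""
    return args


def with_placeholders(args, placeholder: str = "x") -> str:
    """Keep the text parts as-is and swap every thunk for a fixed placeholder."""
    return "".join(a if isinstance(a, str) else placeholder for a in args)


# Models: select a, {field}, b from table where c = {c_value} order by {order_col}
args = identity_tag(
    "select a, ", Thunk("field"),
    ", b from table where c = ", Thunk("c_value"),
    " order by ", Thunk("order_col"),
)
sql_text = with_placeholders(args)
```

The resulting `sql_text` is plain text with no interpolation markers left, so it could be handed to an existing highlighter or parser for the target DSL.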
Also something that one could do is to annotate the expressions with content in the formatspec in the thunk. This of course looks just like a type annotation for a function:
The idea is that a type checker could verify that the interpolated expressions type accordingly, including avoiding truthiness/other type coercions (so familiar in JavaScript). I don't know if this is simply being too clever, or actually useful in real code, so I'm putting this possibility out there. As can be seen, parsing works as "expected", because of the operator precedence of
|
Slightly tangential: There seems to be a library called python-inline-source that uses types to do inline code highlighting in VS Code. Seems to work in practice today! |
I'm surprised this would be considered legal. A format spec string value of If not, then wow, my dependency injector could work on a template string without a wrapping function. |
@pauleveritt it's already possible to use arbitrary format specs with f-strings, so there's nothing new here:
Obviously this formatspec doesn't use standard format specifiers, https://peps.python.org/pep-3101/#standard-format-specifiers, but it is allowable, per https://peps.python.org/pep-3101/#controlling-formatting-on-a-per-type-basis There are some nice code examples out there, see for example https://nedbatchelder.com/blog/202204/python_custom_formatting.html, where Ned discusses a Lat Long type and a specific mini formatting DSL for that. Lastly, VSCode is perfectly happy with it being an arbitrary string in the formatspec position; and it will even helpfully highlight |
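A small illustration of that per-type formatting hook, assuming nothing beyond plain f-strings and `__format__`: the class name `Echo` and the spec text are made up for the example, and the spec deliberately is not one of the standard format specifiers.

```python
class Echo:
    """Accepts any format spec and shows it back, instead of interpreting it."""

    def __format__(self, spec: str) -> str:
        # The text after the colon in the f-string arrives here verbatim.
        return f"<spec={spec!r}>"


# The spec "not a standard spec" is passed through untouched to __format__.
rendered = f"{Echo():not a standard spec}"
print(rendered)
```

Built-in types such as `int` would raise `ValueError` for a spec like this, which is why the behavior depends on the type of the interpolated value and not on the f-string syntax.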
In PyCharm it's possible to write # language=html before a string, and have that string be syntax highlighted as html (or any other supported language). It's not supported by any other editor. It's clunky because it's not obvious what happens if the comment is on the same line as other code, if there's a newline in between, and so on.
What if there was a way to tag a string, so the code editor knows how to syntax highlight that string. Oh wait, that's what you're working on! :)
My use-case
A component in my library is a combination of python code, html, css and javascript. Currently I glue things together with a python file, where you put the paths to the html, css and javascript. When run, it brings all of the files together into a component. But for small components, having to juggle four different files around is cumbersome, so I've started to look for a way to put everything related to the component in the same file. This makes it much easier to work on, understand, and with fewer places to make path errors.
Example:
Seems simple enough, right? The problem is: There's no syntax highlighting in my code editor for the three other languages. This makes for a horrible developer experience, where you constantly have to hunt for characters inside of strings. You saw the missing quote in js_string right? :)
If I instead use separate files, I get syntax highlighting and auto-completion for each file, because editors set language based on file type. But should I really have to choose?
Proposal
Would it be compatible with your work to recommend that editors syntax highlight strings in Python that have markers corresponding to this list of language identifiers? https://code.visualstudio.com/docs/languages/identifiers#_known-language-identifiers
This would make the developer experience even better for developers, and make your example tags in this repo even more powerful.
html'<span class="calendar">{content}</span>'
would be highlighted as proper HTML (because the name html was used). Advantages:
Even without my suggestion above, I really like what you're doing here.
Let me know what you think!