Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API to determine whether integrated terminal has focus #117980

Closed
pokey opened this issue Mar 2, 2021 · 53 comments
Closed

API to determine whether integrated terminal has focus #117980

pokey opened this issue Mar 2, 2021 · 53 comments
Assignees
Labels
accessibility Keyboard, mouse, ARIA, vision, screen readers (non-specific) issues api info-needed Issue requires more information from poster terminal General terminal issues that don't fall under another label under-discussion Issue is under discussion for relevance, priority, approach
Milestone

Comments

@pokey
Copy link
Contributor

pokey commented Mar 2, 2021

As part of implementing improved voice coding support, we are developing a VSCode extension that enables improved communication with talon. One thing we need is to be able to determine that the terminal pane currently has focus, so that we can activate commands for interacting with the terminal. To do so, we'd basically need two things:

  • A function to check whether the terminal has focus
  • An event that is fired when the terminal gains / loses focus
@pokey
Copy link
Contributor Author

pokey commented Mar 15, 2021

Fwiw I think half of this issue could be solved with #118722 by using the terminalFocus context

@Tyriar
Copy link
Member

Tyriar commented Mar 18, 2021

Right now our API doesn't really have the concept of focus I think by design. We'll see how #118722 turns out but I think the right thing to do here is to hook into the operating system's accessibility APIs and check what is being focused in the accessibility tree.

@Tyriar Tyriar added the under-discussion Issue is under discussion for relevance, priority, approach label Mar 18, 2021
@pokey
Copy link
Contributor Author

pokey commented Mar 18, 2021

@TylerLeonhardt thanks for your response. Not sure I understand, though. How come your design encourages exposing focus as a when-clause context, but not as an API? Thanks for your help

@Tyriar
Copy link
Member

Tyriar commented Mar 19, 2021

This is what I was told:

the API doesn’t really have the concept of focus, because focus if often in elements that have no API representation

@pokey
Copy link
Contributor Author

pokey commented Mar 20, 2021

Ok, well if your source has a response to my question, I'd be interested to understand the answer, but I guess hopefully #118722 will get merged and at least solve half the issue here

@daanzu
Copy link

daanzu commented Mar 22, 2021

@Tyriar Having this would be tremendously useful for accessibility. It is quite disappointing to hear that there is push back to implementing it.

@daanzu
Copy link

daanzu commented Mar 22, 2021

@pokey I hope such an extension could be made general enough to be used by voice software other than talon. I have been wanting to work on something similar for my own setup using dragonfly/kaldi-active-grammar. Specifically support for: executing VSCode commands, retrieving VSCode context, and retrieving VSCode's buffer text around the cursor for priming the dictation language model. Is your work being done on github?

@pokey
Copy link
Contributor Author

pokey commented Mar 22, 2021

Hi @daanzu glad to hear the interest! None of my extensions are talon-specific. I have a few extensions that I've developed for voice coding, but you'll probably be most interested in https://marketplace.visualstudio.com/items?itemName=pokey.command-server. Lmk if you have any troubles getting it set up / linked with kaldi and I'd be happy to hop on a zoom and pair up!

@Tyriar
Copy link
Member

Tyriar commented Mar 22, 2021

Why can't you use the OS' a11y APIs to do this? I would have expected that's possible as it's how screen readers get all their information.

@daanzu
Copy link

daanzu commented Mar 22, 2021

@Tyriar It should be possible, but the OS accessibility API tends to be quite complicated, slow, and prone to breaking. If I am understanding it correctly, the VSCode API already has the concept for key bindings, which has the benefit of being easily discoverable for users. It seems silly to then make it unavailable as one simple API query.

@pokey Thanks for the info! I'll take a look.

@lunixbochs
Copy link

lunixbochs commented Mar 25, 2021

Why can't you use the OS' a11y APIs to do this? I would have expected that's possible as it's how screen readers get all their information.

On Mac, VSCode appears to the accessibility subsystem by default as an opaque rectangle. In Electron, you can manually set a flag on the window to enable enhanced accessibility, but that doesn't work for everything, and when it does work it sort of exports the whole DOM which typically comes with a huge performance and resource hit (which is why it's disabled by default in Chromium and Electron - they aren't exactly light on resources to begin).

The simplest way to surface context information (e.g. which part of the app is active) to voice control systems is to put a simple string in the window title (e.g. "terminal" "file picker" "command bar"). The most robust way is to have some kind of plugin api for querying context, and events when context changes.

@sourishkrout
Copy link

This is what I was told:

the API doesn’t really have the concept of focus, because focus if often in elements that have no API representation

Would it alternatively be possible to fire vscode.window.onDidChangeActiveTerminal when the terminal comes into focus? That way one could imply focus if, let's say, a vscode.window.onDidChangeActiveTextEditor was fired last. Again, it's implicit but would be consistent with how vscode.window.onDidChangeActiveTextEditor works. This event fires again if the focus goes back to the same file; even works when both files are exposed side by side (E.g. view column 1 & 2).

My use-case requires me to reasonably triangulate if you're currently active in the editor or terminal. focused analogous to vscode.window.onDidChangeWindowState would be ideal, however, firing vscode.window.onDidChangeActiveTerminal every time the focus changes to active on a terminal would be sufficent.

Thoughts @pokey @Tyriar? Thanks!

@Tyriar
Copy link
Member

Tyriar commented Apr 2, 2021

@sourishkrout I think that blurs the lines a little too much better active and focus, right now it only fires when the actual active terminal is changed (last focused terminal instance within the active tab). I tested vscode.window.onDidChangeActiveTextEditor with the below and it doesn't fire on focus, just active like the terminal API?

vscode.window.onDidChangeActiveTextEditor(e => {
  vscode.window.showInformationMessage('focus text editor ' + e?.document.uri.fsPath);
});

@sourishkrout
Copy link

Hm. That's odd. I've used your code snippet and get below's result, @Tyriar. Fires every time. Does this match what you're seeing?

I'm open to other approaches: E.g. a property/method on vscode.window.activeTerminal or vscode.window.terminals - would that be an option? I'd really appreciate a path forward or a workaround.

events

@Tyriar
Copy link
Member

Tyriar commented Apr 2, 2021

But there the active editor is changing, just as if you had 2 terminals side by side the terminal API would do that as well?

@sourishkrout
Copy link

But there the active editor is changing, just as if you had 2 terminals side by side the terminal API would do that as well?

That's fair. I do see how the active terminal events fires semantically works the same as active editor if I use split terminals.
Thinking about alternatives (e.g. "how do i know the terminal's active?") I've stumbled over #79063. Going to drop a comment on that.

@lunixbochs
Copy link

Bumping this thread for follow-up discussion because #118722 was closed by @jrieken and the accessibility discussions there don't have a resolution or path forward yet.

#118722 (comment) - why getContext isn't the answer
#118722 (comment) - why VSCode can't completely own the when-clause

It's a huge limitation if accessibility tools must act as a one-way event source. We have large context-specific command sets that can change based on stuff like the current language, whether you're in a terminal, current terminal working directory, etc.

An example voice command use of this, (which wouldn't work at all as a one-way event source):

  • We can expose a list of the files in the current terminal working directory as part of a command, so you can e.g. git add filename by saying the filename instead of needing to spell the filename the hard way.
  • We can't really create the filename list unless we know you're in a terminal and we know the working directory.
  • Ideally we even have an event source telling us when you focus a UI component like a terminal, so we can decide at that point whether to asynchronously go look up filenames and such to prepare for a command.

VSCode is a complex enough application that we can't treat it as just "editor for current language". There are many different UI components that need slightly different treatment, and it's best if the accessibility tool can make decisions about what to do based on the specific kind of UI it's currently automating.

Anecdotally, I know some accessibility users of VSCode who are entirely avoiding the builtin terminal for these reasons.

@isidorn
Copy link
Contributor

isidorn commented Jun 22, 2021

This issue is better than #118722 since this is one is much more scoped to the actual problem. Exposing getContext is too broad for this particular use case.

Using the OS accessibility API approach as proposed by @Tyriar sounds reasonable to me.
Did you maybe try to make a prototype with this approach so we can all try it out and see the limitations? Screen readers use this approach successfully.

@lunixbochs
Copy link

lunixbochs commented Jun 22, 2021

this is one is much more scoped to the actual problem

I disagree, this issue is not the core problem, it's just the most annoying instance of it. There's more modal UI in VSCode than the Terminal. I will re-highlight from my previous response:

VSCode is a complex enough application that we can't treat it as just "editor for current language". There are many different UI components that need slightly different treatment, and it's best if the accessibility tool can make decisions about what to do based on the specific kind of UI it's currently automating.

As for accessibility:

Using the OS accessibility API approach as proposed by @Tyriar sounds reasonable to me.

I don't think we can rely on accessibility for this unless the performance issues are sorted out, and using the accessibility tree for this is a performance minefield - it's easy to regress. I found this issue about Mac accessibility performance which was closed for inactivity: #110764 (It's possibly related to this issue that is still open? #41423)

IMO the focused context is such a relatively small piece of information that it's incredibly wasteful to build and repeatedly query a whole-app accessibility tree just to figure it out.


If getContext() is no good because @jrieken wants VSCode's context query parser to own the context information, can we instead solve it like this?

  1. Have a queryContext(...) -> bool function where we pass in a hotkey-like context and vscode best-effort tells us if it's going to be active right now.
  2. Possibly some kind of "some kind of context switch may have just happened" callback into our code?
  3. I know that kind of context switch callback could feel expensive to allow, so a way to punt that work to the extension side would be a monotonically increasing contextSerial() -> int counter that we can query to see if something may have changed since the last time we evaluated our queryContext() list.

It doesn't matter so much if the solution isn't 100% perfect, e.g. has slight race conditions in case of a rapidly updating UI, because accessibility users are far more likely to be in a "paused" state when this needs to be evaluated (like listening to their screen reader, or pausing between voice commands), and are also more likely to be willing to hit undo and keep going than a normal user, because all of the alternatives are worse right now.

@pokey
Copy link
Contributor Author

pokey commented Sep 28, 2021

Any thoughts on this one @Tyriar? Please see @lunixbochs's suggestions above if you haven't already

@Tyriar Tyriar added api terminal General terminal issues that don't fall under another label labels Oct 6, 2021
@Tyriar
Copy link
Member

Tyriar commented Oct 18, 2021

Assigning @isidorn as this is an accessibility-related API request, afaik this goes against some product-wide API rules that would need to be broken and I don't think I'm the right one to fight for that.

@Tyriar Tyriar assigned isidorn and unassigned Tyriar Oct 18, 2021
@isidorn isidorn added this to the Backlog milestone Oct 18, 2021
@sbungartz
Copy link

@mqnc yes just like @pokey said, it only worked with key presses which turned out to have very undesireable effects as listed in my previous post, so I gave it up rather quickly. I am now in a place where I just have the terminal-tag and some others always active in vscode (see below), and I replaced the prefix git with source for all the VS-Code-Specific git commands, so I can say source push to trigger VS Code's built in push command but git push to write that into wherever I am.
This obviously means that sometimes shell commands are written into my code by accident, so the ability to accurately detect terminal focus would still be highly appreciated.

Talon Terminal active when in VS Code
app: vscode
-
tag(): terminal
tag(): user.generic_unix_shell
tag(): user.git

# Since terminal needs different copy-paste commands than the editor, we need to use this when we are in the terminal:
copy term: key(ctrl-shift-c)
(pace | paste) term: key(ctrl-shift-v)

@mqnc
Copy link

mqnc commented Jul 11, 2023

@sbungartz Thanks! I have a similar setup, it's highly unsatisfactory... Still kudos for the hacky idea above!

@pokey
Copy link
Contributor Author

pokey commented Jul 11, 2023

@sbungartz you might try doing insertion via the VSCode command that inserts text in the terminal directly. Then it wouldn't end up in your editor. Can't remember command ID but it's mentioned in VSCode terminal docs

Another possible workaround is to use terminal editors, Ie the ones that appear in an editor group instead of in the bottom panel. Those do show up in the window title

And finally, you can experiment with enabling the full electron accessibility tree, but keep in mind that has performance implications

Fwiw the approach I've taken personally is to add VSCode tasks for my most common terminal commands and map those to voice commands, then just use a keyboard for the rest 😔

@mrob95
Copy link

mrob95 commented Jul 11, 2023

May be an obvious point but you can also just alt tab to a separate terminal program. I don't find it too much of a burden to switch between them as necessary - it's one voice command whether you are switching within VSCode or switching to a different window.

@Tyriar Tyriar added the accessibility Keyboard, mouse, ARIA, vision, screen readers (non-specific) issues label Jul 11, 2023
@Tyriar
Copy link
Member

Tyriar commented Jul 11, 2023

@meganrogge this would be a good a11y-related request to look into at some point. The main blocking thing is that we don't know is the API shape. I see there are some focus-related APIs now:

/**
* Whether the panel is active (focused by the user).
*/
readonly active: boolean;

export interface WindowState {
/**
* Whether the current window is focused.
*/
readonly focused: boolean;
}

@mqnc
Copy link

mqnc commented Jul 11, 2023

@pokey, @mrob95 if terminal editors or external terminals can be integrated as nicely as the default terminal (and for instance be used by default for debugging and everything without having to write "useExternalTerminal": true into a million config files), then that's actually a nice option. I will look into this. Thanks!

@pokey
Copy link
Contributor Author

pokey commented Jul 11, 2023

@Tyriar @meganrogge fwiw even just having a way to make it so that the window title reflects whether the terminal is focused would be great; arguably that would be even easier to leverage, as talon has really good support for watching the window title. I believe there's a separate issue for that one

@Tyriar
Copy link
Member

Tyriar commented Jul 11, 2023

@pokey I guess that ask is for a new "focused view" variable in window.title? That sounds like it would be fairly simple and uncontroversial

@pokey
Copy link
Contributor Author

pokey commented Jul 11, 2023

That would be amazing yeah. I believe that's tracked by #97790 fwiw

@mrob95
Copy link

mrob95 commented Jul 12, 2023

@pokey I guess that ask is for a new "focused view" variable in window.title? That sounds like it would be fairly simple and uncontroversial

That would be awesome! If the mechanism could be more general and support things like the file explorer, search/find, source control etc then that would also be extremely useful. The terminal is the most acute pain point by far though.

@pokey
Copy link
Contributor Author

pokey commented Jul 13, 2023

Yeah would also be useful if it could indicate that an editor has focus, as often we want to know "am I typing into an editor or not?". If the answer is "yes" we can be slick, eg use snippets, but fall back to simulating keypresses when the answer is "no"

@mqnc
Copy link

mqnc commented Jul 24, 2023

@Tyriar @meganrogge fwiw even just having a way to make it so that the window title reflects whether the terminal is focused would be great; arguably that would be even easier to leverage, as talon has really good support for watching the window title. I believe there's a separate issue for that one

What has to happen so this gets implemented? Is it worth looking into this myself and making a PR? (I have never looked at the vscode source but I have some experience with JS and TS).

@meganrogge
Copy link
Contributor

I am not sure what would be needed here, so you might have to poke around yourself. It might be as simple as displaying the context key of terminalFocus in a new variable as mentioned above

I guess that ask is for a new "focused view" variable in window.title? That sounds like it would be fairly simple and uncontroversial

@Tyriar
Copy link
Member

Tyriar commented Jul 24, 2023

@mqnc implementing this is quite easy, it needs a champion on the team to go to the API sync (again) to advocate and talk through what the actual API will be. I think you're talking about #97790 so that wasn't relevant. For #97790 @meganrogge is planning to look into it in August.

@meganrogge
Copy link
Contributor

Now that we have #97790, is there a use case for this API?

@meganrogge
Copy link
Contributor

@pokey pls lmk if you still think we need to do this

@meganrogge meganrogge added the info-needed Issue requires more information from poster label Oct 23, 2023
@vscodenpa
Copy link

This issue has been closed automatically because it needs more information and has not had recent activity. See also our issue reporting guidelines.

Happy Coding!

@vscodenpa vscodenpa closed this as not planned Won't fix, can't repro, duplicate, stale Nov 1, 2023
@pokey
Copy link
Contributor Author

pokey commented Nov 2, 2023

@pokey pls lmk if you still think we need to do this

Yeah agreed let's park this one for now; it's not urgently necessary for us. The window var PRs were huge 🙌

@ClemensDinkel
Copy link

With #97790 resolved, we still don't have have access to the current focus programmatically, do we ?
I'm currently trying to write an extension to insert a string into the focused text editor or terminal, depending on what's currently focused. For simplicity I wan't both operations to run on the same command. In the when clause I can simply say "when": "editorTextFocus || terminalFocus" but in my extension code, when I'm registering the command I have no way of knowing which one is currently focused. vscode.window.activeTextEditor and vscode.window.activeTerminal are both defined, since they return the last active instance, even if they're not currently in focus.

@Tyriar
Copy link
Member

Tyriar commented Feb 22, 2024

@ClemensDinkel you could have 2 commands and use different when clauses for each?

@ClemensDinkel
Copy link

Yes that's what I'm doing atm and they share the same shortcut, since they will never activate both at the same time. This gets close, but I still need to list seperate commands which feels a bit unsatisfying. Not a big issue, but still wanted to mention there is use cases for this scenario.

@valerybugakov
Copy link

Hey folks, do you have any updates on this? We've got a use case for this API. We want to process text document edits that users make (onDidChangeTextDocument), but only when the editor is focused. This would help us avoid handling changes triggered by version control systems and other third-party tools that might be activated from the terminal.

valerybugakov added a commit to sourcegraph/cody that referenced this issue Oct 18, 2024
Updated `CharacterLogger` to gain more insights into the unexpectedly
high number of insertions that affect our metrics.

- Group document changes by size.
- Rely on `vscode.window.activeTextEditor` to track edits from the
active window.
- Rely on `activeTextEditor.visibleRanges` to mark edits as outside of
the viewport.
- There's no way to detect if the active text editor is focused using
the VS Code API. Here's [the GitHub
issue](microsoft/vscode#117980) where I asked
for updates. It means we cannot reliably detect cases where users
switched to the integrated terminal window or to the sidebar where they
can perform actions that would result in a high number of
insertions/deletions (e.g., by interactive with version control CLI or
UI).
- To gain more information about such changes we group edits by document
selection staleness (if the cursor didn't move for the last 5 seconds)
and by the speed of changes (mark them as rapid if they occurred in less
than 15 ms which is the threshold I got from local testing
find-and-replace/refactor commands).
valerybugakov added a commit to sourcegraph/cody that referenced this issue Oct 19, 2024
Updated `CharacterLogger` to gain more insights into the unexpectedly
high number of insertions that affect our metrics.

- Group document changes by size.
- Rely on `vscode.window.activeTextEditor` to track edits from the
active window.
- Rely on `activeTextEditor.visibleRanges` to mark edits as outside of
the viewport.
- There's no way to detect if the active text editor is focused using
the VS Code API. Here's [the GitHub
issue](microsoft/vscode#117980) where I asked
for updates. It means we cannot reliably detect cases where users
switched to the integrated terminal window or to the sidebar where they
can perform actions that would result in a high number of
insertions/deletions (e.g., by interactive with version control CLI or
UI).
- To gain more information about such changes we group edits by document
selection staleness (if the cursor didn't move for the last 5 seconds)
and by the speed of changes (mark them as rapid if they occurred in less
than 15 ms which is the threshold I got from local testing
find-and-replace/refactor commands).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accessibility Keyboard, mouse, ARIA, vision, screen readers (non-specific) issues api info-needed Issue requires more information from poster terminal General terminal issues that don't fall under another label under-discussion Issue is under discussion for relevance, priority, approach
Projects
None yet
Development

No branches or pull requests