Ilya Sher edited this page Mar 30, 2024 · 10 revisions

UI Chain Design

This document is WIP and for discussion. 2024-03.

A "chain" in the NGS UI is a sequence of related commands. The "Timeline", the central part of the UI for interacting with NGS, contains commands which are organized into "chains".

TODO: elaborate about chains and their construction.

The Frustration

Seeing objects on the screen (which are not even objects in a typical shell, they are nothing, just text) and being unable to interact with them is driving me nuts. This is just wrong. Yet, that's exactly the experience in the shell. I refuse to believe that an interface which interacts with the user on a single line is the pinnacle of engineering and can be more productive than one that uses the whole screen. NGS is an attempt to fix this horrible UX.

Around 1974-1975, a new capability was added to terminals: cursor movement. The terminal stopped being 1:1 with a printer and paper. Interaction with the whole screen became possible. In response to the new capability, Bill Joy released vi in 1976. It brought us text editing as we know it today, using the whole screen. See anyone using ex or ed today? I don't. I think it's because whole-screen interaction won.

To this day, shells treat the outputs of programs as if they were printed on paper: zero interactivity is possible.

Frustration Example

You have just listed your AWS EC2 instances. The output is on your screen:

Instance ID  Name      Security Groups              ...
i-1234       prod-web  sg-5678 (HTTP from LB)       ...
i-9abc       temp      sg-def1 (SSH from anywhere)  ...
...          ...       ...                          ...

One of the instances has the tag Name with the value temp. You want to terminate that instance. Today, you are faced with two bad choices.

  • Copy the Instance ID value from the output above and paste it into the new command that you are constructing: aws ec2 terminate-instances --instance-ids PASTE-HERE. This creates an unreproducible command. Tomorrow's temp instance will have a different Instance ID. While you will have this command in your history, it won't be usable as is. (Side note: how little does the shell care? The copy+paste functionality is implemented by the terminal, not the shell.)
  • Pipes! Of course pipes! Construct a piece of code which lists the instances, finds the one (or all) with Name being temp, and feeds the extracted IDs to the terminate command. The shell is pushing you to start coding instead of focusing on your goal. It might feel "normal" to you today because that's what you have had your whole life. It isn't.
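To give a feel for how much coding the second choice involves, here is a rough sketch in Python rather than shell. It assumes the AWS CLI is installed and configured, and that the JSON printed by aws ec2 describe-instances has the usual Reservations / Instances / Tags layout. The function names are made up for illustration.

```python
import json
import subprocess


def instance_ids_by_name(describe_output: dict, name: str) -> list[str]:
    """Return the IDs of instances whose Name tag equals `name`."""
    ids = []
    for reservation in describe_output.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            # Tags arrive as a list of {"Key": ..., "Value": ...} pairs
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            if tags.get("Name") == name:
                ids.append(instance["InstanceId"])
    return ids


def terminate_by_name(name: str) -> None:
    """List instances with the AWS CLI, then terminate the matching ones."""
    raw = subprocess.run(
        ["aws", "ec2", "describe-instances", "--output", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    ids = instance_ids_by_name(json.loads(raw), name)
    if ids:  # terminate-instances needs at least one ID
        subprocess.run(
            ["aws", "ec2", "terminate-instances", "--instance-ids", *ids],
            check=True,
        )
```

All of this just to express "terminate the instance named temp", which was already visible on the screen.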

The Solution

The shell must parse programs' outputs and "understand" them in order to provide a meaningful interaction. Such interaction needs to be captured on the semantic level (i.e. the meaning of what's happening) to allow recording for later replaying.

What if your shell does not understand the output?

Properly constructed software (which is what we aim for here) should handle this situation well. Just as AWS CDK exposes L1-L3 constructs, a layered approach is appropriate here too. The more the shell "understands", the more useful it is. When NGS doesn't understand the output, you should be no worse off than in any other shell. The answer is definitely not "we just add a parser to NGS" for each command in the universe, just as command line completion doesn't cover every possible program.

Note that "partial understanding" is an option. For example, auto-detected JSON could allow some interactions with the data.
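A minimal sketch of such layering, in Python, could look like the following. It is not the NGS implementation. The Understanding type and the level names "text" and "json" are assumptions made up for illustration. The idea is that a generic layer is tried first and plain text remains the fallback, so the user is never worse off.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class Understanding:
    level: str  # "text" (nothing understood) or "json" (structure understood)
    data: Any   # the raw string for "text", the parsed value for "json"


def understand(output: str) -> Understanding:
    """Try the generic JSON layer; fall back to plain text."""
    try:
        parsed = json.loads(output)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return Understanding("text", output)
    # Only objects and arrays offer structure to interact with;
    # a bare number or string that happens to parse is treated as text.
    if isinstance(parsed, (dict, list)):
        return Understanding("json", parsed)
    return Understanding("text", output)
```

More specific layers (for example, "this JSON is a list of EC2 instances") would sit on top of the generic one in the same manner.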

Interaction with objects on the screen... So, you are describing the web.

No. The common web UI is fine for one-off tasks. It's inadequate to serve as a shell's UI: it doesn't help with reproducing the work. There is no appropriate history (navigation history is inadequate for serving as a shell's history) and there are no facilities for recording/replaying.

The AWS EC2 console recently (2024, I think) added something that helps with reproducibility. One can generate CloudFormation or CDK code from the operations that have just been performed in the UI. This functionality does have some resemblance to what NGS is attempting to do, though the vision differs. For example, navigating between resources (not for creation but for debugging) and parametrization are planned in NGS, while it doesn't seem (my guess) that AWS would support them: that's way beyond your "normal" web UI.

The envisioned UI is somewhat similar to the web, but it keeps the history of what happened clear and mostly unmodifiable. There shouldn't be a single click or keypress that is lost and is not visible in an interaction record.

Solution Example

Very coarse example of the main part of the UI.

command 1 (ex: list AWS CodePipelines)
output 1 (ex: Table of AWS CodePipelines)

interaction record 1 (ex: "You have selected the CodePipeline with 'Status' field 'Failed'")

command 2 (ex: Constructed from previous interaction "Show the failed execution of the selected CodePipeline")
output 2 (ex: the failed CodePipeline execution)

interaction record 2 (ex: "You have selected execution stage with 'Status' field 'Failed'")

...
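One possible way to model the layout above as data, sketched in Python. The class and field names (Step, Chain, InteractionRecord) are hypothetical and only mirror the labels used in the example. This is not the NGS data model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InteractionRecord:
    description: str  # human-readable text shown in the timeline


@dataclass
class Step:
    command: str
    output: str
    # What the user did with the output, if anything
    interaction: Optional[InteractionRecord] = None


@dataclass
class Chain:
    steps: list[Step] = field(default_factory=list)

    def render(self) -> str:
        """Render the chain in the same order as the example above."""
        lines = []
        for n, step in enumerate(self.steps, start=1):
            lines.append(f"command {n}: {step.command}")
            lines.append(f"output {n}: {step.output}")
            if step.interaction is not None:
                lines.append(f"interaction record {n}: {step.interaction.description}")
        return "\n".join(lines)
```

In this sketch the interaction record belongs to the step whose output the user interacted with, and the next command in the chain would be constructed from it.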

Understanding of Program's Outputs

The shell is not supposed to do that

Many shells do have command line completion. It is a powerful, productive, and loved feature. It's based on understanding of a program's inputs. What's puzzling to me is how the above argument draws the line right between understanding the inputs and understanding the outputs of a given program.

That's too much work!

It is quite a bit of work. Command line completion looks like work of the same order of magnitude, and it is being done.

Interaction with Program's Outputs

The shell should allow interaction with objects on the screen. Given "understanding" from the section above, it is possible to provide meaningful interaction, including a plugin system where each plugin defines which objects it works with and which operations it supports (this could be used for building a context menu, for example).
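A rough sketch of such a plugin system, in Python. Everything here is an assumption for illustration: the ScreenObject and Plugin classes, the type name "aws.ec2.instance", and the operation names are all hypothetical. NGS does not define any of them in this document.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScreenObject:
    type: str     # what the shell "understood" the object to be
    fields: dict  # the object's data, e.g. one table row


@dataclass
class Plugin:
    name: str
    object_type: str  # which objects this plugin works with
    # Operation name -> function producing the next command to run
    operations: dict[str, Callable[[ScreenObject], str]]


def context_menu(obj: ScreenObject, plugins: list[Plugin]) -> list[str]:
    """Collect the operations that the plugins offer for this object."""
    return [
        f"{plugin.name}: {operation}"
        for plugin in plugins
        if plugin.object_type == obj.type
        for operation in plugin.operations
    ]


# Hypothetical plugin for EC2 instances
ec2_plugin = Plugin(
    name="ec2",
    object_type="aws.ec2.instance",
    operations={
        "terminate": lambda o: (
            "aws ec2 terminate-instances --instance-ids " + o.fields["Instance ID"]
        ),
    },
)
```

An object of a type that no plugin claims simply gets an empty menu, which matches the "no worse off than in another shell" requirement.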

Record / Replay

Record / replay functionality is essential for automation. Current shells do not provide this capability. Capturing history as it's done by shells today is inadequate (unless you code every step in the shell and practically use it to run small programs).

One can think of "recording" as semantically meaningful history.

  • The interaction must be recorded
  • The recording must be displayed to the user
  • The user must be able to change the recording when the shell gets the semantics wrong
  • The recording must have the largest possible amount of information and context
  • Recording must support parametrization
  • To consider: "refine" previous command (TODO: explain)

A bit of history. In the beginning, I thought that the interaction should be captured by generating equivalent code on the spot. With time, I changed my mind in favor of capturing structured data instead. Generating code has the following issues:

  • It doesn't capture the whole available information (or it's really ugly chunky code)
  • It does not allow for easy programmatic modification in case you want to modify the recording
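To make the structured data option more concrete, here is a small Python sketch of one recorded interaction from the Frustration Example. The SelectionRecord class, its fields, and the parametrize helper are made up for illustration. The real recording format is still to be designed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SelectionRecord:
    """One recorded interaction, stored as data rather than generated code."""
    source_command: str  # context: the command whose output was on screen
    column: str          # the field the selection was matched on
    value: str           # the value identifying the selection; a parameter

    def describe(self) -> str:
        # Displayed to the user so that wrong semantics can be spotted
        return f"You have selected rows where '{self.column}' is '{self.value}'"

    def replay(self, rows: list[dict]) -> list[dict]:
        # Re-apply the selection to fresh output of the same command
        return [row for row in rows if row.get(self.column) == self.value]


def parametrize(record: SelectionRecord, value: str) -> SelectionRecord:
    # Programmatic modification: same record, different parameter
    return replace(record, value=value)


record = SelectionRecord("list EC2 instances", "Name", "temp")
# Tomorrow's output: the temp instance has a different Instance ID
tomorrow = [
    {"Instance ID": "i-1234", "Name": "prod-web"},
    {"Instance ID": "i-7777", "Name": "temp"},
]
selected = record.replay(tomorrow)
```

Because the record says "Name is temp" rather than "i-9abc", replaying it against tomorrow's output selects i-7777. Changing the parameter is a one-field update instead of an edit of generated code.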

TODO: elaborate on record/replay.