Ilya Sher edited this page Apr 1, 2024 · 10 revisions

UI Chain Design

This document is WIP and for discussion. 2024-03.

"Chain" in NGS UI refers to related commands run in sequence. "Timeline", the central part of the UI for interacting with NGS, contains commands which are organized into "chains".

TODO: elaborate about chains and their construction.

The Problem

The shell has a CLI problem: it still uses the CLI to this day, while everything else has switched to more productive interfaces that use the whole screen and allow interaction with objects on the screen. In the case of the shell, those objects would be the outputs of programs.

The Frustration

Seeing objects on the screen (which in a typical shell are not even objects; they are nothing, just text) and being unable to interact with them drives me nuts. This is just wrong. Yet that's exactly the experience in the shell. I refuse to believe that an interface which interacts with the user on a single line is the pinnacle of engineering and can be more productive than one that uses the whole screen. NGS is an attempt to fix this horrible UX.

Around 1974-1975, a new capability was added to terminals: cursor movement. The terminal stopped being a 1:1 equivalent of printer and paper. Interaction with the whole screen became possible. In response to the new capability, Bill Joy released vi in 1976. It brought us text editing as we know it today: using the whole screen. Do you see anyone using ex or ed today? I don't. I think it's because whole-screen interaction won.

Shells, to this day, treat outputs of programs as if they were printed on paper: zero interactivity is possible.

Frustration Example

You have just listed your AWS EC2 instances. The output is on your screen:

Instance ID | Name     | Security Groups             | ...
i-1234      | prod-web | sg-5678 (HTTP from LB)      | ...
i-9abc      | temp     | sg-def1 (SSH from anywhere) | ...
...         | ...      | ...                         | ...

One of the instances has the tag Name with the value temp. You want to terminate that instance. Today, you are faced with two bad choices.

  • Copy the Instance ID value from the output above and paste it into the new command that you are constructing: aws ec2 terminate-instances --instance-ids PASTE-HERE. This creates an unreproducible command. Tomorrow's temp instance will have a different Instance ID. While you will have this command in your history, it won't be usable as is. (Side note: how little does the shell care? The copy+paste functionality is implemented by the terminal, not the shell.)
  • Pipes! Of course pipes! Construct a piece of code which lists the instances, finds the instance (or all instances) with Name being temp, and feeds the extracted IDs to the terminate command. The shell is pushing you to start coding instead of focusing on your goal. It might feel "normal" to you today because that's what you've had your whole life. It isn't.

The Solution

The shell must parse programs' outputs and "understand" them in order to provide meaningful interaction. Such interaction needs to be captured on the semantic level (i.e. the meaning of what's happening) to allow recording for later replaying.

What if your shell does not understand the output?

Properly constructed software (which is what we aim for here) should handle this situation well. Just as AWS CDK exposes L1-L3 constructs, a layered approach is appropriate here too. The more the shell "understands", the more useful it is. When NGS doesn't understand the output, you should be no worse off than with any other shell. The answer is definitely not "we just add a parser to NGS" for every command in the universe, just as command line completion doesn't cover every possible program.

Note that "partial understanding" is an option. For example, auto-detected JSON could allow some interactions with the data.
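
The layered approach can be sketched as a fallback chain: try a plugin that fully understands the output, then partial understanding (auto-detected JSON), then plain text. This is a minimal Python sketch, not NGS code; the level names and the `PluginParser` hook are hypothetical.

```python
import json
from typing import Any, Callable, Optional

# Hypothetical plugin hook: returns semantic objects, or None if it doesn't recognize the output.
PluginParser = Callable[[str], Optional[list]]


def understand(output: str, plugin: Optional[PluginParser] = None) -> tuple[str, Any]:
    """Return (level, data), trying the richest level of understanding first."""
    # Full understanding: a plugin turned the output into semantic objects.
    if plugin is not None:
        objects = plugin(output)
        if objects is not None:
            return "objects", objects
    # Partial understanding: the output happens to be JSON.
    try:
        return "structured", json.loads(output)
    except json.JSONDecodeError:
        pass
    # No understanding: plain text, no worse than any other shell.
    return "text", output
```

Each level only adds capabilities on top of the previous one, so a missing plugin degrades the experience instead of breaking it.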

Interaction with objects on the screen... So, you are describing the web.

No. The common web UI is fine for one-off tasks. It's inadequate as a shell's UI: it doesn't help with reproducing the work. There is neither an appropriate history (navigation history is inadequate as a shell's history) nor facilities for recording/replaying.

The AWS EC2 console recently (2024, I think) added something that helps with reproducibility: one can generate CloudFormation or CDK code from the operations that have just been performed in the UI. This functionality resembles what NGS is attempting to do, though the vision differs. For example, navigating between resources (for debugging rather than creation) and parametrization are planned in NGS, while it doesn't seem (my guess) that AWS would support them; that's way beyond your "normal" web UI.

The envisioned UI is somewhat similar to the web, but keeps the history of what happened clear and mostly unmodifiable. No click or keypress should be lost or missing from the interaction record.

Solution Example

A very coarse example of the main part of the UI:

command 1 (ex: list AWS CodePipelines)
output 1 (ex: Table of AWS CodePipelines)

interaction record 1 (ex: "You have selected the CodePipeline with 'Status' field 'Failed'")

command 2 (ex: Constructed from previous interaction "Show the failed execution of the selected CodePipeline")
output 2 (ex: the failed CodePipeline execution)

interaction record 2 (ex: "You have selected execution stage with 'Status' field 'Failed'")

...
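
One way to picture the timeline above is as a chain of steps, each holding a command, its output, and the record of the interaction that led to the next command. A minimal Python sketch; `Step` and `Chain` are hypothetical names, not part of NGS.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    command: str                       # e.g. "list AWS CodePipelines"
    output: object                     # parsed output shown in the timeline
    interaction: Optional[str] = None  # what the user did with this output


@dataclass
class Chain:
    steps: list[Step] = field(default_factory=list)

    def run(self, command: str, output: object) -> Step:
        step = Step(command, output)
        self.steps.append(step)
        return step

    def interact(self, record: str) -> None:
        # The interaction is attached to the latest step; the next command is built from it.
        self.steps[-1].interaction = record
```

Usage, mirroring the example: `chain.run("list AWS CodePipelines", pipelines)`, then `chain.interact("Selected the CodePipeline with 'Status' field 'Failed'")`, then `chain.run(...)` for the constructed follow-up command.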

Understanding of Program's Outputs

The shell is not supposed to do that

Many shells do have command line completion. It is a powerful, productive, and loved feature. It's based on understanding programs' inputs. What puzzles me is how the above argument draws the line right between understanding a program's inputs and understanding its outputs.

That's too much work!

It is quite a bit of work. Command line completion looks like work of the same order of magnitude, and it is being done.

Interaction with Program's Outputs

The shell allows interaction with objects on the screen. Given the "understanding" from the section above, it is possible to provide meaningful interaction, including a plugin system where each plugin defines which objects it works with and which operations it supports (this could be used, for example, to build a context menu).
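
A context menu could be assembled from such declarations. The Python sketch below assumes plugins register their object types and operations ahead of time (one of the options still marked TBD further down); `Plugin` and `PluginRegistry` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Plugin:
    name: str
    object_types: set[str]            # object types the plugin works with
    operations: dict[str, Callable]   # operation name -> implementation


class PluginRegistry:
    def __init__(self) -> None:
        self.plugins: list[Plugin] = []

    def register(self, plugin: Plugin) -> None:
        self.plugins.append(plugin)

    def context_menu(self, object_type: str) -> list[str]:
        """Every operation supported on this object type, in registration order."""
        items = []
        for plugin in self.plugins:
            if object_type in plugin.object_types:
                items.extend(f"{plugin.name}: {op}" for op in plugin.operations)
        return items
```

Registering ahead of time makes menu building a lookup; the alternative (asking every plugin each time) trades that for more dynamic answers.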

Record / Replay

Record / replay functionality is essential for automation. Current shells do not provide this capability. Capturing history the way shells do it today is inadequate (unless you code every step in the shell and practically use it to run small programs).

One can think of "recording" as semantically meaningful history.

  • The interaction must be recorded
  • Recording must be displayed to the user
  • The user must be able to change the recording when the shell gets the semantics wrong
  • Recording must have the largest possible amount of information and context
  • Recording must support parametrization
  • To consider: "refine" previous command (TODO: explain)

A bit of history. In the beginning, I thought that the interaction should be captured by generating equivalent code on the spot. With time, I changed my mind in favor of capturing structured data instead. Generating code has the following issues:

  • It doesn't capture all the available information (or does so with really ugly, chunky code)
  • It doesn't allow easy programmatic modification when you want to modify the recording
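
To illustrate why structured data is easier to work with than generated code, here is a minimal Python sketch of one recorded step that keeps its context (how the shell decided on it) and can be parametrized programmatically. `RecordedFilter`, `Param`, `parametrize` and `bind` are hypothetical names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Param:
    name: str  # placeholder filled in at replay time


@dataclass(frozen=True)
class RecordedFilter:
    command: str   # the command whose output was filtered
    column: str
    value: object  # a literal, or a Param after parametrization
    how: str       # context: how the shell arrived at this filter


def parametrize(step: RecordedFilter, name: str) -> RecordedFilter:
    """Turn the recorded literal into a parameter; returns a new record."""
    return replace(step, value=Param(name))


def bind(step: RecordedFilter, **params: object) -> RecordedFilter:
    """Fill in a parameter before replaying."""
    if isinstance(step.value, Param):
        return replace(step, value=params[step.value.name])
    return step
```

With generated code, the same change would mean parsing and rewriting source text; here it is a field update.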

TODO: elaborate on record/replay.

Design

Background: Programs' Outputs

Outputs of programs fall into one of the following categories:

  • unstructured text. For unstructured text, the shell can provide almost no interactivity. The shell could potentially guess (regex heuristics) what's in the text and provide interactivity based on that. Postponing this part of the design as "less interesting".
  • structured data. For structured data (ex: JSON, YAML), the shell can provide common operations on structured data such as filtering, extracting, etc. Postponing this part of the design as "less interesting" too.
  • semantically meaningful (to the shell) objects, SMOs. This document focuses on this type of objects. Ex: AWS CodePipeline pipeline, AWS CodeBuild project, AWS EC2 instance, File. In the foreseeable future, outputs of various programs will be parsed by the shell's plugins into SMOs.
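
As an illustration of a plugin turning output into SMOs, the Python sketch below parses the JSON printed by `aws ec2 describe-instances` (Reservations, then Instances, with `InstanceId`, `Tags` and `SecurityGroups`) into typed objects. The class and function names are hypothetical; note that security groups become references to other objects rather than text.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityGroupRef:
    group_id: str  # a reference to another SMO, not just text


@dataclass(frozen=True)
class Ec2Instance:
    instance_id: str
    name: str
    security_groups: tuple[SecurityGroupRef, ...]


def parse_describe_instances(output: str) -> list[Ec2Instance]:
    """Hypothetical plugin parser for `aws ec2 describe-instances` JSON output."""
    data = json.loads(output)
    result = []
    for reservation in data.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
            groups = tuple(SecurityGroupRef(g["GroupId"]) for g in inst.get("SecurityGroups", []))
            result.append(Ec2Instance(inst["InstanceId"], tags.get("Name", ""), groups))
    return result
```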

Objects on the Screen

"Object on the screen" in this document refers to an object's textual or graphical representation on the screen. The link between the representation on the screen and the "real" object is tracked by the UI, similar to how a browser tracks HTML elements. Interaction with the representation on the screen is forwarded to the "real" object.
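
The link between representation and object can be sketched as a table from on-screen element ids to the objects they stand for, loosely like a browser's DOM. A minimal Python sketch; `Screen`, `render` and `interact` are hypothetical.

```python
import itertools
from typing import Any, Callable


class Screen:
    """Tracks which on-screen representation belongs to which "real" object."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._objects: dict[int, Any] = {}

    def render(self, obj: Any, text: str) -> tuple[int, str]:
        element_id = next(self._ids)
        self._objects[element_id] = obj  # keep the link, like a DOM node
        return element_id, text

    def interact(self, element_id: int, operation: Callable[[Any], Any]) -> Any:
        # The user acted on the representation; forward the action to the real object.
        return operation(self._objects[element_id])
```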

Interaction with objects on the screen is never the only way. The command line (text input) must always be available. Text input and interaction with objects on the screen must be unified as much as possible; the differences must be minimized and must not be a hindrance to using the UI.
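
One way to unify the two input paths is to have both produce the same operation structure, so recording and replaying don't care where an operation came from. A minimal Python sketch under that assumption; `Operation`, `from_text` and `from_click` are hypothetical.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    name: str
    args: tuple[str, ...]


def from_text(line: str) -> Operation:
    """Typed input, e.g. "show sg-5678"."""
    tokens = shlex.split(line)
    if not tokens:
        raise ValueError("empty command")
    name, *args = tokens
    return Operation(name, tuple(args))


def from_click(target: str, action: str) -> Operation:
    """Clicking an object on the screen and choosing an action."""
    return Operation(action, (target,))
```

Typing `show sg-5678` and clicking sg-5678 then choosing "show" yield equal `Operation` values.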

Interactive Objects

An interactive object, in our context, is any object on the screen with which any kind of interaction is possible. Examples: a link, a button, a checkbox. "Interactive" here is in contrast to other "things" on the screen, typically text.

Actions on an Object

Given the type of object (and maybe some context, TBD), the shell "knows" which operations are possible on the object.

TBD: likely plugins register ahead of time which types of objects they support.

TBD: maybe plugins register supported operations (the alternative is to query the plugins each time, in: object, out: operations)

TODO: normalize "action" vs "operation" terminology. Likely: action - ui, operation - the "real" thing.

Default Action for an Object

For each interactive object, there should be a default action. That's the most likely action to be taken on the object. Examples:

  • Directory - change into the directory
  • A reference to AWS Security Group (ex: sg-5678 (HTTP from LB) above) - navigate and show that security group
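
Default actions could be kept in a per-type lookup, as in this minimal Python sketch. The types and the returned action descriptions are hypothetical; a real implementation would return operations rather than strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Directory:
    path: str


@dataclass(frozen=True)
class SecurityGroupRef:
    group_id: str
    description: str


# Hypothetical mapping: object type -> its most likely action.
DEFAULT_ACTIONS = {
    Directory: lambda d: f"cd {d.path}",
    SecurityGroupRef: lambda sg: f"show security group {sg.group_id}",
}


def default_action(obj: object) -> str:
    action = DEFAULT_ACTIONS.get(type(obj))
    if action is None:
        raise LookupError(f"no default action for {type(obj).__name__}")
    return action(obj)
```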

Tables

The design will start with interaction with a table on the screen, as it's an extremely common data layout.

Example 1: AWS EC2 Instances

Instance ID | Name     | Security Groups             | ...
i-1234      | prod-web | sg-5678 (HTTP from LB)      | ...
i-9abc      | temp     | sg-def1 (SSH from anywhere) | ...
...         | ...      | ...                         | ...

Example 2: AWS CodePipelines

name | status    | last_execution_id | approvals                    | last_executed | progress | revisions
app1 | Succeeded | uuidv4-1          | a1: Succeeded, a2: Succeeded | (date)        | 19/19    | Source1: commit, Source2: commit
app2 | Failed    | uuidv4-2          |                              | (date)        | 15/17    | Source1: commit, Source2: commit

Note that status and last_execution_id both point to the last execution (as of the time the table is displayed).

  • Table content
    • A row of a table represents an object, "the row object".
    • Columns represent various fields of an object
    • A cell represents a specific field of a specific object
    • TBD: having an id column with row number
    • TBD: additional types of interactive objects
  • Informative, follows from the above: any field/cell displayed in a UI can contain a reference(s) to another object. In the example above, sg-5678 (HTTP from LB) and sg-def1 (SSH from anywhere) can be links to the corresponding security groups.
  • Each table row must have a link to the object it represents
    • How to enforce? Convention? Likely, at least for now.
    • The link should be in the first column (unless proven otherwise).
    • The title of the column must have some hint that it represents the link to the row object (TBD: design)
    • The default action on the link would be to navigate to the object and show a (more) detailed view of the object.
    • What happens if there is no meaningful id?
    • What happens if there is no "details" to show when navigating to the object?
  • Filtering
    • There should be a way for the user to convey that they are interested in a subset of the objects in the table.
      • For known/understandable commands this might result in "augmentation" of the command that outputted the table. Ex: add --filter XXX argument.
    • TODO: Design UI for filtering
      • TODO: How to convey field-name -> value(s)/patterns to filter on?
      • TODO: If the object is also a reference, how to distinguish between filtering intent and navigation intent?
      • TODO: Handle more complex filtering. Ex: "has tag" in Tags field.
      • TODO: Challenge - interaction with a cell which only consists of an interactive object
      • TODO: Challenge - filter + navigate operation
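
The table conventions above (row object, fields as columns, cells that may hold references) can be sketched as a small data model. This is a minimal Python sketch with hypothetical names; it only shows how the row link and cell references could be kept as data.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Ref:
    """A reference to another object (e.g. a security group)."""
    kind: str
    object_id: str


@dataclass
class Row:
    obj: Ref               # link to "the row object", shown in the first column
    cells: dict[str, Any]  # column name -> field value (may contain Refs)


@dataclass
class Table:
    columns: list[str]
    rows: list[Row]

    def cell(self, row: int, column: str) -> Any:
        return self.rows[row].cells[column]

    def references(self, row: int) -> list[Ref]:
        """All objects reachable from a row: the row object plus referenced cells."""
        refs = [self.rows[row].obj]
        for value in self.rows[row].cells.values():
            values = value if isinstance(value, list) else [value]
            refs.extend(v for v in values if isinstance(v, Ref))
        return refs
```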

Tables - Scratchpad

Temporary notes to self

The default action should work as-is. The question is how to filter beforehand. Options:

  • By the field of the reference
  • By another field
    • select a field with unique value?
    • select a field where the value is min() or max(), especially when sorted by that field
    • select combination of fields based on the criteria above?
    • select all fields?
  • Stop and ask
    • Use the answer for this time only or always or for some condition (ex: object type)
  • The recording should show how the filtering was guessed. Ex: unique value in column X.
  • There should be a way to modify how the guess is being made.
  • Suggest alternative guesses + allow full control over filtering, including scripting (preferably, simply defining NGS pattern)

Note that it's OK to guess because the recording is editable.

Some fields, like Instance ID, should be rated as unlikely candidates for filtering because their values are meaningless.
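
The "unique value" heuristic, combined with skipping low-probability columns and recording the reason, could look like this minimal Python sketch. `guess_filter`, `FilterGuess` and `LOW_PROBABILITY_COLUMNS` are hypothetical; returning None stands for "stop and ask".

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical: columns whose values are meaningless to filter on.
LOW_PROBABILITY_COLUMNS = {"Instance ID"}


@dataclass(frozen=True)
class FilterGuess:
    column: str
    value: str
    reason: str  # shown in the recording so the user can see and edit the guess


def guess_filter(rows: list[dict[str, str]], selected: int) -> Optional[FilterGuess]:
    """Pick the first column in which the selected row's value is unique."""
    row = rows[selected]
    for column, value in row.items():
        if column in LOW_PROBABILITY_COLUMNS:
            continue
        if sum(1 for r in rows if r.get(column) == value) == 1:
            return FilterGuess(column, value, f"unique value in column {column}")
    return None  # no safe guess: stop and ask the user
```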

Combining filters

Interaction Outcome

After an interaction with an object takes place, the following happens:

  • A recording is shown (see below)
    • A recording can consist of multiple logical operations, as long as they resulted from a single UI interaction. Ex: filter instances to only include ones with Name tag temp, assert there is only one in the new list, navigate to it.
  • The operation is executed and the output (likely interactive too) is displayed
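
The example recording (filter, assert exactly one, navigate) can be executed step by step. A minimal Python sketch; the operation classes are hypothetical and "navigate" is simplified to returning the single remaining object.

```python
from dataclasses import dataclass
from typing import Any


class RecordingAssertionError(Exception):
    """A recorded assertion no longer holds at replay time."""


@dataclass(frozen=True)
class FilterOp:
    field_name: str
    value: Any


@dataclass(frozen=True)
class AssertSingleOp:
    pass


@dataclass(frozen=True)
class NavigateOp:
    pass


def execute(recording: list, objects: list[dict[str, Any]]) -> Any:
    """Run the logical operations produced by one UI interaction."""
    current: Any = objects
    for op in recording:
        if isinstance(op, FilterOp):
            current = [o for o in current if o.get(op.field_name) == op.value]
        elif isinstance(op, AssertSingleOp):
            if len(current) != 1:
                raise RecordingAssertionError(f"expected exactly one object, got {len(current)}")
        elif isinstance(op, NavigateOp):
            current = current[0]  # simplified: "navigate" yields the object itself
        else:
            raise TypeError(f"unknown operation {op!r}")
    return current
```

The assertion step matters for replay: if tomorrow there are two temp instances, the recording fails loudly instead of acting on the wrong one.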

Recording

Any interaction (typed command, click, menu + item selection, etc.) must be recorded and presented to the user. It is unacceptable to lose track of how the current state was achieved.

When recording is edited, a new copy is made. Never ruin the history of what actually happened.
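
Copy-on-edit can be sketched as an append-only list of immutable recording versions. A minimal Python sketch; `Recording` and `RecordingHistory` are hypothetical names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Recording:
    version: int
    steps: tuple[str, ...]


class RecordingHistory:
    """Append-only: editing never overwrites what actually happened."""

    def __init__(self, original: Recording) -> None:
        self._versions = [original]

    @property
    def latest(self) -> Recording:
        return self._versions[-1]

    def edit(self, new_steps: tuple[str, ...]) -> Recording:
        edited = replace(self.latest, version=self.latest.version + 1, steps=new_steps)
        self._versions.append(edited)
        return edited

    def versions(self) -> tuple[Recording, ...]:
        return tuple(self._versions)
```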

Replaying

TODO