UI Chain Design
This document is WIP and for discussion. 2024-03.
"Chain" in the NGS UI refers to related commands run in sequence. The "Timeline", the central part of the UI for interacting with NGS, contains commands which are organized into "chains".
TODO: elaborate about chains and their construction.
The shell has a CLI problem: the shell continues to use the CLI to this day, while everything else has switched to more productive interfaces that use the whole screen and allow interaction with objects on the screen. In the case of the shell, those objects would be the outputs of programs.
Seeing objects on the screen (which are not even objects in a typical shell, they are nothing, just text) and being unable to interact with them is driving me nuts. This is just wrong. Yet, that's exactly the experience in the shell. I refuse to believe that an interface which interacts with the user on a single line is the pinnacle of engineering and can be more productive than one that uses the whole screen. NGS is an attempt to fix this horrible UX.
Around 1974-1975, a new capability was added to terminals: cursor movement. The terminal stopped being 1:1 with the printer and paper. Interaction with the whole screen became possible. In response to the new capability, Bill Joy released `vi` in 1976. It brought us text editing as we know it today - using the whole screen. See anyone using `ex` or `ed` today? I don't. I think it's because whole-screen interaction won.
Shells, to this day, treat outputs of programs as if they were printed on paper - zero interactivity possible.
You have just listed your AWS EC2 instances. The output is on your screen:
Instance ID | Name | Security Groups | ... |
---|---|---|---|
i-1234 | prod-web | sg-5678 (HTTP from LB) | ... |
i-9abc | temp | sg-def1 (SSH from anywhere) | ... |
... | ... | ... | ... |
One of the instances has the tag `Name` with the value `temp`. You want to terminate that instance. Today, you are faced with two bad choices.
- Copy the `Instance ID` value from the output above and paste it into the new command that you are constructing: `aws ec2 terminate-instances --instance-ids PASTE-HERE`. This creates an unreproducible command. Tomorrow's `temp` instance will have a different `Instance ID`. While you will have this command in your history, it won't be usable as is. (Side note: how little does the shell care? The copy+paste functionality is implemented by the terminal, not the shell.)
- Pipes! Of course pipes! Construct a piece of code which would list the instances, find the one (or all) with `Name` being `temp`, and feed the extracted IDs to the terminate command. You are being pushed by the shell to start coding instead of focusing on your goal. It might feel "normal" to you today because that's what you had your whole life. It isn't.
The shell must parse programs' outputs and "understand" them in order to provide meaningful interaction. Such interaction needs to be captured on the semantic level (i.e. the meaning of what's happening) to allow recording for later replaying.
What if your shell does not understand the output?
Properly constructed software (which is what we aim for here) should handle this situation well. Just as AWS CDK exposes L1-L3 constructs, a layered approach is appropriate here too. The more the shell "understands", the more useful it is. When NGS doesn't understand the output, you should be no worse off than in any other shell. The answer is definitely not "we just add a parser to NGS" for each command in the universe, just as command line completion doesn't cover every possible program.
Note that "partial understanding" is an option. For example, auto-detected JSON could allow some interactions with the data.
Interaction with objects on the screen... So, you are describing the web.
No. The common web UI is fine for one-off tasks. It's inadequate to serve as a shell's UI: it doesn't help with reproducing the work. There is no appropriate history (navigation history is inadequate as a shell's history), nor are there facilities for recording/replaying.
AWS EC2 console recently (2024, I think) added something that helps with reproducibility. One can generate CloudFormation or CDK code from the operations that have just been performed in the UI. This functionality does have some resemblance to what NGS is attempting to do, while the vision differs. For example, navigating between resources (not for creation but for debugging) and parametrization are planned in NGS, while it doesn't seem (my guess) that AWS would support that - it's way beyond your "normal" web UI.
The envisioned UI is somewhat similar to the web but keeps the history of what happened clear and mostly unmodifiable. There shouldn't be a single click or keypress that is lost and is not visible in an interaction record.
Very coarse example of the main part of the UI.
command 1 (ex: list AWS CodePipelines)
output 1 (ex: Table of AWS CodePipelines)
interaction record 1 (ex: "You have selected the CodePipeline with 'Status' field 'Failed'")
command 2 (ex: Constructed from previous interaction "Show the failed execution of the selected CodePipeline")
output 2 (ex: the failed CodePipeline execution)
interaction record 2 (ex: "You have selected execution stage with 'Status' field 'Failed'")
...
The shell is not supposed to do that
Many shells do have command line completion. It is a powerful, productive, and loved feature. It's based on understanding of programs' inputs. What's puzzling to me is how the above argument draws the line right between understanding the inputs and understanding the outputs of a given program.
That's too much work!
It is quite a bit of work. Command line completion looks like the same order of magnitude and it is being done.
The shell allows interaction with objects on the screen. Given "understanding" from the section above, it is possible to provide meaningful interaction, including a plugin system where each plugin defines which objects it works with and which operations it supports (this could be used for building a context menu, for example).
Record / replay functionality is essential for automation. Current shells do not provide this capability. Capturing history as it's done by shells today is inadequate (unless you code every step in the shell and practically use it to run small programs).
One can think of "recording" as semantically meaningful history.
- The interaction must be recorded
- Recording must be displayed to the user
- The user must be able to change recording when the shell gets the semantics wrong
- Recording must have the largest possible amount of information and context
- Recording must support parametrization
- To consider: "refine" previous command (TODO: explain)
A bit of history. In the beginning I thought that capturing the interaction should be done by generating equivalent code on the spot. With time, I changed my mind in favor of capturing structured data instead. Generating code has the following issues:
- It doesn't capture the whole available information (or it's really ugly chunky code)
- It does not allow for easy programmatic modification in case you want to modify the recording
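To make the "structured data instead of generated code" idea more concrete, here is a minimal sketch in Python. It is not how NGS implements recordings. The `Recording` class, its field names, and the `AWS::EC2::Instance` type tag are all assumptions made up for illustration. The point is that a recording kept as data can carry extra context and can be modified programmatically, and that modifying it yields a new copy.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Mapping


@dataclass(frozen=True)
class Recording:
    """One recorded interaction, kept as data rather than as generated code.

    All field names here are hypothetical, chosen only for this sketch.
    """
    operation: str                       # what the user did, e.g. "filter"
    object_type: str                     # type of object the operation applied to
    params: Mapping[str, Any] = field(default_factory=dict)   # parametrizable part
    context: Mapping[str, Any] = field(default_factory=dict)  # extra information

    def with_params(self, **changes: Any) -> "Recording":
        # Editing never mutates the original; a modified copy is returned.
        return replace(self, params={**self.params, **changes})

    def describe(self) -> str:
        args = ", ".join(f"{k}={v!r}" for k, v in sorted(self.params.items()))
        return f"{self.operation} on {self.object_type} ({args})"


original = Recording(
    operation="filter",
    object_type="AWS::EC2::Instance",
    params={"field": "Name", "value": "temp"},
    context={"guess": "unique value in column Name"},
)

# Programmatic modification: same interaction, different parameter value.
edited = original.with_params(value="staging")
```

With generated code, the `context` part would either be lost or end up as comments, and changing the `value` parameter would mean editing source text.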
TODO: elaborate on record/replay.
Outputs of programs fall into one of the following categories:
- unstructured text. For unstructured text, the shell can provide almost no interactivity. The shell could potentially guess (regex heuristics) what's in the text and provide interactivity based on that. Postponing this part of design as "less interesting".
- structured data. For structured data (ex: JSON, YAML), the shell can provide common operations on structured data such as filtering, extracting, etc. Postponing this part of design as "less interesting" too.
- semantically meaningful (to the shell) objects, SMOs. This document will focus on this type of objects. Ex: AWS CodePipeline pipeline, AWS CodeBuild project, AWS EC2 instance, File. In the foreseeable future, outputs of various programs will be parsed by shell's plugins into the SMOs.
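The three categories above can be sketched as a layered fallback. This is an illustration under assumptions, not the NGS design: `classify_output`, `SMO_PARSERS` and `parse_ec2_instances` are hypothetical names, and auto-detection is reduced to "does it parse as a JSON object or array". The only real-world detail relied upon is that `aws ec2 describe-instances` returns JSON with `Reservations` containing `Instances`, each having an `InstanceId`.

```python
import json
from enum import Enum


class OutputKind(Enum):
    UNSTRUCTURED = "unstructured text"
    STRUCTURED = "structured data"
    SMO = "semantically meaningful objects"


def parse_ec2_instances(data):
    """Hypothetical plugin parser: describe-instances JSON -> list of SMOs."""
    return [
        {"type": "AWS::EC2::Instance", "id": instance["InstanceId"]}
        for reservation in data.get("Reservations", [])
        for instance in reservation.get("Instances", [])
    ]


# Hypothetical registry: which command's output a plugin knows how to parse.
SMO_PARSERS = {"aws ec2 describe-instances": parse_ec2_instances}


def classify_output(command, text):
    """Return (kind, value), falling back to less "understanding" at each step."""
    try:
        data = json.loads(text)
    except ValueError:
        return OutputKind.UNSTRUCTURED, text
    if not isinstance(data, (dict, list)):
        # A bare number or string parses as JSON but is not useful structure.
        return OutputKind.UNSTRUCTURED, text
    parser = SMO_PARSERS.get(command)
    if parser is None or not isinstance(data, dict):
        return OutputKind.STRUCTURED, data
    return OutputKind.SMO, parser(data)
```

An unknown command that happens to print JSON still lands in the "structured data" layer, which is the "partial understanding" case.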
Object on the screen, in this document, refers to the object's textual or graphical representation on the screen. The link between the representation on the screen and the "real" object is tracked by the UI, similar to how a browser tracks HTML elements. Interaction with the representation on the screen is forwarded to the "real" object.
Interaction with objects on the screen is never the only way. Command line (text input) must always be available. The way text input and interaction with objects on the screen work must be unified as much as possible; the difference must be minimized and must not be a hindrance to using the UI.
Interactive object, in our context, is any object on the screen with which any kind of interaction is possible. Examples: a link, a button, a checkbox. Interactive here is meant as a contrast with other "things" on the screen, typically text.
Given the type of object (and maybe some context, TBD), the shell "knows" which operations are possible on the object.
TBD: likely plugins register ahead of time which types of objects they support.
TBD: maybe plugins register supported operations (the alternative is to query the plugins each time, in: object, out: operations)
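One possible shape for the "plugins register ahead of time" option is sketched below. Everything in it is an assumption for discussion: the `PluginRegistry` class, the operation names, and the type tags are hypothetical, and the handlers only return strings instead of doing anything real.

```python
from collections import defaultdict


class PluginRegistry:
    """Maps object type -> operation name -> handler, filled in by plugins."""

    def __init__(self):
        self._operations = defaultdict(dict)

    def register(self, object_type, operation, handler):
        # Called by a plugin ahead of time, once per supported operation.
        self._operations[object_type][operation] = handler

    def operations_for(self, object_type):
        # What the shell "knows" is possible, e.g. for building a context menu.
        return sorted(self._operations.get(object_type, {}))

    def run(self, object_type, operation, obj):
        return self._operations[object_type][operation](obj)


registry = PluginRegistry()
# A hypothetical AWS plugin registering two operations for EC2 instances.
registry.register("AWS::EC2::Instance", "show", lambda obj: f"show {obj}")
registry.register("AWS::EC2::Instance", "terminate", lambda obj: f"terminate {obj}")
# A hypothetical filesystem plugin.
registry.register("Directory", "change_into", lambda obj: f"cd {obj}")
```

The alternative from the TBD above (querying the plugins each time) would replace `operations_for` with a call into each plugin, passing the object and collecting the operations it reports.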
TODO: normalize "action" vs "operation" terminology. Likely: action - ui, operation - the "real" thing.
For each interactive object, there should be a default action. That's the most likely action to be taken on the object. Examples:
- Directory - change into the directory
- A reference to AWS Security Group (ex: `sg-5678 (HTTP from LB)` above) - navigate and show that security group
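The default action can be thought of as a lookup from the object's type to an action name. The sketch below assumes a hypothetical `ScreenObject` holding the on-screen label and a reference to the "real" object; the action names are made up, and `activate` only describes what would be done.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScreenObject:
    object_type: str   # hypothetical type tag
    ref: str           # reference to the "real" object
    label: str         # what is displayed on the screen


# Hypothetical defaults, mirroring the two examples in the list above.
DEFAULT_ACTIONS = {
    "Directory": "change_into",
    "SecurityGroupRef": "navigate_and_show",
}


def default_action(obj: ScreenObject) -> Optional[str]:
    return DEFAULT_ACTIONS.get(obj.object_type)


def activate(obj: ScreenObject) -> str:
    """Describe what activating (ex: clicking) the object would do."""
    action = default_action(obj)
    if action is None:
        return f"no default action for {obj.label}"
    return f"{action}({obj.ref})"


sg = ScreenObject("SecurityGroupRef", "sg-5678", "sg-5678 (HTTP from LB)")
```

Note that the label shown on the screen and the reference passed to the action differ, which is why the link between the two has to be tracked by the UI.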
The design will start with interaction with a table on the screen as it's an extremely common data layout.
Example 1: AWS EC2 Instances
Instance ID | Name | Security Groups | ... |
---|---|---|---|
i-1234 | prod-web | sg-5678 (HTTP from LB) | ... |
i-9abc | temp | sg-def1 (SSH from anywhere) | ... |
... | ... | ... | ... |
Example 2: AWS CodePipelines
name | status | last_execution_id | approvals | last_executed | progress | revisions |
---|---|---|---|---|---|---|
app1 | Succeeded | uuidv4-1 | a1: Succeeded, a2: Succeeded | (date) | 19/19 | Source1: commit, Source2: commit |
app2 | Failed | uuidv4-2 | | (date) | 15/17 | Source1: commit, Source2: commit |
Note that `status` and `last_execution_id` both point to the last execution (as of the time this table was displayed).
- Table content
- A row of a table represents an object, "the row object".
- Columns represent various fields of an object
- A cell represents a specific field of a specific object
- TBD: having an `id` column with row number
- TBD: additional types of interactive objects
- Informative, follows from the above: any field/cell displayed in a UI can contain a reference(s) to another object. In the example above, `sg-5678 (HTTP from LB)` and `sg-def1 (SSH from anywhere)` can be links to the corresponding security groups.
- Each table row must have a link to the object it represents
- How to enforce? Convention? Likely, at least for now.
- The link should be in the first column (unless proven otherwise).
- The title of the column must have some hint that it represents the link to the row object (TBD: design)
- The default action on the link would be to navigate to the object and show (more) detailed view of the object.
- What happens if there is no meaningful id?
- What happens if there is no "details" to show when navigating to the object?
- Filtering
- There should be a way for the user to convey that they are interested in a subset of the objects in the table.
- For known/understandable commands this might result in "augmentation" of the command that outputted the table. Ex: add `--filter XXX` argument.
- TODO: Design UI for filtering
- TODO: How to convey field-name -> value(s)/patterns to filter on?
- TODO: If the object is also a reference, how to distinguish between filtering intent and navigation intent?
- TODO: Handle more complex filtering. Ex: "has tag" in `Tags` field.
- TODO: Challenge - interaction with a cell which only consists of an interactive object
- TODO: Challenge - filter + navigate operation
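A rough sketch of the table model and of "field-name -> pattern" filtering described in the list above. The `Table` class and its methods are hypothetical, the "link is in the first column" rule is the convention mentioned above rather than something enforced, and shell-style glob patterns (via Python's `fnmatch`) are just one possible way to express patterns.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Table:
    columns: list   # column names, in display order
    rows: list      # each row object is a dict: field name -> value

    def cell(self, row_index, column):
        # A cell is a specific field of a specific row object.
        return self.rows[row_index][column]

    def link(self, row_index):
        # Convention assumed here: the first column links to the row object.
        return self.rows[row_index][self.columns[0]]

    def filter(self, patterns):
        """Keep rows where every given field matches its glob pattern."""
        kept = [
            row for row in self.rows
            if all(fnmatchcase(str(row[name]), pattern)
                   for name, pattern in patterns.items())
        ]
        return Table(self.columns, kept)


instances = Table(
    columns=["Instance ID", "Name", "Security Groups"],
    rows=[
        {"Instance ID": "i-1234", "Name": "prod-web",
         "Security Groups": "sg-5678 (HTTP from LB)"},
        {"Instance ID": "i-9abc", "Name": "temp",
         "Security Groups": "sg-def1 (SSH from anywhere)"},
    ],
)

temp_only = instances.filter({"Name": "temp"})
ssh_open = instances.filter({"Security Groups": "*SSH*"})
```

For a known command, the same `patterns` mapping could instead be translated into an extra argument of the command that produced the table, which is the "augmentation" case.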
Temporary notes to self
Default action should work as-is. The question is how to filter before. Options:
- By the field of the reference
- By another field
- select a field with unique value?
- select a field where the value is min() or max(), especially when sorted by that field
- select combination of fields based on the criteria above?
- select all fields?
- Stop and ask
- Use the answer for `this time only` or `always` or for some condition (ex: object type)
- The recording should show how the filtering was guessed. Ex: `unique value in column X`.
- There should be a way to modify how the guess is being made.
- Suggest alternative guesses + allow full control over filtering, including scripting (preferably, simply defining NGS pattern)
Note that it's OK to guess because the recording is editable.
Some fields, like `Instance ID`, should be rated as low probability to filter on because they are meaningless.
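A minimal sketch of the "unique value" guess from these notes, written in Python. The function name `guess_filters`, the shape of a guess, and the `low_priority` parameter are assumptions for illustration. The min()/max() and field-combination options are left out to keep it short.

```python
def guess_filters(rows, selected, low_priority=("Instance ID",)):
    """Propose filters that would single out the `selected` row.

    Returns candidate guesses, most plausible first. Each guess carries the
    reason it was made, so the recording can show how the filter was guessed.
    """
    guesses = []
    for field_name, value in selected.items():
        column = [row[field_name] for row in rows]
        if column.count(value) == 1:
            guesses.append({
                "field": field_name,
                "value": value,
                "reason": f"unique value in column {field_name}",
            })
    # Stable sort: meaningless identifiers go last, the rest keep their order.
    guesses.sort(key=lambda guess: guess["field"] in low_priority)
    return guesses


rows = [
    {"Instance ID": "i-1234", "Name": "prod-web"},
    {"Instance ID": "i-9abc", "Name": "temp"},
]
guesses = guess_filters(rows, selected=rows[1])
```

The first element is what the shell would use by default; the remaining ones are the "alternative guesses" that could be offered to the user.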
Combining filters
After the interaction with the object has taken place, the following happens:
- A recording is shown (see below)
- A recording can consist of multiple logical operations, as long as they were the result of a single UI interaction. Ex: filter instances to only include ones with `Name` tag `temp`, assert there is only one in the new list, navigate to it.
- The operation is executed and the output (likely interactive too) is displayed
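The "filter, assert there is only one, navigate" example can be sketched as a recording made of three logical steps that is replayed against whatever the table contains at replay time. The step format and the `replay` function are hypothetical, and "navigate" is reduced to returning the link of the remaining row.

```python
def replay(steps, rows):
    """Replay a recording (a list of logical steps) against table rows."""
    current = rows
    for step in steps:
        if step["op"] == "filter":
            current = [r for r in current if r[step["field"]] == step["value"]]
        elif step["op"] == "assert_count":
            if len(current) != step["count"]:
                raise ValueError(
                    f"expected {step['count']} object(s), got {len(current)}")
        elif step["op"] == "navigate":
            # Stands in for "navigate to the object and show its details".
            return current[0][step["link_field"]]
        else:
            raise ValueError(f"unknown step: {step['op']}")
    return current


# One UI interaction, recorded as three logical operations.
recording = [
    {"op": "filter", "field": "Name", "value": "temp"},
    {"op": "assert_count", "count": 1},
    {"op": "navigate", "link_field": "Instance ID"},
]

today = [
    {"Instance ID": "i-1234", "Name": "prod-web"},
    {"Instance ID": "i-9abc", "Name": "temp"},
]
# Hypothetical data for a later day: the temp instance has a different ID.
tomorrow = [
    {"Instance ID": "i-1234", "Name": "prod-web"},
    {"Instance ID": "i-0f00", "Name": "temp"},
]
```

Because the recording captures "the instance named temp" rather than a pasted ID, the same recording works on both data sets, and it fails loudly if there is more than one match.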
Any interaction (typed command, click, menu + item selection, etc) must be recorded and presented to the user. It is unacceptable to lose track of how the current state was achieved.
When a recording is edited, a new copy is made. Never ruin the history of what actually happened.
TODO
NGS official website is at https://ngs-lang.org/