turn screen recording to runnable scripts
https://github.com/reworkd/tarsier
Tarsier visually tags interactable elements on a page via brackets + an ID e.g. [23]. In doing this, we provide a mapping between elements and IDs for an LLM to take actions upon (e.g. CLICK [23]). We define interactable elements as buttons, links, or input fields that are visible on the page; Tarsier can also tag all textual elements if you pass tag_text_elements=True.
https://github.com/wuba/Picasso
https://github.com/nico1008/paint2code
https://github.com/MulongXie/Screen-Recognition
https://github.com/MulongXie/UIED/fork
https://github.com/MulongXie/UI-Captioning/fork
https://github.com/MulongXie/GUI-Perceptual-Grouping
https://github.com/MulongXie/UI2CODE
https://github.com/google-research-datasets/screen2words/blob/main/screen_summaries.csv