wanghaisheng/screen-recording-tell

screen-recording-tell

Turn screen recordings into runnable scripts.

https://github.com/reworkd/tarsier

Tarsier visually tags interactable elements on a page with a bracketed ID, e.g. [23]. This gives an LLM a mapping between elements and IDs that it can act on (e.g. CLICK [23]). Interactable elements are buttons, links, and input fields that are visible on the page; Tarsier can also tag all textual elements if you pass tag_text_elements=True.
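
The tagging idea described above can be illustrated with a small standalone sketch. This is not Tarsier's API: the `Element` class and the `tag_elements` and `resolve_action` helpers are hypothetical names invented for this example, and the `CLICK`/`TYPE` action grammar is an assumption. Only the `tag_text_elements` flag name is borrowed from the description. The sketch shows how bracketed IDs might be assigned to visible elements and how an LLM reply such as `CLICK [0]` could be mapped back to an element.

```python
import re
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for a page element (not a Tarsier type)."""
    kind: str            # "button", "link", "input", or "text"
    label: str
    visible: bool = True


# Kinds treated as interactable, following the description above.
INTERACTABLE = {"button", "link", "input"}

# Assumed action grammar: VERB [id] optional-argument
ACTION_RE = re.compile(r"^(CLICK|TYPE)\s+\[(\d+)\](?:\s+(.*))?$")


def tag_elements(elements, tag_text_elements=False):
    """Return (tagged_text, tag_to_element) for the visible elements.

    Interactable elements always get a bracketed ID; plain text is
    tagged only when tag_text_elements is True.
    """
    tag_to_element = {}
    lines = []
    for el in elements:
        if not el.visible:
            continue  # hidden elements are neither shown nor tagged
        if el.kind not in INTERACTABLE and not tag_text_elements:
            lines.append(el.label)  # keep the text, but without an ID
            continue
        tag = len(tag_to_element)
        tag_to_element[tag] = el
        lines.append(f"[{tag}] {el.label}")
    return "\n".join(lines), tag_to_element


def resolve_action(action, tag_to_element):
    """Parse an action string and look up the element it refers to."""
    match = ACTION_RE.match(action.strip())
    if match is None:
        raise ValueError(f"unrecognized action: {action!r}")
    verb, tag, argument = match.group(1), int(match.group(2)), match.group(3)
    if tag not in tag_to_element:
        raise KeyError(f"no element tagged [{tag}]")
    return verb, tag_to_element[tag], argument


page = [
    Element("text", "Welcome"),
    Element("button", "Sign in"),
    Element("link", "Help", visible=False),
    Element("input", "Email"),
]
tagged_text, mapping = tag_elements(page)
verb, target, argument = resolve_action("TYPE [1] a@b.c", mapping)
```

Here `tagged_text` is what would be sent to the model and `mapping` is kept on the caller's side, so the model only ever needs to mention an ID. In a real browser setting the mapping would more plausibly point to a selector or XPath than to an in-memory object.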

https://github.com/wuba/Picasso

https://github.com/nico1008/paint2code

https://github.com/MulongXie/Screen-Recognition

https://github.com/MulongXie/UIED/fork

https://github.com/MulongXie/UI-Captioning/fork

https://github.com/MulongXie/GUI-Perceptual-Grouping

https://github.com/MulongXie/UI2CODE

https://github.com/google-research-datasets/screen2words/blob/main/screen_summaries.csv