TagUI for Desktop Applications --> use visual automation #113

Closed
ArulKarthickKuppusamy opened this issue Apr 3, 2018 · 26 comments

@ArulKarthickKuppusamy

Hi Ken,

We are trying to implement TagUI for automating desktop applications. It would be great to have some documents or videos related to automating desktop applications.

If possible, please share the details with [email protected]

Thanks,
Arul

@kensoh
Member

kensoh commented Apr 3, 2018

Hi Arul, thanks for asking this. TagUI relies on visual recognition to automate desktop applications.

More details here. Steps that support visual automation are click, hover, type, select, read, show, save, snap. For example, the automation flow below tries to send an email through Outlook by looking for the best matches of images of the respective UI elements.

Attaching the sample images for reference - samples.zip. They won't work on Windows Outlook (or macOS Outlook of different versions), as the UI icons look different across operating systems and versions. But they are good to look at to get a better idea of how visual recognition is used to control UI actions.

Helper function visible() can also be used to detect whether an image is visible.

click outlook-icon.png
click new-email.png
enter mail-body.png as Hi Whoever,\n\nAttached are the M1 numbers.\n\nRegards,\nKen
enter subject-field.png as M1 Lucky Numbers
enter to-field.png as [email protected]
click attach-button.png
click numbers-icon.png
click choose-button.png
click send-button.png

Below is another example of visual automation:

It uses OCR to grab text from a PDF (alternatively, use Python libraries or other CLI tools), then types and prints a thank-you letter from MS Word. Attached images - word_samples.zip

click minimize.png  
dclick receipt.png
wait 2 seconds
read page.png to receipt_text
write receipt_text to receipt.txt
click close_pdf.png

dclick letter.png
wait 8 seconds
dclick address.png
type page.png as John Lim[enter]123 ABC Street[enter]Singapore 1234567[clear]
dclick name.png
type page.bmp as John
dclick amount.png
type page.png as $123.00
click file.png
click print.png
click confirm.png
click close_word.png
click dontsave.png

@kensoh
Member

kensoh commented Apr 12, 2018

Attaching the sample images for reference. They won't work on Windows Outlook (or macOS Outlook of different versions), as the UI icons look different across operating systems and versions. But they are good to look at as samples to get a better idea of how visual recognition is used to control UI actions - samples.zip

@kensoh
Member

kensoh commented May 9, 2018

Re-posting a great comment from @adegard here since this issue is related to AHK and is still open.


Thank you @kensoh for your answer. I'm a beginner developer, so a virtual display seems a little bit complicated to me for now...

About AutoHotKey, I would like to make a little tool for editing TagUI scripts, if that is possible... Can I share a repository with you to work on it? It is a little menu to help remember the main commands in English (activated by using ctrl+left click). It is not complete, but I could share it with you and other users: https://github.com/adegard/tagui_scripts

I read about AI Singapore and other blogs on RPA. It seems that for beginners UiPath is a bit complicated and RPA Express is too big a program to install... So in my opinion TagUI is a very good alternative, simple and light. Please continue your project, even if it's so hard to maintain ;-)

@kensoh
Member

kensoh commented May 10, 2018

Hi @adegard wow looks cool! I've just tried out your AHK TagUI commands helper. I think I have to create a new section on the TagUI home page to link to tools and stuff that the community creates 😄

PS - thanks very much for your feedback and encouragement! Yes TagUI will continue to be maintained to make RPA accessible to a broader user community than large organizations with deep pockets.

ahk_helper

@adegard
Contributor

adegard commented May 10, 2018

OK, thanks @kensoh
So I will complete the helper tool... for my personal use. I need it because I won't remember all the commands in 6 months!! So I will copy all your examples into it to make it more friendly.

I'm not using TagUI on a server but on my personal PC, so for me headless scripts combined with cron (like the z-cron tool) are very important! But at the same time, I need some tool to "accelerate" the process of script production, because we have a lot of things to automate...!
I will do my best to complete the AHK script!
Thanks again

@kensoh
Member

kensoh commented Jun 14, 2018

I'm only looking at 3 items in the pipeline for TagUI before hitting maintenance mode.

  1. integrating with desktop apps - TagUI for Desktop Applications --> use visual automation #113
  2. assistant for writing scripts - TagUI Writer 1.01 : helper tool for coding (most useful commands for beginners) #188
  3. for loop break and continue - For loop bugfix - explore enabling break and continue within for loops #216

I may reach out to other open-source RPA software maintainers to look at collaboration. I was thinking yesterday that if we can make a great open-source RPA tool and pass it on to @microsoft or another large tech company to maintain, it can put pressure on commercial RPA tools to raise the quality and ease-of-use of their free versions. That should lead to the largest impact on the RPA ecosystem.

@kensoh kensoh changed the title How to Implement TagUI for Desktop Applications Using TagUI for Desktop Applications - review possible Sikuli visual automation enhancements Jun 15, 2018
@kensoh kensoh changed the title Using TagUI for Desktop Applications - review possible Sikuli visual automation enhancements TagUI for Desktop Applications - review possible Sikuli visual automation enhancements Jun 15, 2018
@kensoh
Member

kensoh commented Jun 15, 2018

Besides the Outlook example above, users can use the vision step to send custom commands to Sikuli to do things like typing complex keystroke sequences. There also seems to be a trend towards using computer vision for UI automation of desktop apps. This is happening for commercial RPA software and also startups such as http://www.intellibot.io.

Furthermore, I can't see a sensible way to harmonize the steps API for AutoHotkey or RoroScript with TagUI. They are all different, powerful tools, but trying to force an integration for the sake of integrating is senseless. Users will be better off writing the automation flows directly in those tools and using the run step or api step to invoke those parts of the automation, if they still want to manage the whole flow from within TagUI.

Because of this, I have decided to abandon efforts to integrate natively with AHK or RoroScript, and instead use the effort to review possible ways to improve Sikuli's visual automation integration. Folks who want integration with desktop apps, just give a shout here with your use scenarios and let's see what can be done to run those automation workflows using TagUI-Sikuli's native integration.

CC @Aussiroth @lohvht - we can discuss next week some examples of use scenarios for desktop apps, and explore ways to make it easy + accurate to run visual automation on them.

@kensoh
Member

kensoh commented Jun 15, 2018

One idea is to make it super simple to create customized workflows for different desktop apps. For example, having a 'module' for Excel 20XX and a 'module' for Outlook 20XX, where each module is nothing more than a folder with images of UI elements that we can either create ourselves or let users submit as PRs.

This could perhaps be coupled with some automation flows that can be called via the tagui step to do some action, e.g. tagui excel/create_new_sheet (that also means the tagui step needs to support sending parameters as part of the step). @adegard's screen-capture tool will come in very handy 😄
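As a rough illustration of the 'module' idea, here is a minimal Python sketch that indexes a folder of modules, where each module is just a sub-folder of UI element images. The folder layout, the module names and the function name index_modules are all hypothetical; nothing like this is confirmed to exist in TagUI.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".bmp"}  # image types mentioned in this thread

def index_modules(root):
    """Map each module folder under root to the sorted image names inside it."""
    modules = {}
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        modules[folder.name] = sorted(
            item.name for item in folder.iterdir()
            if item.is_file() and item.suffix.lower() in IMAGE_SUFFIXES
        )
    return modules

# Demo with a throwaway directory; the module and file names are made up.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("excel/new_sheet.png", "outlook/new_email.png", "outlook/notes.txt"):
        path = Path(tmp) / name
        path.parent.mkdir(exist_ok=True)
        path.touch()
    demo_index = index_modules(tmp)

print(demo_index)
```

An index like this would make it easy to see which UI element images a community-contributed module provides before a flow refers to them.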

@kensoh kensoh changed the title TagUI for Desktop Applications - review possible Sikuli visual automation enhancements TagUI for Desktop Applications - explore Sikuli visual automation enhancements Jun 15, 2018
@kensoh kensoh self-assigned this Jun 15, 2018
kensoh added a commit that referenced this issue Jun 18, 2018
- supports [enter] and [clear] keywords just like the standard type step for webpages
- trigger word is page.png and page.bmp, just like steps snap, read, show, save
@kensoh
Member

kensoh commented Jun 18, 2018

The above commit adds visual automation for type page.png as text

  • supports [enter] and [clear] keywords just like the standard type step for webpages
  • trigger word is page.png and page.bmp, just like the steps snap, read, show, save

Prior to this, the type step could only type into a UI element on screen, e.g. type search_bar.png as 123
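To illustrate how a typed string mixes literal text with the [enter] and [clear] keywords, here is a minimal Python sketch that splits such a string into tokens. It is not TagUI's actual implementation; the function name split_type_text is hypothetical, and only the two keywords mentioned above are handled.

```python
import re

# Only the two keywords named in this comment are recognised here.
KEYWORD_PATTERN = re.compile(r"(\[enter\]|\[clear\])")

def split_type_text(text):
    """Split the text of a type step into literal chunks and keyword tokens."""
    # re.split keeps the keywords because the pattern has a capturing group;
    # empty strings at the edges or between adjacent keywords are dropped.
    return [part for part in KEYWORD_PATTERN.split(text) if part]

tokens = split_type_text("John Lim[enter]123 ABC Street[enter]Singapore 1234567[clear]")
print(tokens)
```

Each literal chunk would be typed as-is, while each keyword token stands for a special action rather than text.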

@kensoh
Member

kensoh commented Jun 18, 2018

I have looked through Sikuli's docs and can't find anything else that should be implemented directly as part of TagUI steps. For niche custom commands, the vision step can be used. More details of Sikuli commands here - http://doc.sikuli.org and here - http://sikulix-2014.readthedocs.io/en/latest

Closing the issue for now. The screen capture utility to facilitate capturing image snapshots can be done as part of #188. The modules idea above is worth exploring when the time is ripe (for community-contributed images of elements). Also copying @Aussiroth @lohvht for further inputs.

@kensoh kensoh closed this as completed Jun 18, 2018
@kensoh kensoh changed the title TagUI for Desktop Applications - explore Sikuli visual automation enhancements TagUI for Desktop Applications --> use Sikuli visual automation Jun 21, 2018
@kensoh kensoh changed the title TagUI for Desktop Applications --> use Sikuli visual automation TagUI for Desktop Applications --> use visual automation Jun 21, 2018
@kensoh
Member

kensoh commented Jul 4, 2018

User question - just to clarify, what is page.png? And what does the highlighted code mean?

click minimize.png
dclick receipt.png
wait 2 seconds
read page.png to receipt_text
write receipt_text to receipt.txt
click close_pdf.png

dclick letter.png
wait 8 seconds
dclick address.png
type page.png as John Lim[enter]123 ABC Street[enter]Singapore 1234567[clear]
dclick name.png
type page.bmp as John
dclick amount.png
type page.png as $123.00
click file.png
click print.png
click confirm.png
click close_word.png
click dontsave.png


My reply

For visual automation, TagUI looks out for .png or .bmp names instead of element identifiers referring to webpage UI (user-interface) elements.

read page to xxx normally means read the text contents of the webpage to variable xxx. read page.png to receipt_text uses visual recognition and OCR (optical character recognition) to read the text on the whole screen to the variable receipt_text. Here it is trying to capture the text from the PDF file to save into a text file.

More details of the visual automation here -
https://github.com/kelaberetiv/TagUI#visual-automation

write receipt_text to receipt.txt saves the variable to a text file receipt.txt

More details of all the TagUI steps here -
https://github.com/kelaberetiv/TagUI#steps-description
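The rule described in this reply can be sketched in a few lines of Python: a target ending in .png or .bmp is treated as an image to find on screen, page.png or page.bmp stands for the whole screen, and anything else is treated as a web element identifier. This is only an illustration of the explanation above, not TagUI's real parsing code; classify_target is a hypothetical name.

```python
def classify_target(target):
    """Classify a step target following the explanation in this reply."""
    name = target.strip().lower()
    if name in ("page.png", "page.bmp"):
        return "whole screen"
    if name.endswith((".png", ".bmp")):
        return "image on screen"
    return "web element identifier"

for example in ("page.png", "receipt.png", "page", "//input[@name='q']"):
    print(example, "->", classify_target(example))
```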

@vijendra-impetus

@kensoh ,

I tried the below steps:

dclick receipt.png
wait 2 seconds
read page.png to receipt_text
write receipt_text to receipt.txt
click close_pdf.png

But after the message below, it got stuck and nothing happened. I tried several times with different images in the folder, but it does not click on any image.

tagui D:\TagUI_Windows\word_samples\pdfread
[starting sikuli process]

START - automation started - Thu Jul 05 2018 15:17:15 GMT+0530 (India Standard Time)

click D:/TagUI_Windows/word_samples/confirm.png

@kensoh
Member

kensoh commented Jul 5, 2018

Hi @vijendra-impetus recently a user had a similar problem when using visual automation on Windows. It just hangs after running, even when Sikuli and Java have been installed.

This is the solution that worked for her, see here if it helps your situation - #229

If not, can you paste the contents of the tagui_windows.log file in tagui\src\tagui.sikuli here, to see what the error messages in the backend are?

@vijendra-impetus

Hi @kensoh ,

After following that solution, the output printed to the log is as below:

+++ running this Java
java version "1.8.0_171"
Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
+++ trying to run SikuliX
+++ using: -Xms64M -Xmx512M -Dfile.encoding=UTF-8 -Dsikuli.FromCommandLine -jar c:\tagui\src\tagui.sikuli\sikulix.jar -r tagui.sikuli
Jul 03, 2018 11:40:41 AM java.util.prefs.WindowsPreferences
WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
[tagui] START - listening for inputs

[tagui] FINISH - stopped listening

But when I check the log file (tagui_windows.log) at the location you mentioned, it is completely empty. There is nothing in the log file.

@kensoh
Member

kensoh commented Jul 13, 2018

Hi @vijendra-impetus can you check inside the tagui\src\tagui.sikuli folder, is there a runsikulix file? That file should be there if installation is completed.

For installation, see these steps (I am in the midst of updating visual automation in the main documentation to have these details in the tutorial) - https://github.com/kelaberetiv/TagUI/blob/master/src/media/RPA%20Workshop.md#visual-automation

@mathiasx88

mathiasx88 commented Feb 14, 2019

Hi Kenson,

Recently I have been trying out the TagUI web automation Chrome extension. After the script is generated and I try to run it, it indicates that the web elements are not found. Can you please advise? Some of the elements respond, but some do not.

https://www.google.com/
click .gLFyf.gsfi
enter .gLFyf.gsfi as github[enter]
click .aajZCb input:nth-child(1)
click .bkWMgd:nth-child(1) .LC20lb

@kensoh
Member

kensoh commented Feb 15, 2019

Hi @mathiasx88 the recording is not foolproof. You can try using XPath by inspecting directly from your web browser, for example - https://github.com/kelaberetiv/TagUI#find-xpath-of-web-element

After you copy the XPath using the example in the link above, you can perform TagUI actions on the element using the familiar steps. Besides copying from the browser, it is a good investment to learn XPath and write your own XPath locators. XPath is very expressive and very useful for selecting web elements.
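As a small illustration of why a hand-written XPath locator can be more robust than recorded class names, the Python sketch below selects an input by its name attribute instead of by generated classes. The markup is hypothetical and simplified, and Python's ElementTree only supports a subset of XPath, so treat it as a way to experiment with expressions rather than as how TagUI resolves locators.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for a search page.
PAGE = """
<html><body><form>
  <input class="gLFyf gsfi" name="q" type="text"/>
  <input class="btn" name="btnK" type="submit"/>
</form></body></html>
"""

root = ET.fromstring(PAGE)

# ElementTree paths are relative to the element searched from, hence the leading dot.
matches = root.findall(".//input[@name='q']")
print(len(matches), matches[0].get("type"))
```

The expression keys on an attribute that describes the element's purpose, which tends to survive page redesigns better than auto-generated class names, assuming the page actually exposes such an attribute.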

@mathiasx88

Hi, @kensoh

I am able to run the TagUI script directly from the command prompt now. But when I try to run it for Firefox, I encounter an error. Below is the screenshot of the error. Would you be able to advise? Thanks a lot.

image

@kensoh
Member

kensoh commented Feb 24, 2019

Hi @mathiasx88 yes, Firefox had an overhaul from v60 and SlimerJS is not compatible yet. More details here on using Firefox (for example, using an older version or automating it visually) - #344 (comment)

@mathiasx88

Hi @kensoh

May I check: I tried to use the following command to clear a text field that comes with a default value of 65, but whenever I run the command, the field is not cleared. Can you please advise?

type /html/body/div/section[4]/div/div[2]/div/form[1]/div[13]/div[1]/input as [clear]8938392[enter]

@kensoh
Member

kensoh commented May 31, 2019

It might be that the XPath is wrong, or there may be other reasons, but it is hard to investigate without replication steps.

@oai1228

oai1228 commented Oct 16, 2019

@kensoh
Hi I have some questions

I wrote code below:
dclick /Users/desktop nate.png
wait 3
snap page
snap logo
snap page as nate_sample.png
snap logo as nate_sample2.png
wait 3

In cmd,
START - automation started - Wed Oct 16 2019 15:05:45 GMT+0900 (?�?쒕?援??쒖???

dclick /Users/議곗슦??desktop nate.png
....

It does not work well.
Can you explain visual automation, and why this code does not work?

@kensoh
Member

kensoh commented Oct 16, 2019

Hi @oai1228 I think there cannot be a space in the file name - dclick /Users/desktop nate.png
Try using something simple like nate.png, without a space, to see if it works.

Visual automation requires Java SDK (64-bit), see here for details -
https://github.com/kelaberetiv/TagUI#visual-automation

Finally, check the log files in the tagui/src/tagui.sikuli folder to see what the error message is.
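If spaces in image paths are indeed the cause, as suggested above, a quick pre-check of a flow can be sketched in Python. The helper name find_spaced_image_targets is hypothetical, and the check only covers click, dclick and hover, where everything after the step name is taken to be the target.

```python
IMAGE_SUFFIXES = (".png", ".bmp")
POINTER_STEPS = ("click", "dclick", "hover")  # the remainder of the line is the target

def find_spaced_image_targets(flow_text):
    """Return (line_number, target) pairs for image targets containing whitespace."""
    problems = []
    for number, line in enumerate(flow_text.splitlines(), start=1):
        step, _, target = line.strip().partition(" ")
        target = target.strip()
        if step in POINTER_STEPS and target.lower().endswith(IMAGE_SUFFIXES):
            if any(ch.isspace() for ch in target):
                problems.append((number, target))
    return problems

flow = "dclick /Users/desktop nate.png\nwait 3\nclick start.png"
print(find_spaced_image_targets(flow))
```

Any line it reports is a candidate for renaming the image file or moving it to a path without spaces before re-running the flow.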

@oai1228

oai1228 commented Oct 22, 2019

Hi Ken, I have a problem again:

START - automation started - Tue Oct 22 2019 17:24:52 GMT+0900 (?�?쒕?援??쒖???
dclick c:/TagUI/tagui/src/samples/ever.png

and then cmd does not work.
I have already downloaded the Java SDK (64-bit).

Also, I can't understand where the program clicks on the picture.
image 3

@kensoh
Member

kensoh commented Oct 22, 2019

Thanks @oai1228, looks like no other users have encountered this problem before, some next steps to try -

  1. take an image of your Windows Start button and name it start.png
  2. in your automation script, write the single line click start.png
  3. run the automation and check the log file in tagui\src\tagui.sikuli

@yoga212121

yoga212121 commented Jun 26, 2024

Hey @kensoh, is it possible for my flow to enter login credentials on one webpage while I am writing a report on another site at the same time, or does the machine have to be left undisturbed during the flow? In other words, can website actions such as click and type run in the background while I am performing some other operation?
