Releases: ServiceNow/BrowserGym
v0.13.3: minor fixes
What's Changed
browsergym-core
- Optional method AbstractBrowserTask.get_task_id() #281
- Fixed BrowserEnv parameter resizeable_window, now working as expected #281
browsergym-experiments
- Metadata column fix for visualwebarena #278
Full Changelog: v0.13.2...v0.13.3
v0.13.2: experiments updates
What's Changed
browsergym-experiments
- Experiment traces can now be exported into the TapeAgents format #238
- Installs weblinx_browsergym as a dependency #261
- WA/VWA full instance reset will only issue a warning instead of crashing if not properly set up #272
- New debug benchmark visualwebarena_tiny #271
Full Changelog: v0.13.1...v0.13.2
v0.13.1: Many small fixes
What's Changed
browsergym-experiments
browsergym-core
- Fixed gym warnings "obs not within observation space" #251
- Trace downgrades from INFO to DEBUG #252
- More robust env.close(), can now be used in a finally block even after reset failure #253
- Optional AbstractBrowserTask.teardown() method #255
- Browsergym's register_task() now supports both frozen, non-overrideable task_kwargs as well as overrideable default_task_kwargs arguments #255
- More robust frame marking #256 #258
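The difference between frozen and overrideable task arguments can be sketched as follows. This is an illustration only, not BrowserGym's implementation: make_task_factory and DemoTask are hypothetical names, and the error behaviour shown (raising ValueError) is an assumption.

```python
def make_task_factory(task_class, task_kwargs=None, default_task_kwargs=None):
    """Hypothetical helper (not BrowserGym's register_task()).

    task_kwargs are frozen: callers cannot change them.
    default_task_kwargs are defaults: callers may override them.
    """
    frozen = dict(task_kwargs or {})
    defaults = dict(default_task_kwargs or {})

    # An argument cannot be both frozen and a default (assumed rule).
    clashing = set(frozen) & set(defaults)
    if clashing:
        raise ValueError(f"Arguments both frozen and default: {sorted(clashing)}")

    def factory(**overrides):
        # Reject any attempt to override a frozen argument.
        blocked = set(overrides) & set(frozen)
        if blocked:
            raise ValueError(f"Cannot override frozen arguments: {sorted(blocked)}")
        # Defaults first, then caller overrides, then frozen values.
        return task_class(**{**defaults, **overrides, **frozen})

    return factory


class DemoTask:
    """Stand-in task class, used only for this illustration."""

    def __init__(self, seed, url, max_steps=10):
        self.seed = seed
        self.url = url
        self.max_steps = max_steps


make_demo = make_task_factory(
    DemoTask,
    task_kwargs={"url": "http://localhost:8000"},  # frozen
    default_task_kwargs={"max_steps": 10},  # overrideable
)

task = make_demo(seed=0, max_steps=30)  # overriding a default is allowed
```

Calling make_demo(seed=0, url="http://example.com") would raise ValueError in this sketch, because url is frozen.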
browsergym-assistantbench
- Refactored AssistantBench mechanism for saving test predictions to JSON files #242
browsergym-webarena
- Relaxed playwright<1.40 restriction #257
browsergym-visualwebarena
- Relaxed playwright<1.40 restriction #257
Full Changelog
v0.13.0: Minor updates
What's Changed
browsergym-core
- More robust frame marking with lenient last try #245
- Tasks can now choose their own locale and timezone_id #244
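As a rough sketch of how per-task locale and timezone_id settings might be consumed: the snippet below collects whichever values a task has set into keyword arguments. DemoTask and context_options are hypothetical names, and treating None as "use the browser default" is an assumption. Playwright's browser.new_context() does accept locale and timezone_id keyword arguments, so a dictionary like this could be forwarded to it.

```python
class DemoTask:
    """Stand-in task; only the two settings from the note above are modelled."""

    def __init__(self, locale=None, timezone_id=None):
        self.locale = locale
        self.timezone_id = timezone_id


def context_options(task):
    """Keep only the settings the task actually chose (None is skipped)."""
    options = {"locale": task.locale, "timezone_id": task.timezone_id}
    return {name: value for name, value in options.items() if value is not None}


opts = context_options(DemoTask(locale="en-US", timezone_id="America/New_York"))
# opts could then be passed on, e.g. browser.new_context(**opts) with Playwright.
```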
browsergym-experiments
- Pre-download WebLINX data in prepare_backend() #226
- Increase AssistantBench max_steps to 30 #244
- Add select_option to webarena / visualwebarena default action set #247
browsergym-visualwebarena
- Hide huggingface progress bar when downloading the visual evaluation model #241
browsergym-assistantbench
- Set locale="en-US" and timezone_id="America/New_York"
Full Changelog: v0.12.0...v0.13.0
v0.12.0: VisualWebarena / WebLINX bugfixes
Bugfixes
browsergym-experiments
- Fixes WebLINX task list #235
- Refactors experiment ID generation #236
- Adds VisualWebArena task dependencies #237 #239
browsergym-visualwebarena
- Fixes VisualWebArena tasks with visual validation (missing captioning_fn in evaluator) #240
- Adds a torch dependency (to run the captioning model) #240
Full Changelog: v0.11.3...v0.12.0
v0.11.3: Minor fixes
Bugfixes
- Fix duplicate depends_on in webarena metadata #228
Improvements
- Easier webarena / visualwebarena setup (running nltk.download() at import time) #227
- More robust full_reset() for webarena / visualwebarena #230
- Removed ARIA extraction warnings #233
- New benchmark configuration webarena_tiny #232
Full Changelog: v0.11.2...v0.11.3
v0.11.2: Minor fix
v0.11.1: Benchmark update
New features
- Set max steps to 30 in webarena / visualwebarena benchmarks #214
- Benchmark dependency graph utilities #220
- Include nltk.download() in prepare_backend() for webarena / visualwebarena benchmarks #224
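To illustrate what a task dependency graph is useful for, here is a minimal sketch that orders tasks so each one runs after the tasks it depends on. It uses Python's standard graphlib module, not BrowserGym's utilities. The task names and the shape of the depends_on mapping are made up for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical metadata: each task maps to the tasks it depends on.
depends_on = {
    "task.0": [],
    "task.1": ["task.0"],
    "task.2": ["task.0", "task.0"],  # a duplicated entry, tolerated below
    "task.3": ["task.1", "task.2"],
}


def execution_order(depends_on):
    """Return the tasks in an order where dependencies always come first."""
    # Converting to sets drops duplicated dependencies.
    graph = {task: set(deps) for task, deps in depends_on.items()}
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(graph).static_order())


order = execution_order(depends_on)
```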
Bugfixes
- Rename benchmark after subset_from_split() #221
- ExpArgs.exp_dir sanitization #222
- get_step_info() bugfix #223
Full Changelog: v0.11.0...v0.11.1
v0.11.0: WebLINX 🎉
New features
browsergym-experiments
browsergym-core
- New hide_all_bids option in flatten_dom_to_str() and flatten_axtree_to_str() #212 (thanks @imenelydiaker)
- Leaner Unicode() gym space #218
Bugfixes
- Benchmark.prepare_backends() fixes #209
Full Changelog: v0.10.2...v0.11.0
v0.10.2: Benchmark update
New features
- New Benchmark.prepare_backend() method #204
Bugfixes
- save_step_info() bugfix when obs==None (truncated episode due to None action) #207
Full Changelog: v0.10.1...v0.10.2