Releases: ServiceNow/BrowserGym

v0.13.3: minor fixes

27 Nov 19:56

What's Changed

browsergym-core

  • Optional method AbstractBrowserTask.get_task_id() #281
  • Fixed BrowserEnv parameter resizeable_window, now working as expected #281

browsergym-experiments

  • Metadata column fix for visualwebarena #278

Full Changelog: v0.13.2...v0.13.3

v0.13.2: experiments updates

21 Nov 20:10

What's Changed

browsergym-experiments

  • Experiment traces can now be exported into the TapeAgents format #238
  • Installs weblinx_browsergym as a dependency #261
  • WA/VWA full instance reset now only issues a warning instead of crashing if not properly set up #272
  • New debug benchmark visualwebarena_tiny #271

Full Changelog: v0.13.1...v0.13.2

v0.13.1: Many small fixes

15 Nov 18:26

What's Changed

browsergym-experiments

browsergym-core

  • Fixed gym warnings "obs not within observation space" #251
  • Trace downgrades from INFO to DEBUG #252
  • More robust env.close(), can now be used in a finally block even after reset failure #253
  • Optional AbstractBrowserTask.teardown() method #255
  • BrowserGym's register_task() now supports both frozen, non-overrideable task_kwargs and overrideable default_task_kwargs arguments #255
  • More robust frame marking #256 #258
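
The difference between frozen task_kwargs and overrideable default_task_kwargs can be illustrated with a small standalone sketch. It does not import BrowserGym and is not the library's implementation: frozen_partial, make_task_entrypoint and DemoTask are hypothetical names introduced here only to show the intended semantics (frozen arguments reject overrides, default arguments accept them).

```python
from functools import partial


def frozen_partial(func, **frozen_kwargs):
    """Bind keyword arguments that callers may NOT override later."""

    def wrapper(*args, **kwargs):
        clashing = set(frozen_kwargs) & set(kwargs)
        if clashing:
            raise ValueError(f"Cannot override frozen arguments: {sorted(clashing)}")
        return func(*args, **frozen_kwargs, **kwargs)

    return wrapper


def make_task_entrypoint(task_class, task_kwargs=None, default_task_kwargs=None):
    """Hypothetical helper: combine frozen and overrideable task arguments."""
    task_kwargs = dict(task_kwargs or {})
    default_task_kwargs = dict(default_task_kwargs or {})

    # An argument cannot be both frozen and overrideable.
    clashing = set(task_kwargs) & set(default_task_kwargs)
    if clashing:
        raise ValueError(f"Arguments given as both frozen and default: {sorted(clashing)}")

    # Frozen arguments are bound first and guarded against overrides.
    entrypoint = frozen_partial(task_class, **task_kwargs)
    # Default arguments are pre-set, but call-time keywords take precedence.
    return partial(entrypoint, **default_task_kwargs)


class DemoTask:
    """Stand-in for a task class; not part of BrowserGym."""

    def __init__(self, seed, start_url, max_items=10):
        self.seed = seed
        self.start_url = start_url
        self.max_items = max_items


entrypoint = make_task_entrypoint(
    DemoTask,
    task_kwargs={"start_url": "https://example.com"},  # frozen
    default_task_kwargs={"max_items": 10},  # overrideable
)

task_default = entrypoint(seed=0)
task_custom = entrypoint(seed=0, max_items=3)  # overriding a default is allowed

try:
    entrypoint(seed=0, start_url="https://other.example")  # frozen: rejected
    override_rejected = False
except ValueError:
    override_rejected = True
```

Freezing an argument is useful when it defines the task itself (for example, which instance to load), while defaults suit settings that an experiment may reasonably want to tune.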

browsergym-assistantbench

  • Refactored AssistantBench mechanism for saving test predictions to JSON files #242

browsergym-webarena

  • Relaxed playwright<1.40 restriction #257

browsergym-visualwebarena

  • Relaxed playwright<1.40 restriction #257

Full Changelog: v0.13.0...v0.13.1

v0.13.0: Minor updates

07 Nov 20:35

What's Changed

browsergym-core

  • More robust frame marking with lenient last try #245
  • Tasks can now choose their own locale and timezone_id #244
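
As a rough illustration of per-task locale and timezone settings, the sketch below uses stand-in classes rather than BrowserGym itself. DemoTask, NewYorkTask and browser_context_options are hypothetical names, and the assumption that a task exposes locale and timezone_id as plain attributes is made here for illustration only. The resulting dictionary uses the keyword names that Playwright's browser.new_context() accepts for these two settings.

```python
class DemoTask:
    """Stand-in for a task class; not BrowserGym's AbstractBrowserTask."""

    def __init__(self, seed):
        self.seed = seed
        # None means "no preference": the browser defaults apply.
        self.locale = None
        self.timezone_id = None


class NewYorkTask(DemoTask):
    """A task that pins its own locale and timezone."""

    def __init__(self, seed):
        super().__init__(seed)
        self.locale = "en-US"
        self.timezone_id = "America/New_York"


def browser_context_options(task):
    """Hypothetical helper: collect only the options the task has set."""
    options = {"locale": task.locale, "timezone_id": task.timezone_id}
    return {name: value for name, value in options.items() if value is not None}


default_options = browser_context_options(DemoTask(seed=0))
pinned_options = browser_context_options(NewYorkTask(seed=0))
```

Pinning these values per task keeps date, number and time rendering consistent across machines, which matters when task validation compares page text.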

browsergym-experiments

  • Pre-download WebLINX data in prepare_backend() #226
  • Increase AssistantBench max_steps to 30 #244
  • Add select_option to webarena / visualwebarena default action set #247

browsergym-visualwebarena

  • Hide huggingface progress bar when downloading the visual evaluation model #241

browsergym-assistantbench

  • Set locale="en-US" and timezone_id="America/New_York"

Full Changelog: v0.12.0...v0.13.0

v0.12.0: VisualWebarena / WebLINX bugfixes

04 Nov 19:44

Bugfixes

browsergym-experiments

  • Fixes WebLINX task list #235
  • Refactors experiment ID generation #236
  • Adds VisualWebArena task dependencies #237 #239

browsergym-visualwebarena

  • Fixes VisualWebArena tasks with visual validation (missing captioning_fn in evaluator) #240
  • Adds a torch dependency (to run the captioning model) #240

Full Changelog: v0.11.3...v0.12.0

v0.11.3: Minor fixes

01 Nov 15:30

Bugfixes

  • Fix duplicate depends_on in webarena metadata #228

Improvements

  • Easier webarena / visualwebarena setup (nltk.download() now runs at import time) #227
  • More robust full_reset() for webarena / visualwebarena #230
  • Removed ARIA extraction warnings #233
  • New benchmark configuration webarena_tiny #232

Full Changelog: v0.11.2...v0.11.3

v0.11.2: Minor fix

30 Oct 20:25

Bugfixes

  • Add incomplete ExpResult.status #225

Full Changelog: v0.11.1...v0.11.2

v0.11.1: Benchmark update

30 Oct 19:29

New features

  • Set max steps to 30 in webarena / visualwebarena benchmarks #214
  • Benchmark dependency graph utilities #220
  • Include nltk.download() in prepare_backend() for webarena / visualwebarena benchmarks #224

Bugfixes

  • Rename benchmark after subset_from_split() #221
  • ExpArgs.exp_dir sanitization #222
  • get_step_info() bugfix #223

Full Changelog: v0.11.0...v0.11.1

v0.11.0: WebLINX 🎉

30 Oct 15:38

New features

browsergym-experiments

browsergym-core

  • New hide_all_bids option in flatten_dom_to_str() and flatten_axtree_to_str() #212 (thanks @imenelydiaker)
  • Leaner Unicode() gym space #218

Bugfixes

  • Benchmark.prepare_backends() fixes #209

Full Changelog: v0.10.2...v0.11.0

v0.10.2: Benchmark update

24 Oct 15:38

New features

  • New Benchmark.prepare_backend() method #204

Bugfixes

  • save_step_info() bugfix when obs==None (truncated episode due to None action) #207

Full Changelog: v0.10.1...v0.10.2