Releases: ServiceNow/BrowserGym

v0.10.1: Benchmark updates

23 Oct 19:06

Minor changes

  • train / test splits for WorkArena L2 and L3 tasks #203
  • More fine-grained per-benchmark action sets #202

Full Changelog: v0.10.0...v0.10.1

v0.10.0: AssistantBench! 🎉

23 Oct 14:50

New features

  • New BrowserGym benchmark AssistantBench, packaged as browsergym-assistantbench. Thanks @oriyor ! #186
    import gymnasium as gym
    import browsergym.assistantbench  # registers the AssistantBench tasks
    
    env = gym.make("browsergym/assistantbench.validation.12")
    env = gym.make("browsergym/assistantbench.test.42")
  • Default train/test splits for all benchmarks
    miniwob = DEFAULT_BENCHMARKS["miniwob"]  # 125 tasks x 5 seeds
    miniwob_train = miniwob.subset_from_split("train")  # 62 tasks x 5 seeds
    miniwob_test = miniwob.subset_from_split("test")  # 63 tasks x 5 seeds

Breaking Changes

  • Various updates and refactors to the new Benchmark class #197 #198 #199

Fixes

  • Improved experiment logging #182

Full Changelog: v0.9.0...v0.10.0

v0.9.0: Benchmarks! 🎉

19 Oct 01:16

New features

  • Benchmarks with default config (tasks x seeds) and metadata #173 #191
    from browsergym.experiments import BENCHMARKS, Benchmark
    
    # make a custom benchmark
    benchmark = Benchmark(
      name="miniwob_click_test",
      high_level_action_set_args=HighLevelActionSetArgs(
        subsets=["bid"],
        multiaction=False,
        strict=False,
        retry_with_force=False,
        demo_mode="off",
      ),
      env_args_list=[
        EnvArgs(
          task_name="miniwob.click-test",
          task_seed=42,
          max_steps=5,
        )
      ],
    )
    
    # use a pre-existing benchmark
    miniwob = BENCHMARKS["miniwob_all"]()
    
    # use only a task subset
    miniwob_original = miniwob.subset_from_glob(
      column="miniwob_category", glob="original"
    )
  • New playwright key modifier "ControlOrMeta" #187
  • Global demo_mode flag #177
    import browsergym.core.action
    
    browsergym.core.action.set_global_demo_mode(True)  # boolean

Fixes

  • Multi-tab actions fix #188

Full Changelog: v0.8.1...v0.9.0

v0.8.1 - SoM bugfix

15 Oct 20:04

Fixes

browsergym-core

  • fixed a bug with set-of-marks line drawing #184 #185

v0.8.0: goal_object

08 Oct 21:40

browsergym-core

  • Breaking changes
    • goal refactor #110
      obs["goal_object"] now replaces the old obs["goal_image_urls"]
      obs["goal"] is now deprecated
      goal_object contains a list of openai-style messages, which can include an arbitrary mix of text and/or images.
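
A minimal sketch of how an agent might consume the new format, assuming each goal_object entry follows the OpenAI content-item convention ({"type": "text", ...} or {"type": "image_url", ...}). The helper name split_goal_object and the sample data are hypothetical and not part of BrowserGym.

```python
from typing import Any


def split_goal_object(goal_object: list[dict[str, Any]]) -> tuple[str, list[str]]:
    """Separate the text and image entries of an OpenAI-style content list."""
    texts: list[str] = []
    image_urls: list[str] = []
    for item in goal_object:
        if item.get("type") == "text":
            texts.append(item["text"])
        elif item.get("type") == "image_url":
            image = item["image_url"]
            # OpenAI-style entries usually wrap the URL in a dict;
            # a bare string is accepted here as well (assumption).
            image_urls.append(image["url"] if isinstance(image, dict) else image)
    return "\n".join(texts), image_urls


# Hand-written stand-in for obs["goal_object"], for illustration only.
example_goal_object = [
    {"type": "text", "text": "Buy the item shown in the picture."},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,AAAA"}},
]
goal_text, goal_images = split_goal_object(example_goal_object)
```

Agents that previously read obs["goal"] and obs["goal_image_urls"] could use a helper along these lines to recover both pieces from the single goal_object field.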

browsergym-visualwebarena

  • Breaking changes

    • goal refactor #110, the goal is now a list of openai-style messages with goal images as base64 image_url messages.
  • Fixes

    • goal images are now self-hosted as part of the homepage #171 #165

browsergym-experiments

  • Improvements
    • leaner trace files #169

other

  • the legacy demo agent has been removed
  • the basic demo agent has been leaned out and upgraded to support the new goal_object format #110
  • other minor changes #166 #164

v0.7.1: bugfixes

27 Sep 13:39

browsergym-core

  • Dependency bump
    • playwright dependency bumped from playwright>=1.32,<1.40 to playwright>=1.39,==1.* (fixes problems with 1.32.1) #159

browsergym-experiments

  • Bugfixes
    • #155 introduced a bug that occurred when an agent returns action==None; this was fixed by #163

v0.7.0: changes in experiments

20 Sep 19:58

browsergym-experiments

  • Breaking change

    • save package versions to a separate file package_versions.txt instead of the logs #152
  • Bugfixes

v0.6.4: bugfix

19 Sep 13:50

browsergym-core

  • Bugfix
    • fixed a bug in the catching of "Frame has been detached" errors #153

v0.6.3: more robust IFrame handling

17 Sep 18:50
2a171c4

browsergym-core

  • Improvements
    • "Frame has been detached" errors are now ignored during unmarking #148 #147
    • more robust handling of missing frames in extract_merged_axtree #148 #146

v0.6.2: minor update

17 Sep 14:52

browsergym-experiments

  • New features
    • new field html_page: str in AgentInfo to enable agents to send an HTML page for visualization #13