1. Environment 🗺️
Intercode presents a framework for defining an interactive environment where the main form of interaction is code. In this environment, an agent must modify the execution environment and produce standard output to accomplish a task described by a natural language query. The Intercode framework adopts and builds upon the classic OpenAI `gym` feedback loop.
In this setting, an agent is first presented with a natural language query that describes a coding task to complete within the context of the environment. The agent can then submit executable code as an `action` to:
- Explore and understand the given context
- Get standard output and feedback from executing specific commands
Upon receiving a line of code from the agent, the Intercode environment will execute the command within the given context and respond to the agent with an `observation`, a `reward`, and miscellaneous `info`.
- `observation` is the standard output from the execution of the agent's action.
- `reward` is a value between 0 and 1 that quantifies the correctness of the standard output and environment's state so far with respect to accomplishing the task described by the natural language instruction.
- `info` is a dictionary that serves as an additional, optional store of information for environment signals that fall outside the purpose of `observation` and `reward`. For instance, in a `bash` system, the current working directory might be reflected here.
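The loop described above can be sketched with a small, self-contained toy. This is not the Intercode API: `ToyShellEnv`, `run_episode`, the in-memory directory tree, and the exact-match reward are all hypothetical names and choices made for illustration. The sketch only shows how one action produces an `observation`, a `reward` between 0 and 1, and an `info` dictionary that carries the current working directory.

```python
from typing import Any, Dict, List, Tuple

Step = Tuple[str, float, Dict[str, Any]]


class ToyShellEnv:
    """Hypothetical stand-in for an interactive code environment (not the Intercode API)."""

    def __init__(self, query: str, gold_output: str) -> None:
        self.query = query              # natural language task description
        self.gold_output = gold_output  # output that counts as solving the task
        self.cwd = "/"
        # A tiny in-memory directory tree instead of a real shell.
        self.files = {"/": ["docs", "readme.txt"], "/docs": ["guide.md"]}

    def reset(self) -> str:
        """Restore the initial state and return the task query."""
        self.cwd = "/"
        return self.query

    def step(self, action: str) -> Step:
        """Execute one command and return (observation, reward, info)."""
        parts = action.split()
        if parts == ["pwd"]:
            observation = self.cwd
        elif parts == ["ls"]:
            observation = " ".join(self.files.get(self.cwd, []))
        elif len(parts) == 2 and parts[0] == "cd":
            target = self.cwd.rstrip("/") + "/" + parts[1]
            if target in self.files:
                self.cwd = target
                observation = ""
            else:
                observation = f"cd: no such directory: {parts[1]}"
        else:
            observation = f"command not found: {action}"
        # Assumption: a binary exact-match reward; a real reward may give partial credit.
        reward = 1.0 if observation == self.gold_output else 0.0
        info = {"cwd": self.cwd}  # signal that is neither observation nor reward
        return observation, reward, info


def run_episode(env: ToyShellEnv, actions: List[str]) -> List[Step]:
    """Reset the environment, then feed it each action in turn."""
    env.reset()
    return [env.step(action) for action in actions]


if __name__ == "__main__":
    env = ToyShellEnv("List the files in the docs directory.", "guide.md")
    for observation, reward, info in run_episode(env, ["ls", "cd docs", "ls"]):
        print(repr(observation), reward, info)
```

In this toy run, the first two actions only explore the context and earn no reward; the final `ls` produces the expected output, so its reward is 1.0.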
At a high level, the task formulation and feedback loop presented by the Intercode framework are akin to the development practices of programmers and software engineers.
Given the potential wealth of interaction and reasoning-related challenges of coding tasks, along with an ever-present and trending interest in the decision-making capabilities of digital agents, Intercode aims to be a tool and testbed for training, evaluating, and augmenting such abilities.
The engineering approaches for creating an interactive coding environment can often be complex and variegated. Initial attempts built with a specific task, coding language, or execution context in mind may not work when one of these settings is changed. This results in a landscape of benchmarks and tasks that are hard to compare with one another.
Intercode aims to unify the formulation for and abstract away the foundational engineering challenges of such benchmarks, making it easier for practitioners to focus on designing worthwhile code understanding and reasoning challenges via unique, customizable settings and datasets.
The primary deliverable for this goal is the `IntercodeEnv` abstraction. `IntercodeEnv` inherits from the OpenAI `gym` package to frame code interaction as an action-observation loop. On top of this, `IntercodeEnv` features logic under the hood to:
- Expressively define a coding environment via `Dockerfile`
- Automate dataset management and logging
- Configure and contextualize the environment for each task
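One way to picture this kind of abstraction is a base class that owns the dataset, the log, and the step loop, and leaves execution and scoring to subclasses. The sketch below is an assumption-laden illustration, not the actual `IntercodeEnv` code: `CodeEnvSketch`, `KeyValueEnv`, and the method names `setup_task`, `exec_action`, and `get_reward` are hypothetical, and the toy neither subclasses `gym` nor uses Docker so that it stays runnable on its own.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional, Tuple


class CodeEnvSketch(ABC):
    """Hypothetical base class; names are illustrative, not the IntercodeEnv API."""

    def __init__(self, dataset: List[Dict[str, str]]) -> None:
        self.dataset = dataset                    # records: {"query": ..., "gold": ...}
        self.task: Optional[Dict[str, str]] = None
        self.log: List[Dict[str, Any]] = []       # one entry per executed action

    def reset(self, index: int = 0) -> str:
        """Select a task from the dataset, clear the log, and set up its context."""
        self.task = self.dataset[index]
        self.log = []
        self.setup_task(self.task)
        return self.task["query"]

    def step(self, action: str) -> Tuple[str, float, Dict[str, Any]]:
        """Run one action, score it against the task, and record the turn."""
        if self.task is None:
            raise RuntimeError("call reset() before step()")
        observation = self.exec_action(action)
        reward = self.get_reward(observation, self.task["gold"])
        self.log.append({"action": action, "observation": observation, "reward": reward})
        return observation, reward, {"turn": len(self.log)}

    def setup_task(self, task: Dict[str, str]) -> None:
        """Optional hook for per-task context; does nothing by default."""

    @abstractmethod
    def exec_action(self, action: str) -> str:
        """Execute the action in the environment and return its output."""

    @abstractmethod
    def get_reward(self, observation: str, gold: str) -> float:
        """Return a score between 0 and 1 for the observation."""


class KeyValueEnv(CodeEnvSketch):
    """Toy subclass: the execution context is an in-memory key-value store."""

    def setup_task(self, task: Dict[str, str]) -> None:
        self.store: Dict[str, str] = {}           # fresh state for every task

    def exec_action(self, action: str) -> str:
        parts = action.split()
        if len(parts) == 3 and parts[0] == "SET":
            self.store[parts[1]] = parts[2]
            return "OK"
        if len(parts) == 2 and parts[0] == "GET":
            return self.store.get(parts[1], "(nil)")
        return "ERR unknown command"

    def get_reward(self, observation: str, gold: str) -> float:
        return 1.0 if observation == gold else 0.0


if __name__ == "__main__":
    env = KeyValueEnv([{"query": "Store 42 under key x, then read it back.", "gold": "42"}])
    print(env.reset(0))
    print(env.step("SET x 42"))
    print(env.step("GET x"))
```

With this split, supporting a new language or execution context in the sketch means writing only the two abstract methods, while task selection and logging stay in the base class.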
- The `IntercodeEnv` class defines an abstraction that makes it easy to set up an interactive environment that can be configured via a `Dockerfile` to any coding language and execution context of your choice.
- Intercode currently features `IntercodeEnv`-standardized environments for `bash` and `SQL`.
- `IntercodeEnv` environments can be used in a variety of ways. This repository includes documentation for how to use `IntercodeEnv` as:
  - A training + evaluation environment for NL-to-code generation agents
  - A wrapper for connecting code agents to real world code tasks and settings
  - A tool that language models can use for code-adjacent downstream tasks
The next several sections will discuss how to quickly set up an interactive code environment using InterCode, detail additional features of this framework, and demonstrate the variety of ways in which an `IntercodeEnv` class can be used.