ASX Gym Action

James Shen edited this page Jun 13, 2020 · 1 revision

Agent-Environment Loop

The following diagram comes from the OpenAI Gym documentation:

Gym Loop

This is just an implementation of the classic “agent-environment loop”. Each timestep, the agent chooses an action, and the environment returns an observation and a reward.
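The loop described above can be sketched in a few lines. This is a minimal illustration, not ASX Gym code: `ToyEnv`, `run_episode`, and `random_policy` are hypothetical names, and the environment assumes the classic Gym convention where `step` returns an observation, a reward, a done flag, and an info dictionary.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a Gym-style environment (not part of ASX Gym)."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.step_count = 0

    def reset(self):
        # Start a new episode and return the first observation.
        self.step_count = 0
        return 0

    def step(self, action):
        # Apply the action, then return (observation, reward, done, info).
        self.step_count += 1
        observation = self.step_count
        reward = 1.0 if action == 1 else 0.0
        done = self.step_count >= self.max_steps
        return observation, reward, done, {}


def run_episode(env, policy):
    """Run one episode and return the total reward collected."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(observation)  # the agent chooses an action
        observation, reward, done, _ = env.step(action)  # the environment responds
        total_reward += reward
    return total_reward


def random_policy(observation):
    # A random agent: ignore the observation and pick 0 or 1.
    return random.choice([0, 1])


if __name__ == "__main__":
    print(run_episode(ToyEnv(), random_policy))
```

With ASX Gym the shape of the loop stays the same; only the action becomes the richer dictionary described in the next section.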

Action in ASX Gym

So what does an action look like in ASX Gym? From an investor's point of view, investing in the stock market means choosing to buy a stock, sell a stock, or simply hold it. The following code snippet from asx_gym_env.py defines the action space:

self.action_space = spaces.Dict(
    {
        # Number of slots in the arrays below that are actually used
        "company_count": spaces.Discrete(self.max_company_number),
        "company_id": spaces.MultiDiscrete([self.max_company_number]
                                           * self.max_company_number),
        # One of the 5 operation codes for each slot
        "stock_operation": spaces.MultiDiscrete([5]
                                                * self.max_company_number),
        "volume": spaces.Box(low=np.float32(0),
                             high=np.float32(self.number_infinite),
                             shape=(self.max_company_number,),
                             dtype=np.float32),
        "price": spaces.Box(low=np.float32(0),
                            high=np.float32(self.max_stock_price),
                            shape=(self.max_company_number,),
                            dtype=np.float32),
        "end_batch": spaces.Discrete(2)
    }
)

Here stock_operation can be one of the following 5 operations:

  • HOLD_STOCK = 0. Do nothing for the given stock.
  • BUY_STOCK = 1. Buy the stock.
  • SELL_STOCK = 2. Sell the stock.
  • TOP_UP_FUND = 3. Top up your bank account.
  • WITHDRAW_FUND = 4. Withdraw funds from your bank account. (3 and 4 are not used at the moment.)

price is the price you set to buy or sell a stock. If your buy price is lower than the current bid price, or your sell price is higher than the current sell price, the transaction will not be fulfilled. You can instead use a big number (1000) when buying or 0 when selling, and the environment automatically uses the current bid price or sell price for the stock. When buying, the transaction also fails if your available fund is not enough. A failed transaction has the same effect as HOLD_STOCK.
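The price rules above can be restated as a small sketch. The helpers `fill_price` and `can_afford` are hypothetical and are not taken from ASX Gym; the sketch assumes a fulfilled order always trades at the current bid or sell price and that no brokerage fee applies.

```python
HOLD_STOCK, BUY_STOCK, SELL_STOCK = 0, 1, 2  # operation codes listed above


def fill_price(operation, order_price, bid_price, sell_price):
    """Return the price a transaction would trade at, or None if it is not fulfilled.

    Hypothetical helper: it restates the rules in the text and is not ASX Gym code.
    """
    if operation == BUY_STOCK:
        # A buy order priced below the current bid price is not fulfilled.
        return bid_price if order_price >= bid_price else None
    if operation == SELL_STOCK:
        # A sell order priced above the current sell price is not fulfilled.
        return sell_price if order_price <= sell_price else None
    return None  # HOLD_STOCK (and the unused operations) trade nothing


def can_afford(volume, price, available_fund):
    # A buy also fails when the available fund cannot cover it (fees ignored).
    return volume * price <= available_fund
```

Under these assumptions, a buy at price 1000 or a sell at price 0 always passes the price check, which is why the text suggests those values as shortcuts.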

volume is the number of shares you want to buy or sell. When buying, if volume is set to 0, the ASX Env uses all of your available fund and calculates the maximum volume you can buy.
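As a rough illustration of the volume-0 rule, the maximum volume could be derived from the available fund as follows. `resolve_buy_volume` is a hypothetical helper; whether the environment rounds down to whole shares or deducts fees is an assumption here, not something the page states.

```python
import math


def resolve_buy_volume(volume, price, available_fund):
    """Hypothetical helper: a volume of 0 means "buy as much as the fund allows"."""
    if volume > 0:
        return volume
    if price <= 0:
        return 0
    # Assumption: whole shares only, and no brokerage fee is deducted.
    return math.floor(available_fund / price)


print(resolve_buy_volume(0, 12.5, 1000.0))  # 80
```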

end_batch can be set to True or False. The ASX Env simulates each stock's bid price and sell price every 15 minutes, which means each day has 24 mini-steps (trading starts at 10 am and ends at 4 pm). If you have finished your transactions for the day, you can set end_batch to True to tell ASX Gym to move to the next day immediately instead of stepping through to 4 pm.
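The count of 24 mini-steps follows from six trading hours at four steps per hour. The sketch below enumerates them; `mini_steps` is a hypothetical helper, and it assumes each step is labelled by its start time, which the page does not specify.

```python
from datetime import datetime, timedelta


def mini_steps(day, start_hour=10, end_hour=16, minutes=15):
    """List the start time of each simulated mini-step on the given day."""
    current = datetime(day.year, day.month, day.day, start_hour)
    end = datetime(day.year, day.month, day.day, end_hour)
    steps = []
    while current < end:
        steps.append(current)
        current += timedelta(minutes=minutes)
    return steps


steps = mini_steps(datetime(2020, 6, 12))
print(len(steps))  # 24
print(steps[0].time(), steps[-1].time())  # 10:00:00 15:45:00
```

Setting end_batch to True partway through the day would, in this picture, skip the remaining entries of the list and continue with the next day's first step.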

Since OpenAI Gym does not support variable-length spaces, every array in the action has a fixed length, and company_count tells the environment how many stocks the action actually contains.
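To make the role of company_count concrete, here is a sketch that packs a variable number of transactions into fixed-length slots. Plain Python lists stand in for the arrays of the real action, and `MAX_COMPANY_NUMBER` and `pack_transactions` are hypothetical names chosen for this illustration.

```python
MAX_COMPANY_NUMBER = 4  # hypothetical; stands in for self.max_company_number


def pack_transactions(transactions, end_batch=0):
    """Pack (company_id, operation, volume, price) tuples into fixed-length lists."""
    # Discrete(n) only holds the values 0..n-1, so the count must stay below n.
    if len(transactions) >= MAX_COMPANY_NUMBER:
        raise ValueError("too many transactions for one action")
    action = {
        "company_count": len(transactions),
        "company_id": [0] * MAX_COMPANY_NUMBER,
        "stock_operation": [0] * MAX_COMPANY_NUMBER,
        "volume": [0.0] * MAX_COMPANY_NUMBER,
        "price": [0.0] * MAX_COMPANY_NUMBER,
        "end_batch": end_batch,
    }
    for slot, (company_id, operation, volume, price) in enumerate(transactions):
        action["company_id"][slot] = company_id
        action["stock_operation"][slot] = operation
        action["volume"][slot] = volume
        action["price"][slot] = price
    return action


action = pack_transactions([(2, 1, 100.0, 10.5), (7, 2, 50.0, 0.0)])
used = action["company_count"]
print(action["company_id"][:used])  # [2, 7]
```

Only the first company_count slots carry meaning; the remaining slots are padding that a reader of the action should ignore.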

To make it easier to set the action parameters correctly, ASX Gym provides simpler types:

class AsxTransaction:
    def __init__(self, company_id, stock_operation, volume, price):
        self.company_id = company_id
        self.stock_operation = stock_operation
        self.volume = volume
        self.price = price

    def to_json_obj(self):
        if self.stock_operation == BUY_STOCK:
            stock_operation = 'buy'
        elif self.stock_operation == SELL_STOCK:
            stock_operation = 'sell'
        elif self.stock_operation == HOLD_STOCK:
            stock_operation = 'hold'
        elif self.stock_operation == TOP_UP_FUND:
            stock_operation = 'top_up'
        elif self.stock_operation == WITHDRAW_FUND:
            stock_operation = 'withdraw'
        else:
            stock_operation = 'unknown'
        json_obj = {
            'company_id': int(self.company_id),
            'stock_operation': stock_operation,
            'volume': round(float(self.volume), 2),
            'price': round(float(self.price), 2)
        }
        return json_obj


class AsxAction:
    def __init__(self, end_batch):
        self.end_batch = end_batch
        self.transactions = []

    def add_transaction(self, transaction: AsxTransaction):
        self.transactions.append(transaction)

    def copy_to_env_action(self, action):
        company_count = len(self.transactions)
        action['company_count'] = company_count
        action['end_batch'] = self.end_batch
        for c in range(company_count):
            asx_transaction: AsxTransaction = self.transactions[c]
            action['company_id'][c] = asx_transaction.company_id
            action['volume'][c] = asx_transaction.volume
            action['price'][c] = asx_transaction.price
            action['stock_operation'][c] = asx_transaction.stock_operation
        return action

    def to_json_obj(self):
        json_obj = {
            'end_batch': int(self.end_batch),
            'transactions': []
        }
        for transaction in self.transactions:
            json_obj['transactions'].append(transaction.to_json_obj())
        return json_obj

    @staticmethod
    def from_env_action(action):
        company_count = action['company_count']
        end_batch = action['end_batch']
        asx_action = AsxAction(end_batch)
        for c in range(company_count):
            company_id = action['company_id'][c]
            volume = action['volume'][c]
            price = action['price'][c]
            stock_operation = action['stock_operation'][c]
            asx_transaction = AsxTransaction(company_id, stock_operation, volume, price)
            asx_action.add_transaction(asx_transaction)
        return asx_action

AsxAction and AsxTransaction handle the conversion between these simpler types and the environment's action dictionary for you.
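A possible round trip with these classes is sketched below. So that the block runs on its own, it repeats trimmed copies of the two classes (the JSON helpers are left out) and uses plain lists in place of the arrays of a real environment action; the slot count `SLOTS` and the sample values are assumptions made for the example.

```python
BUY_STOCK, SELL_STOCK = 1, 2


class AsxTransaction:  # trimmed copy of the class above
    def __init__(self, company_id, stock_operation, volume, price):
        self.company_id = company_id
        self.stock_operation = stock_operation
        self.volume = volume
        self.price = price


class AsxAction:  # trimmed copy of the class above, JSON helpers omitted
    def __init__(self, end_batch):
        self.end_batch = end_batch
        self.transactions = []

    def add_transaction(self, transaction):
        self.transactions.append(transaction)

    def copy_to_env_action(self, action):
        action['company_count'] = len(self.transactions)
        action['end_batch'] = self.end_batch
        for c, t in enumerate(self.transactions):
            action['company_id'][c] = t.company_id
            action['volume'][c] = t.volume
            action['price'][c] = t.price
            action['stock_operation'][c] = t.stock_operation
        return action

    @staticmethod
    def from_env_action(action):
        asx_action = AsxAction(action['end_batch'])
        for c in range(action['company_count']):
            asx_action.add_transaction(AsxTransaction(
                action['company_id'][c], action['stock_operation'][c],
                action['volume'][c], action['price'][c]))
        return asx_action


SLOTS = 3  # hypothetical slot count; plain lists stand in for the real arrays
env_action = {'company_count': 0, 'end_batch': False,
              'company_id': [0] * SLOTS, 'stock_operation': [0] * SLOTS,
              'volume': [0.0] * SLOTS, 'price': [0.0] * SLOTS}

asx_action = AsxAction(end_batch=True)
asx_action.add_transaction(AsxTransaction(2, BUY_STOCK, 100, 1000))  # buy at the current price
asx_action.add_transaction(AsxTransaction(7, SELL_STOCK, 50, 0))     # sell at the current price
env_action = asx_action.copy_to_env_action(env_action)

restored = AsxAction.from_env_action(env_action)
print(env_action['company_count'], len(restored.transactions))  # 2 2
```

With the real environment, the container passed to copy_to_env_action would need the same keys and array shapes as the action space defined earlier.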