I've had some free time over the last few days and I think I've tracked down why, during training, the agent cannot get past the reward boundary of about -602 in your Kundur 2-area case. As far as I can tell, no short circuits are actually simulated during training or testing in the environment (the Kundur scheme); I checked this. In other words, the agent learns purely on the normal operating conditions of the system, and under those conditions the optimal policy is simply to never apply the dynamic brake, i.e. the action is always 0, which corresponds exactly to the observed reward value (-602 or -603).
My guess is that this is related to the PowerDynSimEnvDef modifications: the original results used PowerDynSimEnvDef_v2, while I am working with PowerDynSimEnvDef_v7.
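To make the check concrete, here is a rough sketch of the sanity test I have in mind. It assumes the v7 environment still follows the usual Gym-style reset()/step() interface with step() returning (observation, reward, done, info); the function name is just for illustration, and constructing the environment with the actual Kundur 2-area config is left out.

```python
# Rough sanity check: roll out one episode with the "do nothing" policy.
# Assumes a Gym-style reset()/step() interface where step() returns
# (observation, reward, done, info); adapt the loop if v7 differs.

def run_no_brake_episode(env):
    """Run one episode where the dynamic brake is never applied (action 0)
    and return the accumulated reward."""
    total_reward = 0.0
    obs = env.reset()
    done = False
    while not done:
        obs, reward, done, info = env.step(0)  # 0 = never apply the brake
        total_reward += reward
    return total_reward

# env = ...  # build the PowerDynSimEnvDef_v7 env with the Kundur 2-area config
# print(run_no_brake_episode(env))
```

If this do-nothing rollout lands at roughly -602/-603 even though no fault is ever injected, that matches the ceiling the trained agent never gets past and would confirm the agent only ever sees normal operating conditions.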