
Episode length, ActionHead, Data collect, Next state as action #120

Open

vpnasdfghjkl opened this issue Jul 22, 2024 · 1 comment

@vpnasdfghjkl
Some issues while trying to replicate your impressive work.

  1. Can I use different episode lengths at the same collection frequency? (episode length = 800-1200)
  2. I found that the model is very sensitive to the image. Could it be that my proprioception settings are problematic? I used joints (-60° to 60°) + 1 grasp value (0/1), and for the head I created, I used L1ActionHead with 512 bins, low = -16 and high = 16. How should I determine the ActionHead? Additionally, I found that using DiffusionHead is not very effective (batch size = 16 on my 128 GB of memory).
  3. Do we need to ensure that the trend of changes for each axis is the same every time? The data collected for axes 1-4 changes in a similar way across episodes, but for the axes closer to the gripper (axes 5-7) we can't guarantee the same trend of actions each time. This results in poor predictions for those last three dimensions as well.
  4. Can I treat the next step's state as the current step's action if the tracking during teleoperation isn't very good?
@peter-mitrano-bg

Non-author here but hopefully I can help.

  1. Can I use different episode lengths at the same collection frequency? (episode length = 800-1200)

Should work fine! Have a look at traj_transforms.chunk_act_obs to see how it takes the episodes and turns them into small chunks for training. Longer episodes should simply result in more chunks for training, which shouldn't be a problem. However, there are some things you should try to avoid that can happen in long episodes (in my experience); see the sketch after this list.

  1. Pauses -- if the robot is just sitting still, that may confuse the policy. How does it know when to wait or not? If it's ambiguous, that's going to hurt the policy.
  2. Repetition -- if the episode contains doing the same thing multiple times, again it may be ambiguous what the policy should do, because it has so little history/memory (essentially just one previous frame).
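
To make the chunking idea concrete, here is a minimal sketch of slicing one episode into training chunks. It is not the actual traj_transforms.chunk_act_obs code; the function name, window sizes, and truncation behavior are illustrative assumptions. The point is just that the number of chunks scales with episode length, so 800- vs. 1200-step episodes only change how many chunks you get.

```python
import numpy as np

def chunk_episode(obs, actions, window_size=2, action_horizon=4):
    """Illustrative sketch: slice one episode into per-timestep training chunks.

    obs, actions: arrays of shape (T, ...) for a single episode.
    Each chunk gets up to `window_size` past observations and up to
    `action_horizon` future actions (truncated at episode boundaries here;
    the real pipeline handles padding/masking).
    """
    T = len(actions)
    chunks = []
    for t in range(T):
        obs_window = obs[max(0, t - window_size + 1): t + 1]
        action_window = actions[t: min(T, t + action_horizon)]
        chunks.append({"obs": obs_window, "actions": action_window})
    return chunks

# Example: an 800-step episode yields 800 chunks, a 1200-step one yields 1200.
episode_obs = np.zeros((800, 7))      # e.g. 7-D proprioception per step
episode_actions = np.zeros((800, 7))  # e.g. 7-D action per step
print(len(chunk_episode(episode_obs, episode_actions)))  # -> 800
```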
  2. I found that the model is very sensitive to the image. Could it be that my proprioception settings are problematic? I used joints (-60° to 60°) + 1 grasp value (0/1), and for the head I created, I used L1ActionHead with 512 bins, low = -16 and high = 16. How should I determine the ActionHead? Additionally, I found that using DiffusionHead is not very effective (batch size = 16 on my 128 GB of memory).

I have also found that after fine-tuning, the policy behaves weirdly in response to irrelevant changes in the background of the image. Please share any additional findings if you figure this out! I am not sure how to advise on your action space.

  3. Do we need to ensure that the trend of changes for each axis is the same every time? The data collected for axes 1-4 changes in a similar way across episodes, but for the axes closer to the gripper (axes 5-7) we can't guarantee the same trend of actions each time. This results in poor predictions for those last three dimensions as well.

Not quite sure what you mean by "trend in changes", but the TFDS pipeline normalizes the action dimensions (unless you explicitly turn it off), so that should take care of different magnitudes across action dimensions.
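
As a rough illustration of what that normalization does, here is a sketch of standard per-dimension statistics; this is not the library's exact pipeline code, just the idea:

```python
import numpy as np

def normalize_actions(actions, eps=1e-8):
    """Standardize each action dimension across the dataset (illustrative only).

    actions: (N, action_dim) array of all actions in the dataset.
    Returns normalized actions plus the statistics needed to un-normalize
    model predictions at inference time.
    """
    mean = actions.mean(axis=0)
    std = actions.std(axis=0)
    normalized = (actions - mean) / (std + eps)
    return normalized, {"mean": mean, "std": std}

# Dimensions with large ranges (e.g. base joints) and small ranges
# (e.g. wrist joints) end up on a comparable scale after this step.
```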

  4. Can I treat the next step's state as the current step's action if the tracking during teleoperation isn't very good?

This is a subtle question, I think, because it really depends on how your robot's controller behaves and how your teleoperation works. Can you be more specific about "teleoperation isn't very good"? What does that mean? In most robot teleoperation setups, the robot doesn't stop and wait at each action, because that would be very jerky. Instead, the robot is continuously trying, but never quite succeeding, to reach the latest commanded action. In that case, using the next state as the action could result in a different motion/path than using the original actions recorded during teleop. Personally I would say you should not use the next state, but rather the actions recorded during teleop. But I think more details are needed before I can say anything confidently. A video showing the teleop at 1x speed (not sped up) would be very useful.
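
To make the distinction concrete, here is a small sketch contrasting the two labeling choices; the array names and shapes are hypothetical, not from the codebase:

```python
import numpy as np

def compare_action_labelings(states, recorded_actions):
    """Contrast two ways of labeling actions in a teleop trajectory (hypothetical names).

    states:           (T, dof) proprioception actually reached at each step
    recorded_actions: (T, dof) commands sent during teleoperation

    Option A keeps the teleop commands as labels; option B relabels each
    step's action as the state achieved at the next step (last one repeated).
    If the robot lags behind the commands, the two can differ substantially.
    """
    option_a = recorded_actions
    option_b = np.concatenate([states[1:], states[-1:]], axis=0)
    lag = np.abs(option_a - option_b).mean(axis=0)  # rough per-joint discrepancy
    return option_a, option_b, lag
```

If that per-joint discrepancy is large, the controller never catches up to the commands, and the two labelings would train noticeably different behavior.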

Hope this is helpful!
