
[iCubGenova04] Force feedback from WholeBodyDynamics starts minutes after running the yarprobotinterface #999

Closed
S-Dafarra opened this issue Oct 6, 2020 · 32 comments
Labels
iCubGenova04 (iRonCub1) S/N:031 stale This issue will be soon closed automatically

Comments

@S-Dafarra

Brief description of the request/failure

When launching the yarprobotinterface, we noticed that it takes several minutes for WBD (WholeBodyDynamics) to start streaming the external forces.

Detailed context

We noticed this by running the iCubGUI immediately after starting the robot. The external forces appear only after a couple of minutes. We also checked the raw output of the sensors on the corresponding YARP ports, and the same for the inertial sensor: they all seem to produce reasonable, non-constant output.

We are not sure at which level the problem is, if it is related to electrical issues, or it is a software problem.

Accompanying material

@traversaro @GiulioRomualdi @prashanthr05

@S-Dafarra S-Dafarra changed the title {iCubGenova04] Force feedbck from WholeBodyDynamics starts minutes after running the yarprobotinterface [iCubGenova04] Force feedbck from WholeBodyDynamics starts minutes after running the yarprobotinterface Oct 6, 2020
@traversaro
Member

A first step could be to check whether the force ports are not published at all, or whether the forces are published with a delay. See https://github.com/robotology/whole-body-estimators/blob/master/devices/wholeBodyDynamics/WholeBodyDynamicsDevice.cpp#L1919 for a list of the quantities published by the wbd. In particular, it would be interesting to know whether /wholeBodyDynamics/contacts:o is indeed published or not.

@S-Dafarra S-Dafarra changed the title [iCubGenova04] Force feedbck from WholeBodyDynamics starts minutes after running the yarprobotinterface [iCubGenova04] Force feedback from WholeBodyDynamics starts minutes after running the yarprobotinterface Oct 6, 2020
@traversaro
Member

Another test is to try to switch the robot to torque control mode or impedance interaction mode. If the robot faults, then the torques are actually not published.

@julijenv
Collaborator

julijenv commented Oct 8, 2020

Hi @S-Dafarra, did you try looking into the log file to check whether any problems with the FT sensors are shown that could explain the period during which you are not getting any data from them?
Is it the first time that you notice this behaviour? Is it the first time that you look into this problem?
Let us know.

@GiulioRomualdi
Member

> Another test is to try to switch the robot to torque control mode or impedance interaction mode. If the robot faults, then the torques are actually not published.

The torques are not streamed, and the contacts are not streamed either.

@traversaro
Member

Great. Then something is blocked either in https://github.com/robotology/whole-body-estimators/blob/master/devices/wholeBodyDynamics/WholeBodyDynamicsDevice.cpp#L1372 or in https://github.com/robotology/whole-body-estimators/blob/master/devices/wholeBodyDynamics/WholeBodyDynamicsDevice.cpp#L2118 . Some good old debug prints may be useful in this case. It would also be useful to have the log related to wholeBodyDynamics (all the relevant log lines should start with wholeBodyDynamics, so they should be easy to find) to understand whether the calibration completes correctly, and whether the calibration ends right when the robot starts streaming the forces.

@julijenv
Collaborator

> Great. Then something is blocked either in https://github.com/robotology/whole-body-estimators/blob/master/devices/wholeBodyDynamics/WholeBodyDynamicsDevice.cpp#L1372 or in https://github.com/robotology/whole-body-estimators/blob/master/devices/wholeBodyDynamics/WholeBodyDynamicsDevice.cpp#L2118 . Some good old debug prints may be useful in this case. It would also be useful to have the log related to wholeBodyDynamics (all the relevant log lines should start with wholeBodyDynamics, so they should be easy to find) to understand whether the calibration completes correctly, and whether the calibration ends right when the robot starts streaming the forces.

Hi @GiulioRomualdi, @S-Dafarra,
did you by any chance try to check that?
Thanks in advance.

@S-Dafarra
Author

> Hi @GiulioRomualdi, @S-Dafarra,
> did you by any chance try to check that?
> Thanks in advance.

Hi @julijenv, unfortunately we have not yet managed to run those tests.

@lrapetti
Member

Over the last few days I have used the robot extensively and I have never experienced this problem. I don't know whether anything changed in the configuration of the robot such that the problem is now fixed.

I know that @gabrielenava and @CarlottaSartore are also using the robot these days; please note here if you run into this problem.

@S-Dafarra
Author

Indeed, there was a CAN problem on the arm (#1008 (comment)). I wonder if this could have caused one of the FTs to slow down the initialization of WBD 🤔

@gabrielenava

gabrielenava commented Oct 29, 2020

I confirm that the problem seems solved. I noted that before (i.e. a few weeks ago) the feet forces from wbd were streamed only several minutes after the robotinterface started. Now they are streamed right after the robot startup phase.

@traversaro
Member

@Nicogene was experiencing a similar behavior due to some corner case interaction of yarpserver port registration, perhaps he can add something himself.

@Nicogene
Member

I usually call it "IP theft". It happens when you are on a setup with more than one node and, after running a module that opens the port /foo, you accidentally open the same port from another machine, e.g. via yarp write /foo.

What happens is that, in the name server, the /foo port ends up with the IP and port number assigned to the yarp write: since the two modules are on different machines, no address conflict is triggered, and the first /foo port becomes reachable only via its IP, because trying to contact it by name will actually reach the second /foo.

This is a known issue with port registration in YARP: there is no check on whether an entry already exists, because the entry may legitimately remain after a module crashes; the only check is the address conflict triggered by the OS.

cc @drdanz
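The registration behavior described above can be sketched with a toy Python model (purely illustrative; YARP's actual name server is C++ and differs in the details):

```python
# Toy model (not YARP's actual code) of a name server that, like YARP's,
# does not probe an existing entry before re-registering a port name.

class NameServer:
    def __init__(self):
        self.registry = {}  # port name -> (host, tcp_port)

    def register(self, name, host, tcp_port):
        # No check on whether `name` is already registered: a stale entry
        # may legitimately remain after a module crashes, so the server
        # simply overwrites. The only safeguard is the OS-level address
        # conflict, which cannot trigger across different machines.
        self.registry[name] = (host, tcp_port)

    def query(self, name):
        return self.registry.get(name)

ns = NameServer()
ns.register("/foo", "192.168.0.10", 10002)  # original module opens /foo
ns.register("/foo", "192.168.0.20", 10014)  # accidental `yarp write /foo` elsewhere
# The server now resolves /foo to the second machine: the "IP theft".
print(ns.query("/foo"))  # → ('192.168.0.20', 10014)
```

The first port keeps running and keeps its existing connections, but any new by-name lookup resolves to the second one.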

@drdanz
Member

drdanz commented Feb 23, 2021

This has been thoroughly discussed on several occasions in the past:

  • There is a general consensus that we don't want to run yarp clean every time an application crashes.
  • There are also issues related to running yarp clean with ports registered manually or in other ways (e.g. the /ros port when running yarpserver --ros).
  • There is strong opposition to having yarpserver "probe" a registered port before assigning it to another program.

As a result, there is nothing we can do about this; it is just how it was decided that YARP should work.

If you want to discuss or propose some changes to this behaviour (which I would probably agree to), feel free to open an issue or a discussion on the YARP repository.

@traversaro
Member

I think that writing this down (even just here in this comment) is already quite useful, as I had no idea of these past discussions (I guess they happened face to face; I don't remember reading about them).

@S-Dafarra
Author

I was reading again the comments above by @Nicogene and @drdanz. Let me check whether I got it right: two modules running on different machines are always free to open ports with the same name.

This is allowed because the OS does not trigger any address conflict error. At the same time, the name server cannot know whether the second module is a new instance of the first module, started because the first instance crashed. The name server cannot test this hypothesis either, because there is no "probe", and we cannot force the user to run yarp clean every time an application crashes. Hence, the policy is simply to allow the second module to take the same port name, relying on the user to avoid these situations.

So, the "IP theft" can always happen, right?

@drdanz
Member

drdanz commented Mar 12, 2021

> So, the "IP theft" can always happen, right?

Yes: both modules will continue to run and existing connections will not be interrupted, but the server will start responding to queries with the address of the last one started.

Just try running yarp server on one machine and yarp read /root on a different machine and see what happens...

@S-Dafarra
Author

> So, the "IP theft" can always happen, right?
>
> Yes: both modules will continue to run and existing connections will not be interrupted, but the server will start responding to queries with the address of the last one started.
>
> Just try running yarp server on one machine and yarp read /root on a different machine and see what happens...

Ok. Just for the sake of discussion (even if this may not be the best place), could the name server use the address to spot the thieves? I mean, if a module crashed, there is a good chance that it will restart on the same machine with the same address. Conversely, the same port name being requested from two different addresses could be a good indication that something weird is going on. In that case, could the name server return an error, possibly suggesting to run yarp clean?

@Nicogene
Member

The problem is that this kind of check, and yarp clean, would lower performance by a lot. If you have to ping every port before registering it, modules that open a lot of ports (e.g. wbd) will take a while to open.

This kind of check is actually done by yarpviz (is the port alive? on which machine? what is it connected to?); running it on a cluster with some modules active can give you an estimate of the impact this kind of check would have.

@S-Dafarra
Author

> The problem is that this kind of check, and yarp clean, would lower performance by a lot. If you have to ping every port before registering it, modules that open a lot of ports (e.g. wbd) will take a while to open.

No no, I was not suggesting to run yarp clean automatically, but only to suggest it in the error message. I was picturing an error like:
"You requested a port name that is already registered for a different address. The port name might already be in use by another module. Try running yarp clean."
Also, I was not suggesting to ping the port, but only to query the name server. I was expecting the list of ports to be saved somewhere together with the corresponding addresses. So, if the query for the name returned an address different from the address of the module requesting the registration, then it is possibly a case of IP theft 🤔
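For illustration only, the proposed check could look roughly like this toy Python sketch (the class and error message are hypothetical, not YARP code):

```python
# Hypothetical sketch of the proposed check: refuse to overwrite a port
# name registered from a different address, and suggest `yarp clean`.

class CheckedNameServer:
    def __init__(self):
        self.registry = {}  # port name -> (host, tcp_port)

    def register(self, name, host, tcp_port):
        existing = self.registry.get(name)
        if existing is not None and existing[0] != host:
            # Same name from a different machine: more likely "IP theft"
            # than a restart after a crash, so fail instead of stealing.
            raise RuntimeError(
                f"Port {name} is already registered for address "
                f"{existing[0]}; it might be in use by another module. "
                "Try running `yarp clean`."
            )
        # Same machine (e.g. a restart after a crash): allow the overwrite.
        self.registry[name] = (host, tcp_port)

ns = CheckedNameServer()
ns.register("/foo", "192.168.0.10", 10002)
ns.register("/foo", "192.168.0.10", 10003)  # same host: allowed
try:
    ns.register("/foo", "192.168.0.20", 10014)  # different host: rejected
except RuntimeError as err:
    print(err)
```

Note this only consults the registry the name server already holds; no ping of the remote port is involved.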

@S-Dafarra
Author

Regarding the original delay problem, maybe it could be related to some other device that is slow to start, which would in turn delay the attachAll phase of WBD.

@traversaro
Member

> Regarding the original delay problem, maybe it could be related to some other device that is slow to start, which would in turn delay the attachAll phase of WBD.

If any other feedback from the robot works fine (even just the encoders), in theory the attach phase of the robotinterface should have been reached.

@S-Dafarra
Author

Apparently the attachAll of WBD is called after the calibration of the hands has finished. Maybe, if one of the calibrators hangs because one of the boards has problems, then WBD will wait as well. I don't know under which conditions the attachAll method is called 🤔

@traversaro
Member

> Apparently the attachAll of WBD is called after the calibration of the hands has finished. Maybe, if one of the calibrators hangs because one of the boards has problems, then WBD will wait as well. I don't know under which conditions the attachAll method is called 🤔

See https://github.com/robotology/yarp/blob/master/src/yarprobotinterface/Module.cpp#L155 and https://github.com/robotology/yarp/blob/7098ab6219647603f96764cc0a7cd33a950bbfa6/src/libYARP_robotinterface/src/yarp/robotinterface/experimental/Robot.cpp#L624, probably we could double check the level of the attach action of the wholebodydynamics.

In the OP, you wrote:

> We noticed this by running the iCubGUI immediately after starting the robot. The external forces appear only after a couple of minutes. We also checked the raw output of the sensors on the corresponding YARP ports, and the same for the inertial sensor: they all seem to produce reasonable, non-constant output.

The point is that sensors such as the FTs get connected to the YARP ports streaming their data in the attachAll phase, so if the sensor data was valid, that means that the attachAll phase was at least partially reached and execution was not blocked at an earlier phase. Unless, as written earlier, there is something going on with the levels.

@prashanthr05

> The point is that sensors such as the FTs get connected to the YARP ports streaming their data in the attachAll phase, so if the sensor data was valid, that means that the attachAll phase was at least partially reached and execution was not blocked at an earlier phase. Unless, as written earlier, there is something going on with the levels.

I think I might have noticed these behaviors sometimes. I remember an instance, while we were testing something with @fjandrad, in which we were trying to configure some block of code in attachAll that depended on some buffers from open(), and we kept crashing until we identified it and moved all of the configuration into open().

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added stale This issue will be soon closed automatically and removed stale This issue will be soon closed automatically labels Jun 14, 2021
@pattacini pattacini added the stale This issue will be soon closed automatically label Jun 15, 2021
@github-actions

This issue has been automatically closed due to inactivity. Feel free to open it again if needed.

@traversaro
Member

fyi @HosameldinMohamed @prashanthr05, this was closed by the stale bot, but keep in mind that it could still be there.

@pattacini
Member

pattacini commented Jun 24, 2021

We can prevent the bot from handling stale issues by applying the label pinned.

@pattacini pattacini reopened this Jun 24, 2021
@pattacini pattacini added pinned This label prevents an issue from being closed automatically and removed stale This issue will be soon closed automatically labels Jun 24, 2021
@pattacini pattacini assigned Uboldi80 and unassigned julijenv Jul 8, 2021
@pattacini pattacini moved this to Triage in iCub Tech Support Dec 7, 2022
@pattacini pattacini moved this from Triage to Review/QA in iCub Tech Support Dec 7, 2022
@Fabrizio69 Fabrizio69 removed the pinned This label prevents an issue from being closed automatically label Dec 14, 2022
@github-actions

This issue has been automatically marked as stale because it did not have recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale This issue will be soon closed automatically label Feb 13, 2023
@github-actions

This issue has been automatically closed due to inactivity. Feel free to open it again if needed.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Feb 21, 2023
@github-project-automation github-project-automation bot moved this from Review/QA to Done in iCub Tech Support Feb 21, 2023
@traversaro
Member

@S-Dafarra @GiulioRomualdi @mebbaid do you still experience this?

@GiulioRomualdi
Member

Not anymore in ergocub and iCubGenova09
