Link between srsenb and srsue goes up and down and after a while they both crash #738

Closed
akhila-rao opened this issue Oct 19, 2021 · 10 comments

Comments

@akhila-rao

akhila-rao commented Oct 19, 2021

Issue Description

I can bring up the eNB and connect the UE to it. I start iperf traffic from the enb to the UE. It looks OK for a few seconds then I usually see this message at the eNB

[INFO] [UHD RF] Tx while waiting for EOB, timed out... 67.3145 >= 56.3065. Starting new burst…

But the network still works OK and traffic is going through.
Then I see that the UE disconnects and reconnects many times and each time this happens traffic stops for like 1-10s
@ue I see:
Warning: Detected Radio-Link Failure
RRC Connection Reestablishment to PCI=1, EARFCN=3350 (Cause: "otherFailure")
Random Access Transmission: seq=44, tti=5211, ra-rnti=0x2
Random Access Complete. c-rnti=0x4b, ta=1
Reestablishment OK
RRC Connected
Warning: Detected Radio-Link Failure
RRC Connection Reestablishment to PCI=1, EARFCN=3350 (Cause: "otherFailure")
Random Access Transmission: seq=5, tti=5541, ra-rnti=0x2
Random Access Complete. c-rnti=0x4c, ta=0
Reestablishment OK
RRC Connected
/home/bob/srsRAN/lib/src/phy/phch/ra_dl.c.199: Invalid RBG subset=3 for nof_prb=50 where P=3
/home/bob/srsRAN/lib/src/phy/phch/ra_dl.c.641: Configuring resource allocation
Warning: Detected Radio-Link Failure
RRC Connection Reestablishment to PCI=1, EARFCN=3350 (Cause: "otherFailure")
Random Access Transmission: seq=49, tti=3131, ra-rnti=0x2
Random Access Complete. c-rnti=0x4e, ta=1
Reestablishment OK
RRC Connected
Warning: Detected Radio-Link Failure
RRC Connection Reestablishment to PCI=1, EARFCN=3350 (Cause: "otherFailure")
Selected cell no longer suitable: Going to RRC IDLE
RRC IDLE
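A note on the `Invalid RBG subset=3 for nof_prb=50 where P=3` line in the UE log above: for a type 1 downlink resource allocation, the subset index has to be smaller than the RBG size P, and P depends only on the cell bandwidth (3GPP TS 36.213, Table 7.1.6.1-1). The sketch below is not the srsRAN implementation; the function names are hypothetical and only illustrate why subset 3 is rejected for a 50 PRB cell. The thread does not say why the UE decoded such a value.

```python
def rbg_size(nof_prb: int) -> int:
    """RBG size P for a downlink bandwidth of nof_prb PRBs (TS 36.213, Table 7.1.6.1-1)."""
    if nof_prb <= 10:
        return 1
    if nof_prb <= 26:
        return 2
    if nof_prb <= 63:
        return 3
    return 4  # 64 to 110 PRBs


def is_valid_rbg_subset(subset: int, nof_prb: int) -> bool:
    """A type 1 allocation picks one of P subsets, so valid indices are 0..P-1."""
    return 0 <= subset < rbg_size(nof_prb)


if __name__ == "__main__":
    print(rbg_size(50))                # P for the 50 PRB cell in this report
    print(is_valid_rbg_subset(3, 50))  # the subset value from the log line
```

With 50 PRBs, P is 3, so only subsets 0, 1 and 2 are valid, which matches the message printed by ra_dl.c.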

Finally after around 100s or so I see these error messages at the eNB and then the connection fully breaks down (I check this by seeing that there are no packets in the eth link between the USRP and host)

[ERROR] [X300] 193.10.65.34: x300 fw communication failure #1
EnvironmentError: IOError: x300 fw poke32 - reply timed out
[ERROR] [X300] 193.10.65.34: x300 fw communication failure #2
EnvironmentError: IOError: x300 fw poke32 - reply timed out
[ERROR] [X300] 193.10.65.34: x300 fw communication failure #3
EnvironmentError: IOError: x300 fw poke32 - reply timed out
[ERROR] [UHD] An unexpected exception was caught in a task loop.The task loop will now exit, things may not work.EnvironmentError: IOError: 193.10.65.34: x300 fw communication failure #3
EnvironmentError: IOError: x300 fw poke32 - reply timed out

and I then see many many instances of
/home/data/srsRAN/lib/src/phy/rf/rf_uhd_imp.cc.522: USRP reported the following error: EnvironmentError: IOError: Block ctrl (CE_01_Port_40) packet parse error - EnvironmentError: IOError: Expected packet index: 3495 Received index: 3505

Setup Details

srsRAN version 21.04.0
UHD version 3.15
Ubuntu version 18.04.1 for srsenb, 20.04.2 for srsue
1GbE connection between USRP and host laptop
X310 hardware with UBX daughterboards for both enb and ue
SISO mode of operation
I have set CPU to performance
I have set network buffers using
sudo sysctl -w net.core.rmem_max=2426666
sudo sysctl -w net.core.wmem_max=2426666
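Values set with `sysctl -w` do not survive a reboot, so it can be worth confirming they are still in effect before each run. The following is a minimal sketch, assuming the usual `key = value` output of `sysctl net.core.rmem_max net.core.wmem_max`; the helper names are hypothetical and the target is simply the value used in the commands above, not a recommendation.

```python
TARGET_BYTES = 2426666  # value taken from the sysctl commands above


def parse_sysctl(text: str) -> dict:
    """Parse 'key = value' lines as printed by the sysctl tool."""
    values = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def undersized_buffers(text: str, target: int = TARGET_BYTES) -> list:
    """Return the buffer settings that are missing or below the target size."""
    values = parse_sysctl(text)
    keys = ("net.core.rmem_max", "net.core.wmem_max")
    return [k for k in keys if int(values.get(k, 0)) < target]


if __name__ == "__main__":
    sample = "net.core.rmem_max = 212992\nnet.core.wmem_max = 2426666\n"
    print(undersized_buffers(sample))
```

In practice the text would come from running `sysctl net.core.rmem_max net.core.wmem_max` on the host connected to the USRP.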

Expected Behavior

Continuously send traffic over the network
Remain stable without errors

Actual Behaviour

UE disconnects and reconnects many times, resulting in several tens of seconds of no traffic, after which it comes back again. Then after a while (around 200s) errors are thrown by srsenb and srsue

Steps to reproduce the problem

I am using iperf UDP to send a 1Mbps data stream from enb to ue, but the same behaviour is seen irrespective of how I generate the traffic. The errors I see after a while happen even when I don't send any traffic. The required files are attached. The command I used to run enb is
sudo nice -n -15 srsenb --expert.rrc_inactivity_timer 36000000
The command I used to run ue is
sudo nice -n -15 srsue

Additional Information

I have tried playing around with increasing the network buffer size, increasing the send/recv frame size after increasing the MTU, increasing the srate value, reducing the #PRBs from 50 to 25, disabling the rrc_inactivity_timer that I set in the command, changing TX_gain at eNB and UE, but none of these have resulted in a stable setup.
The error messages that I see are also very variable and I do not always see the same errors.
There are several errors that I see that I have looked up in git issues and in the srsran users group and they have not received any responses.
I can provide more information and a list of all the different errors I see, some of which are infrequent in case that helps.
Thanks!
enb_conf.txt
epc_conf.txt
ue_conf.txt

@andrepuschmann
Collaborator

This is an issue in the UHD driver we also see internally with the X310. There is not much we can do on the srsRAN side I am afraid.

@akhila-s-rao

akhila-s-rao commented Oct 29, 2021

Hi, could I please get some more clarity before we close this thread. Does this mean that one cannot use X310 for a stable setup with the latest version of srsRAN ? Am I better off using the B210 if I have access to it ?

I even tried using the latest UHD version (4.2) to see if they have fixed any of these issues I am having with the X310, and it seems that they have removed references to device3 (from the file device3.hpp) and made a generic device.hpp. So when I try to compile srsRAN (version 21) with this UHD version (4.2), it fails because srsRAN is looking for device3.hpp.
Will this latest version of UHD be supported in the next version of srsRAN (In case this version is the solution to the X310 issue) ?

Finally, I want to know if you think that this issue is with the coming together of X310, the UHD version and the srsRAN version? Or is it a problem with just the UHD and the X310 hardware? This might help me decide if I should pursue trying to find a fix in the UHD community.

Thanks a ton if you read this far.

@andrepuschmann
Collaborator

Hey,

when saying we see similar issues internally I was referring to:

[ERROR] [X300] 193.10.65.34: x300 fw communication failure #1
EnvironmentError: IOError: x300 fw poke32 - reply timed out

I am not sure what causes it or how to solve it, but this is not an srsRAN issue. That's a bug in UHD. Regarding the UHD version, we've been testing with 3.15 and 4.1 and those compile and work reasonably well. We have not tried 4.2 yet.

The X310, in general, is a good device and once good streaming parameters have been found it's working reasonably well.

Thanks
Andre

@akhila-s-rao

Thanks. So I have hope of getting the X310 devices working with latest srsRAN and UHD 3.15 is what I am hearing :). I have tried varying the MTU and send/recv frame sizes, the sampling rates and the network buffer sizes. But I am yet to get something stable. I have even given srsenb and srsue high priority when I start the process. I am also setting CPU to performance mode. Could you please tell me if I am missing the tuning of any other streaming parameters ? So that I have a list of all the knobs to try to tune. Again thanks a ton.
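One quick sanity check when tuning the sampling rate against a 1GbE link is the raw sample throughput per direction. The sketch below assumes 4 bytes per complex sample on the wire (16-bit I plus 16-bit Q) and ignores packet header overhead; the rates passed in are illustrative, and the function names are hypothetical.

```python
BYTES_PER_SAMPLE = 4  # assumption: 16-bit I + 16-bit Q per complex sample


def stream_mbps(sample_rate_hz: float, channels: int = 1) -> float:
    """Raw sample payload in Mbit/s for one direction, headers not included."""
    return sample_rate_hz * BYTES_PER_SAMPLE * 8 * channels / 1e6


def link_utilisation(sample_rate_hz: float, link_mbps: float = 1000.0,
                     channels: int = 1) -> float:
    """Fraction of the link capacity used by the raw sample stream."""
    return stream_mbps(sample_rate_hz, channels) / link_mbps


if __name__ == "__main__":
    for rate in (11.52e6, 23.04e6):  # illustrative sampling rates
        print(rate, stream_mbps(rate), link_utilisation(rate))
```

Under these assumptions, doubling the sampling rate doubles the load on the link, so raising it leaves less headroom on 1GbE rather than more.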

@andrepuschmann
Collaborator

Those params are the right ones to play with. But frankly speaking, giving definite advice here is difficult. Also, I am not saying that with 3.15 everything works flawlessly. We do see issues there too. To be honest, I would prefer to use the latest stable UHD and ask Ettus to solve the issue there.

Here is a set of params we use in one of the machines: type=x300,clock=external,sampling_rate=11.52e6,lo_freq_offset_hz=23.04e6,send_frame_size=8000,recv_frame_size=8000,num_send_frames=64,num_recv_frames=64
But again, without warranty.
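Before reusing an argument string like the one above, it may help to split it into key/value pairs and compare the frame sizes with the MTU of the interface facing the USRP. This is a minimal sketch with hypothetical helper names; the MTU check is coarse (it ignores IP/UDP header overhead) and the MTU value has to be supplied by the reader.

```python
def parse_device_args(args: str) -> dict:
    """Split a comma-separated 'key=value' device argument string into a dict."""
    parsed = {}
    for item in args.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        parsed[key.strip()] = value.strip()
    return parsed


def frame_sizes_exceeding_mtu(parsed: dict, mtu: int) -> dict:
    """Return the frame-size arguments that are larger than the given MTU."""
    keys = ("send_frame_size", "recv_frame_size")
    return {k: int(parsed[k]) for k in keys if k in parsed and int(parsed[k]) > mtu}


if __name__ == "__main__":
    args = ("type=x300,clock=external,sampling_rate=11.52e6,"
            "lo_freq_offset_hz=23.04e6,send_frame_size=8000,"
            "recv_frame_size=8000,num_send_frames=64,num_recv_frames=64")
    parsed = parse_device_args(args)
    print(frame_sizes_exceeding_mtu(parsed, mtu=1500))
    print(frame_sizes_exceeding_mtu(parsed, mtu=9000))
```

A frame size of 8000 bytes only fits if jumbo frames are enabled on the link, so on a default 1500-byte MTU these two values would need to be lowered or the MTU raised.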

@akhila-s-rao

This is very helpful. Thanks. I shall try these and then reach out to the UHD community for additional help with getting X310 to behave. Currently I am unable to have a stable system that just stays up for even 100s. Last question. I am using the internal clock, I see that you have an external clock (GPS ?). Does this in any way affect the performance of the system ? I am not doing any MIMO.

@andrepuschmann
Collaborator

Well, if you don't see clock instabilities or issues when attaching commercial phones, it's probably fine. We use the OctoClock in our setups because some devices have a crazy offset and don't have good oscillators.

@akhila-s-rao

I have never connected to a commercial phone. I only use another X310 srsue. Sorry to press, but how would one know if they have clock instabilities ? Do I need to play around with any additional parameters since I am connecting to a srsUE running with another X310 ?

@andrepuschmann
Collaborator

check the CFO output on the srsUE stdout when attaching to the eNB

@akhila-s-rao

Hmm. Checking the ue_metrics.csv file, I see that the CFO is almost always around -4700. So a 4.7 kHz offset when each resource block is 180 kHz seems not bad.
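To put the CFO figure in context, it can be expressed in parts per million of the carrier and as a fraction of the 15 kHz LTE subcarrier spacing (a 180 kHz resource block is 12 such subcarriers). The sketch below assumes the CSV value is in Hz, as read above, and uses EARFCN=3350 from the UE log earlier in the thread, which falls in LTE band 7 (downlink EARFCN 2750 to 3449, starting at 2620 MHz). The function names are hypothetical.

```python
SUBCARRIER_SPACING_HZ = 15_000  # LTE subcarrier spacing


def band7_dl_freq_hz(earfcn: int) -> int:
    """Downlink carrier frequency for a band 7 EARFCN (100 kHz raster)."""
    if not 2750 <= earfcn <= 3449:
        raise ValueError("EARFCN is outside the band 7 downlink range")
    # Work in units of 100 kHz to keep the arithmetic exact: 2620 MHz = 26200 units.
    return (26200 + (earfcn - 2750)) * 100_000


def cfo_ppm(cfo_hz: float, carrier_hz: float) -> float:
    """Carrier frequency offset in parts per million of the carrier."""
    return cfo_hz / carrier_hz * 1e6


if __name__ == "__main__":
    carrier = band7_dl_freq_hz(3350)
    print(carrier)                         # carrier frequency in Hz
    print(cfo_ppm(-4700, carrier))         # offset in ppm
    print(4700 / SUBCARRIER_SPACING_HZ)    # offset as a fraction of one subcarrier
```

Under these assumptions, the offset is roughly 1.75 ppm of the 2680 MHz carrier and a little under a third of one subcarrier, which is a stricter yardstick than the 180 kHz resource block width.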
