[net] First pass fixing ftpd on QEMU #1039

Merged Dec 1, 2021 (2 commits)

Conversation

ghaerr
Owner

@ghaerr ghaerr commented Dec 1, 2021

Enhancements to allow easier operation of ftpd on QEMU, for testing.

@Mellvik, tell me what you think about this. I'm already finding a number of potential issues between ftpd and QEMU.
The following changes are made in this PR:

  • SO_REUSEADDR turned on for passive mode file transfers (both QEMU and real hardware). This should be safe, but still should be tested on real hardware at some point. SO_REUSEADDR should not yet be enabled elsewhere (pending more work on it).
  • Eliminated the 1-second sleep in ftpd for socket timeout; ftpd now uses SO_REUSEADDR and displays appropriate messages.
  • Added the ability to pass -q or -d to ftpd from /bootopts: use "ftpd=-q" when running under QEMU for testing, or "ftpd=-d" for debug output at boot without having to kill and restart ftpd.
  • ftpd -q and -d options don't disable becoming a daemon; run ftpd -d -d for old behavior.
  • Added ftp port forwarding lines in qemu.sh. It would be nice to figure out how to do this without having to kluge ftpd to prepare a special passive mode string!
  • Forward 9 ports rather than 5 for more realistic emulation. (May be off-by-one error in ftpd on wrap).
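
The wrap concern in the last bullet can be sketched as follows. This is a minimal illustration, not ftpd's actual code: the function name `next_pasv_port` and the base port value are hypothetical, and only the count of nine ports comes from this PR. Using a modulo on the offset from the base keeps every returned port inside the forwarded range, which is where an off-by-one would otherwise show up.

```c
#include <assert.h>

#define PASV_BASE_PORT  49152u  /* hypothetical first forwarded port (assumption) */
#define PASV_PORT_COUNT 9u      /* nine forwarded ports, as in this PR */

/* Return the passive port that follows 'current', wrapping back to the
 * base port after the last one. 'current' must lie inside the range. */
unsigned next_pasv_port(unsigned current)
{
    assert(current >= PASV_BASE_PORT &&
           current < PASV_BASE_PORT + PASV_PORT_COUNT);
    return PASV_BASE_PORT +
           (current - PASV_BASE_PORT + 1u) % PASV_PORT_COUNT;
}
```

Stepping nine times from the base port lands back on the base port, so all nine forwarded ports are used and no port outside the range is ever advertised.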

In quick testing, I found the following issues (could be my mistakes):

  • managed to send files to ELKS, using "mput foo*", but can't get "mget foo*" to work to send from ELKS afterwards.
  • QEMU seems to stop working after a bit... the INT 0 timer stops, and ftpd stops working, only when using serial console. Very strange, but have seen this behavior before. I think it's a QEMU problem. This can be duplicated by uncommenting the first two entries in /bootopts and running a few file transfers.
  • Sometimes ftpd just stops responding. Not sure if that's the above problem or not.
  • Any ideas on how active mode might be tested using QEMU?

Overall, this first pass makes things a lot easier for testing. @Mellvik, let me know if you're OK with commit.

@Mellvik
Contributor

Mellvik commented Dec 1, 2021 via email

@ghaerr
Owner Author

ghaerr commented Dec 1, 2021

> I guess this is a matter of taste. I'd prefer -D instead of -d -d - but I'm fine with this.

Go ahead and change it however you like. I went the easy route and didn't rewrite the option parsing; instead, ftpd just checks "debug < 2" to decide whether to fork(). My primary purpose was to get a working QEMU test platform for ftpd so I can exercise it more easily. I would like to move towards using net start/stop or other options to start, stop, and debug daemons so we don't have to debug with special "harnesses".
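
A rough sketch of the "debug < 2" idea, under stated assumptions: the struct and function names below are hypothetical and the real ftpd option parsing may differ. It only illustrates counting repeated -d flags so that a single -d keeps daemon behavior while -d -d stays in the foreground.

```c
#include <string.h>

/* Hypothetical option record; not taken from the ftpd source. */
struct ftpd_opts {
    int debug;  /* number of -d flags seen */
    int qemu;   /* nonzero if -q was given */
};

/* Count -d and -q flags. Returns 0 on success, -1 on an unknown argument. */
int parse_opts(int argc, char **argv, struct ftpd_opts *o)
{
    int i;

    o->debug = 0;
    o->qemu = 0;
    for (i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-d") == 0)
            o->debug++;
        else if (strcmp(argv[i], "-q") == 0)
            o->qemu = 1;
        else
            return -1;
    }
    return 0;
}

/* The daemon forks unless -d was given at least twice. */
int should_daemonize(const struct ftpd_opts *o)
{
    return o->debug < 2;
}
```

With this shape, "ftpd -d" and "ftpd -q" still fork, and only "ftpd -d -d" keeps the old foreground behavior.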

> This hack (4+1) was to avoid ktcp crashing because too many sockets were lingering. If SO_REUSEADDR works, this should be OK.

SO_REUSEADDR seems to be working well, for this use case. I increased the number of usable ports so that the QEMU platform is more like real hardware and might fail in the same way. Also just added a QEMU_PORT to specify the outside port number for the time being, until we figure out how to do away with special qemu hacks.
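
For reference, the passive mode string mentioned above follows the standard FTP reply form "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)", where the port is split into a high and a low byte. The sketch below is an assumption about how such a string could be built, not the ftpd code; `format_pasv_reply` is a hypothetical name. Under QEMU port forwarding, the port passed in would be the outside (host-side) port rather than the port ftpd actually listens on inside the guest.

```c
#include <stdio.h>

/* Format a 227 reply advertising the address ip[0].ip[1].ip[2].ip[3] and
 * the given port. Returns the value returned by snprintf(). */
int format_pasv_reply(char *buf, size_t len,
                      const unsigned char ip[4], unsigned port)
{
    return snprintf(buf, len,
                    "227 Entering Passive Mode (%u,%u,%u,%u,%u,%u)\r\n",
                    (unsigned)ip[0], (unsigned)ip[1],
                    (unsigned)ip[2], (unsigned)ip[3],
                    (port >> 8) & 0xFFu,   /* p1: high byte of the port */
                    port & 0xFFu);         /* p2: low byte of the port */
}
```

The client reconstructs the data port as p1 * 256 + p2, which is why the advertised port must match whatever the emulator forwards from the host.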

I have noticed a couple of cases where netstat shows CLOSE_WAIT for some sockets. I don't think this is buggy in ktcp, as CLOSE_WAIT means FIN was received, but local socket not closed. This may be because of an error situation not yet handled properly by ftpd.
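
To make the CLOSE_WAIT point concrete: a socket stays in that state until the local side calls close() after seeing end-of-file. The following is a generic POSIX-style sketch, not code from ftpd or ktcp, and `drain_and_close` is a hypothetical helper name. It closes the descriptor on both EOF and error paths, which is the kind of handling an unhandled error situation might skip.

```c
#include <unistd.h>

/* Read from fd until EOF or error, then close it.
 * Returns the number of bytes read, or -1 if read() failed.
 * (Interrupted reads are treated as errors to keep the sketch short.) */
long drain_and_close(int fd)
{
    char buf[256];
    long total = 0;
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;

    /* n == 0 means the peer's FIN arrived (EOF); n < 0 means an error.
     * Close in both cases so the socket does not linger in CLOSE_WAIT. */
    close(fd);
    return n < 0 ? -1 : total;
}
```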

Now that QEMU can be easily used to test ftpd, we can more quickly track down possible errors in ftpd.

> So what you'd have to do is 'mget .' (or some other directory)

I've now tested transfers both ways with QEMU, and things seem to be working well, at least when ELKS is running single-user. I have looked into the ELKS multiuser case a bit more, and QEMU has known problems where the hardware timer stops ticking. This seems to manifest itself on ELKS whenever the serial port is emulated. So I have modified the /bootopts test line for QEMU to run in single user mode, and I haven't seen any ftpd "stopping" issues anymore.

Here's some discussion on the QEMU buggy hardware timer emulation issues:
zephyrproject-rtos/zephyr#14173
zephyrproject-rtos/zephyr#12553

Sometimes I think we might want to use a different emulator. When the bug hits, the ELKS "date" command shows time isn't incrementing. Being forced to use console-only mode to test ftpd means that lots of debug information has already scrolled off the screen.

> I honestly don't think this is possible without a QEMU hacked ftp client which makes the port # predictable.

Kind of amazing, is this the case with all emulators? Say even for MSDOS or Windows emulated, one can't ftp in to a known port without MSDOS or Windows server being specially modified?

It would be really nice if we could keep special QEMU-cased hacks out of ELKS applications (and bootopts)!

@ghaerr ghaerr merged commit 273313e into ghaerr:master Dec 1, 2021
@Mellvik
Contributor

Mellvik commented Dec 1, 2021 via email

@ghaerr
Owner Author

ghaerr commented Dec 1, 2021

> This is good. I'll reenable SO_REUSEADDR if you haven't done so already.

It's already done, but only for the passive file transfer mode. It is required for QEMU because of the speed issues you ran into when getting ftpd to work with it. In this use case only, the TCP state machine almost always leaves the previous passive-mode socket in the TIME_WAIT state, so it can be reused without harm.
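
As an illustration of what enabling the option on a passive-mode listener looks like, here is a sketch using the standard BSD/POSIX socket calls. It assumes a POSIX-like API and is not the ftpd source; `open_pasv_listener` is a hypothetical name, and the ELKS socket layer may differ in detail. The key point is that setsockopt() with SO_REUSEADDR is called before bind(), so a port whose previous socket is still in TIME_WAIT can be bound again.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Create a TCP listener on 'port' (0 lets the system choose) with
 * SO_REUSEADDR set. Returns the listening descriptor, or -1 on failure. */
int open_pasv_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* Must be set before bind() to have any effect on this bind. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) < 0) {
        close(fd);
        return -1;
    }

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```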

The other SO_REUSEADDRs are still commented out, until the long-term fix in ktcp is made. That fix requires that all sockets in the active list be inspected, not just the first socket found as is currently done. It's kind of tricky to get right, which is why I'm delaying dealing with it until we get most of the other networking stabilized (again).

> What are the alternatives btw? Is the qemu problem general (any serial port) or only serial console.

I don't know! It occurs randomly when using the serial port/console with multiuser enabled. Deep down, it has to do with edge vs level PIC triggering in the emulator. I've known about it for a long time. One alternative is possibly to use another emulator. We usually don't see it when using the serial console because typing on the serial console fixes it. It only happens when accessing ELKS without using the serial port, like from a remote ftp. Not to worry, just mentioning the problem, since this PR is for use with QEMU, in case you run into weird stuff.

@ghaerr
Owner Author

ghaerr commented Dec 1, 2021

> Btw, I've run a few multiuser tests w/o any particular issues, other than more retransmits. Will do more of that now that we're basically stable.

I would say we should be pretty stable. I'm interested in your testing with ftpget/put, since we now may be able to use those and ftpd to download a kernel automatically as you desired a while back. And it would be interesting to test that over QEMU as well.

I'm thinking about the idea of some preset file transfer regression tests for TCP. They would use a preloaded directory of the git repo, and could be run by some shell scripts. Even if they only ran over QEMU, or included shell scripts that also worked externally, these would go a long way to proving reliability with ELKS TCP moving forward. Ideas welcome!

@Mellvik
Contributor

Mellvik commented Dec 2, 2021 via email

@Mellvik
Contributor

Mellvik commented Dec 2, 2021

@ghaerr,
a quick heads up: While testing the new ftp client, I ran into an interesting situation:
FTP transfers into ELKS in QEMU using ftpd are not 100% reliable. Bytes get added and I haven't had time to look at it yet. Keep an eye on transferred file sizes.
Not repeatable on real hardware. More on this as soon as I get a chance.
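
One way to go beyond watching file sizes is to compare the original and the transferred copy byte by byte and report where they first diverge. The helper below is only a sketch of that idea; `first_difference` is a hypothetical name and is not part of ELKS or its ftp tools.

```c
#include <stdio.h>

/* Compare two files byte by byte.
 * Returns -1 if they are identical, -2 if either file cannot be opened,
 * otherwise the 0-based offset of the first differing byte (which is the
 * length of the shorter file when one is a prefix of the other). */
long first_difference(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    long off = 0, result = -1;
    int ca, cb;

    if (!a || !b) {
        if (a) fclose(a);
        if (b) fclose(b);
        return -2;
    }
    for (;;) {
        ca = getc(a);
        cb = getc(b);
        if (ca != cb) {          /* mismatch, or one file ended early */
            result = off;
            break;
        }
        if (ca == EOF)           /* both files ended together */
            break;
        off++;
    }
    fclose(a);
    fclose(b);
    return result;
}
```

Knowing the offset of the first extra or changed byte may help tell whether bytes are inserted mid-stream or appended at the end of a transfer.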

Teaser: The ELKS ftp client is transferring files and directory listings in passive mode, the first cut is just around the corner - with some fun stuff for experimentation : select() multiplexing.

-M

@ghaerr
Owner Author

ghaerr commented Dec 2, 2021

> FTP transfers into ELKS in QEMU using ftpd are not 100% reliable. Bytes get added and I haven't had time to look at it yet. Keep an eye on transferred file sizes.

Should this be a real problem, open a new issue. Please include exact details concerning the transfer: is this elks-to-elks or remote-to-elks; does it happen only with the new ftp client or is it repeatable with the macOS client; which way is the file transfer going (send to or receive from ftpd); etc. That way I can understand whether this is related to the FIN received with unprocessed data that was recently a problem (and now has debug printf turned off), or another problem.

If the problem only occurs on QEMU, we will also need to see whether this has anything to do with SO_REUSEADDR sockets or not. Most of that debug printf is still turned on.

> the first cut is just around the corner - with some fun stuff for experimentation : select() multiplexing.

Both telnet and ktcp use select multiplexing for their network I/O, in case you haven't already noticed.
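
For readers unfamiliar with the pattern, select() multiplexing means waiting on several descriptors at once and servicing whichever becomes readable. The sketch below uses the standard POSIX select() call; it is not taken from telnet, ktcp, or the new ftp client, and `wait_readable` is a hypothetical helper. An ftp client could use the same shape to watch its control and data connections together.

```c
#include <unistd.h>
#include <sys/select.h>

/* Wait up to timeout_ms milliseconds for fd_a or fd_b to become readable.
 * Returns the readable descriptor (fd_a is preferred if both are ready),
 * or -1 on timeout or error. */
int wait_readable(int fd_a, int fd_b, long timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int maxfd = fd_a > fd_b ? fd_a : fd_b;
    int n;

    FD_ZERO(&rfds);
    FD_SET(fd_a, &rfds);
    FD_SET(fd_b, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* The first argument is the highest descriptor number plus one. */
    n = select(maxfd + 1, &rfds, NULL, NULL, &tv);
    if (n <= 0)
        return -1;
    return FD_ISSET(fd_a, &rfds) ? fd_a : fd_b;
}
```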

@Mellvik
Contributor

Mellvik commented Dec 3, 2021 via email

@ghaerr ghaerr deleted the qemu branch December 6, 2021 17:57