[net] First pass fixing ftpd on QEMU #1039
@ghaerr,
Thank you for diving right into this!
I concur with most of this - and have a few comments:
Enhancements to allow easier operation of ftpd on QEMU, for testing.
@Mellvik, tell me what you think about this. I'm already finding a number of potential issues between ftpd and QEMU.
The following changes are made in this PR:
SO_REUSEADDR turned on for passive mode file transfers (both QEMU and real hardware). This should be safe, but still should be tested on real hardware at some point. SO_REUSEADDR should not yet be enabled elsewhere (pending more work on it).
Eliminate 1 second sleep in ftpd for socket timeout, uses SO_REUSEADDR and displays appropriate messages.
Added ability to set ftpd -q or ftpd -d in /bootopts: use "ftpd=-q" when running QEMU for testing. Use "ftpd=-d" for debug output at boot without having to kill and restart.
ftpd -q and -d options don't disable becoming a daemon; run ftpd -d -d for old behavior.
I guess this is a matter of taste. I'd prefer -D instead of -d -d - but I'm fine with this.
Added ftp port forwarding lines in qemu.sh. It would be nice to figure out how to do this without having to kluge ftpd to prepare a special passive mode string!
I don't see how you can do this, unless someone dives into QEMU and figures out how to configure it to use TUN/TAP and thus have its own interface on the system. The problem is - obviously - that the addresses the server sees are different from what the client sees. Always.
Forward 9 ports rather than 5 for more realistic emulation. (May be off-by-one error in ftpd on wrap).
This hack (4+1) was to avoid ktcp crashing because too many sockets were lingering. If SO_REUSEADDR works, this should be OK.
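For readers following along, here is a minimal sketch of what enabling SO_REUSEADDR on a passive-mode listener can look like. It is written against the standard BSD socket API on a POSIX host, not against the actual ftpd source; the helper name `open_passive_listener` is hypothetical, and the ELKS socket layer may differ in detail.

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper (not from ftpd): open a listening socket for one
 * passive-mode transfer. Returns the descriptor, or -1 on error. */
int open_passive_listener(uint16_t port)
{
    struct sockaddr_in addr;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* Let bind() succeed even if an earlier connection on this port
     * is still sitting in TIME_WAIT. Must be set before bind(). */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
        close(fd);
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With only a handful of forwarded ports being cycled quickly, this is the situation where a rebind would otherwise tend to fail until the old socket times out.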
In quick testing, I found the following issues (could be my mistakes):
managed to send files to ELKS, using "mput foo*", but can't get "mget foo?" to work to send from ELKS afterwards.
Ftpd does not implement globbing (yet). So what you'd have to do is 'mget .' (or some other directory). This is on the todo list.
QEMU seems to stop working after a bit... the INT 0 timer stops, and ftpd stops working, only when using serial console. Very strange, but have seen this behavior before. I think it's a QEMU problem. This can be duplicated by uncommenting the first two entries in /bootopts and running a few file transfers.
Interesting. I haven't seen this, but then again I haven't used qemu much ...
Sometimes ftpd just stops responding. Not sure if that's the above problem or not.
Hopefully we find a way to recreate this one. I haven't seen it, so it's likely the speed difference between systems that triggers something.
Any ideas on how active mode might be tested using QEMU?
I honestly don't think this is possible without a QEMU hacked ftp client which makes the port # predictable.
Overall, this first pass makes things a lot easier for testing. @Mellvik, let me know if you're OK with commit.
Go ahead, @ghaerr - this is great!
…--Mellvik
|
Go ahead and change it however you like. I went the easy route and didn't rewrite the option parsing, instead just use "debug < 2" as to when to fork(). My primary purpose was to try to get a working QEMU test platform for ftpd so I can more easily exercise it. I would like to move towards using net start/stop or other options to start/stop/debug with daemons so we don't have to debug with special "harnesses".
SO_REUSEADDR seems to be working well, for this use case. I increased the number of usable ports so that the QEMU platform is more like real hardware and might fail in the same way. Also just added a QEMU_PORT to specify the outside port number for the time being, until we figure out how to do away with special qemu hacks. I have noticed a couple of cases where netstat shows CLOSE_WAIT for some sockets. I don't think this is buggy in ktcp, as CLOSE_WAIT means FIN was received, but local socket not closed. This may be because of an error situation not yet handled properly by ftpd. Now that QEMU can be easily used to test ftpd, we can more quickly track down possible errors in ftpd.
I've now tested transfers both ways with QEMU, and things seem to be working well, at least when ELKS is running single-user. I have looked into the ELKS multiuser case a bit more, and QEMU has known problems where the hardware timer stops ticking. This seems to manifest itself on ELKS whenever the serial port is emulated. So I have modified the /bootopts test line for QEMU to run in single user mode, and I haven't seen any ftpd "stopping" issues anymore. Here's some discussion on the QEMU buggy hardware timer emulation issues: zephyrproject-rtos/zephyr#14173 and zephyrproject-rtos/zephyr#12553. Sometimes I think we might want to use a different emulator. When the bug hits, the ELKS "date" command shows time isn't incrementing. Being forced to use console-only mode to test ftpd means that lots of debug information has already scrolled off the screen.
Kind of amazing, is this the case with all emulators? Say even for MSDOS or Windows emulated, one can't ftp in to a known port without MSDOS or Windows server being specially modified? It would be really nice if we could keep special QEMU-cased hacks out of ELKS applications (and bootopts)! |
I went the easy route and didn't rewrite the option parsing, instead just use "debug < 2" as to when to fork(). My primary purpose was to try to get a working QEMU test platform for ftpd so I can more easily exercise it. I would like to move towards using net start/stop or other options to start/stop/debug with daemons so we don't have to debug with special "harnesses".
I'm fine with this. And option parsing is far down on my list too.
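The "debug < 2" rule discussed above can be sketched as follows. This is an illustration only, not the ftpd option parser: the struct and function names are hypothetical, and the sketch assumes each -d simply increments a debug level while -q sets a QEMU flag.

```c
#include <string.h>

/* Hypothetical option state; the field names are assumptions. */
struct ftpd_opts {
    int debug;  /* incremented once per -d */
    int qemu;   /* set by -q */
};

/* Count -d occurrences and note -q; other arguments are ignored here. */
void parse_opts(int argc, char **argv, struct ftpd_opts *o)
{
    int i;

    o->debug = 0;
    o->qemu = 0;
    for (i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-d") == 0)
            o->debug++;
        else if (strcmp(argv[i], "-q") == 0)
            o->qemu = 1;
    }
}

/* A single -d (or -q) still daemonizes; only -d -d stays in the foreground. */
int should_daemonize(const struct ftpd_opts *o)
{
    return o->debug < 2;
}
```

Under these assumptions, "ftpd -d" and "ftpd -q" both fork into the background, while "ftpd -d -d" keeps the old foreground behavior.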
This hack (4+1) was to avoid ktcp crashing because too many sockets were lingering. If SO_REUSEADDR works, this should be OK.
SO_REUSEADDR seems to be working well, for this use case. I increased the number of usable ports so that the QEMU platform is more like real hardware and might fail in the same way. Also just added a QEMU_PORT to specify the outside port number for the time being, until we figure out how to do away with special qemu hacks.
This is good. I'll reenable SO_REUSEADDR if you haven't done so already.
I have noticed a couple of cases where netstat shows CLOSE_WAIT for some sockets. I don't think this is buggy in ktcp, as CLOSE_WAIT means FIN was received, but local socket not closed. This may be because of an error situation not yet handled properly by ftpd.
I'll keep an eye on this as testing continues.
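As a reminder of how a lingering CLOSE_WAIT typically arises: once the peer's FIN has arrived, read() returns 0, and the socket stays in CLOSE_WAIT until the local side calls close(). A minimal sketch of a receive loop that closes the descriptor on every exit path is below. The helper name `drain_and_close` is hypothetical and this is not the ftpd transfer loop; it only illustrates the pattern.

```c
#include <unistd.h>
#include <errno.h>

/* Hypothetical helper: copy everything readable from fd to out_fd.
 * Returns the number of bytes copied, or -1 on error.
 * fd is closed on every path, so it cannot linger in CLOSE_WAIT. */
long drain_and_close(int fd, int out_fd)
{
    char buf[512];
    long total = 0;
    ssize_t n;

    for (;;) {
        n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            /* A short write is treated as an error in this sketch. */
            if (write(out_fd, buf, (size_t)n) != n) {
                total = -1;
                break;
            }
            total += n;
        } else if (n == 0) {
            break;              /* EOF: the peer has sent its FIN */
        } else if (errno != EINTR) {
            total = -1;         /* real read error */
            break;
        }
    }
    close(fd);                  /* also reached on the error paths */
    return total;
}
```

If an error path in a server returns early without that final close(), netstat would show exactly the symptom described above.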
Now that QEMU can be easily used to test ftpd, we can more quickly track down possible errors in ftpd.
So what you'd have to do is 'mget .' (or some other directory)
I've now tested transfers both ways with QEMU, and things seem to be working well, at least when ELKS is running single-user. I have looked into the ELKS multiuser case a bit more, and QEMU has known problems where the hardware timer stops ticking. This seems to manifest itself on ELKS whenever the serial port is emulated. So I have modified the /bootopts test line for QEMU to run in single user mode, and I haven't seen any ftpd "stopping" issues anymore.
Here's some discussion on the QEMU buggy hardware timer emulation issues:
zephyrproject-rtos/zephyr#14173
zephyrproject-rtos/zephyr#12553
Sometimes I think we might want to use a different emulator. When the bug hits, the ELKS "date" command shows time isn't incrementing. Being forced to use console-only mode to test ftpd means that lots of debug information has already scrolled off the screen.
Well, this is an interesting line of thinking. What are the alternatives btw? Is the qemu problem general (any serial port) or only serial console? Maybe just too complicated, but now that ramdisks work, a dmesg-like file in ram may possibly alleviate the problem...
Btw, I've run a few multiuser tests w/o any particular issues, other than more retransmits. Will do more of that now that we're basically stable.
I honestly don't think this is possible without a QEMU hacked ftp client which makes the port # predictable.
Kind of amazing, is this the case with all emulators? Say even for MSDOS or Windows emulated, one can't ftp in to a known port without MSDOS or Windows server being specially modified?
This is why passive mode is dominant. Active mode has seen very little use since NATs became the rule. Which was why passive mode was introduced in the first place. That said, and like I've mentioned before, the problem isn't the fact that we're emulating, it's how we emulate the network i/f. If qemu can use tun/tap and we can figure out how to use it, we're ok.
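To make the address problem concrete: in passive mode the server tells the client where to connect with a reply of the form "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)" (RFC 959), where the port is p1*256 + p2. Behind QEMU's port forwarding, the address and port the client can reach are the host-side ones, not what the guest sees, so the reply has to be built from those. A minimal sketch of the encoding follows; the function name is hypothetical and the values in the usage note are made up for illustration, not taken from qemu.sh.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper: format an RFC 959 PASV reply for the address and
 * port that the client should connect to. Returns snprintf()'s result. */
int format_pasv_reply(char *buf, size_t len, const uint8_t ip[4], uint16_t port)
{
    return snprintf(buf, len,
                    "227 Entering Passive Mode (%u,%u,%u,%u,%u,%u)\r\n",
                    (unsigned)ip[0], (unsigned)ip[1],
                    (unsigned)ip[2], (unsigned)ip[3],
                    (unsigned)(port >> 8),     /* p1: high byte of port */
                    (unsigned)(port & 0xff));  /* p2: low byte of port */
}
```

For example, advertising 127.0.0.1 with an externally forwarded port of 8041 yields "(127,0,0,1,31,105)", since 31*256 + 105 = 8041. With TUN/TAP the guest would have its own reachable interface and no such substitution would be needed.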
It would be really nice if we could keep special QEMU-cased hacks out of ELKS applications (and bootopts)!
Absolutely!!
Thank you.
…-M
|
It's already done, but only for the passive file transfer mode. It is required for QEMU because of the speed issues you ran into when getting ftpd to work with it. In almost all cases, due to the TCP state machine, (in this use case only), the previous socket used for passive mode is in the TIME_WAIT state, so it can be reused without harm. The other SO_REUSEADDRs are still commented out, until the long-term fix in ktcp is made. That fix requires that all sockets in the active list be inspected, not just the first socket found as is currently done. It's kind of tricky to get right, which is why I'm delaying dealing with it until we get most of the other networking stabilized (again).
I don't know! It occurs randomly when using the serial port/console with multiuser enabled. Deep down, it has to do with edge vs level PIC triggering in the emulator. I've known about it for a long time. One alternative is to use another emulator. We usually don't see it when using the serial console, because typing on the serial console fixes it. It only happens when accessing ELKS without using the serial port, like from a remote ftp. Not to worry, just mentioning the problem, since this PR is for use with QEMU, in case you run into weird stuff. |
I would say we should be pretty stable. I'm interested in your testing with ftpget/put, since we now may be able to use those and ftpd to download a kernel automatically as you desired a while back. And it would be interesting to test that over QEMU as well. I'm thinking about the idea of some preset file transfer regression tests for TCP. They would use a preloaded directory of the git repo, and could be run by some shell scripts. Even if only over QEMU, or having shell scripts that worked from externally as well, these would go a long way to proving reliability with ELKS TCP moving forward. Ideas welcome! |
@ghaerr - this is indeed a new level for elks networking!
Apropos emulators, we may take a shot at virtualbox.
IIRC there is a writeup on that in the Wiki already...
Btw, I've run a few multiuser tests w/o any particular issues, other than more retransmits. Will do more of that now that we're basically stable.
I would say we should be pretty stable. I'm interested in your testing with ftpget/put, since we now may be able to use those and ftpd to download a kernel automatically as you desired a while back. And it would be interesting to test that over QEMU as well.
Yes, that's an interesting thought. The manual variant has been invaluable for the past year (or has it been two since we got the ethernet driver and basic ktcp stability going?), now automating it is indeed possible. BTW - the availability of ftpd has already helped: I can now ftp new binaries directly to elks instead of the previous two-step process - first scp to the raspi, then ftp or ftpget. Which reminds me - the File transfer wiki needs to be updated. I'm on it.
And - totally off topic - is a paragraph on using minix fsck for the Wiki on your (endless) list?
I'm thinking about the idea of some preset file transfer regression tests for TCP. They would use a preloaded directory of the git repo, and could be run by some shell scripts. Even if only over QEMU, or having shell scripts that worked from externally as well, these would go a long way to proving reliability with ELKS TCP moving forward. Ideas welcome!
This is a good idea. FWIW - my test setup has been the ELKS binaries + a 200k file (created using dd) and 100 2k files created by splitting the 200k file using split. And finally a couple of zero size files. Easy to see anomalies with the naked eye, easy to verify to the last bit (the created files are all zeroes - dd if=/dev/zero count=200 bs=1k ...etc.)
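Because the test files are all zeroes, a regression check after a transfer can be very small: confirm the byte count and confirm that no byte is non-zero. The following is a rough sketch of such a check, with a hypothetical function name; it is not part of any existing ELKS test script.

```c
#include <stdio.h>

/* Hypothetical check: returns 0 if the file at path contains exactly
 * `expected` bytes and every byte is zero, -1 otherwise. */
int verify_zero_file(const char *path, long expected)
{
    FILE *fp = fopen(path, "rb");
    long count = 0;
    int c, ok = 1;

    if (fp == NULL)
        return -1;

    while ((c = getc(fp)) != EOF) {
        if (c != 0) {       /* corrupted byte */
            ok = 0;
            break;
        }
        count++;
    }
    if (ferror(fp))
        ok = 0;
    fclose(fp);

    /* A size mismatch catches both truncated files and added bytes. */
    return (ok && count == expected) ? 0 : -1;
}
```

For a 2k piece produced by the setup above, the call would be `verify_zero_file(name, 2048)`.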
Thank you!
…-M
|
@ghaerr, Teaser: The ELKS ftp client is transferring files and directory listings in passive mode, the first cut is just around the corner - with some fun stuff for experimentation : select() multiplexing. -M |
Should this be a real problem, open a new issue. Please include exact details concerning the transfer: is this elks-to-elks, or remote-to-elks; only using the new ftp client, or repeatable with macOS client, and which way the file transfer is going send or receive from ftpd, etc. In this way I can understand whether this is related to the FIN received with unprocessed data that was recently a problem (and now has debug printf turned off), or another problem. If the problem only occurs on QEMU, we will also need to see whether this has anything to do with SO_REUSEADDR sockets or not. Most of that debug printf is still turned on.
Both telnet and ktcp use select multiplexing for their network I/O, in case you haven't already noticed. |
FTP transfers into ELKS in QEMU using ftpd are not 100% reliable. Bytes get added and I haven't had time to look at it yet. Keep an eye on transferred file sizes.
Should this be a real problem, open a new issue. Please include exact details concerning the transfer: is this elks-to-elks, or remote-to-elks; only using the new ftp client, or repeatable with macOS client, and which way the file transfer is going send or receive from ftpd, etc. In this way I can understand whether this is related to the FIN received with unprocessed data that was recently a problem (and now has debug printf turned off), or another problem.
If the problem only occurs on QEMU, we will also need to see whether this has anything to do with SO_REUSEADDR sockets or not. Most of that debug printf is still turned on.
Will do - and like I said, this is QEMU only.
the first cut is just around the corner - with some fun stuff for experimentation : select() multiplexing.
Both telnet and ktcp use select multiplexing for their network I/O, in case you haven't already noticed.
No, I wasn't aware of that. I was - for some reason - under the impression that select was not fully tested in networking. Have no idea where I got that. Great to know.
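For anyone new to the technique, select() multiplexing in an ftp client usually means waiting on the control and data connections at once and servicing whichever becomes readable. A minimal sketch using the standard POSIX select() call is below. The function name and the bitmask return convention are assumptions made for illustration and do not reflect the ELKS ftp, telnet, or ktcp sources.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: wait up to timeout_sec for either descriptor to
 * become readable. Returns a bitmask (1 = ctrl_fd ready, 2 = data_fd
 * ready), 0 on timeout, or -1 on error. */
int wait_readable(int ctrl_fd, int data_fd, int timeout_sec)
{
    fd_set rfds;
    struct timeval tv;
    int maxfd = ctrl_fd > data_fd ? ctrl_fd : data_fd;
    int ready = 0;

    FD_ZERO(&rfds);
    FD_SET(ctrl_fd, &rfds);
    FD_SET(data_fd, &rfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    /* First argument is the highest descriptor number plus one. */
    if (select(maxfd + 1, &rfds, NULL, NULL, &tv) < 0)
        return -1;

    if (FD_ISSET(ctrl_fd, &rfds))
        ready |= 1;
    if (FD_ISSET(data_fd, &rfds))
        ready |= 2;
    return ready;
}
```

The fd_set has to be rebuilt before every call, since select() overwrites it with the set of ready descriptors.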
thank you!
—M
|