Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bioshd] Implement park for ancient MFM/RLL disks #1977

Merged
merged 1 commit into from
Aug 29, 2024

Conversation

vkoskiv
Copy link
Contributor

@vkoskiv vkoskiv commented Aug 26, 2024

This patch still requires work before merging IMO, but I wanted to file this PR now to gather some feedback.
I'm running two ST-412 hard disks on my IBM 5150+5161 system so I don't ever have to worry about storage again. I noticed that there doesn't seem to be any support in ELKS for parking hard disks of this vintage, so I took a stab at adding it.
There was some discussion about this in 2020, but I assume most people playing around with ELKS aren't running disks this old :]

It seems to me that the correct way to do this is to park the drive in bioshd_release().
This does work for unmounting non-root disks, but there is a special case for the root drive, where the drive is remounted as read-only instead, and bioshd_release never gets called.
So my hack to ensure all drives are parked when shutting down is to just park every drive in sys_reboot(), which feels very hacky.

I'd welcome any advice on a better approach, and any nits/comments on my implementation, as I'm not too familiar with writing kernel code.

Tangentially related, I observed that the userland reboot and shutdown commands don't seem to unmount all mounted volumes. I have a patch that makes them iterate all mounted volumes to unmount them before a shutdown/reboot, but I'll leave that for another PR, and pending more discussion. Shouldn't the kernel sync/unmount mounted filesystems before shutting down? Or is that generally left to the userspace to handle?

@ghaerr
Copy link
Owner

ghaerr commented Aug 26, 2024

Hello @vkoskiv,

Thanks for the PR! Your code looks well-written; I'll comment separately on various kernel- and ELKS-related coding issues next to the source lines for better clarity.

It seems to me that the correct way to do this is to park the drive in bioshd_release().

In general, yes, this is probably much better than calling specialized driver routines from other areas of the kernel. This is likely why the initial build is failing, as we build for IBM PC, PC-98 and embedded 8018x systems, each of which may use different configurations (or none) of he BIOS disk driver.

Shouldn't the kernel sync/unmount mounted filesystems before shutting down? Or is that generally left to the userspace to handle?
There was some #437 (comment),

The comment from 2020 suggests shutdown unmounting disks and then parking, while reboot would just unmount without parking. This might be a good way forward, although there are times when a reboot is wanted without having an unmount, as that can cause further I/O which may not be desired, and they should be a way to do that. Perhaps there should be an option to either of those commands that would not unmount or park all disks, etc. We should discuss this more. Keeping the parking and/or multi-filesystem unmount controlled only through command line programs will give the most flexibility.

Agreed that there should be a way to unmount all drives, and then (optionally?) park the drives in the normal case. This could be done by enhancing the umount system call and passing a flag to park or not programmatically, which would then get passed to the driver release entry point.

but there is a special case for the root drive, where the drive is remounted as read-only instead, and bioshd_release never gets called.

If the "park" flag is set for unmount, the remount of root as read-only could be skipped.

I observed that the userland reboot and shutdown commands don't seem to unmount all mounted volumes. I have a patch that makes them iterate all mounted volumes to unmount them before a shutdown/reboot

Perhaps that should join this PR, but its certainly OK to keep them separate. We can use the ustatfs system call in order to enumerate the mounted volumes (see elkscmd/sys_utils/mount.c).

Thank you!

@@ -24,6 +24,8 @@

static int C_A_D = 1;

void bioshd_park_all(void);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No forward declarations are allowed in .c files, all would need to go somewhere in the include/linuxmt or include/arch header files. In this case, the function will likely live as a static function within the BIOSHD driver.

The reason for the build failing is that the code below is calling a specific block driver function which isn't part of the configured build. We'll be better off passing a flag from a user mode application that can find its way to the standard block device interface functions, which will eliminate this problem.

@ghaerr
Copy link
Owner

ghaerr commented Aug 27, 2024

As I look through the kernel source a bit more, I don't yet see any easy way of passing a flag to the block driver release routine, and some thought needs to go into whether to change the parms of the umount syscall. Using an ioctl comes to mind, but that requires a file descriptor, something the umount command line program won't have. I'll continue to think about this a bit more.

@ghaerr
Copy link
Owner

ghaerr commented Aug 27, 2024

Another way this might be done would be to move the sync, unmount (all), sleep, and park to within the kernel sys_reboot function (calling park like you have designed it, actually). The user programs currently trying to do nearly the same thing include reboot, shutdown and poweroff, all barely different. These could be combined by hard linking to a single executable that calls sys_reboot which already has a flags parameter for the various slightly different functions. The bioshd_park_all call could be #ifdef CONFIG_BLK_DEV_BHD and thus only called when the BIOS HD driver is linked in and the park flag set. My apologies if you've already analyzed all this and have come to the same conclusion with this PR.

Some advantages of using sys_reboot are 1) we don't build up specialized flag-passing in the kernel for what will be an unused function on almost all systems, 2) the poweroff, reboot, and shutdown programs could be consolidated and save disk space for floppy systems, which is valuable, and 3) some of the current code could be reworked in the reboot function, as, for instance, there are no programs currently using the "change ctrl-alt-del function" flag of the reboot system call. Things would actually become simpler. I also don't think we need separate shutdown and poweroff programs.

Parking the disk(s) only during poweroff/shutdown is probably better than during driver release, because its not certain that the disk should be parked every time its unmounted, as they may be mounted again. What do you think, is it bad to park disks too much, or should this only be done before poweroff?

@vkoskiv
Copy link
Contributor Author

vkoskiv commented Aug 27, 2024

Thank you for the thorough review! I applied some of the suggested fixes, but I'm struggling with call_bios() for the time being. I seem to be getting random lockups and now a divide by zero panic on boot whenever the patch for that is applied, even though the corresponding code path isn't even being called.
When I had a struct biosparms as a local in bioshd_park, the system would just freeze on boot, or print a bunch of dots and capital F letters and then freeze. I had a hunch that it might be trashing the stack somehow, so I moved that to file scope as a static variable like it is in bios.c, and now it's throwing a divide by zero error instead where it was freezing before.
I'll attach the patch in case there is some obvious issue I just didn't spot, but I'm out of ideas for the time being.

Beyond that issue, I think the approach you suggested in the last comment there makes sense to me. While parking more than necessary doesn't really cause harm beyond the expected mechanical wear the drive encounters in normal use, it also doesn't serve much of a purpose. We only really care about parking when the disk is actually spinning down with the power removed, as the heads crash on the disk surface by design. I doubt any hardware configuration this old would have hot-swappable drives where parking when unmounting would be relevant, so just doing it right before shutting down makes the most sense to me.

Amusingly, Github doesn't support standard git patches as attachments, so here's the patch on pastebin instead: https://pastebin.com/y6ix8x5i

@ghaerr
Copy link
Owner

ghaerr commented Aug 28, 2024

now a divide by zero panic on boot whenever the patch for that is applied, even though the corresponding code path isn't even being called.

That's very strange, but I happen to be in the middle of a somewhat nasty bug hunt for a suspected divide-by-zero problem over here. I don't yet have a full answer, but I think our new divide-by-zero fault handler works OK. I'll keep you posted on that one.

Github doesn't support standard git patches as attachments

It might be easier just to do another git commit -a; git push to push your changes in another commit, rather than using a force-push. That will allow us to see the changes easier, and we can always squash the merge commit later.

Looking at your change, I don't know why its happening. I am thinking that rather than declaring another static struct biosparms bdt in bioshd.c, the entire bioshd_park routine should be moved to bios.c. There's already a struct biosparms bdt over there, and frankly all the other call_bios calls are there too. Try that and see what happens.

When I had a struct biosparms as a local in bioshd_park, the system would just freeze on boot, or print a bunch of dots and capital F letters and then freeze.

Yes, using any but a very few locals in any function is calling for a stack overflow, long story short the kernel can use upwards of 500+ bytes stack doing disk I/O by the time the fullest stack depth of functions is executed.

The dots on boot are sector reads of the kernel /linux image by the boot sector, and the 'F' means a failed read for /bootopts on FAT, which is normal for MINIX images. The reason you don't see much else I think is very tricky - also being investigated in the above bug hunt link - it appears that QEMU runs serial and possibly console I/O asynchronously from the code being executed and a hard crash sometimes cuts output short. However, in this case it could just be that for some reason the kernel Image (/linux) is corrupted. I have seen this occur when the kernel is not fully rebuilt. Use make kclean; make to ensure it is. You're welcome to post a screenshot of the boot screen. After the 'F', setup.S normally displays tXXXX fXXXX dXXXX for the text, fartext and data segment values before jumping to the kernel itself, which should display "Direct Console..." on the next line.

@vkoskiv
Copy link
Contributor Author

vkoskiv commented Aug 28, 2024

That's very strange, but I happen to be in the middle of a somewhat nasty bug hunt for a suspected divide-by-zero problem over here. I don't yet have a full answer, but I think our new divide-by-zero fault handler works OK. I'll keep you posted on that one.

Very interesting analysis on that issue! Though I'm doing all my testing on my physical 5150, so it's unlikely that the issue is caused by a QEMU bug, at least in my case.

The dots on boot are sector reads of the kernel /linux image by the boot sector, and the 'F' means a failed read for /bootopts on FAT, which is normal for MINIX images. The reason you don't see much else I think is very tricky - also being investigated in the above bug hunt link - it appears that QEMU runs serial and possibly console I/O asynchronously from the code being executed and a hard crash sometimes cuts output short. However, in this case it could just be that for some reason the kernel Image (/linux) is corrupted. I have seen this occur when the kernel is not fully rebuilt. Use make kclean; make to ensure it is. You're welcome to post a screenshot of the boot screen. After the 'F', setup.S normally displays tXXXX fXXXX dXXXX for the text, fartext and data segment values before jumping to the kernel itself, which should display "Direct Console..." on the next line.

I forgot to note that the dots and F letters issue happens quite far into kernel init, with a bunch of init output already on screen, then the cursor jumps up to roughly the center of the console output and overwrites a portion of that output with the dot/F mess. I guess something goes wrong, and it's jumping back to the initial bootstrap routine, perhaps? Again, this is on my physical 5150. I can grab a 'screenshot' today after work when I do more testing. The freezes/dots issue seemed to be triggered after the second hd*/fd* listing, but I'll confirm that with the screenshot later.

@vkoskiv
Copy link
Contributor Author

vkoskiv commented Aug 28, 2024

Looking at your change, I don't know why its happening. I am thinking that rather than declaring another static struct biosparms bdt in bioshd.c, the entire bioshd_park routine should be moved to bios.c. There's already a struct biosparms bdt over there, and frankly all the other call_bios calls are there too. Try that and see what happens.

I moved the parking implementation to bios.c now, and the whole thing seems to work with call_bios().

@vkoskiv
Copy link
Contributor Author

vkoskiv commented Aug 28, 2024

Also renamed to bios_disk_park() to better match the other functions in biosparm.h

@ghaerr
Copy link
Owner

ghaerr commented Aug 28, 2024

I moved the parking implementation to bios.c now, and the whole thing seems to work with call_bios().

I must say, that's absolutely strange. It might be worth diving in to see why two separate static declarations of the same variable (if that's even the real issue) caused this to break, but I'm glad to hear it is now working. I'll take a note of the previous issue for another day.

Extern declarations aren't allowed in .c files, the extern struct drive_infot drive_info[] will have to move to biosparms.h, along with the forward decl for bios_disk_park.

Given all the complexity discussed as well as the strange unsolved problem with the multiple static declarations, it might be better to add park capability enhancements in multiple PRs. I'm inclined to go with your original thoughts of parking the drives only in the poweroff/shutdown portions of sys_reboot(). The direct call to bios_disk_park from there can be #ifdef CONFIG_BLK_DEV_BHD. You'll also want to remove the park from the bios_release entry point, as our discussion is tending towards the idea of only parking at power off, not when a disk is unmounted. (There are also cases where a drive may be opened and released quickly by various programs, we would not want parking in those cases either).

Let me know when you're closer to finalizing this portion, and I'll turn back on the CI workflow for you.

Thank you!

@vkoskiv
Copy link
Contributor Author

vkoskiv commented Aug 28, 2024

Addressed the forward declare in sys.c, it's in biosparm.h now like the rest of the bios.c stuff. I was wondering if I should mark this as BFPROC too, but I wasn't sure what the rationale for using that is. I'm assuming far calls are a bit slower, so seldom used functions are marked with that to save space in .text?

@vkoskiv
Copy link
Contributor Author

vkoskiv commented Aug 28, 2024

I'll do those changes you just mentioned too, and then ping you :]

@ghaerr
Copy link
Owner

ghaerr commented Aug 28, 2024

I was wondering if I should mark this as BFPROC too,

Yes, nice catch. Mark it BFPROC. The reason is that we're very nearly completely out of .text space so groups of related functions are possibly placed in the single far section based on the configuration. Call speed is rarely an issue for disk-related functions, but FYI calling far from near is quick, but calling near from far requires thunking by the compiler to ensure the __cdecl passed arguments stay at their expected offsets.

@vkoskiv
Copy link
Contributor Author

vkoskiv commented Aug 28, 2024

Extern declarations aren't allowed in .c files, the extern struct drive_infot drive_info[] will have to move to biosparms.h, along with the forward decl for bios_disk_park.

Moving it there would require us to include genhd.h in biosparms.h for the struct drive_infot definition. Is that okay to do?

I'm inclined to go with your original thoughts of parking the drives only in the poweroff/shutdown portions of sys_reboot(). The direct call to bios_disk_park from there can be #ifdef CONFIG_BLK_DEV_BHD. You'll also want to remove the park from the bios_release entry point, as our discussion is tending towards the idea of only parking at power off, not when a disk is unmounted.

Agreed, having parking logic in multiple places isn't necessary, and adds unneeded complexity.

@ghaerr
Copy link
Owner

ghaerr commented Aug 28, 2024

Moving it there would require us to include genhd.h in biosparms.h for the struct drive_infot definition. Is that okay to do?

No, we're trying to stay away from the complicatedness of having includes within includes. I'm not right looking at it now, perhaps the struct and extern definition should go in genhd.h instead then. If that can work and not be messy, including genhd.h in bios.c if it isn't already is fine.

In summary, its no problem to put the externs in the header file where they are declared, I just kind of jumped at the idea of having it in biosparms.h without much looking.

Sorry this is all more complicated than it should be - but we're getting there!

@vkoskiv
Copy link
Contributor Author

vkoskiv commented Aug 28, 2024

Removing parking from bioshd.c also allows us to only expose bios_disk_park_all(), and leave bios_disk_park() static.
Now, the only place that uses it is sys_reboot() in sys.c

New finding: marking bios_disk_park_all() as BFPROC caused a freeze and just now a divide by zero when invoking shutdown, but leaving that out seems to make it work.

Sorry this is all more complicated than it should be - but we're getting there!

I've really enjoyed the thorough feedback and discussion, working on this PR has been quite enjoyable :]

@vkoskiv
Copy link
Contributor Author

vkoskiv commented Aug 28, 2024

Moving the extern drive_info to genhd.h seems to work, though I did need to rename the conflicting drive_info anonymous struct in the ramdisk implementation in rd. That is now called rd_info, which I think fits a bit better, perhaps. Testing that on hardware, will then push that change too.

@vkoskiv
Copy link
Contributor Author

vkoskiv commented Aug 28, 2024

One more note, earlier I forgot to call bios_disk_park_all() in the reset case in sys_reboot(), that is now fixed, so the disks get parked on reboot as well.
Still doing some more testing on my hardware, seems to be working quite well now!

@vkoskiv vkoskiv force-pushed the bioshd-park branch 2 times, most recently from e8ffddb to ad46cd7 Compare August 28, 2024 20:20
@vkoskiv
Copy link
Contributor Author

vkoskiv commented Aug 28, 2024

Gated this thing behind CONFIG_BLK_DEV_BHD as well.

@ghaerr
Copy link
Owner

ghaerr commented Aug 28, 2024

New finding: marking bios_disk_park_all() as BFPROC caused a freeze and just now a divide by zero when invoking shutdown,

Wow, another strange finding. Believe it or not, we don't usually have too much of that anymore. (Except for my current ongoing bug hunt in Mellvik/TLVC#76 which interestingly sometimes exhibits this behavior). It appears that when the CS segment gets mangled, this kind of thing happens on ELKS.

However, seeing new Divide by Zero faults could just be that we're now seeing what's actually happening, given that fault handler was very recently added to ELKS.

but leaving that out seems to make it work.

That's ok for now, but make entirely sure that the reason wasn't that a forward declaration was not BFPROC and the other was, or vice versa. This is actually the reason why forward decls aren't allowed in .c files - too easy to have this error and not reported by the compiler. Needless to say, there should be no warnings produced during kernel compilation.

@vkoskiv
Copy link
Contributor Author

vkoskiv commented Aug 28, 2024

That's ok for now, but make entirely sure that the reason wasn't that a forward declaration was not BFPROC and the other was, or vice versa. This is actually the reason why forward decls aren't allowed in .c files - too easy to have this error and not reported by the compiler. Needless to say, there should be no warnings produced during kernel compilation.

I changed them both back and forth a few times when testing. That is, I added/removed BFPROC for bios_disk_park_all() in both bios.c and biosparm.h

I left BFPROC there for the static bios_disk_park() in bios.c, as that didn't seem to have any adverse effect.

@vkoskiv
Copy link
Contributor Author

vkoskiv commented Aug 28, 2024

As for testing, I've only tested this on my 5150 system so far, and it is working as expected. Any other systems using the bioshd driver will now also perform this park, as long as the hard disk has <= 1024 cylinders. I don't foresee this causing any issues, it's just an extra seek for drives that feature built-in park. If there is a way to detect that a given drive requires parking with certainty, then that check could be added to bioshd_disk_park()

@ghaerr
Copy link
Owner

ghaerr commented Aug 28, 2024

If there is a way to detect that a given drive requires parking with certainty, then that check could be added to bioshd_disk_park()

Yes, I was thinking of just that and what how that might affect other users with unnecessary head movements. It might be worth some research... is there a min or maximum disk size typical of these very old disks? Perhaps only do it for very small hard drives? It's probably not worth writing a bunch of code to query the disk interface, unless it would be simple. Unfortunately, all our new kernel features have to sit core-resident even when not used.

After commit, I plan on testing playing with the BFPROC issue on QEMU or another emulator.

@vkoskiv
Copy link
Contributor Author

vkoskiv commented Aug 28, 2024

I tried to find out if there is a commonly accepted way to detect this, but nothing really came up. As far as I know, all IDE drives autopark, as well as a few of the later MFM/RLL drives.
The implementation of this PR is now complete I think, pending more testing and feedback, of course. Let me know if you find anything. I'll focus on some other neat stuff I have planned in the meantime :]

Really old drives, such as the two ST-412s in my IBM 5161, required
manual parking to ensure the r/w heads are over a safe location when the
disk spins down and the heads (intentionally!) crash onto the disk
surface.

In DOS land, this vital task was left up to the user, in the form of a
PARK command to be run before shutting down. In ELKS however, the kernel
should manage this automatically. An INT 13/AH=0Ch is invoked to seek
the disk to the last track when unmounted, and to all drives right
before halting the system.
@ghaerr ghaerr merged commit 33a7c93 into ghaerr:master Aug 29, 2024
2 checks passed
@ghaerr
Copy link
Owner

ghaerr commented Aug 29, 2024

Thank you @vkoskiv!

@tyama501
Copy link
Contributor

Hello @ghaerr and @vkoskiv

Is this park also called from PC-98 shutdown?
PC-98 does not have int13 so I haven't checked yet but it may have problem.

Reboot,
I haven't supported reboot command for PC-98 yet so it failed previously but it didn't stuck.
I may support reboot by IO control but it would be nice if we can keep it without stuck until then.

Thank you.

@ghaerr
Copy link
Owner

ghaerr commented Aug 29, 2024

Is this park also called from PC-98 shutdown?
PC-98 does not have int13 so I haven't checked yet but it may have problem.

Currently, yes if PC-98 defines CONFIG_DEV_BIOS_BHD, which I think it does. This will mean the defines in sys.c will need to change to:

#if defined(CONFIG_BLK_DEV_BHD) and !defined(CONFIG_BLK_DEV_BHD)

I will make this change, as otherwise the system may crash on reboot/shutdown/poweroff on PC-98.

I haven't supported reboot command for PC-98 yet so it failed previously but it didn't stuck.

I plan on a simplification/rewrite of the reboot() system call code, and will take this into account, allowing for you to submit a PC98-compatble way of handling reboot.

Thank you!

@tyama501
Copy link
Contributor

Hello @ghaerr

Does the shutdown for PC-98 still call the bios_disk_park_all?
The shutdown is doing sys call,
and sys_reboot in the kernel/sys.c is called right?
(It seems that the PC-98 emulator does not crash when shutdown but
if it is still calling int13, I would like to add ifdef somewhere.)

Thank you.

@tyama501
Copy link
Contributor

tyama501 commented Sep 16, 2024

(Or it may be calling int1B with wrong paramaters by call_bios.)

@ghaerr
Copy link
Owner

ghaerr commented Sep 16, 2024

Does the shutdown for PC-98 still call the bios_disk_park_all?

Yes, the disk park function is called during the sys_reboot syscall for all platforms that have the BIOS fd/hd driver configured. This performs an INT 13h function 0Ch disk seek function using call_bios to the last disk cylinder for IBM PC, and an INT 1Bh for PC-98, which is likely incorrect. I notice that that was coded using a literal 0x0C rather than a define in biosparm.h so it went unnoticed for PC-98 incompatibility.

Is there a BIOS track seek function for PC-98, or should the entire function be commented out on PC-98?

@ghaerr
Copy link
Owner

ghaerr commented Sep 16, 2024

(Or it may be calling int1B with wrong paramaters by call_bios.)

Yes - it is calling INT 1Bh function 0Ch for PC-98, since it is using call_bios.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants