z/VM 7.2 IPL'ing as guest of itself CCW Command Rejects Aaron says "quick fix" #572
Comments
I doubt it. I suspect he just worded things sloppily, but what he actually meant to say was: "the X'02' bit needs to be turned on in byte 6." Byte 6 of the z/VM RDCBK is defined as:

*** Bytes defined for CKD/ECKD DASD RDC Features field
0006 6 Bitstring 1 RDCVFAC Program-visible facilities
.... 1... RDCSSCSP X'08' RDCSSCSP Set System Characteristics is supported
.... .1.. RDCSSCRC X'04' RDCSSCRC Set System Characteristics has been received for this Path Group
.... ..1. RDCPRFX X'02' RDCPRFX Prefix CCW supported & enabled

If you just blindly change byte 6 to the hard coded value X'D2', you might end up turning on some bits that were previously off, and vice versa. For example: if the value in byte 6 happens to be X'DC', then you need to change it to X'DE', not X'D2'! You need to be very careful how you write your EXEC, Charles. |
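To make the point about not hard-coding X'D2' concrete, here is a minimal C sketch of "turn the X'02' bit on and leave everything else alone". The function name `rdc_enable_pfx` and the macro name are made up for illustration; only the X'02' mask and the example values come from the comment above.

```c
#include <stdint.h>

/* Mask for the RDCPRFX flag (Prefix CCW supported & enabled) in RDC byte 6.
   The macro name is hypothetical; the X'02' value is from the RDCBK layout. */
#define RDC_BYTE6_PFX 0x02u

/* Turn the Prefix flag on while leaving every other bit of byte 6 unchanged.
   (Hypothetical helper, not taken from Hercules or z/VM.) */
uint8_t rdc_enable_pfx(uint8_t byte6)
{
    return (uint8_t)(byte6 | RDC_BYTE6_PFX);
}
```

With this approach X'D0' becomes X'D2' and X'DC' becomes X'DE', whereas overwriting the byte with X'D2' would have cleared the X'08' and X'04' bits in the second case.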
|
I've always used the generic cu=3990. I'll add the -6 and see what happens.
Here's my 2nd Level UserID:
* * * Top of File * * *
USER SSIJEDI1 SHOWTIME 512M 2G BCEFGH
IPL ZCMS
MACHINE Z
OPTION DEVINFO DEVMAINT MAINTCCW LNKS LNKE LNKNOPAS
CONSOLE 0020 3215
SPOOL 000C 2540 READER *
SPOOL 000D 2540 PUNCH A
SPOOL 000E 1403 A
*
LINK MAINT 0190 0190 RR
LINK MAINT 019D 019D RR
LINK MAINT 019E 019E RR
LINK SSIJEDI2 2723 2723 MW
LINK SSIJEDI2 2724 2724 MW
LINK SSIJEDI2 2725 2725 MW
LINK SSIJEDI3 3723 3723 MW
LINK SSIJEDI3 3724 3724 MW
LINK SSIJEDI3 3725 3725 MW
LINK SSIJEDI4 4723 4723 MW
LINK SSIJEDI4 4724 4724 MW
LINK SSIJEDI4 4725 4725 MW
MDISK 0191 3390 6501 10 VMCOM1 MW READ WRITE MULTI
MDISK 0720 3390 DEVNO 0720 MW
MDISK 0721 3390 DEVNO 0721 MW
MDISK 1723 3390 DEVNO 1723 MW
MDISK 1724 3390 DEVNO 1724 MW
MDISK 1725 3390 DEVNO 1725 MW
MDISK 1726 3390 DEVNO 1726 MW
Kind regards,
Charles
|
General FYI regarding email replies: I would very much appreciate it if you would not respond/reply to GitHub Issues via email. I would much prefer that you instead respond/reply directly via the GitHub Issues web page itself. When you reply directly via their web page, I can make minor edits to your reply so it is more readable (prettier) by editing the fonts being used, formatting of log messages, etc. When you reply via email, however, I cannot edit your reply (GitHub does not allow it), so oftentimes it is much harder to read. GitHub also does not allow attachments in email replies, making it impossible to receive a file that may have been requested from you. It is up to you whether you take the time to reply via their web page or continue to reply via email, but it is generally preferable that you reply directly via their web page, especially if you need to attach a file that was requested from you. Thanks for understanding. |
I'll gladly comply by responding only here. I'd never used GitHub except as an observer and couldn't figure out how to specify a fixed font; I think I've found that now. I used a fixed font in my e-mail reply and it appears Git lost it somehow. |
Charles, Fish is correct. D2 was based on your specific display results. As I indicated in my final comment, when the PFX command was added, byte 6, bit 6 should have been turned on, just as indicated in the RDCBK that Fish posted. If you write a Rexx exec to turn the bit on, use the BITOR function in Rexx. There are other ways around it, but it's not worth pursuing. Turning this bit on will add support for the PFX command for guests running under zVM, both for minidisks and dedicated devices. Best regards, |
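For readers wondering why "bit 6" and "X'02'" are the same thing: IBM documentation numbers bits from the left, so bit 0 is the most significant bit of a byte. A small C sketch of that mapping follows. The function names are hypothetical, and `set_ibm_bit` is only a single-byte analogue of what a Rexx BITOR would do here, not a translation of any actual EXEC.

```c
#include <stdint.h>

/* IBM numbers bits from the left: bit 0 is the most significant bit.
   Convert such a bit number (0..7) to a mask; out-of-range input yields 0. */
uint8_t ibm_bit_mask(unsigned bit)
{
    return (bit < 8) ? (uint8_t)(0x80u >> bit) : 0;
}

/* OR the mask for IBM-numbered `bit` into `value`, leaving other bits alone. */
uint8_t set_ibm_bit(uint8_t value, unsigned bit)
{
    return (uint8_t)(value | ibm_bit_mask(bit));
}
```

So `ibm_bit_mask(6)` yields the X'02' mask, and `set_ibm_bit(0xD0, 6)` yields X'D2', matching the display results discussed above.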
(I personally prefer the PDF)
As I explained, GitHub does not allow formatting of emails. That's why I prefer that people reply to GitHub Issues directly via their web interface instead. When you post your comment/reply via their web page, you can use markdown to format your comment/reply however you like. Depending on what you are posting, this can make a HUGE difference in readability. |
Thanks, Aaron. |
Charles, Since you're the zVM Jedi (cool nickname btw!), maybe you can help me. I'm trying to recreate your problem by IPLing a second level z/VM 7.2 under my existing z/VM 7.2, but things aren't working out too well. (It's been YEARS since I've messed with such things!) Each time I do my IPL, it displays the ipl/startup messages on my CMS userid's console, and invariably gets to a point where it says I need to reply to something. Only problem is, I don't know how to reply! PLEASE NOTE: I know what I want to reply, but I can't seem to figure out how to reply! Just entering the reply and pressing enter accomplishes nothing. I don't know if my terminal conmode is wrong, or if I should dial into my userid before IPLing, or something else. I seem to recall I might need a different LINEND setting or something? (I forget! It's been too long since I've done this shit!) HELP! p.s. I'm not seeing any type of I/O errors so far, but I'm not convinced I've gotten far enough into the second level IPL to reach that point yet. |
p.p.s. Here's the directory entry I've defined, and the exec I'm using to IPL my second level system with:
Here's the exec I use to IPL with:
|
Hi Fish, |
You seem to have it all correct. Once logged on to your 2nd level ID, and you invoke that startup EXEC, the next thing you should see is the z/VM "startup" messages, then the prompt for what sort of start (WARM, FORCE, COLD, plus the other PARMS). You'd probably want to say FORCE since I don't know what shape your 2nd level would be in. You should be able to enter your reply on the command line; then it will ask for TOD, just hit "Enter" to that, and then it will proceed from there. If it isn't responding to what you reply followed by "Enter", I'm not sure what your problem might be. At that point your virtual console is "talking" to your 2nd-level system; the different LINEND is for when you need to pass something to 1st-Level CP. Since you're DEDICATE'ing the DASD in the CP Dir you don't need the ATTACH'es. 3270 is the correct CONMODE setting. I use Tom Brennan's Vista3270 emulator product in Windows. If you're using some X-Terminal 3270 emulator, I've never stepped into that world. I can't think of any other show-stopper but I've just awakened. You are getting to here, right?

07:14:11 z/VM V7 R2.0 SERVICE LEVEL 2001 (64-BIT)
07:14:11 SYSTEM NUCLEUS CREATED ON 2020-07-29 AT 16:50:40, LOADED FROM M01RES
07:14:11
07:14:11 ****************************************************************
07:14:11 * LICENSED MATERIALS - PROPERTY OF IBM* *
07:14:11 * *
07:14:11 * 5741-A09 (C) COPYRIGHT IBM CORP. 1983, 2020. ALL RIGHTS *
07:14:11 * RESERVED. US GOVERNMENT USERS RESTRICTED RIGHTS - USE, *
07:14:11 * DUPLICATION OR DISCLOSURE RESTRICTED BY GSA ADP SCHEDULE *
07:14:11 * CONTRACT WITH IBM CORP. *
07:14:11 * *
07:14:11 * * TRADEMARK OF INTERNATIONAL BUSINESS MACHINES. *
07:14:11 ****************************************************************
07:14:11
07:14:11 HCPZCO6718I Using parm disk 1 on volume VMCOM1 (device 0700).
07:14:11 HCPZCO6718I Parm disk resides on cylinders 1 through 120.
07:14:12
07:14:12 HCPIIS954I DASD 1725 VOLID M01P01 IS A DUPLICATE OF DASD 0705
07:14:12 HCPIIS954I DASD 1723 VOLID M01RES IS A DUPLICATE OF DASD 0703
07:14:12 HCPIIS954I DASD 1724 VOLID M01S01 IS A DUPLICATE OF DASD 0704
07:14:12 HCPIIS954I DASD 0721 VOLID 720RL1 IS A DUPLICATE OF DASD 0701
07:14:12 HCPIIS954I DASD 0722 VOLID 720RL2 IS A DUPLICATE OF DASD 0702
07:14:12 HCPIIS954I DASD 0720 VOLID VMCOM1 IS A DUPLICATE OF DASD 0700
07:14:12 Start ((Warm|Force|COLD|CLEAN) (DRain) (DIsable) (NODIRect)
07:14:12 (NOAUTOlog)) or (SHUTDOWN) <---------------------- what 'cha want ?
07:14:19 WARM DRAIN NOAUTOLOG <---------------------- plus hit 'Enter'
07:14:19 NOW 07:14:19 CDT WEDNESDAY 2023-06-14
07:14:19 Change TOD clock (Yes|No) <---------------------- just 'Enter' to this
07:14:21
07:14:21 The directory on volume M01RES at address 0703 has been brought online.
07:14:28 HCPWRS2513I
07:14:28 HCPWRS2513I Spool files available 1614
07:14:30 HCPWRS2512I Spooling initialization is complete.
07:14:30 DASD 0704 dump unit CP IPL pages 25157 PGMBKs DEFAULT FRMTBL DEFAULT
07:14:30 HCPAAU2700I System gateway ZVMJEDI identified.
07:14:30 z/VM Version 7 Release 2.0, Service Level 2001 (64-bit),
07:14:30 built on IBM Virtualization Technology
07:14:30 There is no logmsg data
07:14:30 FILES: 0056 RDR, 0014 PRT, NO PUN
07:14:30 LOGON AT 07:14:30 CDT WEDNESDAY 06/14/23
07:14:30 GRAF 0020 LOGON AS OPERATOR USERS = 1 HOLDING ZVMJEDI |
Actually, you should be getting the first Command Reject even before the "What kind of start do you want??" message appears; it will occur the moment it tries to bring the Object Directory online.
|
Eureka! But don't start cheering yet. Well, ok... cheer a little, but not too loudly.

I noticed you DEDICATE'd your CP VOLS, instead of using DEVNO statements as I did. I changed my 2nd-level ID to use DEDICATEs and now mine is coming up cleanly, NO Command Rejects anywhere and AUTOLOG1 brought up all The Usual Suspects. So far, so good.

This is a great clue, though, because what it tells us is: when a volume is DEDICATE'd, that changes completely how the I/O is handled between Host and Guest. So "something" is going wrong between Host 7.2 and Guest 7.2 as they pass the I/O back and forth when DEVNO is used. If I remember right, and I'm not sure I do, when a volume is DEDICATED, the Guest gets to do its own I/O and Host CP just watches. Or maybe it's the other way around. Aaron???? Geez, I hate getting old!

Unfortunately, this will preclude this 2nd level z/VM from ever being part of a true 2nd-level SSI cluster, because each Member has to see everyone else's CP OWNED volumes; you'll notice I had LINK statements to 3 other eventual Members. Thus, why full-pack DEVNO was necessary. See Page 79 of the z/VM 7.2 Installation Guide:

Anyway, that's what I've discovered from here. I hope it will be helpful. This much success WILL allow the creation of what back in the day was called CSE (Cross-System Extension, the predecessor of SSI) but without being able to share SPOOL. Thanks. Charles |
The logic in zVM for CCW translation is different between Dedicated/Attached and Devno/Minidisk. The bottom line is it should work either way. But just an FYI: the first thing it checks is Byte 6, Bit 6 of RDC. If it is not on, then it will go through a series of checks with control unit models, starting with 2107, then 2105, then 1750, 3390-6, etc. |
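The check order described above can be pictured with a rough C model. This is purely an illustration of the description in this comment, not z/VM's actual code: the function name, the parameters, the plain-integer control unit model, and the reading of "3390-6" as a 3990 model 6 control unit are all assumptions, and the "etc." in the original list means the real chain has more entries than shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the decision described above (NOT z/VM source).
   rdc_byte6: byte 6 of the Read Device Characteristics data
   cu_type:   control unit type, e.g. 0x2107, 0x3990
   cu_model:  control unit model as a plain number (assumed encoding) */
bool pfx_accepted(uint8_t rdc_byte6, uint16_t cu_type, unsigned cu_model)
{
    /* First check: Byte 6, Bit 6 (mask X'02') says Prefix is supported */
    if (rdc_byte6 & 0x02u)
        return true;

    /* Otherwise fall back to control unit checks, in the stated order */
    if (cu_type == 0x2107) return true;
    if (cu_type == 0x2105) return true;
    if (cu_type == 0x1750) return true;
    if (cu_type == 0x3990 && cu_model == 6) return true;

    /* The real list continues ("etc."); this sketch stops here */
    return false;
}
```

Under this model a generic 3990 with byte 6 equal to X'D0' is rejected, while the same device with X'D2' is accepted regardless of control unit model, which is consistent with the behaviour reported in this thread.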
Charles / zVMJedi, Have you tried with MDISK statements including the MWV option? You could place all of these as needed for z/VM SSI members in a VM, e.g. called VMDUMMY, that never gets IPL'd. It's the technique used to run z/OS Parallel Sysplex ("PS") members under z/VM, and they too all need access to all 2nd level z/OS PS DASD's. (This uses a non-IPL'd VM named MVSDUMMY; each actual z/OS PS member then LINKs MW to MVSDUMMY's full pack MDISKS.) Cheers, Peter |
Hi Peter, I believe I did try MDISK statements when I first started this endeavour and they didn't work. They gave the same Command Rejects as DEVNO. I'll try again so MDISK can be included in The Usual Suspects round-up. I used that same trick back in the day when my Guests were DOS/VSE and VSE/ESA. |
THANK YOU AARON!! It worked! My second level system is up and running! Unfortunately(?) however, I'm not seeing any of the I/O errors that Charles is seeing. |
You should not see the errors for dedicated/attached devices on the control unit types I previously posted. But it will still fail on minidisk/devno devices and that should not be. |
Fish! See my Comment from 3 hours ago. |
Actually I do. When the first level system is IPL'ed, it complains(?) about duplicate volume ids (volsers) or something (I forget what the exact message was; I didn't bother to write it down), more than likely because my second level system's dasds are exact copies of my first level system's dasds:
As you can see, I've simply defined another set of dasd (with different device addresses/CUUs) using the exact same set of base images, but specifying a different set of shadow files for each. (All of my base dasd images are marked read-only; I'm running on Windows.) I felt that was the fastest/easiest way to get a second level system created. (I didn't want to have to go through a full formal install.) In any case, it certainly doesn't hurt anything to detach and re-attach my dasds, right? |
Well, it's not happening to me. |
Dedicating a device and attaching it is the same thing. In either case, the device can not be shared and it follows the logic path in CCW translation. DEVNO/MINIDISK allows for device sharing and has a more strict CCW translation. |
It's not happening because you're using ATTACH'd volumes, which take a different I/O path. As soon as you try to use MDISK or DEVNO statements in your 2nd-level UserID definition, you'll get the pre-Fourth-of-July fireworks show. |
Interesting!
Yes, I noticed that in your sample directory statements that you were using, but didn't think it was anything important. (I don't "do" SSI clusters. I like to keep things simple. SSI clusters are for z/VM Jedis like you, not for mere mortals like me.<G>)
That's good to know. Unfortunately, that's kind of bad news for me, because it means I am unable to recreate your problem in order to verify that my quick fix is good or not. (I am NOT interested in trying to set up an SSI cluster! I'm a Hercules developer. I don't have the time to learn how to operate all the many different operating systems that Hercules is able to run. I have my hands full with Hercules. I let you guys -- the Hercules users -- have all the fun with that!) Give me a few minutes(?) and I'll commit what I believe is the fix (according to Aaron anyway): initializing the Control Unit features string with the X'02' bit turned on in the 6th byte of the RDC. I'll let you know once it's been committed. Once it is, you should then simply do a pull (from the 'develop' branch of course) and rebuild your Hercules. Thank you to both you and Aaron with all the help you two have provided me. I really appreciate it you guys! |
Because Hercules is not (YET!) turning on the X'02' bit in the RDC byte you mentioned. That's why. (Hopefully!) But I'm about to fix that, so everybody just hang loose for a little bit... |
Saw it! And responded to it. |
I know that, but how do I turn them into MDISK or DEVNO statements? I don't really need to be sharing these dasds with any of the other VM users, so why use MDISK or DEVNO? Other than the fact that doing so will trigger the problem of course! I mean, I'm willing to try (if you can explain to me how to do it), but generally speaking, in normal situations, DEDICATE is the proper way to go about it, yes? |
Well, it's not about setting up an SSI cluster, it's just getting z/VM to IPL 2nd-level. Replace your DEDICATE statements with these:
adjusting the RDEV's and DEVNO's to the addresses of your second-level DASD, then in your 2nd-level ID, IPL your M01RES address and it should blow up beautifully. As for DEDICATE, "best practices" with DASD is "don't do it unless you have a really good reason". As Peter tossed in a few Comments ago, you want to define your "everybody uses these" DASD to a dummy NOLOG UserID, then LINK to that from whomever needs to see whatever. |
@arfineman, @zVMJedi, @Peter-J-Jansen, @wrljet: FYI: I have just committed some fixes to Hercules's E7 Prefix CCW handling that should resolve all known/reported issues: Charles? Please refresh your repository, rebuild and retest. Aaron? Please pull, rebuild and re-run all of your tests again too. I wrote a new Hercules runtest QA test for E7 Prefix based on the information you provided, and all of them now pass. I have not committed this test program yet, but will be doing so shortly. I am once again going to close this issue as having been RESOLVED. Thank you to everyone for all of your help! I couldn't have done it without you guys!
|
No problem @fish, I'll do some more tests still, which I'll continue reporting under this Issue. What I have found so far: A test with the current commit
The CCW trace:
As much as I've followed the interesting conversation with @arfineman, the CCW trace is far beyond my level of knowledge; perhaps it is meaningful to you. But the success with the non-SSI z/VM, and the Wait 902 failure only with the SSI z/VM, kinda supports my earlier suspicion. It would mean that a 1st level SSI build, prior to the x'E7' fix, has created what is for the x'E7' fixed build a corrupted DASD. I think. As stated, I'll do some more tests, including doing a 1st level SSI build with the x'E7' fixed SDL-hyperion. OK, it'll take some time, so please bear with me. Cheers, Peter |
Hi Fish, Why do I get this error when I try to trace?
No matter what the cause in the previous traces, byte 7 of the sense should not have been '01'. That is format 0 message 1, "Invalid Command". E7 is not an invalid command. In regard to testing, I have been enjoying the generosity of @wrljet in providing a link to a pre-built version of Hercules with the latest fixes. Otherwise I would have to get back to restarting my old build machine. Best regards, |
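As a reading aid for the sense data being discussed, here is a small C sketch that pulls the Command Reject flag out of sense byte 0 and splits sense byte 7 into its format (high nibble) and message (low nibble) parts. The struct and function names are hypothetical and this is not how Hercules itself represents sense data; the sample bytes in the note below are the `80000000 00FFFF01` value quoted elsewhere in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the fields discussed in this thread */
typedef struct {
    bool     command_reject;  /* sense byte 0, bit 0 (mask X'80') */
    unsigned format;          /* sense byte 7, high nibble        */
    unsigned message;         /* sense byte 7, low nibble         */
} SenseSummary;

/* Decode the first 8 sense bytes into the summary above */
SenseSummary summarize_sense(const uint8_t sense[8])
{
    SenseSummary s;
    s.command_reject = (sense[0] & 0x80u) != 0;
    s.format         = (unsigned)(sense[7] >> 4);
    s.message        = (unsigned)(sense[7] & 0x0Fu);
    return s;
}
```

Feeding it the bytes 80 00 00 00 00 FF FF 01 gives Command Reject with format 0, message 1, which is the "Invalid Command" combination described above.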
I'm in the same boat, alas. I had to pull my VS2017 environment because it was causing Excel to crash.
|
Really? That's crazy. Any ideas how/why? Bill |
I'll rebuild now with the latest. Stand by... (I really need to get that automated!) |
Some Merry Prankster called "TLSOfficeAdd-in.dll" installed by VS. I found a purported solution posted by another victim experiencing the same Excel crash caused by VS. They solved theirs by un-Registering the .dll but that didn't work for me. So I sadly pulled VS2017, even though it had worked beautifully with all your kind help. |
Here you go: |
Thank you. You are a life saver. |
I wonder what's up with that?! I just checked my main system (which has VS2010, 2017, and 2022 installed), and I don't have that DLL. But, if you had installed VS "by hand" at some point (not with Hercules Helper) it might have installed some Office development "workload" that includes that DLL. This last build took me a little longer because I ran into a bug that sneaked in when I made things Hercules-Aethra aware a while back. So if you do use Hercules-Helper for Windows, you'll want to refresh your copy. Bill |
Hi Fish, This is excellent work. With the latest version that the Great @wrljet built, good tests succeed and bad ones fail. However, I did notice that when the device trace is on, in most cases the first CCW is displayed twice, especially in cases of a failure. I don't know if it relates to a secret feature I knew about in Hercules that I was not going to disclose, out of concern that it might be fixed. I had noticed that when PGMTRACE is on, the failing CCWs are traced. I liked this feature so much that I always started a PGMTRACE with a nonexistent program interrupt code. But the channel program that was rejected in the trace provided by @Peter-J-Jansen is valid and its failure should be investigated. Best regards,
|
I'm not so sure! I personally feel a compiler bug is more likely. Why? Because according to your provided CCW trace, the X'E7' Prefix CCW that z/VM is issuing is completely valid! When I first saw your message, I copied and added the above E7 CCW into my test program (which I'm still trying to get ready for committing), and guess what? It ran fine! No error whatsoever! I suggest doing this: Since we can see from the sense information in your trace that the CMDREJ is Format 0, Message 1 (sense 80000000 00FFFF01), I suggest inserting a simple
Because the above message begins with an error message number, Herc's logmsg logic should then identify (via the HHC00007I message) the exact source file and line number where that message was issued (as long as your Then we can go from there, by inserting some additional LOGMSG statements to display the values of the variables that triggered that CMDREJ, etc. Something else you could try too, is to do a full (non-specific) CCW trace on that device (just There are only 6 places that I can see in Hercules where a Format 0 Message 1 Command Reject is being issued, and they're all in Do all of your dasds specify

    /* LRE only valid for 3990-3 or 3990-6 (or greater) */
    if (0
        || dev->ckdcu->devt != 0x3990
        || !(0
             || MODEL3( dev->ckdcu )
             || MODEL6( dev->ckdcu )
            )
    )
    {
        /* Set command reject sense byte, and unit check status */
        ckd_build_sense (dev, SENSE_CR, 0, 0, FORMAT_0, MESSAGE_1);
        *unitstat = CSW_CE | CSW_DE | CSW_UC;
        return;
    }
Nah! Don't bother doing that just yet! Do what I suggested above first. I believe the above will be much faster and easier and should absolutely without question identify precisely where the CMDREJ is being issued. Once we know that, completing the remainder of the puzzle should quickly follow. Good Luck! I hope that helps! |
Probably because you're not running the right version of Hercules! That new feature was added just a few days ago, on June 23. Have you done a |
With Peter probably asleep, I'll help out by saying he surely must be specifying cu=3990-6 because when this all started, one of the first things he suggested to me in an e-mail was to be sure I had cu=3990-6 in my DASD statements.
|
Anomaly. Herc was changed a short while back to try and defer tracing for read type CCWs until after the data has been read (instead of always tracing CCWs before they're processed like it was doing before). Because a control CCW is not strictly a "read" type (it can actually be a write AND a read type, depending on what it's designed to do), it is traced once before it is processed (just like write type CCWs are) and then again after it has been processed (just like read type CCWs are), resulting in it being traced twice. Otherwise we would need much more complicated logic to determine whether it should be traced before or after or both. I wanted to keep things simple. Note that you will sometimes still get double traces even for non-control type commands, such as when a write type CCW fails: because it is a write type command, it is traced before it is processed, and then again when it fails. Weird, yes, but there you have it. (My goal when I made this change was to ensure we could see the data that was actually read from the device. Before my change, the CCW trace displayed whatever garbage happened to be in the read command's I/O buffer (which was misleading) and we would never see what was actually read. With the change however, now we can see the data that was actually read.) |
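The tracing rules just described can be summarized in a tiny C model. This is only a sketch of the behaviour as explained in this comment, with made-up names (`CcwKind`, `trace_before`, `trace_after`, `trace_count`); it is not Hercules's actual tracing code, and how a failing read is traced is an assumption (once, after processing).

```c
#include <stdbool.h>

/* Hypothetical classification of a CCW for tracing purposes */
typedef enum { CCW_WRITE, CCW_READ, CCW_CONTROL } CcwKind;

/* Write and control CCWs are traced before they are processed */
bool trace_before(CcwKind kind)
{
    return kind != CCW_READ;
}

/* Read and control CCWs are traced after processing (so the data that was
   actually read is visible); any CCW that fails is also traced afterwards */
bool trace_after(CcwKind kind, bool failed)
{
    return kind == CCW_READ || kind == CCW_CONTROL || failed;
}

/* Number of trace entries a single CCW produces under this model */
unsigned trace_count(CcwKind kind, bool failed)
{
    return (trace_before(kind) ? 1u : 0u) + (trace_after(kind, failed) ? 1u : 0u);
}
```

In this model a successful read or write is traced once, a control CCW such as Prefix is traced twice, and a failing write is also traced twice, which matches the double trace entries Aaron noticed.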
Good! That's one suspect eliminated. Thanks, Bill. |
No shit, Aaron! What do you think we're doing?! |
Fish is correct and mystery solved. I tested standalone with DASD that was not a Mod-6 and received format 0, message 1, which is the correct behavior. |
Duh! But according to Bill, Peter is supposedly specifying I'm wondering if when running in SSI mode, z/VM is somehow presuming an older model CU (Control Unit) is being used, or something like that? Its channel programs are obviously not the same as they are when running in non-SSI mode. I don't know SSI so I'm just speculating (guessing) here. Do you know much about SSI, Aaron? Because I don't know shit about it! |
From what I know about Hercules, I don't think you can run SSI first level. SSI requires shared DASD and at least one CTC connecting every member to others. When you run SSI on different CPUs, all members will get downgraded to the lowest model CPU. This is to ensure if a virtual machine is relocated from one member to another, the virtual machine doesn't crash due to lack of CPU facilities. If I know the configuration details of the SSI setup, I may be able to assist. |
It can be done, it'll be a single-Member SSI, which is about as satisfying as kissing one's sister given the whole intent and purpose of SSI but it can be done. Hercules' DASD-sharing scheme and Peter's CTCE make it possible to add Members 2,3, and 4 if desired. If they're all running the same Version/Release of Hercules, in like Flynn. As for "Would it / could it / will it handle an LGR?" that's yet to be determined by some Brave Soul with a fetish for arrows in the back.
|
Apologies for wasting all of your precious time caused by this silly oversight of mine! Cheers, Peter |
Fantastic! Mystery solved.
Well don't feel too embarrassed, Peter. It's okay to feel a little embarrassed maybe, but you shouldn't feel overly embarrassed about it. You're human! Just like the rest of us. And humans sometimes make mistakes. Sometimes they're little ones, like this time. Other times they're big ones. But people make mistakes all the time. It's a part of life. I'm just glad you got things working! Take care my friend. |
In addition to improving functionality, the fix also closed a bug that allowed LRE to execute on a device that did not support it, because PFX was incorrectly bundled to LR. |
Everything working correctly here with this latest Build. Level1 z/VM 7.2 is running smoothly and all 4 Kluster Kids on the 2nd floor are happy campers as well. Now for RSCS and DirMaint and DB/2 for z/VM! Thanks again, Bill! And Fish! And everybody else who chipped in! |
Hello, Charles here.
Aaron said to open an Issue, so here it is.
He pretty well lays it out in the message thread about this issue, one last lousy little bit in byte 6 of the z/VM RDCBK Real Device Characteristics control block isn't being set correctly to handle PFX (CCW opcode X'E7'): byte 6 needs to be set with 'D2' instead of the 'D0' it contains otherwise.
And I'm running Hyperion 4.5 on a Windows Server 2008 R2 Host, so no tricky Linux builds.
Thank you, and there is no urgency about this; I can stuff those Byte 6's with 'D2' from an EXEC
for all my CP OWNED DASD. Maybe just slipstream it into 4.6 along with other fixes. I'm about to
pull 4.6 and get it going.
Regards,
Charles Perkins