CTCE links dying with VM/Passthrough (PVM) #670

Open
HackerSmacker opened this issue Jul 5, 2024 · 7 comments
Labels
QUESTION... A question was asked but has not been answered yet, -OR- additional feedback is requested. Researching... The issue is being looked into or additional information is being gathered/located. (Unknown) Unresolved. It might be a bug. It might not. We don't know. We couldn't reproduce it.

Comments

@HackerSmacker

HackerSmacker commented Jul 5, 2024

Hi folks,

I've run into some hot water with a recent breaking change in Hercules, around the 4.6 mark; this change seems to have broken PVM. I have tested the following configurations of PVM and found that only SOME work:

PVM 2.1 (1993):

  • VM/SP 5
  • VM/SP 6
  • VM/HPO 4.2
  • VM/HPO 5
  • VM/ESA 1.1.1 (370)
  • VM/ESA 1.2.2
  • VM/ESA 2.1.0

PVM 2.1 (1998):

  • VM/ESA 2.4.0
  • z/VM 4.2
  • z/VM 4.4
  • z/VM 5.3
  • z/VM 6.2
  • z/VM 6.3
  • z/VM 6.4
  • z/VM 7.1

Specifically, VM/ESA 2.4 (my hub node) can talk to any non-XA VM (so, any 370-type VM). This pattern seems to be consistent -- I can have any non-XA version of VM talking to any other non-XA version of VM or to an XA version of VM, but two XA versions of VM cannot talk to each other. There are no protocol differences between the different versions of PVM I used -- I only used different versions to try to gain more period accuracy, since I am somewhat lacking in different versions of PVM (I only have 5 versions, 2 of which cannot talk to the other 3).

I think this may be related to issue #640. I recall being able to revive the links in the past by recreating the devices, but it was not a permanent fix.

Version info:

HHC01413I Hercules version 4.7.0.11119-SDL-gf7d2360a
HHC01414I (C) Copyright 1999-2024 by Roger Bowler, Jan Jaeger, and others
HHC01417I ** The SDL 4.x Hyperion version of Hercules **
HHC01415I Build date: Jul  2 2024 at 15:01:43
HHC01417I Built with: GCC 13.2.1 20230801
HHC01417I Build type: GNU/Linux x86_64 host architecture build
HHC01417I Running on: server1 (Linux-6.6.8 x86_64) MP=32
HHC01417I Built with crypto external package version 1.0.0.52-ga5096e5
HHC01417I Built with decNumber external package version 3.68.0.102-g3aa2f45
HHC01417I Built with SoftFloat external package version 3.5.0.105-g4b0c326
HHC01417I Built with telnet external package version 1.0.0.63-g729f0b6

The link devices are defined as such, for example:

# VM/ESA 2.4
0441    CTCE    3501 127.0.0.1 3502

# z/VM 6.2
0441    CTCE    3502 127.0.0.1 3501

The device was initialized with CP SET RDEVICE 441 TYPE CTCA beforehand, though the autosense detects the correct device type.
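
For anyone comparing their own config against the two statements above: the ports on each side need to be mirror images of the other side's. Below is a minimal Python sketch of that check. It assumes the three operands after `CTCE` are local port, remote address, and remote port, in that order, which is how the statements above read; the names `CtceStatement`, `parse_ctce`, and `ports_mirror` are made up for illustration and are not part of Hercules.

```python
from typing import NamedTuple


class CtceStatement(NamedTuple):
    """One CTCE device statement (hypothetical helper type, not a Hercules API)."""
    devnum: str
    local_port: int
    remote_addr: str
    remote_port: int


def parse_ctce(line: str) -> CtceStatement:
    # Assumed operand order: devnum CTCE <local port> <remote address> <remote port>
    fields = line.split()
    if len(fields) < 5 or fields[1].upper() != "CTCE":
        raise ValueError(f"not a CTCE statement: {line!r}")
    return CtceStatement(fields[0], int(fields[2]), fields[3], int(fields[4]))


def ports_mirror(a: CtceStatement, b: CtceStatement) -> bool:
    # Each side's local port must be the other side's remote port.
    return a.local_port == b.remote_port and a.remote_port == b.local_port


vmesa = parse_ctce("0441    CTCE    3501 127.0.0.1 3502")  # VM/ESA 2.4 side
zvm = parse_ctce("0441    CTCE    3502 127.0.0.1 3501")    # z/VM 6.2 side
print(ports_mirror(vmesa, zvm))  # True
```

This only checks the port pairing; it says nothing about whether the link stays up once both systems are running.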

@Fish-Git
Member

Fish-Git commented Aug 27, 2024

@HackerSmacker: Have you tried using the 4.8 'develop' branch of Hercules yet? Does the problem exist there too? Or does it only fail with version 4.7? Some minor(?) changes were made to CTCE logic since 4.7 was released that only exist in version 4.8-DEV, so you might want to give 4.8 a try.

If 4.8 still fails the same way, then we'll obviously have to dig into your issue a little deeper.

Thanks.

@Fish-Git Fish-Git added QUESTION... A question was asked but has not been answered yet, -OR- additional feedback is requested. Researching... The issue is being looked into or additional information is being gathered/located. (Unknown) Unresolved. It might be a bug. It might not. We don't know. We couldn't reproduce it. labels Aug 27, 2024
@Fish-Git
Member

Also, a SIE fix was recently made to 4.8-DEV too (which fixed a problem with VM/ESA 2.4), which might also impact what you're doing, so again, please give our 4.8 'develop' branch a try and let us know whether it works any better or not. Thanks.

@Peter-J-Jansen
Collaborator

@HackerSmacker: Issue #640 is indeed the latest CTCE fix that may be helpful to you. I suggest you build the 4.8 development branch at commit a291e7e (or later) and try that. If it still does not work, both Hercules logs would be needed to research the problem. Thanks.

@HackerSmacker
Author

HackerSmacker commented Aug 28, 2024 via email

@HackerSmacker
Author

I've compiled it, and it's running a few different versions of VM. I'll chime back in tomorrow with test results for PVM, VTAM, RSCS, and TSAF (whether or not the links die). I'm testing VM/ESA 1.1, 1.2, 2.1, 2.4, z/VM 4.4, 5.3, and 6.4.

@HackerSmacker
Author

Okay, I've let it run for about a day, and RSCS/PVM/VTAM are rock-solid (so far; this might change later), but TSAF (at least on z/VM 4.4 and VM/ESA 2.4) still shows no hope. I've read through #640, but I'm still getting that dreaded SET_370_MODE error:

02:26:39 ATSL1Y795I Retry limit exceeded on unit 0E50 SET_370_MODE
02:26:39 ATSL1Y708E An attempt to reset link 0E50 has failed
02:26:39 ATSMRX520I Synchronization is now NORMAL

The Herc console (with ctc debug on e50) reports the following:

HHC05079I 0:0E50 CTCE: -> 0:0E51 #0011 cmd=RST=00 xy=aa->Aa l=0000 k=0F500510              w=0,r=0 SENSE=4100 CLEAR
HHC05079I 0:0E50 CTCE: -> 0:0E51 #0012 cmd=RST=00 xy=aa->aa l=0000 k=0F500513              w=0,r=0 SENSE=4100 HALT
HHC05079I 0:0E50 CTCE: -> 0:0E51 #0013 cmd=NOP=03 xy=aa->Aa l=0001 k=0F510411 Stat=0C CC=0 w=0,r=0 SENSE=4100
HHC05079I 0:0E50 CTCE: -> 0:0E51 #0014 cmd=NOP=03 xy=aa->aa l=0001 k=0F510416 Stat=0C CC=0 w=0,r=0 SENSE=4100
HHC05079I 0:0E50 CTCE: -> 0:0E51 #0015 cmd=SEM=C3 xy=an->an l=0001 k=0F5104D7 Stat=0C CC=0 w=0,r=0 SENSE=4100
HHC05079I 0:0E50 CTCE: -> 0:0E51 #0016 cmd=WRT=01 xy=an->an l=03FC k=A8A1E217 Stat=02 CC=1 w=0,r=0 SENSE=4100
HHC05079I 0:0E50 CTCE: -> 0:0E51 #0017 cmd=WRT=01 xy=an->an l=0034 k=26376CD9 Stat=02 CC=1 w=0,r=0 SENSE=4100

The other side (VM/ESA 2.4) has the same behavior... I'll continue to look into it; I got interrupted by a 4-hour gap as I was configuring it, and as such, I do not recall if I ever saw the link go up.
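
When both Hercules logs have to be compared side by side, it may help to split the HHC05079I lines into fields first. The following is a rough Python sketch that does only that. The regex is based purely on the sample lines above; the field names (`seq`, `xy`, `k`, `note`, and so on) are labels chosen here for illustration, and what each field means is not explained in this thread.

```python
import re

# Pattern derived only from the HHC05079I lines quoted in this thread.
# Group names are illustrative labels, not official Hercules field names.
LINE_RE = re.compile(
    r"HHC05079I\s+(?P<dev>\S+)\s+CTCE:\s+(?P<dir>\S+)\s+(?P<peer>\S+)\s+"
    r"#(?P<seq>[0-9A-Fa-f]+)\s+"
    r"cmd=(?P<cmd>\w+)=(?P<code>[0-9A-Fa-f]{2})\s+xy=(?P<xy>\S+)\s+"
    r"l=(?P<len>[0-9A-Fa-f]+)\s+k=(?P<k>[0-9A-Fa-f]+)"
    r"(?:\s+Stat=(?P<stat>[0-9A-Fa-f]{2}))?(?:\s+CC=(?P<cc>\d))?"
    r"\s+w=(?P<w>\d+),r=(?P<r>\d+)\s+SENSE=(?P<sense>[0-9A-Fa-f]+)"
    r"(?:\s+(?P<note>\S+))?"
)


def parse(line):
    """Return a dict of fields for one debug line, or None if it does not match."""
    m = LINE_RE.search(line)
    return m.groupdict() if m else None


SAMPLE = """
HHC05079I 0:0E50 CTCE: -> 0:0E51 #0011 cmd=RST=00 xy=aa->Aa l=0000 k=0F500510              w=0,r=0 SENSE=4100 CLEAR
HHC05079I 0:0E50 CTCE: -> 0:0E51 #0015 cmd=SEM=C3 xy=an->an l=0001 k=0F5104D7 Stat=0C CC=0 w=0,r=0 SENSE=4100
HHC05079I 0:0E50 CTCE: -> 0:0E51 #0016 cmd=WRT=01 xy=an->an l=03FC k=A8A1E217 Stat=02 CC=1 w=0,r=0 SENSE=4100
HHC05079I 0:0E50 CTCE: -> 0:0E51 #0017 cmd=WRT=01 xy=an->an l=0034 k=26376CD9 Stat=02 CC=1 w=0,r=0 SENSE=4100
"""

records = [rec for rec in map(parse, SAMPLE.splitlines()) if rec]

# List the commands whose CC field is present and not 0.
nonzero_cc = [(rec["seq"], rec["cmd"]) for rec in records if rec["cc"] not in (None, "0")]
print(nonzero_cc)  # [('0016', 'WRT'), ('0017', 'WRT')]
```

Running the same filter over the log from the other side would show whether both ends report the same commands with a non-zero CC.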

@HackerSmacker
Author

Alrighty... I'm a few days in and there haven't been any issues at all. That fix definitely did something, but I'm still at a loss for TSAF; I suspect it's user error on my end, though.

3 participants