Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multiple trains in/waiting at spmutex enter region after server restart #492

Open
amalon opened this issue Jan 5, 2024 · 38 comments
Open

Comments

@amalon
Copy link

amalon commented Jan 5, 2024

Info

TCCoasters: v1.20.4-v2-SNAPSHOT (build: 334)
TrainCarts: v1.20.4-v2-SNAPSHOT (build: 1524)
BKCommonLib: v1.20.4-v1 (build: 1663)
Server: git-Paper-65 (MC: 1.20.1)

Bug

Description

When the server is restarted, trains waiting on an spmutex are permitted to enter the mutex region even if there is already a train in the region.

Expected behaviour

Waiting trains should continue to wait as if the server hadn't been restarted.

Actual behaviour

Trains can enter an spmutex region already occupied by other trains.

Steps to reproduce

Queue up trains at an spmutex and /stop the server, on start up watch the trains enter the spmutex region.

Here's an example that reproduces it:
2024-01-05_16 38 24

Click button twice, one cart stops at station, one waits. then /stop, restart server, and the 2nd cart has entered region and connected to the first.

@bergerkiller
Copy link
Member

Yeah I expect this, as the mutex locking state isnt saved to disk on shutdown. Maybe I should just do that and restore after reload

@amalon
Copy link
Author

amalon commented Jan 5, 2024

Yeh, it's probably only really an issue because my server is set up to turn off overnight and when no players are connected for however many minutes, so it tends to turn on and off several times a day.

@bergerkiller
Copy link
Member

As this is a much broader issue, Ill make fixing this a part of a partial 'trains.groupdata' rewrite fix. Trains need to remember what signs they had activated when they unloaded/server restarted. They should also remember any actions they had running at that time (launch actions). It's kind of an age-old bug that none of that works.

I'll make saving the mutex zones the train has occupied/is waiting for a part of that metadata saving.

@bergerkiller
Copy link
Member

traincarts-persistent-mutex-test-build.zip
This test build persistently saves and restores all of the following data:

  • Action (launch, waiting at stations, etc.) that trains are doing at time of server shutdown
  • Signs trains are on at shutdown, that they wont activate again after restart
  • All temporary pathing mutex zones that are active
  • The mutex locking state of all mutexes, for all trains that entered them

I did some testing of my own and it seems to work fine. However, its possible something isn't working right when tested on a larger scale.

If this build for some reason does not work, don't revert back to a spigot build as that will probably cause all trains to break. Downgrade instead to this build, which can handle the new group data format:
https://ci.mg-dev.eu/job/TrainCarts/1534/

@amalon
Copy link
Author

amalon commented Jan 17, 2024

It mostly appears to work for me with that test build, however I did have one train near an spmutex split apart on a point, leaving one cart on the wrong branch of the point:

[09:22:48 WARN]: [Train_Carts] Train 'train_circle3' split apart because: Maximum distance reached searching next cart
[09:22:48 WARN]: [Train_Carts] Search start: {rail={com.bergerkiller.bukkit.coasters.CoasterRailType@3acceb03 241/62/-729}, pos={241.5/62.937/-730.4695}, mot={0.0/-2.0E-4/1.0}, f=SOUTH}
[09:22:48 WARN]: [Train_Carts] Search end: {rail={com.bergerkiller.bukkit.coasters.CoasterRailType@3acceb03 239/62/-325}, pos={240.2581/62.9375/-327.711}, mot={-0.2418/0.0/0.9703}, f=SOUTH}
[09:22:48 WARN]: [Train_Carts] Cart that could not be reached: cart #1 of 6 [9696a910-f530-4d3e-8d7b-7e4d8f525924] at x=240.22541765443765 y=62.93729376206962 z=-724.2629314349915

I'll have a further play later to see if i can reliably reproduce it.

@bergerkiller
Copy link
Member

You mean it split at a junction? So the junction toggled when it spawned in?

@amalon
Copy link
Author

amalon commented Jan 18, 2024

it did, though i haven't seen that again. The issue I'm seeing regularly now is that a train that is already well into an spmutex zone, waiting at a station, thinks its approaching the zone rather than in it, with coordinates where it would have entered the zone. As a result another train enters the zone in front of it.

@amalon
Copy link
Author

amalon commented Jan 18, 2024

to clarify, the split train was at a junction, but its unclear how the junction toggled.

@bergerkiller
Copy link
Member

Are all these trains on the main "world" world, or is this a separate world that is loaded in by a plugin like multiverse? Want to make sure as later loading in of worlds could have an effect

@amalon
Copy link
Author

amalon commented Jan 18, 2024

this is a multiverse world, that isn't called "world".
The message in the log at boot does say:

[14:05:18 INFO]: [Train_Carts] [STDOUT] Load path mutex: MutexZonePath{sign={world=OfflineWorld{uuid=73044f1f-4597-46e1-9834-8dcb91e91607, name=club}, x=218, y=63, z=-662},type=SMART}
[14:05:18 WARN]: Nag author(s): '[bergerkiller, lenis0012, timstans, bubba1234119, KamikazePlatypus, mg_1999, Friwi, kpaulisse]' of 'Train_Carts v1.20.4-v2-SNAPSHOT' about their usage of System.out/err.print. Please use your plugin'
s logger instead (JavaPlugin#getLogger).
[14:05:18 INFO]: [Train_Carts] [STDOUT] Load path mutex: MutexZonePath{sign={world=OfflineWorld{uuid=73044f1f-4597-46e1-9834-8dcb91e91607, name=club}, x=218, y=63, z=-662},type=SMART}
[14:05:18 INFO]: [Train_Carts] [STDOUT] Load path mutex: MutexZonePath{sign={world=OfflineWorld{uuid=73044f1f-4597-46e1-9834-8dcb91e91607, name=club}, x=239, y=62, z=-722},type=SMART}

Perhaps OfflineWorld refers to a world that isn't loaded? Its unclear from the log when the worlds are loaded, presumably not until somebody joins. There is a Preparing start region for dimension minecraft:club in the log before these messages.

The train at a station in the zone doesn't show any mutex zone in /train status after reload but does without reload, allowing the waiting train to move into it, then when the one at the station starts to move it hits the spmutex of the now moving train and they deadlock.
(Its now refusing to reproduce the missing spmutex status after reload, or mutex breakage, but will double check that when I can)

I can also confirm that after a load with a train waiting on an spmutex over a junction (switcher sign), the train following it didn't switch the points the other way as it should have and goes the wrong way. This is the train that later may get split. Something to note here is that it first goes over a conditional skip (skipping a destination, station, and animate doors, if the destination isn't the one that would be skipped). So it shouldn't be skipping them at all, but i'm wondering if its re-evaluating the skip on reload (after hitting destination sign and switching to next destination on route, but before moving fully past the skip sign, which would then make it skip when it shouldn't), and then its weirdly skipping the switcher even though thats the next sign after the 3 sign skip.

I just saw a train come out of the skip area, having apparently done the door animation, but not stopped at the station and been switched correctly after the skip (or perhaps the junction was already pointing the right way).

Sorry thats all a bit hard to describe, maybe it gives some useful pointers. If its easier for you to join again just let me know.

@bergerkiller
Copy link
Member

offline world is a bkcl api to persistent knowledge of worlds even when worlds unload / load again. It shows name=club indicating the world is loaded at the time, so thats fine then and not a concern. Does the specific sign whose mutex didnt persist show up in the logging with a Load path mutex?

Skip state is another thing Ive attempted to make persistent so yeah sounds like that part isn't working right either then.

@bergerkiller
Copy link
Member

Found some issues, am working on fixing those first. They explain the switcher sign behavior you describe

@bergerkiller
Copy link
Member

bergerkiller commented Jan 18, 2024

traincarts-test-persistence-v2.zip
Please try this version. It should at least fix the issue with the switcher sign toggling the track. It might also fix the skip sign problem as they could be related if the skip sign is re-triggered.

Unsure if it impacts the pathing mutex situation. Let's start here.

Note that I did turn off that logging as it didnt add much value, I might add more specific logging if I need to dive deeper to fix the pathing mutex situation. On my server I didnt have this issue of the smart mutex zone disappearing on restart.

One thing I did notice is that if a train is stopped inside a mutex and the server restart, it shortly forgets what the forward moving direction was, and so starts scanning the track in reverse. This causes the pathing mutex to reverse back in the opposite direction "behind" the sign. I can probably add some sort of blocker at the sign to prevent it going too far as that's annoying

@amalon
Copy link
Author

amalon commented Jan 19, 2024

Thanks, things I've noticed running this version:

  • trains stopped at a station don't resume after restart
  • Hard to tell if trains going wrong way at junction near skips is fixed as it didn't resume moving (presumably due to above point), when i launched it routed fine through junction.
  • I've three times observed two trains entering the spmutex zone (not immediately after restart), so something isn't right with that. They both claimed to have entered the same named smart mutex region via the same spmutex sign.

@bergerkiller
Copy link
Member

Confirmed the station thing, previous fix must have re-broken that. let me fix that

@bergerkiller
Copy link
Member

bergerkiller commented Jan 19, 2024

traincarts-persistence-test3.zip
This fixes the stations issue. Related to stations is potentially it also forgot skip state of the [train] skip sign, so that should now be fixed also.

Only issue that remains is the spmutex zone problem.

You mentioned not immediately after restart. You mean you rule out that either train was inside the spmutex zone at the time the server restarted? Is this happening very long time after the server has booted, and all the time?

Because if so I have to look at something different than serialization/deserialization of the state

It's also worth checking whether that same issue occurs with https://ci.mg-dev.eu/job/TrainCarts/1537/

@amalon
Copy link
Author

amalon commented Jan 19, 2024

Thanks, station resume & skip appear to be fixed i think.

The spmutex zone problem: Yes, it happens fully after server start. After it happens I destroyed one of the trains, it freed up, then happened again soon after. I just saw it happen a few more times, and think I may have a clue:

  1. train circle and central approaching spmutex from different places
  2. train central enters mutex zone, train circle waits
  3. train central reaches a 0.5second station marked route, and changes direction to enter into the platform
  4. at this point train circle starts moving and enters the platform ahead of train central
  5. deadlock

it doesn't happen every time, I guess it depends on the timing of the wait polling while the train is stopped at the routing station.

(I haven't tried 1537 yet)

@amalon
Copy link
Author

amalon commented Jan 19, 2024

I'll add the following (probably irrelevant, mutex already broken at that point):
4.1. central sees circle ahead and stops
4.2. circle stops at station platform
4.3. central then starts moving behind circle and stops right behind it, blocking it in

@amalon
Copy link
Author

amalon commented Jan 19, 2024

And one more observation: I've also seen when two trains are waiting at an spmutex, the zone becomes available, then one of them moves about a block towards the zone, then stops, and the other one proceeds into the zone instead.

@bergerkiller
Copy link
Member

bergerkiller commented Jan 19, 2024

But are these issues that existed before (so the jenkins build I linked) too, or is that since these changes? I did do a minor rewrite of some logic to integrate "unloaded" trains as part of the mutex logic, and its possible this broke something.

If the jenkins build is fine https://ci.mg-dev.eu/job/TrainCarts/1537/ then I know to review the commits...

which will take a while lol
image

@amalon
Copy link
Author

amalon commented Jan 19, 2024

I thought it was fairly solid previously while the server was running, but I have just reproduced it on 1537 so I guess it must be a pre-existing issue, probably exposed by me increasing the frequency of trains to help reproduce the persistence issue.
I'll try a few more older builds tomorrow and see if I can bisect it.

@bergerkiller
Copy link
Member

if its pre-existing then Ill have to figure out what triggers it. Does this issue exist at all for Smart Mutex signs? (not pathing)

Does a non-smart pathing mutex have the same issue?

@bergerkiller
Copy link
Member

Decided to push the changes as official as it looks stable enough. Besides a minor change of tracking unloaded trains by properties instead of name, there is no difference with the version you tested before.

https://ci.mg-dev.eu/job/TrainCarts/1538/

I don't see these problems with mutex zones myself. If you have time, maybe I can hop on the server again to see for myself what is happening and where.

@amalon
Copy link
Author

amalon commented Jan 21, 2024

I've been trying to come up with a simple test case. This is roughly equivalent:
2024-01-21_13 06 56
2024-01-21_13 07 10
(note, i've since changed top mutex to length 32, bottom to length 50)

The spawner top-left in the hole spawns storage minecarts which route to test_1 (top right), then test_2 (bottom left). The sign is:

[train:left]
spawner 0:20 0.2
s

The spawner bottom-right in the hole spawns normal minecarts which route to test_1 (top right, via bottom left), then test_2 (bottom left). The sign is:

[train:left]
spawner 0:02 0.2
m

The whole thing is a bit sensitive to timing, however, with named smart pathing mutexes there are times when carts both enter the mutex zone from top left and bottom right. /train debug mutex shows the top spmutex only going as far as test_1 (top right) and past the station. The bottom spmutex goes to bottom-left, top right, and similarly straight past the station. I'm guessing the bottom spmutex just doesn't know that the top carts are going to come back again (pathing mutex not routing past destination test_1 sign) so thinks its safe as soon as they've joined the track top-right.

With unnamed smart pathing mutexes, the top carts go so close to the junction top-right that the bottom carts can't get through. Also sometimes the carts just enter the zone even when other carts are present.

With named non-smart pathing mutexes, it kinda works I think (but resulted in excess congestion in the real case).

With unnamed pathing mutex, again carts go way too close to junctions, blocking other trains.
I did also get a case of bottom carts stopping approaching top right junction, claiming to be approaching top mutex (even though they weren't going that way...)

Yeh, feel free to rejoin to see it all. Just let me know when. Thanks

@bergerkiller
Copy link
Member

As it saves me some time, can you /tcc export this track and send the link? Just select all the nodes of this test track and do /tcc export. I can place the spawner signs myself. That way I know for sure its 100% replicated.

@amalon
Copy link
Author

amalon commented Jan 22, 2024

sure, here it is:
https://paste.traincarts.net/ejenajuzut

@bergerkiller
Copy link
Member

There was a sign mistake: at one position it set a destination "test2" but the destination sign says "test_2". Is this something you fixed yourself as well? Idk if it matters

@bergerkiller
Copy link
Member

image
is this what you're seeing too? Storage cart seems to never advance

@amalon
Copy link
Author

amalon commented Jan 22, 2024

whoops yeh that should be test_2, I don't think it makes much difference though. Between screenshot & export I had fiddled about with those signs to change storage carts from a route to just a plain destination, to see if the /train debug mutex would show the path better.

(note the reason I had different spawn rates was to avoid situation where normal minecarts would never progress into spmutex, maybe thats same thing you're seeing).

Trying just now, pretty soon it got stuck with both storage and normal train cart entering the zone... once... so yeh i haven't nailed the reproducibility. I'll have another try later to nail it down.

@bergerkiller
Copy link
Member

bergerkiller commented Jan 26, 2024

I can discover some issues that occur in your setup, some of it related to my issue of one lane never progressing. It's a little difficult to fix but I'm working on it.

@bergerkiller
Copy link
Member

One problem is that the storage carts (in my case) end up losing mutex lock once they drive from the spawn sign to the station. It seems like somehow they exit the mutex too soon. This can be verified by only spawning a single cart there and spamming /train debug mutex - the pathing mutex disappears once it stops on the station

@amalon
Copy link
Author

amalon commented Jan 28, 2024

yes, I've seen that a number of times, usually by the train already in the mutex zone saying that its approaching that same mutex zone since another got in during the gap and says to have already entered it

@bergerkiller
Copy link
Member

Found one bug: https://ci.mg-dev.eu/job/TrainCarts/1540/

However, Im not quite satisfied with the current 'forAllBlocks' logic as it sees all vanilla rails as having two or more actual blocks due to them going from 0.0 to 1.0. So I need to fix that too.

@bergerkiller
Copy link
Member

With this build vanilla track follow proper pathing mutex zone rules again. https://ci.mg-dev.eu/job/TrainCarts/1541/

Please let me know if at least the issue of two trains entering at the same time is fixed.

The issue I found, where it seems one lane is always allowed to progress while the other isn't, isn't fixed yet. I have a fix but as it had a large rewrite of lots of things I want to test that more, first.

@amalon
Copy link
Author

amalon commented Jan 28, 2024

Thanks! I noticed one case of two real trains in the same mutex region, but didn't spot how it happened. I haven't seen other issues despite watching the usual place where it happens with loads of trains spawning for quite a while. I'll keep them spawning and report back if there are still issues.

@amalon
Copy link
Author

amalon commented Jan 29, 2024

Yeh its happening still, basically as before, the trains stop momentarily at a station sign to reverse direction, and apparently release the mutex for long enough for another train to enter the region from a different branch. /train debug mutex still shows the spmutex not bouncing back from the next station in the mutex path, which would seem to explain it, it doesn't know the trains about to change direction since the destination is HoganXP3, and relies on the route to know where to go next. The tcc signs on the station node in question have:

[+train]
destination
HoganXP3
[+train]
station .8m/ss
18
route

and the trains have routes that direct them back out of the station the way they came.
(Changing order of destination & station doesn't help).
I can't reproduce it using the small test case yet.

@bergerkiller
Copy link
Member

bergerkiller commented Jan 29, 2024

The path creation only occurs the moment the train rolls over the pathing mutex sign. Once it has exited, that particular path is locked in and won't change with future rerouting. I guess the only way around this would be to take destination signs into account, but simulating all of that gets far too complex really quickly. Take skip signs and conditional switcher power signs for example.

Would it be possible to set the next destination it goes to before it stops on the station?

It might also help to add another pathing mutex sign near the station maybe? Or just use a traditional mutex zone with smart routing so it covers the entire area regardless of path the train takes.

@amalon
Copy link
Author

amalon commented Jan 29, 2024

ok fair enough. I'm trying to workaround it, but the tricky thing is that the route into the platform is the same as the route out, so there's no destination before the spmutex that would make it route into the platform rather than driving straight past it. I've tried putting a destination after the spmutex, and changing the destination sign with the station to also set the destination to route it out of the mutex zone, but it doesn't seem to make a difference to the zone shown by /train debug mutex. I was going to try explicitly setting destination on the way into the platform and doing spmutex again, but I've noticed that the spmutex state (in /train status) seems to now consistently clear once it hits that first station pause to turn around so I'm not sure that would be safe.

I'll attempt to change it around a bit so that the exit from platform doesn't pass over the same track where the train reverses direction, so that it can avoid a change of destination within the mutex zone...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants