Segfault for ODD+Pythia8+Geant4 #1578

Closed
andiwand opened this issue Oct 6, 2022 · 50 comments
Labels
Bug Something isn't working

Comments

@andiwand
Copy link
Contributor

andiwand commented Oct 6, 2022

@benjaminhuth pointed out that ODD+Pythia8+Geant4 will segfault in full chain

I just verified this. See attached files for more information.

segfault.txt
full_chain.txt

@andiwand andiwand added the Bug Something isn't working label Oct 6, 2022
@paulgessinger
Copy link
Member

Hm, could it be that the two interfere?

@asalzburger
Copy link
Contributor

Is this executed in single thread ?

@asalzburger
Copy link
Contributor

numThreads=-1 is no good: this would need Geant4MT.

@benjaminhuth
Copy link
Member

For me this crashes with one thread as well

@asalzburger
Copy link
Contributor

Is that with the same error / segfault ?

@benjaminhuth
Copy link
Member

Hmm, it looks a bit different to be honest, but I couldn't check in detail for now.
I attach my gdb backtrace, maybe this gives a hint.

backtrace.txt

@benjaminhuth
Copy link
Member

@andiwand could you maybe also run it in gdb to see if it is the same fault (it's not entirely clear to me from the error message)

@andiwand
Copy link
Contributor Author

andiwand commented Oct 6, 2022

> Hm, could it be that the two interfere?

I think so, yes. Pythia8 alone works and Geant4 alone works.

> @andiwand could you maybe also run it in gdb to see if it is the same fault (it's not entirely clear to me from the error message)

sure will do

@Corentin-Allaire
Copy link
Contributor

I can confirm this: I just encountered the same issue using the geant4.py example:
### CAUGHT SIGNAL: 11 ### address: 0x7f9417827000, signal = SIGSEGV, value = 11, description = segmentation violation. Address not mapped to object.
I tried updating to the latest G4 version (11.0.3) but it didn't change anything.

@Corentin-Allaire
Copy link
Contributor

As we discussed during today's meeting, I tried replacing the ODD with the GDML implementation of Alice_v3 and it ran through with just a few warnings.
So the issue is either with the ODD itself or with the DDG4DetectorConstruction...

@stale
Copy link

stale bot commented Nov 12, 2022

This issue/PR has been automatically marked as stale because it has not had recent activity. The stale label will be removed if any interaction occurs.

@stale stale bot added the Stale label Nov 12, 2022
@Corentin-Allaire
Copy link
Contributor

I have had another look at this and I just noticed something. If I start removing the supports from the ODD xml, the segfault happens much later, so maybe there is something wrong with the support surface definition?

@stale stale bot removed the Stale label Nov 23, 2022
@stale
Copy link

stale bot commented Dec 24, 2022

This issue/PR has been automatically marked as stale because it has not had recent activity. The stale label will be removed if any interaction occurs.

@stale stale bot added the Stale label Dec 24, 2022
@Corentin-Allaire
Copy link
Contributor

I checked back on this issue out of curiosity and it is still there. Maybe we should try to investigate this again at some point?

@stale stale bot removed the Stale label Jan 19, 2023
@paulgessinger
Copy link
Member

paulgessinger commented Jan 20, 2023

For sure this is something that we need to fix. Do we have a script to reproduce this?

@benjaminhuth
Copy link
Member

benjaminhuth commented Jan 20, 2023

Okay, I have investigated this a bit and have some new information:

First of all, I enabled some logging facilities in Geant4, which showed me that this is caused by photons quite far away from the center in the z direction (z is around 1e4):
image

This is reproducible with Pythia, also with different seeds. I could then also reproduce the crash with the ParticleGun:

addParticleGun(
    s,
    MomentumConfig(0.1 * u.GeV, 2.0 * u.GeV, transverse=True),
    EtaConfig(-4.0, 4.0, uniform=True),
    ParticleConfig(2, acts.PdgParticle.eGamma),
    vtxGen=acts.examples.GaussianVertexGenerator(
        stddev=acts.Vector4(10 * u.mm, 10 * u.mm, 10 * u.mm, 0.0 * u.ns),
        # Mean position in ACTS native units (mm): z = 1.09e4 mm is about 10.9 m from the origin
        mean=acts.Vector4(18, 3.78, 1.09e4, 0),
    ),
    multiplicity=100,
    rnd=rnd,
)

I'm not totally sure what to do with this information, but maybe someone has an idea :)

@paulgessinger
Copy link
Member

So it's G4 breaking in a specific region of the detector?

@Corentin-Allaire
Copy link
Contributor

Wait, the energy goes to 0 in the second step. Could it be that G4 doesn't handle photons stopping in some volumes?

@benjaminhuth
Copy link
Member

benjaminhuth commented Jan 20, 2023

> Wait, the energy goes to 0 in the second step. Could it be that G4 doesn't handle photons stopping in some volumes?

No, I think everything is fine with the electron in the pixel endcap; the photon below is the problem. There it only logs the 0th step and then segfaults.

I could imagine that the problem is that it already starts outside of the detector (in the world_volume_1)?

Could it be that the world volume is too small or something like that?

@Corentin-Allaire
Copy link
Contributor

Oh yeah I was looking at the wrong line... But you are right, the world volume size is 10m along z so this photon is outside the DD4Hep detector.
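
A quick way to sanity-check a vertex against the world-volume extent is sketched below. This is plain Python, not ACTS or DD4hep code: `WorldBox` is a hypothetical helper, the 10 m figure is taken from this comment, and treating it as a symmetric half-length in all three directions is an assumption. The vertex is the mean position from the particle gun reproducer, in mm.

```python
from dataclasses import dataclass

MM = 1.0
M = 1000.0 * MM  # lengths are expressed in mm, as in the particle gun reproducer


@dataclass(frozen=True)
class WorldBox:
    """Axis-aligned box centred on the origin (hypothetical helper)."""

    half_x: float
    half_y: float
    half_z: float

    def contains(self, x: float, y: float, z: float) -> bool:
        """Return True if the point lies strictly inside the box."""
        return abs(x) < self.half_x and abs(y) < self.half_y and abs(z) < self.half_z


# Assumption: 10 m is the half-length of the world volume in every direction.
world = WorldBox(10 * M, 10 * M, 10 * M)

# Mean vertex position used in the particle gun reproducer (x, y, z in mm).
photon_vertex = (18.0, 3.78, 1.09e4)

print(world.contains(*photon_vertex))  # False: |z| = 10.9 m exceeds 10 m
```

If the real world volume is smaller than assumed here, the same vertex is still outside, so the conclusion does not depend on that assumption.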

@Corentin-Allaire
Copy link
Contributor

Unfortunately, I don't think this is the only issue :(
I tried to edit the particle selector to remove all particles with |x|, |y| or |z| larger than 5 m and it still crashes with ttbar. How did you get those extra logs, Benjamin?

@benjaminhuth
Copy link
Member

benjaminhuth commented Jan 20, 2023

Already merged: #1790
With a new build from the main branch you should be able to enable it by setting the logLevel to VERBOSE in the addGeant4 function.

@Corentin-Allaire
Copy link
Contributor

Oh perfect I will have a look next week in more detail then !

@benjaminhuth
Copy link
Member

> Unfortunately, I don't think this is the only issue :( I tried to edit the particle selector to remove all particles with |x|, |y| or |z| larger than 5 m and it still crashes with ttbar. How did you get those extra logs, Benjamin?

Actually, I was able to run one event in the pythia8+geant4+ODD combination without a segfault by manually increasing the world volume from 10 m to 100 m in the ODD xml files...

I'm not sure if something like that would be a reasonable fix? Does this have any other implications @asalzburger?

I will try to run more events now, however, they take quite a long time (around 30 minutes per event)

@benjaminhuth
Copy link
Member

> I will try to run more events now, however, they take quite a long time (around 30 minutes per event)

Okay, actually it does not resolve the issue, I still get the segfault in a later event. Maybe it has just changed the random numbers a bit so that 1 event went through.

@Corentin-Allaire
Copy link
Contributor

A bit unrelated, but there is a bug in 'addGeant4' in 'simulation.py': at line 597 it uses particles_input for the G4 input (instead of particles_selected), which ignores the particle selector. I can open a quick MR to fix this.

@Corentin-Allaire
Copy link
Contributor

> A bit unrelated, but there is a bug in 'addGeant4' in 'simulation.py': at line 597 it uses particles_input for the G4 input (instead of particles_selected), which ignores the particle selector. I can open a quick MR to fix this.

If someone wants to have a look: #1792

@Corentin-Allaire
Copy link
Contributor

Corentin-Allaire commented Jan 23, 2023

With this you can cut the particles outside the detector by adding preselectParticles = ParticleSelectorConfig(eta=(-3.0, 3.0), absZ=(0, 1e4), pt=(150 * u.MeV, None), removeNeutral=True) to the addGeant4 call. It doesn't solve the segfault in the ttbar case, but it does solve the photon issue.
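
To make the effect of these cuts concrete, here is a minimal pure-Python sketch of what such a preselection does. It is not the ACTS implementation: `Particle`, `in_range` and `passes_preselection` are hypothetical names, and the half-open interval convention (lower bound inclusive, upper bound exclusive, `None` meaning unbounded) is an assumption. The default values mirror the configuration quoted above, with momenta in GeV and lengths in mm.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Particle:
    """Toy particle record (hypothetical, not an ACTS type)."""

    z: float       # production vertex z in mm
    px: float      # momentum components in GeV
    py: float
    pz: float
    charge: float  # in units of e


def in_range(value, bounds):
    """Half-open interval test; None means no bound on that side."""
    low, high = bounds
    return (low is None or value >= low) and (high is None or value < high)


def passes_preselection(p, eta=(-3.0, 3.0), abs_z=(0.0, 1e4),
                        pt=(0.150, None), remove_neutral=True):
    """Return True if the particle survives all configured cuts."""
    if remove_neutral and p.charge == 0:
        return False
    pt_value = math.hypot(p.px, p.py)
    # Pseudorapidity; particles exactly along the beam axis get +/- infinity.
    if pt_value > 0:
        eta_value = math.asinh(p.pz / pt_value)
    else:
        eta_value = math.copysign(math.inf, p.pz)
    return (in_range(eta_value, eta)
            and in_range(abs(p.z), abs_z)
            and in_range(pt_value, pt))


far_photon = Particle(z=1.09e4, px=1.0, py=0.0, pz=0.5, charge=0.0)
central_pion = Particle(z=5.0, px=1.0, py=0.0, pz=0.5, charge=1.0)
print(passes_preselection(far_photon), passes_preselection(central_pion))  # False True
```

Note that in this sketch the far-away photon is rejected twice over: once because it is neutral and once because its |z| is beyond the 1e4 mm limit.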

@Corentin-Allaire
Copy link
Contributor

Actually, the code seems to be running on my side and doesn't segfault anymore... Can someone else confirm?

@andiwand
Copy link
Contributor Author

andiwand commented Jan 24, 2023

@Corentin-Allaire are you using the chain from above? otherwise if you could share the script I can try to verify

@Corentin-Allaire
Copy link
Contributor

@andiwand here is the chain I use :
full_chain_odd.txt

@andiwand
Copy link
Contributor Author

andiwand commented Jan 24, 2023

this is still segfaulting for me on 7a3761d

segfault.txt

@Corentin-Allaire
Copy link
Contributor

I might have changed something else by accident let me check (maybe you can also run it with verbose log of G4 ?)

@benjaminhuth
Copy link
Member

benjaminhuth commented Jan 24, 2023

Actually, for me it now worked for at least two events without a segfault (I only applied the z-selection, not the pt or eta ones). That's quite good news.

However, I got the following interesting warning:
image

Maybe I should add I had a timestamp-based seed, not the usual 42.

@andiwand
Copy link
Contributor Author

did you run geant4 in verbose mode @benjaminhuth ? somehow it runs now for a couple of minutes without crashing

@benjaminhuth
Copy link
Member

nope, at least that last one not.
But I think we must consider the verbose mode to be extreeemly slow due to these huge printouts...

@andiwand
Copy link
Contributor Author

hm for me it is still crashing even with 2 events

this is my geant version:

**************************************************************
 Geant4 version Name: geant4-11-00-patch-01 [MT]   (8-March-2022)
                       Copyright : Geant4 Collaboration
                      References : NIM A 506 (2003), 250-303
                                 : IEEE-TNS 53 (2006), 270-278
                                 : NIM A 835 (2016), 186-225
                             WWW : http://geant4.org/
**************************************************************

but feel free to close the ticket since it works for both of you now

@Corentin-Allaire
Copy link
Contributor

@andiwand did you keep the number of thread to 1 ?

@Corentin-Allaire
Copy link
Contributor

I realised I modified the preselection to cut on x,y and z so I removed that change but it still works on 5 events for now.
I do have a slightly more recent G4 version :

 Geant4 version Name: geant4-11-00-patch-03 [MT]   (16-September-2022)
                       Copyright : Geant4 Collaboration
                      References : NIM A 506 (2003), 250-303
                                 : IEEE-TNS 53 (2006), 270-278
                                 : NIM A 835 (2016), 186-225
                             WWW : http://geant4.org/
**************************************************************

@andiwand
Copy link
Contributor Author

yeah exactly, I didn't modify the script. just executed the one you sent here #1578 (comment)

@benjaminhuth
Copy link
Member

So I have geant4-11-00-patch-03 as well, and for me it also works now with 5 events.
Maybe it's indeed the Geant4 version?

@andiwand
Copy link
Contributor Author

let me update and check again

@Corentin-Allaire
Copy link
Contributor

I remember reading there was some issue with patch 01, which is why I updated the first time I ran into the ODD segfault

@Corentin-Allaire
Copy link
Contributor

Let me also open an MR so that people can use G4 with the ODD

@benjaminhuth
Copy link
Member

Actually I think this is a good workaround, though not 100% satisfying...

I just wonder if the change in the full_chain_odd.py is enough to kind of document this for others?

@Corentin-Allaire
Copy link
Contributor

In my opinion this is a pythia issue and not an acts one

@Corentin-Allaire
Copy link
Contributor

I have opened a PR: #1794. I mention this issue in a code comment in case people want to understand the full story

@andiwand
Copy link
Contributor Author

I recompiled G4 with 11.1.0 and it seems not to crash anymore

@Corentin-Allaire
Copy link
Contributor

I am in the process of running on 100 events just to check that we don't get particles with X or Y > 10m

kodiakhq bot pushed a commit that referenced this issue Jan 24, 2023
This adds the option to use the G4 simulation with the ODD, working around the problem noted in issue #1578.
Incidentally, it also fixes a bug when using the ODD full chain with only CSV writing.
@Corentin-Allaire
Copy link
Contributor

I ran the simulation for 100 ttbar events and no issue occurred. I think we can close this one then!
As a summary in case someone comes back here:

  • Pythia sometimes generates particles outside the detector world volume (in particular in z); using a ParticleSelector resolves the issue. If the issue comes back, we might need to extend the ParticleSelector to also cut in x and y.
  • The G4 simulation cannot be run in multithreaded mode in Acts; doing so will result in crashes.
  • G4 versions older than geant4-11-00-patch-03 might also segfault when running the simulation (this has been shown for geant4-11-00-patch-01 at least). In case of crashes in G4, try upgrading to the latest version.
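
The version point in this summary can be turned into a small pre-flight check. The sketch below parses the version name that Geant4 prints in its startup banner (as pasted earlier in this thread) and compares it against geant4-11-00-patch-03. The helper names are hypothetical, and it assumes that an unpatched release is printed without the `-patch-NN` suffix (for example `geant4-11-01`).

```python
import re

# Minimum version reported as working in this thread: geant4-11-00-patch-03.
MIN_VERSION = (11, 0, 3)

# Matches e.g. "geant4-11-00-patch-01"; the patch part is optional.
_VERSION_PATTERN = re.compile(r"geant4-(\d+)-(\d+)(?:-patch-(\d+))?")


def parse_geant4_version(text):
    """Extract (major, minor, patch) from a Geant4 version name or banner line."""
    match = _VERSION_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no Geant4 version name found in: {text!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_reported_working(text):
    """True if the version is at least the one reported as working here."""
    return parse_geant4_version(text) >= MIN_VERSION


banner_old = " Geant4 version Name: geant4-11-00-patch-01 [MT]   (8-March-2022)"
banner_new = " Geant4 version Name: geant4-11-00-patch-03 [MT]   (16-September-2022)"
print(is_reported_working(banner_old), is_reported_working(banner_new))  # False True
```

Passing this check does not guarantee a crash-free run; it only encodes the versions that were observed to work or fail in this thread.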

5 participants