
smsc/XPMEM region map fails due to padding/alignment and/or inaccurate max address #10121

Closed
gkatev opened this issue Mar 15, 2022 · 8 comments


@gkatev
Contributor

gkatev commented Mar 15, 2022

Hi, I observed that some of my SMSC map_peer_region calls were failing. It appears that the 8 MB padding/alignment added to XPMEM attachments, in conjunction with an inaccurately(?) set address_max, brings the attachment range over the limit.

Adding these debug prints:

diff --git a/opal/mca/smsc/xpmem/smsc_xpmem_module.c b/opal/mca/smsc/xpmem/smsc_xpmem_module.c
index d2954c1..ed7db6a 100644
--- a/opal/mca/smsc/xpmem/smsc_xpmem_module.c
+++ b/opal/mca/smsc/xpmem/smsc_xpmem_module.c
@@ -106,6 +106,8 @@ static int mca_smsc_xpmem_check_reg(mca_rcache_base_registration_t *reg, void *c
 void *mca_smsc_xpmem_map_peer_region(mca_smsc_endpoint_t *endpoint, uint64_t flags,
                                      void *remote_ptr, size_t size, void **local_ptr)
 {
+    printf("mca_smsc_xpmem_map_peer_region(%p, %zu)\n", remote_ptr, size);
+    
     mca_smsc_xpmem_endpoint_t *xpmem_endpoint = (mca_smsc_xpmem_endpoint_t *) endpoint;
     mca_rcache_base_vma_module_t *vma_module = mca_smsc_xpmem_module.vma_module;
     uint64_t attach_align = 1 << mca_smsc_xpmem_component.log_attach_align;
@@ -156,8 +158,29 @@ void *mca_smsc_xpmem_map_peer_region(mca_smsc_endpoint_t *endpoint, uint64_t fla
                                 endpoint, reg->base, reg->bound);
 
             reg->rcache_context = xpmem_attach(xpmem_addr, bound - base, NULL);
+            
+            printf("xpmem_attach({%p, %lld}, %zu, NULL) = %p\n",
+                (void *) (uintptr_t) xpmem_addr.offset, (long long) xpmem_addr.apid, (size_t) (bound - base), reg->rcache_context);
+            
             if (OPAL_UNLIKELY((void *) -1 == reg->rcache_context)) {
+                void *sc = xpmem_attach(xpmem_addr, 4096, NULL);
+                printf("xpmem_attach({%p, %lld}, %zu, NULL) = %p\n",
+                    (void *) (uintptr_t) xpmem_addr.offset, (long long) xpmem_addr.apid, (size_t) 4096, sc);
+                printf("endpoint max address %p, dumping maps\n",
+                    (void *) xpmem_endpoint->address_max);
+                
+                FILE *fh = fopen("/proc/self/maps", "r");
+                assert(fh);
+                
+                char buffer[1024];
+                while (fgets(buffer, sizeof(buffer), fh)) {
+                    fputs(buffer, stdout);
+                }
+
+                fclose(fh);
+                
                 OBJ_RELEASE(reg);
+                
                 return NULL;
             }

Generates this output:

mca_smsc_xpmem_map_peer_region(0x7fffffece750, 8)
xpmem_attach({0x7fffff800000, 2641}, 8388609, NULL) = 0xffffffffffffffff
xpmem_attach({0x7fffff800000, 2641}, 4096, NULL) = 0x7fe1645dd000
endpoint max address 0xffffffffff600fff, dumping maps

The attachment of size 8388609, i.e. 0x800001 (btw, why the +1?), fails, whereas the 4096-byte one succeeds:
0x7fffff800000 + 0x800001 = 0x800000000001
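To make the arithmetic concrete, here is a minimal sketch of how an aligned attachment range can be derived from a remote pointer and size. It assumes the range is rounded out to `1 << log_align` on both ends (mirroring how `attach_align` is computed in the diff), and it assumes 0x7ffffffff000 as the top of user space, which holds for x86-64 with 4-level paging. The helper names are hypothetical; this is not the smsc/xpmem code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: top of user space on x86-64 with 4-level paging,
 * i.e. (1 << 47) - 4096. Other architectures/configurations differ. */
#define USER_VA_END UINT64_C(0x7ffffffff000)

typedef struct {
    uint64_t base;  /* inclusive start of the attachment */
    uint64_t bound; /* exclusive end of the attachment */
} attach_range_t;

/* Hypothetical helper: round [ptr, ptr + size) out to 1 << log_align.
 * Assumes ptr + size + align does not overflow (true for user addresses). */
static attach_range_t aligned_range(uint64_t ptr, uint64_t size, unsigned log_align)
{
    uint64_t align = UINT64_C(1) << log_align;
    attach_range_t r;

    r.base = ptr & ~(align - 1);
    r.bound = (ptr + size + align - 1) & ~(align - 1);
    return r;
}

/* Does the whole padded range stay below the assumed user-space ceiling? */
static bool fits_in_user_space(attach_range_t r)
{
    return r.bound <= USER_VA_END;
}
```

With the values from the debug output (pointer 0x7fffffece750, size 8), `log_align = 23` gives [0x7fffff800000, 0x800000000000), a length of 0x800000 without the extra +1, which already ends past the assumed ceiling. `log_align = 12` gives [0x7fffffece000, 0x7fffffecf000), which fits.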

If I'm interpreting the memory maps below correctly, address_max is set (incorrectly/inaccurately?) to 0xffffffffff600fff, i.e. the last address of the [vsyscall] mapping. The area for which map_peer_region is called is part of the stack.

The stack here appears to be at 7ffdd10be000-7ffdd10e0000, but this may not be relevant, since this is the local process's map and not the remote one's. I imagine that if I had the remote process's map, 0x800000000001 would fall outside its range, while 0x7fffff800000 + 0x1000 = 0x7fffff801000 would fall within it.

/proc/self/maps content (of process calling xpmem_attach):

00400000-00402000 r--p 00000000 00:30 52367028                           /home1/public/gkatev/osu-micro-benchmarks/mpi/collective/osu_reduce_integrity
00402000-00408000 r-xp 00002000 00:30 52367028                           /home1/public/gkatev/osu-micro-benchmarks/mpi/collective/osu_reduce_integrity
00408000-0040c000 r--p 00008000 00:30 52367028                           /home1/public/gkatev/osu-micro-benchmarks/mpi/collective/osu_reduce_integrity
0040c000-0040d000 r--p 0000b000 00:30 52367028                           /home1/public/gkatev/osu-micro-benchmarks/mpi/collective/osu_reduce_integrity
0040d000-0040e000 rw-p 0000c000 00:30 52367028                           /home1/public/gkatev/osu-micro-benchmarks/mpi/collective/osu_reduce_integrity
0040e000-0041c000 rw-p 00000000 00:00 0 
01e6a000-0209c000 rw-p 00000000 00:00 0                                  [heap]
7fe11d7f9000-7fe11dffa000 rw-s 00000000 00:00 0 
7fe11dffa000-7fe11e7fb000 rw-s 00000000 00:00 0 
7fe11e7fb000-7fe11effc000 rw-s 00000000 00:00 0 
7fe11effc000-7fe11f7fd000 rw-s 00000000 00:00 0 
7fe11f7fd000-7fe11fffe000 rw-s 00000000 00:00 0 
7fe11fffe000-7fe1207ff000 rw-s 00000000 00:00 0 
7fe1207ff000-7fe121000000 rw-s 00000000 00:00 0 
7fe121000000-7fe122000000 rw-s 00000000 00:17 1061540                    /dev/shm/sm_segment.taos.50914.ba300001.63
7fe122000000-7fe123000000 rw-s 00000000 00:17 1061570                    /dev/shm/sm_segment.taos.50914.ba300001.62
7fe123000000-7fe124000000 rw-s 00000000 00:17 1061550                    /dev/shm/sm_segment.taos.50914.ba300001.61
7fe124000000-7fe125000000 rw-s 00000000 00:17 1061559                    /dev/shm/sm_segment.taos.50914.ba300001.60
7fe125000000-7fe126000000 rw-s 00000000 00:17 1061549                    /dev/shm/sm_segment.taos.50914.ba300001.59
7fe126000000-7fe127000000 rw-s 00000000 00:17 1061534                    /dev/shm/sm_segment.taos.50914.ba300001.58
7fe127000000-7fe128000000 rw-s 00000000 00:17 1061560                    /dev/shm/sm_segment.taos.50914.ba300001.57
7fe128000000-7fe129000000 rw-s 00000000 00:17 1061510                    /dev/shm/sm_segment.taos.50914.ba300001.56
7fe129000000-7fe12a000000 rw-s 00000000 00:17 1061551                    /dev/shm/sm_segment.taos.50914.ba300001.55
7fe12a000000-7fe12b000000 rw-s 00000000 00:17 1061552                    /dev/shm/sm_segment.taos.50914.ba300001.54
7fe12b000000-7fe12c000000 rw-s 00000000 00:17 1061553                    /dev/shm/sm_segment.taos.50914.ba300001.53
7fe12c000000-7fe12d000000 rw-s 00000000 00:17 1061558                    /dev/shm/sm_segment.taos.50914.ba300001.52
7fe12d000000-7fe12e000000 rw-s 00000000 00:17 1061566                    /dev/shm/sm_segment.taos.50914.ba300001.51
7fe12e000000-7fe12f000000 rw-s 00000000 00:17 1061565                    /dev/shm/sm_segment.taos.50914.ba300001.50
7fe12f000000-7fe130000000 rw-s 00000000 00:17 1061572                    /dev/shm/sm_segment.taos.50914.ba300001.49
7fe130000000-7fe131000000 rw-s 00000000 00:17 1061555                    /dev/shm/sm_segment.taos.50914.ba300001.48
7fe131000000-7fe132000000 rw-s 00000000 00:17 1061573                    /dev/shm/sm_segment.taos.50914.ba300001.47
7fe132000000-7fe133000000 rw-s 00000000 00:17 1061601                    /dev/shm/sm_segment.taos.50914.ba300001.46
7fe133000000-7fe134000000 rw-s 00000000 00:17 1061564                    /dev/shm/sm_segment.taos.50914.ba300001.45
7fe134000000-7fe135000000 rw-s 00000000 00:17 1061568                    /dev/shm/sm_segment.taos.50914.ba300001.44
7fe135000000-7fe136000000 rw-s 00000000 00:17 1061605                    /dev/shm/sm_segment.taos.50914.ba300001.43
7fe136000000-7fe137000000 rw-s 00000000 00:17 1061602                    /dev/shm/sm_segment.taos.50914.ba300001.42
7fe137000000-7fe138000000 rw-s 00000000 00:17 1061604                    /dev/shm/sm_segment.taos.50914.ba300001.41
7fe138000000-7fe139000000 rw-s 00000000 00:17 1061603                    /dev/shm/sm_segment.taos.50914.ba300001.40
7fe139000000-7fe13a000000 rw-s 00000000 00:17 1061607                    /dev/shm/sm_segment.taos.50914.ba300001.39
7fe13a000000-7fe13b000000 rw-s 00000000 00:17 1061608                    /dev/shm/sm_segment.taos.50914.ba300001.38
7fe13b000000-7fe13c000000 rw-s 00000000 00:17 1061609                    /dev/shm/sm_segment.taos.50914.ba300001.37
7fe13c000000-7fe13d000000 rw-s 00000000 00:17 1061606                    /dev/shm/sm_segment.taos.50914.ba300001.36
7fe13d000000-7fe13e000000 rw-s 00000000 00:17 1061597                    /dev/shm/sm_segment.taos.50914.ba300001.35
7fe13e000000-7fe13f000000 rw-s 00000000 00:17 1061596                    /dev/shm/sm_segment.taos.50914.ba300001.34
7fe13f000000-7fe140000000 rw-s 00000000 00:17 1061600                    /dev/shm/sm_segment.taos.50914.ba300001.33
7fe140000000-7fe141000000 rw-s 00000000 00:17 1061598                    /dev/shm/sm_segment.taos.50914.ba300001.32
7fe141000000-7fe142000000 rw-s 00000000 00:17 1061599                    /dev/shm/sm_segment.taos.50914.ba300001.31
7fe142000000-7fe143000000 rw-s 00000000 00:17 1061595                    /dev/shm/sm_segment.taos.50914.ba300001.30
7fe143000000-7fe144000000 rw-s 00000000 00:17 1061594                    /dev/shm/sm_segment.taos.50914.ba300001.29
7fe144000000-7fe145000000 rw-s 00000000 00:17 1061588                    /dev/shm/sm_segment.taos.50914.ba300001.28
7fe145000000-7fe146000000 rw-s 00000000 00:17 1061592                    /dev/shm/sm_segment.taos.50914.ba300001.27
7fe146000000-7fe147000000 rw-s 00000000 00:17 1061590                    /dev/shm/sm_segment.taos.50914.ba300001.26
7fe147000000-7fe148000000 rw-s 00000000 00:17 1061591                    /dev/shm/sm_segment.taos.50914.ba300001.25
7fe148000000-7fe149000000 rw-s 00000000 00:17 1061593                    /dev/shm/sm_segment.taos.50914.ba300001.24
7fe149000000-7fe14a000000 rw-s 00000000 00:17 1061569                    /dev/shm/sm_segment.taos.50914.ba300001.23
7fe14a000000-7fe14b000000 rw-s 00000000 00:17 1061579                    /dev/shm/sm_segment.taos.50914.ba300001.22
7fe14b000000-7fe14c000000 rw-s 00000000 00:17 1061587                    /dev/shm/sm_segment.taos.50914.ba300001.21
7fe14c000000-7fe14d000000 rw-s 00000000 00:17 1061586                    /dev/shm/sm_segment.taos.50914.ba300001.20
7fe14d000000-7fe14e000000 rw-s 00000000 00:17 1061582                    /dev/shm/sm_segment.taos.50914.ba300001.19
7fe14e000000-7fe14f000000 rw-s 00000000 00:17 1061584                    /dev/shm/sm_segment.taos.50914.ba300001.18
7fe14f000000-7fe150000000 rw-s 00000000 00:17 1061563                    /dev/shm/sm_segment.taos.50914.ba300001.17
7fe150000000-7fe151000000 rw-s 00000000 00:17 1061585                    /dev/shm/sm_segment.taos.50914.ba300001.16
7fe151000000-7fe152000000 rw-s 00000000 00:17 1061576                    /dev/shm/sm_segment.taos.50914.ba300001.15
7fe152000000-7fe153000000 rw-s 00000000 00:17 1061567                    /dev/shm/sm_segment.taos.50914.ba300001.14
7fe153000000-7fe154000000 rw-s 00000000 00:17 1061580                    /dev/shm/sm_segment.taos.50914.ba300001.13
7fe154000000-7fe155000000 rw-s 00000000 00:17 1061581                    /dev/shm/sm_segment.taos.50914.ba300001.12
7fe155000000-7fe156000000 rw-s 00000000 00:17 1061589                    /dev/shm/sm_segment.taos.50914.ba300001.11
7fe156000000-7fe157000000 rw-s 00000000 00:17 1061575                    /dev/shm/sm_segment.taos.50914.ba300001.10
7fe157000000-7fe158000000 rw-s 00000000 00:17 1061578                    /dev/shm/sm_segment.taos.50914.ba300001.9
7fe158000000-7fe159000000 rw-s 00000000 00:17 1061562                    /dev/shm/sm_segment.taos.50914.ba300001.8
7fe159000000-7fe15a000000 rw-s 00000000 00:17 1061577                    /dev/shm/sm_segment.taos.50914.ba300001.7
7fe15a000000-7fe15b000000 rw-s 00000000 00:17 1061571                    /dev/shm/sm_segment.taos.50914.ba300001.6
7fe15b000000-7fe15c000000 rw-s 00000000 00:17 1061574                    /dev/shm/sm_segment.taos.50914.ba300001.5
7fe15c000000-7fe15d000000 rw-s 00000000 00:17 1061561                    /dev/shm/sm_segment.taos.50914.ba300001.4
7fe15d000000-7fe15e000000 rw-s 00000000 00:17 1061556                    /dev/shm/sm_segment.taos.50914.ba300001.3
7fe15e000000-7fe15f000000 rw-s 00000000 00:17 1061557                    /dev/shm/sm_segment.taos.50914.ba300001.2
7fe15f000000-7fe160000000 rw-s 00000000 00:17 1061583                    /dev/shm/sm_segment.taos.50914.ba300001.1
7fe160000000-7fe160021000 rw-p 00000000 00:00 0 
7fe160021000-7fe164000000 ---p 00000000 00:00 0 
7fe16459a000-7fe16459d000 rw-s 00000000 00:17 1061610                    /dev/shm/[email protected]:0_ctrl:0
7fe1645dc000-7fe1645dd000 rw-p 00000000 00:00 0 
7fe1645dd000-7fe1645de000 rw-s 00000000 00:00 0 
7fe1645de000-7fe164642000 rw-p 00000000 00:00 0 
7fe164642000-7fe164647000 r--p 00000000 00:30 85597706                   /home1/public/gkatev/openmpi-5.0.x/lib/openmpi/mca_coll_libnbc.so
7fe164647000-7fe164661000 r-xp 00005000 00:30 85597706                   /home1/public/gkatev/openmpi-5.0.x/lib/openmpi/mca_coll_libnbc.so
7fe164661000-7fe164666000 r--p 0001f000 00:30 85597706                   /home1/public/gkatev/openmpi-5.0.x/lib/openmpi/mca_coll_libnbc.so
7fe164666000-7fe164667000 ---p 00024000 00:30 85597706                   /home1/public/gkatev/openmpi-5.0.x/lib/openmpi/mca_coll_libnbc.so
7fe164667000-7fe164668000 r--p 00024000 00:30 85597706                   /home1/public/gkatev/openmpi-5.0.x/lib/openmpi/mca_coll_libnbc.so
7fe164668000-7fe164669000 rw-p 00025000 00:30 85597706                   /home1/public/gkatev/openmpi-5.0.x/lib/openmpi/mca_coll_libnbc.so
7fe164669000-7fe16466e000 r--p 00000000 00:30 85597716                   /home1/public/gkatev/openmpi-5.0.x/lib/openmpi/mca_coll_tuned.so
7fe16466e000-7fe164675000 r-xp 00005000 00:30 85597716                   /home1/public/gkatev/openmpi-5.0.x/lib/openmpi/mca_coll_tuned.so
7fe164675000-7fe16467a000 r--p 0000c000 00:30 85597716                   /home1/public/gkatev/openmpi-5.0.x/lib/openmpi/mca_coll_tuned.so
7fe16467a000-7fe16467b000 ---p 00011000 00:30 85597716                   /home1/public/gkatev/openmpi-5.0.x/lib/openmpi/mca_coll_tuned.so
7fe16467b000-7fe16467c000 r--p 00011000 00:30 85597716                   /home1/public/gkatev/openmpi-5.0.x/lib/openmpi/mca_coll_tuned.so
7fe16467c000-7fe16467d000 rw-p 00012000 00:30 85597716                   /home1/public/gkatev/openmpi-5.0.x/lib/openmpi/mca_coll_tuned.so
7fe16467d000-7fe164680000 r--p 00000000 00:30 85597702                   /home1/public/gkatev/openmpi-5.0.x/lib/openmpi/mca_coll_basic.so
7fe164680000-7fe164690000 r-xp 00003000 00:30 85597702                   /home1/public/gkatev/openmpi-5.0.x/lib/openmpi/mca_coll_basic.so
7fe164690000-7fe164692000 r--p 00013000 00:30 85597702                   /home1/public/gkatev/openmpi-5.0.x/lib/openmpi/mca_coll_basic.so
7fe164692000-7fe164693000 ---p 00015000 00:30 85597702                   /home1/public/gkatev/openmpi-5.0.x/lib/openmpi/mca_coll_basic.so
7fe164693000-7fe164694000 r--p 00015000 00:30 85597702                   /home1/public/gkatev/openmpi-5.0.x/lib/openmpi/mca_coll_basic.so
7fe164694000-7fe164695000 rw-p 00016000 00:30 85597702                   /home1/public/gkatev/openmpi-5.0.x/lib/openmpi/mca_coll_basic.so
7fe164695000-7fe164817000 rw-p 00000000 00:00 0 
7fe164817000-7fe165817000 rw-s 00000000 00:17 1061554                    /dev/shm/sm_segment.taos.50914.ba300001.0
7fe165817000-7fe165c17000 r--s 00000000 08:05 537435549                  /tmp/prte.taos.50914/dvm.2624/pmix_dstor_ds21_2624/smdataseg-prterun-taos-2624@1-0
7fe165c17000-7fe166017000 r--s 00000000 08:05 537435548                  /tmp/prte.taos.50914/dvm.2624/pmix_dstor_ds21_2624/smseg-prterun-taos-2624@1-0
7fe166017000-7fe166018000 ---p 00000000 00:00 0 
7fe166018000-7fe16683f000 rw-p 00000000 00:00 0 
7fe16683f000-7fe166848000 r-xp 00000000 08:05 537703981                  /usr/lib64/libltdl.so.7.3.0
7fe166848000-7fe166a47000 ---p 00009000 08:05 537703981                  /usr/lib64/libltdl.so.7.3.0
7fe166a47000-7fe166a48000 r--p 00008000 08:05 537703981                  /usr/lib64/libltdl.so.7.3.0
7fe166a48000-7fe166a49000 rw-p 00009000 08:05 537703981                  /usr/lib64/libltdl.so.7.3.0
7fe166a49000-7fe166a4f000 r-xp 00000000 08:05 537839663                  /usr/lib64/libatomic.so.1.0.0
7fe166a4f000-7fe166c4e000 ---p 00006000 08:05 537839663                  /usr/lib64/libatomic.so.1.0.0
7fe166c4e000-7fe166c4f000 r--p 00005000 08:05 537839663                  /usr/lib64/libatomic.so.1.0.0
7fe166c4f000-7fe166c50000 rw-p 00006000 08:05 537839663                  /usr/lib64/libatomic.so.1.0.0
7fe166c50000-7fe166c51000 rw-p 00000000 00:00 0 
7fe166c51000-7fe166c53000 r-xp 00000000 08:05 536878494                  /usr/lib64/libutil-2.17.so
7fe166c53000-7fe166e52000 ---p 00002000 08:05 536878494                  /usr/lib64/libutil-2.17.so
7fe166e52000-7fe166e53000 r--p 00001000 08:05 536878494                  /usr/lib64/libutil-2.17.so
7fe166e53000-7fe166e54000 rw-p 00002000 08:05 536878494                  /usr/lib64/libutil-2.17.so
7fe166e54000-7fe166e5b000 r-xp 00000000 08:05 536878490                  /usr/lib64/librt-2.17.so
7fe166e5b000-7fe16705a000 ---p 00007000 08:05 536878490                  /usr/lib64/librt-2.17.so
7fe16705a000-7fe16705b000 r--p 00006000 08:05 536878490                  /usr/lib64/librt-2.17.so
7fe16705b000-7fe16705c000 rw-p 00007000 08:05 536878490                  /usr/lib64/librt-2.17.so
7fe16705c000-7fe16705e000 r-xp 00000000 08:05 536878466                  /usr/lib64/libdl-2.17.so
7fe16705e000-7fe16725e000 ---p 00002000 08:05 536878466                  /usr/lib64/libdl-2.17.so
7fe16725e000-7fe16725f000 r--p 00002000 08:05 536878466                  /usr/lib64/libdl-2.17.so
7fe16725f000-7fe167260000 rw-p 00003000 08:05 536878466                  /usr/lib64/libdl-2.17.so
7fe167260000-7fe16729c000 r-xp 00000000 08:05 538013163                  /usr/lib64/libhwloc.so.5.7.5
7fe16729c000-7fe16749b000 ---p 0003c000 08:05 538013163                  /usr/lib64/libhwloc.so.5.7.5
7fe16749b000-7fe16749c000 r--p 0003b000 08:05 538013163                  /usr/lib64/libhwloc.so.5.7.5
7fe16749c000-7fe16749d000 rw-p 0003c000 08:05 538013163                  /usr/lib64/libhwloc.so.5.7.5
7fe16749d000-7fe16749e000 r--p 00000000 00:30 85596391                   /home1/public/gkatev/openmpi-5.0.x/lib/libevent_pthreads-2.1.so.7.0.1
7fe16749e000-7fe16749f000 r-xp 00001000 00:30 85596391                   /home1/public/gkatev/openmpi-5.0.x/lib/libevent_pthreads-2.1.so.7.0.1
7fe16749f000-7fe1674a0000 r--p 00002000 00:30 85596391                   /home1/public/gkatev/openmpi-5.0.x/lib/libevent_pthreads-2.1.so.7.0.1
7fe1674a0000-7fe1674a1000 r--p 00002000 00:30 85596391                   /home1/public/gkatev/openmpi-5.0.x/lib/libevent_pthreads-2.1.so.7.0.1
7fe1674a1000-7fe1674a2000 rw-p 00003000 00:30 85596391                   /home1/public/gkatev/openmpi-5.0.x/lib/libevent_pthreads-2.1.so.7.0.1
7fe1674a2000-7fe1674ac000 r--p 00000000 00:30 85596350                   /home1/public/gkatev/openmpi-5.0.x/lib/libevent_core-2.1.so.7.0.1
7fe1674ac000-7fe1674cc000 r-xp 0000a000 00:30 85596350                   /home1/public/gkatev/openmpi-5.0.x/lib/libevent_core-2.1.so.7.0.1
7fe1674cc000-7fe1674d8000 r--p 0002a000 00:30 85596350                   /home1/public/gkatev/openmpi-5.0.x/lib/libevent_core-2.1.so.7.0.1
7fe1674d8000-7fe1674d9000 ---p 00036000 00:30 85596350                   /home1/public/gkatev/openmpi-5.0.x/lib/libevent_core-2.1.so.7.0.1
7fe1674d9000-7fe1674da000 r--p 00036000 00:30 85596350                   /home1/public/gkatev/openmpi-5.0.x/lib/libevent_core-2.1.so.7.0.1
7fe1674da000-7fe1674db000 rw-p 00037000 00:30 85596350                   /home1/public/gkatev/openmpi-5.0.x/lib/libevent_core-2.1.so.7.0.1
7fe1674db000-7fe1674dc000 rw-p 00000000 00:00 0 
7fe1674dc000-7fe1674f1000 r-xp 00000000 08:05 536879245                  /usr/lib64/libz.so.1.2.7
7fe1674f1000-7fe1676f0000 ---p 00015000 08:05 536879245                  /usr/lib64/libz.so.1.2.7
7fe1676f0000-7fe1676f1000 r--p 00014000 08:05 536879245                  /usr/lib64/libz.so.1.2.7
7fe1676f1000-7fe1676f2000 rw-p 00015000 08:05 536879245                  /usr/lib64/libz.so.1.2.7
7fe1676f2000-7fe16772e000 r--p 00000000 00:30 85596547                   /home1/public/gkatev/openmpi-5.0.x/lib/libpmix.so.2.5.2
7fe16772e000-7fe16789c000 r-xp 0003c000 00:30 85596547                   /home1/public/gkatev/openmpi-5.0.x/lib/libpmix.so.2.5.2
7fe16789c000-7fe1678e7000 r--p 001aa000 00:30 85596547                   /home1/public/gkatev/openmpi-5.0.x/lib/libpmix.so.2.5.2
7fe1678e7000-7fe1678ea000 r--p 001f4000 00:30 85596547                   /home1/public/gkatev/openmpi-5.0.x/lib/libpmix.so.2.5.2
7fe1678ea000-7fe1678ff000 rw-p 001f7000 00:30 85596547                   /home1/public/gkatev/openmpi-5.0.x/lib/libpmix.so.2.5.2
7fe1678ff000-7fe167902000 rw-p 00000000 00:00 0 
7fe167902000-7fe167903000 r-xp 00000000 00:30 32558782                   /home1/public/gkatev/xpmem/lib/libxpmem.so.0.0.0
7fe167903000-7fe167b03000 ---p 00001000 00:30 32558782                   /home1/public/gkatev/xpmem/lib/libxpmem.so.0.0.0
7fe167b03000-7fe167b04000 r--p 00001000 00:30 32558782                   /home1/public/gkatev/xpmem/lib/libxpmem.so.0.0.0
7fe167b04000-7fe167b05000 rw-p 00002000 00:30 32558782                   /home1/public/gkatev/xpmem/lib/libxpmem.so.0.0.0
7fe167b05000-7fe167b0f000 r-xp 00000000 08:05 536879334                  /usr/lib64/libnuma.so.1.0.0
7fe167b0f000-7fe167d0f000 ---p 0000a000 08:05 536879334                  /usr/lib64/libnuma.so.1.0.0
7fe167d0f000-7fe167d10000 r--p 0000a000 08:05 536879334                  /usr/lib64/libnuma.so.1.0.0
7fe167d10000-7fe167d11000 rw-p 0000b000 08:05 536879334                  /usr/lib64/libnuma.so.1.0.0
7fe167d11000-7fe167d22000 r--p 00000000 00:30 84156802                   /home1/public/gkatev/ucx/lib/libucs.so.0.0.0
7fe167d22000-7fe167d53000 r-xp 00011000 00:30 84156802                   /home1/public/gkatev/ucx/lib/libucs.so.0.0.0
7fe167d53000-7fe167d67000 r--p 00042000 00:30 84156802                   /home1/public/gkatev/ucx/lib/libucs.so.0.0.0
7fe167d67000-7fe167d68000 ---p 00056000 00:30 84156802                   /home1/public/gkatev/ucx/lib/libucs.so.0.0.0
7fe167d68000-7fe167d69000 r--p 00056000 00:30 84156802                   /home1/public/gkatev/ucx/lib/libucs.so.0.0.0
7fe167d69000-7fe167d6c000 rw-p 00057000 00:30 84156802                   /home1/public/gkatev/ucx/lib/libucs.so.0.0.0
7fe167d6c000-7fe167d70000 rw-p 00000000 00:00 0 
7fe167d70000-7fe167d80000 r--p 00000000 00:30 84162752                   /home1/public/gkatev/ucx/lib/libuct.so.0.0.0
7fe167d80000-7fe167d98000 r-xp 00010000 00:30 84162752                   /home1/public/gkatev/ucx/lib/libuct.so.0.0.0
7fe167d98000-7fe167da5000 r--p 00028000 00:30 84162752                   /home1/public/gkatev/ucx/lib/libuct.so.0.0.0
7fe167da5000-7fe167da6000 ---p 00035000 00:30 84162752                   /home1/public/gkatev/ucx/lib/libuct.so.0.0.0
7fe167da6000-7fe167da7000 r--p 00035000 00:30 84162752                   /home1/public/gkatev/ucx/lib/libuct.so.0.0.0
7fe167da7000-7fe167dab000 rw-p 00036000 00:30 84162752                   /home1/public/gkatev/ucx/lib/libuct.so.0.0.0
7fe167dab000-7fe167ea0000 r-xp 00000000 08:05 536878460                  /usr/lib64/libc-2.17.so
7fe167ea0000-7fe167ea1000 r-xp 000f5000 08:05 536878460                  /usr/lib64/libc-2.17.so
7fe167ea1000-7fe167ea3000 r-xp 000f6000 08:05 536878460                  /usr/lib64/libc-2.17.so
7fe167ea3000-7fe167ea5000 r-xp 000f8000 08:05 536878460                  /usr/lib64/libc-2.17.so
7fe167ea5000-7fe167eaa000 r-xp 000fa000 08:05 536878460                  /usr/lib64/libc-2.17.so
7fe167eaa000-7fe167eac000 r-xp 000ff000 08:05 536878460                  /usr/lib64/libc-2.17.so
7fe167eac000-7fe167f6f000 r-xp 00101000 08:05 536878460                  /usr/lib64/libc-2.17.so
7fe167f6f000-7fe16816e000 ---p 001c4000 08:05 536878460                  /usr/lib64/libc-2.17.so
7fe16816e000-7fe168172000 r--p 001c3000 08:05 536878460                  /usr/lib64/libc-2.17.so
7fe168172000-7fe168174000 rw-p 001c7000 08:05 536878460                  /usr/lib64/libc-2.17.so
7fe168174000-7fe168179000 rw-p 00000000 00:00 0 
7fe168179000-7fe168190000 r-xp 00000000 08:05 536878486                  /usr/lib64/libpthread-2.17.so
7fe168190000-7fe16838f000 ---p 00017000 08:05 536878486                  /usr/lib64/libpthread-2.17.so
7fe16838f000-7fe168390000 r--p 00016000 08:05 536878486                  /usr/lib64/libpthread-2.17.so
7fe168390000-7fe168391000 rw-p 00017000 08:05 536878486                  /usr/lib64/libpthread-2.17.so
7fe168391000-7fe168395000 rw-p 00000000 00:00 0 
7fe168395000-7fe1683aa000 r-xp 00000000 08:05 541686087                  /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fe1683aa000-7fe1685a9000 ---p 00015000 08:05 541686087                  /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fe1685a9000-7fe1685aa000 r--p 00014000 08:05 541686087                  /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fe1685aa000-7fe1685ab000 rw-p 00015000 08:05 541686087                  /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fe1685ab000-7fe1686ac000 r-xp 00000000 08:05 536878468                  /usr/lib64/libm-2.17.so
7fe1686ac000-7fe1688ab000 ---p 00101000 08:05 536878468                  /usr/lib64/libm-2.17.so
7fe1688ab000-7fe1688ac000 r--p 00100000 08:05 536878468                  /usr/lib64/libm-2.17.so
7fe1688ac000-7fe1688ad000 rw-p 00101000 08:05 536878468                  /usr/lib64/libm-2.17.so
7fe1688ad000-7fe168996000 r-xp 00000000 08:05 541686091                  /usr/lib64/libstdc++.so.6.0.19
7fe168996000-7fe168b96000 ---p 000e9000 08:05 541686091                  /usr/lib64/libstdc++.so.6.0.19
7fe168b96000-7fe168b9e000 r--p 000e9000 08:05 541686091                  /usr/lib64/libstdc++.so.6.0.19
7fe168b9e000-7fe168ba0000 rw-p 000f1000 08:05 541686091                  /usr/lib64/libstdc++.so.6.0.19
7fe168ba0000-7fe168bb5000 rw-p 00000000 00:00 0 
7fe168bb5000-7fe168bfd000 r--p 00000000 00:30 85597692                   /home1/public/gkatev/openmpi-5.0.x/lib/libmpi.so.80.0.0
7fe168bfd000-7fe168dc8000 r-xp 00048000 00:30 85597692                   /home1/public/gkatev/openmpi-5.0.x/lib/libmpi.so.80.0.0
7fe168dc8000-7fe168e1d000 r--p 00213000 00:30 85597692                   /home1/public/gkatev/openmpi-5.0.x/lib/libmpi.so.80.0.0
7fe168e1d000-7fe168e1f000 r--p 00267000 00:30 85597692                   /home1/public/gkatev/openmpi-5.0.x/lib/libmpi.so.80.0.0
7fe168e1f000-7fe168e41000 rw-p 00269000 00:30 85597692                   /home1/public/gkatev/openmpi-5.0.x/lib/libmpi.so.80.0.0
7fe168e41000-7fe168e60000 rw-p 00000000 00:00 0 
7fe168e60000-7fe168e82000 r-xp 00000000 08:05 536878453                  /usr/lib64/ld-2.17.so
7fe168e82000-7fe168e88000 rw-p 00000000 00:00 0 
7fe168e88000-7fe168e8d000 r--p 00000000 00:30 84156790                   /home1/public/gkatev/ucx/lib/libucm.so.0.0.0
7fe168e8d000-7fe168e9d000 r-xp 00005000 00:30 84156790                   /home1/public/gkatev/ucx/lib/libucm.so.0.0.0
7fe168e9d000-7fe168ea2000 r--p 00015000 00:30 84156790                   /home1/public/gkatev/ucx/lib/libucm.so.0.0.0
7fe168ea2000-7fe168ea3000 r--p 00019000 00:30 84156790                   /home1/public/gkatev/ucx/lib/libucm.so.0.0.0
7fe168ea3000-7fe168ea4000 rw-p 0001a000 00:30 84156790                   /home1/public/gkatev/ucx/lib/libucm.so.0.0.0
7fe168ea4000-7fe168ea6000 rw-p 00000000 00:00 0 
7fe168ea6000-7fe168ebb000 r--p 00000000 00:30 84162756                   /home1/public/gkatev/ucx/lib/libucp.so.0.0.0
7fe168ebb000-7fe168f35000 r-xp 00015000 00:30 84162756                   /home1/public/gkatev/ucx/lib/libucp.so.0.0.0
7fe168f35000-7fe168f58000 r--p 0008f000 00:30 84162756                   /home1/public/gkatev/ucx/lib/libucp.so.0.0.0
7fe168f58000-7fe168f59000 ---p 000b2000 00:30 84162756                   /home1/public/gkatev/ucx/lib/libucp.so.0.0.0
7fe168f59000-7fe168f5a000 r--p 000b2000 00:30 84162756                   /home1/public/gkatev/ucx/lib/libucp.so.0.0.0
7fe168f5a000-7fe168f5f000 rw-p 000b3000 00:30 84162756                   /home1/public/gkatev/ucx/lib/libucp.so.0.0.0
7fe168f5f000-7fe168f81000 r--p 00000000 00:30 85592142                   /home1/public/gkatev/openmpi-5.0.x/lib/libopen-pal.so.80.0.0
7fe168f81000-7fe169033000 r-xp 00022000 00:30 85592142                   /home1/public/gkatev/openmpi-5.0.x/lib/libopen-pal.so.80.0.0
7fe169033000-7fe169058000 r--p 000d4000 00:30 85592142                   /home1/public/gkatev/openmpi-5.0.x/lib/libopen-pal.so.80.0.0
7fe169058000-7fe16905b000 r--p 000f8000 00:30 85592142                   /home1/public/gkatev/openmpi-5.0.x/lib/libopen-pal.so.80.0.0
7fe16905b000-7fe169066000 rw-p 000fb000 00:30 85592142                   /home1/public/gkatev/openmpi-5.0.x/lib/libopen-pal.so.80.0.0
7fe169066000-7fe16906d000 rw-p 00000000 00:00 0 
7fe16906d000-7fe16906f000 r--p 00000000 00:30 85592120                   /home1/public/gkatev/openmpi-5.0.x/lib/openmpi/mca_coll_xhc.so
7fe16906f000-7fe169075000 r-xp 00002000 00:30 85592120                   /home1/public/gkatev/openmpi-5.0.x/lib/openmpi/mca_coll_xhc.so
7fe169075000-7fe169077000 r--p 00008000 00:30 85592120                   /home1/public/gkatev/openmpi-5.0.x/lib/openmpi/mca_coll_xhc.so
7fe169077000-7fe169078000 r--p 00009000 00:30 85592120                   /home1/public/gkatev/openmpi-5.0.x/lib/openmpi/mca_coll_xhc.so
7fe169078000-7fe169079000 rw-p 0000a000 00:30 85592120                   /home1/public/gkatev/openmpi-5.0.x/lib/openmpi/mca_coll_xhc.so
7fe169079000-7fe16907a000 r--s 00000000 08:05 537435546                  /tmp/prte.taos.50914/dvm.2624/pmix_dstor_ds21_2624/initial-pmix_shared-segment-0
7fe16907a000-7fe16907d000 rw-s 00000000 08:05 537435547                  /tmp/prte.taos.50914/dvm.2624/pmix_dstor_ds21_2624/smlockseg-prterun-taos-2624@1
7fe16907d000-7fe16907e000 r--s 00000000 08:05 4127870                    /tmp/prte.taos.50914/dvm.2624/pmix_dstor_ds12_2624/initial-pmix_shared-segment-0
7fe16907e000-7fe16907f000 rw-s 00000000 08:05 4127871                    /tmp/prte.taos.50914/dvm.2624/pmix_dstor_ds12_2624/dstore_sm.lock
7fe16907f000-7fe169081000 rw-p 00000000 00:00 0 
7fe169081000-7fe169082000 r--p 00021000 08:05 536878453                  /usr/lib64/ld-2.17.so
7fe169082000-7fe169083000 rw-p 00022000 08:05 536878453                  /usr/lib64/ld-2.17.so
7fe169083000-7fe169084000 rw-p 00000000 00:00 0 
7ffdd10be000-7ffdd10e0000 rw-p 00000000 00:00 0                          [stack]
7ffdd1147000-7ffdd114b000 r--p 00000000 00:00 0                          [vvar]
7ffdd114b000-7ffdd114d000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
@jsquyres jsquyres added this to the v5.0.0 milestone Mar 15, 2022
@devreal
Contributor

devreal commented Mar 15, 2022

I am fighting an issue in smsc as well. I believe the +1 here is incorrect. Removing it doesn't change the outcome though.

@devreal
Contributor

devreal commented Mar 15, 2022

@gkatev Do your attachments succeed if you choose a smaller (4K) alignment via --mca smsc_xpmem_log_align 12?

@gkatev
Contributor Author

gkatev commented Mar 15, 2022

I'm having a harder time reproducing it now, but given that in my debug runs the 4 KB attachment succeeded every time the 8 MB-aligned one failed, I would guess that it does not occur with a smaller alignment.

Yes, it also seemed to me that that +1 is redundant. Perhaps also the -1?

What issue are you seeing with smsc?

Edit: I was able to reproduce it again with align=23, and could not trigger it with align=12.

@devreal
Contributor

devreal commented Mar 15, 2022

OK, I think I am hitting the same issue. Will work on a patch. Am I right assuming that we shouldn't attempt to include the [vsyscall] area in a mapping?

@gkatev
Contributor Author

gkatev commented Mar 16, 2022

Yes, that seems like a good solution for this case (assuming there's no reason to attach to another process's [vsyscall]).
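A minimal sketch of the approach discussed here, assuming address_max is derived by scanning a /proc/&lt;pid&gt;/maps-style listing: take the highest mapping end, but skip the [vsyscall] line. The function name is hypothetical and this is not the actual patch; it only illustrates the filtering.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: highest mapping end in a maps-style stream,
 * ignoring [vsyscall], which sits at a fixed address far above user space. */
static uint64_t max_end_excluding_vsyscall(FILE *fh)
{
    /* Assumption: large enough for a maps line with a PATH_MAX-length path. */
    char line[8192];
    uint64_t max_end = 0;

    while (fgets(line, sizeof(line), fh) != NULL) {
        uint64_t start, end;

        /* Each mapping line starts with "start-end" in hex. */
        if (sscanf(line, "%" SCNx64 "-%" SCNx64, &start, &end) != 2 || end <= start) {
            continue;
        }
        /* Do not let the [vsyscall] page widen the attachable range. */
        if (strstr(line, "[vsyscall]") != NULL) {
            continue;
        }
        if (end > max_end) {
            max_end = end;
        }
    }
    return max_end; /* exclusive: the last usable address is max_end - 1 */
}
```

Fed the last four lines of the map above, this returns 0x7ffdd114d000 (the end of [vdso]) instead of a value derived from [vsyscall].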

Maybe, to cover all cases, one would also have to take the gaps in the remote process's memory map into account. E.g., in the example above, could the padding cause an attachment to exceed the heap's limit? But I guess creating such a structure and keeping it up to date is a more complicated endeavour without much benefit.

The neatest way to handle all cases would probably be to make the alignment equal to the page size, since it is the implicit padding that triggers the issue: if the remote process communicated an address for another process to attach to, it would/should be valid to attach to the (entire) page containing it.

As I understand it, the padding is an optimization that proactively creates an attachment to a larger area (?), since that area might contain a buffer used in future communication. So removing the padding might not be optimal, as it could cost performance in specific applications (though, I think, not because of the extra attach call, but rather when a merge between two regions is triggered and the existing one has to be detached and rebuilt).

While we are at it, I would also suggest setting the minimum value of the alignment parameter not to 4K but to the actual page size (e.g. 64K on one of my ARM systems), as AFAIK XPMEM always aligns to the page.
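A small sketch of that suggestion, assuming the alignment is configured as a log2 value (as `log_attach_align` is in the diff): clamp it so it is never below the page size reported by `sysconf(_SC_PAGESIZE)`. The helper name is hypothetical.

```c
#include <unistd.h>

/* Hypothetical helper: never use an alignment smaller than the system page
 * size (4 KiB on typical x86-64, possibly 64 KiB on some ARM configurations). */
static unsigned clamp_log_align(unsigned requested_log_align)
{
    long page_size = sysconf(_SC_PAGESIZE);
    unsigned log_page = 0;

    if (page_size <= 0) {
        page_size = 4096; /* assumption: fall back to 4 KiB if sysconf fails */
    }
    /* Page sizes are powers of two, so this computes log2(page_size). */
    while ((1L << log_page) < page_size) {
        ++log_page;
    }
    return requested_log_align < log_page ? log_page : requested_log_align;
}
```

A requested value of 12 would then become 16 on a 64K-page system, while 23 is left unchanged.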

A middle-ground defensive solution (along with not considering [vsyscall] for address_max?) could be to fall back to a min-alignment-sized attachment whenever a larger padded one fails, and only return an error if that fails as well.
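The fallback idea can be sketched as follows. The attach step is abstracted behind a hypothetical callback standing in for `xpmem_attach()`; for simplicity it signals failure with NULL, whereas the real `xpmem_attach()` returns `(void *) -1`. The fake callback included here models the failure from the report by rejecting any range that ends above an assumed x86-64 user-space ceiling of 0x7ffffffff000.

```c
#include <stddef.h>
#include <stdint.h>

#define USER_VA_END UINT64_C(0x7ffffffff000) /* assumption: x86-64, 4-level paging */

/* Hypothetical stand-in for xpmem_attach(): returns NULL on failure. */
typedef void *(*attach_fn)(uint64_t remote_base, uint64_t length, void *ctx);

static uint64_t align_down(uint64_t x, uint64_t a) { return x & ~(a - 1); }
static uint64_t align_up(uint64_t x, uint64_t a) { return (x + a - 1) & ~(a - 1); }

/* Try the padded (large-alignment) attachment first; if it fails, retry with
 * page alignment only, and report failure only if both attempts fail. */
static void *attach_with_fallback(attach_fn attach, void *ctx, uint64_t ptr, uint64_t size,
                                  uint64_t large_align, uint64_t page_size, uint64_t *out_base)
{
    const uint64_t aligns[2] = {large_align, page_size};

    for (int i = 0; i < 2; ++i) {
        uint64_t base = align_down(ptr, aligns[i]);
        uint64_t bound = align_up(ptr + size, aligns[i]);
        void *local = attach(base, bound - base, ctx);

        if (local != NULL) {
            *out_base = base; /* caller must record the range actually attached */
            return local;
        }
    }
    return NULL;
}

/* Fake remote side for illustration: counts attempts and rejects ranges
 * that end above USER_VA_END, as the failing 8 MB attachment did. */
struct fake_state {
    int attempts;
};

static void *fake_attach(uint64_t remote_base, uint64_t length, void *ctx)
{
    struct fake_state *st = ctx;

    st->attempts++;
    if (remote_base + length > USER_VA_END) {
        return NULL;
    }
    return st;
}
```

For the stack address from the report, the 8 MB attempt fails and the 4K retry succeeds with base 0x7fffffece000; an address well inside the map, such as 0x7fe160000010, succeeds on the first, padded attempt.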

@devreal
Contributor

devreal commented Mar 16, 2022

@gkatev I created #10127. Can you give that a try? I believe we should try to use the requested upper bound if the aligned upper bound cannot be mapped. That worked in my tests with variables on the stack.

@gkatev
Contributor Author

gkatev commented Mar 16, 2022

Yes as far as I can tell that solves it!

I would probably have the bound fall back to the next page boundary after the requested bound; I'm not sure whether using the exact requested bound is safer than that or not. If the bound passed to xpmem_attach is not page-aligned, XPMEM will align it and include the whole page in the attachment. The downside is that smsc/xpmem won't know about it, which could lead to duplicate attachments. But functionally, I'd say both options are okay.

@gkatev
Contributor Author

gkatev commented Nov 21, 2022

As far as I can tell this was fully fixed in the mentioned PRs, thanks!

@gkatev gkatev closed this as completed Nov 21, 2022