idr0090-ashdown-malaria S-BIAD882 #651

Open
will-moore opened this issue Feb 22, 2023 · 30 comments

Comments

@will-moore
Member

idr0090-ashdown-malaria

@will-moore will-moore moved this to test convert in NGFF conversion Feb 22, 2023
@dominikl
Member

Running out of disk space on pilot-zarr2-dev... Going to try to convert on pilot-idrtesting.

@dominikl
Member

Same on pilot-idrtesting. Don't know how much disk space I'd need; even nearly 1 TB isn't enough.

@dominikl dominikl moved this from test convert to re-import test image in NGFF conversion Feb 27, 2023
@will-moore
Member Author

Thanks @dominikl for these conversions...

$ ssh pilot-zarr1-dev
$ ls -alh /data/idr0090
total 96K
drwxrwxr-x.  4 dlindner dlindner  89 Apr 12 10:10 .
drwxrwxr-x. 16 root     idr-data 289 Apr  6 15:02 ..
drwxrwxr-x.  5 dlindner dlindner  89 Apr 12 10:58 190211.ome.zarr
drwxrwxr-x. 10 dlindner dlindner 154 Feb 24 19:37 190213.ome.zarr
-rw-rw-r--.  1 dlindner dlindner 95K Feb 24 13:18 190213.screen

Plate named "190211" is a sparse plate: https://idr.openmicroscopy.org/webclient/?show=plate-9303

@will-moore
Member Author

will-moore commented Apr 27, 2023

Make bucket...

$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3 mb s3://idr0090
make_bucket: idr0090
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3api put-bucket-policy --bucket idr0090 --policy file://policy.json
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3api put-bucket-cors --bucket idr0090  --cors-configuration file://cors.json
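
The contents of policy.json and cors.json aren't shown in this thread. A minimal sketch of what such files might look like, assuming the goal is anonymous read access plus CORS for browser-based viewers. The statement ID, actions and CORS rules below are illustrative assumptions, not the files actually used:

```shell
# Hypothetical policy.json / cors.json for a public-read bucket.
# The real files used for idr0090 are not shown in this thread.
set -eu

outdir="${OUTDIR:-$(mktemp -d)}"

# Allow anonymous listing of the bucket and reading of its objects.
cat > "$outdir/policy.json" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicRead",
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::idr0090", "arn:aws:s3:::idr0090/*"]
    }
  ]
}
EOF

# Let browser-based viewers fetch metadata and chunks from any origin.
cat > "$outdir/cors.json" <<'EOF'
{
  "CORSRules": [
    {
      "AllowedOrigins": ["*"],
      "AllowedMethods": ["GET", "HEAD"],
      "AllowedHeaders": ["*"]
    }
  ]
}
EOF

echo "wrote $outdir/policy.json and $outdir/cors.json"
```

Set OUTDIR to the directory the aws commands are run from so the file:// paths resolve.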

Upload 1 plate...

# pilot-zarr1-dev
(base) [wmoore@pilot-zarr1-dev data]$ /home/wmoore/mc cp -r idr0090/190213.ome.zarr/ uk1s3/idr0090/zarr/190213.ome.zarr
...

1.02 TiB

https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr0090/zarr/190213.ome.zarr&well=all

https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/idr0090/zarr/190213.ome.zarr


@will-moore will-moore moved this from re-import test image to convert all data to NGFF in NGFF conversion Apr 27, 2023
@will-moore
Member Author

So far we have 2 plates on pilot-zarr1-dev.
Zipping (with -m to remove originals)...

$ screen -S idr0090_zip
$ ls -lh /data/idr0090
total 0
drwxrwxr-x. 10 dlindner dlindner 154 Apr 12 18:34 190211.ome.zarr
drwxrwxr-x. 10 dlindner dlindner 154 Feb 24 19:37 190213.ome.zarr

$ cd /data/idr0090
$ for i in */; do zip -mr "${i%/}.zip" "$i"; done
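
For anyone reading the loop above: the glob `*/` matches directories with a trailing slash, and `${i%/}` strips that slash to build the archive name. A small sketch of just that step (the function name `archive_name` is made up for illustration):

```shell
# Show how "${i%/}" derives the archive name from a directory glob match.
archive_name() {
  dir="$1"
  # %/ removes one trailing slash, if present
  printf '%s.zip\n' "${dir%/}"
}

archive_name "190211.ome.zarr/"
```

With `zip -mr`, `-r` recurses into the directory and `-m` moves files into the archive, deleting the originals once they have been added.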

@will-moore
Member Author

Doh! Got permission denied! sudo...

$ for i in */; do sudo zip -mr "${i%/}.zip" "$i"; done

@will-moore
Member Author

will-moore commented Jul 12, 2023

Current log (21 hours later...)

  adding: 190211.ome.zarr/B/9/3/0/0/3/23/0/1 (deflated 38%)
  adding: 190211.ome.zarr/B/9/3/0/0/3/23/1/ (stored 0%)
  adding: 190211.ome.zarr/B/9/3/0/0/3/23/1/0 (deflated 36%)
  adding: 190211.ome.zarr/B/9/3/0/0/3/23/1/1 (deflated 37%)
  adding: 190211.ome.zarr/B/9/3/0/0/3/24/ (stored 0%)
  adding: 190211.ome.zarr/B/9/3/0/0/3/24/0/ (stored 0%)

@will-moore
Member Author

will-moore commented Jul 12, 2023

zip still running...27 hours... Not half-way yet!! - This is on Well 14/31 for that plate: https://idr.openmicroscopy.org/webclient/?show=plate-9303

  adding: 190211.ome.zarr/C/2/13/0/0/0/13/1/0 (deflated 26%)
  adding: 190211.ome.zarr/C/2/13/0/0/0/13/1/1 (deflated 24%)
  adding: 190211.ome.zarr/C/2/13/0/0/0/14/ (stored 0%)
  adding: 190211.ome.zarr/C/2/13/0/0/0/14/0/ (stored 0
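
A rough extrapolation of the total zip time from the figures above (27 hours, well 14 of 31), assuming every well takes about the same time, which may not hold for this plate. The helper name `estimate_total_hours` is made up for illustration:

```shell
# Estimate total run time from progress so far.
# Arguments: elapsed hours, wells processed, total wells.
estimate_total_hours() {
  awk -v h="$1" -v n="$2" -v total="$3" \
    'BEGIN { printf "%.1f\n", h * total / n }'
}

estimate_total_hours 27 14 31
```

That suggests roughly 60 hours for the whole plate at this rate.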

@dominikl
Member

Conversion takes 34 hours / plate.

@will-moore
Member Author

Installed p7zip on pilot-zarr1-dev:

(base) [wmoore@pilot-zarr1-dev ~]$ sudo yum install p7zip
(base) [wmoore@pilot-zarr1-dev ~]$ which 7za
/usr/bin/7za

@will-moore
Member Author

will-moore commented Jul 13, 2023

Cancelled the previous zip process (still less than halfway through).
Hopefully enough space to zip, then upload and delete...

(base) [wmoore@pilot-zarr1-dev idr0090]$ df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb        4.9T  3.3T  1.7T  67% /data

$ screen -r idr0090_zip
$ cd /data/idr0090
$ 7za a 190213.ome.zarr.zip 190213.ome.zarr

@will-moore
Member Author

will-moore commented Jul 13, 2023

Wow, finally completed zipping one plate... started upload

(base) [wmoore@pilot-zarr1-dev idr0090]$ screen -S idr0090_zip

$ sudo 7za a 190213.ome.zarr.zip 190213.ome.zarr

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,16 CPUs Intel Xeon Processor (Cascadelake) (50655),ASM,AES-NI)

Scanning the drive:
1048096 folders, 803034 files, 1120438944155 bytes (1044 GiB)

Creating archive: 190213.ome.zarr.zip

Items to compress: 1851130

                                                  
Files read from disk: 803034
Archive size: 752178841171 bytes (701 GiB)
Everything is Ok
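
For reference, the archive size as a percentage of the original can be worked out from the two byte counts in the 7za output above. A small sketch (the function name `compression_percent` is made up for illustration):

```shell
# Archive size as a percentage of the uncompressed size.
# Arguments: archive bytes, original bytes.
compression_percent() {
  awk -v a="$1" -v o="$2" 'BEGIN { printf "%.1f\n", 100 * a / o }'
}

compression_percent 752178841171 1120438944155
```

So the zip comes out at about two thirds of the size of the Zarr on disk.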

upload...

$ screen -r idr0090_zip
$ cd .aspera/cli/bin
$ ./ascp -P33001 -i ../etc/asperaweb_id_dsa.openssh -d /data/idr0090/idr0090 [email protected]:5f/136exxxxxx

delete (might take a while)...

$ screen -S idr0090_rm
$ sudo rm -rf 190213.ome.zarr

@will-moore
Member Author

Unfortunately the upload timed out. Needs about 7 hours to upload!

(base) [wmoore@pilot-zarr1-dev bin]$ ./ascp -P33001 -i ../etc/asperaweb_id_dsa.openssh -d /data/idr0090/idr0090 [email protected]:5f/13xxxxxxx
190213.ome.zarr.zip                                                                                                                  10%   75GB  250Mb/s  6:04:22 ETA
Partial Completion: 79044314K bytes transferred in 2697 seconds
 (240063K bits/sec), in 1 file, 1 directory; 1 file failed.

Session Stop  (Error: Session data transfer timeout (server), Session data transfer timeout)
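
The "about 7 hours" figure can be checked from the archive size and the transfer rate shown in the progress line. A sketch, assuming the reported 250Mb/s means 250 x 10^6 bits per second and that the rate stays constant (the function name `transfer_hours` is made up for illustration):

```shell
# Estimated transfer time in hours.
# Arguments: size in bytes, rate in megabits per second (10^6 bits/s).
transfer_hours() {
  awk -v b="$1" -v r="$2" \
    'BEGIN { printf "%.1f\n", b * 8 / (r * 1e6) / 3600 }'
}

transfer_hours 752178841171 250
```

The same helper can be reused with a smaller byte count to see how long a single chunk would take at that rate.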

@will-moore
Member Author

@dominikl I've cleaned up the space I've been using on pilot-zarr1-dev.
I don't have anything important there now, except the idr0090 plate and zplate.zip, so feel free to delete anything else you need.
Still quite a bit of space used - not sure where (apart from idr0090).

(base) [wmoore@pilot-zarr1-dev data]$ df -h /data/
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb        4.9T  2.6T  2.4T  52% /data

@dominikl
Member

The upload seems to be a problem indeed. Just got the session timeout as well. I'll have a look if it's possible to split the zip into maybe 10 parts so that they're < 100GB.

@dominikl
Member

Creating 100 GB chunks now, with -v100g. But there's another problem: 190129 is failing with an NPE.

(base) [dlindner@pilot-zarr1-dev idr0090]$ /home/dlindner/bioformats2raw/bin/bioformats2raw --memo-directory ../memo /uod/idr/metadata/idr0090-ashdown-malaria/screens/190129.screen 190129.zarr
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp7557699430086545059/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.esotericsoftware.kryo.util.UnsafeUtil (file:/home/dlindner/bioformats2raw/lib/kryo-2.24.0.jar) to constructor java.nio.DirectByteBuffer(long,int,java.lang.Object)
WARNING: Please consider reporting this to the maintainers of com.esotericsoftware.kryo.util.UnsafeUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while calling command (com.glencoesoftware.bioformats2raw.Converter@16150369): java.lang.NullPointerException
        at picocli.CommandLine.executeUserObject(CommandLine.java:1962)
        at picocli.CommandLine.access$1300(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
        at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2172)
        at picocli.CommandLine.parseWithHandlers(CommandLine.java:2550)
        at picocli.CommandLine.parseWithHandler(CommandLine.java:2485)
        at picocli.CommandLine.call(CommandLine.java:2761)
        at com.glencoesoftware.bioformats2raw.Converter.main(Converter.java:2192)
Caused by: java.lang.NullPointerException

@will-moore
Member Author

Oh dear! I don't know if BioStudies will handle multiple zips correctly - e.g. unzip them into a single Fileset. Might need to contact them and ask for advice?

@dominikl
Member

I did, on the bia-idr channel, but no reply yet. I can't see why this should be a problem. You simply extract it using the first volume and it figures out the other volume files itself:

Extracting archive: 190206.ome.zarr.zip.001
--
Path = 190206.ome.zarr.zip.001
Type = Split
Physical Size = 107374182400
Volumes = 7
Total Physical Size = 697753098234
----
Path = 190206.ome.zarr.zip
Size = 697753098234
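
The volume count in that listing follows from the sizes: the total divided by the volume size (107374182400 bytes), rounded up. A sketch that also lists the volume file names you'd expect to see, handy for checking that every part was uploaded. The function names are made up for illustration, and the `.001`, `.002`, ... suffix pattern is taken from the listing above:

```shell
# Number of volumes needed for a split archive (ceiling division).
# Arguments: total archive bytes, bytes per volume.
volume_count() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Print the expected volume file names: <base>.001, <base>.002, ...
volume_names() {
  base="$1"
  n="$2"
  i=1
  while [ "$i" -le "$n" ]; do
    printf '%s.%03d\n' "$base" "$i"
    i=$((i + 1))
  done
}

n=$(volume_count 697753098234 107374182400)
volume_names 190206.ome.zarr.zip "$n"
```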

@dominikl
Member

I'll collect the failed plates here:

  • 190129
  • 190227

@dominikl
Member

Just can't zip the last plate...

(base) [dlindner@pilot-zarr1-dev idr0090]$ 7za -v100g  a 190904.ome.zarr.zip 190904.ome.zarr

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_GB.UTF-8,Utf16=on,HugeFiles=on,64 bits,16 CPUs Intel Xeon Processor (Cascadelake) (50655),ASM,AES-NI)

Scanning the drive:
3007563 folders, 2304336 files, 3091971823339 bytes (2880 GiB)

Creating archive: 190904.ome.zarr.zip

Items to compress: 5311899



System ERROR:
E_FAIL

@dominikl
Member

Ah, there's not enough disk space probably...

@dominikl
Member

1.7 TB free now, should be enough. Using idr-ftp / idr-testing to export the two failed plates 190129 and 190227 with omero-cli-zarr.

@dominikl
Member

Ok, it looks like this plate actually is nearly 3 TB... will copy it over to idrftp to do the zipping there.
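
A pre-flight check along these lines might catch this before 7za starts. It is only a sketch: it assumes the worst case where the archive is as large as the source, the function names are made up, and `avail_bytes` relies on GNU df (`--output` is not available everywhere):

```shell
# Succeed if the available bytes cover the bytes needed.
# Arguments: needed bytes, available bytes.
has_space() {
  [ "$2" -ge "$1" ]
}

# Free bytes on the filesystem holding a path (GNU df only).
avail_bytes() {
  df --output=avail -B1 "$1" | tail -n 1 | tr -d ' '
}

# Plate size from the 7za scan vs. roughly 1.7 TB free:
if has_space 3091971823339 1700000000000; then
  echo "enough space"
else
  echo "not enough space"
fi
```

On the server this would be called as `has_space 3091971823339 "$(avail_bytes /data)"`.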

@dominikl
Member

dominikl commented Sep 4, 2023

Everything's uploaded now. Also updated idr0090_files.tsv to include an ImageID column, as the zip files are split into 100 GB chunks.

@dominikl dominikl moved this from convert all data to NGFF to BioStudies Submission in NGFF conversion Sep 4, 2023
@dominikl dominikl removed their assignment Sep 4, 2023
@francesw francesw self-assigned this Sep 4, 2023
@francesw francesw changed the title idr0090-ashdown-malaria to NGFF idr0090-ashdown-malaria S-BIAD882 Sep 11, 2023
@francesw francesw moved this from BioStudies Submission to Data on Embassy s3 in NGFF conversion Sep 11, 2023
@francesw francesw removed their assignment Sep 11, 2023
@will-moore will-moore moved this from Data on Embassy s3 to create new Filesets in idr-next in NGFF conversion Oct 6, 2023
@will-moore
Member Author

Running on idr0125-pilot as wmoore...

(venv3) (base) [wmoore@pilot-idr0125-omeroreadwrite ~]$ for r in $(cat $IDRID.csv); do
>   biapath=$(echo $r | cut -d',' -f2)
>   uuid=$(echo $biapath | cut -d'/' -f2)
>   fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]')
>   omero mkngff sql $fsid "/bia-integrator-data/$biapath/$uuid.zarr" > "$IDRID/$fsid.sql"
> done
Using session for [email protected]:4064. Idle timeout: 10 min. Current group: Public
Found prefix: demo_2/Blitz-0-Ice.ThreadPool.Server-11/2021-02/20/06-09-40.395 for fileset: 4782270
...
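
To make the parsing in that loop easier to follow, here is the same `cut` logic applied to a single row. The CSV layout is inferred from the loop (second column is the BioStudies path, third is the Fileset ID); the first column and the pairing of UUID with Fileset ID in the sample row are made up for illustration:

```shell
# Build the mounted Zarr path from one CSV row.
zarr_path_for_row() {
  biapath=$(echo "$1" | cut -d',' -f2)
  uuid=$(echo "$biapath" | cut -d'/' -f2)
  echo "/bia-integrator-data/$biapath/$uuid.zarr"
}

# Extract the Fileset ID, dropping any trailing whitespace or carriage return.
fileset_id_for_row() {
  echo "$1" | cut -d',' -f3 | tr -d '[:space:]'
}

# Hypothetical row: the first column and the UUID/Fileset pairing are invented.
row='plate.zarr,S-BIAD882/eba197df-ea03-4465-8855-2e9bde0db414,4782251'
zarr_path_for_row "$row"
fileset_id_for_row "$row"
```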

@will-moore
Member Author

goofys failed... 9/22 exported. 13 to go...

remounted, edited idr0013.csv and re-ran...

(venv3) (base) [wmoore@pilot-idr0125-omeroreadwrite ~]$ for r in $(cat $IDRID.csv); do   biapath=$(echo $r | cut -d',' -f2);   uuid=$(echo $biapath | cut -d'/' -f2);   fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]');   omero mkngff sql $fsid "/bia-integrator-data/$biapath/$uuid.zarr" > "$IDRID/$fsid.sql"; done
Using session for [email protected]:4064. Idle timeout: 10 min. Current group: Public
Found prefix: demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/19/19-38-35.684 for fileset: 4782261
...
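
Instead of editing the csv by hand after each failure, the loop could skip rows that already have a SQL file. A sketch, assuming the same three-column CSV layout as the loop above (the function name `pending_rows` is made up for illustration):

```shell
# Print the CSV rows whose SQL output is missing or empty.
# Arguments: csv file, directory holding <fsid>.sql files.
pending_rows() {
  csv="$1"
  sqldir="$2"
  while IFS= read -r row || [ -n "$row" ]; do
    [ -n "$row" ] || continue
    fsid=$(echo "$row" | cut -d',' -f3 | tr -d '[:space:]')
    # -s is true only for files that exist and are non-empty
    [ -s "$sqldir/$fsid.sql" ] || echo "$row"
  done < "$csv"
}

# Usage (not run here): for r in $(pending_rows "$IDRID.csv" "$IDRID"); do ...; done
```

One caveat: a run that dies part-way can leave a partial, non-empty .sql file behind, so the last file written before a failure should be deleted before re-running.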

@will-moore
Member Author

Goofys failed again. 6 more sql generated, 7 still to go...

Going to replace goofys with geesefs. Already installed on idr0125-pilot at IDR/omero-mkngff#2 (comment)

Now mount at same URL instead of goofys...

sudo umount /bia-integrator-geesefs
sudo umount /bia-integrator-data
sudo /opt/geesefs --endpoint https://uk1s3.embassy.ebi.ac.uk/ -o allow_other bia-integrator-data /bia-integrator-data
s3.INFO anonymous bucket detected
main.INFO File system has been successfully mounted.

Restarted mkngff...

(venv3) (base) [wmoore@pilot-idr0125-omeroreadwrite ~]$ for r in $(cat $IDRID.csv); do   biapath=$(echo $r | cut -d',' -f2);   uuid=$(echo $biapath | cut -d'/' -f2);   fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]');   omero mkngff sql $fsid "/bia-integrator-data/$biapath/$uuid.zarr" > "$IDRID/$fsid.sql"; done
Using session for [email protected]:4064. Idle timeout: 10 min. Current group: Public
Found prefix: demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/18/20-50-17.861 for fileset: 4782251
...

@will-moore
Member Author

Server restart (idr.openmicroscopy.org release) after 1 Fileset...
Restart...

(venv3) (base) [wmoore@pilot-idr0125-omeroreadwrite ~]$ for r in $(cat $IDRID.csv); do   biapath=$(echo $r | cut -d',' -f2);   uuid=$(echo $biapath | cut -d'/' -f2);   fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]');   omero mkngff sql $fsid "/bia-integrator-data/$biapath/$uuid.zarr" > "$IDRID/$fsid.sql"; done
Using session for [email protected]:4064. Idle timeout: 10 min. Current group: Public
Found prefix: demo_2/Blitz-0-Ice.ThreadPool.Server-6/2021-02/19/12-14-48.182 for fileset: 4782256
...

@will-moore
Member Author

All done:

(venv3) (base) [wmoore@pilot-idr0125-omeroreadwrite ~]$ ls -alh idr0090
total 33M
drwxrwxr-x.  2 wmoore wmoore 4.0K Oct  6 16:58 .
drwx------. 24 wmoore wmoore 4.0K Oct  9 08:44 ..
-rw-rw-r--.  1 wmoore wmoore 3.8M Oct  9 07:36 4782251.sql
-rw-rw-r--.  1 wmoore wmoore 1.4M Oct  6 16:01 4782252.sql
-rw-rw-r--.  1 wmoore wmoore 1.8M Oct  6 16:30 4782253.sql
-rw-rw-r--.  1 wmoore wmoore 1.3M Oct  9 09:53 4782254.sql
-rw-rw-r--.  1 wmoore wmoore 1.8M Oct  9 11:25 4782255.sql
-rw-rw-r--.  1 wmoore wmoore 2.4M Oct  9 09:25 4782256.sql
-rw-rw-r--.  1 wmoore wmoore 516K Oct  6 16:46 4782257.sql
-rw-rw-r--.  1 wmoore wmoore 1.3M Oct  9 11:56 4782258.sql
-rw-rw-r--.  1 wmoore wmoore 1.3M Oct  6 16:38 4782259.sql
-rw-rw-r--.  1 wmoore wmoore 1.3M Oct  8 22:21 4782260.sql
-rw-rw-r--.  1 wmoore wmoore 1.6M Oct  8 22:01 4782261.sql
-rw-rw-r--.  1 wmoore wmoore 1.6M Oct  9 10:47 4782262.sql
-rw-rw-r--.  1 wmoore wmoore 1.2M Oct  8 22:08 4782263.sql
-rw-rw-r--.  1 wmoore wmoore 1.2M Oct  8 22:27 4782264.sql
-rw-rw-r--.  1 wmoore wmoore 1.3M Oct  6 16:19 4782265.sql
-rw-rw-r--.  1 wmoore wmoore 1.3M Oct  6 16:10 4782266.sql
-rw-rw-r--.  1 wmoore wmoore 866K Oct  9 10:12 4782267.sql
-rw-rw-r--.  1 wmoore wmoore 866K Oct  8 22:13 4782268.sql
-rw-rw-r--.  1 wmoore wmoore 860K Oct  6 16:43 4782269.sql
-rw-rw-r--.  1 wmoore wmoore 866K Oct  6 15:53 4782270.sql
-rw-rw-r--.  1 wmoore wmoore 518K Oct  6 16:49 4782271.sql
-rw-rw-r--.  1 wmoore wmoore 3.8M Oct  8 22:50 4782272.sql

@will-moore
Member Author

Running sql etc on idr0125-pilot.

$ psql -U omero -d idr -h $DBHOST -c "select uuid from (select * from session where node = 0 and owner = 0 and defaulteventtype = 'Sessions' order by id desc limit 1) x order by x.id asc limit 1;"
                 uuid                 
--------------------------------------
 2703680e-9e33-49b0-8fea-9f7c17df16d7

Copied idr0090 sqls to omero-server.

for i in $(ls); do sed -i 's/SECRETUUID/2703680e-9e33-49b0-8fea-9f7c17df16d7/g' $i; done
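
A slightly more defensive variant of that substitution, limited to .sql files and quoting the file names. It is a sketch only: the function name is made up, and `sed -i` without a suffix assumes GNU sed, as on these servers:

```shell
# Replace the SECRETUUID placeholder in every .sql file in a directory.
# Arguments: directory, session UUID (hex digits and hyphens only).
fill_session_uuid() {
  dir="$1"
  uuid="$2"
  for f in "$dir"/*.sql; do
    # Skip the unexpanded pattern when no .sql files exist
    [ -e "$f" ] || continue
    sed -i "s/SECRETUUID/$uuid/g" "$f"
  done
}

# Usage (not run here): fill_session_uuid "$IDRID" "<uuid from the psql query>"
```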

$ for r in $(cat $IDRID.csv); do
  biapath=$(echo $r | cut -d',' -f2)
  uuid=$(echo $biapath | cut -d'/' -f2)
  fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]')
  psql -U omero -d idr -h $DBHOST -f "$IDRID/$fsid.sql"
  omero mkngff symlink /data/OMERO/ManagedRepository $fsid "/bia-integrator-data/$biapath/$uuid.zarr"
done
...

UPDATE 736
BEGIN
 mkngff_fileset 
----------------
        5288269
(1 row)

COMMIT
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-7/2021-02/19/15-02-58.151
Creating dir at /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-7/2021-02/19/15-02-58.151_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-7/2021-02/19/15-02-58.151_mkngff/eba197df-ea03-4465-8855-2e9bde0db414.zarr -> /bia-integrator-data/S-BIAD882/eba197df-ea03-4465-8855-2e9bde0db414/eba197df-ea03-4465-8855-2e9bde0db414.zarr
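
To double-check the result of the symlink step, something like the following could confirm that a link exists and points where the log says it should. A sketch only; the function name `check_symlink` is made up, and it does not verify that the target is actually reachable through the mount:

```shell
# Succeed if $1 is a symlink whose target is exactly $2.
check_symlink() {
  [ -L "$1" ] && [ "$(readlink "$1")" = "$2" ]
}

# Usage (not run here):
#   check_symlink "<prefix_dir>_mkngff/<uuid>.zarr" \
#     "/bia-integrator-data/S-BIAD882/<uuid>/<uuid>.zarr" || echo "bad link"
```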

Try viewing a smaller plate...(from bioformats2raw) http://localhost:1040/webclient/?show=image-12545749

@will-moore will-moore moved this from check_pixels to NGFF studies in NGFF conversion May 21, 2024