Y24-247 - As a GSU PM (YL) I would like to verify that the manually compiled QC data of plates from a supplier is consistent with the data already processed in Sanger's systems so that we can select samples with good data for sequencing. #4272

TWJW-SANGER · 2024-08-13T15:46:59Z

As a GSU PM (YL) I would like to verify that the manually compiled QC data of plates from a supplier is consistent with the data already processed in Sanger's systems so that we can select samples with good data for sequencing.

Background
GSU have received additional QC data on samples that have already been received and processed through the sample ingestion process for Heron.
This QC data has been manually compiled by the supplier and is known to have errors.

When we received the actual samples, information about the deep well plates should have been inserted into the MLWH lighthouse_sample table.
These deep well plates may have been stamped into shallow well plates as part of the cost-saving activity to reduce freezer space.

To avoid sequencing samples without correct data, RVI would like to check the consistency of the data received in the QC files against that held in SequenceScape and MLWH.

Broadly the desired process is that:

GSU supply a file containing for each sample: Root Sample ID, Assumed Deep Well Barcode, Assumed Deep Well Location, Assumed Shallow Well Barcode, Assumed Shallow Well Location
We return a file containing for each sample: Root Sample ID, Assumed Deep Well Barcode, Assumed Deep Well Location, Assumed Shallow Well Barcode, Assumed Shallow Well Location, Actual Deep Well barcode, Actual Deep Well Location, Actual Shallow Well Barcode, Actual Shallow Well Location, Actual Match/s [True/False], Sanger Sample ID
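To make the two file layouts concrete, here is a minimal Python sketch that checks an input file for the expected columns. The snake_case column names and the CSV format are assumptions for illustration only; the real input format is still to be agreed with RVI.

```python
import csv
import io

# Hypothetical snake_case names for the columns listed above.
INPUT_COLUMNS = [
    "root_sample_id",
    "assumed_dw_barcode",
    "assumed_dw_location",
    "assumed_sw_barcode",
    "assumed_sw_location",
]
OUTPUT_COLUMNS = INPUT_COLUMNS + [
    "actual_dw_barcode",
    "actual_dw_location",
    "actual_sw_barcode",
    "actual_sw_location",
    "actual_match",
    "sanger_sample_id",
]


def read_input(handle):
    """Read the GSU file, failing early if an expected column is missing."""
    reader = csv.DictReader(handle)
    missing = [c for c in INPUT_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"input file is missing columns: {missing}")
    return list(reader)


# Made-up example data, not taken from any real plate.
example = io.StringIO(
    "root_sample_id,assumed_dw_barcode,assumed_dw_location,"
    "assumed_sw_barcode,assumed_sw_location\n"
    "ROOT-1,DW-1,A1,SW-1,A1\n"
)
rows = read_input(example)
```

Failing on a bad header before any database lookup keeps a malformed, manually compiled file from producing a misleading report.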

The data on deep well plates should be in the MLWH lighthouse_sample table.
The data on shallow well plates, if they exist, should be in the SequenceScape database.

Actual Match/s [True/False] is computed by comparing the Assumed data fields with the corresponding Actual data fields for both Deep and Shallow Well fields.
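The comparison could be sketched as below. This is an illustration under assumptions, not the delivered script: the column names are hypothetical, the comparison ignores case and surrounding whitespace (a choice not specified in this ticket), and it reports one result per plate type because the ticket leaves open whether there is one match flag or several.

```python
def normalise(value):
    """Treat None and blank strings alike; ignore case and surrounding spaces."""
    return (value or "").strip().upper()


def plate_matches(row, prefix):
    """Compare assumed and actual barcode and location for 'dw' or 'sw'."""
    for field in ("barcode", "location"):
        assumed = normalise(row.get(f"assumed_{prefix}_{field}"))
        actual = normalise(row.get(f"actual_{prefix}_{field}"))
        # A blank actual value means nothing was found, so it never matches.
        if not actual or assumed != actual:
            return False
    return True


def actual_match(row):
    """Return (deep_well_match, shallow_well_match) for one output row."""
    return plate_matches(row, "dw"), plate_matches(row, "sw")


# Made-up example row: the deep well agrees, the shallow well barcode differs.
row = {
    "assumed_dw_barcode": "DW-1", "assumed_dw_location": "A1",
    "actual_dw_barcode": "dw-1", "actual_dw_location": " A1",
    "assumed_sw_barcode": "SW-1", "assumed_sw_location": "B2",
    "actual_sw_barcode": "SW-9", "actual_sw_location": "B2",
}
deep_ok, shallow_ok = actual_match(row)  # deep_ok is True, shallow_ok is False
```

If a single flag is wanted, `deep_ok and shallow_ok` would give it.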

Sanger Sample ID is provided as a convenience for RVI.

It is expected that GSU will provide us with an input file every 2 weeks for 2 to 3 months.

Acceptance Criteria

The input file format is agreed with RVI.
The process of taking an input file and producing the output file is automated as much as possible.
Add script to psd-support-scripts ensuring documentation is provided so that any member of the team could run the process.
If the Root Sample ID is not found, all Actual columns and the Sanger Sample ID are NULL / blank and Actual Match is false.
If the Deep or Shallow well plates are not found, the corresponding Actual fields are NULL / blank.
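The two not-found rules above could be sketched as follows. The `found` structure and the column names are hypothetical stand-ins for whatever the MLWH and SequenceScape lookups return. One assumption goes beyond the criteria: a row only matches when both plates are found and agree with the assumed values.

```python
BLANK = ""


def build_output_row(input_row, found):
    """Combine one input row with lookup results.

    found is None when the Root Sample ID is unknown; otherwise a dict with
    optional 'deep_well' and 'shallow_well' (barcode, location) tuples and
    an optional 'sanger_sample_id'.
    """
    found = found or {}
    dw = found.get("deep_well")
    sw = found.get("shallow_well")
    dw_barcode, dw_location = dw or (BLANK, BLANK)
    sw_barcode, sw_location = sw or (BLANK, BLANK)

    out = dict(input_row)
    out.update(
        actual_dw_barcode=dw_barcode,
        actual_dw_location=dw_location,
        actual_sw_barcode=sw_barcode,
        actual_sw_location=sw_location,
        sanger_sample_id=found.get("sanger_sample_id") or BLANK,
    )
    # Assumption: a missing plate always counts as a mismatch.
    out["actual_match"] = (
        dw is not None
        and sw is not None
        and dw == (input_row["assumed_dw_barcode"], input_row["assumed_dw_location"])
        and sw == (input_row["assumed_sw_barcode"], input_row["assumed_sw_location"])
    )
    return out


# Made-up example: the Root Sample ID was not found at all.
request = {
    "root_sample_id": "ROOT-1",
    "assumed_dw_barcode": "DW-1", "assumed_dw_location": "A1",
    "assumed_sw_barcode": "SW-1", "assumed_sw_location": "A1",
}
not_found = build_output_row(request, None)
```

Here `not_found` has every Actual column and `sanger_sample_id` blank, with `actual_match` set to `False`.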

A number of ways of implementing this are available; we want the simplest in terms of effort.

Stakeholders
Ya L
Anna G

TWJW-SANGER · 2024-08-21T12:50:09Z

Required for 600 samples coming in October

yh4-GSU · 2024-08-23T15:05:55Z

Hi Neil,
may I ask a question regarding this ticket? For the "Actual Shallow Well Barcode, Actual Shallow Well Location" that PSD aims to return, what is the detailed logic behind it? Is it done by taking the Root Sample ID we provide and looking up in LIMS which shallow well plate and well that sample is sitting in? If the sample has gone through some journey/process, will the output data from PSD be the "latest" plate/well this sample is in?
Best wishes,
Ya-Lin

neilsycamore · 2024-08-27T15:22:23Z

Hi Ya-Lin,
Searching by the root_sample_id is a very lengthy process because the sample_description attribute where it is stored is not indexed. There are 9.6m records and each one has to be searched one by one, well by well.
The 'Actual' would be either a confirmation of the root sample present in the 'assumed' shallow plate::well OR the next aliquot (child well) of the deep plate::well.

neilsycamore · 2024-08-27T15:25:56Z

Ya-Lin I have a question. Do you need to provide us with dw AND sw data? Would supplying root_sample_id, dw_barcode, dw well location not be enough for us to report back:
confirmation of dw data
sw plate:well data and sample RVI name
?

yh4-GSU · 2024-08-27T17:44:05Z

Hi Neil,

The Root Sample ID vs dwpID/Well check is to see whether the raw data we obtained from the lighthouse lab matches the data coming through the Heron pipeline and stored in LIMS. It is required because both the raw data and the Heron platemap have been manually generated by the lighthouse lab and we have observed several mistakes.

The Root Sample ID vs swpID/Well check is to see whether the sample is sitting in the plate/well that we believe it is in, as the samples, once received, may have gone through several procedures (e.g. stamping, cherrypicking etc). This is why we'd wish to provide you with the swpID/Well that we believe the sample is in and get an answer from PSD on whether this matches. In the case where it doesn't match, we'd wish to know where (swpID/Well) this sample is sitting according to LIMS. In other words, we are not interested in the child swp plate of the dwp plate but rather in the "latest" location of the sample after its journey.
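For illustration only, the "latest location" idea described above can be sketched in Python as following a chain of transfers until there is no further one. The `transfers` mapping, its timestamps and the plate names are hypothetical and do not reflect the actual LIMS schema; where a well was transferred more than once, the sketch assumes the most recent transfer is the one to follow.

```python
def latest_location(start, transfers):
    """Follow transfers from start until no further transfer exists.

    transfers maps a (barcode, well) source to a list of
    (timestamp, (barcode, well)) destinations. Timestamps are ISO date
    strings, so comparing them as text orders them chronologically.
    """
    current = start
    seen = {current}
    while transfers.get(current):
        # Pick the most recent transfer out of the current well.
        _, destination = max(transfers[current])
        if destination in seen:
            break  # guard against cyclic data
        seen.add(destination)
        current = destination
    return current


# Made-up journey: deep well -> shallow well -> two rounds of cherrypicking.
transfers = {
    ("DW-1", "A1"): [("2024-01-10", ("SW-1", "A1"))],
    ("SW-1", "A1"): [("2024-02-01", ("CP-1", "B2"))],
    ("CP-1", "B2"): [("2024-03-01", ("CP-2", "C3"))],
}
final = latest_location(("DW-1", "A1"), transfers)  # ("CP-2", "C3")
```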

Hope I have explained everything properly 😅.

We understand searching by the root sample ID would be a lengthy process (which we would never achieve by ourselves). We are hoping once a script is written for this job to get done automatically, it'll save all of our time and efforts 🤞. Thanks a million.

Best wishes,
Ya-Lin

neilsycamore · 2024-08-29T11:52:32Z

Hi Ya-Lin
Thank you for the above.
Attached is a first attempt generated from the prod_7 data from RT807257 which was in the format root_sample_id, dw barcode, sw barcode, dw position.
This data has the added benefit that the samples have been cherrypicked several times so we can see where the 'latest' location of the sample is.

Y24-247_RVI_sample_data_prod_7.xlsx

Let me know your thoughts please

yh4-GSU · 2024-09-02T08:52:33Z

Hi Neil,

thanks a lot for your hard work. The outcome file is looking good. As you mentioned, this batch of samples is useful for your test as they've been through 2 rounds of cherrypicking. I can see your outcome for the last plate/well/platetype catches the final CP plate! 👍

There's just one thing: samples shown as "Well empty" in SW sample name (column I) were not included in the further SW checks (columns L-R). Is it possible to mark these "Well empty" samples as "SW match NO" and include them in the downstream checks?

The format/info for the outcome spreadsheet as it is now is already great. If you wish to reduce the output info/columns, we can discuss further as well.

Best wishes,
Ya-Lin

TWJW-SANGER added the Reporting label Aug 13, 2024

SujitDey2022 added the GSU label Aug 14, 2024

TWJW-SANGER added RVI RVI project Time Sensitive labels Aug 15, 2024

neilsycamore self-assigned this Aug 21, 2024

neilsycamore closed this as completed Oct 4, 2024
