read-seqs input parameter improvement #36

penuts7644 · 2019-02-04T10:25:23Z

Hi Quentin,

According to the documentation for the -read-seqs parameter, the input CSV file should have the sequence index in the first column and the sequence in the second, separated by a semicolon (';').

I expected to be able to pass in a CSV file with more than two semicolon-separated columns and have IGoR use only the first two. Instead, each line is split only at the first semicolon it contains, so the second column is merged with all the remaining columns.

Example:

The line index;sequence;other_data is split into index as the first column and sequence;other_data as the second column.
I would expect it to be split into index as the first column and sequence as the second column.
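
To make the difference concrete, here is a minimal Python sketch of the two parsing strategies. It is not IGoR's actual parser (IGoR is written in C++); the function names are hypothetical and only illustrate the behaviour described above.

```python
from typing import Tuple


def split_first_only(line: str) -> Tuple[str, str]:
    """Reported behaviour: split on the first ';' only, keep the rest as one field."""
    index, rest = line.rstrip("\n").split(";", 1)
    return index, rest


def split_first_two_columns(line: str) -> Tuple[str, str]:
    """Expected behaviour: keep the first two columns and ignore any extra ones."""
    fields = line.rstrip("\n").split(";")
    if len(fields) < 2:
        raise ValueError(f"expected at least 2 ';'-separated columns: {line!r}")
    return fields[0], fields[1]


line = "index;sequence;other_data"
print(split_first_only(line))         # ('index', 'sequence;other_data')
print(split_first_two_columns(line))  # ('index', 'sequence')
```
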

Is there a reason for this behaviour?

Cheers, Wout


qmarcou · 2019-03-05T08:38:01Z

Hi @penuts7644,
Nope, there is no good reason other than this: by assuming the CSV has only two columns, the user cannot make mistakes in the column ordering.
I agree this is not very handy, and I will try to make a change toward a slightly more flexible format.
Best
Quentin
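
One possible reading of a "slightly more flexible format" is to locate columns by header name instead of by position, so that extra columns are ignored and the column order cannot be mixed up. This is a hypothetical Python sketch; the column names seq_index and sequence are assumptions, not an existing IGoR format.

```python
import csv
import io
from typing import Iterator, Tuple

# Hypothetical header names; IGoR does not define these.
INDEX_COL = "seq_index"
SEQ_COL = "sequence"


def read_indexed_seqs(handle) -> Iterator[Tuple[str, str]]:
    """Yield (index, sequence) pairs found by header name, so column order does not matter."""
    reader = csv.DictReader(handle, delimiter=";")
    # fieldnames is read from the first (header) row of the file.
    missing = {INDEX_COL, SEQ_COL} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required column(s): {sorted(missing)}")
    for row in reader:
        yield row[INDEX_COL], row[SEQ_COL]


data = "sequence;other_data;seq_index\nACGT;x;0\nTTGA;y;1\n"
print(list(read_indexed_seqs(io.StringIO(data))))  # [('0', 'ACGT'), ('1', 'TTGA')]
```

Requiring a header trades a little convenience for protection against exactly the column-ordering mistakes mentioned above.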

decenwang · 2019-03-20T12:23:36Z

Hi All, @qmarcou @penuts7644

Another question: when I input sequences in FASTA format with the 'igor -read_seqs' command, I do not assign any index for the sample. In the /tmp file, I found that each sequence had automatically been prefixed with a number and a semicolon, e.g. 0; 1; 2; …. By definition these numbers are the indices, not the DNA index/barcode. Since all these sequences come from the same sample, do I really need to assign an index for each sample (i.e. to all the sequences of that sample)? In any case, I hope IGoR could recognize the index by itself if I supply an index file before inputting the sequences, with either a single index or dual indices.
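
A minimal Python sketch of how such numbering can arise when FASTA input is converted to the index;sequence layout: each record simply receives its 0-based position as its index. This assumes the index is a per-sequence identifier, as the -read-seqs documentation quoted in the first post suggests; it is not IGoR's actual conversion code, and the function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) records from FASTA lines; multi-line sequences are joined."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def to_indexed_csv(lines: Iterable[str]) -> List[str]:
    """Number records 0, 1, 2, ... in file order and emit 'index;sequence' rows."""
    return [f"{i};{seq}" for i, (_, seq) in enumerate(parse_fasta(lines))]


fasta = [">read_a", "ACGT", "TT", ">read_b", "GGCA"]
print(to_indexed_csv(fasta))  # ['0;ACGTTT', '1;GGCA']
```

Under this assumption, the numbers only distinguish sequences within one input file; they carry no sample or barcode information.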
If we use paired-end (PE) sequencing, fastq-dump splits the reads and Trimmomatic trims them, which gives separate Read1 and Read2 files for each sample. Should both Read1 and Read2 of a sample be analyzed, or should I analyze only one of them?
Could you please add a plugin or feature that translates DNA into peptides? TCR chains are special since they need the help of MHC I/II, namely the anchor residues, and particular amino acids (e.g. the numbers of Arg and Glu may differ among cohorts).

Thanks a million!
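
For the DNA-to-peptide translation requested above, a minimal Python sketch using the standard genetic code (NCBI translation table 1) could look like this. The function name and the frame parameter are hypothetical; for TCR sequences the correct reading frame would have to be determined separately, which this sketch does not attempt. Existing libraries such as Biopython (Bio.Seq.Seq.translate) already provide this functionality.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate DNA in frame 0, 1 or 2; '*' marks a stop, 'X' an unknown codon.

    A trailing partial codon is ignored.
    """
    seq = dna.upper()
    codons = (seq[p:p + 3] for p in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


print(translate("ATGGCCTGTTAA"))  # MAC*
```
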

qmarcou added the enhancement label Mar 5, 2019

Repository owner deleted a comment from decenwang Apr 9, 2019
