Starting from an Entrez fetch table from SRA with ~700 columns (fetching it is not part of these scripts), the pipeline does 3 things:
- Standardize column names (e.g. CELL_LINE, CELL LINE and celllines all refer to the same column)
- Standardize column values (e.g. Ribo-seq, Riboseq and RIBOSEQ all refer to the same value)
- Semi-manual annotation (e.g. both HeLa and HEK293 are female cell lines)
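The three steps above can be sketched roughly as follows. This is a minimal Python illustration of the idea, not the R code in these scripts: it assumes that column names and values are matched through a normalized key (lowercase, letters and digits only) against hand-written synonym tables. The tables and all function names (normalize_key, standardize_row, annotate_sex, etc.) are hypothetical.

```python
import re

def normalize_key(text: str) -> str:
    """Lowercase and drop everything except letters and digits,
    so 'CELL_LINE', 'CELL LINE' and 'cellline' give the same key."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

# Hypothetical synonym tables: normalized key -> canonical name.
COLUMN_SYNONYMS = {
    "cellline": "CELL_LINE",
    "celllines": "CELL_LINE",
}
VALUE_SYNONYMS = {
    "riboseq": "Ribo-seq",
}
# Hypothetical manual annotation table: normalized cell line -> sex.
CELL_LINE_SEX = {
    "hela": "female",
}

def standardize_columns(columns):
    """Map each raw column name to its canonical name (unknown names pass through)."""
    return [COLUMN_SYNONYMS.get(normalize_key(c), c) for c in columns]

def standardize_value(value):
    """Map a raw value to its canonical spelling (unknown values pass through)."""
    return VALUE_SYNONYMS.get(normalize_key(value), value)

def standardize_row(row):
    """Standardize one metadata record; when several raw columns collapse
    into the same canonical column, keep the first non-empty value."""
    out = {}
    for col, val in row.items():
        key = COLUMN_SYNONYMS.get(normalize_key(col), col)
        val = standardize_value(val) if val else val
        if not out.get(key):
            out[key] = val
    return out

def annotate_sex(cell_line):
    """Look up a manually curated attribute; 'unknown' if not curated yet."""
    return CELL_LINE_SEX.get(normalize_key(cell_line), "unknown")

print(standardize_columns(["CELL_LINE", "CELL LINE", "celllines", "Organism"]))
# ['CELL_LINE', 'CELL_LINE', 'CELL_LINE', 'Organism']
print(standardize_row({"CELL LINE": "", "celllines": "HeLa", "LIBRARYTYPE": "RIBOSEQ"}))
# {'CELL_LINE': 'HeLa', 'LIBRARYTYPE': 'Ribo-seq'}
```

Keeping the first non-empty value is only one possible policy for merging columns that collapse together; a real pipeline may prefer to flag conflicting values for the manual annotation step instead.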
Finally, the standardized table is uploaded to Google Drive together with statistics on how much of the metadata could be standardized.
The procedure is split into 3 scripts, which are run from metadata_main_script.R.
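One way such standardization statistics could be computed, again as a hypothetical Python sketch rather than the scripts' actual definition: for a single column, count the non-empty values whose normalized key is found in the synonym table. The function name, the normalization rule and the output fields are assumptions.

```python
import re
from typing import Dict, List, Optional

def normalize_key(text: str) -> str:
    """Lowercase and keep only letters and digits (assumed matching rule)."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

def standardization_stats(values: List[Optional[str]], synonyms: Dict[str, str]) -> Dict[str, float]:
    """Count how many non-empty raw values of one column match a known synonym."""
    non_empty = [v for v in values if v and v.strip()]
    matched = sum(1 for v in non_empty if normalize_key(v) in synonyms)
    total = len(non_empty)
    return {
        "non_empty": total,
        "standardized": matched,
        "fraction": matched / total if total else 0.0,
    }

stats = standardization_stats(["Ribo-seq", "RIBOSEQ", "RNA-seq", "", None], {"riboseq": "Ribo-seq"})
print(stats)  # {'non_empty': 3, 'standardized': 2, 'fraction': 0.6666666666666666}
```

Empty cells are excluded from the denominator here, so the fraction measures how well the synonym tables cover the values that are actually present, not how complete the metadata is.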