Bug report - XML citations from website #127

chungimungi · 2023-10-11T10:57:28Z

Error In
Parse Outgoing XML citations from website

for a lot of the PMIDs this error is shown

import csv
import multiprocessing
import pubmed_parser as pp

def write_to_file(f, pmid, result):
    try:
        if isinstance(result, dict) and "pmid_cited" in result:
            f.write(f'## PMID : {pmid}\n')
            f.write(f'PMID CITED : {result["pmid_cited"]}\n')
            # You can add more information from `result` here if needed
        else:
            f.write(f'Error processing PMID {pmid}: Invalid result format\n')
    except Exception as e:
        f.write(f'Error processing PMID {pmid}: {str(e)}\n')

def process_pmid(pmid):
    try:
        return pp.parse_outgoing_citation_web(pmid, id_type='PMID')
    except Exception as e:
        return f'Error processing PMID {pmid}: {str(e)}'

if __name__ == '__main__':
    # Output Markdown file
    output_file = 'out1.md'

    # Open the output file for writing
    with open(output_file, 'w') as f:
        # Write Markdown headers or other content here if needed
        f.write("# Outgoing Citations\n")

        # Open and read the CSV file with PMID values
        with open('pmidfinal.csv', 'r') as csvfile:
            csvreader = csv.reader(csvfile)
            
            # Skip the first 16021 rows
            for i in range(16021):
                next(csvreader, None)
            
            # Create a multiprocessing pool
            pool = multiprocessing.Pool()
            
            for row in csvreader:
                if row:
                    pmid = str(row[0])  # Assuming the 'PMID' column is the first (index 0) column
                    pool.apply_async(process_pmid, args=(pmid,), callback=lambda result: write_to_file(f, pmid, result))
            
            pool.close()
            pool.join()

    print("Process Complete")

This is my code for the parser (skipped first 16021 rows as i had already gotten information on the ones before)

I have a csv file containing only PMIDs

This is how it looks all PMIDs where taken from pubmeds oa subset

titipata · 2023-10-12T08:15:32Z

Thanks @chungimungi! I do not have time to take a look at the code. However, it seems like we need to check parse_outgoing_citation_web to see what goes wrong. The XML format may have changed quite a bit since my last time written this code.

chungimungi added the bug label Oct 11, 2023

chungimungi changed the title ~~Bug report~~ Bug report - XML citations from website Oct 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug report - XML citations from website #127

Bug report - XML citations from website #127

chungimungi commented Oct 11, 2023 •

edited

Loading

titipata commented Oct 12, 2023

Bug report - XML citations from website #127

Bug report - XML citations from website #127

Comments

chungimungi commented Oct 11, 2023 • edited Loading

titipata commented Oct 12, 2023

chungimungi commented Oct 11, 2023 •

edited

Loading