Bug report - XML citations from website #127

Open
chungimungi opened this issue Oct 11, 2023 · 1 comment

chungimungi commented Oct 11, 2023

Error in: Parse Outgoing XML citations from website

For many of the PMIDs, this error is shown:

[image: screenshot of the error]

import csv
import multiprocessing
import pubmed_parser as pp

def write_to_file(f, pmid, result):
    try:
        if isinstance(result, dict) and "pmid_cited" in result:
            f.write(f'## PMID : {pmid}\n')
            f.write(f'PMID CITED : {result["pmid_cited"]}\n')
            # You can add more information from `result` here if needed
        else:
            f.write(f'Error processing PMID {pmid}: Invalid result format\n')
    except Exception as e:
        f.write(f'Error processing PMID {pmid}: {str(e)}\n')

def process_pmid(pmid):
    try:
        return pp.parse_outgoing_citation_web(pmid, id_type='PMID')
    except Exception as e:
        return f'Error processing PMID {pmid}: {str(e)}'

if __name__ == '__main__':
    # Output Markdown file
    output_file = 'out1.md'

    # Open the output file for writing
    with open(output_file, 'w') as f:
        # Write Markdown headers or other content here if needed
        f.write("# Outgoing Citations\n")

        # Open and read the CSV file with PMID values
        with open('pmidfinal.csv', 'r') as csvfile:
            csvreader = csv.reader(csvfile)
            
            # Skip the first 16021 rows
            for i in range(16021):
                next(csvreader, None)
            
            # Create a multiprocessing pool
            pool = multiprocessing.Pool()
            
            for row in csvreader:
                if row:
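                    pmid = str(row[0])  # Assuming the 'PMID' column is the first (index 0) column
                    # Bind pmid as a default argument so each callback keeps its own PMID
                    # rather than whatever value the loop variable holds when the callback fires
                    pool.apply_async(process_pmid, args=(pmid,), callback=lambda result, pmid=pmid: write_to_file(f, pmid, result))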
            
            pool.close()
            pool.join()

    print("Process Complete")

This is my code for the parser. (I skipped the first 16021 rows because I had already retrieved the information for those.)
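A side note on passing a lambda as the `callback` inside a loop: Python closures look up loop variables when they are called, not when they are created, so a callback that fires after the loop has moved on can report a later `pmid` than the one it was submitted for. Below is a minimal sketch of the effect and of the default-argument binding that avoids it. It uses plain functions instead of a pool, and the helper names `make_callbacks_late` and `make_callbacks_bound` are made up for illustration.

```python
def make_callbacks_late(pmids):
    """Build callbacks that read the loop variable when called (late binding)."""
    callbacks = []
    for pmid in pmids:
        callbacks.append(lambda result: (pmid, result))
    return callbacks


def make_callbacks_bound(pmids):
    """Build callbacks that capture the current pmid via a default argument."""
    callbacks = []
    for pmid in pmids:
        callbacks.append(lambda result, pmid=pmid: (pmid, result))
    return callbacks


# Call every callback only after the loop has finished, as a pool might.
late = [cb("ok") for cb in make_callbacks_late(["1", "2", "3"])]
bound = [cb("ok") for cb in make_callbacks_bound(["1", "2", "3"])]
print(late)   # every entry carries the last PMID
print(bound)  # each entry carries its own PMID
```

With `apply_async`, whether the mismatch actually shows up depends on timing, so it may only affect some rows in the output file.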

I have a CSV file containing only PMIDs:

[image: screenshot of the CSV file]

This is how it looks. All PMIDs were taken from PubMed's OA subset.

@chungimungi changed the title from "Bug report" to "Bug report - XML citations from website" on Oct 11, 2023
@titipata
Owner

Thanks @chungimungi! I do not have time to take a look at the code. However, it seems like we need to check parse_outgoing_citation_web to see what goes wrong. The XML format may have changed quite a bit since I last wrote this code.
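One possible way to narrow this down is to run the parser sequentially, without the pool, and keep the full traceback for each failing PMID, so it becomes visible which line of the parser breaks and on which records. The sketch below is only an illustration: `collect_failures` and `fake_parse` are hypothetical names, and `fake_parse` is a stand-in so the example runs without network access.

```python
import traceback


def collect_failures(pmids, parse_fn):
    """Call parse_fn on each PMID in order; return (results, failures).

    failures maps each failing PMID to its formatted traceback.
    """
    results, failures = {}, {}
    for pmid in pmids:
        try:
            results[pmid] = parse_fn(pmid)
        except Exception:
            failures[pmid] = traceback.format_exc()
    return results, failures


def fake_parse(pmid):
    """Stand-in parser for demonstration only; fails on one made-up PMID."""
    if pmid == "2":
        raise ValueError("unexpected XML layout")
    return {"pmid_cited": []}


results, failures = collect_failures(["1", "2"], fake_parse)
for pmid, tb in failures.items():
    # The last traceback line names the exception type and message.
    print(pmid, tb.splitlines()[-1])
```

To try it against the real parser, `parse_fn` could be `lambda p: pp.parse_outgoing_citation_web(p, id_type='PMID')`, the same call used in the script above, run on a handful of the PMIDs that fail.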
