Double Backslash problem in Windows - Jupyter Notebook #92

Open
ShweataNHegde opened this issue Nov 6, 2020 · 2 comments

@ShweataNHegde
Collaborator

Hello,
I tried running text.ipynb (https://github.com/petermr/ami3/blob/master/src/ipynb/text.ipynb) on Windows 10 Home, but the file paths that get printed contain double backslashes instead of single ones.

This is the cell I ran:

project = 'C:/Users/shweata/ami3/src/test/resources/org/contentmine/ami/zika10'
# os.chdir(project)
file_glob = 'PMC*'
files = get_globbed_files(project, file_glob)
print("number of " + file_glob + " files: " + str(len(files)) + "\n " + str(files))
print("file type " + str(type(files)))
abstract_files = get_globbed_files(project, 'PMC*/sections/abstract/*.xml')
print("abstracts " + str(abstract_files))    
text_files = get_globbed_files(project, 'PMC*/sections/**/*.xml', recursive=False)
print("number of xml text files: " + str(len(text_files)) +"\n" + str(text_files))
figure_files = get_globbed_files(project, 'PMC*/sections/**/*figure*.xml', recursive=False)
# print("number of figure files: " + str(len(figure_files)) +"\n" + str(figure_files))

The output:

abstracts ['PMC3113902\\sections\\abstract\\elem_0.xml', 'PMC320490\\sections\\abstract\\background__4_0.xml', 'PMC3289602\\sections\\abstract\\author_summary_1.xml', 'PMC3289602\\sections\\abstract\\background__3_0.xml', 'PMC3310194\\sections\\abstract\\elem_0.xml', 'PMC3310457\\sections\\abstract\\elem_0.xml', 'PMC3310457\\sections\\abstract\\elem_1.xml', 'PMC3310660\\sections\\abstract\\elem_0.xml', 'PMC3321795\\sections\\abstract\\elem_0.xml', 'PMC3321797\\sections\\abstract\\elem_0.xml']
number of xml text files: 141
['PMC3113902\\sections\\2_back\\0_ack.xml', 'PMC3113902\\sections\\abstract\\elem_0.xml', 'PMC3113902\\sections\\article\\elem_0.xml', 'PMC3113902\\sections\\figures\\figure_1.xml', 'PMC3113902\\sections\\figures\\figure_2.xml', 'PMC320490\\sections\\2_back\\0_ack.xml', 'PMC320490\\sections\\3_floats-group\\0_figure_1.xml', 'PMC320490\\sections\\3_floats-group\\1_figure_2.xml', 'PMC320490\\sections\\3_floats-group\\2_table_1.xml', 'PMC320490\\sections\\3_floats-group\\3_table_2.xml', 'PMC320490\\sections\\3_floats-group\\4_figure_3.xml', 'PMC320490\\sections\\3_floats-group\\5_figure_4.xml', 'PMC320490\\sections\\3_floats-group\\6_figure_5.xml', 'PMC320490\\sections\\abstract\\background__4_0.xml', 'PMC320490\\sections\\article\\elem_0.xml', 'PMC320490\\sections\\figures\\figure_1.xml', 'PMC320490\\sections\\figures\\figure_2.xml', 'PMC320490\\sections\\figures\\figure_3.xml', 'PMC320490\\sections\\figures\\figure_4.xml', 'PMC320490\\sections\\figures\\figure_5.xml', 'PMC320490\\sections\\tables\\table_1.xml', 'PMC320490\\sections\\tables\\table_2.xml', 'PMC3289602\\sections\\0_introduction\\0_title.xml', 'PMC3289602\\sections\\0_introduction\\1_p.xml', 'PMC3289602\\sections\\0_introduction\\2_p.xml', 'PMC3289602\\sections\\0_introduction\\3_p.xml', 'PMC3289602\\sections\\0_introduction\\4_p.xml', 'PMC3289602\\sections\\0_introduction\\5_p.xml', 'PMC3289602\\sections\\1_methods\\0_title.xml', 'PMC3289602\\sections\\2_back\\0_fn-group.xml', 'PMC3289602\\sections\\2_results\\0_title.xml', 'PMC3289602\\sections\\3_discussion\\0_title.xml', 'PMC3289602\\sections\\3_floats-group\\0_table_1.xml', 'PMC3289602\\sections\\3_floats-group\\1_table_2.xml', 'PMC3289602\\sections\\3_floats-group\\2_figure_1.xml', 'PMC3289602\\sections\\3_floats-group\\3_table_3.xml', 'PMC3289602\\sections\\3_floats-group\\4_figure_2.xml', 'PMC3289602\\sections\\4_floats-group\\0_table_1.xml', 'PMC3289602\\sections\\4_floats-group\\1_table_2.xml', 
'PMC3289602\\sections\\4_floats-group\\2_figure_1.xml', 'PMC3289602\\sections\\4_floats-group\\3_table_3.xml', 'PMC3289602\\sections\\4_floats-group\\4_figure_2.xml', 'PMC3289602\\sections\\abstract\\author_summary_1.xml', 'PMC3289602\\sections\\abstract\\background__3_0.xml', 'PMC3289602\\sections\\acknowledge\\elem_0.xml', 'PMC3289602\\sections\\article\\elem_0.xml', 'PMC3289602\\sections\\figures\\figure_1.xml', 'PMC3289602\\sections\\figures\\figure_2.xml', 'PMC3289602\\sections\\methods\\methods__4_0.xml', 'PMC3289602\\sections\\tables\\table_1.xml', 'PMC3289602\\sections\\tables\\table_2.xml', 'PMC3289602\\sections\\tables\\table_3.xml', 'PMC3310194\\sections\\2_back\\0_ack.xml', 'PMC3310194\\sections\\2_back\\2_app-group.xml', 'PMC3310194\\sections\\3_floats-group\\0_table-wrap.xml', 'PMC3310194\\sections\\3_floats-group\\10_figure_10_.xml', 'PMC3310194\\sections\\3_floats-group\\11_figure_11_.xml', 'PMC3310194\\sections\\3_floats-group\\12_figure_12_.xml', 'PMC3310194\\sections\\3_floats-group\\13_supplementary-material.xml', 'PMC3310194\\sections\\3_floats-group\\14_supplementary-material.xml', 'PMC3310194\\sections\\3_floats-group\\15_supplementary-material.xml', 'PMC3310194\\sections\\3_floats-group\\16_supplementary-material.xml', 'PMC3310194\\sections\\3_floats-group\\17_supplementary-material.xml', 'PMC3310194\\sections\\3_floats-group\\18_supplementary-material.xml', 'PMC3310194\\sections\\3_floats-group\\19_supplementary-material.xml', 'PMC3310194\\sections\\3_floats-group\\1_figure_1_.xml', 'PMC3310194\\sections\\3_floats-group\\20_supplementary-material.xml', 'PMC3310194\\sections\\3_floats-group\\21_appendix_figure_1_.xml', 'PMC3310194\\sections\\3_floats-group\\22_appendix_figure_2_.xml', 'PMC3310194\\sections\\3_floats-group\\23_appendix_figure_3_.xml', 'PMC3310194\\sections\\3_floats-group\\2_figure_2_.xml', 'PMC3310194\\sections\\3_floats-group\\3_figure_3_.xml', 'PMC3310194\\sections\\3_floats-group\\4_figure_4_.xml', 
'PMC3310194\\sections\\3_floats-group\\5_figure_5_.xml', 'PMC3310194\\sections\\3_floats-group\\6_figure_6_.xml', 'PMC3310194\\sections\\3_floats-group\\7_figure_7_.xml', 'PMC3310194\\sections\\3_floats-group\\8_figure_8_.xml', 'PMC3310194\\sections\\3_floats-group\\9_figure_9_.xml',

(Truncated)
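The doubled backslashes in the printed lists are probably just how Python displays them: calling `str()` on a list shows each element's `repr()`, which escapes every backslash, and Python error messages quote file names the same way. A minimal sketch using one path from the output above:

```python
# One path from the output above, written as a raw string literal.
path = r'PMC3113902\sections\abstract\elem_0.xml'

print(path)    # PMC3113902\sections\abstract\elem_0.xml
print([path])  # ['PMC3113902\\sections\\abstract\\elem_0.xml']

# The stored string never contains two backslashes in a row.
print('\\\\' in path)  # False
```

So the printed lists on their own do not show that anything is wrong with the paths.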
When I run the next cell:

text_contents = []
for text_file in text_files:
    text_filex = open(text_file,mode='r')
    text = text_filex.read()
    text_filex.close()
    text_contents.append(text)
len(text_contents) 
# text_contents

I get the following error.

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-4-43f8fb121a67> in <module>
      1 text_contents = []
      2 for text_file in text_files:
----> 3     text_filex = open(text_file,mode='r')
      4     text = text_filex.read()
      5     text_filex.close()
FileNotFoundError: [Errno 2] No such file or directory: 'PMC3113902\\sections\\2_back\\0_ack.xml'
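The path in the error is relative: `'PMC3113902\\sections\\2_back\\0_ack.xml'` has no drive letter and no project prefix, so `open()` most likely looks for it under the notebook's current working directory rather than under `project`. A minimal sketch of one possible fix, assuming `get_globbed_files` returns paths relative to `project` (as the output suggests), is to join each path onto `project` before opening it. `read_relative_files` is a hypothetical helper, not part of ami3, and the demo builds a throwaway directory in place of `zika10` so the sketch runs anywhere.

```python
import os
import tempfile


def read_relative_files(project, relative_paths):
    """Hypothetical helper: read each file, resolving its path against
    `project` instead of the current working directory."""
    contents = []
    for relative_path in relative_paths:
        full_path = os.path.join(project, relative_path)
        # 'with' closes the file even if read() raises.
        # Assumes the XML files are UTF-8 encoded.
        with open(full_path, mode='r', encoding='utf-8') as f:
            contents.append(f.read())
    return contents


# Demo: a throwaway stand-in for the zika10 project directory.
with tempfile.TemporaryDirectory() as project:
    relative = os.path.join('PMC3113902', 'sections', '2_back', '0_ack.xml')
    os.makedirs(os.path.join(project, os.path.dirname(relative)))
    with open(os.path.join(project, relative), 'w', encoding='utf-8') as f:
        f.write('<ack/>')

    # The bare relative path only exists if the working directory is `project`.
    print(os.path.exists(relative))
    text_contents = read_relative_files(project, [relative])
    print(len(text_contents))
```

In the notebook this would presumably become `text_contents = read_relative_files(project, text_files)`. On Windows, `os.path.join` happily mixes the forward slashes in `project` with the backslashes in the globbed paths.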

I searched online and found this article about avoiding Windows backslash problems with Python's raw strings (https://lerner.co.il/2018/07/24/avoiding-windows-backslash-problems-with-pythons-raw-strings/). I tried the solutions it suggests, but they didn't help.
I have very little programming experience, so any help with this would be appreciated.
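A possible reason the raw-string advice did not help: the `r'...'` prefix only changes how a string literal typed into the source code is parsed. The paths in `text_files` are produced at runtime by the glob, so they are already ordinary strings and no prefix can change them. A minimal illustration:

```python
# The same path spelled two ways in source code.
raw = r'PMC3113902\sections\2_back\0_ack.xml'
escaped = 'PMC3113902\\sections\\2_back\\0_ack.xml'

# Both spellings produce the identical string, with single backslashes.
print(raw == escaped)  # True

# Raw strings matter only for literals: without r'' (or doubling),
# '\a' is parsed as one BEL control character instead of two characters.
print(len('\a'), len(r'\a'))  # 1 2
```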

@petermr
Owner

petermr commented Nov 6, 2020

I will ask on Shuttleworth Slack.

@petermr
Owner

petermr commented Nov 6, 2020

I think I should be using Path... Just a guess at present.
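If the notebook were switched to `pathlib`, globbing from a `Path` for the project directory yields paths that already start with that directory, so they open correctly whatever the working directory is. A minimal sketch, assuming the same `PMC*/sections/**/*.xml` layout; it uses the standard `Path.glob`, not the notebook's `get_globbed_files`, `read_section_files` is a hypothetical helper, and the demo builds a temporary directory so it runs anywhere:

```python
import tempfile
from pathlib import Path


def read_section_files(project, pattern='PMC*/sections/**/*.xml'):
    """Hypothetical helper: glob from a Path and read each match."""
    project = Path(project)
    # Path.glob yields paths that start with `project`, so they do not
    # depend on the current working directory. In pathlib, '**' recurses.
    files = sorted(project.glob(pattern))
    # Assumes the XML files are UTF-8 encoded.
    return files, [f.read_text(encoding='utf-8') for f in files]


# Demo with a throwaway stand-in for the zika10 directory.
with tempfile.TemporaryDirectory() as tmp:
    ack = Path(tmp, 'PMC3113902', 'sections', '2_back', '0_ack.xml')
    ack.parent.mkdir(parents=True)
    ack.write_text('<ack/>', encoding='utf-8')

    files, text_contents = read_section_files(tmp)
    # Printing a Path uses the native separator: single backslashes on Windows.
    print(files[0])
    print(len(text_contents))
```

Note that pathlib's `**` always descends into nested subdirectories, whereas the notebook's call passes `recursive=False`, so the two may not match exactly the same set of files.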
