Hello, I tried running text.ipynb (https://github.com/petermr/ami3/blob/master/src/ipynb/text.ipynb) on Windows 10 Home, but the file paths that get printed have double backslashes instead of single ones. Details below.
text.ipynb
When I run the following cell, the printed file paths have double backslashes instead of single ones:
    project = 'C:/Users/shweata/ami3/src/test/resources/org/contentmine/ami/zika10'
    # os.chdir(project)
    file_glob = 'PMC*'
    files = get_globbed_files(project, file_glob)
    print("number of " + file_glob + " files: " + str(len(files)) + "\n " + str(files))
    print("file type " + str(type(files)))
    abstract_files = get_globbed_files(project, 'PMC*/sections/abstract/*.xml')
    print("abstracts " + str(abstract_files))
    text_files = get_globbed_files(project, 'PMC*/sections/**/*.xml', recursive=False)
    print("number of xml text files: " + str(len(text_files)) +"\n" + str(text_files))
    figure_files = get_globbed_files(project, 'PMC*/sections/**/*figure*.xml', recursive=False)
    # print("number of figure files: " + str(len(figure_files)) +"\n" + str(figure_files))
The output:
abstracts ['PMC3113902\\sections\\abstract\\elem_0.xml', 'PMC320490\\sections\\abstract\\background__4_0.xml', 'PMC3289602\\sections\\abstract\\author_summary_1.xml', 'PMC3289602\\sections\\abstract\\background__3_0.xml', 'PMC3310194\\sections\\abstract\\elem_0.xml', 'PMC3310457\\sections\\abstract\\elem_0.xml', 'PMC3310457\\sections\\abstract\\elem_1.xml', 'PMC3310660\\sections\\abstract\\elem_0.xml', 'PMC3321795\\sections\\abstract\\elem_0.xml', 'PMC3321797\\sections\\abstract\\elem_0.xml']
number of xml text files: 141
['PMC3113902\\sections\\2_back\\0_ack.xml', 'PMC3113902\\sections\\abstract\\elem_0.xml', 'PMC3113902\\sections\\article\\elem_0.xml', 'PMC3113902\\sections\\figures\\figure_1.xml', 'PMC3113902\\sections\\figures\\figure_2.xml', 'PMC320490\\sections\\2_back\\0_ack.xml', 'PMC320490\\sections\\3_floats-group\\0_figure_1.xml', 'PMC320490\\sections\\3_floats-group\\1_figure_2.xml', 'PMC320490\\sections\\3_floats-group\\2_table_1.xml', 'PMC320490\\sections\\3_floats-group\\3_table_2.xml', 'PMC320490\\sections\\3_floats-group\\4_figure_3.xml', 'PMC320490\\sections\\3_floats-group\\5_figure_4.xml', 'PMC320490\\sections\\3_floats-group\\6_figure_5.xml', 'PMC320490\\sections\\abstract\\background__4_0.xml', 'PMC320490\\sections\\article\\elem_0.xml', 'PMC320490\\sections\\figures\\figure_1.xml', 'PMC320490\\sections\\figures\\figure_2.xml', 'PMC320490\\sections\\figures\\figure_3.xml', 'PMC320490\\sections\\figures\\figure_4.xml', 'PMC320490\\sections\\figures\\figure_5.xml', 'PMC320490\\sections\\tables\\table_1.xml', 'PMC320490\\sections\\tables\\table_2.xml', 'PMC3289602\\sections\\0_introduction\\0_title.xml', 'PMC3289602\\sections\\0_introduction\\1_p.xml', 'PMC3289602\\sections\\0_introduction\\2_p.xml', 'PMC3289602\\sections\\0_introduction\\3_p.xml', 'PMC3289602\\sections\\0_introduction\\4_p.xml', 'PMC3289602\\sections\\0_introduction\\5_p.xml', 'PMC3289602\\sections\\1_methods\\0_title.xml', 'PMC3289602\\sections\\2_back\\0_fn-group.xml', 
'PMC3289602\\sections\\2_results\\0_title.xml', 'PMC3289602\\sections\\3_discussion\\0_title.xml', 'PMC3289602\\sections\\3_floats-group\\0_table_1.xml', 'PMC3289602\\sections\\3_floats-group\\1_table_2.xml', 'PMC3289602\\sections\\3_floats-group\\2_figure_1.xml', 'PMC3289602\\sections\\3_floats-group\\3_table_3.xml', 'PMC3289602\\sections\\3_floats-group\\4_figure_2.xml', 'PMC3289602\\sections\\4_floats-group\\0_table_1.xml', 'PMC3289602\\sections\\4_floats-group\\1_table_2.xml', 'PMC3289602\\sections\\4_floats-group\\2_figure_1.xml', 'PMC3289602\\sections\\4_floats-group\\3_table_3.xml', 'PMC3289602\\sections\\4_floats-group\\4_figure_2.xml', 'PMC3289602\\sections\\abstract\\author_summary_1.xml', 'PMC3289602\\sections\\abstract\\background__3_0.xml', 'PMC3289602\\sections\\acknowledge\\elem_0.xml', 'PMC3289602\\sections\\article\\elem_0.xml', 'PMC3289602\\sections\\figures\\figure_1.xml', 'PMC3289602\\sections\\figures\\figure_2.xml', 'PMC3289602\\sections\\methods\\methods__4_0.xml', 'PMC3289602\\sections\\tables\\table_1.xml', 'PMC3289602\\sections\\tables\\table_2.xml', 'PMC3289602\\sections\\tables\\table_3.xml', 'PMC3310194\\sections\\2_back\\0_ack.xml', 'PMC3310194\\sections\\2_back\\2_app-group.xml', 'PMC3310194\\sections\\3_floats-group\\0_table-wrap.xml', 'PMC3310194\\sections\\3_floats-group\\10_figure_10_.xml', 'PMC3310194\\sections\\3_floats-group\\11_figure_11_.xml', 'PMC3310194\\sections\\3_floats-group\\12_figure_12_.xml', 'PMC3310194\\sections\\3_floats-group\\13_supplementary-material.xml', 'PMC3310194\\sections\\3_floats-group\\14_supplementary-material.xml', 'PMC3310194\\sections\\3_floats-group\\15_supplementary-material.xml', 'PMC3310194\\sections\\3_floats-group\\16_supplementary-material.xml', 'PMC3310194\\sections\\3_floats-group\\17_supplementary-material.xml', 'PMC3310194\\sections\\3_floats-group\\18_supplementary-material.xml', 'PMC3310194\\sections\\3_floats-group\\19_supplementary-material.xml', 
'PMC3310194\\sections\\3_floats-group\\1_figure_1_.xml', 'PMC3310194\\sections\\3_floats-group\\20_supplementary-material.xml', 'PMC3310194\\sections\\3_floats-group\\21_appendix_figure_1_.xml', 'PMC3310194\\sections\\3_floats-group\\22_appendix_figure_2_.xml', 'PMC3310194\\sections\\3_floats-group\\23_appendix_figure_3_.xml', 'PMC3310194\\sections\\3_floats-group\\2_figure_2_.xml', 'PMC3310194\\sections\\3_floats-group\\3_figure_3_.xml', 'PMC3310194\\sections\\3_floats-group\\4_figure_4_.xml', 'PMC3310194\\sections\\3_floats-group\\5_figure_5_.xml', 'PMC3310194\\sections\\3_floats-group\\6_figure_6_.xml', 'PMC3310194\\sections\\3_floats-group\\7_figure_7_.xml', 'PMC3310194\\sections\\3_floats-group\\8_figure_8_.xml', 'PMC3310194\\sections\\3_floats-group\\9_figure_9_.xml',
(Truncated)
When I try running the subsequent cell,
    text_contents = []
    for text_file in text_files:
        text_filex = open(text_file,mode='r')
        text = text_filex.read()
        text_filex.close()
        text_contents.append(text)
    len(text_contents)
    # text_contents
I get the following error.
    ---------------------------------------------------------------------------
    FileNotFoundError                         Traceback (most recent call last)
    <ipython-input-4-43f8fb121a67> in <module>
          1 text_contents = []
          2 for text_file in text_files:
    ----> 3     text_filex = open(text_file,mode='r')
          4     text = text_filex.read()
          5     text_filex.close()

    FileNotFoundError: [Errno 2] No such file or directory: 'PMC3113902\\sections\\2_back\\0_ack.xml'
I searched online for help and found this article: https://lerner.co.il/2018/07/24/avoiding-windows-backslash-problems-with-pythons-raw-strings/. I tried the solutions it suggests, but they didn't help. I have very little programming experience, and any help with this would be appreciated.
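A note on the doubled backslashes, offered as a likely explanation rather than a confirmed diagnosis: when Python prints a list, or quotes a path inside an error message, it shows each string in its escaped form, where a single backslash is written as `\\`. The string itself still holds one backslash per separator. The short sketch below illustrates this with one of the paths from the output above; the variable names are made up for the example.

```python
# One of the relative paths from the output above, written as a raw string
# so that each backslash in the source is exactly one backslash in the value.
path = r"PMC3113902\sections\abstract\elem_0.xml"

# Printing a list shows the escaped form of each string (backslashes doubled).
as_shown_in_list = str([path])

# Printing the string on its own shows the actual characters (single backslashes).
as_shown_alone = str(path)

print(as_shown_in_list)
print(as_shown_alone)
print("backslashes actually stored in the string:", path.count("\\"))
```

If that is what is happening here, the double backslashes are only a display effect and are probably not the cause of the FileNotFoundError.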
I will ask on Shuttleworth Slack
I think I should be using Path... Just a guess at present.
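A minimal sketch of what using Path might look like, under an assumption that has not been checked against the notebook: that `get_globbed_files` returns paths relative to `project` (the printed paths do not include the project directory), and that the notebook's working directory is somewhere else because the `os.chdir(project)` line is commented out. The helper name `read_relative_files` and the UTF-8 encoding are hypothetical choices for illustration, not part of ami3.

```python
from pathlib import Path


def read_relative_files(base_dir, relative_paths, encoding="utf-8"):
    """Read each file after joining its relative path onto base_dir.

    Hypothetical helper: assumes relative_paths are relative to base_dir
    and that the files are text in the given encoding (an assumption).
    """
    base = Path(base_dir)
    contents = []
    for relative_path in relative_paths:
        # The / operator joins path pieces using the right separator for the OS.
        full_path = base / relative_path
        contents.append(full_path.read_text(encoding=encoding))
    return contents
```

With this in place, the failing cell could become `text_contents = read_relative_files(project, text_files)`. If the assumption about relative paths is wrong, this will raise the same FileNotFoundError, which would at least rule that explanation out.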