`ami-search-new` - interpretation of output files #60

ay-amityadav · 2019-02-19T09:38:43Z

ami version: ami20190218c. The discussion below is based on the output of the command `ami-search-new -p ./ami20190218c/osanctum200/ --dictionary country plantparts`.

GLOBAL
Let's first look at the four major files generated. I will discuss them in the order that makes the testing process easiest.

full.dataTables.html. This file contains the columns articles, bibliography, dic:country, dic:plantparts and word:frequencies. Leaving aside the bibliography column (mentioned in issue 48), I think we should show a count of 1 explicitly, such as India * 1; otherwise it's confusing. Probable TODO: I haven't yet checked the correctness of the counts in the table by manually inspecting a paper.
Problem: The column word:frequencies doesn't seem to be sorted in any way. Sorting it by count would help the later testing process and is probably the best option here.
commonest.dataTables.html. This file contains the same columns as the one above. Test: the entry in a cell of the columns dic:country, dic:plantparts and word:frequencies is the maximum of the values in the corresponding cell of full.dataTables.html. The file passes this test. As above, we could show a count of 1 explicitly, such as India * 1.
count.dataTables.html. The columns dic:country and dic:plantparts seem to contain the total number of words in a paper that come from the respective dictionaries. The values are close to the sum of the counts in the corresponding cell of full.dataTables.html (they are not equal because full.dataTables.html apparently shows only the top 5 counts in a cell). Problem: in the column word:frequencies, the value in a cell is less than the value in the corresponding cell of commonest.dataTables.html. Please confirm this.
entries.dataTables.html. Test: the columns dic:country and dic:plantparts contain the number of different terms in the paper that come from the respective dictionaries. Based on the entries in full.dataTables.html for the respective dictionaries, the values in this file seem correct. Problem: same as with count.dataTables.html; in the column word:frequencies, the value in a cell is less than the value in the corresponding cell of commonest.dataTables.html. Please confirm this.
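
The relationships tested above between the four tables can be sketched in Python. This is only an illustration: the cell layout (entries separated by `;`, each written as `term * n`, with a bare term meaning a count of 1) and the function names `parse_cell`, `summarise` and `format_cell` are assumptions, not something taken from the ami code.

```python
from collections import Counter


def parse_cell(cell, sep=";"):
    """Parse a cell such as 'India * 3; Nepal' into term counts.

    ASSUMPTION: entries are separated by `sep` and written as 'term * n';
    an entry without an explicit count is treated as count 1.
    """
    counts = Counter()
    for entry in cell.split(sep):
        entry = entry.strip()
        if not entry:
            continue
        term, star, number = entry.rpartition("*")
        if star and number.strip().isdigit():
            counts[term.strip()] += int(number)
        else:
            counts[entry] += 1
    return counts


def ordered_items(counts):
    """Sort by count (highest first), then alphabetically by term."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def summarise(counts):
    """Derive the values expected in the other three tables from one cell."""
    ordered = ordered_items(counts)
    return {
        "commonest": ordered[0] if ordered else None,  # commonest.dataTables.html
        "count": sum(counts.values()),                 # count.dataTables.html
        "entries": len(counts),                        # entries.dataTables.html
    }


def format_cell(counts, sep="; "):
    """Render a cell sorted by count, with a count of 1 shown explicitly."""
    return sep.join(f"{term} * {n}" for term, n in ordered_items(counts))


counts = parse_cell("India * 3; Nepal; China * 2")
print(summarise(counts))    # {'commonest': ('India', 3), 'count': 6, 'entries': 3}
print(format_cell(counts))  # India * 3; China * 2; Nepal * 1
```

Because full.dataTables.html apparently shows only the top 5 counts per cell, the `count` and `entries` values computed this way are lower bounds for the values in count.dataTables.html and entries.dataTables.html, not exact matches.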

I suggest we create a separate folder for the above four files, probably named tables.

Files created in the osanctum200 directory related to the country dictionary:

search.country.count.xml - no information
search.country.documents.xml - no information
search.country.snippets.xml - looks ok at the moment, needs a more careful look
I suggest we create a separate folder for the above files, named country

Files created in the osanctum200 directory related to the plantparts dictionary:

search.plantparts.count.xml - no information
search.plantparts.documents.xml - no information
search.plantparts.snippets.xml - looks ok at the moment, needs a more careful look
I suggest we create a separate folder for the above files, named plantparts

Files created in the osanctum200 directory related to word frequencies:

word.frequencies.count.xml - no information
word.frequencies.documents.xml - no information
word.frequencies.snippets.xml - looks ok at the moment, needs a more careful look
I suggest we create a separate folder for the above files, named word_frequencies
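
As a rough sketch of the folder layout suggested above (tables, country, plantparts, word_frequencies), the top-level files could be grouped with a small Python script. The `LAYOUT` mapping and the helper names `plan_moves` and `apply_moves` are hypothetical and only illustrate the idea; ami itself does not provide them.

```python
from pathlib import Path
import shutil

# Hypothetical mapping: suggested folder name -> file-name pattern.
LAYOUT = {
    "tables": "*.dataTables.html",
    "country": "search.country.*.xml",
    "plantparts": "search.plantparts.*.xml",
    "word_frequencies": "word.frequencies.*.xml",
}


def plan_moves(project_dir):
    """List (source, destination) pairs for files directly inside project_dir.

    Path.glob with these patterns is not recursive, so the per-paper
    folders (such as PMC1397864) are left untouched.
    """
    project_dir = Path(project_dir)
    moves = []
    for folder, pattern in LAYOUT.items():
        for src in sorted(project_dir.glob(pattern)):
            if src.is_file():
                moves.append((src, project_dir / folder / src.name))
    return moves


def apply_moves(moves):
    """Create the target folders as needed and move each file."""
    for src, dst in moves:
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))
```

Calling `plan_moves("./ami20190218c/osanctum200")` first and inspecting the result before passing it to `apply_moves` keeps the reorganisation reviewable.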

PAPER SPECIFIC

Let's look inside the PMC1397864 folder.
search.country.count.xml - no information
search.country.snippets.xml - looks ok at the moment, needs a more careful look
search.plantparts.count.xml - no information
search.plantparts.snippets.xml - looks ok at the moment, needs a more careful look
word.frequencies.count.xml - no information
word.frequencies.snippets.xml - Problem: the file needs to be renamed, since it doesn't contain any snippets; other things look ok. Test: this file and the corresponding cell in the column word:frequencies in full.dataTables.html should agree with each other.
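
The agreement test for word frequencies could look roughly like the Python sketch below. It assumes the counts from the per-paper file and from the table cell have already been read into plain `{word: count}` dictionaries; how to extract them is not shown, and the name `diff_counts` is hypothetical.

```python
def diff_counts(file_counts, cell_counts):
    """Return {word: (count_in_file, count_in_cell)} for words that disagree.

    Only words present in the table cell are checked, because the cell
    apparently shows just the top counts and may omit rarer words.
    """
    return {
        word: (file_counts.get(word, 0), n)
        for word, n in cell_counts.items()
        if file_counts.get(word, 0) != n
    }


# Made-up counts, purely for illustration.
file_counts = {"plant": 10, "leaf": 4, "oil": 2}
cell_counts = {"plant": 10, "leaf": 5}
print(diff_counts(file_counts, cell_counts))  # {'leaf': (4, 5)}
```

An empty result means every word shown in the cell has the same count in the per-paper file.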

We can probably club the above files together in a folder

The other three files:
eupmc_result.json, fulltext.xml and scholarly.html look ok to me. We can probably club them together in a folder

results: Need to look at this folder

Cooccurrence
NOTE: On this version, I get a folder named __cooccurrence, instead of cooccurrence.
Need to look at this.

Please find the files related to this issue at: https://github.com/petermr/tigr2ess/tree/master/problems/amit/issue


petermr · 2019-02-19T12:32:24Z

Thanks very much!
This is a good analysis. I think for the tutorial we will stick with full.dataTables.html. The others are experiments to see who might find them useful.

petermr · 2019-02-28T05:25:56Z

Please transfer to bottom of ami-search tutorial

petermr assigned petermr, ay-amityadav and ambarishK and unassigned ay-amityadav Feb 21, 2019
