Removed eprints may still appear on stats reports (usually as "unknown eprint '12345'") because IRStats2 doesn't keep a table of "active" eprints.
Solution 1
Have an "eprint set" table and JOIN it with the data tables (irstats2_downloads, irstats2_views, etc.) on every query. Given the size of the downloads table, and potentially the size of the table of "active" eprints, this would slow queries down considerably. Views like "Top Authors" would need to perform two JOINs (on the largest tables), so that's really not an ideal solution.
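To make Solution 1 concrete, here is a minimal sketch using an in-memory SQLite database as a stand-in for IRStats2's MySQL tables. The `active_eprints` table and the `eprintid`/`hits` columns are hypothetical; the real irstats2_downloads schema may differ. Every ranking query has to carry the extra JOIN:

```python
import sqlite3

# Hypothetical schema: "eprintid" and "hits" are assumed column names,
# and "active_eprints" is the hypothetical "eprint set" table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE irstats2_downloads (eprintid INTEGER, hits INTEGER);
CREATE TABLE active_eprints (eprintid INTEGER PRIMARY KEY);
INSERT INTO irstats2_downloads VALUES (1, 5), (2, 9), (3, 7), (1, 4);
INSERT INTO active_eprints VALUES (1), (3);  -- eprint 2 has been removed
""")

def top_eprints(conn, limit):
    """Most downloaded eprints, restricted to active ones via a JOIN."""
    sql = """
        SELECT d.eprintid, SUM(d.hits) AS total
        FROM irstats2_downloads AS d
        JOIN active_eprints AS a ON a.eprintid = d.eprintid
        GROUP BY d.eprintid
        ORDER BY total DESC
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()

print(top_eprints(conn, 10))
```

LIMIT still works here because filtering happens inside the query; the cost is that the JOIN runs against the full downloads table every time.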
Solution 2
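Filter items on output, as data is extracted from the DB. This is tricky too, as it prevents the use of SQL "LIMIT", so limits would have to be computed on the fly. Why? Because a view such as "Top 10 Authors" performs such a LIMIT: if the query returns 10 rows and some of them belong to removed eprints, discarding those afterwards leaves fewer than 10 results.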
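A minimal sketch of Solution 2, again with SQLite standing in for MySQL. The `is_live` check and the `eprintid`/`hits` columns are hypothetical; in practice the check would be a lookup against the live repository. It contrasts a naive "LIMIT then filter" with computing the limit on the fly while streaming rows:

```python
import sqlite3
from typing import Callable, List, Tuple

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE irstats2_downloads (eprintid INTEGER, hits INTEGER);
INSERT INTO irstats2_downloads VALUES (1, 9), (2, 12), (3, 7), (4, 3), (5, 11);
""")

def is_live(eprintid: int) -> bool:
    # Hypothetical check: pretend eprints 2 and 5 have been removed.
    return eprintid not in {2, 5}

RANKING_SQL = """
    SELECT eprintid, SUM(hits) AS total
    FROM irstats2_downloads
    GROUP BY eprintid
    ORDER BY total DESC
"""

def top_n_naive(conn, n: int, keep: Callable[[int], bool]) -> List[Tuple[int, int]]:
    # LIMIT in SQL, then filter: removed eprints eat into the N slots.
    rows = conn.execute(RANKING_SQL + " LIMIT ?", (n,)).fetchall()
    return [row for row in rows if keep(row[0])]

def top_n_filtered(conn, n: int, keep: Callable[[int], bool]) -> List[Tuple[int, int]]:
    # No LIMIT: iterate over the cursor and stop once N active rows are kept.
    kept: List[Tuple[int, int]] = []
    for eprintid, total in conn.execute(RANKING_SQL):
        if keep(eprintid):
            kept.append((eprintid, total))
            if len(kept) == n:
                break
    return kept

print(top_n_naive(conn, 2, is_live))
print(top_n_filtered(conn, 2, is_live))
```

Stopping early saves fetching the remaining rows into Python, but the database still has to group and sort everything, since it can no longer use the LIMIT itself.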
Solution 3
Add an extra "active" field to the data tables to flag whether an eprint is active or not. This should be quicker than Solution 1, since that field would be indexed and the check is a simple WHERE condition. It would, however, require updating all the data tables every day to mark active (or inactive) items.
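A minimal sketch of Solution 3, once more using SQLite as a stand-in for MySQL. The `active` column, the index name, the `refresh_active_flags` job and the `eprintid`/`hits` columns are all hypothetical. The daily job only rewrites rows whose flag actually changes, and the ranking query keeps its LIMIT:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE irstats2_downloads (
    eprintid INTEGER, hits INTEGER, active INTEGER NOT NULL DEFAULT 1);
CREATE INDEX idx_downloads_active ON irstats2_downloads (active);
INSERT INTO irstats2_downloads (eprintid, hits) VALUES (1, 9), (2, 12), (3, 7), (4, 3);
""")

def refresh_active_flags(conn, live_ids):
    """Hypothetical daily job: sync the flag with the set of live eprint ids."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS live (eprintid INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM live")
    conn.executemany("INSERT INTO live VALUES (?)", [(i,) for i in live_ids])
    # Deactivate rows for removed eprints, then reactivate any restored ones.
    conn.execute("""
        UPDATE irstats2_downloads SET active = 0
        WHERE active = 1 AND eprintid NOT IN (SELECT eprintid FROM live)""")
    conn.execute("""
        UPDATE irstats2_downloads SET active = 1
        WHERE active = 0 AND eprintid IN (SELECT eprintid FROM live)""")
    conn.commit()

def top_eprints(conn, limit):
    # A plain WHERE on the flag; LIMIT can stay in SQL.
    return conn.execute("""
        SELECT eprintid, SUM(hits) AS total
        FROM irstats2_downloads
        WHERE active = 1
        GROUP BY eprintid
        ORDER BY total DESC
        LIMIT ?""", (limit,)).fetchall()

refresh_active_flags(conn, {1, 3, 4})  # eprint 2 has been removed
print(top_eprints(conn, 2))
```

The same refresh would have to run against every data table (irstats2_views, etc.), which is the daily maintenance cost mentioned above.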
Anything else we could do?