This script spends a lot of time scraping pages just to check whether we already have the files.
On the initial page, we don't yet have all the information needed to build our filenames, because the filenames include a file extension that isn't known there.
But we should have enough to build most of each filename. If we compare those partial names against our existing files with the extension stripped, we can look for conflicts. If there are conflicts, we need to traverse all the subpages. Otherwise, we only need to hit the subpages for which we have no matching file.
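A minimal Python sketch of that check, under several assumptions not stated in the issue: the initial page yields a list of entries, each with a partial filename (the stem, without extension) and the URL of its subpage; existing downloads are counted by stem; and a "conflict" means two entries on the page share a stem, or one stem matches more than one local file. `Entry`, `existing_stems`, and `plan_subpage_fetches` are hypothetical names for illustration, not the script's actual code.

```python
from collections import Counter
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Entry:
    """One listing scraped from the initial page (hypothetical shape)."""
    stem: str         # filename without extension, built from the initial page
    subpage_url: str  # subpage that reveals the full filename and extension


def existing_stems(download_dir: Path) -> Counter:
    """Count local files by stem.

    Note: Path.stem strips only the final suffix, so "a.tar.gz" -> "a.tar".
    """
    return Counter(p.stem for p in download_dir.iterdir() if p.is_file())


def plan_subpage_fetches(entries: list[Entry], local_stems: Counter) -> list[str]:
    """Return the subpage URLs that actually need to be fetched."""
    page_stems = Counter(e.stem for e in entries)
    # Assumed definition of a conflict: a stem is ambiguous on the page
    # or matches more than one local file.
    has_conflict = any(n > 1 for n in page_stems.values()) or any(
        local_stems[s] > 1 for s in page_stems
    )
    if has_conflict:
        # Partial names can't be trusted, so traverse every subpage.
        return [e.subpage_url for e in entries]
    # Otherwise visit only subpages whose stem has no matching local file.
    # (Counter returns 0 for missing keys without inserting them.)
    return [e.subpage_url for e in entries if local_stems[e.stem] == 0]


if __name__ == "__main__":
    entries = [
        Entry("2023-01-report", "https://example.com/doc/1"),
        Entry("2023-02-report", "https://example.com/doc/2"),
    ]
    local = Counter({"2023-01-report": 1})
    print(plan_subpage_fetches(entries, local))  # ['https://example.com/doc/2']
```

In practice `local_stems` would come from `existing_stems(...)` on the download directory. A finer-grained variant could re-fetch only the ambiguous entries; the sketch keeps the all-or-nothing fallback described above.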
In most cases, this would let us check for new documents with a single request to the site, making the script better suited to routine maintenance than to a one-off run.