Custom Pagination #26

Open
Swader opened this issue Sep 20, 2015 · 0 comments

Swader commented Sep 20, 2015

The pagination side of Diffbot is buggy at best. It often fails to recognize multi-page articles and so does not merge them. What's more, it tops out at 20 pages, so anything beyond that gets ignored.

The feature suggestion for the client is as follows:

Add a new method to the Article API: paginateBy. This method takes 2 arguments: $identifier and $maxPages. The former is a way to identify the nextPage link element on the page. This element would be auto-processed to find all the next pages programmatically. The latter is the maximum number of pages to concatenate.

This method would, in order:

  1. Make an Article API request to the original URL.
  2. Find the nextPage element and process it to derive the URL pattern, then attach incrementing numbers to that pattern to generate the URLs of the next pages.
  3. Make an additional Article API request to each page, up to $maxPages pages.
  4. Concatenate the HTML content of all pages.
  5. Send the merged HTML content as a POST request to the Article API, for a final analysis of the entire post.
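
Steps 2 and 4 above could look roughly like the sketch below. It is written in Python purely for illustration and is not the client's code. The function names `generate_page_urls` and `merge_html` are hypothetical, and the heuristic (treat the last run of digits in the nextPage link as the page number) is an assumption that will not fit every site, for example ones that zero-pad page numbers.

```python
import re
from urllib.parse import urljoin


def generate_page_urls(original_url, next_href, max_pages):
    """Guess the URLs of the follow-up pages from the nextPage link.

    Assumption: the last run of digits in the link is the page number.
    The original URL counts as page 1, so at most max_pages - 1 URLs
    are returned.
    """
    next_url = urljoin(original_url, next_href)  # resolve relative links
    matches = list(re.finditer(r"\d+", next_url))
    if not matches:
        return []  # no number to increment, so no pattern to follow
    last = matches[-1]
    start = int(last.group())
    prefix = next_url[:last.start()]
    suffix = next_url[last.end():]
    count = max(max_pages - 1, 0)
    return [f"{prefix}{start + i}{suffix}" for i in range(count)]


def merge_html(fragments):
    """Concatenate per-page HTML fragments in page order (step 4)."""
    return "\n".join(fragments)
```

Calling `generate_page_urls("http://example.com/post", "/post?page=2", 4)` would produce the URLs for pages 2, 3 and 4. The Article API requests themselves (steps 1, 3 and 5) are left out on purpose, since they depend on the client's internals.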

Alternatively, to save Article API requests and use up only one, the client could just Guzzle the raw HTML of all the pages, extract the content HTML, merge that and send it as a POST. This, however, is less reliable, as Diffbot is much better at figuring out what is content on the page and what isn't (headers, ads, comments, etc.).

Maybe make it a switch of some kind, and an additional setter?
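
One possible shape for such a switch, again as an illustrative Python sketch rather than the client's real API: the function `paginate`, its `use_api` flag and the three callables are all hypothetical stand-ins. The per-page extraction strategy is chosen by the flag, and the merged HTML is analyzed once either way.

```python
def paginate(page_urls, extract_via_api, extract_raw, analyze, use_api=True):
    """Merge the content of page_urls and analyze the result once.

    All three callables are hypothetical stand-ins:
      extract_via_api(url) -> content HTML from one Article API request
      extract_raw(url)     -> content HTML pulled from a plain HTTP fetch
      analyze(html)        -> result of POSTing merged HTML to the Article API
    """
    # The switch: pick the per-page extraction strategy.
    extract = extract_via_api if use_api else extract_raw
    merged = "\n".join(extract(url) for url in page_urls)
    return analyze(merged)
```

With `use_api=True` this costs one Article API request per page plus the final POST. With `use_api=False` only the final POST hits the API, at the price of the less reliable content extraction described above.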

Swader self-assigned this Nov 7, 2015