Catch twitterbot on shared RECAP PDFs and insert twitter card data #863
Comments
We could probably improve this to show better values for
imo you should iframe the PDF (perhaps with I would include the Only I don't think "Complaint — Document #1 in NATIONAL VETERANS LEGAL SERVICES PROGRAM v. United States — Brought to you by the RECAP Initiative and Free Law Project, a non-profit dedicated to creating" is very good.

Document number first, without a huge waste of characters like "Document." More verbose description? Maybe, but you run out of characters first. I wonder if the date or the author is better, e.g.:

"#1 COMPLAINT against All Defendants United States of America filed by NATIONAL VETERANS LEGAL SERVICES PROGRAM, ALLIANCE FOR JUSTICE, NATIONAL CONSUMER LAW CENTER (Gupta, Deepak) in NATIONAL VETERANS LEGAL SERVICES PROGRAM v. United States"

vs.

"#1, Apr. 21, 2016 by Deepak Gupta: COMPLAINT against All Defendants United States of America filed by NATIONAL VETERANS LEGAL SERVICES PROGRAM, ALLIANCE FOR JUSTICE, NATIONAL CONSUMER LAW CENTER (Gupta, Deepak) in NATIONAL VETERANS LEGAL SERVICES PROGRAM v. United States"

or some other variant. I parse to elide parens (Filing fee $ 400 receipt number 0090-4495374), which, well...yeah...

What is this limited to, 70 characters? I dunno how much twitter clients actually display.
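A rough sketch of the title format floated above: document number first, parentheticals elided, then cut to a length limit. The function names and the 70-character default are assumptions for illustration (the limit is only the guess from this comment), and none of this is code from the repo.

```python
import re

# Assumed limit, taken from the "70 characters?" guess in the comment above.
TITLE_LIMIT = 70


def strip_parentheticals(text: str) -> str:
    """Remove (...) groups such as "(Filing fee $ 400 receipt number ...)"."""
    without = re.sub(r"\s*\([^()]*\)", "", text)
    # Collapse any double spaces left behind by the removal.
    return re.sub(r"\s{2,}", " ", without).strip()


def build_card_title(doc_number: int, description: str, case_name: str,
                     limit: int = TITLE_LIMIT) -> str:
    """Build "#<number> <description> in <case>", truncated to `limit` characters."""
    title = f"#{doc_number} {strip_parentheticals(description)} in {case_name}"
    if len(title) <= limit:
        return title
    # Reserve one character for the ellipsis and avoid a trailing space before it.
    return title[: limit - 1].rstrip() + "…"
```

Note that this strips every parenthetical, so "(Gupta, Deepak)" would go too; keeping the attorney name would need a narrower pattern.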
The real fun here will be creating thumbnails of the PDFs, which seems like a noble pursuit, so that you can have a snapshot, à la the @big_cases bot.
Hey, look, I did finish the code I was working on for this: courtlistener/cl/people_db/tasks.py Lines 10 to 40 in 9ee76e4
That's from the financial disclosure project, but it'll be trivial to generalize it. AWESOME. I think the performance should be there to do this on the fly so we don't have to do it for every PDF we get.
This is pretty much done. I expect some trouble during deployment because there are a lot of weird file system things going on:
But those shouldn't be a big deal really. I implemented this three times, each time refactoring to decrease complexity. Ugh:
Each refactoring wasn't too bad, but it was kind of dumb. Anyway, the implementation is very simple now. Whenever twitter comes crawling, instead of serving the PDF directly, we serve the regular recap document HTML page, which has the Twitter card info, and now has an embedded PDF. Simple. Addressing the thoughts about what to put in the title and the description:
Tests are running now. If they pass, I'll deploy.
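For readers unfamiliar with Twitter cards, the card info mentioned in this comment is a handful of meta tags in the HTML page's head. Below is a minimal sketch of rendering them; `render_card_meta` and its parameters are hypothetical names, and the actual CourtListener templates may look quite different.

```python
from html import escape
from typing import Optional


def render_card_meta(title: str, description: str,
                     image_url: Optional[str] = None) -> str:
    """Return Twitter card <meta> tags as an HTML string (illustrative only)."""
    tags = {
        "twitter:card": "summary",
        "twitter:title": title,
        "twitter:description": description,
    }
    if image_url:
        # Optional thumbnail, e.g. a rendered first page of the PDF.
        tags["twitter:image"] = image_url
    # Escape attribute values so quotes or ampersands in case names stay valid HTML.
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in tags.items()
    )
```

Calling `render_card_meta("#1 COMPLAINT", "Filed Apr. 21, 2016")` yields three tags; passing `image_url` adds a fourth.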
So, when people share PDFs on twitter, we don't have any card data associated with them. This is a missed opportunity.
To address this, we can tweak the code here:
courtlistener/cl/simple_pages/views.py
Lines 336 to 341 in 481e4f9
To watch for the Twitterbot user agent. When it's detected, we provide a useful HTML response page with twitter/facebook card information. When it's not, we serve up the PDF.
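The user-agent check described here could be sketched roughly as follows. This is framework-free and purely illustrative: `is_twitterbot` and `choose_response` are hypothetical names, the substring match on "twitterbot" is an assumption about how the crawler identifies itself, and the real view in cl/simple_pages/views.py is not reproduced.

```python
from typing import Optional


def is_twitterbot(user_agent: Optional[str]) -> bool:
    """True if the User-Agent header looks like Twitter's crawler."""
    # Case-insensitive substring match; a missing header counts as "not the bot".
    return "twitterbot" in (user_agent or "").lower()


def choose_response(user_agent: Optional[str]) -> str:
    """Decide what to serve: the HTML page with card data, or the raw PDF."""
    return "html_card_page" if is_twitterbot(user_agent) else "pdf"
```

In a Django view the header would typically come from `request.META.get("HTTP_USER_AGENT")`, with the two string results replaced by the actual HTML and PDF responses.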
I guess the next question is, what info do we want in the card. This is what we show on our HTML pages for documents:
And the title variable is defined as:
So the result is something like: