Incorporate committee video #161

Open
2 of 5 tasks
waldoj opened this issue Dec 18, 2017 · 12 comments

waldoj commented Dec 18, 2017

There's now a streaming video interface for committee meetings! For the House, it's completely different from the one used for floor video; for the Senate, it's Granicus. The good news is that the House vendor provides JSON representations of the data, so this view of the next week's scheduled videos also has a JSON representation. (There's also monthly JSON.)

At this moment, the governor is speaking in Appropriations, and this is the JSON representation:

[
  {
    "Title": "Appropriations",
    "IconUri": null,
    "EntityStatus": 1,
    "EntityStatusDesc": "In Progress",
    "Location": "SCR",
    "Description": "Shared Committee Room, Pocahontas Building",
    "ThumbnailUri": "/00304/Harmony/images/video_live_small.png?timecode=20171218145408",
    "ScheduledStart": "2017-12-18T09:00:17",
    "ScheduledEnd": "2017-12-19T08:55:17",
    "HasArchiveStream": false,
    "ActualStart": "2017-12-18T09:00:17",
    "ActualEnd": null,
    "LastModifiedTime": "2017-12-18T09:32:40",
    "CommitteeId": null,
    "VenueId": null,
    "AssemblyProgress": 0,
    "AssemblyStatus": 0,
    "ForeignKey": "972",
    "Id": 2115,
    "Tag": null
  }
]

Seems to me that there are five things to be done:

  • periodically fetch this weekly representation, store it, and use it on both the bills page ("this bill is part of a hearing happening now—watch live") and perhaps in a site banner ("Appropriations is meeting now—watch live")
  • figure out if there is any method of downloading bulk video
  • set up a stream-capture process
  • retrieve the time-coded agenda information
  • include this video, after the fact, in our video processing pipeline
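A minimal sketch of the first task's "watch live" check, assuming the weekly feed is a JSON array shaped like the sample record above and that `EntityStatus` 1 always means "In Progress" (the sample is the only evidence for that pairing). The function name and banner wording are hypothetical.

```python
import json
from typing import List

# Assumption: EntityStatus 1 pairs with EntityStatusDesc "In Progress",
# as in the single sample record; other status codes aren't documented here.
IN_PROGRESS = 1


def live_banners(feed_json: str) -> List[str]:
    """Return banner text for every committee meeting that is live right now.

    `feed_json` is the raw body of the weekly schedule feed (a JSON array).
    """
    events = json.loads(feed_json)
    banners = []
    for event in events:
        if event.get("EntityStatus") == IN_PROGRESS:
            banners.append(f"{event['Title']} is meeting now: watch live")
    return banners


if __name__ == "__main__":
    sample = '[{"Title": "Appropriations", "EntityStatus": 1, "Id": 2115}]'
    print(live_banners(sample))
```

The same filtered list could drive both the site banner and the per-bill notice, once live events are matched to the bills on their agendas.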

waldoj commented Dec 18, 2017

Note that the URL defined in ThumbnailUri just 404s.


waldoj commented Dec 18, 2017

Also note that this event is scheduled to run for just short of 24 hours, so the scheduled range doesn't look very reliable.
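To put a number on it: the sample's window runs from 09:00:17 to 08:55:17 the next day, which is 23 hours 55 minutes. A sketch of a plausibility check follows; the 12-hour ceiling is a hypothetical cutoff, not something the feed defines. Records that fail it might fall back on `ActualStart`/`ActualEnd` instead.

```python
from datetime import datetime, timedelta

# Hypothetical cutoff: assume no committee meeting runs longer than this.
MAX_PLAUSIBLE_LENGTH = timedelta(hours=12)


def scheduled_length(event: dict) -> timedelta:
    """Length of the scheduled window, from the feed's ScheduledStart/ScheduledEnd."""
    start = datetime.fromisoformat(event["ScheduledStart"])
    end = datetime.fromisoformat(event["ScheduledEnd"])
    return end - start


def schedule_is_plausible(event: dict, limit: timedelta = MAX_PLAUSIBLE_LENGTH) -> bool:
    """True if the window is positive and no longer than `limit`."""
    return timedelta(0) < scheduled_length(event) <= limit


appropriations = {
    "ScheduledStart": "2017-12-18T09:00:17",
    "ScheduledEnd": "2017-12-19T08:55:17",
}
print(scheduled_length(appropriations))       # 23:55:00
print(schedule_is_plausible(appropriations))  # False
```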


waldoj commented Dec 19, 2017

Uh. It looks like there are no bulk downloads? Just streaming?


waldoj commented Jan 3, 2018

This is very doable. Here's a helpful script to grab the video. It turns out you can just concatenate all of the MP4s together!
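The linked script isn't reproduced here. As a rough Python sketch of the same idea (function names and the `fetch` hook are hypothetical): an HLS `.m3u8` playlist lists one URI per line that doesn't start with `#`, so the segment list can be read out of the playlist, resolved against the playlist's own URL, and the downloads appended in order. A top-level `playlist.m3u8` may be a master playlist that points at a chunklist rather than at segments, in which case `playlist_entries` would need to be applied twice.

```python
from typing import Callable, Iterable, List
from urllib.parse import urljoin
from urllib.request import urlopen


def playlist_entries(playlist_text: str, playlist_url: str) -> List[str]:
    """Absolute URLs of every URI line in an HLS (.m3u8) playlist.

    Lines starting with '#' are tags or comments; everything else is a URI,
    possibly relative to the playlist's own URL.
    """
    entries = []
    for line in playlist_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.append(urljoin(playlist_url, line))
    return entries


def http_get(url: str) -> bytes:
    """Default fetcher: download one URL into memory."""
    with urlopen(url) as resp:
        return resp.read()


def concatenate(urls: Iterable[str], out_path: str,
                fetch: Callable[[str], bytes] = http_get) -> int:
    """Download each segment in playlist order and append it to out_path.

    Returns the number of bytes written. Assumes, as this thread reports,
    that the pieces can simply be joined byte-for-byte.
    """
    written = 0
    with open(out_path, "wb") as out:
        for url in urls:
            chunk = fetch(url)
            out.write(chunk)
            written += len(chunk)
    return written
```

Passing `fetch` in makes it easy to swap in retries or a cache later without touching the concatenation loop.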


waldoj commented Jan 3, 2018

I think the necessary process here is to build on the existing infrastructure:

  • build a watch script on rs-machine that retrieves upcoming schedules and stores them in MySQL, in the meetings table (that is, matching those records with the existing schedule data)
  • build another rs-machine script that gets the URL for those videos, when they're done, stores the URL in SQS, and starts the rs-video-processor instance
  • add a script on rs-video-processor that grabs queued videos
  • finally, handle those videos like any other in the ingestion pipeline
  • include the files.id of the resulting video in meetings, to close the loop
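Two pieces of that loop lend themselves to a sketch: matching a feed record to an existing row in `meetings`, and building the message handed to SQS. Everything here is hypothetical, including the `meetings` columns (`id`, `committee`, `date`), the matching rule, and the message fields, since the actual schema isn't described in this thread. The returned string is what would be sent as the SQS message body (for instance with boto3's `send_message`).

```python
import json
from datetime import datetime
from typing import List, Optional


def match_meeting(event: dict, meetings: List[dict]) -> Optional[dict]:
    """Find the existing meetings row for a video-feed event.

    Hypothetical matching rule: same committee name (case-insensitive)
    and same calendar date as the event's ScheduledStart.
    """
    day = datetime.fromisoformat(event["ScheduledStart"]).date()
    title = event["Title"].strip().lower()
    for meeting in meetings:
        if meeting["committee"].strip().lower() == title and meeting["date"] == day:
            return meeting
    return None


def queue_message(event: dict, meeting: dict, video_url: str) -> str:
    """JSON body for the queue message that tells rs-video-processor what to fetch."""
    return json.dumps({
        "meeting_id": meeting["id"],  # hypothetical meetings column
        "sliq_id": event["Id"],       # the feed's record ID
        "video_url": video_url,
    })
```

Carrying `meeting_id` through the queue would let the processor write the resulting `files.id` back to the right meeting, closing the loop.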


waldoj commented Jan 3, 2018

The URL for a given video page must be constructed from the data file: it's http://sg001-harmony.sliq.net/00304/Harmony/en/PowerBrowser/PowerBrowserV2/YYYYMMDD/-1/ID, with YYYYMMDD and ID available as fields in the data file.
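A sketch of that construction, assuming the date segment is the date of `ScheduledStart` and the ID is the record's `Id` field. The stream URL quoted below contains `2115`, the sample record's `Id`, which supports that reading. The function name is hypothetical.

```python
from datetime import datetime

# URL template from this thread; the date and ID come from the feed record.
PAGE_URL = ("http://sg001-harmony.sliq.net/00304/Harmony/en/PowerBrowser/"
            "PowerBrowserV2/{date}/-1/{id}")


def video_page_url(event: dict) -> str:
    """Build the PowerBrowser page URL for one feed record.

    Assumption: YYYYMMDD is the date of ScheduledStart, and ID is "Id".
    """
    start = datetime.fromisoformat(event["ScheduledStart"])
    return PAGE_URL.format(date=start.strftime("%Y%m%d"), id=event["Id"])


print(video_page_url({"ScheduledStart": "2017-12-18T09:00:17", "Id": 2115}))
# http://sg001-harmony.sliq.net/00304/Harmony/en/PowerBrowser/PowerBrowserV2/20171218/-1/2115
```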

The actual URL to retrieve video from is in the page body, in script tags, e.g.:

var availableStreams = [{"GlobalEssenceFormatId":4,"IsLive":false,"Enabled":true,"AudioOnly":false,"VideoIndex":null,"AudioIndex":null,"StreamFormatId":12,"Url":"http://sg002-livein01.sliq.net/00304-vod/_definst_/2017/12/18/Appropriations_2017-12-18-09.00.00_2115_12.mp4/playlist.m3u8","Lang":"","StreamAssemblerList":null,"PreRoll":0.0,"Duration":9662,"Id":2239,"Tag":"Video"}];

So get the Url value from that, hack off /playlist.m3u8, and iterate from there.
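A sketch of that extraction, assuming the declaration always appears as in the example (a JSON array assigned on a `var availableStreams = ...;` line) and that no string inside the array contains the sequence `];`. The regex and function name are hypothetical.

```python
import json
import re
from typing import List

# Matches the inline declaration: var availableStreams = [...];
STREAMS_RE = re.compile(r"var\s+availableStreams\s*=\s*(\[.*?\]);", re.DOTALL)
PLAYLIST_SUFFIX = "/playlist.m3u8"


def stream_bases(page_html: str) -> List[str]:
    """Return each stream's Url with the trailing /playlist.m3u8 removed."""
    match = STREAMS_RE.search(page_html)
    if match is None:
        return []
    bases = []
    for stream in json.loads(match.group(1)):
        url = stream["Url"]
        if url.endswith(PLAYLIST_SUFFIX):
            url = url[: -len(PLAYLIST_SUFFIX)]
        bases.append(url)
    return bases
```

For the example above, this yields the `.../Appropriations_2017-12-18-09.00.00_2115_12.mp4` base, from which the playlist and its segments can be walked.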


waldoj commented Jan 5, 2018

I skimmed through this test video, helpfully recorded by legislative staff. Here are the three types of chyrons that I saw:

[Screenshots of the three chyron types, labeled 3, 1, and 2]

Large and small, basically. I think the last two are just variations of the same thing. (One has Secretary of Finance written under the caption text.) The large one has text running under the seal. It's a fair bet that the purpose of this test run was to identify problems like that, and that they won't be an issue in production.

So, really, there are just two types of chyrons. I think the best test will be to check two specific pixels for blue. If the bottom-left pixel is blue, then it's a short chyron, so crop accordingly. If it isn't, but a pixel above and to the left is blue, then it's a large chyron and, again, crop accordingly.
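A sketch of the two-pixel rule, assuming a decoded frame is available as rows of RGB tuples (how frames get decoded isn't covered here). What counts as "blue" and where the second pixel sits are both assumptions, since the thread gives no colour values or coordinates; cropping is left out.

```python
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]
Frame = List[List[RGB]]  # frame[y][x]; row 0 is the top of the image


def is_blue(pixel: RGB, margin: int = 40) -> bool:
    """Hypothetical test: blue channel strong and clearly above red and green."""
    r, g, b = pixel
    return b >= 100 and b - r >= margin and b - g >= margin


def classify_chyron(frame: Frame,
                    large_probe: Tuple[float, float] = (0.05, 0.70)) -> Optional[str]:
    """Return "short", "large", or None, following the two-pixel rule.

    large_probe gives the second pixel as (x, y) fractions of the frame size;
    the default is a placeholder, not a measured position.
    """
    height, width = len(frame), len(frame[0])
    if is_blue(frame[height - 1][0]):  # bottom-left pixel
        return "short"
    x = int(large_probe[0] * (width - 1))
    y = int(large_probe[1] * (height - 1))
    if is_blue(frame[y][x]):           # higher up, near the left edge
        return "large"
    return None
```

Checking the bottom-left pixel first matters: per the rule, the second pixel only decides the case when the first one isn't blue.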

I doubt the format of the top text is settled at this point, but it should be pretty easy to extract the bill number and patron from it. The bottom text is the bill's catch line, so it could be useful as a sanity check in case OCR garbles the top text.
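Since the top-text format isn't settled, this is only a loose sketch. It assumes the bill number appears somewhere in the OCR output as a Virginia-style prefix (HB, SB, HJ, SJ, HR, SR) plus digits, and it uses `difflib` similarity against the catch line already on file as the sanity check. The regex, the 0.8 threshold, and the function names are assumptions; patron extraction is left out because its format is unknown.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Virginia bill prefixes, tolerating a stray OCR space inside the number.
BILL_RE = re.compile(r"\b([HS])\s?([BJR])\s?(\d{1,4})\b", re.IGNORECASE)


def bill_number(top_text: str) -> Optional[str]:
    """Pull a normalized bill number (e.g. "HB1234") out of OCR'd chyron text."""
    m = BILL_RE.search(top_text)
    if m is None:
        return None
    return f"{m.group(1).upper()}{m.group(2).upper()}{int(m.group(3))}"


def catch_line_agrees(ocr_bottom: str, known_catch_line: str,
                      threshold: float = 0.8) -> bool:
    """Fuzzy check that the bottom text matches the bill's stored catch line.

    Case and runs of whitespace are normalized first; the threshold is a
    placeholder, not a tuned value.
    """
    a = " ".join(ocr_bottom.lower().split())
    b = " ".join(known_catch_line.lower().split())
    return SequenceMatcher(None, a, b).ratio() >= threshold
```

A fuzzy ratio rather than an exact comparison lets a single misread character in the catch line still confirm the bill.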


waldoj commented Jan 8, 2018

Started to support House committee chyrons in 670890b.


waldoj commented Jan 10, 2018

Huh. Here's a completely different approach to chyron-text placement, from today's test video. (There's no real video just yet.)

[Screenshot of the new chyron layout from the test video]


waldoj commented Jan 15, 2018

Looks like the tick-tock (the time-coded agenda) can be grabbed from the page source itself, where it's defined as dataModel.
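A sketch for pulling that value out of the page source, assuming `dataModel` is assigned a strict-JSON value (unquoted keys or single-quoted strings would break `json`). The structure of the agenda inside it isn't described in this thread, so the sketch only returns the parsed value; any field names would be guesses.

```python
import json
import re
from typing import Any, Optional

DATA_MODEL_RE = re.compile(r"\bdataModel\s*=\s*")


def extract_data_model(page_html: str) -> Optional[Any]:
    """Parse the value assigned to dataModel in the page source.

    JSONDecoder.raw_decode parses one complete JSON value starting at the
    given index, so nested braces are handled and trailing script is ignored.
    """
    m = DATA_MODEL_RE.search(page_html)
    if m is None:
        return None
    value, _end = json.JSONDecoder().raw_decode(page_html, m.end())
    return value
```

Using `raw_decode` instead of a regex for the value itself avoids guessing where a nested object ends.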


waldoj commented Jan 15, 2018

Senate video lives here.


waldoj commented Jan 15, 2018

Oh, lawd...new chyron styles for the Senate.

[Screenshot of a Senate chyron]

The bill text is all stretchy, the chyrons are smaller, and the video is flipped horizontally, for some reason? Ugh.
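If the Senate feed really is mirrored, frames would presumably need to be flipped back before OCR. A minimal sketch on a frame stored as a list of pixel rows (a hypothetical representation); for whole files, ffmpeg's `hflip` video filter does the equivalent (`ffmpeg -i in.mp4 -vf hflip out.mp4`).

```python
from typing import List, Sequence, TypeVar

Pixel = TypeVar("Pixel")


def unmirror(frame: Sequence[Sequence[Pixel]]) -> List[List[Pixel]]:
    """Undo a horizontal flip: reverse each row, keeping row order unchanged."""
    return [list(reversed(row)) for row in frame]
```

Flipping is its own inverse, so the same function works whichever way the feed turns out to be oriented.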
