Batch match organization names to ISIL #55

Closed
acka47 opened this issue Jul 15, 2015 · 6 comments

@acka47
Contributor

acka47 commented Jul 15, 2015

@hauschke has a list of ~1100 organizations he wants to get the ISIL for. He would like to do this with OpenRefine, but we don't offer an OpenRefine reconciliation service (see also lobid/lodmill#168). I guess the API already supports this use case if you have the needed programming skills.

We will either have to provide a reconciliation service or create an example script that lets @hauschke execute the matching. He would provide us with a csv list of the names for this...

@acka47
Contributor Author

acka47 commented Jul 22, 2015

@hauschke sent a CSV with library names, postal codes and, where available, DBS IDs. You can find it here: https://gist.github.com/acka47/9bdc24359fe811e90026

@acka47
Contributor Author

acka47 commented Aug 6, 2015

@fsteeg: Do you want to take this one over?

@fsteeg
Member

fsteeg commented Aug 11, 2015

Deployed a first take on a reconciliation service for lobid-organisations:

http://beta.lobid.org/organisations/reconcile

I have not worked with OpenRefine before, so I'm not sure this all makes sense, but here is what I did to test the service (in OpenRefine v2.6-rc1):

  • Open bib_ohne_sigel.csv (from https://gist.github.com/acka47/9bdc24359fe811e90026) -> Next -> Create Project
  • On the bibliothek column, drop down menu -> Reconcile -> Start Reconciling
  • Add Standard Service -> paste http://beta.lobid.org/organisations/reconcile -> Add Service
  • Click new lobid-organisations entry to close pane on the left
  • Check dbs-id, As Property: a (arbitrary, but needs to be set)
  • Check plz, As Property: b (arbitrary, but needs to be set)
  • Click Start Reconciling
  • For each bibliothek cell OpenRefine now lists the suggested candidates
  • Click on the candidates to view their lobid-organisations JSON content
  • Deselect the lowest scoring suggestions with the slider on the left (I selected 0.17 - 2.59, resulting in 1015 rows of 1027 total)
  • On the bibliothek column drop down menu -> Edit column -> Add column based on this column
  • New column name: hbz-id (entries with no Sigel get DBS-<dbs-id> ids in lobid-organisations)
  • Expression: cell.recon.best.id -> OK
  • On the left, select all again (to get back the rows where we did not add the id)
  • In the upper right, Export -> Comma-separated values
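
For anyone who would rather script this than click through OpenRefine, here is a rough sketch of what the steps above amount to at the protocol level. It assumes the service follows the usual OpenRefine reconciliation convention (a `queries` form parameter holding a JSON batch keyed `q0`, `q1`, ..., and a response with a `result` list per key); I have not verified that against this particular deployment. The sample rows and the sample response are made up for illustration, the helper names are hypothetical, and no request is actually sent.

```python
import json
from urllib.parse import urlencode

# Endpoint from the comment above; this sketch does not contact it.
RECONCILE_URL = "http://beta.lobid.org/organisations/reconcile"


def build_queries(rows):
    """Map CSV rows to a reconciliation batch: one query per row, keyed q0, q1, ..."""
    queries = {}
    for i, row in enumerate(rows):
        properties = []
        # Property ids "a" and "b" mirror the arbitrary names chosen in OpenRefine.
        if row.get("dbs-id"):
            properties.append({"pid": "a", "v": row["dbs-id"]})
        if row.get("plz"):
            properties.append({"pid": "b", "v": row["plz"]})
        queries[f"q{i}"] = {"query": row["bibliothek"], "properties": properties}
    return queries


def encode_request(queries):
    """Form-encode the batch as the `queries` parameter of a POST body."""
    return urlencode({"queries": json.dumps(queries)})


def best_ids(response, min_score=0.0):
    """Return the top-scoring candidate id per query, or None if none reaches min_score."""
    best = {}
    for key, value in response.items():
        candidates = [c for c in value.get("result", []) if c["score"] >= min_score]
        top = max(candidates, key=lambda c: c["score"], default=None)
        best[key] = top["id"] if top else None
    return best


# Made-up rows using the column names of bib_ohne_sigel.csv.
sample_rows = [
    {"bibliothek": "Stadtbibliothek Beispielstadt", "plz": "12345", "dbs-id": ""},
    {"bibliothek": "Gemeindebücherei Musterdorf", "plz": "54321", "dbs-id": "XY999"},
]

# Made-up response in the assumed reconciliation response shape.
sample_response = {
    "q0": {"result": [
        {"id": "DBS-AA111", "name": "Stadtbibliothek Beispielstadt", "score": 1.2, "match": False},
        {"id": "DBS-BB222", "name": "Stadtbücherei Beispielstadt", "score": 0.4, "match": False},
    ]},
    "q1": {"result": [
        {"id": "DBS-XY999", "name": "Bücherei Musterdorf", "score": 0.1, "match": False},
    ]},
}

post_body = encode_request(build_queries(sample_rows))
# 0.17 is the lower slider bound used in the walkthrough above.
print(best_ids(sample_response, min_score=0.17))
```

`post_body` is what a script would POST to `RECONCILE_URL`; the score cutoff plays the role of the slider, and `best_ids` corresponds to `cell.recon.best.id`.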

The exported CSV file contains all original rows, with 1015 of 1027 now containing a hbz-id. I've uploaded it here: https://gist.github.com/fsteeg/df41a245b9ee404ef036
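
A quick way to double-check the matched/total count in an export like this, as a minimal sketch: it assumes the exported file has a header row containing the `hbz-id` column created above, and treats an empty cell as "no match". The sample data below is invented; `count_matched` is a hypothetical helper, not part of any tool mentioned here.

```python
import csv
import io


def count_matched(csv_text, id_column="hbz-id"):
    """Return (matched, total): rows with a non-empty id_column vs. all data rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    matched = total = 0
    for row in reader:
        total += 1
        # Missing or whitespace-only cells count as unmatched.
        if (row.get(id_column) or "").strip():
            matched += 1
    return matched, total


# Invented export; the column order is an assumption.
sample_export = (
    "bibliothek,plz,dbs-id,hbz-id\n"
    "Stadtbibliothek Beispielstadt,12345,,DBS-AA111\n"
    "Gemeindebücherei Musterdorf,54321,XY999,DBS-XY999\n"
    "Unbekannte Bibliothek,00000,,\n"
)

matched, total = count_matched(sample_export)
print(f"{matched} of {total} rows have an hbz-id")
```

For the real file, read it with `open(path, newline="", encoding="utf-8").read()` and pass the text in.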

@hauschke Is this usable for your use case? Anything missing or wrong?

@hauschke

Thank you very much, it works like a charm! 😄

@acka47
Contributor Author

acka47 commented Aug 24, 2015

As @hauschke is satisfied, a +1 from me. I haven't tried it myself, though.

@acka47 acka47 added deploy and removed review labels Aug 24, 2015
@acka47 acka47 assigned fsteeg and unassigned acka47 Aug 24, 2015
@fsteeg
Member

fsteeg commented Aug 24, 2015

Great, happy it works for you @hauschke. Closing.
