Keys and values do not match in "USERS (Controlled-Access Data)" #2

Open
tfuji opened this issue Aug 29, 2024 · 2 comments

@tfuji
Collaborator

tfuji commented Aug 29, 2024

json_from_joomla/humandb_20231223_both.json
The "Period of Data Use" field contains the value that belongs in "Data in Use (Dataset ID)".

      "USERS (Controlled-Access Data)": {
        "Mark Daly": {
          "Principal Investigator": "Mark Daly",
          "Affiliation": "Broad Institute of MIT and Harvard",
          "Research Title": "",
          "Data in Use (Dataset ID)": "",
          "Period of Data Use": "JGAD000101, JGAD000102, JGAD000123, JGAD000124, JGAD000144-JGAD000201, JGAD000220"
        },
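
To list every affected entry, a check along these lines could be run over the file. This is only a minimal sketch: it assumes the file nests "USERS (Controlled-Access Data)" objects at some depth (the full layout is not shown here) and that dataset IDs look like `JGAD` followed by digits. `find_shifted_users` and `report` are hypothetical helper names.

```python
import json
import re

USERS_KEY = "USERS (Controlled-Access Data)"
PERIOD_KEY = "Period of Data Use"
# Assumption: dataset IDs look like "JGAD000101".
DATASET_ID = re.compile(r"JGAD\d+")


def find_shifted_users(node, path=()):
    """Yield (path, user_name, period_value) for each user entry whose
    "Period of Data Use" contains something that looks like a dataset ID."""
    if isinstance(node, dict):
        users = node.get(USERS_KEY)
        if isinstance(users, dict):
            for name, entry in users.items():
                period = entry.get(PERIOD_KEY, "") if isinstance(entry, dict) else ""
                if isinstance(period, str) and DATASET_ID.search(period):
                    yield path, name, period
        # Walk into every value so the check works at any nesting depth.
        for key, value in node.items():
            yield from find_shifted_users(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_shifted_users(value, path + (index,))


def report(json_path):
    """Print every suspicious entry found in the given JSON file."""
    with open(json_path, encoding="utf-8") as fh:
        data = json.load(fh)
    for path, name, period in find_shifted_users(data):
        print(path, name, period)
```

Calling `report("json_from_joomla/humandb_20231223_both.json")` would then print one line per suspicious entry, with the path of keys leading to it.
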
@tfuji
Collaborator Author

tfuji commented Aug 31, 2024

@mitsuhashi @skwsm
All affected entries can be checked at the link below. Please fix them.
https://github.com/dbcls/humandbs/tree/dev?tab=readme-ov-file#users-controlled-access-data

@mitsuhashi
Collaborator

@tfuji @skwsm Thank you for your work on this.

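Looking at the JSON produced by the scraper, the script does not appear to expect a Country/Region column. For the columns to the right of it, the column names and values appear to be shifted.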

dbcls3284:json_from_joomla mitsuhashi$ git branch
  import_json
* import_json_skwsm
  main
dbcls3284:json_from_joomla mitsuhashi$ grep 'Country' humandb_20231223_both.json  | head -10

I checked the Joomla! HTML, and I do not see any problem with the correspondence between column names and values there.

hum00355-v1

https://humandbs.dbcls.jp/en/hum0355-v1

<p>&nbsp;</p>
<h1><span style="text-decoration: underline; font-family: helvetica; font-size: 15pt;"><strong>USRES (Controlled-access Data)</strong></span></h1>
<table class="table-style style-greystripes" style="width: 922px; height: 70px;">
<thead>
<tr><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Principal Investigator</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Affiliation</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Country/Region</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Research Title</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Data in Use (Dataset ID)</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Period of Data Use</span></th></tr>
</thead>
<tbody>
<tr>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">Maher Eamonn</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">University of Cambridge</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">United Kingdom of Great Britain and Northern Ireland</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">Molecular Pathology of Human Genetic Disease</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">JGAD000663</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">2023/03/19-2024/07/20</span></td>
</tr>
</tbody>
</table>

hum00327-v1

https://humandbs.dbcls.jp/en/hum0327-v1

<p>&nbsp;</p>
<h1><span style="text-decoration: underline; font-family: helvetica; font-size: 15pt;"><strong>USRES (Controlled-access Data)</strong></span></h1>
<table class="table-style style-greystripes" style="width: 922px; height: 70px;">
<thead>
<tr><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Principal Investigator</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Affiliation</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Country/Region</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Research Title</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Data in Use (Dataset ID)</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Period of Data Use</span></th></tr>
</thead>
<tbody>
<tr>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">Michiaki Hamada</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">Faculty of Science and Engineering, Waseda University</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">Japan</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">Construction of RNA-targeted Drug Discovery Database</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">JGAD000624</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">2022/12/26-2025/03/31</span></td>
</tr>
</tbody>
</table> 

hum00320-v1

https://humandbs.dbcls.jp/en/hum0320-v1

<p>&nbsp;</p>
<h1><span style="text-decoration: underline; font-family: helvetica; font-size: 15pt;"><strong>USRES (Controlled-access Data)</strong></span></h1>
<table class="table-style style-greystripes" style="width: 922px; height: 70px;">
<thead>
<tr><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Principal Investigator</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Affiliation</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Country/Region</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Research Title</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Data in Use (Dataset ID)</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Period of Data Use</span></th></tr>
</thead>
<tbody>
<tr>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">Ansuman Satpathy</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">Department of Pathology, Stanford University</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">United States of America</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">Epigenetics of Inflammatory Skin Disorders</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">JGAD000597</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">2022/07/04-2023/05/31</span></td>
</tr>
</tbody>
</table>
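
If the scraper does map cells onto a fixed list of column names, one way to make it robust to an extra column such as Country/Region is to take the names from each page's own `<th>` row. The scraper itself is not shown in this thread, so the following is only a minimal sketch using Python's standard `html.parser`. `TableRows` and `parse_users_table` are hypothetical names, and the input is assumed to be a single table like the ones above.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect <th> texts as headers and <td> texts as body rows."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self.rows = []
        self._cell = None  # text fragments of the cell currently open
        self._row = None   # <td> texts of the row currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <span> is collected as well.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            text = "".join(self._cell).strip()
            if tag == "th":
                self.headers.append(text)
            else:
                if self._row is None:
                    self._row = []
                self._row.append(text)
            self._cell = None
        elif tag == "tr":
            if self._row:  # header rows hold no <td>, so they are skipped
                self.rows.append(self._row)
            self._row = None


def parse_users_table(html):
    """Return one dict per body row, keyed by the table's own headers."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    records = []
    for row in parser.rows:
        # Fail loudly instead of silently shifting values between columns.
        if len(row) != len(parser.headers):
            raise ValueError(
                f"expected {len(parser.headers)} cells, got {len(row)}: {row}"
            )
        records.append(dict(zip(parser.headers, row)))
    return records
```

With this approach a row whose cell count differs from the header count raises an error rather than producing shifted values, so a later column change would surface at scrape time.
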
