Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Signed-off-by: frank-zsy <[email protected]>
Showing 6 changed files with 108 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,26 @@ | ||
# Gitee | ||
|
||
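## Data Source | ||
|
||
OpenDigger has an official partnership with [Gitee](https://gitee.com/), maintains the [GVP](https://gitee.com/gvp) project list internally on a long-term basis, and collects the historical event logs of all GVP projects through the [Gitee API](https://gitee.com/api/v5/swagger#/getV5ReposOwnerRepoEvents). | ||
|
||
The code for data collection, cleaning, and loading is not currently open-sourced in the OpenDigger project. It runs as a daily scheduled task that imports data into the database. | ||
|
||
Metric data is exported for all Gitee repositories. If you find that your project is not in the [export list](../metrics/metrics_usage_guide#导出范围), please submit an Issue in the OpenDigger repository and we will add your repository to the collection list. Adding an entire organization directly is also supported. | ||
|
||
## Note | ||
|
||
Gitee's Issues and Pull Requests use different numbering systems. To stay compatible with GitHub's purely numeric numbering, we apply extra processing to Gitee Issue numbers: each one is treated as a base-36 number and stored after conversion to base 10. To recover the original Issue number, convert the stored value back to base 36. | ||
|
||
Below is an example of converting between base-10 and base-36 numbers in JavaScript: | ||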
|
||
```JavaScript | ||
const rawIssueNumber = 'I1R'; | ||
|
||
// Convert base 36 to base 10 | ||
const issueNumber = parseInt(rawIssueNumber, 36); | ||
console.log(issueNumber); // 23391 | ||
|
||
// Convert base 10 to base 36 | ||
console.log(issueNumber.toString(36).toUpperCase()); // I1R | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,29 @@ | ||
# GitHub | ||
|
||
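## Data Source | ||
|
||
OpenDigger uses [GHArchive](https://www.gharchive.org/) as its data source, collecting GitHub-wide event log data and using the [ClickHouse](https://github.com/ClickHouse/ClickHouse) cloud service as its data infrastructure. | ||
|
||
The code for data collection, cleaning, and loading is not currently open-sourced in the OpenDigger project. It runs as an hourly scheduled task that imports data into the database. | ||
|
||
## Missing Data | ||
|
||
Because the GHArchive service may be unavailable at times, OpenDigger's GitHub data source has some gaps. Data is currently missing for the following time periods: | ||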
|
||
- 2016-10-21-18 | ||
- 2018-10-21-23 | ||
- 2018-10-22-0 ~ 2018-10-22-1 | ||
- 2019-05-08-12 ~ 2019-05-08-13 | ||
- 2019-09-12-8 ~ 2019-09-13-5 | ||
- 2020-03-05-22 | ||
- 2020-06-10-12 ~ 2020-06-10-21 | ||
- 2020-08-21-9 ~ 2020-08-23-15 | ||
- 2020-10-30-17 | ||
- 2021-08-25-17 ~ 2021-08-27-22 | ||
- 2021-09-11-9 | ||
- 2021-10-22-5 ~ 2021-10-22-22 | ||
- 2021-10-23-2 ~ 2021-10-23-22 | ||
- 2021-10-24-3 ~ 2021-10-24-22 | ||
- 2021-10-25-1 ~ 2021-10-25-22 | ||
- 2021-10-26-0 ~ 2021-10-29-17 | ||
- 2023-05-14-19 |
25 changes: 25 additions & 0 deletions
i18n/en/docusaurus-plugin-content-docs/current/user_docs/data_sources/gitee.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,26 @@ | ||
# Gitee | ||
|
||
## Data Source | ||
|
||
OpenDigger has an official partnership with [Gitee](https://gitee.com/) and maintains the [GVP](https://gitee.com/gvp) project list internally. OpenDigger collects event logs for all GVP repositories using the [Gitee API](https://gitee.com/api/v5/swagger#/getV5ReposOwnerRepoEvents). | ||
|
||
The code for data collection, cleaning, and loading is not currently open-sourced in the OpenDigger project. It runs as a daily scheduled task that imports data into the database. | ||
|
||
Metric data is exported for all Gitee repositories. If you find that your project is not in the [export list](../metrics/metrics_usage_guide#export-range), please submit an issue in the OpenDigger repository and we will add your repository to the collection list. Adding an entire organization directly is also supported. | ||
|
||
## Note | ||
|
||
Gitee's Issues and Pull Requests use different numbering systems. To stay compatible with GitHub's purely numeric numbering, we apply extra processing to Gitee Issue numbers: each one is treated as a base-36 number and stored after conversion to base 10. To recover the original Issue number, convert the stored value back to base 36. | ||
|
||
Below is an example of converting between base-10 and base-36 numbers in JavaScript: | ||
|
||
```JavaScript | ||
const rawIssueNumber = 'I1R'; | ||
|
||
// Convert base-36 to base-10 | ||
const issueNumber = parseInt(rawIssueNumber, 36); | ||
console.log(issueNumber); // 23391 | ||
|
||
// Convert base-10 to base-36 | ||
console.log(issueNumber.toString(36).toUpperCase()); // I1R | ||
``` |
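
The conversion above can be wrapped in a pair of small helpers. This is only a sketch: the function names `giteeIssueIdToNumber` and `numberToGiteeIssueId` are hypothetical and not part of OpenDigger. The validation is there because `parseInt` stops silently at the first character that is not valid in the given radix, so a malformed identifier would otherwise yield a partial result.

```javascript
// Hypothetical helper names, for illustration only (not OpenDigger code).

// Convert a Gitee Issue identifier such as 'I1R' to its stored base-10 number.
function giteeIssueIdToNumber(rawId) {
  // Reject anything that is not a pure base-36 string; parseInt alone
  // would quietly ignore trailing invalid characters.
  if (typeof rawId !== 'string' || !/^[0-9A-Za-z]+$/.test(rawId)) {
    throw new Error(`Not a base-36 identifier: ${rawId}`);
  }
  const value = parseInt(rawId, 36);
  // Guard against identifiers too long to be represented exactly as a Number.
  if (!Number.isSafeInteger(value)) {
    throw new Error(`Identifier too large to convert exactly: ${rawId}`);
  }
  return value;
}

// Convert a stored base-10 number back to the upper-case Gitee identifier.
function numberToGiteeIssueId(value) {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new Error(`Not a valid stored Issue number: ${value}`);
  }
  return value.toString(36).toUpperCase();
}

console.log(giteeIssueIdToNumber('I1R')); // 23391
console.log(numberToGiteeIssueId(23391)); // I1R
```

Note that the round trip does not preserve leading zeros or letter case, so it assumes identifiers are upper-case and do not start with `0`.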
28 changes: 28 additions & 0 deletions
i18n/en/docusaurus-plugin-content-docs/current/user_docs/data_sources/github.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,29 @@ | ||
# GitHub | ||
|
||
## Data Source | ||
|
||
OpenDigger uses [GHArchive](https://www.gharchive.org/) as its data source, collecting global event log data from GitHub and using [ClickHouse](https://github.com/ClickHouse/ClickHouse) cloud service as the underlying data infrastructure. | ||
|
||
The code for data collection, cleaning, and loading is not currently open-sourced in the OpenDigger project. It runs as an hourly scheduled task that imports data into the database. | ||
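
Since the import code is not published, the following is only an illustration of what an hourly job has to address, not OpenDigger's implementation. GHArchive publishes one file per hour under `https://data.gharchive.org/`, named `YYYY-MM-DD-H.json.gz` with an hour that is not zero-padded. The helper name `ghArchiveUrlForHour` is hypothetical, and the sketch assumes the hour is expressed in UTC.

```javascript
// Hypothetical helper, for illustration only (not OpenDigger code).
// Build the GHArchive file URL for the hour containing `date`, assuming UTC.
function ghArchiveUrlForHour(date) {
  const day = date.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
  const hour = date.getUTCHours(); // 0-23, not zero-padded in the file name
  return `https://data.gharchive.org/${day}-${hour}.json.gz`;
}

console.log(ghArchiveUrlForHour(new Date(Date.UTC(2024, 0, 1, 15))));
// https://data.gharchive.org/2024-01-01-15.json.gz
```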
|
||
## Missing Data | ||
|
||
Because the GHArchive service may be unavailable at times, OpenDigger's GitHub data source has some gaps. Data is currently missing for the following time periods: | ||
|
||
- 2016-10-21-18 | ||
- 2018-10-21-23 | ||
- 2018-10-22-0 ~ 2018-10-22-1 | ||
- 2019-05-08-12 ~ 2019-05-08-13 | ||
- 2019-09-12-8 ~ 2019-09-13-5 | ||
- 2020-03-05-22 | ||
- 2020-06-10-12 ~ 2020-06-10-21 | ||
- 2020-08-21-9 ~ 2020-08-23-15 | ||
- 2020-10-30-17 | ||
- 2021-08-25-17 ~ 2021-08-27-22 | ||
- 2021-09-11-9 | ||
- 2021-10-22-5 ~ 2021-10-22-22 | ||
- 2021-10-23-2 ~ 2021-10-23-22 | ||
- 2021-10-24-3 ~ 2021-10-24-22 | ||
- 2021-10-25-1 ~ 2021-10-25-22 | ||
- 2021-10-26-0 ~ 2021-10-29-17 | ||
- 2023-05-14-19 |
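
When analyzing a time window, it can help to check whether it overlaps one of the gaps listed above. The sketch below makes several assumptions that the list itself does not state: entries are read as `YYYY-MM-DD-H` hour labels, `~` denotes an inclusive range, and hours are in UTC. All function names are hypothetical.

```javascript
// Hypothetical helpers, for illustration only (not OpenDigger code).
const HOUR_MS = 60 * 60 * 1000;

// Parse a "YYYY-MM-DD-H" label into a UTC timestamp in milliseconds.
function parseHourLabel(label) {
  const match = /^(\d{4})-(\d{2})-(\d{2})-(\d{1,2})$/.exec(label.trim());
  if (!match) throw new Error(`Unexpected hour label: ${label}`);
  const [, year, month, day, hour] = match.map(Number);
  return Date.UTC(year, month - 1, day, hour);
}

// Turn one list entry ("A" or "A ~ B") into an inclusive [start, end] pair.
function parseMissingEntry(entry) {
  const [start, end = start] = entry.split('~').map((part) => part.trim());
  return [parseHourLabel(start), parseHourLabel(end)];
}

// Check whether a given hour falls inside any listed gap.
function isHourMissing(label, entries) {
  const time = parseHourLabel(label);
  return entries.some((entry) => {
    const [start, end] = parseMissingEntry(entry);
    return time >= start && time <= end;
  });
}

// Count the total number of missing hours across all entries.
function countMissingHours(entries) {
  return entries.reduce((total, entry) => {
    const [start, end] = parseMissingEntry(entry);
    return total + (end - start) / HOUR_MS + 1;
  }, 0);
}

// Two entries copied from the list above.
const sample = ['2016-10-21-18', '2019-09-12-8 ~ 2019-09-13-5'];
console.log(isHourMissing('2019-09-12-23', sample)); // true
console.log(isHourMissing('2019-09-13-6', sample)); // false
console.log(countMissingHours(sample)); // 23
```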