Add script to remove duplicate issues on declarations repository #1115

Merged
merged 8 commits into from
Oct 29, 2024
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,14 @@

All changes that impact users of this module are documented in this file, in the [Common Changelog](https://common-changelog.org) format with some additional specifications defined in the CONTRIBUTING file. This codebase adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## Unreleased [minor]
Member

I would say this is a no-release considering there is strictly no change in behavior exposed to reusers.

Member Author

I considered that initially, but if we ask our partners to update to the latest version to access this script, they won’t be able to do so if there’s no official release available.

Member

Ah yes, true 😅


> Development of this release was supported by the [French Ministry for Foreign Affairs](https://www.diplomatie.gouv.fr/fr/politique-etrangere-de-la-france/diplomatie-numerique/) through its ministerial [State Startups incubator](https://beta.gouv.fr/startups/open-terms-archive.html) under the aegis of the Ambassador for Digital Affairs.

### Added

- Add script to remove duplicate issues in GitHub reports

## 2.4.0 - 2024-10-24

_Full changeset and discussions: [#1114](https://github.com/OpenTermsArchive/engine/pull/1114)._
35 changes: 35 additions & 0 deletions scripts/reporter/duplicate/README.md
@@ -0,0 +1,35 @@
# Duplicate issues removal script

This script helps remove duplicate issues from a GitHub repository by closing newer duplicate issues.

## Prerequisites

1. Set up environment variables:
   - Create a `.env` file in the root directory
   - Add the GitHub personal access token, with `repo` permissions, of the bot that manages issues on your collection:
     ```
     OTA_ENGINE_GITHUB_TOKEN=your_github_token
     ```

2. Configure the target repository in `config/development.json`:
Member

Does it really have to be in development? Or just in the environment that will be loaded at config loading time?

Member Author

It could be in any config file

Member Author

Updated

```json
{
  "@opentermsarchive/engine": {
    "reporter": {
      "githubIssues": {
        "repositories": {
          "declarations": "owner/repository"
        }
      }
    }
  }
}
```

## Usage

Run the script using:

```
node scripts/reporter/duplicate/index.js
```
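The configured `declarations` value is expected to follow the `owner/repository` format. A minimal sketch of how such a value could be validated before it is split; `parseRepository` is a hypothetical helper introduced here for illustration and is not part of this pull request:

```javascript
// Hypothetical helper (not part of the PR): validates an "owner/repository" string
// before splitting it, so that a missing or malformed value fails with a clear message.
function parseRepository(repository) {
  if (typeof repository !== 'string' || !repository.trim()) {
    throw new Error('Repository configuration is not set');
  }

  const parts = repository.trim().split('/');

  if (parts.length !== 2 || !parts[0] || !parts[1]) {
    throw new Error(`Repository "${repository}" should use the "owner/repository" format`);
  }

  const [ owner, repo ] = parts;

  return { owner, repo };
}

console.log(parseRepository('owner/repository'));
```

Checking the value before calling `split` avoids a `TypeError` when the setting is `null`, and reports a malformed value explicitly instead of sending a request with an undefined `repo`.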
77 changes: 77 additions & 0 deletions scripts/reporter/duplicate/index.js
@@ -0,0 +1,77 @@
import 'dotenv/config';
import config from 'config';
import { Octokit } from 'octokit';

async function removeDuplicateIssues() {
  try {
    const repository = config.get('@opentermsarchive/engine.reporter.githubIssues.repositories.declarations');

    // Check the value before splitting it, so that an empty setting fails with a clear message
    if (!repository) {
      throw new Error('Repository configuration is not set');
    }

    const [ owner, repo ] = repository.split('/');

    const octokit = new Octokit({ auth: process.env.OTA_ENGINE_GITHUB_TOKEN });

    console.log(`Getting issues from repository ${repository}…`);

    const issues = await octokit.paginate('GET /repos/{owner}/{repo}/issues', {
      owner,
      repo,
      state: 'open',
      per_page: 100,
    });

    const onlyIssues = issues.filter(issue => !issue.pull_request);
    const issuesByTitle = new Map();
    let counter = 0;

    console.log(`Found ${onlyIssues.length} issues`);

    for (const issue of onlyIssues) {
      if (!issuesByTitle.has(issue.title)) {
        issuesByTitle.set(issue.title, [issue]);
      } else {
        issuesByTitle.get(issue.title).push(issue);
      }
    }

    for (const [ title, duplicateIssues ] of issuesByTitle) {
      if (duplicateIssues.length === 1) continue;

      const originalIssue = duplicateIssues.reduce((oldest, current) => (new Date(current.created_at) < new Date(oldest.created_at) ? current : oldest));

      console.log(`\nFound ${duplicateIssues.length - 1} duplicates for issue #${originalIssue.number} "${title}"`);

      for (const issue of duplicateIssues) {
        if (issue.number === originalIssue.number) {
          continue;
        }

        await octokit.request('PATCH /repos/{owner}/{repo}/issues/{issue_number}', { /* eslint-disable-line no-await-in-loop */
Member

Why do we await? Couldn't this be done fully asynchronously?

Member Author

It could, but I find logs much easier to read when they’re sequential. Since we need to not send our requests in parallel to avoid hitting GitHub’s rate limit, I opted for a setup that maintains clear, readable output
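A minimal sketch of the sequential approach described in this thread. `closeIssue` is a hypothetical stand-in that only simulates request latency with `setTimeout`; it does not call the GitHub API:

```javascript
// Hypothetical stand-in for an API call: resolves after a delay proportional to the issue number,
// so that requests started in parallel would finish out of order.
function closeIssue(number, log) {
  return new Promise(resolve => {
    setTimeout(() => {
      log.push(`Closed issue #${number}`);
      resolve(number);
    }, number * 5);
  });
}

// Processes issues one at a time: each request starts only after the previous one has settled,
// so log lines appear in input order and at most one request is in flight.
async function closeSequentially(numbers, log = []) {
  for (const number of numbers) {
    await closeIssue(number, log); // eslint-disable-line no-await-in-loop
  }

  return log;
}

closeSequentially([ 3, 1, 2 ]).then(log => console.log(log.join('\n')));
```

With `Promise.all` over the same inputs, the simulated calls would complete in order of their delays rather than in input order, which is the log readability concern raised above.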

          owner,
          repo,
          issue_number: issue.number,
          state: 'closed',
        });
Member

Couldn't we use state_reason to avoid the following comment request? 🙂

Member Author (@Ndpnt, Oct 29, 2024)

I guess it is not what you think:

> `state_reason` string or null
> The reason for the state change. Ignored unless `state` is changed.
> Can be one of: `completed`, `not_planned`, `reopened`, `null`

Member

Indeed, sorry.

Member

We could still set not_planned then 😉
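A minimal sketch of what setting `not_planned` could look like, assuming the endpoint accepts `state_reason` alongside `state` as the API excerpt quoted in this thread indicates. `buildCloseParams` is a hypothetical helper, not code from this pull request:

```javascript
// Hypothetical helper: builds the parameters for
// `PATCH /repos/{owner}/{repo}/issues/{issue_number}` when closing a duplicate.
function buildCloseParams({ owner, repo, issueNumber }) {
  return {
    owner,
    repo,
    issue_number: issueNumber,
    state: 'closed',
    state_reason: 'not_planned', // one of the values listed in the API excerpt quoted in this thread
  };
}

// Possible usage with an Octokit instance (not executed here):
// await octokit.request('PATCH /repos/{owner}/{repo}/issues/{issue_number}', buildCloseParams({ owner, repo, issueNumber: issue.number }));
console.log(buildCloseParams({ owner: 'owner', repo: 'repository', issueNumber: 42 }));
```

The explanatory comment that links to the original issue would still be needed, since `state_reason` only records why the issue was closed, not which issue it duplicates.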


        await octokit.request('POST /repos/{owner}/{repo}/issues/{issue_number}/comments', { /* eslint-disable-line no-await-in-loop */
          owner,
          repo,
          issue_number: issue.number,
          body: `Closing duplicate issue. Original issue: #${originalIssue.number}`,
        });

        counter++;
        console.log(`Closed issue #${issue.number}: ${issue.html_url}`);
      }
    }

    console.log(`\nDuplicate removal process completed; ${counter} issues closed`);
  } catch (error) {
Member

Why do a try / catch and exit instead of simply not catching any error, since we call the function from the main event loop anyway? 🙂

Member Author

Out of habit. But it can be removed.

    console.log(`Failed to remove duplicate issues: ${error.stack}`);
Member

Suggested change
- console.log(`Failed to remove duplicate issues: ${error.stack}`);
+ console.log(`Failed to remove some duplicate issues: ${error.stack}`);

    process.exit(1);
  }
}

removeDuplicateIssues();
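The duplicate detection in the script above groups open issues by title and keeps the oldest one of each group. A minimal sketch of that logic as a pure function that can be checked without network access; `findDuplicates` and the sample data are hypothetical and not part of this pull request:

```javascript
// Hypothetical pure helper: groups issues by title and, for each title that appears more than once,
// returns the oldest issue as the original and the remaining ones as duplicates.
function findDuplicates(issues) {
  const issuesByTitle = new Map();

  for (const issue of issues) {
    if (!issuesByTitle.has(issue.title)) {
      issuesByTitle.set(issue.title, []);
    }

    issuesByTitle.get(issue.title).push(issue);
  }

  const result = [];

  for (const [ title, group ] of issuesByTitle) {
    if (group.length === 1) continue;

    // Same comparison as in the script: the earliest `created_at` wins
    const original = group.reduce((oldest, current) => (new Date(current.created_at) < new Date(oldest.created_at) ? current : oldest));

    result.push({ title, original, duplicates: group.filter(issue => issue.number !== original.number) });
  }

  return result;
}

// Made-up sample data, shaped like the fields the script reads from the API response
const sample = [
  { number: 12, title: 'A', created_at: '2024-10-02T00:00:00Z' },
  { number: 10, title: 'A', created_at: '2024-10-01T00:00:00Z' },
  { number: 11, title: 'B', created_at: '2024-10-01T00:00:00Z' },
];

console.log(findDuplicates(sample));
```

Separating the selection from the API calls would also make it possible to print the list of issues that would be closed before closing any of them.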