As Sacha Willems posted: "Shrinking a git(hub) repository isn’t just about deleting locally present files but requires cleaning up the history as files that have been removed are still present in the repository’s history and therefore still contribute to it’s size."
With the GitHub action Branch-Pruner, you can easily reduce the size of one or more GitHub repositories by manually and/or automatically truncating the old commit history of one or more selected branches. This means that you can delete all commits with previous and unused file versions up to an arbitrarily selected point in your Git history without losing the newer commits, with their newer file versions, of the selected branch tree.
Normally YOU SHOULD NEVER DO THIS, and there are huge drawbacks. However, in some cases it is really useful to get rid of the old stuff on a regular basis, e.g., if your repository size is growing continuously and you only ever need the latest commit history, or when Git commands like push and pull become generally slow.
Then it's time for the Branch-Pruner. It will speed you up again 😉.
I, Sitdisch, created the Branch-Pruner because I needed a GitHub action that would periodically auto-crop my repo size, and there was no action out there before. My solution approach is based on this blog post by Thomas Sutton and this blog post by Alin Ruscior. Thanks to both.
P.S. My Branch-Pruner GIF is based on the Git Logo by Jason Long [License: CC BY 3.0] and the scissors icon from the googlefonts/noto-emoji repository [License: Apache-2.0].
The Branch-Pruner rewrites the entire commit history of the branch being pruned. The new history starts with the branch tree of the selected NEW-FIRST-COMMIT.
That means all subsequent commits keep their original order and are still authored by their original authors.
But the drawbacks are:
- the files are marked as created in the NEW-FIRST-COMMIT
- all commits have new time stamps and commit-hashes
- all commits are committed by the selected User (default: github-actions[bot])
- all forks and other branches have nothing to compare with the pruned branch anymore
- cuts can't be undone.
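The rewrite described above can be tried out safely on a throwaway repository. The following is a minimal sketch, assuming an orphan-branch-plus-rebase approach; this is one common way to truncate history with plain Git and not necessarily what the Branch-Pruner does internally. The scratch repository, the file name, the helper branch `pruned`, and the choice of `HEAD~2` as the new first commit are all hypothetical.

```shell
#!/bin/sh
set -eu

# Throwaway repository with five commits on main (all names are illustrative).
work=$(mktemp -d)
cd "$work"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"
for i in 1 2 3 4 5; do
  echo "version $i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done

# Assumption: the third commit (HEAD~2) becomes the new first commit.
new_first=$(git rev-parse HEAD~2)

# 1. Create a parentless commit that carries the tree of the new first commit.
git checkout -q --orphan pruned "$new_first"
git commit -q -m "new first commit"

# 2. Replay everything after the cut point on top of it.
#    Authors are kept, but the committer and the commit hashes change.
git rebase -q --onto pruned "$new_first" main

# 3. Drop the helper branch; main now starts at the new first commit.
git branch -q -D pruned
commit_count=$(git rev-list --count main)
echo "commits on main after pruning: $commit_count"
```

In this sketch the two oldest commits are no longer reachable from main. Applying such a rewrite to a remote branch requires a force push, which is why forks and other branches lose their common history with the pruned branch.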
Oh, you're still here? Then let's do it. First, choose a workflow file:
Truncates the old commit history of the current main branch with minimal settings.
Set it up (click to toggle)
1. add the branch-pruner-easy.yml workflow file to a repository
   - it has to be the target repository where you want to prune the main branch (this is not the case with the other workflow files)
   - the path has to be .github/workflows/branch-pruner-easy.yml
2. create a new encrypted repository secret
   - give the secret a name, e.g. BRANCH_PRUNER_TOKEN
   - the value of the secret must be the value of the Personal Access Token (PAT) for the repository where you want to prune the main branch
     - procedure for creating a PAT (fine-grained) or a PAT (classic)
     - select only the minimum scopes and permissions required
       - PAT (fine-grained): repository permissions
         - contents => access: read and write
         - metadata => access: read-only
       - PAT (classic): e.g. repo and workflow
     - CONSIDER: PAT expiration requires you to regenerate the PAT and set it as the secret's value again
   - add the secret to the same repository where you added this workflow file
3. adapt your branch-pruner-easy.yml file

3.1 for manual triggers
- you don't have to adjust anything in the workflow file; just use it

3.2 for all other triggers
- adapt this section:

    ##############################################################
    # DEFINE YOUR INPUTS AND TRIGGERS IN THE FOLLOWING
    ##############################################################

    # INPUTS as environmental variables (env)
    env:
      NEW_FIRST_COMMIT: # e.g. commit-hash or HEAD~N etc.
      TOKEN_NAME: # target token name e.g. 'BRANCH_PRUNER_TOKEN'

    # TRIGGERS
    on:
      # push:
      # schedule:
      #   - cron: '00 23 28 * *'

CONSIDER:
- INPUTS:
  - you only have to define NEW_FIRST_COMMIT and TOKEN_NAME
  - NEW-FIRST-COMMIT: choose it carefully; e.g., HEAD~N is really useful for autonomously truncating commits on a regular basis. However, know what you are doing: HEAD~N or HEAD^N may not be the commits you're targeting. For more information about HEAD~N and HEAD^N look e.g. here.
  - TOKEN_NAME: never enter the actual value of the personal access token
- TRIGGERS:
  - schedule: e.g. cron: '00 23 28 * *' executes the Branch-Pruner on the 28th day of every month at 23:00 (UTC)
    - you can check your inputs here
- hidden defaults (changeable with the other workflow files):
  - target repository & branch: repository with this workflow file and main branch
  - user settings:
    - user who commits: github-actions[bot]
    - user e-mail address: 41898282+github-actions[bot]@users.noreply.github.com
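The warning about HEAD~N and HEAD^N above is easiest to see on a small example. The sketch below builds a throwaway repository with one merge commit and resolves both notations; the repository layout, the branch name `feature`, and the one-letter commit messages are hypothetical and only serve the illustration.

```shell
#!/bin/sh
set -eu

# Throwaway repository: A - B - C - M on main, with F merged in from "feature".
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"
git checkout -q -b feature
git commit -q --allow-empty -m "F"
git checkout -q main
git commit -q --allow-empty -m "C"
git merge -q --no-ff -m "M" feature

# HEAD~N walks N generations back, always following the first parent.
tilde_two=$(git log -1 --format=%s HEAD~2)
# HEAD^N selects the N-th parent of HEAD itself (only merges have more than one).
caret_two=$(git log -1 --format=%s HEAD^2)

echo "HEAD~2 resolves to commit: $tilde_two"
echo "HEAD^2 resolves to commit: $caret_two"
```

Here HEAD~2 names the grandparent on main (B), while HEAD^2 names the tip of the merged branch (F). On a commit with a single parent, HEAD^2 does not resolve at all, so HEAD~N is usually the notation you want for "N commits back".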
That's it. Happy pruning.
Truncates the old commit history of a selected target branch.
Set it up (click to toggle)
1. add the branch-pruner-default.yml workflow file to a repository
   - it doesn't have to be the repository you want to prune; e.g., you can simply fork the myactionway/branch-pruner-workflows repository
     - CONSIDER: with a forked repository, you need to confirm that you want to use a workflow before you can actually use it (repo menu > actions tab > push the button)
   - the path has to be .github/workflows/branch-pruner-default.yml
2. create a new encrypted repository secret
   - give the secret a name, e.g. BRANCH_PRUNER_TOKEN
   - the value of the secret must be the value of the Personal Access Token (PAT) for the repository where you want to prune a branch
     - procedure for creating a PAT (fine-grained) or a PAT (classic)
     - select only the minimum scopes and permissions required
       - PAT (fine-grained): repository permissions
         - contents => access: read and write
         - metadata => access: read-only
       - PAT (classic): e.g. repo and workflow
     - CONSIDER: PAT expiration requires you to regenerate the PAT and set it as the secret's value again
   - add the secret to the same repository where you added this workflow file
3. adapt your branch-pruner-default.yml file

3.1 for manual triggers
- you don't have to adjust anything in the workflow file; just use it

3.2 for all other triggers
- adapt this section:

    ##############################################################
    # DEFINE YOUR INPUTS AND TRIGGERS IN THE FOLLOWING
    ##############################################################

    # INPUTS as environmental variables (env)
    env:
      NEW_FIRST_COMMIT: # e.g. commit-hash or HEAD~N etc.
      TOKEN_NAME: # target token name e.g. 'BRANCH_PRUNER_TOKEN'
      REPOSITORY: # target repository e.g. 'dummy/mytargetrepo'
      BRANCH: # branch to be pruned e.g. 'main'
      USER_NAME: # user who should commit e.g. 'dummy'
      USER_EMAIL: # e.g. '[email protected]'

    # TRIGGERS
    on:
      # push:
      # schedule:
      #   - cron: '00 23 28 * *'

CONSIDER:
- INPUTS:
  - you only have to define NEW_FIRST_COMMIT and TOKEN_NAME; if any other input is blank, one of these default values will be used instead

        DEFAULT_REPOSITORY: ${{ github.repository }} # repo with this file
        DEFAULT_BRANCH: 'main'
        DEFAULT_USER_NAME: 'github-actions[bot]'
        DEFAULT_USER_EMAIL: '41898282+github-actions[bot]@users.noreply.github.com'

  - NEW-FIRST-COMMIT: choose it carefully; e.g., HEAD~N is really useful for autonomously truncating commits on a regular basis. However, know what you are doing: HEAD~N or HEAD^N may not be the commits you're targeting. For more information about HEAD~N and HEAD^N look e.g. here.
  - TOKEN_NAME: never enter the actual value of the personal access token
- TRIGGERS:
  - schedule: e.g. cron: '00 23 28 * *' executes the Branch-Pruner on the 28th day of every month at 23:00 (UTC)
    - you can check your inputs here
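Because a cut can't be undone, it may be worth previewing locally what a NEW_FIRST_COMMIT value resolves to before putting it into the workflow file. The following is a rough sketch, assuming a linear history; it is not part of the Branch-Pruner, and the scratch repository and the variable names `resolved`, `dropped`, and `kept` are hypothetical. In a real check you would run the second half inside a clone of the target branch.

```shell
#!/bin/sh
set -eu

# Throwaway repository with six commits on main (stand-in for a real clone).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"
for i in 1 2 3 4 5 6; do
  git commit -q --allow-empty -m "commit $i"
done

# Hypothetical input, mirroring the NEW_FIRST_COMMIT setting of the workflow.
NEW_FIRST_COMMIT='HEAD~2'

# Fail early if the expression does not name a commit.
resolved=$(git rev-parse --verify --quiet "${NEW_FIRST_COMMIT}^{commit}")
subject=$(git log -1 --format=%s "$resolved")

# Ancestors of the new first commit are the ones that would be cut off.
dropped=$(( $(git rev-list --count "$resolved") - 1 ))
# The new first commit plus everything after it would remain.
kept=$(( $(git rev-list --count "$resolved..main") + 1 ))

echo "new first commit: $subject"
echo "commits cut off:  $dropped"
echo "commits kept:     $kept"
```

With merges in the history the two counts are only an approximation, since side branches contribute commits that are not simple ancestors or descendants of the cut point.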
That's it. Happy pruning.
Truncates the old commit history of multiple selected target branches.
Set it up (click to toggle)
1. add the branch-pruner-advanced.yml workflow file to a repository
   - it doesn't have to be a repository where you want to prune branches; e.g., you can simply fork the myactionway/branch-pruner-workflows repository
     - CONSIDER: with a forked repository, you need to confirm that you want to use a workflow before you can actually use it (repo menu > actions tab > push the button)
   - the path has to be .github/workflows/branch-pruner-advanced.yml
2. create new encrypted repository secrets
   - give the secrets names, e.g. BRANCH_PRUNER_TOKEN_1 and BRANCH_PRUNER_TOKEN_2
   - the values of the secrets must be the values of the Personal Access Tokens (PAT) for the repositories where you want to prune branches
     - procedure for creating a PAT (fine-grained) or a PAT (classic)
     - select only the minimum scopes and permissions required
       - PAT (fine-grained): repository permissions
         - contents => access: read and write
         - metadata => access: read-only
       - PAT (classic): e.g. repo and workflow
     - CONSIDER: PAT expiration requires you to regenerate the PAT and set it as the secret's value again
   - add the secrets to the same repository where you added this workflow file
3. adapt your branch-pruner-advanced.yml file

3.1 define your defaults
- adapt this section:

    ##############################################################
    # DEFINE YOUR DEFAULTS (INPUTS & TRIGGERS) IN THE FOLLOWING
    ##############################################################

    # INPUTS as environmental variables (env)
    env:
      TOKEN_NAME: # target token name e.g. 'BRANCH_PRUNER_TOKEN_1'
      REPOSITORY: # target repository e.g. 'dummy/mytargetrepo_1'
      USER_NAME: # user who should commit e.g. 'dummy'
      USER_EMAIL: # e.g. '[email protected]'

    # TRIGGERS
    on:
      # push:
      # schedule:
      #   - cron: '00 23 28 * *'
      workflow_dispatch:

CONSIDER:
- INPUTS:
  - TOKEN_NAME: never enter the actual value of the personal access token
  - all inputs except TOKEN_NAME have predefined values; you can, but you don't have to, overwrite them

        # Predefined values
        REPOSITORY: ${{ github.repository }} # repo with this file
        USER_NAME: 'github-actions[bot]'
        USER_EMAIL: '41898282+github-actions[bot]@users.noreply.github.com'

- TRIGGERS:
  - schedule: e.g. cron: '00 23 28 * *' executes the Branch-Pruner on the 28th day of every month at 23:00 (UTC)
    - you can check your inputs here
  - workflow_dispatch: no predefined inputs; the env values defined in this workflow file are used instead when this trigger fires
    - procedure for manually running a workflow using the GitHub CLI
    - procedure for manually running a workflow using the REST API
3.2 define your settings for the different target branches
- adapt this section:

    ##############################################################
    # FIRST TARGET BRANCH | DEFINE YOUR ENV IN THE FOLLOWING
    ##############################################################
    - NAME: 'Pruning Branch 1'
      NEW_FIRST_COMMIT: 'HEAD~40'
      BRANCH: 'main'
      # TOKEN_NAME:
      # REPOSITORY:
      # USER_NAME:
      # USER_EMAIL:
    ##############################################################
    # SECOND TARGET BRANCH | DEFINE YOUR ENV IN THE FOLLOWING
    ##############################################################
    - NAME: 'Pruning Branch 2'
      NEW_FIRST_COMMIT: 'HEAD^20'
      BRANCH: 'dev'
      # TOKEN_NAME:   # e.g. 'BRANCH_PRUNER_TOKEN_2'
      # REPOSITORY:   # e.g. 'dummy/mytargetrepo_2'
      # USER_NAME:
      # USER_EMAIL:
    ##############################################################
    # THIRD TARGET BRANCH | FEEL FREE TO ADD MORE TARGET BRANCHES
    # ...

CONSIDER:
- you just have to define NAME, NEW_FIRST_COMMIT and BRANCH for each target branch; if you do not define any of the other inputs, your predefined defaults will be used instead
- only a maximum of 256 target branches per workflow run is possible [GitHub restriction]
That's it. Happy pruning.
Warning: If you use your own workflow file, it is highly recommended to set a time limit for the job execution (GitHub's default: 6 hours); the default in the proposed workflow files is timeout-minutes: 8
The error "fatal: refusing to merge unrelated histories" occurs when you pull the pruned branch back to your local machine:
- possible solution [source] (replace <PRUNED_BRANCH> with the name of the pruned branch):
  git fetch --all
  git reset --hard origin/<PRUNED_BRANCH>
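The fetch-and-reset recovery for unrelated histories can be rehearsed locally before touching a real working copy. The sketch below is only an illustration under stated assumptions: a bare repository stands in for GitHub, a single parentless commit stands in for the pruned history, and the directory names `origin.git`, `maintainer`, and `local` are hypothetical.

```shell
#!/bin/sh
set -eu

base=$(mktemp -d)
cd "$base"

# A bare repository stands in for the GitHub remote (all names are illustrative).
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/main

# Publish three commits from a first working copy.
git clone -q origin.git maintainer 2>/dev/null
cd maintainer
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"
for i in 1 2 3; do
  git commit -q --allow-empty -m "commit $i"
done
git push -q origin main

# A second working copy that still holds the old history.
cd "$base"
git clone -q origin.git local
git -C local config user.name "Demo User"
git -C local config user.email "demo@example.com"

# Stand-in for a prune: replace the remote history with one parentless commit.
cd "$base/maintainer"
git checkout -q --orphan pruned
git commit -q --allow-empty -m "pruned history"
git push -q --force origin pruned:main

# In the stale working copy, merging the rewritten branch is refused ...
cd "$base/local"
git fetch -q --all
git merge -q origin/main 2>/dev/null || echo "merge refused: unrelated histories"

# ... so the local branch is moved onto the pruned history instead.
# Caution: this discards local commits and uncommitted changes on the branch.
git reset -q --hard origin/main
local_count=$(git rev-list --count HEAD)
echo "local commits after reset: $local_count"
```

Since `git reset --hard` throws away anything that exists only in the local branch, it is worth saving unpushed work on a separate branch first.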
"Error: fatal: could not read Username for 'https://github.com': terminal prompts disabled":
your personal access token may have expired, so you need to set a new one as the value of the encrypted repository secret; that means going back to the setup section
more information about this GitHub action checkout issue can be found e. g. here
"remote: Permission to ... denied to ... fatal: unable to access 'https://github.com/...': The requested URL returned error: 403":
- your personal access token used does not have the minimum scopes/permissions required to prune a branch in your target repository
You get a failed job because it exceeded the maximum execution time:
- increase timeout-minutes in your workflow file (default in the proposed workflow files = 8 min)
- if that doesn't help, it could be a general issue with GitHub Actions
The workflow logs do not provide enough detail to diagnose why a workflow, job, or step is not working as expected:
- enable additional debug logging
You are experiencing strange behavior from GitHub actions:
- maybe it's a general incident [status check]
Your workflow trigger schedule doesn't fire:
in my experience, a workflow file with this trigger must be placed in the default branch
in this chat Brightran said: "... The workaround is to push something to trigger them. ..." and Hless said: "... It appears to me that it takes while before schedules actions run at all in a new repo". In my experience, they are right.
For my Website-Boilerplates, I use the Lighthouse-Badger 🦡 🗼 🎖️ to automatically update my Lighthouse badges and reports once a week. As a result, my repository size keeps growing.
To counter this, I use the Branch-Pruner once a month. That way, I have the repo size under control and also the ability to see the latest history of my badges and reports without the really old stuff.
The use of protected brand names, trade names, utility models and brand logos on this website does not constitute an infringement of copyright; rather, it serves as an illustrative note. Even if this is not marked as such at the respective points, the corresponding legal provisions always apply.
The brand names and logos used are the property of their respective owners and are subject to their copyright provisions.
This offer is in no way related to the legal entities of the protected brand names and logos used.
This README contains links to external third-party websites. The README operator has no influence on the content of these sites. Therefore, he cannot assume any liability. Instead, the respective provider is always responsible for the content.
The linked pages were checked for possible legal violations at the time of linking and illegal content wasn't discernible. A permanent control of the linked pages is unreasonable without concrete evidence of an infringement. However, if the README operator becomes aware of such a violation, he will act immediately.
- Shields.io [License: CC0 1.0]