Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking: deprecate sha256 attribute in fetchers in favor of hash = "<SRI hash>" #325892

Open
1 of 54 tasks
Aleksanaa opened this issue Jul 9, 2024 · 15 comments
Open
1 of 54 tasks
Labels
5. scope: tracking Long-lived issue tracking long-term fixes or multiple sub-problems 6.topic: architecture Relating to code and API architecture of Nixpkgs

Comments

@Aleksanaa
Copy link
Member

Aleksanaa commented Jul 9, 2024

When we did not support SRI hash, we wrote a lot of sha256 = "...", and some of new PRs are still written with this attribute. However, when using an empty string to obtain the correct hash from the error, the SRI hash is obtained, which causes some confusion.

Let's move on from this old attribute. I don't expect to remove it within a certain period of time, but we can throw a warning to prevent this type of writing from continuing to appear in nixpkgs.

I did an experiment last time with cargoHash in #323983. I wrote the following script for this:

#!/usr/bin/env bash

process_line() {
    local filename=${1%:}
    if [[ $4 =~ \"(.*)\"\; ]]; then
      local sha256="${BASH_REMATCH[1]}"
    fi

    [[ -z $sha256 ]] && return 0

    local hash=$(nix hash to-sri --type sha256 $sha256)

    echo "Processing: $filename"
    echo "  $sha256 => $hash"

    sed -i "s|sha256 = \"$sha256\"|hash = \"$hash\"|" $filename
}

# split output by line
grep -r 'sha256 = ' . | while IFS= read -r line; do
    # split them further by space
    read -r -a parts <<< "$line"
    process_line "${parts[@]}"
done

We can deprecate each fetcher's sha256 separately, instead of the entire hash, to avoid the burden of review:

I collapsed the check list because it's not feasible to deprecate one by one
  • buildBazelPackage buildBazelPackage: support fetchAttrs.hash #342037
  • fetch-scm
  • fetchCrate
  • fetchDebianPatch
  • fetchDockerConfig
  • fetchDockerLayer
  • fetchFirefoxAddon
  • fetchFrom9Front
  • fetchFromBitbucket fetchFromBitBucket: deprecate sha256 attribute #326028
  • fetchFromGitHub
  • fetchFromGitLab
  • fetchFromGitea
  • fetchFromGithub
  • fetchFromGitiles
  • fetchFromRepoOrCz
  • fetchFromSavannah
  • fetchFromSourcehut
  • fetchHex
  • fetchMavenArtifact
  • fetchNextcloudApp
  • fetchNpmDeps
  • fetchNuGet
  • fetchPypi
  • fetchPypiLegacy
  • fetchRepoProject fetchRepoProject: support hash attribute #342031
  • fetchTarball
  • fetchYarnDeps
  • fetchbower
  • fetchbzr
  • fetchcvs
  • fetchdarcs
  • fetchdocker
  • fetchegg
  • fetchfossil
  • fetchgit
  • fetchgitLocal
  • fetchgx
  • fetchhg
  • fetchipfs
  • fetchit
  • fetchmail
  • fetchmail_7
  • fetchmtn
  • fetchpatch
  • fetchpatch2
  • fetchpijul
  • fetchs3
  • fetchsvn
  • fetchsvnrevision
  • fetchsvnssh
  • fetchtorrent (Looks like there is no support for sha256 attribute?)
  • fetchurl
  • fetchutils
  • fetchzip
@Aleksanaa Aleksanaa added 6.topic: architecture Relating to code and API architecture of Nixpkgs 5. scope: tracking Long-lived issue tracking long-term fixes or multiple sub-problems labels Jul 9, 2024
@eclairevoyant
Copy link
Contributor

the SRI hash is obtained, which causes some confusion.

You can still use the SRI sha256 hash in the sha256 attr.
(Not disagreeing with the deprecation though.)

@emilazy
Copy link
Member

emilazy commented Jul 9, 2024

(Taking the opportunity to plug the related lib.fakeHash, which maybe only I use, because it lets you avoid remembering how many As you need.)

@Aleksanaa
Copy link
Member Author

You can still use the SRI sha256 hash in the sha256 attr.

Yes, it's like sha256 = "sha256-...", if after some day we decide to use sha512 by default, it will become sha256 = "sha512-"

@eclairevoyant
Copy link
Contributor

@emilazy I usually just use "" honestly, as Aleksanaa mentioned.

Anyway, discussing strategy: should we remove the usages first for a given fetcher, before we add the warnings?

And your example bash script, it may work safely for something like cargoHash as you did in #323983, because we know it will only affect the rust builder/fetcher. But sometimes people use custom builders and scripts (especially update scripts) that may break with this. I will try to come up with a more robust script that only touches one fetcher at a time.

@Aleksanaa
Copy link
Member Author

Anyway, discussing strategy: should we remove the usages first for a given fetcher, before we add the warnings?

Yes, otherwise ofBorg will fail?

I will try to come up with a more robust script that only touches one fetcher at a time.

I'm adding one more grep like grep -rl "fetchFromBitbucket" . | xargs grep -r 'sha256 = ' | while IFS= read -r line; do and it turned out well. But update script is indeed a problem and it is not within the scope of our scan.

@emilazy
Copy link
Member

emilazy commented Jul 9, 2024

(Thank you, I had no idea "" works! I could swear it didn’t in the past. Sorry for the tangent.)

@eclairevoyant
Copy link
Contributor

Yes, otherwise ofBorg will fail?

I didn't know borgo fails on warnings. But anyway I meant, keeping the 0-rebuild treewide in a separate PR would make it easier to avoid merge conflicts and mixing of logical changes.

I'm adding one more grep like grep -rl "fetchFromBitbucket" . | xargs grep -r 'sha256 = ' | while IFS= read -r line; do

Some packages have multiple fetcher or people doing some custom constructs in the same file, which is why I'll try to avoid grep in this way if possible.

@pyrox0
Copy link
Member

pyrox0 commented Jul 12, 2024

There are many fetchers that are based on others(fetchFromOrCz is based on fetchzip, fetchFrom9Front is based on either fetchurl or fetchgit, etc). Will the sha256 attribute have to be deprecated in those fetchers first, or in the underlying ones?

@Aleksanaa
Copy link
Member Author

So I wrote this script:

my script
#!/usr/bin/env bash

EDITED_LOG=./edited-log
PROBLEM_LOG=./problem-log
SEARCH_CACHE=./search-cache

mkfifo nix_repl_in
mkfifo nix_repl_out

trap "rm -f nix_repl_in nix_repl_out" EXIT

# if replacing more files we can remove 2>/dev/null
nix repl < nix_repl_in > nix_repl_out 2>/dev/null &
NIX_REPL_PID=$!

sleep 2

exec 13>nix_repl_in
exec 14< nix_repl_out

extract_output() {
  output=$(echo $1 | ansi2txt)
  if [[ $output == "\"\"" ]]; then
    return 1
  elif [[ $output =~ ^\"([^\"]+)\"$ ]]; then
    echo "${BASH_REMATCH[1]}"
  elif [[ $output == "null" ]]; then
    echo "null"
  else
    return 1
  fi
}

send_repl() {
  local cmd=$1
  echo "$cmd" >&13
  echo "$cmd" > ./repl_in
  while IFS= read -r -d $'\n' line <&14; do
    if [[ -z "$line" ]]; then
      return 1
    else
      echo "$(extract_output $line)"
      echo "$line" > ./repl_out
    fi
  done
}

replace_hash() {
  local attr_name=$1
  
  local hashAlgo=$(send_repl "${attr_name}.src.outputHashAlgo") || return 1
  # Only to limit scope here, can be adjusted
  local hashHomepage=$(send_repl "${attr_name}.src.meta.homepage") || return 1

  if [[ "$hashAlgo" == "sha256" ]] && [[ "$hashHomepage" == "https://gitlab.com/"* ]]; then
    local hash=$(send_repl "${attr_name}.src.outputHash") || return 1
    # echo "$attr_name hash"
    if [[ "$hash" == "sha256"* ]]; then
      local position=$(send_repl "${attr_name}.meta.position") || return 1

      [[ $position == "null" ]] && return 1
      
      local position_file=${position%%:*}

      grep -q "sha256 = \"$hash\"" $position_file
      if [ $? -eq 0 ]; then
        local outPath=$(send_repl "${attr_name}.outPath") || return 1
        echo "editing: ${attr_name}:${position_file}"
        sed -i "s|sha256 = \"$hash\"|hash = \"$hash\"|" $position_file
        echo "${attr_name}:${position_file}:${outPath}" >> $EDITED_LOG
      fi
    fi
  fi
}

verify_hash() {
  local attr_name=$1
  local position_file=$2
  local outPath=$3

  local new_outPath=$(send_repl "${attr_name}.outPath")
  if [ $? -ne 0 ] || [[ "$outPath" != "$new_outPath" ]]; then
    echo "problem: ${attr_name}:${position_file}"
    # echo "'$outPath' != '$new_outPath'"
    echo "${attr_name}:${position_file}:${outPath}" >> $PROBLEM_LOG
  fi
}

[[ -f "$SEARCH_CACHE" ]] || nix search . ^ --json > $SEARCH_CACHE

pkg_list=($(cat $SEARCH_CACHE | jq --raw-output 'keys_unsorted | @sh'))

send_repl ":l ./."

sleep 1

echo "starting to replace"

for pkg_raw in ${pkg_list[@]}; do
  pkg=${pkg_raw#*.*.}
  pkg=${pkg%\'}
  replace_hash $pkg
done

send_repl ":r"

sleep 1

echo "starting to examine"

while IFS= read -r line; do
  IFS=':' read -r -a parts <<< "$line"
  verify_hash ${parts[@]}
done < "$EDITED_LOG"

It depends on jq and ansi2txt. Currently, because it is single-threaded logic, a treewide operation takes about 10 minutes. Feel free to modify it according to your ideas.

@pyrox0
Copy link
Member

pyrox0 commented Jul 13, 2024

Just noticed in the list, but fetchit is not a fetcher, it's a podman tool. Should probably be removed.

@Aleksanaa
Copy link
Member Author

I tried GitHub (only editing sha256 = "sha256-), and there are 3000+ direct editions, including 1000+ objections (we can modify the check to revert changes directly, using sed or git, depending on the speed). We can still edit 2000+ files painlessly after all. Might be related to #301014

@Aleksanaa
Copy link
Member Author

Aleksanaa commented Jul 13, 2024

Just noticed in the list, but fetchit is not a fetcher, it's a podman tool. Should probably be removed.

I've given up that list, because we cannot actually determine which fetcher it is (except for those that only serve specific sites). And since they are derivatives of fetchurl or fetchzip, it is not convenient to deprecate hash for a single fetcher

@nbraud
Copy link
Contributor

nbraud commented Sep 15, 2024

You can still use the SRI sha256 hash in the sha256 attr.

Yes, it's like sha256 = "sha256-...", if after some day we decide to use sha512 by default, it will become sha256 = "sha512-"

Would that even work?

pkgs.stdenv.mkDerivation {
    name = "hash-algo-mismatch";
    outputHash = "sha512-z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXcg/SpIdNs6c5H0NE8XYXysP+DGNKHfuwvY7kxvUdBeoGlODJ6+SfaPg==";
    outputHashAlgo = "sha256";
}

errors out in builtins.derivationStrict with

error: hash 'sha512-z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXcg/SpIdNs6c5H0NE8XYXysP+DGNKHfuwvY7kxvUdBeoGlODJ6+SfaPg==' should have type 'sha256'

@nbraud
Copy link
Contributor

nbraud commented Sep 15, 2024

There are many fetchers that are based on others(fetchFromOrCz is based on fetchzip, fetchFrom9Front is based on either fetchurl or fetchgit, etc). Will the sha256 attribute have to be deprecated in those fetchers first, or in the underlying ones?

I wrote some normalization helpers in #342072 which convert { hash, sha256, sha512, ... } into { outputHash, outputHashAlgo }. In that approach, the underlying fetchers (fetchurl, fetchgit, etc.) need to be updated first.

PS: To be clear, the prefered way for callers to specify the hash would still be the hash attribute, in SRI format; { outputHash, outputHashAlgo } is only used internally to represent the potentially-legacy arguments in a uniform way.

@nbraud
Copy link
Contributor

nbraud commented Sep 17, 2024

#342072 could probably use another reviewer, now that it's ~finalized, before it's merged and I go wrap almost-all fetchers with it.

Jayman2000 pushed a commit to Jayman2000/nixpkgs-pr that referenced this issue Dec 26, 2024
It looks like sha256 will eventually be deprecated in favor of hash:
<NixOS#325892>.
Jayman2000 pushed a commit to Jayman2000/nixpkgs-pr that referenced this issue Dec 26, 2024
It looks like sha256 will eventually be deprecated in favor of hash:
<NixOS#325892>.
xokdvium added a commit to xokdvium/nixpkgs that referenced this issue Dec 26, 2024
It looks like sha256 will eventually be deprecated in favor of hash:
<NixOS#325892>.
xokdvium added a commit to xokdvium/nixpkgs that referenced this issue Dec 26, 2024
It looks like sha256 will eventually be deprecated in favor of hash:
<NixOS#325892>.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5. scope: tracking Long-lived issue tracking long-term fixes or multiple sub-problems 6.topic: architecture Relating to code and API architecture of Nixpkgs
Projects
None yet
Development

No branches or pull requests

5 participants