The following checklist for preparing a pull request with the UCD changes for an encoding proposal was (mostly) followed for https://github.com/unicode-org/unicodetools/pulls?q=label%3Apipeline-16.0. The plan is for this process to be part of the PAG’s review of encoding proposals going forward.
Prerequisites: proposal posted to L2, SAH agreed to recommend for provisional assignment (or the proposal is already in the pipeline).
- UnicodeData.txt — Prepend lines from proposal
- Commit
- UTC decision — Check counts, code points, names, properties
- SAH report — Check counts, code points, names, properties
- Ken’s UnicodeData draft — Check consistent
If the proposal supplies LineBreak.txt:
- LineBreak.txt — Prepend lines from proposal
- Commit
If the proposal does not supply LineBreak.txt:
- LineBreak.txt — Regenerate [TODO(markus): This should become « invoke Ken’s tool »]
- Update modified lines
- Commit
New scripts only:
- UCD_Names — Check script name
- Scripts.txt — Prepend ranges (carefully mind any gaps)
- Commit
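One way to spot gaps before writing the Scripts.txt ranges is to collapse the new UnicodeData.txt lines into contiguous runs of code points. The following is a minimal sketch, not part of unicodetools: the function names `contiguous_ranges` and `print_range` are hypothetical, the input is assumed to be sorted by code point, and First/Last range pairs are not handled. It only lists ranges; whether every character in a range actually takes the new script value still has to be checked against the proposal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read UnicodeData-style lines on stdin and print the
# contiguous code point ranges they cover, one per line.
# Assumes the input is sorted by code point and has no empty lines.

print_range() {
  # Print a single code point, or first..last for a longer run.
  if [ "$1" -eq "$2" ]; then
    printf '%04X\n' "$1"
  else
    printf '%04X..%04X\n' "$1" "$2"
  fi
}

contiguous_ranges() {
  local line cp start= prev=
  while IFS= read -r line; do
    # The first semicolon-separated field is the hexadecimal code point.
    cp=$((16#${line%%;*}))
    if [ -z "$start" ]; then
      start=$cp
    elif [ "$cp" -ne $((prev + 1)) ]; then
      # A gap: close the current run and start a new one.
      print_range "$start" "$prev"
      start=$cp
    fi
    prev=$cp
  done
  if [ -n "$start" ]; then
    print_range "$start" "$prev"
  fi
}
```

For instance, `contiguous_ranges < new-lines.txt` (where `new-lines.txt` is a placeholder for a file holding the lines taken from the proposal) prints one range per line, and every line break in the output corresponds to a gap to keep in mind.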
New blocks only:
- ShortBlockNames.txt — Update, keep sorted
- Blocks.txt — Update, keep sorted [TODO(egg): This one wants to be generated…]
- Commit
Joining scripts only:
- ArabicShaping.txt — Merge from proposal, keep sorted
- Commit
Indic scripts only:
- IndicPositionalCategory — Prepend lines from proposal
- IndicSyllabicCategory — Prepend lines from proposal
- Commit
- PropsList.txt — Add Other_Alphabetic, Other_Lowercase, Diacritic, and Extender to satisfy invariants, or to taste
- Commit
- UCD — Regenerate
- Enums — Regenerate
PR preparation:
- If from SAH — Link SAH issue
- If from ESC or CJK — Mention ESC or CJK in the PR description
- When for a UTC decision — Cite in the format UTC-\d\d\d-[MC]\d+ or with a link.
- Whenever there is a Proposal document — Cite L2 number in the format L2/yy-nnn
- data-for-new — Set label
- pipeline-* — Set label to pipeline-recommended-to-UTC if the characters are not yet in the pipeline, and to pipeline-provisionally-assigned or pipeline-<version> depending on their status in the Pipeline.
- PR button — Set to DRAFT pull request, unless approved for the upcoming version
- PR button — Press
- The Check UCA data and Check security data invariants CI checks are suppressed; many character additions need separate handling there, but that is out of scope for the PAG work of preparing data-for-new, so reporting those failures could distract from real issues in the UCD invariants. UCA and security data issues are addressed later in the process, before the start of β review.
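The two citation formats in the PR preparation list (UTC-\d\d\d-[MC]\d+ and L2/yy-nnn) can be checked mechanically before pressing the button. The following is a minimal sketch, assuming a POSIX-style `grep` with `-o` and `-E`; the function name `find_citations` is hypothetical, and the decision and document numbers in the example are placeholders, not real citations.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every UTC decision citation (UTC-ddd-Mn or
# UTC-ddd-Cn) and every L2 document number (L2/yy-nnn) found on stdin,
# one per line. Prints nothing when the text contains neither.
find_citations() {
  grep -oE 'UTC-[0-9]{3}-[MC][0-9]+|L2/[0-9]{2}-[0-9]{3}' || true
}

# Placeholder numbers, for illustration only.
printf '%s\n' 'Implements UTC-123-C45; see L2/99-123.' | find_citations
```

An empty result for a draft PR description suggests that a citation is missing or does not follow the expected format; a decision cited by link instead would of course not be found by this pattern.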
There are a variety of setups for unicodetools, depending on OS, in-source vs. out-of-source, git practices, etc. If you take part in UCD development, feel free to add your own.
Ken's files come from here (select the appropriate UCD version, e.g. ucd160 for Unicode 16.0). NOTE: this check is probably not applicable for pipeline-provisionally-assigned data where Ken does not yet have a draft.
eggrobin (Windows, in-source; the remote corresponding to unicode-org is called la-vache, Ken’s files are downloaded next to the unicodetools repository).
$latestKenFile = (ls ..\UnicodeData-*.txt | sort LastWriteTime)[-1]
$kenUnicodeData = (Get-Content $latestKenFile)
# Compare each UnicodeData.txt line added relative to la-vache/main with Ken’s draft.
git diff la-vache/main */UnicodeData.txt |
  sls ^\+[0-9A-F] |
  % {
    # Drop the leading "+" of the diff line.
    $headLine = $_.line.Substring(1)
    if (-not $kenUnicodeData.Contains($headLine)) {
      $codepoint = $headLine.Split(";")[0];
      echo "Mismatch for U+$codepoint";
      echo "HEAD : $headLine";
      echo "Ken : $($kenUnicodeData.Where({$_.Split(";")[0] -eq $codepoint}))";
    }
  }
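For setups without PowerShell, the same comparison can be sketched in bash. This is an illustrative equivalent under stated assumptions, not a script any contributor is known to use: the function name `check_against_ken` is hypothetical, and the remote name `origin` and the draft file path in the usage comment are placeholders to adapt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a git diff on stdin and report every added
# UnicodeData.txt line that does not appear verbatim in Ken's draft.
# Usage (remote name and file path are assumptions):
#   git diff origin/main -- '*/UnicodeData.txt' | check_against_ken ../UnicodeData-draft.txt
check_against_ken() {
  local ken_file=$1 line codepoint
  # Keep only added data lines ("+" followed by a hex digit) and drop the "+";
  # the "+++ b/..." header does not match because its second character is "+".
  { grep -E '^\+[0-9A-F]' || true; } | cut -c2- |
    while IFS= read -r line; do
      # -F: fixed string, -x: whole-line match, -q: no output.
      if ! grep -Fxq -- "$line" "$ken_file"; then
        codepoint=${line%%;*}
        echo "Mismatch for U+$codepoint"
        echo "HEAD : $line"
        echo "Ken : $(grep -E "^$codepoint;" "$ken_file")"
      fi
    done
}
```

As in the PowerShell version, no output means every added line is identical to a line of the draft; an empty "Ken :" line means the draft has no entry for that code point.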
eggrobin (Windows, in-source; the remote corresponding to unicode-org is called la-vache).
git fetch la-vache
git merge la-vache/main
git checkout la-vache/main unicodetools/data/ucd/dev/Derived*;
git checkout la-vache/main unicodetools/data/ucd/dev/extracted/*;
git checkout la-vache/main unicodetools/data/ucd/dev/auxiliary/*;
rm .\Generated\* -recurse -force;
mvn compile exec:java '-Dexec.mainClass="org.unicode.text.UCD.Main"' '-Dexec.args="build MakeUnicodeFiles"' -am -pl unicodetools "-DCLDR_DIR=..\cldr\" "-DUNICODETOOLS_GEN_DIR=Generated" "-DUNICODETOOLS_REPO_DIR=.";
cp .\Generated\UCD\16.0.0\* .\unicodetools\data\ucd\dev -recurse -force;
rm unicodetools\data\ucd\dev\zzz-unchanged-*;
rm unicodetools\data\ucd\dev\*\zzz-unchanged-*;
rm .\unicodetools\data\ucd\dev\extra\*;
rm .\unicodetools\data\ucd\dev\cldr\*;
git add ./unicodetools/data
git merge --continue
markusicu (Linux, out-of-source; main tracks unicode-org/main)
git merge main
# complains about merge conflicts as expected
git checkout main unicodetools/data/ucd/dev/Derived*
git checkout main unicodetools/data/ucd/dev/extracted/*
git checkout main unicodetools/data/ucd/dev/auxiliary/*
rm -r ../Generated/BIN/16.0.0.0/
rm -r ../Generated/BIN/UCD_Data16.0.0.bin
mvn -s ~/.m2/settings.xml compile exec:java -Dexec.mainClass="org.unicode.text.UCD.Main" -Dexec.args="version 16.0.0 build MakeUnicodeFiles" -am -pl unicodetools -DCLDR_DIR=$(cd ../../../cldr/mine/src ; pwd) -DUNICODETOOLS_GEN_DIR=$(cd ../Generated ; pwd) -DUNICODETOOLS_REPO_DIR=$(pwd) -DUVERSION=16.0.0
# fix merge conflicts in unicodetools/src/main/java/org/unicode/text/UCD/UCD_Types.java
# and in UCD_Names.java
# rerun mvn
cp -r ../Generated/UCD/16.0.0/* unicodetools/data/ucd/dev
rm unicodetools/data/ucd/dev/ZZZ-UNCHANGED-*
rm unicodetools/data/ucd/dev/*/ZZZ-UNCHANGED-*
rm unicodetools/data/ucd/dev/extra/*
rm unicodetools/data/ucd/dev/cldr/*
git add unicodetools/src/main/java/org/unicode/text/UCD/UCD_Names.java
git add unicodetools/src/main/java/org/unicode/text/UCD/UCD_Types.java
git add unicodetools/data
git merge --continue
macchiati (IDE)
sync github
run MakeUnicodeFiles.java -c
Cf. unicode-org#636
eggrobin (Windows, in-source).
rm .\Generated\* -recurse -force
mvn compile exec:java '-Dexec.mainClass="org.unicode.text.UCD.Main"' '-Dexec.args="build MakeUnicodeFiles"' -am -pl unicodetools "-DCLDR_DIR=..\cldr\" "-DUNICODETOOLS_GEN_DIR=Generated" "-DUNICODETOOLS_REPO_DIR=."
cp .\Generated\UCD\16.0.0\* .\unicodetools\data\ucd\dev -recurse -force
rm unicodetools\data\ucd\dev\zzz-unchanged-*
rm unicodetools\data\ucd\dev\*\zzz-unchanged-*
rm .\unicodetools\data\ucd\dev\extra\*
rm .\unicodetools\data\ucd\dev\cldr\*
git add unicodetools/data/ucd/dev/*
git commit -m "Regenerate UCD"
eggrobin (Windows, in-source).
rm .\Generated\* -recurse -force
mvn compile exec:java '-Dexec.mainClass="org.unicode.text.UCD.Main"' '-Dexec.args="build MakeUnicodeFiles"' -am -pl unicodetools "-DCLDR_DIR=..\cldr\" "-DUNICODETOOLS_GEN_DIR=Generated" "-DUNICODETOOLS_REPO_DIR=."
cp .\Generated\UCD\16.0.0\LineBreak.txt .\unicodetools\data\ucd\dev
eggrobin (Windows, in-source).
mvn compile exec:java '-Dexec.mainClass="org.unicode.props.GenerateEnums"' -am -pl unicodetools "-DCLDR_DIR=..\cldr\" "-DUNICODETOOLS_GEN_DIR=Generated" "-DUNICODETOOLS_REPO_DIR=." -U
mvn spotless:apply
git add *.java
git commit -m GenerateEnums