Draft: All-in-one unicode-data #93

Draft
wants to merge 11 commits into
base: master
12 changes: 7 additions & 5 deletions .github/workflows/haskell.yml
@@ -123,15 +123,14 @@ jobs:
ghc_version: 9.6.0.20230128
ghcup_release_channel: "https://raw.githubusercontent.com/haskell/ghcup-metadata/master/ghcup-prereleases-0.0.7.yaml"
runner: ubuntu-latest
cabal_version: 3.8.1.0
cabal_version: 3.9.0.0

# [TODO] Use latest cabal (pre-)release
- name: head
ghc_version: head
# ghcup_release_channel: "https://raw.githubusercontent.com/haskell/ghcup-metadata/master/ghcup-prereleases-0.0.7.yaml"
ghcup_release_channel: "https://raw.githubusercontent.com/haskell/ghcup-metadata/master/ghcup-prereleases-0.0.7.yaml"
runner: ubuntu-latest
# cabal_version: 3.9.0.0
cabal_version: latest
# [WARNING] Make sure to use the latest cabal (pre-)release
cabal_version: 3.9.0.0

- name: hlint
pack_options: HLINT_OPTIONS="lint" HLINT_TARGETS="lib exe"
@@ -148,6 +147,7 @@ jobs:
ghcup-release-channel: ${{ matrix.ghcup_release_channel }}
cabal-version: ${{ matrix.cabal_version }}

# [TODO] Use haskell/actions/setup when it reliably supports GHC head.
# Adapted from https://github.com/composewell/streamly/blob/master/.github/workflows/haskell.yml
- name: Install GHC head environment
if: ${{ matrix.ghc_version == 'head' }}
@@ -164,6 +164,8 @@ jobs:
GHCUP_VER=0.1.19.0
$CURL -sL -o ./ghcup https://downloads.haskell.org/~ghcup/$GHCUP_VER/${GHCUP_ARCH}-ghcup-$GHCUP_VER
chmod +x ./ghcup
# Set ghcup pre-release
./ghcup config add-release-channel ${{ matrix.ghcup_release_channel }}
# Install GHC head
# The URL may change, to find a working URL go to https://gitlab.haskell.org/ghc/ghc/-/jobs/
# Find a debian10 job, click on a passed/failed job, at the
10 changes: 5 additions & 5 deletions .github/workflows/s390x.yml
@@ -37,24 +37,24 @@ jobs:
ghc --make \
-XMagicHash -XBangPatterns -XUnboxedTuples -XScopedTypeVariables \
-XLambdaCase -XBlockArguments -XTupleSections \
-iunicode-data/test:unicode-data/lib \
-o core-test unicode-data/test/Main.hs
-iunicode-data-core/test:unicode-data-core/lib \
-o core-test unicode-data-core/test/Main.hs
./core-test
ghc --make \
-XMagicHash -XBangPatterns -XUnboxedTuples -XScopedTypeVariables \
-XLambdaCase -XBlockArguments -XTupleSections \
-iunicode-data-names/test:unicode-data-names/lib:unicode-data/lib \
-iunicode-data-names/test:unicode-data-names/lib:unicode-data-core/lib \
-o names-test unicode-data-names/test/Main.hs
./names-test
ghc --make \
-XMagicHash -XBangPatterns -XUnboxedTuples -XScopedTypeVariables \
-XLambdaCase -XBlockArguments -XTupleSections \
-iunicode-data-scripts/test:unicode-data-scripts/lib:unicode-data/lib \
-iunicode-data-scripts/test:unicode-data-scripts/lib:unicode-data-core/lib \
-o scripts-test unicode-data-scripts/test/Main.hs
./scripts-test
ghc --make \
-XMagicHash -XBangPatterns -XUnboxedTuples -XScopedTypeVariables \
-XLambdaCase -XBlockArguments -XTupleSections \
-iunicode-data-security/test:unicode-data-security/lib:unicode-data/lib \
-iunicode-data-security/test:unicode-data-security/lib:unicode-data-core/lib \
-o security-test unicode-data-security/test/Main.hs
./security-test
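
The four `ghc --make` invocations above share one pattern: each suite searches its own `test` and `lib` directories, and every suite except the core one also searches `unicode-data-core/lib`. As a rough sketch of that pattern only (the helper name `include_path` is hypothetical and not part of this workflow), the arguments could be generated like this:

```shell
#!/bin/sh
# Hypothetical helper: print the GHC include path (-i value) for one suite.
# The core suite only needs its own directories; every other suite also
# needs unicode-data-core/lib, as in the workflow commands.
include_path() {
    pkg="unicode-data-$1"
    if [ "$1" = "core" ]; then
        printf '%s\n' "$pkg/test:$pkg/lib"
    else
        printf '%s\n' "$pkg/test:$pkg/lib:unicode-data-core/lib"
    fi
}

# Print the varying part of each `ghc --make` command without running GHC.
for suite in core names scripts security; do
    echo "-i$(include_path "$suite") -o $suite-test unicode-data-$suite/test/Main.hs"
done
```

The sketch only prints arguments, so it can be checked on a machine without GHC installed.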
23 changes: 17 additions & 6 deletions README.md
@@ -5,7 +5,9 @@
This repository provides packages to use the
[Unicode character database](https://www.unicode.org/ucd/) (UCD):

- [`unicode-data`](#unicode-data) for general character properties.
- [`unicode-data`](#unicode-data), an all-in-one package that re-exports the
packages listed below.
- [`unicode-data-core`](#unicode-data-core) for general character properties.
- [`unicode-data-names`](#unicode-data-names) for character names and aliases.
- [`unicode-data-scripts`](#unicode-data-scripts) for character scripts.
- [`unicode-data-security`](#unicode-data-security) for security mechanisms.
@@ -16,12 +18,21 @@ The latest Unicode version supported by these libraries is

### `unicode-data`

[`unicode-data`](unicode-data#readme) provides Haskell APIs to efficiently
[`unicode-data`](unicode-data#readme) is an _all-in-one_ package that re-exports
the whole `unicode-data-*` package family.

Please see the
[Haddock documentation](https://hackage.haskell.org/package/unicode-data)
for reference documentation.

### `unicode-data-core`

[`unicode-data-core`](unicode-data-core#readme) provides Haskell APIs to efficiently
access the Unicode character database.
Performance is the primary goal in the design of this package.

Please see the
[Haddock documentation](https://hackage.haskell.org/package/unicode-data)
[Haddock documentation](https://hackage.haskell.org/package/unicode-data-core)
for reference documentation.

### `unicode-data-names`
@@ -56,17 +67,17 @@ for reference documentation.

## Performance

`unicode-data` is up to [_5 times faster_](unicode-data#performance)
`unicode-data` is up to [_5 times faster_](unicode-data-core#performance)
than `base`.

## Unicode database version update

See `unicode-data`’s [guide](unicode-data/README.md#unicode-database-version-update).
See `unicode-data-core`’s [guide](unicode-data-core/README.md#unicode-database-version-update).

## Licensing

`unicode-data*` packages are an [open source](https://github.com/composewell/unicode-data)
project available under a liberal [Apache-2.0 license](unicode-data/LICENSE).
project available under a liberal [Apache-2.0 license](unicode-data-core/LICENSE).

## Contributing

31 changes: 16 additions & 15 deletions experimental/unicode-data-text/unicode-data-text.cabal
@@ -22,9 +22,10 @@ tested-with: GHC==8.0.2
, GHC==8.6.5
, GHC==8.8.4
, GHC==8.10.7
, GHC==9.0.1
, GHC==9.2.1
, GHC==9.4.2
, GHC==9.0.2
, GHC==9.2.5
, GHC==9.4.4
, GHC==9.6.0

extra-source-files:
Changelog.md
@@ -66,9 +67,9 @@ library

hs-source-dirs: lib
build-depends:
base >= 4.7 && < 4.19,
text >= 1.2.4 && < 2.1,
unicode-data >= 0.3 && < 0.5
base >= 4.7 && < 4.19,
text >= 1.2.4 && < 2.1,
unicode-data-core >= 15.0.0 && < 15.1

test-suite test
import: default-extensions, compile-options
@@ -79,9 +80,9 @@ test-suite test
other-modules:
Unicode.Text.CaseSpec
build-depends:
base >= 4.7 && < 4.19,
hspec >= 2.0 && < 2.11,
text >= 1.2.4 && < 2.1,
base >= 4.7 && < 4.19,
hspec >= 2.0 && < 2.11,
text >= 1.2.4 && < 2.1,
unicode-data-text
build-tool-depends:
hspec-discover:hspec-discover >= 2.0 && < 2.11
@@ -95,11 +96,11 @@ benchmark bench
hs-source-dirs: bench
main-is: Main.hs
build-depends:
base >= 4.7 && < 4.19,
deepseq >= 1.1 && < 1.5,
tasty-bench >= 0.2.5 && < 0.4,
tasty >= 1.4.1,
text >= 1.2.4 && < 2.1,
unicode-data >= 0.3 && < 0.5,
base >= 4.7 && < 4.19,
deepseq >= 1.1 && < 1.5,
tasty-bench >= 0.2.5 && < 0.4,
tasty >= 1.4.1,
text >= 1.2.4 && < 2.1,
unicode-data-core >= 15.0.0 && < 15.1,
unicode-data-text
ghc-options: -O2 -fdicts-strict -rtsopts
1 change: 1 addition & 0 deletions stack.yaml
@@ -1,6 +1,7 @@
resolver: lts-18.18
packages:
- './unicode-data'
- './unicode-data-core'
- './unicode-data-names'
- './unicode-data-scripts'
- './unicode-data-security'
4 changes: 2 additions & 2 deletions ucd.sh
@@ -82,7 +82,7 @@ run_generator() {
# Compile and run ucd2haskell
cabal run --flag ucd2haskell ucd2haskell -- \
--input "./data/$VERSION" \
--output-core ./unicode-data/lib/ \
--output-core ./unicode-data-core/lib/ \
--output-names ./unicode-data-names/lib/ \
--output-scripts ./unicode-data-scripts/lib/ \
--output-security ./unicode-data-security/lib/ \
@@ -98,7 +98,7 @@ run_generator() {
--core-prop Pattern_White_Space
# Update unicodeVersion in Unicode.Char
VERSION_AS_LIST=$(echo "$VERSION" | sed "s/\./, /g")
sed -ri "s/^(unicodeVersion = makeVersion \[)[^]]*\]/\1$VERSION_AS_LIST\]/" "unicode-data/lib/Unicode/Char.hs"
sed -ri "s/^(unicodeVersion = makeVersion \[)[^]]*\]/\1$VERSION_AS_LIST\]/" "unicode-data-core/lib/Unicode/Char.hs"
}

# Print help text
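
The two `sed` calls at the end of `run_generator` first turn a dotted version into a comma-separated list, then splice that list into the `unicodeVersion = makeVersion [...]` line. The following is a minimal sketch of that transformation, not the script's actual code: the function names are hypothetical, and it filters standard input with a basic regular expression instead of editing `Unicode/Char.hs` in place with `sed -ri`.

```shell
#!/bin/sh
# Hypothetical helper: turn "15.0.0" into the list body "15, 0, 0".
version_as_list() {
    printf '%s\n' "$1" | sed 's/\./, /g'
}

# Hypothetical helper: rewrite a `unicodeVersion = makeVersion [...]` line
# read from stdin so that it carries the version given as $1.
# [^]]* matches everything up to the closing bracket of the old list.
set_unicode_version() {
    list=$(version_as_list "$1")
    sed "s/^\(unicodeVersion = makeVersion \[\)[^]]*]/\1$list]/"
}

# Example: bump a line that still says 14.0.0.
printf 'unicodeVersion = makeVersion [14, 0, 0]\n' | set_unicode_version 15.0.0
```

Reading from standard input avoids `sed -i`, whose syntax differs between GNU and BSD `sed`; the in-place form used in `ucd.sh` assumes GNU `sed`.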
8 changes: 4 additions & 4 deletions ucd2haskell/exe/UCD2Haskell.hs
@@ -16,13 +16,13 @@ data CLIOptions =
CLIOptions
{ input :: FilePath
, output_core :: FilePath
-- ^ `unicode-data`
-- ^ @unicode-data-core@
, output_names :: FilePath
-- ^ `unicode-data-names`
-- ^ @unicode-data-names@
, output_scripts :: FilePath
-- ^ `unicode-data-scripts`
-- ^ @unicode-data-scripts@
, output_security :: FilePath
-- ^ `unicode-data-security`
-- ^ @unicode-data-security@
, core_prop :: [String]
}
deriving (Show, Generic, HasArguments)
11 changes: 2 additions & 9 deletions ucd2haskell/ucd2haskell.cabal
@@ -14,16 +14,9 @@ copyright: 2020 Composewell Technologies and Contributors
category: Data,Text,Unicode
stability: Experimental
build-type: Simple
tested-with: GHC==8.0.2
, GHC==8.2.2
, GHC==8.4.4
, GHC==8.6.5
, GHC==8.8.4
tested-with: GHC==8.8.4
, GHC==8.10.7
, GHC==9.0.1
, GHC==9.2.1
, GHC==9.4.2
, GHC==9.6.0
, GHC==9.2.5

extra-source-files:
README.md
102 changes: 102 additions & 0 deletions unicode-data-core/Changelog.md
@@ -0,0 +1,102 @@
# Changelog

## 15.0.0 (February 2023)

- Rename package from `unicode-data` to `unicode-data-core`.
[`unicode-data`](https://hackage.haskell.org/package/unicode-data) is now
an all-in-one package with heavier dependencies.
- New version scheme: `U.B.M`, where `U` is the Unicode standard major version
number, `B` marks a breaking change and `M` a non-breaking change per
[PVP](https://pvp.haskell.org/).
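
Under the `U.B.M` scheme, a dependent package that wants only non-breaking updates would keep `U` and `B` fixed and let `M` grow, which is the shape of the `>= 15.0.0 && < 15.1` bound used in `unicode-data-text.cabal` in this pull request. As an illustration only (the helper `pvp_bounds` is hypothetical), such a bound could be derived from a version string like this:

```shell
#!/bin/sh
# Hypothetical helper: print a cabal-style bound that admits only
# non-breaking releases after the given U.B.M version.
# The body runs in a subshell so the IFS and `set` changes stay local.
pvp_bounds() (
    set -f          # no globbing while the argument is split
    IFS=.
    set -- $1       # $1=U (Unicode major), $2=B (breaking), $3=M (non-breaking)
    printf '>= %s.%s.%s && < %s.%s\n' "$1" "$2" "$3" "$1" "$(($2 + 1))"
)

pvp_bounds 15.0.0
```

A looser bound such as `< 16` would also accept breaking releases within the same Unicode version, so the choice depends on how much API churn the dependent package can tolerate.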

### Breaking changes

- Removed deprecated predicates `isLower` and `isUpper` in `Unicode.Char.Case`.
To migrate, use `isLowerCase` and `isUpperCase` respectively.
- Removed deprecated predicates `isLetter` and `isSpace` in `Unicode.Char.General`.
To migrate, use `isAlphabetic` and `isWhiteSpace` respectively.
- Removed deprecated predicate `isNumber` in `Unicode.Char.Numeric`.
To migrate, use `isNumber` from `Unicode.Char.Numeric.Compat`.
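
Before upgrading, it may help to locate uses of the removed names. The sketch below is one possible approach, not part of this repository: the function name is hypothetical, and because `isLower`, `isUpper`, `isLetter` and `isSpace` are also exported by `Data.Char` and by the `Compat` modules, each hit still has to be checked against the file's imports. `isNumber` is left out because only its module changes, not its name.

```shell
#!/bin/sh
# Hypothetical helper: list whole-word uses of the removed predicate names
# in every file under the directory given as $1.
# -w keeps `isLower` from matching inside `isLowerCase`.
find_removed_predicates() {
    grep -rnwE 'isLower|isUpper|isLetter|isSpace' "$1"
}
```

For example, `find_removed_predicates src` prints one `file:line:text` entry per match and exits with a non-zero status when nothing is found.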

## 0.4.0.1 (December 2022)

- Fix [Unicode blocks handling on big-endian architectures](https://github.com/composewell/unicode-data/issues/97).

## 0.4.0 (October 2022)

- Update to [Unicode 15.0.0](https://www.unicode.org/versions/Unicode15.0.0/).

## 0.3.1 (September 2022)

- Added full case conversions to `Unicode.Char.Case`:

- Case folding: `caseFoldMapping` and `toCaseFoldString`.
- Lower case: `lowerCaseMapping` and `toLowerString`.
- Upper case: `upperCaseMapping` and `toUpperString`.
- Title case: `titleCaseMapping` and `toTitleString`.
- Stream mechanism: `Unfold` and `Step`.

- Added `isNumeric`, `numericValue` and `integerValue`
to `Unicode.Char.Numeric`.
- Added the module `Unicode.Char.General.Blocks`.
- Added compatibility module:

- `Unicode.Char.Numeric.Compat`

### Deprecations

- `Unicode.Char.Numeric.isNumber`: it will be replaced by `isNumeric`
in a _future_ version of this package.
Use the function in `Unicode.Char.Numeric.Compat` instead.

## 0.3.0 (December 2021)

- Support for big-endian architectures.
- Added `unicodeVersion`.
- Added `GeneralCategory` data type and corresponding `generalCategoryAbbr`,
`generalCategory` functions.
- Added the following functions to `Unicode.Char.General`:
`isAlphabetic`, `isAlphaNum`,
`isControl`, `isMark`, `isPrint`, `isPunctuation`, `isSeparator`,
`isSymbol` and `isWhiteSpace`.
- Added the module `Unicode.Char.Numeric`.
- Added compatibility modules:

- `Unicode.Char.General.Compat`
- `Unicode.Char.Case.Compat`

These modules are compatible with `base:Data.Char`.
- Re-export some functions from `Data.Char` in order to make `Unicode.Char`
a drop-in replacement in a _future_ version of this package.
- Drop support for GHC 7.10.3

### Deprecations

- In `Unicode.Char.Case`:

- `isUpper`: use `isUpperCase` instead.
- `isLower`: use `isLowerCase` instead.

- In `Unicode.Char.General`:

- `isLetter`: use `isAlphabetic` instead.
- `isSpace`: use `isWhiteSpace` instead.

- In `Unicode.Char`: same as above. These functions will be replaced in a
_future_ release with the functions with the same names from
`Unicode.Char.Case.Compat` and `Unicode.Char.General.Compat`.

## 0.2.0 (November 2021)

* Update to [Unicode 14.0.0](https://www.unicode.org/versions/Unicode14.0.0/).
* Add `Unicode.Char.Identifiers` supporting Unicode Identifier and Pattern
Syntax.

## 0.1.0.1 (Jul 2021)

* Workaround to avoid incorrect display of dependencies on Hackage by moving
build-depends of ucd2haskell executable under a build flag conditional.

## 0.1.0 (Jul 2021)

* Initial release