Skip to content

Commit

Permalink
[Version] v1.1.0.
Browse files Browse the repository at this point in the history
  • Loading branch information
Duyi-Wang committed Nov 30, 2023
1 parent 50d980b commit 8fa55a5
Show file tree
Hide file tree
Showing 3 changed files with 59 additions and 3 deletions.
48 changes: 48 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
# CHANGELOG

# [Version 1.1.0](https://github.com/intel/xFasterTransformer/releases/tag/v1.1.0)
v1.1.0 - Baichuan models supported.

## Models
- Introduced Baichuan models support and added the convert tool for Baichuan models.

## Performance Optimizations
- Update xDNN to version 1.2.1 to improve performance of BF16 data type with AMX instruction on 4th generation Intel Xeon Scalable processors.
- Improved performance of BF16 data type inference by adding matMul bf16bf16bf16 primitives and optimizing kernel selection strategy.
- Improved performance of the model with unbalanced split allocation.

## Functionality
- Introduced prefix sharing feature.
- Add sample strategy for token search, support temperature, top k, and top P parameter.
- Introduce convert module to xfastertransformer python API.
- Introduced grouped-query attention support for Llama2.
- Auto-detect oneCCL environment and enter single-rank model if oneCCL does not exist.
- Auto-detect oneCCL environment in compilation. If not detected, oneCCL will be built from source.
- Add C++ exit function for multi-rank model.
- Remove mklml 3rd party dependency.
- Export normalization and position embedding C++ API, including alibi embedding and rotary embedding.
- Introduced `XFT_DEBUG_DIR` environment value to specify the debug file directory.

## BUG fix
- Fix runtime issue of oneCCL shared memory model.
- Fix path concat issue in convert tools.


# [Version 1.0.0](https://github.com/intel/xFasterTransformer/releases/tag/v1.0.0)
This is the 1st official release of xFasterTransformer.🎇🎇🎇

## Support models
- ChatGLM-6B
- ChatGLM2-6B
- Llama 1, both 7B, 33B, and 65B
- Llama 2, both 7B, 13B, and 70B
- Opt larger than 1.3B

## Features
- Support Python and C++ API to integrate xFasterTransformer into the user's own solutions. Example codes are provided to demonstrate the usage.
- Support hybrid data types such as BF16+FP16 and BF16+INT8 to accelerate the generation of the 1st token, in addition to supporting single data types like FP16, BF16, and INT8.
- Support multiple instances to accelerate model inference, both locally and through the network.
- Support Intel AMX instruction on 4th generation Intel Xeon Scalable processors.
- Support 4th generation Intel Xeon Scalable processors with HBM which has a higher memory bandwidth and shows a much better performance on LLM.
- Provide web demo scripts for users to show the performance of LLM models optimized by xFasterTransformer.
- Support multiple distribution methods, both PyPI and docker images.
1 change: 1 addition & 0 deletions VERSION
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
1.1.0
13 changes: 10 additions & 3 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -36,17 +36,24 @@ def build_cmake(self):
self.spawn(["make", "-C", build_dir, "-j"])


def get_xft_version() -> str:
xft_root = Path(__file__).parent
version = open(xft_root / "VERSION").read().strip()

return version


setup(
name="xfastertransformer",
version="1.0.0",
version=get_xft_version(),
keywords="LLM",
description="Boost large language model inference performance on CPU platform.",
long_description=long_description,
long_description_content_type='text/markdown',
long_description_content_type="text/markdown",
license="Apache 2.0",
url="https://github.com/intel/xFasterTransformer",
author="xFasterTransformer",
author_email='[email protected]',
author_email="[email protected]",
python_requires=">=3.8",
package_dir={"": "src"},
packages=find_packages(where="src", include=["xfastertransformer"]),
Expand Down

0 comments on commit 8fa55a5

Please sign in to comment.