Releases: x448/float16
v0.8.4 (Jan 17, 2020)
v0.8.3 (Jan 12, 2020)
No programming changes in this release except to update module paths.
Changes include:
- updated unit test and benchmark code to use new module path
- moved FromNaNps() up higher in the code to be near similar functions
- corrected mistakes and module path in README.md
- updated go.mod after transferring project back from organization (same owner)
I deleted v0.8.2, released last night, because the module path broke after the project was transferred back. Good reminder to rest and get well if you get too sick to go out, instead of messing up simple tasks.
Thanks @fxamacker for catching the mistakes and letting me know.
v0.8.1 (Jan 9, 2020)
Changes include:
- Add PrecisionUnknown as a return value of PrecisionFromfloat32. Number of possible return values was 4 and is now 5.
// Precision indicates whether the conversion to Float16 is
// exact, subnormal without dropped bits, inexact, underflow, or overflow.
type Precision int
const (
	// PrecisionExact is for non-subnormals that don't drop bits during conversion.
	// All of these can round-trip float32->float16->float32.
	PrecisionExact Precision = iota
	// PrecisionUnknown is for subnormals that don't drop bits during conversion but
	// not all of these can round-trip so precision is unknown without more effort.
	// Only 2046 of these can round-trip and the rest cannot round-trip.
	PrecisionUnknown
	// PrecisionInexact is for dropped significand bits and cannot round-trip.
	// Some of these are subnormals.
	// Cannot round-trip float32->float16->float32.
	PrecisionInexact
	// PrecisionUnderflow is for Underflows.
	// Cannot round-trip float32->float16->float32.
	PrecisionUnderflow
	// PrecisionOverflow is for Overflows.
	// Cannot round-trip float32->float16->float32.
	PrecisionOverflow
)
// PrecisionFromfloat32 returns Precision without performing the conversion.
// Conversions from both Infinity and NaN values will always report PrecisionExact
// even if NaN payload is lost or NaN quiet-bit is changed. This function is kept simple
// to allow inlining and run < 0.5 ns/op, to serve as a fast filter.
func PrecisionFromfloat32(f32 float32) Precision
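The categories above can be illustrated with a stdlib-only sketch that looks only at the float32 bit pattern. This is not the library's source: the type `precision`, its constants, and the function `classify` are hypothetical names, and the thresholds are taken from the binary16 format (normal exponents -14..15, subnormals down to 2^-24, 10 stored significand bits).

```go
package main

import (
	"fmt"
	"math"
)

// precision mirrors the five documented categories (hypothetical names).
type precision int

const (
	exact precision = iota
	unknown
	inexact
	underflow
	overflow
)

// classify inspects the float32 bits only; it performs no conversion.
func classify(f float32) precision {
	u := math.Float32bits(f)
	if u&0x7fffffff == 0 {
		return exact // +0 and -0
	}
	exp := int(u>>23&0xff) - 127 // unbiased float32 exponent
	frac := u & 0x7fffff         // 23 stored significand bits
	switch {
	case exp == 128:
		return exact // Inf and NaN are reported as exact, as documented
	case exp < -24:
		return underflow // below the smallest float16 subnormal
	case exp > 15:
		return overflow // above the largest float16 exponent
	case frac&0x1fff != 0:
		return inexact // the low 13 significand bits would be dropped
	case exp < -14:
		return unknown // float16 subnormal range: needs a closer look
	}
	return exact
}

func main() {
	for _, f := range []float32{1.0, 0.1, 65536, 1e-10} {
		fmt.Printf("%g -> category %d\n", f, classify(f))
	}
}
```

A caller would typically use such a filter to decide cheaply whether a float32 can be stored as float16, and fall back to an actual round-trip check only for the "unknown" category.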
v0.8.0 (Jan 5, 2020)
v0.7.1 (Jan 2, 2020)
Changes include:
- README.md was updated to fix compatibility with non-GitHub markdown.
There are no coding changes in this release.
v0.7.0 (Jan 1, 2020)
Changes include:
- Add (f Float16) Bits() uint16.
- Update docs and unit test to include the new method.
Bits returns the IEEE 754 binary16 representation of f, with the sign bit of f and the result in the same bit position. Bits(Frombits(x)) == x.
Calling Bits should inline as a simple type cast so it doesn't add bloat while making the API nicer.
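As a rough illustration of why such a method can be a plain type conversion, here is a minimal sketch that assumes the 16-bit float is stored as its raw bits. The type `half` and the functions `frombits` and `bits` are hypothetical stand-ins, not the library's definitions.

```go
package main

import "fmt"

// half is a hypothetical 16-bit float type stored as raw binary16 bits.
type half uint16

// frombits reinterprets a uint16 as a half without changing any bits.
func frombits(u uint16) half { return half(u) }

// bits returns the raw binary16 representation; it is only a conversion.
func (h half) bits() uint16 { return uint16(h) }

func main() {
	h := frombits(0xc000) // 0xc000 encodes -2.0 in binary16
	fmt.Printf("bits=0x%04x signbit=%t\n", h.bits(), h.bits()&0x8000 != 0)
}
```

Because both directions are conversions between a named type and its underlying type, the round trip returns the original value for every input.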
v0.6.0 (Dec 31, 2019)
Initial public release.
The core API is done. Breaking changes to API are unlikely. Additional features are planned.
All possible 4+ billion conversions between float16 and float32 are verified to be correct.
Conversions Between Float16 and Float32
- float16 to float32 conversions use lossless conversion.
- float32 to float16 conversions use IEEE 754-2008 "Round-to-Nearest RoundTiesToEven".
- all conversions use zero allocations and take 2.65 ns/op (in pure Go) on a desktop amd64.
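The two conversion directions can be sketched with the standard library alone. This is a readable illustration of lossless widening and round-to-nearest, ties-to-even narrowing, not the library's implementation and not tuned for speed. The function names are hypothetical, and NaN payloads are not preserved here.

```go
package main

import (
	"fmt"
	"math"
)

// halfBitsToFloat32 widens binary16 bits to float32; every value is exact.
func halfBitsToFloat32(h uint16) float32 {
	exp := int(h>>10) & 0x1f
	frac := float64(h & 0x3ff)
	var v float64
	switch exp {
	case 0:
		v = math.Ldexp(frac, -24) // zero or subnormal: frac * 2^-24
	case 31:
		if frac != 0 {
			v = math.NaN()
		} else {
			v = math.Inf(1)
		}
	default:
		v = math.Ldexp(1024+frac, exp-25) // (1 + frac/1024) * 2^(exp-15)
	}
	if h&0x8000 != 0 {
		v = -v
	}
	return float32(v)
}

// float32ToHalfBits narrows with round-to-nearest, ties to even.
func float32ToHalfBits(f float32) uint16 {
	sign := uint16(math.Float32bits(f)>>16) & 0x8000
	x := math.Abs(float64(f))
	switch {
	case math.IsNaN(x):
		return sign | 0x7e00 // quiet NaN; payload dropped in this sketch
	case x == 0:
		return sign
	case math.IsInf(x, 0):
		return sign | 0x7c00
	}
	_, k := math.Frexp(x) // x = m * 2^k with m in [0.5, 1)
	e := k - 1            // unbiased exponent of x
	if e > 15 {
		return sign | 0x7c00 // too large: infinity
	}
	if e < -14 {
		e = -14 // subnormal range uses a fixed spacing of 2^-24
	}
	// Count of spacing units 2^(e-10), rounded half to even.
	n := uint32(math.RoundToEven(math.Ldexp(x, 10-e)))
	bits := uint32(e+14)<<10 + n // a carry in n bumps the exponent field
	if bits >= 0x7c00 {
		return sign | 0x7c00 // rounded up past the largest finite value
	}
	return sign | uint16(bits)
}

// countRoundTripMismatches checks every non-NaN binary16 bit pattern.
func countRoundTripMismatches() int {
	bad := 0
	for i := 0; i <= 0xffff; i++ {
		h := uint16(i)
		if h&0x7c00 == 0x7c00 && h&0x3ff != 0 {
			continue // skip NaNs
		}
		if float32ToHalfBits(halfBitsToFloat32(h)) != h {
			bad++
		}
	}
	return bad
}

func main() {
	fmt.Printf("0x%04x\n", float32ToHalfBits(1.0))
	fmt.Println("round-trip mismatches:", countRoundTripMismatches())
}
```

The sketch goes through float64 for clarity. A production conversion would more likely work directly on the integer bit patterns to avoid the floating-point calls.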
Testing and Coverage
- 100% of unit tests pass:
  - short mode (go test -short) tests around 65763 conversions in 0.005s.
  - normal mode (go test) tests all possible 4+ billion conversions in about 45s.
- 100% code coverage with both short mode and normal mode.
- tested on amd64 with Go 1.11, 1.12 and 1.13.
This library should work on all little-endian platforms supported by Go.
Travis is configured to use go test -short.