
Add additional rounding modes #110

Closed

Conversation

jurevreca12

This pull request introduces additional rounding modes and provides a table that more accurately describes their behavior. Concretely, the following table has been added to docs/qonnx-custom-ops/quant_op.md:

| Number \ ROUNDING_MODE | ROUND=HALF_EVEN | CEIL | FLOOR | UP | DOWN | HALF_UP | HALF_DOWN |
|------------------------|-----------------|------|-------|----|------|---------|-----------|
| 5.5                    | 6               | 6    | 5     | 6  | 5    | 6       | 5         |
| 2.5                    | 2               | 3    | 2     | 3  | 2    | 3       | 2         |
| 1.6                    | 2               | 2    | 1     | 2  | 1    | 2       | 2         |
| 1.1                    | 1               | 2    | 1     | 2  | 1    | 1       | 1         |
| 1.0                    | 1               | 1    | 1     | 1  | 1    | 1       | 1         |
| -1.0                   | -1              | -1   | -1    | -1 | -1   | -1      | -1        |
| -1.1                   | -1              | -1   | -2    | -2 | -1   | -1      | -1        |
| -1.6                   | -2              | -1   | -2    | -2 | -1   | -2      | -2        |
| -2.5                   | -2              | -2   | -3    | -3 | -2   | -3      | -2        |
| -5.5                   | -6              | -5   | -6    | -6 | -5   | -6      | -5        |

The newly introduced rounding modes are: UP, DOWN, HALF_UP, and HALF_DOWN. These modes were inspired by the rounding modes in the Java math library (https://docs.oracle.com/javase/8/docs/api/java/math/RoundingMode.html) and by the implementation in the Chisel dsptools library (https://github.com/ucb-bar/dsptools/blob/master/src/main/scala/dsptools/numbers/chisel_types/FixedPointTypeClass.scala#L156).
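To make the semantics of the new modes concrete, here is a minimal NumPy sketch of how they can be expressed. The actual implementation lives in the resolve_rounding_mode function in src/qonnx/custom_op/general/quant.py; the function names below are illustrative, not the PR's API:

import numpy as np

# UP rounds away from zero; DOWN rounds toward zero (truncation);
# HALF_UP breaks ties away from zero; HALF_DOWN breaks ties toward zero.
def round_up(x):
    return np.sign(x) * np.ceil(np.abs(x))

def round_down(x):
    return np.sign(x) * np.floor(np.abs(x))

def round_half_up(x):
    return np.sign(x) * np.floor(np.abs(x) + 0.5)

def round_half_down(x):
    return np.sign(x) * np.ceil(np.abs(x) - 0.5)

x = np.array([5.5, 2.5, 1.6, 1.1, 1.0, -1.0, -1.1, -1.6, -2.5, -5.5])
print(round_half_up(x))  # [ 6.  3.  2.  1.  1. -1. -1. -2. -3. -6.] (matches the HALF_UP column)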

This pull request partially resolves the incompatibility between high-level Python implementations and circuit implementations. For instance, consider the following test function for QKeras (v0.9.0):

import numpy as np
import qkeras

def test_quantized_bits_rounding_mode():
    alpha1 = qkeras.quantized_bits(bits=3, integer=2, keep_negative=True, alpha=1)
    alpha111 = qkeras.quantized_bits(bits=3, integer=2, keep_negative=True, alpha=[1, 1, 1])
    alpha_po2 = qkeras.quantized_bits(bits=3, integer=2, keep_negative=True, alpha='auto_po2')
    try:
        assert np.array_equal(alpha1(np.array([2.5, 2.5, 3.5])), alpha111(np.array([2.5, 2.5, 3.5])))
        assert np.array_equal(alpha1(np.array([2.5, 2.5, 3.5])), alpha_po2(np.array([2.5, 2.5, 3.5])))
    finally:
        print(alpha1.scale)
        print(alpha111.scale)
        print(alpha_po2.scale)

The function above fails on the second assert, even though the scaling factors printed in the finally block are 1, [1, 1, 1] and [1, 1, 1]. The reason is that when using "auto_po2", the rounding mode is effectively "round half up". This can be seen at:
https://github.com/google/qkeras/blob/67e7c6b8cbd6befd594f142187ac4b73b35512ac/qkeras/quantizers.py#L570C45-L570C46

v = tf.floor(tf.abs(x) / scale + 0.5)
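A quick NumPy check (with scale = 1 for simplicity, and the sign factored back in, both assumed here purely for illustration) confirms that this expression rounds ties away from zero, unlike NumPy's default half-even rounding:

import numpy as np

x = np.array([2.5, 3.5, -2.5])
scale = 1.0
# floor(|x|/scale + 0.5), with the sign reapplied, is round-half-up.
v = np.sign(x) * np.floor(np.abs(x) / scale + 0.5)
print(v)            # [ 3.  4. -3.]  (HALF_UP)
print(np.round(x))  # [ 2.  4. -2.]  (HALF_EVEN, for comparison)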

This pull request does the following:

  • Adds the rounding modes to the spec.
  • Adds an implementation of the rounding modes to the resolve_rounding_mode function in src/qonnx/custom_op/general/quant.py (sketched above).
  • Adds a simple test of the rounding modes in tests/custom_op/test_rounding_mode.py.

The request does NOT do the following:

  • It does not fix the QKeras/Brevitas converters.

I refrained from updating the converters because, first, I don't know that part of the code base very well, and second, the tests appear to be written with assert_allclose, i.e. they check only approximate compatibility. Issues with rounding modes can be quite subtle, so they would be hard to catch with approximate checks.

I have had success making a bit-accurate conversion between QKeras and circuits in chisel4ml after introducing precise rounding modes. However, this only holds when all tensors have a known quantization and the scaling factors are powers of two. Looking at the qonnx code base, I have a hard time seeing how the input quantization is specified. In chisel4ml, for instance, this is done directly, as shown below:

import numpy as np
import tensorflow as tf
import qkeras

x = x_in = tf.keras.layers.Input(shape=3)
x = qkeras.QActivation(
    qkeras.quantized_bits(bits=4, integer=3, keep_negative=True)
)(x)
x = qkeras.QDense(
    4,
    kernel_quantizer=qkeras.quantized_bits(
        bits=4, integer=3, keep_negative=True, alpha=np.array([0.5, 0.25, 1, 0.25])
    ),
)(x)
x = qkeras.QActivation(qkeras.quantized_relu(bits=3, integer=3))(x)
x = qkeras.QDense(
    1,
    kernel_quantizer=qkeras.quantized_bits(
        bits=4, integer=3, keep_negative=True, alpha=np.array([0.125])
    ),
)(x)
x = qkeras.QActivation(qkeras.quantized_relu(bits=3, integer=3))(x)
model = tf.keras.Model(inputs=[x_in], outputs=[x])

This means that the inputs must be quantized to a signed 4-bit integer. I realize that qonnx targets a broader class of neural network descriptions; however, I believe it would be useful to make a distinction for this kind of network (https://arxiv.org/abs/2011.10680 calls them dyadic neural networks), as:

  1. they are highly efficient to implement in hardware, and
  2. I believe they can be "simulated" with bit-level accuracy using floating-point operations.

I have only shown bit-level accuracy empirically; however, considering the way floating point is specified (with a power-of-two exponent), the equivalence should hold as long as the mantissa/fraction field is not too big. And if it does get too big, you can move to 64-bit floating-point numbers, for example.
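A small experiment (not part of this PR, just illustrating the claim) shows why power-of-two scales are benign: multiplying or dividing by 2^k only shifts a float's exponent, so the round trip is bit-exact as long as the integer values fit in the 52-bit mantissa of a float64:

import numpy as np

# Scaling by a power of two only changes the float exponent, so
# quantized integer values survive the round trip exactly.
rng = np.random.default_rng(0)
ints = rng.integers(-2**20, 2**20, size=1000).astype(np.float64)
scale = 2.0 ** -7  # power-of-two scaling factor
assert np.array_equal((ints * scale) / scale, ints)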

jurevreca12 marked this pull request as draft on March 25, 2024.

jurevreca12 (Author):

I am closing this pull request, as it has several features jumbled into it. I will open separate pull requests for each piece of functionality.
