ggml : fix quantize_row_q8_0() ARM_NEON rounding
ggerganov committed Apr 14, 2023
1 parent eafd47f commit 7c6c079
Showing 1 changed file with 1 addition and 2 deletions.
3 changes: 1 addition & 2 deletions ggml.c
@@ -1093,8 +1093,7 @@ static void quantize_row_q8_0(const float * restrict x, void * restrict vy, int

  for (int l = 0; l < 8; l++) {
      const float32x4_t v = vmulq_n_f32(srcv[l], id);
-     //TODO: rounding
-     const int32x4_t vi = vcvtq_s32_f32(v);
+     const int32x4_t vi = vcvtnq_s32_f32(v);

      y[i].qs[4*l + 0] = vgetq_lane_s32(vi, 0);
      y[i].qs[4*l + 1] = vgetq_lane_s32(vi, 1);
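
Note on the change: the commit swaps vcvtq_s32_f32, which converts float to int32 by truncating toward zero, for vcvtnq_s32_f32, which rounds to the nearest integer with ties going to even. The sketch below is a portable scalar illustration of that difference, not code from ggml.c. The helper names cvt_truncate, cvt_nearest, and check_rounding are hypothetical, and the sample values assume scaled inputs of modest magnitude, as one would expect for 8-bit quantization.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-in for vcvtq_s32_f32: a C float-to-int cast truncates toward zero. */
static int32_t cvt_truncate(float v) {
    return (int32_t) v;
}

/* Scalar stand-in for vcvtnq_s32_f32: round to nearest, ties to even.
 * Assumption: v is finite and well inside the int32_t range. */
static int32_t cvt_nearest(float v) {
    const int32_t t    = (int32_t) v;          /* integer part, truncated toward zero */
    const float   r    = v - (float) t;        /* remainder, |r| < 1, same sign as v  */
    const float   a    = r < 0.0f ? -r : r;    /* magnitude of the remainder          */
    const int32_t step = v < 0.0f ? -1 : 1;    /* direction away from zero            */

    if (a > 0.5f) return t + step;             /* closer to the next integer          */
    if (a < 0.5f) return t;                    /* closer to the truncated integer     */
    return (t % 2 == 0) ? t : t + step;        /* exact tie: pick the even neighbor   */
}

/* Hypothetical self-check showing where the two conversions disagree. */
static void check_rounding(void) {
    /* Truncation loses up to one full unit, always toward zero. */
    assert(cvt_truncate( 126.7f) ==  126);
    assert(cvt_truncate(-126.7f) == -126);
    assert(cvt_truncate(  -0.9f) ==    0);

    /* Round-to-nearest keeps the error within half a unit. */
    assert(cvt_nearest( 126.7f) ==  127);
    assert(cvt_nearest(-126.7f) == -127);
    assert(cvt_nearest(  -0.9f) ==   -1);

    /* Ties go to the even integer. */
    assert(cvt_nearest(0.5f) == 0);
    assert(cvt_nearest(1.5f) == 2);
    assert(cvt_nearest(2.5f) == 2);
}
```

Calling check_rounding() from a test program exercises both helpers. With truncation the quantization error can approach one full step and is biased toward zero. With round-to-nearest it is bounded by half a step, which is presumably what the removed TODO was asking for. On a hosted C implementation, lrintf from <math.h> under the default rounding mode gives the same results as cvt_nearest.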