Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Display driver architecture performance bottleneck #141

Open
fornellas opened this issue Aug 29, 2020 · 1 comment
Open

Display driver architecture performance bottleneck #141

fornellas opened this issue Aug 29, 2020 · 1 comment

Comments

@fornellas
Copy link
Contributor

fornellas commented Aug 29, 2020

I have been working on a driver for ST7789 (#132). While doing so, I learned that the current model for HAL has too many layers that do "nothing" and make things runs at half the max speed.

The problem

Currently we have to define a function with:

  switch(msg) {
    case UCG_MSG_DEV_POWER_UP:
      ...

and it is set at com_cb at struct _ucg_t. This is used by other layers, like ucg_com_SendString. For example, this code:

  for(ucg_int_t i = 0; i < ucg->arg.len; i++ ) {
    ucg_com_SendString(ucg, 2, buff);
    ucg->arg.pixel.pos.x+=dx;
    ucg->arg.pixel.pos.y+=dy;
  }

translates to this at the wire:

image

SCK is running at 20MHz, but we still have half the time spent outside of this loop from UCG_COM_MSG_SEND_STR. If we remove ucg_com_SendString(ucg, 2, buff) and use this:

  for(ucg_int_t i = 0; i < ucg->arg.len; i++ ) {
    for(int j=0;j<2;j++)
      spi_send(SPI1, buff[j]); 
    ucg->arg.pixel.pos.x+=dx;
    ucg->arg.pixel.pos.y+=dy;
  }

Then things run twice as fast, with negligible delay between sending bytes:

image

This is the difference between refreshing the screen at ~11Hz to ~20Hz! I did some math, this shaves ~50 clock cycles from my STM32 running at 84MHz.

Fix

We add another member to struct _ucg_t, which is a struct, with pointers to functions for each of the cases from com_cb, like UCG_COM_MSG_POWER_UP, UCG_COM_MSG_POWER_DOWN etc:

struct _ucg_com_cb_funcs {
  void (*power_up)(ucg_t *, ucg_com_info_t *);
  void (*power_down)(ucg_t *);
  void (*delay)(ucg_t *, uint16_t);
  void (*change_reset_line)(ucg_t *, uint8_t);
  void (*change_cd_line)(ucg_t *, uint8_t);
  void (*change_cs_line)(ucg_t *, uint8_t);
  void (*send_byte)(ucg_t *, uint8_t);
  void (*repeat_1_byte)(ucg_t *, uint16_t, uint8_t);
  void (*repeat_2_bytes)(ucg_t *, uint16_t, uint8_t[2]);
  void (*repeat_3_bytes)(ucg_t *, uint16_t, uint8_t[3]);
  void (*send_str)(ucg_t *, uint16_t, uint8_t *);
  void (*send_cd_data_sequence)(ucg_t *, uint16_t, uint8_t[]);
};

struct _ucg_t {
  struct _ucg_com_cb_funcs com_cb_funcs;
  ...

ucg_Init can be updated to support this new struct (or we can have an alternative constructor).

This allows avoiding ucg_com_SendString altogether allowing things to be fast:

  for(ucg_int_t i = 0; i < ucg->arg.len; i++ ) {
    ucg->com_cb_funcs.send_byte(ucg, buff[0]);
    ucg->com_cb_funcs.send_byte(ucg, buff[1]);
    ucg->arg.pixel.pos.x+=dx;
    ucg->arg.pixel.pos.y+=dy;
  }

Benchmarks:

Delay between sending each pack of 2 bytes for a few cases:

  • With send string:

    • Before: ucg_com_SendString(ucg, 2, buff);: 620ns
    • After: ucg->com_cb_funcs.send_str(ucg, 2, buff);: 240ns 60% faster
  • With send byte:

    • Before: ucg_com_SendByte(ucg, buff[0]); ucg_com_SendByte(ucg, buff[1]);: 590ns
    • After: ucg->com_cb_funcs.send_byte(ucg, buff[0]); ucg->com_cb_funcs.send_byte(ucg, buff[1]);: 30ns 94% faster

Compatibility

ucg_Init can have its signature updated to accept struct _ucg_com_cb_funcs as well or we can add another constructor. We can also set defaults for these values at the constructor:

  ucg->com_cb_funcs.power_up = default_power_up;

and forward calls to the existing com_cb function:

static void default_power_up(ucg_t *ucg, ucg_com_info_t *ucg_com_info) {
  ucg->com_cb(ucg, UCG_COM_MSG_POWER_UP, 0, ucg_com_info);
}

This allows all display drivers to continue to work, but some drivers can start onboarding with this new system.

PR

I can write a PR for this implementation, if there's willingness to merge it.

@fornellas
Copy link
Contributor Author

For those interested, I've given up on fixing ucglib due to lack of response on various PRs / issues, so I've solved the problems by writing a shiny new library
https://fornellas.github.io/eglib/
which solves many of the architectural problems with ucglib.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant