Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove global state #68

Open
wants to merge 35 commits into
base: master
Choose a base branch
from
Open

Remove global state #68

wants to merge 35 commits into from

Conversation

jcmoyer
Copy link

@jcmoyer jcmoyer commented May 2, 2024

Hello, first of all thank you for releasing this project. The SC-55 is legendary and it is great to see such an important piece of history preserved.

The main goal of this PR is to remove global state from the emulator so that multiple instances can be created in a single process. This PR splits the emulator into a frontend (the SDL application) and a backend (the SC-55 emulator). A frontend can create and manage as many backends as it sees fit. As a side effect this also turns the backend into a library that can be linked into new frontends.

Global state is blocking a couple of issues:

In an effort to make this PR as mergeable as possible, I tried to make a minimum number of changes to existing code. New code attempts to follow existing patterns.

  • Most variable names were preserved
    • One exception: mcu_timer had some variable shadowing (global mcu_timer_t timer + local frt_t* timer) - this was resolved by renaming the local frt_t* timer to ftimer
  • I tried to stick to using C features since the project seems to be doing that already. Two exceptions: references for parameters (done to avoid changing . to -> everywhere, but these can be changed to pointers on request), and two math function templates (these can also be changed, but I'd like to avoid it).
  • All globals that were previously initialized at file scope are now initialized in their respective <module>_Init function. For example, sw_pos is initialized to 3 in MCU_Init. They are not initialized inline because doing so would cause the compiler to generate a default constructor, which makes the type non-trivial so it cannot be memset.

That said, the changes weren't all simple and some problems took a bit more effort to solve. Other contributors might like to be aware of these points:

  • All global variables were moved into their respective component structures. For example, lcd_width is now a field in lcd_t.
  • Some structures now contain RAM/ROM inline and pointers to other components. They cannot be memset to reset their state.
  • TIMER_Reset wasn't used anywhere. I renamed it TIMER_Init and use it to initialize a timer and link it to its MCU.
  • LCD size is now only set in LCD_Init instead of two places.
  • LCD m_back_path was removed because moving a std::string into lcd_t would make it non-trivial. LCD_SetBackPath was replaced with LCD_LoadBack and should be called after LCD_Init.
  • Global lcd_init was removed. If LCD_Init succeeded you have an LCD.
  • LCD event handling was extracted from LCD_Update into LCD_HandleEvent. Since multiple LCDs are supported, each LCD needs to handle only events targeted at it.
  • The current frontend SDL application was extracted from the MCU module. main is now located in main_frontend.cpp.
  • The MCU now posts sample data to the frontend as it is produced via callback. This approach was chosen because it allows the frontend to decide how to handle sample data in a zero-copy manner.
  • A new emu_t type was introduced that is conceptually a single, complete backend instance. It contains all of the SC-55 emulator state and links the individual modules together. Romset detection and loading has been moved here since romsets need to be loaded into multiple emulator components.
  • The ringbuffer in the current MCU module was extracted into a new type ringbuffer_t. The frontend creates a ringbuffer for each emu instance and fills it as the frontend receives sample data. All instance buffers are then read and mixed in the SDL audio callback. Right now ringbuffer_t is specialized for signed 16-bit sample data but I would like for it to eventually become a template type to handle different types of sample data (useful for Output 32-bit float samples to preserve dynamic range and prevent clipping #59 for instance).

As an example of what this PR enables I have modified the frontend SDL application so that you can pass -instances:<n> to open n instances of the emulator. Non-sysex MIDI events are routed to their midi_channel_id % n (so if n=2, even channels go to the first emulator, and odd channels go to the second one). Sysex events are broadcast to all emulators. This is a quick and easy way to solve polyphony problems at the expense of twice the computing power.

Performance impact

Performance seems to be unchanged or marginally better. I benchmarked running the emulator as fast as possible without locking or waiting for the ringbuffer to drain for 100_000_000 iterations. These are the time measurements for 5 consecutive runs:

master multi-instance -instances:1 multi-instance -instances:2
1 30411 ms 29287 ms 29872 ms
2 30009 ms 29393 ms 29709 ms
3 30972 ms 28519 ms 28786 ms
4 30259 ms 29483 ms 29867 ms
5 30463 ms 29288 ms 29466 ms
average 30423 ms 29194 ms 29540 ms
%change -4.04% -2.90%

@nikitalita
Copy link
Contributor

This is really cool; I've tested this with -instances:2 on midi files that had polyphony >28 such that parts would cut out or go flat on a real sc-55 mk2, and this plays those beautifully.

@pachuco
Copy link
Contributor

pachuco commented May 2, 2024

Yes please, make it easier to use as lib.

@johnnovak
Copy link

johnnovak commented May 4, 2024

Very nice @jcmoyer, this makes it a lot easier to turn this into a plugin that can be instantiated multiple times. Then clear separation and boundaries between the processing engine and the frontend is just good design. You should be able to run the processing engine completely "headless", without any UI, just from the command line.

// Should be called after loading roms
void EMU_Reset(emu_t& emu);

void EMU_SetSampleCallback(emu_t& emu, mcu_sample_callback callback, void* userdata);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EMU_SetRenderCallback and mcu_render_callback would be better.

}
}

void MCU_Init(void)
void MCU_DefaultSampleCallback(void* userdata, int* sample)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest MCU_DefaultRenderCallback

}
}

void FE_ReceiveSample(void* userdata, int *sample)
Copy link

@johnnovak johnnovak May 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing something like an AudioFrame struct we use in DOSBox Staging would be nicer than these raw unsafe pointers:

https://github.com/dosbox-staging/dosbox-staging/blob/main/include/audio_frame.h

Another general comment: using std::array or std::vector is zero-cost compared to C arrays but safer and there's potential to handle them in a nicer way (e.g., using iterators instead of error prone indexing and oldschool for loops).

These are all zero-cost and make for readable and more maintainable code. On MSVC on Windows you even get bounds checking for free with the standard STL containers in debug builds which is great.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this function should be changed (as well as MCU_PostSample) but I left it as I found it. The int* is a performance pessimization and it's not an intuitive API.

using std::array or std::vector is zero-cost compared to C arrays but safer

I totally agree with you and I would have written it that way myself, but I didn't want to introduce a bunch of C++ features since the current code is not using them. As far as I can tell the only STL type used anywhere is std::string, and std::vector made it into the rtmidi module only because it's required for the rtmidi callback.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

performance pessimization and it's not an intuitive API

Haha, okay, I can see I'm among friends—we're 100% on the same page 😏 I get it, you didn't want to change too much.

else if (sample[1] < INT16_MIN)
sample[1] = INT16_MIN;
RB_Write(fe.sample_buffer, sample[0], sample[1]);
}
Copy link

@johnnovak johnnovak May 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, you're overthinking this a bit and micro-optimising early... You could just simply pass AudioFrame structs around (or similar). Doing something like this is perfectly fine, so just using simple values:

AudioFrame RenderNextSample() { ... }

The compiler is more than capable of getting rid unnecessary copies, plus you'll get guaranteed copy elision and return value optimisation unless you're using some archaic compiler.

https://en.wikipedia.org/wiki/Copy_elision

So no need to complicate life with pointers and such (which can make things run slower in fact, because unsafe pointers can point to well anything, which makes the compiler's job a lot harder than just expressing intent by returning simple structs/values).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, you could simply try it and re-run your nice test suite that you used for the measurements. My guess is you won't see any performance degradation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvement by introducing audio_frame_t 👍🏻

src/ringbuffer.h Outdated
struct ringbuffer_t {
int16_t* samples;
size_t frame_count;
size_t sample_count;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion is to always use the terms frames, always deal with frames and frame counts, unless you absolutely need to do things at sample level. Simplifies the code a bit and makes terms less ambiguous.

In my own code, the only exception to this is using "sample rate" because that's such an established term (instead of the technically correct "frame rate").

So std::vector<AudioFrame> would make this a lot simpler, then you don't need separate count vars which can be error-prone to maintain.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good suggestion, and it fixes some parts of the API I was kind of unhappy with. I will try to get around to simplifying this tomorrow.

Copy link

@johnnovak johnnovak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice job @jcmoyer , left a few comments. You might take them onboard, or completely ignore them 😄 I'll probably use your code for my own stuff and do the suggested refactorings anyway (and I will credit you, of course).

This PR is an important step to take @nukeykt 's excellent work to the next level and introduce proper separation of concerns and proper multi-instance support by getting rid of globals.

mcu_timer_t* timer;
lcd_t* lcd;
pcm_t* pcm;
};
Copy link

@johnnovak johnnovak May 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another general comment: could use std::unique_ptr for these, and in general it would be good to get rid of pointers and use references and standard containers instead. All that would be zero-cost, no performance impact, or even could make things faster by expressing intent to the compiler instead of attempting to outsmart it by direct pointer access (doesn't always end well).

I saw you started using emu_t& emu and similar stuff, so probably you have the same views as myself, just didn't want to blast the whole codebase apart in a single PR 😎

static const int SRAM_SIZE = 0x8000;
static const int NVRAM_SIZE = 0x8000; // JV880 only
static const int CARDRAM_SIZE = 0x8000; // JV880 only
static const int ROMSM_SIZE = 0x1000;
Copy link

@johnnovak johnnovak May 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: could be static constexpr uint32_t. Using uint32_t (or uint16_t) for hex addresses probably is a bit more correct.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but I just moved these constants from elsewhere. I didn't change anything that didn't need to be.

void* callback_userdata;
mcu_sample_callback sample_callback;

SDL_mutex *work_thread_lock;
Copy link

@johnnovak johnnovak May 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you're at it, you could take it further and initialise every var. So for example:

    uint16_t ad_val[4] = {};
    uint8_t ad_nibble = {};
    uint8_t sw_pos = {};
    uint8_t io_sd = {};

    submcu_t* sm = nullptr;
    pcm_t* pcm = nullptr;
    mcu_timer_t* timer = nullptr;
    lcd_t* LCD = nullptr;

Not initialising can lead to obscure bugs, and I wasted too much time on that, so these days I always initialise everything, it's cheap to do so and eliminated weird bugs that manifest only sometime, depending on the concrete memory layout (e.g. obscure bugs that crash your program every 10th run or something because some var you forgot to init is sometimes zero, and sometimes some random number).

Same comment for every var everywhere—always initialising is just good practice to save some headache later.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but since memset has been used throughout the codebase to clear state I don't want to do anything that would make these types non-trivial. I thought it would be best to submit a PR as more-or-less C-style C++ in case @nukeykt doesn't approve of C++ features, and if he does then these things can be pretty easily cleaned up later.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but since memset has been used throughout the codebase to clear state

Right, I actually missed that 😄 Not used to seeing memset much these days 😄

void MCU_WorkThread_Lock(void);
void MCU_WorkThread_Unlock(void);
void MCU_WorkThread_Lock(mcu_t& mcu);
void MCU_WorkThread_Unlock(mcu_t& mcu);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how you feel about const-correctness, but a lot of args and vars could be made const.


int romset = ROM_SET_MK2;

frontend_t frontend;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading this correctly, this would end up allocating a giant chunk of memory in stack, which could cause problems. I did this exact refactor in my branch (https://github.com/giulioz/Nuked-SC55/tree/audiounit_jv880) and I started to have some stack overflows in some conditions by using this method. I think using a malloc here would be a bit safer.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This structure is only 1688 bytes and you'll get 1-2mb of stack space by default on windows depending on the compiler, usually more on linux. At worst it's 0.16% of the total stack reservation. I did run into some stack overflow problems with the larger structures but those are malloc'd now and this type mostly contains pointers to those ones.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misunderstood how the structures are created! You are actually creating the mcu and pcm data structures with a malloc, sorry :)

struct audio_frame_t {
int16_t left;
int16_t right;
};

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve 😎

@Axis4s
Copy link

Axis4s commented May 13, 2024

Hello, first of all thank you for releasing this project. The SC-55 is legendary and it is great to see such an important piece of history preserved.

The main goal of this PR is to remove global state from the emulator so that multiple instances can be created in a single process. This PR splits the emulator into a frontend (the SDL application) and a backend (the SC-55 emulator). A frontend can create and manage as many backends as it sees fit. As a side effect this also turns the backend into a library that can be linked into new frontends.

Global state is blocking a couple of issues:

* [Outputting MIDIs to WAV/MP3 possible? #11](https://github.com/nukeykt/Nuked-SC55/issues/11) - Polyphony limits on prototype midi-wav renderer

* [proposal to make standalone library and add multi-instance capability #12](https://github.com/nukeykt/Nuked-SC55/issues/12) - Proposal for a frontend/backend split with linkable backend

* [combine 2 emus to increase polyphony #29](https://github.com/nukeykt/Nuked-SC55/issues/29) - Polyphony limits; closed by this PR

In an effort to make this PR as mergeable as possible, I tried to make a minimum number of changes to existing code. New code attempts to follow existing patterns.

* Most variable names were preserved
  
  * One exception: `mcu_timer` had some variable shadowing (global `mcu_timer_t timer` + local `frt_t* timer`) - this was resolved by renaming the local `frt_t* timer` to `ftimer`

* I tried to stick to using C features since the project seems to be doing that already. Two exceptions: references for parameters (done to avoid changing . to -> everywhere, but these can be changed to pointers on request), and two math function templates (these can also be changed, but I'd like to avoid it).

* All globals that were previously initialized at file scope are now initialized in their respective `<module>_Init` function. For example, `sw_pos` is initialized to 3 in `MCU_Init`. They are not initialized inline because doing so would cause the compiler to generate a default constructor, which makes the type non-trivial so it cannot be `memset`.

That said, the changes weren't all simple and some problems took a bit more effort to solve. Other contributors might like to be aware of these points:

* All global variables were moved into their respective component structures. For example, `lcd_width` is now a field in `lcd_t`.

* Some structures now contain RAM/ROM _inline_ and pointers to other components. They cannot be `memset` to reset their state.

* `TIMER_Reset` wasn't used anywhere. I renamed it `TIMER_Init` and use it to initialize a timer and link it to its MCU.

* LCD size is now only set in `LCD_Init` instead of two places.

* LCD `m_back_path` was removed because moving a `std::string` into `lcd_t` would make it non-trivial. `LCD_SetBackPath` was replaced with `LCD_LoadBack` and should be called after `LCD_Init`.

* Global `lcd_init` was removed. If `LCD_Init` succeeded you have an LCD.

* LCD event handling was extracted from `LCD_Update` into `LCD_HandleEvent`. Since multiple LCDs are supported, each LCD needs to handle only events targeted at it.

* The current frontend SDL application was extracted from the MCU module. `main` is now located in `main_frontend.cpp`.

* The MCU now posts sample data to the frontend as it is produced via callback. This approach was chosen because it allows the frontend to decide how to handle sample data in a zero-copy manner.

* A new `emu_t` type was introduced that is conceptually a single, complete backend instance. It contains all of the SC-55 emulator state and links the individual modules together. Romset detection and loading has been moved here since romsets need to be loaded into multiple emulator components.

* The ringbuffer in the current MCU module was extracted into a new type `ringbuffer_t`. The frontend creates a ringbuffer for each emu instance and fills it as the frontend receives sample data. All instance buffers are then read and mixed in the SDL audio callback. Right now `ringbuffer_t` is specialized for signed 16-bit sample data but I would like for it to eventually become a template type to handle different types of sample data (useful for [Output 32-bit float samples to preserve dynamic range and prevent clipping #59](https://github.com/nukeykt/Nuked-SC55/pull/59) for instance).

As an example of what this PR enables I have modified the frontend SDL application so that you can pass -instances:<n> to open n instances of the emulator. Non-sysex MIDI events are routed to their midi_channel_id % n (so if n=2, even channels go to the first emulator, and odd channels go to the second one). Sysex events are broadcast to all emulators. This is a quick and easy way to solve polyphony problems at the expense of twice the computing power.

Performance impact

Performance seems to be unchanged or marginally better. I benchmarked running the emulator as fast as possible without locking or waiting for the ringbuffer to drain for 100_000_000 iterations. These are the time measurements for 5 consecutive runs:
master multi-instance -instances:1 multi-instance -instances:2
1 30411 ms 29287 ms 29872 ms
2 30009 ms 29393 ms 29709 ms
3 30972 ms 28519 ms 28786 ms
4 30259 ms 29483 ms 29867 ms
5 30463 ms 29288 ms 29466 ms
average 30423 ms 29194 ms 29540 ms
%change -4.04% -2.90%

As a side effect this also turns the backend into a library that can be linked into new frontends.

Does it mean it can be more easily integrated as a VST form?

@johnnovak
Copy link

Does it mean it can be more easily integrated as a VST form?

Yes.

@Falcosoft
Copy link
Contributor

Hi,
I have tried with the latest updates using VS 2019 and I have noticed 2 problems:

  1. It does not work with SC-55 MK1 roms (and supposedly neither with any rom sets where the 2nd rom is not 512k). The problem seems to be EMU_ReadStreamUpTo() and the error is "FATAL ERROR: Failed to read the mcu ROM2".
    It works this way:
    std::streamsize rom2_read = EMU_ReadStreamUpTo(s_rf[1], m_mcu->rom2, m_mcu->mcu_mk1 ? ROM2_SIZE / 2 : ROM2_SIZE);
  2. If enable_lcd = false in EMU_Options then LCD_Update() throws a null pointer exception (not with the wave renderer).
    It works with this check:
    if (!lcd.mcu) return;

@jcmoyer
Copy link
Author

jcmoyer commented May 14, 2024

Hi @Falcosoft I think you are building the modernize branch which isn't related to this PR, but I'll take a look.

EDIT:

1 is a bug, thanks for reporting it.

As for 2, enable_lcd isn't complete yet. Right now the idea is that when a frontend sets enable_lcd it either commits to calling LCD_* functions (true), or never calling LCD_* functions (false). Eventually, I would like to have a version of the backend that doesn't depend on SDL at all, and it's not clear yet how enable_lcd will fit in. That said I think it's desirable to be able to run the main frontend headless today, so I implemented this feature keeping the above design in mind. You can simply pass --no-lcd on the command line.

@Falcosoft
Copy link
Contributor

Hi @Falcosoft I think you are building the modernize branch which isn't related to this PR, but I'll take a look.

EDIT:

1 is a bug, thanks for reporting it.

As for 2, enable_lcd isn't complete yet. Right now the idea is that when a frontend sets enable_lcd it either commits to calling LCD_* functions (true), or never calling LCD_* functions (false). Eventually, I would like to have a version of the backend that doesn't depend on SDL at all, and it's not clear yet how enable_lcd will fit in. That said I think it's desirable to be able to run the main frontend headless today, so I implemented this feature keeping the above design in mind. You can simply pass --no-lcd on the command line.

OK, thanks!

jcmoyer added 6 commits May 21, 2024 08:41
This also reorders main() so that the mcu_t structure isn't zeroed after
loading ROMs. ROM loading will probably need to be extracted from main
at some point, especially for library usage.
jcmoyer added 27 commits May 21, 2024 08:42
The existing SDL application becomes the frontend.
The backend consists of all the emulator modules (mcu, timer, pcm, etc).

The goal here is to make the backend reusable between different frontend
applications as a library.
emu is a collection of all the emulator subsystems that simplifies
initialization and allows multiple emulators to be easily instantiated
Pass `-instances:<n>` to run `n` backend emulators. Up to 16 emulators
are supported in a single process. Events are routed to the emulator
modulo the number of instances, so if `n=2` even MIDI channels will go
to the first emulator and odd MIDI channels to the second.

- Rom loading code has been moved into emu module
- LCDs now only handle events for their own window
- MIDI module now forwards events to the frontend which in turn routes
  it to the correct backend emulator
This commit routes sysex messages to all emulators and fixes a bug in
the win32 midi input where sysex messages would be duplicated.
- Proper error handling
- Deal with all the warnings these changes introduced
It is not always zero when LCD_Init is called with memory from malloc()
This allows roms to be loaded on multiple threads simultaneously.
The midi input callback might be invoked after MIDI_Quit(). If this
happens, midi_frontend is null and is not valid for FE_RouteMidi.
Missed this one, this state is now part of fe_emu_instance_t.
begin_step and wait callbacks are needlessly complicated, and
controlling how often to step the MCU is really the responsibility of
the frontend. These two callbacks have been removed and the logic baked
into FE_RunInstance. The sample callback is fine since it provides a
zero-copy way to get sample data out of the emulator.
@johnnovak
Copy link

johnnovak commented May 24, 2024

@jcmoyer Please read your email 😏 (the one linked to your GitHub account)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants