[WIP] Binary storage for player data #4126

WalshyDev · 2024-02-11T23:39:04Z

PR Checklist

I have fully tested the proposed changes and promise that they will not break everything into chaos.
I have also tested the proposed changes in combination with various popular addons and can confirm my changes do not break them.
I have made sure that the proposed changes do not break compatibility across the supported Minecraft versions (1.16.* - 1.20.*).
I followed the existing code standards and didn't mess up the formatting.
I did my best to add documentation to any public classes or methods I added.
I have added Nonnull and Nullable annotations to my methods to indicate their behaviour for null values
I added sufficient Unit Tests to cover my code.

What changed

There's a new config property - storage.player-data
- This accepts two values, legacy and binary
- binary is marked as experimental
- legacy is the default value for now
If binary (this PR) is selected in the config for player data we will show a large warning banner in the console:
Created a new BinaryStorage to handle writing the files as binary (more on this later)
This BinaryStorage is changing the way researches are saved, they're now using their NamespacedKey's rather than incremental IDs. This has been a change we should have made forever ago and this is step 1 to removing these number IDs.

Storage choice

Ok so, as you can see from the PR, the storage is done with NBT. The question I'm sure you're asking is why NBT over xyz?
Well, there's a few reasons;

Familiarity
a. Lots of server owners and developers are familiar with NBT. This makes adoption internally and externally much easier (especially with how much MC has leaned into it over the years)
Solid storage
a. NBT is about as small as it can get. Some people think it's inflated, the format isn't. Usage however is. That isn't something any format can really prevent.
Existing tooling
a. Lots of existing tooling exists for server owners to read these files (a bit more on this later as well)

Note

This is a simple implementation which would have to have the whole file rewritten each time. I'll explore ways to avoid this but without limiting file size, there isn't a great way to do it.

Usage example

(backpack contains 3 Slimefun items; Energized Energy Capacitor, Energized Solar Generator and a Multi Tool VII)

Legacy file with all researches

-rw-r--r--   1 walshy  staff  3511 11 Feb 23:41 52e0fc97-c914-4def-8899-8dbdf0c486c0.yml

Total: 3.5 KB

Legacy file with all researches and a waypoint

player file;

-rw-r--r--  1 walshy  staff  3511 12 Feb 00:42 data-storage/Slimefun/Players/52e0fc97-c914-4def-8899-8dbdf0c486c0.yml

3.5 KB

waypoint file:

-rw-r--r--  1 walshy  staff  110 12 Feb 00:42 data-storage/Slimefun/waypoints/52e0fc97-c914-4def-8899-8dbdf0c486c0.yml

110 bytes

Total: 3.6 KB

Legacy file with all researches, a backpack and a waypoint

player file:

-rw-r--r--  1 walshy  staff  8526 12 Feb 00:00 data-storage/Slimefun/Players/52e0fc97-c914-4def-8899-8dbdf0c486c0.yml

8.5 KB

waypoint player file:

-rw-r--r--  1 walshy  staff  110 12 Feb 00:00 data-storage/Slimefun/waypoints/52e0fc97-c914-4def-8899-8dbdf0c486c0.yml

110 bytes

Total: 8.6 KB

Binary file with all researches (no compression)

-rw-r--r--  1 walshy  staff  6875 12 Feb 00:08 data-storage/Slimefun/Players/52e0fc97-c914-4def-8899-8dbdf0c486c0.dat

Total: 6.8 KB

(this is so big because of using namespaces instead of int IDs, these take up way more space per research)

Binary file with all researches (gzip compression)

-rw-r--r--  1 walshy  staff  2308 12 Feb 00:06 data-storage/Slimefun/Players/52e0fc97-c914-4def-8899-8dbdf0c486c0.dat

Total: 2.3 KB

Binary file with all researches and a waypoint (gzip compression)

-rw-r--r--  1 walshy  staff  2457 12 Feb 00:37 data-storage/Slimefun/Players/52e0fc97-c914-4def-8899-8dbdf0c486c0.dat

Total: 2.4 KB

1,164 bytes smaller than legacy, 32.1% reduction in size
(reminder that most of this space will be taken by the namespace researches -- with int IDs this would be much smaller)

Tooling

I'm sure we've all used NBT programs in the past to view an NBT file, NBTExplorer is the biggest one, there's also NBT Studio. The goal is for these programs (and others) to be able to view the files. I haven't tested this yet but I think this would be ideal.

However, these only support the compression algorithms that MC support (gzip & zlib). I'd love to use lz4 or zstd. Luckily, MC recently announced lz4 in snapshot 24w04a which will be coming in 1.20.5. This hopefully means one or both of these tools update to support it.
That still doesn't mean they'd support zstd though so maybe we go lz4?

So as we stand today, we can't really go better than gzip/zlib but hopefully soon we can do lz4. Is this ok? How much should we care about existing tooling?

My goal is to avoid building our own 😅

For the reviewers

Open questions

Are we good with NBT? Any other format people really want?
- If another format, why? What advantage does it give?
Do we want to save player data elsewhere?
- We've been saving to data-storage/Slimefun/Players and data-storage/Slimefun/waypoints for a long time. If we want to change that, this is the time
- One change at least I'd like to make is moving waypoints into the player files. I need to figure out exactly how the shared waypoints work though before I do this, they seem to not do anything different but then confused how they're shared.

Important

As we stand today (12th Feb), looking for a "yay" or "nay" on the PR approach and answers to the above questions rather than a "this code and data structure is perfect".

Testing

TODO

TODO

Sign off from reviewers we want to go this way
Finish up NBT side on dough
- Write any more methods that'd be helpful
- Document all functions (snooze)
- Add more tests covering more complex cases + compression
Implement backpack support
Cleanup a bit
Do some thorough testing

After this PR

Work on getting to a more stable/finalized API for storage as a whole
Build migration from legacy -> binary
Work on moving binary storage to default

Census

Yay - 3
@J3fftw1
@ProfElements
@Sfiguz7

Nay - 0

github-actions · 2024-02-11T23:39:17Z

Your Pull Request was automatically labelled as: "🎈 Feature"
Thank you for contributing to this project! ❤️

J3fftw1 · 2024-02-12T08:10:17Z

Are we good with NBT? Any other format people really want?
* If another format, why? What advantage does it give?

I think NBT is the right way, we already discussed loads about this and about other storage options.
but the gist of our discussion was NBT is good since most of the MC community is known with this.
Even non admins/devs know about NBT this is widely used even by players using singeplayer.

Do we want to save player data elsewhere?
* We've been saving to data-storage/Slimefun/Players and data-storage/Slimefun/waypoints for a long time. If we want to change that, this is the time
* One change at least I'd like to make is moving waypoints into the player files. I need to figure out exactly how the shared waypoints work though before I do this, they seem to not do anything different but then confused how they're shared.

My vote would be to move this to ~/plugins/Slimefun/data-storage.
We dont get reports of this often but we do occasionally that people got rid of the data by moving servers.
Saving in ~/plugins/Slimefun/data-storage would make more sense to me.

My questions

How do we get people to use this, we couldn't answer this in our DMs either.
It's an opt-in option people barely read configs anyways in this space so I doubt people will opt-in.
I would love to see stats on this as well how many opt-in how many opt-out.
Do we maybe want to throw it straight in dev its development anyways but other point is we also tell servers to run dev and advertise RC for plugin use only. soo /shrug

Is there a big difference between gzip, zlib and lz4.
Can we move later on to lz4 if not lets just use what ever is supported now.
on the other hand people wont move to different standards if no one supports it.
Same discussion we had on to use postgres even though no one supports it.
Should we care about existing tooling, yes but no idc much. People need to adapt anyways but idk i will leave that to the others.

Note:

We should run these tests again on a bigger scale bigger server more player data.

Edit:
My answer would be yay for this btw.

WalshyDev · 2024-02-12T12:24:10Z

How do we get people to use this, we couldn't answer this in our DMs either.
It's an opt-in option people barely read configs anyways in this space so I doubt people will opt-in.
I would love to see stats on this as well how many opt-in how many opt-out.
Do we maybe want to throw it straight in dev its development anyways but other point is we also tell servers to run dev and advertise RC for plugin use only. soo /shrug

The age old question of how to make people use the thing.
Not sure there's really an easy way sadly. We can enable by default for new servers but are we likely to find bugs... or just push people away? New users aren't usually the ones to report stuff.
However, this is how we rolled out item IDs back in the day. Compat mode was disabled by default for new servers and it slowly moved over to be everyone.

We'll definitely log to bStats too... though I can't make a new panel so gonna have to throw it somewhere else

Is there a big difference between gzip, zlib and lz4.
Can we move later on to lz4 if not lets just use what ever is supported now.

We can do whatever we want :p
As for difference, mostly comes down to compression ratio and speed. Lz4 doesn't compress as good as others but it is very fast.
Zstd has the best compression ratio but it's slower.
I'd say these days we value speed over ratio. Disk space is much easier to get than time.

We should run these tests again on a bigger scale bigger server more player data.

Agreed, I need the biggest player file we can find and a bunch of blockstorage later too.

ProfElements · 2024-02-12T17:19:23Z

I think NBT is a good format.

I think that binary storage in general is a good idea.

I think moving away from data-storage might be a misstep if taken due to data-storage being easily accessible instead of a multi-folder dive into plugins or something similar.

Moving way-points into players is probably going to be a big win unless it constantly duplicates data.

I think we should support any compression format Minecraft supports. Zlib/Gzip is a fine alternative until its better for us to break it for a faster or more efficient compression like lz4

Edit: This is a yay for me aswell. I will probably move to binary as soon as its available on dev

Sfiguz7 · 2024-02-14T08:42:57Z

I agree with most of what has been said above.

NBT sounds like a good format to use due to the accessibility - researches taking a bit more space is honestly not a problem.

As long as we compress I don't mind which compression method is used, it's going to be quite the jump compared to what we have had for years no matter what, and if anything speed is probably something that is more important when we have tens of thousands of blocks to take care of, so the fact it is already available just makes it even more welcome of a choice.

I agree with Jeff that moving the data folder inside Slimefun's folder would make sense, no other plugin is ever going to use that folder ever and as a user I had always found it weird how it is not a subfolder: /plugins/Slimefun is the first place I would check when looking for Slimefun related anything.

Overall, yay for me

Seggan · 2024-04-16T16:59:07Z

...va/io/github/thebusybiscuit/slimefun4/storage/backend/binary/serializers/UUIDSerializer.java

+
+        long mostSig = uuid.getMostSignificantBits();
+        long leastSig = uuid.getLeastSignificantBits();
+        int[] ints = new int[] {


NBT has a long array type, I suggest we use that instead of an int array

Seggan · 2024-04-16T17:00:07Z

src/main/java/io/github/thebusybiscuit/slimefun4/storage/backend/binary/CommonKeys.java

+    public static final NamespacedKey WORLD = new NamespacedKey(Slimefun.instance(), "world");
+    public static final NamespacedKey POSITION = new NamespacedKey(Slimefun.instance(), "position");
+    public static final NamespacedKey PITCH = new NamespacedKey(Slimefun.instance(), "pitch");
+    public static final NamespacedKey YAW = new NamespacedKey(Slimefun.instance(), "yaw");


Why aren't these in LocationSerializer?

Seggan · 2024-04-16T17:00:28Z

src/main/java/io/github/thebusybiscuit/slimefun4/storage/backend/binary/BinaryStorage.java

+            waypointTag.putString(CommonKeys.ID, waypoint.getId());
+            waypointTag.putString(CommonKeys.NAME, waypoint.getName());
+            waypointTag.putCompound(CommonKeys.LOCATION, LocationSerializer.INSTANCE.serialize(waypoint.getLocation()));
+
+            waypointMap.putCompound(new NamespacedKey(Slimefun.instance(), waypoint.getId()), waypointTag);


Why don't we use a BinarySerializer for this?

Seggan · 2024-04-16T17:01:21Z

Has the NBT code for dough been PR'd yet? I don't see it so I assume not.

[WIP] Binary storage for player data

4fdeeab

github-actions bot added the 🎈 Feature This Pull Request adds a new feature. label Feb 11, 2024

WalshyDev added the 🔧 Technical Thread This Issue is a technical Thread where developers can discuss technical changes to Slimefun. label Feb 13, 2024

TheBusyBot added the ⚡ Merge Conflicts This Pull Request has merged conflicts which need to be resolved! label Feb 16, 2024

Seggan suggested changes Apr 16, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Binary storage for player data #4126

[WIP] Binary storage for player data #4126

WalshyDev commented Feb 11, 2024 •

edited by J3fftw1

Loading

github-actions bot commented Feb 11, 2024

J3fftw1 commented Feb 12, 2024 •

edited

Loading

WalshyDev commented Feb 12, 2024 •

edited

Loading

ProfElements commented Feb 12, 2024 •

edited

Loading

Sfiguz7 commented Feb 14, 2024

Seggan Apr 16, 2024

Seggan Apr 16, 2024

Seggan Apr 16, 2024

Seggan commented Apr 16, 2024

[WIP] Binary storage for player data #4126

Are you sure you want to change the base?

[WIP] Binary storage for player data #4126

Conversation

WalshyDev commented Feb 11, 2024 • edited by J3fftw1 Loading

PR Checklist

What changed

Storage choice

Usage example

Legacy file with all researches

Legacy file with all researches and a waypoint

Legacy file with all researches, a backpack and a waypoint

Binary file with all researches (no compression)

Binary file with all researches (gzip compression)

Binary file with all researches and a waypoint (gzip compression)

Tooling

For the reviewers

Open questions

Testing

TODO

After this PR

Census

github-actions bot commented Feb 11, 2024

J3fftw1 commented Feb 12, 2024 • edited Loading

WalshyDev commented Feb 12, 2024 • edited Loading

ProfElements commented Feb 12, 2024 • edited Loading

Sfiguz7 commented Feb 14, 2024

Seggan Apr 16, 2024

Choose a reason for hiding this comment

Seggan Apr 16, 2024

Choose a reason for hiding this comment

Seggan Apr 16, 2024

Choose a reason for hiding this comment

Seggan commented Apr 16, 2024

WalshyDev commented Feb 11, 2024 •

edited by J3fftw1

Loading

J3fftw1 commented Feb 12, 2024 •

edited

Loading

WalshyDev commented Feb 12, 2024 •

edited

Loading

ProfElements commented Feb 12, 2024 •

edited

Loading