A tiny library for reading & writing Modified UTF-8 binary sequences in Kotlin, be it with Java's streams, Okio's sinks and sources, or Ktor's inputs and outputs.
[Click here for installation steps]
Examples
- Reading Strings
- Writing Strings
- Calculating Binary Sizes
- Using Sources & Destinations
- Custom Sources & Destination
The examples below use java.io.InputStream
, but if you're working with the okio or ktor equivalents, those same
functions work with the Source
and Input
classes from those libraries respectively.
NOTE: If you're using this library from .java
source files with a different I/O library, you'll need to change
me.nullicorn.mewteaf8.Mutf8InputStreams
to the appropriate class:
- For okio, that would be
me.nullicorn.mewteaf8.Mutf8OkioSources
- For ktor, that would be
me.nullicorn.mewteaf8.Mutf8KtorInputs
import me.nullicorn.mewteaf8.readMutf8Array
import me.nullicorn.mewteaf8.readMutf8String
import me.nullicorn.mewteaf8.readMutf8ToAppendable
//...
// The next bytes in this hypothetical stream are, in order, as hexadecimal:
// 00 0D C2 AF 5C 5F 28 E3 83 84 29 5F 2F C2 AF
// That's the binary Modified UTF-8 for our shrugging friend, """Β―\_(γ)_/Β―"""
val stream: InputStream // = ...
// To read the entire string, you have 3 choices:
// 1. To read the characters to a regular String:
val string: String = stream.readMutf8String()
// string == """Β―\_(γ)_/Β―"""
// 2. To read it to a new CharArray (or char[] in Java):
val chars: CharArray = stream.readMutf8Array()
// chars contentEquals """Β―\_(γ)_/Β―""".toCharArray()
// 3. To append the characters to an existing Appendable (like a StringBuilder):
val charCollector: Appendable = StringBuilder()
stream.readMutf8ToAppendable(charCollector)
// charCollector.toString() == """Β―\_(γ)_/Β―"""
import me.nullicorn.mewteaf8.Mutf8InputStreams;
import java.io.InputStream;
final class Mutf8Demo {
public static void main(String... args) {
// The next bytes in this hypothetical stream are, in order, as hexadecimal:
// 00 0D C2 AF 5C 5F 28 E3 83 84 29 5F 2F C2 AF
InputStream stream;
// To read the entire string, you have 3 choices:
// 1. To read the characters to a regular String:
String string = Mutf8InputStreams.readStringFrom(stream);
// string.equals("Β―\\_(γ)_/Β―")
// 2. To read it to a new char[]:
char[] chars = Mutf8InputStreams.readArrayFrom(stream); // == new char[]{ 'Β―', '\\', '_', '(', 'γ', ')', '_', '/', 'Β―' }
// java.util.Arrays.equals(chars, "Β―\\_(γ)_/Β―".toCharArray())
// 3. To append the characters to an existing Appendable (like a StringBuilder):
Appendable charCollector = new StringBuilder();
Mutf8InputStreams.readToAppendableFrom(stream, charCollector);
// charCollector.toString().equals("Β―\\_(γ)_/Β―")
}
}
The examples below use java.io.OutputStream
, but if you're working with the okio or ktor equivalents, those same
functions work with the Sink
and Output
classes from those libraries respectively.
NOTE: If you're using this library from .java
source files with a different I/O library, you'll need to change
me.nullicorn.mewteaf8.Mutf8OutputStreams
to the appropriate class:
- For okio, that would be
me.nullicorn.mewteaf8.Mutf8OkioSinks
- For ktor, that would be
me.nullicorn.mewteaf8.Mutf8KtorOutputs
import java.io.OutputStream
import me.nullicorn.mewteaf8.writeMutf8Array
import me.nullicorn.mewteaf8.writeMutf8Sequence
// ...
// The stream we'll be writing to. If you want to mess around with this on your own, make it a
// java.io.ByteArrayOutputStream and see how the result of stream.toByteArray() changes with different strings.
val stream: OutputStream // = ...
val shrugString: CharSequence = """Β―\_(γ)_/Β―"""
val shrugArray: CharArray = shrugString.toCharArray()
// Modified UTF-8 data can be written to the stream in a few ways. For our shrugString/shrugArray above, these will all
// produce the following bytes to the stream in order, as hexadecimal:
// 00 0D C2 AF 5C 5F 28 E3 83 84 29 5F 2F C2 AF
// 1. If you're starting with a String (or some other CharSequence, like a StringBuilder), use writeMutf8Sequence():
stream.writeMutf8Sequence(shrugString)
// 2. If you're starting with a CharArray, use writeMutf8Array():
stream.writeMutf8Array(shrugArray)
// If you only want to write a portion of the CharSequence/CharArray, like just the "(γ)" of our shrugString/shrugArray,
// you can define a specific range, either as two Ints or a single IntRange. All 4 lines below will produce the
// following bytes to the stream in order, as hexadecimal:
// 00 05 28 E3 83 84 29
stream.writeMutf8Sequence(shrugString, range = 3 until 6)
stream.writeMutf8Sequence(shrugString, startIndex = 3, endIndex = 6)
stream.writeMutf8Array(shrugArray, range = 3 until 6)
stream.writeMutf8Array(shrugArray, startIndex = 3, endIndex = 6)
// For fine-tuning your I/O efficiency (if you're working in a situation where that's critical), you can specify the
// bytesPerWrite, which is the (maximum) number of bytes the functions will flush to the stream at a time. In other
// words, it's the size of the in-memory buffer characters are serialized to before being flushed/written to the stream.
// If you know you'll be writing smaller strings, you can set this lower to save memory. On the other hand, if you know
// you'll be writing long strings, you can set this higher to reduce the number of flushes/writes. Currently, this
// defaults arbitrarily to 1 KiB (1024 bytes), though that is not concrete and could change in future releases at the
// discretion of contributors.
//
// That parameter can be used with both writeMutf8Array() and writeMutf8Sequence(), with and without specifying a range.
//
// For example, to write the "(γ)" part of the string to the stream 1 byte at-a-time (not recommended, purely for the
// sake of the example):
stream.writeMutf8Sequence(shrugString, range = 3 until 6, bytesPerWrite = 1)
// That call produces the following separate flushes/writes:
// 00 05
// 28
// E3
// 83
// 84
// 29
import java.io.OutputStream;
import me.nullicorn.mewteaf8.Mutf8OutputStreams;
final class Mutf8Demo {
public static void main(String... args) {
// The stream we'll be writing to. If you want to mess around with this on your own, make it a
// java.io.ByteArrayOutputStream and see how the result of stream.toByteArray() changes with different strings.
OutputStream stream; // = ...
CharSequence shrugString = "Β―\\_(γ)_/Β―";
char[] shrugArray = shrugString.toString().toCharArray();
// Modified UTF-8 data can be written to the stream in a few ways. For our shrugString/shrugArray above, these
// will all produce the following bytes to the stream in order, as hexadecimal:
// 00 0D C2 AF 5C 5F 28 E3 83 84 29 5F 2F C2 AF
// 1. If you're starting with a String (or some other CharSequence, like a StringBuilder), use
// Mutf8OutputStreams.writeSequenceTo():
Mutf8OutputStreams.writeSequenceTo(stream, shrugString);
// 2. If you're starting with a CharArray, use Mutf8OutputStreams.writeArrayTo():
Mutf8OutputStreams.writeArrayTo(stream, shrugArray);
// If you only want to write a portion of the CharSequence/CharArray, like just the "(γ)" of our
// shrugString/shrugArray, you can define a specific range as two Ints . Both lines below will produce the
// following bytes to the stream in order, as hexadecimal:
// 00 05 28 E3 83 84 29
Mutf8OutputStreams.writeSequenceTo(stream, shrugString, /* startIndex = */ 3, /* endIndex = */ 6);
Mutf8OutputStreams.writeArrayTo(stream, shrugArray, /* startIndex = */ 3, /* endIndex = */ 6);
// For fine-tuning your I/O efficiency (if you're working in a situation where that's critical), you can specify
// the bytesPerWrite, which is the (maximum) number of bytes the functions will flush to the stream at a time.
// In other words, it's the size of the in-memory buffer characters are serialized to before being
// flushed/written to the stream. If you know you'll be writing smaller strings, you can set this lower to save
// memory. On the other hand, if you know you'll be writing long strings, you can set this higher to reduce the
// number of flushes/writes. Currently, this defaults arbitrarily to 1 KiB (1024 bytes), though that is not
// concrete and could change in future releases at the discretion of contributors.
//
// That parameter can be used with both writeArrayTo() and writeSequenceTo(), with and without specifying a
// range.
//
// For example, to write the "(γ)" part of the string to the stream 1 byte at-a-time (not recommended, purely
// for the sake of the example):
Mutf8OutputStreams.writeSequenceTo(stream, shrugString, /* startIndex = */ 3, /* endIndex = */ 6, /* bytesPerWrite = */ 1);
// That call produces the following separate flushes/writes:
// 00 05
// 28
// E3
// 83
// 84
// 29
}
}
The length
of a String (often) can differ from the actual number of bytes used to encode it as Modified UTF-8 binary.
But if you're the one serializing those strings, there may be times when you'll need to know the string's encoded size
without having to actually encode it, such as to calculate the serialized size of an in-memory structure (that contains
the strings) beforehand.
We refer to that binary size as the mutf8Length
(Modified UTF-8 Length). It does not account for the 2-byte length
field that comes before most Modified UTF-8 strings' charactersβthe mutf8Length
is the value of those 2 bytes. If
you need to account for those additional 2 bytes, just add 2
to your mutf8Length
.
// You can use the `mutf8Length` extension property directly on a Char to get its individual binary size, or the number
// of bytes that each instance of the character would take up in an encoded string. It will always be either 1, 2, or 3:
//
// Returns `1` because the codepoint of 'A' is U+0041, which is inside the range defined for 1-byte characters,
// `'\u0001' .. '\u007F'`.
'A'.mutf8Length
//
// Returns `2` because the codepoint of 'Β§' is U+00A7, which is inside the range defined for 2-byte characters,
// `('\u0080' .. '\u07FF') + '\u0000'`.
'Β§'.mutf8Length
//
// Returns `3` because the codepoint of 'β¨' is U+2728, which is inside the range defined for 3-byte characters,
// `'\u0800' .. '\uFFFF'`.
'β¨'.mutf8Length
// Most likely you won't need the sizes of individual characters, but entire Strings or CharArrays. For that, the same
// extension function can be used to efficiently sum up the mutf8Length's of a sequence of characters. Specifically,
// this property exists for Chars, CharSequences, and CharArrays.
//
// Returns 3 because each character's mutf8Length is 1:
"XYZ".mutf8Length
//
// Returns 6 because each character's mutf8Length is 2:
"§§§".mutf8Length
//
// Returns 9 because each character's mutf8Length is 3:
"β¨β¨β¨".mutf8Length
//
// Returns 6 because the characters' mutf8Lengths are 1, 2 and 3 respectively, for a total of 6:
"AΒ§β¨".mutf8Length
// As with writing sequences, you can also call mutf8Length() as a function to specify a specific range of indices to
// calculate a sequence's mutf8Length for. This only works for CharSequence and CharArray, not Char, because a Char is
// individual; it has no range to select from.
//
// With no arguments, mutf8Length() behaves the same as the property (see previous examples). Both of these return 13
// because:
// - The "hands" (first and last chars) are 2 bytes each because their codes are U+00A7
// - The slashes are 1 byte each because their codes are U+005C and U+002F for '\\' and '/' respectively
// - The underscores are 1 byte each because their code is U+005F
// - The parentheses are 1 byte each because their codes are U+0028 and U+0029 for '(' and ')' respectively
// - The face/symbol in the middle is 3 bytes because its code is U+30C4
"""Β―\_(γ)_/Β―""".mutf8Length
"""Β―\_(γ)_/Β―""".mutf8Length()
//
// Now let's get the mutf8Length of just the "(γ)" part of the string, which should be 5 (1 for each parenthesis, 3 for
// the symbol/face inside):
"""Β―\_(γ)_/Β―""".mutf8Length(startIndex = 3, endIndex = 6)
import me.nullicorn.mewteaf8.Mutf8Length;
final class Mutf8Demo {
public static void main(String... args) {
// You can use the `Mutf8Length.of()` function on a Char to get its individual binary size, or the number of
// bytes that each instance of the character would take up in an encoded string. It will always be either 1, 2,
// or 3:
//
// Returns `1` because the codepoint of 'A' is U+0041, which is inside the range defined for 1-byte characters,
// '\u0001' through '\u007F'.
Mutf8Length.of('A');
//
// Returns `2` because the codepoint of 'Β§' is U+00A7, which is inside the range defined for 2-byte characters,
// '\u0080' through '\u07FF', and also '\u0000'.
Mutf8Length.of('Β§');
//
// Returns `3` because the codepoint of 'β¨' is U+2728, which is inside the range defined for 3-byte characters,
// '\u0800' through '\uFFFF'.
Mutf8Length.of('β¨');
// Most likely you won't need the sizes of individual characters, but entire Strings or CharArrays. For that,
// the same function can be used to efficiently sum up the mutf8Length's of a sequence of characters.
// Specifically, this property exists for Chars, CharSequences, and CharArrays.
//
// Returns 3 because each character's mutf8Length is 1:
Mutf8Length.of("XYZ");
//
// Returns 6 because each character's mutf8Length is 2:
Mutf8Length.of("§§§");
//
// Returns 9 because each character's mutf8Length is 3:
Mutf8Length.of("β¨β¨β¨");
//
// Returns 6 because the characters' mutf8Lengths are 1, 2 and 3 respectively, for a total of 6:
Mutf8Length.of("AΒ§β¨");
// As with writing sequences, you can also call mutf8Length() as a function to specify a specific range of
// indices to calculate a sequence's mutf8Length for. This only works for CharSequence and CharArray, not Char,
// because a Char is individual; it has no range to select from.
//
// Take our shrugging friend from our other examples; their full mutf8Length is 13 because:
// - The "hands" (first and last chars) are 2 bytes each because their codes are U+00A7
// - The slashes are 1 byte each because their codes are U+005C and U+002F for '\\' and '/' respectively
// - The underscores are 1 byte each because their code is U+005F
// - The parentheses are 1 byte each because their codes are U+0028 and U+0029 for '(' and ')' respectively
// - The face/symbol in the middle is 3 bytes because its code is U+30C4
Mutf8Length.of("Β―\\_(γ)_/Β―");
//
// Now let's get the mutf8Length of just the "(γ)" part of the string. The result will be 5: 1 for each
// parenthesis, 3 for the symbol/face inside):
Mutf8Length.of("Β―\\_(γ)_/Β―", /* startIndex = */ 3, /* endIndex = */ 6);
}
}
The library's primary classes for encoding & decoding are Mutf8Source
(for reading/decoding/receiving data) and
Mutf8Sink
(for writing/encoding/sending data). In fact, the extension functions shown previously use those classes
under the hood. If you take a look at their source code, you'll see that all they do is create a source/sink, write/read
a string's length, and then write/read a string's characters.
Public implementations of those classes exist for working with java.io, okio, and ktor, and you can find those classes
in the respective module's documentation. For example, the java.io interface includes the classes StreamMutf8Sink
and
StreamMutf8Source
.
Using those classes gives you slightly more control over what gets read/written, plus allows you to create custom implementations so that you can use our encoder/decoder with your own I/O library (see here for examples).
import java.io.InputStream
import me.nullicorn.mewteaf8.Mutf8Source
import me.nullicorn.mewteaf8.StreamMutf8Source
// ...
// The next bytes in this hypothetical stream are, in order, as hexadecimal:
// 00 0D C2 AF 5C 5F 28 E3 83 84 29 5F 2F C2 AF
val inputStream: InputStream // = ...
// We can then pass that stream to a StreamMutf8Source to start reading the Modified UTF-8 data it contains.
val source: Mutf8Source = StreamMutf8Source(stream = inputStream)
// First, we'll need to read the length of the string, or the first 2 bytes of data. We need this number so that we know
// how many bytes we'll need to read to get the entire encoded string.
//
// This is its own method to allow for flexibility in the encoding/decoding process. For example, if you want to have
// strings' lengths in a table at a different place in the data, then associate lengths to strings on your own. Or, if
// all your strings are the same size, encoding/decoding the length would be wasted time & space, so it could be omitted
// altogether.
//
// In our case, the bytes are 00 0D. If we combine those in big-endian order into an unsigned integer, we get 13. If you
// look back at the bytes, that's exactly how many come after the first 2.
val mutf8Length: Int = source.readLength()
// mutf8Length == 13
// Now, we can read the contents of the string: its characters. Mutf8Sources can be read from in 3 different ways:
//
// 1. As a plain String:
val decodedString: String = source.readToString(mutf8Length)
// decodedString == """Β―\_(γ)_/Β―"""
//
// 2. As a CharArray (or char[] in Java):
val decodedChars: CharArray = source.readToArray(mutf8Length)
// decodedChars contentEquals """Β―\_(γ)_/Β―""".toCharArray()
//
// 3. To the end of an existing Appendable, like a StringBuilder:
val charCollector: Appendable = StringBuilder()
source.readToAppendable(mutf8Length, destination = charCollector)
// charCollector.toString() == """Β―\_(γ)_/Β―"""
import java.io.OutputStream
import me.nullicorn.mewteaf8.mutf8Length
import me.nullicorn.mewteaf8.Mutf8Sink
import me.nullicorn.mewteaf8.StreamMutf8Sink
// ...
val outputStream: OutputStream // = ...
val sink = StreamMutf8Sink(stream = outputStream)
val shrugString = """Β―\_(γ)_/Β―"""
val shrugArray = shrugString.toCharArray()
// If you're writing Modified UTF-8 data that should be readable by receivers other than your own app, you'll need to
// write each sequence's mutf8Length directly before the characters themselves.
//
// This is its own method to allow for flexibility in the encoding/decoding process. For example, if you want to have
// strings' lengths in a table at a different place in the data, then associate lengths to strings on your own. Or, if
// all your strings are the same size, encoding/decoding the length would be wasted time & space, so it could be omitted
// altogether.
//
// Anyway, we can use the `mutf8Length` extension property to get our shrugString's entire binary size. The same
// extension would also work on our shrugArray.
sink.writeLength(shrugString.mutf8Length.toInt())
// The mutf8Length of our string/array is 13. In big-endian hexadecimal, that would be 00 0D, so those are the 2
// bytes written by that call.
// Next is to write the characters themselves. Mutf8Sink gives us 2 main methods of doing that, both of which
// will write the same bytes to our stream. In order, as hexadecimal, those bytes are:
// C2 AF 5C 5F 28 E3 83 84 29 5F 2F C2 AF
//
// 1. Writing from a CharSequence, usually a String:
sink.writeFromSequence(shrugString)
// Like other output-related areas of this library, you can also specify a range of indices in the sequence to actually
// be written; characters outside that range will be ignored. The range can either be specified as 2 Ints, startIndex
// (inclusive) & endIndex (exclusive), or as a single IntRange. If you do specify a range, be sure you use the same
// range with `mutf8Length()` too, if applicable; otherwise, your mutf8Length may indicate the wrong number of bytes for
// other decoders to read. Not specifying a range means the entire sequence is encoded.
//
// 2. Writing from a CharArray (char[] in Java):
sink.writeFromArray(shrugArray)
// This function & its overloads also accept a range in the same way as writeFromSequence(). Other than the characters
// coming from a CharArray, the behaviour is identical.
import java.io.InputStream;
import me.nullicorn.mewteaf8.Mutf8Source;
import me.nullicorn.mewteaf8.StreamMutf8Source;
final class Mutf8Demo {
public static void main(String... args) {
// The next bytes in this hypothetical stream are, in order, as hexadecimal:
// 00 0D C2 AF 5C 5F 28 E3 83 84 29 5F 2F C2 AF
InputStream inputStream; // = ...;
// We can then pass that stream to a StreamMutf8Source to start reading the Modified UTF-8 data it contains.
Mutf8Source source = new StreamMutf8Source(/* stream = */ inputStream);
// First, we'll need to read the length of the string, or the first 2 bytes of data. We need this number so that
// we know how many bytes we'll need to read to get the entire encoded string.
//
// This is its own method to allow for flexibility in the encoding/decoding process. For example, if you want to
// have strings' lengths in a table at a different place in the data, then associate lengths to strings on your
// own. Or, if all your strings are the same size, encoding/decoding the length would be wasted time & space, so
// it could be omitted altogether.
//
// In our case, the bytes are 00 0D. If we combine those in big-endian order into an unsigned integer, we get 13. If you
// look back at the bytes, that's exactly how many come after the first 2.
int mutf8Length = source.readLength();
// mutf8Length == 13
// Now, we can read the contents of the string: its characters. Mutf8Sources can be read from in 3 different
// ways:
//
// 1. As a plain String:
String decodedString = source.readToString(mutf8Length);
// decodedString.equals("Β―\\_(γ)_/Β―")
//
// 2. As a char array (char[])_:
char[] decodedChars = source.readToArray(mutf8Length);
// java.util.Arrays.equals(decodedChars, "Β―\\_(γ)_/Β―".toCharArray())
//
// 3. To the end of an existing Appendable, like a StringBuilder:
Appendable charCollector = StringBuilder();
source.readToAppendable(mutf8Length, /* destination = */ charCollector);
// charCollector.toString().equals("Β―\\_(γ)_/Β―")
}
}
import java.io.OutputStream;
import me.nullicorn.mewteaf8.Mutf8Length;
import me.nullicorn.mewteaf8.Mutf8Sink;
import me.nullicorn.mewteaf8.StreamMutf8Sink;
final class Mutf8Demo {
public static void main(String... args) {
OutputStream outputStream; // = ...
Mutf8Sink sink = StreamMutf8Sink(/* stream = */ outputStream);
CharSequence shrugString = "Β―\\_(γ)_/Β―";
char[] shrugArray = shrugString.toString().toCharArray();
// If you're writing Modified UTF-8 data that should be readable by receivers other than your own app, you'll
// need to write each sequence's mutf8Length directly before the characters themselves.
//
// This is its own method to allow for flexibility in the encoding/decoding process. For example, if you want to
// have strings' lengths in a table at a different place in the data, then associate lengths to strings on your
// own. Or, if all your strings are the same size, encoding/decoding the length would be wasted time & space, so
// it could be omitted altogether.
//
// Anyway, we can use the static Mutf8Length helper class to get our shrugString's entire binary size. The same
// function would also work on our shrugArray.
sink.writeLength(Mutf8Length.of(shrugString));
// The mutf8Length of our string/array is 13. In big-endian hexadecimal, that would be 00 0D, so those are the 2
// bytes written by that call.
// Next is to write the characters themselves. Mutf8Sink gives us 2 main methods of doing that, both of which
// will write the same bytes to our stream. In order, as hexadecimal, those bytes are:
// C2 AF 5C 5F 28 E3 83 84 29 5F 2F C2 AF
//
// 1. Writing from a CharSequence, usually a String:
sink.writeFromSequence(shrugString);
// Like other output-related areas of this library, you can also specify a range of indices in the sequence to
// actually be written; characters outside that range will be ignored. The range must be specified as 2 ints,
// startIndex (inclusive) & endIndex (exclusive). If you do specify a range, be sure you use the same range with
// `MutfLength.of()` too, if applicable; otherwise, your mutf8Length may indicate the wrong number of bytes for
// other decoders to read. Not specifying a range means the entire sequence is encoded.
//
// 2. Writing from a char array (char[]):
sink.writeFromArray(shrugArray);
// This function & its overloads also accept a range in the same way as writeFromSequence(). Other than the characters
// coming from a CharArray, the behaviour is identical.
}
}
If you're not using okio, ktor, or java.io for your I/O operations, you can still take advantage of the library's
encoding & decoding algorithm by defining your own Mutf8Source
and/or Mutf8Sink
implementation. Those classes simply
act as a middleman between your I/O library's sources/destinations and mew-tea-f8's encoders/decoders.
For example, let's create an implementation of each class where the sources & destinations are just a simple ByteArray
(byte[]
in Java).
import me.nullicorn.mewteaf8.Mutf8Source
class ByteArrayMutf8Source(private val bytes: ByteArray) : Mutf8Source() {
// Increments by 1 for each byte that we read, allowing us to iterate over the array's bytes on an as-needed basis.
// NOTE: Normally, before each function, you'd want to check if by reading more bytes, this index would wind up
// outside the `bytes` array's bounds. For the simplicity of this example, we won't be doing that here.
private var index = 0
// This function's job is to read the next 2 bytes & combine them into an unsigned integer in big-endian order.
override fun readLength(): Int {
// Read the next 2 bytes, in order.
val byte1 = bytes[index++]
val byte2 = bytes[index++]
// This formula can be found in the docs for `Mutf8Source.readLength()`. What it's doing is combining the next
// 2 bytes from the array into a single positive integer. They're combined in big-endian order and are 16 bits
// total, meaning the result can be anywhere from 0 through 65,535; that's the same range as a UShort in Kotlin.
return (byte1.toInt() and 0xFF shl 8) or (byte2.toInt() and 0xFF)
}
// This function's job is to read the bytes that make up a string's characters. Given an `amount`, it's expected to
// return an array of that many bytes, the next ones in the underlying source (the `bytes` array in our case).
override fun readBytes(amount: Int): ByteArray {
// Reads the next few (or many) bytes from the array and returns them as one big array. The returned array's
// size (and the size of the range we're copying) must be equal to the `amount`.
val requestedBytes = bytes.copyOfRange(fromIndex = index, toIndex = index + amount)
index += amount
return requestedBytes
}
}
import me.nullicorn.mewteaf8.Mutf8Sink
class ByteArrayMutf8Sink : Mutf8Sink() {
private var bytesWritten = 0
private var buffer: ByteArray? = null
// Lets you get a copy of the written bytes once you're done writing them.
fun toByteArray(): ByteArray =
buffer?.copyOf(newSize = bytesWritten) ?: ByteArray(size = 0)
private fun ensureCapacity(requiredCapacity: Int) {
// If the buffer hasn't been created yet, initialize it with its size being the required capacity.
if (buffer == null)
buffer = ByteArray(size = requiredCapacity)
// If the buffer is already created but is too small, copy its current contents into a new, big enough array.
else if (buffer.size < requiredCapacity)
buffer = buffer.copyOf(newSize = requiredCapacity)
}
override fun writeLength(mutf8Length: Int) {
// Validate the `mutf8Length`.
require(mutf8Length in 0 .. 65535) { "mutf8Length does not fit in an unsigned 16-bit integer: $mutf8Length" }
// Ensure we have room to add 2 more bytes into our buffer.
ensureCapacity(bytesWritten + 2)
// Write the `mutf8Length` as 2 bytes in big-endian order.
buffer[bytesWritten++] = (mutf8Length shr 8).toByte()
buffer[bytesWritten++] = mutf8Length.toByte()
}
override fun writeBytes(bytes: ByteArray, untilIndex: Int) {
// Validate the `untilIndex` parameter against the `bytes` array.
if (untilIndex !in 0 .. bytes.size)
throw IndexOutOfBoundsException("untilIndex, $untilIndex, is out of bounds for array with size of ${bytes.size}")
// Make room for all the bytes being written.
ensureCapacity(bytesWritten + untilIndex)
// Copy the provided array's contents (up until the `untilIndex`) into the end of our buffer.
bytes.copyInto(destination = this.buffer, destinationOffset = bytesWritten, startIndex = 0, endIndex = untilIndex)
// Shift our working index so that new bytes are inserted after the ones we just wrote, not in the same place.
bytesWritten += untilIndex
}
}
import java.util.Arrays;
import me.nullicorn.mewteaf8.Mutf8Source;
public class ByteArrayMutf8Source extends Mutf8Source {
private final byte[] bytes;
// Increments by 1 for each byte that we read, allowing us to iterate over the array's bytes on an as-needed basis.
// NOTE: Normally, before each function, you'd want to check if by reading more bytes, this index would wind up
// outside the `bytes` array's bounds. For the simplicity of this example, we won't be doing that here.
private int index = 0;
public ByteArrayMutf8Source(byte[] bytes) {
this.bytes = bytes;
}
// This function's job is to read the next 2 bytes & combine them into an unsigned integer in big-endian order.
@Override
public int readLength() {
// Read the next 2 bytes, in order.
byte byte1 = bytes[index++];
byte byte2 = bytes[index++];
// This formula can be found in the docs for `Mutf8Source.readLength()`. What it's doing is combining the next
// 2 bytes from the array into a single positive integer. They're combined in big-endian order and are 16 bits
// total, meaning the result can be anywhere from 0 through 65,535; that's the same range as a UShort in Kotlin.
return (byte1 & 0xFF << 8) | (byte2 & 0xFF);
}
// This function's job is to read the bytes that make up a string's characters. Given an `amount`, it's expected to
// return an array of that many bytes, the next ones in the underlying source (the `bytes` array in our case).
@Override
protected byte[] readBytes(int amount) {
// Reads the next few (or many) bytes from the array and returns them as one big array. The returned array's
// size (and the size of the range we're copying) must be equal to the `amount`.
byte[] requestedBytes = Arrays.copyOfRange(bytes, /* from = */ index, /* to = */ index + amount);
index += amount;
return requestedBytes;
}
}
import java.util.Arrays;
import me.nullicorn.mewteaf8.Mutf8Sink;
public class ByteArrayMutf8Sink extends Mutf8Sink {
private int bytesWritten = 0;
private byte[] buffer = null;
public byte[] toByteArray() {
if (buffer == null) {
return new byte[0];
}
return Arrays.copyOf(buffer, bytesWritten);
}
// Resizes the `buffer` as we write data so that there's always enough room.
private void ensureCapacity(int requiredCapacity) {
if (requiredCapacity < 0) {
throw new IllegalArgumentException("requiredCapacity must be positive, not " + requiredCapacity);
}
// If the buffer hasn't been created yet, initialize it with its size being the required capacity.
if (buffer == null) {
buffer = new byte[requiredCapacity];
}
// If the buffer is already created but is too small, copy its current contents into a new, big enough array.
else if (buffer.size < requiredCapacity) {
buffer = Arrays.copyOf(buffer, requiredCapacity);
}
}
@Override
public void writeLength(int mutf8Length) {
// Validate the `mutf8Length`.
if (0 > mutf8Length || mutf8Length > 65535) {
throw new IllegalArgumentException("mutf8Length does not fit in an unsigned 16-bit integer:" + mutf8Length);
}
// Ensure we have room to add 2 more bytes into our buffer.
ensureCapacity(bytesWritten + 2);
// Write the `mutf8Length` as 2 bytes in big-endian order.
buffer[bytesWritten++] = (byte) (mutf8Length >> 8);
buffer[bytesWritten++] = (byte) mutf8Length;
}
@Override
protected void writeBytes(byte[] bytes, int untilIndex) {
// Validate the `untilIndex` parameter against the `bytes` array.
if (0 > untilIndex || untilIndex >= bytes.length) {
throw new IndexOutOfBoundsException("untilIndex, " + untilIndex + ", is out of bounds for array with size of " + bytes.length);
}
// Make room for all the bytes being written.
ensureCapacity(bytesWritten + untilIndex);
// Copy the provided array's contents (up until the `untilIndex`) into the end of our buffer.
System.arraycopy(/* src = */ bytes, /* srcPos = */ 0, /* dest = */ buffer, /* destPos = */ bytesWritten, /* length = */ untilIndex);
// Shift our working index so that new bytes are inserted after the ones we just wrote, not in the same place.
bytesWritten += untilIndex;
}
}
Depending on how you're already doing reading & writing in your project, this library comes with a couple of submodules for common I/O libraries.
- Click on the dependency(ies) your project needs in the table below
- If your project uses the Kotlin gradle plugin, use the links in the "Dependency (for Kotlin Projects)" column
- Otherwise, if your project is Java (or another JVM language), use the links in the "Dependency (for Java Projects)" column
- On the
mvnrepository.com
page that loads, select the latest version of the dependency - On the next page, select the tab for your project's build tool (Maven, Gradle, etc)
- Underneath the tab you selected, copy the snippet into the relevant part of your build file
- With Maven, for example, paste it inside the
<dependencies> </dependencies>
tag ofpom.xml
- With Gradle, for example, past it inside the
dependencies { }
block ofbuild.gradle
orbuild.gradle.kts
(for Groovy or Kotlin buildscripts respectively)
- With Maven, for example, paste it inside the
I/O Interface | Dependency (for Kotlin Projects) | Dependency (for Java Projects) |
---|---|---|
java.io (InputStream /OutputStream ) |
mew-tea-f8-core | mew-tea-f8-core-jvm |
okio (Source /Sink , BufferedSource /BufferedSink ) |
mew-tea-f8-okio | mew-tea-f8-okio-jvm |
ktor / ktor-io (Input /Output ) |
mew-tea-f8-ktor-io | mew-tea-f8-ktor-io-jvm |
Other / Custom | mew-tea-f8-core | mew-tea-f8-core-jvm |