Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Format improvements (WIP) #63

Draft
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

zero318
Copy link
Contributor

@zero318 zero318 commented Jun 30, 2022

No description provided.

@zero318 zero318 changed the title Format improvements Format improvements (WIP) Jun 30, 2022
src/core_mapfiles/std.rs Outdated Show resolved Hide resolved
src/llir/abi.rs Outdated Show resolved Hide resolved
src/llir/abi.rs Show resolved Hide resolved
src/llir/abi.rs Outdated Show resolved Hide resolved
@@ -476,14 +476,18 @@ pub enum Expr {
},
}

#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)]
pub struct IntFormat {
pub unsigned: bool,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might prefer signed rather than unsigned to avoid double negatives

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with unsigned since it's more of an exception to the norm than anything. I tend to structure my bools with true as the "do something different" value so that they don't need to be negated whenever being read, but I guess that doesn't matter as much with match since you have to be explicit in both cases.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true as "do something different" makes sense in C-likes where it's easy to zero-initialize things. It doesn't make so much sense in rust.

src/llir/raise/early.rs Outdated Show resolved Hide resolved
src/llir/raise/early.rs Outdated Show resolved Hide resolved
src/llir/raise/early.rs Outdated Show resolved Hide resolved
src/ast/mod.rs Outdated Show resolved Hide resolved
src/llir/abi.rs Show resolved Hide resolved
@@ -665,7 +679,7 @@ impl AtomRaiser<'_, '_> {
| ArgEncoding::Integer { ty_color: None, format, .. }
=> Ok(ast::Expr::LitInt { value: raw.expect_int(), format: *format }),

| ArgEncoding::Padding
| ArgEncoding::Padding { .. }
=> Ok(ast::Expr::from(raw.expect_int())),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is unreachable now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely not unreachable since the padding args are only dropped after being raised thanks to some sort of logic with @arg0. I don't know how important the positioning of that is since you'd left a comment about it being important to reference the original args list, so I chose to leave it there for now. If it's possible to implement the "verify the bytes are 0 or fallback to blob" check at at line 541 before raising all the arguments, then it would be unreachable for sure.

@@ -1082,7 +1103,7 @@ impl InstrFormat for OldeEclHooks {
fn read_instr(&self, f: &mut BinReader, emitter: &dyn Emitter) -> ReadResult<ReadInstr> {
let time = f.read_i32()?;
let opcode = f.read_u16()?;
let size = f.read_u16()? as usize;
let size = f.read_i16()? as usize;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems to me this is just begging for an Out of Memory error without providing any advantages

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field is signed though, read with MOVSX. If a size is greater than 0x7FFF, it'll index backwards (and probably explode)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't matter whether it's signed in game because it gets cast immediately to an unsigned type. So unless you plan to add range checking (which you certainly can do), you're literally just trading one incorrect behavior for another...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Range checking would definitely be ideal, though IDK how you'd want the error to be formatted.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems similar to the check in anm

@ExpHP ExpHP marked this pull request as ready for review July 15, 2022 23:05
@ExpHP ExpHP marked this pull request as draft July 15, 2022 23:05
@@ -629,7 +629,7 @@ static ECL_08_09: &'static CoreSignatures = &CoreSignatures {
(Th08, 127, Some(("S", None))),
(Th08, 128, Some(("S(imm)f(imm)f(imm)f(imm)f(imm)", None))), // Argument 1 is unread
(Th08, 129, Some(("b(imm)---", None))),
(Th08, 130, Some((r#"s(imm;enum="EclSub")--"#, None))),
(Th08, 130, Some((r#"s(imm;extend;enum="EclSub")--"#, None))),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"extend" is too short of a name for something that is probably only ever going to be used once. Probably want something with multiple words, meaning we might need to pick now whether we want snake_case or kebab-case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There're quite a few places where the game engines will read a full dword from the signature in order to check for variables but then only read the lower bits of the resulting value. Ideally extend could be replaced with a more general purpose attribute applicable to those situations as well since that would eliminate the need for extended padding anyway. Maybe this could be written as E(imm;range="s") or something?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also thinking this should probably be a dword-sized value with an attribute that limits its value size. However, I really do not want to see abi letters get used inside string literals.

src/llir/abi.rs Outdated
@@ -42,7 +42,7 @@ pub enum ArgEncoding {
/// The first argument may have `arg0` if it is two bytes large. This indicates that the argument is
/// stored in the arg0 header field of the instruction in EoSD and PCB ECL. (which is mapped to the
/// `@arg0` pseudo-argument in raw instruction syntax)
Integer { size: u8, ty_color: Option<TypeColor>, arg0: bool, immediate: bool, format: ast::IntFormat },
Integer { size: u8, ty_color: Option<TypeColor>, arg0: bool, immediate: bool, extend: bool, format: ast::IntFormat },
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting out of hand, let's break this up over lines.

@@ -42,7 +42,7 @@ pub enum ArgEncoding {
/// The first argument may have `arg0` if it is two bytes large. This indicates that the argument is
/// stored in the arg0 header field of the instruction in EoSD and PCB ECL. (which is mapped to the
/// `@arg0` pseudo-argument in raw instruction syntax)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document extend.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What sort of formatting should be used for that? The only currently documented attributes are mask and furibug, which don't even use the same format.

Copy link
Owner

@ExpHP ExpHP Jul 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh bugger, I forgot to document enum. That's the only thing I see missing from here.

It should be in prose in the documentation of Integer, just like the documentation for arg0 that I put this review comment under. (basically, "put it here!")

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, imm is also missing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And hex isn't included either. There're enough attributes that there should probably be a list with the names at the front.

Copy link
Owner

@ExpHP ExpHP Jul 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we probably want to make a markdown list

///
/// Supported attributes:
///
/// * **`imm`**: lorem ipsum dolor sit amet
/// i have a barnicle in my backseat
/// * **`arg0`**: blah blah blah blah

etc

src/llir/abi.rs Outdated Show resolved Hide resolved
src/llir/abi.rs Outdated Show resolved Hide resolved
@@ -424,7 +426,7 @@ fn abi_to_signature(abi: &InstrAbi, abi_span: Span, ctx: &mut CompilerContext<'_

defs::Signature {
return_ty: sp!(value::ExprType::Void),
params: abi.encodings.iter().enumerate().flat_map(|(index, enc)| {
params: abi.encodings.iter().filter(|enc| !matches!(*enc, ArgEncoding::Padding { .. })).enumerate().flat_map(|(index, enc)| {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is too noisy. Funnily, this is a flat_map and yet it doesn't even look like anything is currently taking advantage of that; but this is exactly the sort of use case for that, so let's make use of it:

Suggested change
params: abi.encodings.iter().filter(|enc| !matches!(*enc, ArgEncoding::Padding { .. })).enumerate().flat_map(|(index, enc)| {
params: abi.encodings.iter().enumerate().flat_map(|(index, enc)| {

And change the Padding match arm to:

                | ArgEncoding::Padding { .. }
                => return None,  // hide from signature

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was concerned something like that would end up inserting a None into the resulting collection. Didn't know it would skip over it. Seems like there could be issues in the future though if index is ever used for anything though.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops I missed the fact that this would change indices, lemme read this over again.

Copy link
Owner

@ExpHP ExpHP Jul 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was concerned something like that would end up inserting a None into the resulting collection.

flat_map flattens the iterators being returned by the closures. Option is an iterator of zero or one items, so None produces no items. Technically, this should be using .filter_map instead (you can just change the .flat_map to .filter_map and it should still compile); this is just an alias for Option-based flat_maps that is clearer in intent.

Oops I missed the fact that this would change indices, lemme read this over again.

So the index is only used to generate arg names. I would pull that out.

// in a line of code outside the loop
let arg_names = (1..).map(|i| ident!("arg_{i}"));

// inside the loop, change this line:
let name = sp!(abi_span => ctx.resolutions.attach_fresh_res(arg_names.next().unwrap()));

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and of course get rid of the enumerate()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense to just add another function to the InstrAbi implementation similar to arg_encodings that returns the filtered version? This seems like the sort of thing that would end up continually popping up throughout the code base otherwise. (I would've done that in the first place, but rust complained about the filter output not being compatible with VeclikeIterator or something and I don't know how to make it happy.)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmmm. Crap, that's a pain.

Writing this function is probably a good idea. It will have to return the slightly weaker DoubleEndedIterator instead of VeclikeIterator. (basically because filter can't be ExactSizeIterator, which is the other half of VeclikeIterator)

// Important: we put the shortest iterator (args_iter) first in the zip list
// to ensure that this loop reads an equal number of items from all iters.
assert!(args_iter.len() <= arg_encodings_iter.len());
for enc in arg_encodings_iter.by_ref() {
if let ArgEncoding::Padding { size } = enc {
match size {
1 => args_blob.write_u8(0).expect("Cursor<Vec> failed?!"),
1 => {
if (args_blob.position() & 3) == 0 { small_padding_value = 0i8 }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid one-line ifs. I reserve these for extreme cases (usually only when there are multiple similar ifs in a row and I want to emphasize the similarities between them)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll freely admit the entire logic for handling that was a 5 minute quick fix that bolts a random functionality where it doesn't quite belong. I'd love to replace the logic with something better that doesn't feel as dirty.

Copy link
Owner

@ExpHP ExpHP Jul 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the complexity of this function increases (which it is no doubt going to, without end), we're going to want to start factoring out pieces of it into separable units.

This thing right here is easily separated out into a Finite State Machine. You could have something like:

/// Finite-state machine helper for the value of `-` padding.
struct SmallPaddingFillHelper {
    fill_value: i8,
}

impl SmallPaddingFillHelper {
    /// Update the helper in response to the next arg encoding.
    ///
    /// Returns the current fill value for `-` padding.
    fn step(&mut self, enc: &ArgEncoding, blob_position: u64) -> i8 {
        todo!("update self.fill_value");
        self.fill_value
    }
}

and then you could put let small_padding_fill = padding_helper.step(enc, args_blob.position()); near the top of the loop

Copy link
Contributor Author

@zero318 zero318 Jul 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant more that I'd rather leave the padding bytes as exclusively 0, eliminate the padding fill variable entirely, and let the values be handled by a range sort of attribute within a single larger value. That way there isn't any more multi-iteration mutable state than there needs to be.

@@ -554,18 +554,25 @@ fn encode_args(
let mut param_mask: raw::ParamMask = 0;
let mut current_param_mask_bit: raw::ParamMask = 1;

let mut small_padding_value = 0i8;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to see tests for:

  • basic functionality of extend
  • padding value resetting to 0 at dword boundaries. (s(extend)------)
  • resetting to zero on positive values

That last one would involve

10 s(extend)b-
ins_10(-2, 1)

args_blob should compile to blobify![-2i16, 1i16]. You will need to impl Blobify for i16 here:

impl Blobify for i32 {
(frankly this should probably be using a macro to gen impls for all integer types)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants