Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

toLeopard: Inconsistency when renaming variables with illegal characters #120

Open
ZXMushroom63 opened this issue Dec 16, 2023 · 7 comments
Labels
fmt: Leopard Pertains to Leopard format (JavaScript)

Comments

@ZXMushroom63
Copy link

While trying to convert the attatched project, among other things, the _.density and -.density variables were renamed to, density and density2.

However, when I was looking through the code in Engine.js, the RtImpulseGetDataOfObjectWithId function used variables that ended with a '2' along with variables that did not end in a number. This is not really an issue, but it makes trying to use the output of Leopard difficult.

The project: distilled-impulse.sb3.zip

@towerofnix towerofnix changed the title Inconsistency when renaming variables with illegal characters toLeopard: Inconsistency when renaming variables with illegal characters Dec 17, 2023
@towerofnix towerofnix added the fmt: Leopard Pertains to Leopard format (JavaScript) label Dec 17, 2023
@towerofnix
Copy link
Member

I think we're just stripping illegal characters and then tacking on a 2 for the then-same-name second variable. But yeah, it has a pretty harsh effect on variables which were only differentiated by special characters, which is generally what that name deduplication tends to affect AFAIK.

I wouldn't be opposed to ensuring the first variable of a duplicate name gets a 1 at the end, i.e. density1, density2. Would be analogous to costumes and sprite names in Scratch (costume1, costume2, Sprite1, Sprite2).

Just going to /cc @PullJosh since this is a minor change in project translation, do you agree/disagree or have any additional nuance to add?

@PullJosh
Copy link
Collaborator

I definitely like your suggestion of appending a 1 to the first of the would-be-duplicate identifiers. That seems like a strict improvement over what we have now.

Brainstorming other ideas: just recently I translated a project that had a variable called クラウドランキング. Upon conversion to Leopard, that variable was renamed to _, which seems like clearly a bad choice. Since all our Scratch variables end up being keys on the vars object, could we instead use the original name like this.vars["クラウドランキング"]. For the sake of learning JS, I like the dot notation better because it's more conventional for everyday uses.

So here's an idea:

  1. Delete white space in variable names and use camel case where possible
  2. Delete a few special characters that are common in English (like apostrophes)
  3. Use the resulting variable name. If it's not a valid JS identifier, use the this.vars[] notation.
  4. Deal with duplicate identifiers the way @towerofnix described, starting from 1.

Thoughts?

@towerofnix
Copy link
Member

towerofnix commented Dec 17, 2023

I like it! Also pinging @adroitwhiz, not sure you're around but would be nice to get your feedback. We can build on this more later if there are other ideas present, but the outline above feels pretty good.

That said, it's worth noting that クラウドランキング is a valid identifier - in general (which is a somewhat big "in general"), JavaScript has no problem with Unicode characters in identifiers. this.vars.クラウドランキング = 24 is 100% valid JS. See also the presumably more recent reference on MDN, which shares a demonstratory regex: /[$_\p{ID_Start}][$\u200c\u200d\p{ID_Continue}]*/u. And the latest spec here.

I don't think there's a lot of advantage to using bracket notation when dot notation is (albeit surprisingly) functional.

I'd swap step 2 for a more specific "Strip characters that aren't valid in JavaScript identifiers", which would implicitly include apostrophes, exclamation marks, etc. There should be very few variable names that are entirely composed of invalid characters - stuff like -, --, ----- would get turned into this.vars._1, this.vars._2, this.vars._3, but that should be basically it.

@PullJosh
Copy link
Collaborator

Wow! I had no idea all of that was valid! Yeah, I love your proposal of stripping invalid characters but leaving everything else in place.

@ZXMushroom63
Copy link
Author

Another thing you could do is just alphabetically sort the variables before replacing invalid characters. That way, variable names are consistent.

@towerofnix
Copy link
Member

towerofnix commented Jun 10, 2024

D'you mean consistent between multiple times converting the same project? (Does the 1-2-3 variable suffix change order if you convert a project to Leopard, make some changes back in Scratch, and convert it to Leopard again?) Or something else?

edit: Or just to match the alphabetical order in Scratch? If so then yeah, that makes sense to me as something we could easily do, too (at least before addressing this more effectively!).

@ZXMushroom63
Copy link
Author

ZXMushroom63 commented Jun 11, 2024

D'you mean consistent between multiple times converting the same project? (Does the 1-2-3 variable suffix change order if you convert a project to Leopard, make some changes back in Scratch, and convert it to Leopard again?) Or something else?

edit: Or just to match the alphabetical order in Scratch? If so then yeah, that makes sense to me as something we could easily do, too (at least before addressing this more effectively!).

I meant that the issue wouldn't be as bad if variables were sorted before replacing invalid characters and deduping. That way, if you had two variables differentiated by an invalid character, they will consistently have the same suffix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fmt: Leopard Pertains to Leopard format (JavaScript)
Projects
None yet
Development

No branches or pull requests

3 participants