Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor libcollections as part of collection reform #18519

Merged
merged 1 commit into from
Nov 3, 2014

Conversation

Gankra
Copy link
Contributor

@Gankra Gankra commented Nov 1, 2014

  • Moves multi-collection files into their own directory, and splits them into seperate files
  • Changes exports so that each collection has its own module
  • Adds underscores to module/file names to provide more uniform naming

(that is, treemap::{TreeMap, TreeSet} => tree_map::TreeMap, tree_set::TreeSet)

  • Renames PriorityQueue to BinaryHeap
  • Renames SmallIntMap to VecMap
  • Miscellanious fallout fixes

[breaking-change]

@rust-highfive
Copy link
Collaborator

warning Warning warning

  • These commits modify unsafe code. Please review it carefully!

@Gankra
Copy link
Contributor Author

Gankra commented Nov 1, 2014

r? @aturon

@Gankra
Copy link
Contributor Author

Gankra commented Nov 1, 2014

Still waiting on full results from make-check, but there's some stuff that needs to be discussed about this patch (module renaming) anyway. Might as well parallelize that.

Also note: I couldn't split up bitv properly, as BitvSet is too coupled to Bitv's internal implementation details. I settled for renaming bitv.rs to bit (so that there could be a different bitv module).

@Gankra
Copy link
Contributor Author

Gankra commented Nov 1, 2014

CC #18424

pub use self::set::DifferenceItems;
pub use self::set::UnionItems;
pub use self::set::SymDifferenceItems;
pub use self::set::IntersectionItems;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting design, any particular reason you wanted to remove all the reexports here?

@alexcrichton
Copy link
Member

I kinda like how today when I'm looking for a hash set or a hash map I don't have to think about what module to look inside (there's only one). That being said I don't look much farther than std::collections that often. I suppose I'm curious, could you add some rationale for the changes here as well?

Thanks for this!

@Gankra
Copy link
Contributor Author

Gankra commented Nov 1, 2014

@alexcrichton: @aturon can probably give a better rationale, but here's a few points off the top of my head:

  • this logically decouples the actual implementation from the API. That is, there's no fundamental reason why TreeMap and TreeSet need be the same implementation/module.
  • If iterator names get refactored to just be like Iter, IterMut, IntoIter, then there will be a naming conflict between the maps/sets if they share a namespace. This design avoids that issue.
  • You can import exactly everything relevant to a particular collection with a single glob: use std::collections::treeset::*.

@aturon
Copy link
Member

aturon commented Nov 2, 2014

One observation: in general the module names should be snake cased versions of the type names. That is, whenever you have an embedded capital leader in the type name, it should map to an underscore in the module name. Examples: d_list, tree_map, small_int_map.

I realize that some of these are a bit odd/painful, but naming consistency is important. In a lot of these cases, you'll know the type name by heart, and being able to translate directly to the module name is helpful.

(We could potentially consider a different convention, but snake case is a long-standing one and we'd need to actually spell out the rules, go through an RFC, etc.)

@Gankra
Copy link
Contributor Author

Gankra commented Nov 2, 2014

Yes, I brought this up on IRC and the overwhelming preference was alllowercase. It was noted that the official recommendation of PEP8 is:

Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.

Which I agree with. Underscores are annoying/noisy in modules, especially with things like d_list and b_tree.

@aturon
Copy link
Member

aturon commented Nov 2, 2014

@gankro But that's Python's guidelines, not ours! We have a strong convention toward snake case for "value-level" items like modules and functions. Clearly the proposal here isn't to throw that out in general, or even just for modules. But how would you decide when to use which convention? The conventions lose much of their benefit if there's no consistent way to follow them. (Things like "if it improves readability" immediately mean I can no longer mechanically come up with the right name; it's a subjective choice.)

(Not trying to put this all on you, just thinking out loud about the issue.)

What would you think about having the type names Btree, Dlist and so on? Part of the concern here is consistency in chunking into "words" at both the type and module level.

@aturon
Copy link
Member

aturon commented Nov 2, 2014

@gankro More broadly, I wouldn't take much away from the IRC discussion (which was basically like: "yeah those seem unfortunate".) We're trying to set up official conventions to help nail down these issues. The basic naming conventions are basically just de facto standards based on long-running practice, but as we're seeing they're not 100% followed. Maybe this suggests that even these basic conventions need to go through a round of RFC, to resolve the kind of issue you're raising.

In any case, we certainly don't need to hold up the PR for this, since these problems already existed in the collections. But I would like to have an official resolution.

@aturon
Copy link
Member

aturon commented Nov 2, 2014

@alexcrichton I think @gankro laid out several of the important benefits of this design, but here are a few others:

  • It institutes a general convention for collections (which is followed in a lot of other places to): modules defining an important type are named after that type, and contain definitions only relevant to it.
  • In increases overall consistency and therefore makes things more mechanical: if I want stuff relevant to HashMap, I import hashmap, end of story. (Note that this would work even better if the word splitting matched, i.e. hash_map, as I'm arguing above.)

Basically, I think the fact that hash maps and sets lived in the same module is something you had to learn, even if it now seems convenient, and it only applies in certain cases (where there's both a set and a map implemented in the same way). Consistently splitting up the modules means there's less to learn/remember, and I think it actually improves ergonomics (as @gankro mentioned, making globs more useful).

@aturon
Copy link
Member

aturon commented Nov 2, 2014

@gankro This looks good to me; let's resolve the module naming issue separately.

I want to give @alexcrichton a chance to weigh in on the design rationale before I r+.

@Gankra
Copy link
Contributor Author

Gankra commented Nov 2, 2014

I would just push for alwayslowercase on module/file names, and maybe snake case for crates (since that seems more consistent). The snake_case convention on module names doesn't strike me as very consistently implemented. There's these collection modules like smallintmap (which I didn't change, it was always like that), as well as stdio and bitflags. Most modules are a single word, so it's hard to get a lot of data.

Module names strike me as a place where writeability is much more important than readability, as they're going to be sprinkled around a lot as namespaces, but their actual meaning is unimportant to the reader.

I'm not a huge fan of Btree of Dlist, as they're supposed to be shortenings of the proper names B-Tree and DoublyLinkedList. Although I've typo'd those enough times out of uncertainty to not have a strong resolve.

As a not-very-good data point, settling on this convention has less breakage for the collections, as alllowercase is the overwhelming convention there.

@aturon
Copy link
Member

aturon commented Nov 2, 2014

@gankro There are a couple of problems with just going for alwayslowercase. For one, as even its name suggests, it is often not very readable -- hence the PEP recommendation to sprinkle underscores when it "helps readability".

Another downside is it means proliferation of more, different naming styles. Why is snake case OK when typing function and method names (which are more commonly multiword) but not modules?

You're right that it's hard to get good data here, since most modules are single words. But snake case is definitely followed to some degree throughout (e.g. from_str, lru_cache, etc.) I think stdio is being understood as a single "word" or something -- at any rate there's no type like StdIo.

What about a convention that with snake case you never do foo_x_bar where x is a single character -- x_ would always just become x? That seems like it would deal with the worst offenders (like d_list) but leave us with the pretty reasonable tree_map.

@Gankra
Copy link
Contributor Author

Gankra commented Nov 2, 2014

@aturon @alexcrichton I've pushed a second commit with the suggested changes.

@aturon
Copy link
Member

aturon commented Nov 2, 2014

@gankro Awesome, I feel like that's a clear improvement and am glad to see the name SmallIntMap go away.

@Gankra
Copy link
Contributor Author

Gankra commented Nov 2, 2014

ick, the change I made to reference.md is nonsensical. I think the whole example is nonsensical now since hashset and hashmap are separate, but I don't know what to replace it with. Never seen the mod keyword in a use before, and I can't parse what exactly it's supposed to be doing.

@alexcrichton
Copy link
Member

This all sounds great to me, thanks for the explanations @aturon and @gankro! r=me with some squashings

@Gankra Gankra force-pushed the collect-smash branch 2 times, most recently from d571304 to dda3a34 Compare November 2, 2014 20:23
@Gankra
Copy link
Contributor Author

Gankra commented Nov 2, 2014

All squashed. I was able to mostly verify the patch, but I've got a local corruption that's making a few run-pass consistently fail. Everything after that is untested. Need to figure that out, somehow.

@aturon
Copy link
Member

aturon commented Nov 2, 2014

Awesome, @gankro. Sending to @bors, we can see if there's any legit trouble.

* Moves multi-collection files into their own directory, and splits them into seperate files
* Changes exports so that each collection has its own module
* Adds underscores to public modules and filenames to match standard naming conventions

(that is, treemap::{TreeMap, TreeSet} => tree_map::TreeMap, tree_set::TreeSet)

* Renames PriorityQueue to BinaryHeap
* Renames SmallIntMap to VecMap
* Miscellanious fallout fixes

[breaking-change]
@Gankra
Copy link
Contributor Author

Gankra commented Nov 3, 2014

rebased for merge conflicts

@aturon
Copy link
Member

aturon commented Nov 3, 2014

RFC on the snake case convention we settled on: rust-lang/rfcs#430

@Gankra
Copy link
Contributor Author

Gankra commented Nov 3, 2014

Confirmed builds/checks on my lab machine.

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Nov 3, 2014
@bors bors merged commit 112c8a9 into rust-lang:master Nov 3, 2014
lnicola added a commit to lnicola/rust that referenced this pull request Nov 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants