Multicharacter bitwise operators? #545

Closed
josh11b opened this issue May 21, 2021 · 17 comments
Labels
leads question A question for the leads team

Comments

@josh11b
Contributor

josh11b commented May 21, 2021

This subject came up when discussing pointer syntax #523 and this doc, but there is plenty of need to free up symbols for other constructs:

  • lambdas
  • ownership/lifetimes
  • parameter passing (references? rvalues/move?)
  • sum types (might benefit from using | and/or ?)
  • error handling (which might be interested in ? and/or !)
  • generics
  • we want to reserve backtick (`) for markdown / documentation
  • attributes/annotations (maybe @?)
  • coroutines
  • metaprogramming
  • etc.

Note that Carbon goals and design decisions (such as reducing ambiguity and types as values) discourage reusing operators for multiple purposes.

We have a few opportunities to free up a few symbols already:

The bitwise operators were historically also used as boolean operators (search this history for "Neonatal C"), which explains why they were given short names and low precedence.
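
A minimal C++ sketch of what that low precedence means in practice. The function names are hypothetical and chosen only for illustration; compilers typically warn about the first form.

```cpp
#include <cstdint>

// Reads like "are all bits of mask clear in flags?", but == binds
// tighter than &, so this is parsed as flags & (mask == 0).
constexpr bool looks_like_mask_test(std::uint32_t flags, std::uint32_t mask) {
    return flags & mask == 0;
}

// The probable intent needs explicit parentheses.
constexpr bool mask_test(std::uint32_t flags, std::uint32_t mask) {
    return (flags & mask) == 0;
}

// 0x5 & (0x2 == 0) is 0x5 & 0, which is false.
static_assert(!looks_like_mask_test(0x5, 0x2), "parsed as flags & (mask == 0)");
// (0x5 & 0x2) == 0 is true.
static_assert(mask_test(0x5, 0x2), "parenthesized form tests the mask");
```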

A non-exhaustive list of possible ways of spelling bitwise operators in Carbon:

  • a \& b, a \| b, a \^ b, \~ b
  • a &: b, a |: b, a ^: b, ~: b
  • a .&. b, a .|. b, a .^. b, .~. b
  • a .*. b, a .+. b, a .!=. b, .!. b
  • a /\ b, a \/ b, a (+) b, -|a
  • keywords?

Note: I'm ignoring << and >> since they are already multiple characters.

Question: Should we keep the 1-character symbols for the bitwise operators (&, |, ^, ~) or switch to multiple-character bitwise operators?

Note: I am not asking to actually determine the spelling of bitwise operators if we do switch, except insofar as that would help decide the question.

A third choice would be to combine the ^ and ~ operators, and use a single symbol for both (prefix usage would be complement and infix would be xor).

Advantages of switching to multiple characters:

  • Have good uses for these symbols: & for combining interfaces and address-of, | for sum types and/or lambdas and/or another kind of bracket, ~ for move (C++ uses ~ for destructors, which is suggestive of ending the lifetime), and ^ might be used with pointers.
  • I believe the bitwise operators are used relatively rarely, though I don't have good statistics. When I tried to estimate this, I found that <3% of C++ files use bitwise & and <0.6% use bitwise ^, but I have low confidence in those numbers. (Let me know if collecting usage statistics would be valuable for making this decision; if I spend some time on it, I can get higher-confidence numbers.)

Advantages of preserving:

  • Familiarity for C++ programmers.
  • My guess is that in files that use bitwise operators, they may be used a lot. Similarly, people who use these operators may use them a lot.
  • Being taken seriously by users for low-level performance-oriented code.
@chandlerc
Contributor

Not directly related to answering this question, but one thing to keep in mind: even if we free up these symbols, we should take care, as we begin to use them in other contexts, that the new usages aren't likely to confuse or surprise C/C++ programmers given the symbols' historical use as bitwise operators.

One easy way of making sure of that is to use them in Carbon as bitwise operators! But this is no motivation for that of course, just something I want us to be mindful of if/when we re-use those symbols. FWIW, many of the concrete examples so far don't seem at all concerning here such as & for combining interfaces, so I'm not trying to say any of the current ideas are problems.

@josh11b
Contributor Author

josh11b commented May 24, 2021

It may be helpful to ask what is the alternative here. In particular, how open are we to overloading operators?

For example, & might be: interface combination, bitwise-and, and intersection for set types. Similarly | might be union for sets and sum for types in addition to bitwise-or.

Another consideration is that there are some bitwise operations for which we don't have operators, like bit rotation, popcount, etc.
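
For comparison, a rough sketch of how operations without an operator end up being spelled as functions in C++. The names `rotate_left` and `pop_count` are hypothetical helpers written for this example; C++20 offers `std::rotl` and `std::popcount` in `<bit>` for the same purpose.

```cpp
#include <cstdint>

// Rotate a 32-bit value left by n bits (n is taken modulo 32).
constexpr std::uint32_t rotate_left(std::uint32_t x, unsigned n) {
    return (x << (n & 31u)) | (x >> ((32u - (n & 31u)) & 31u));
}

// Count the set bits; x & (x - 1) clears the lowest set bit at each step.
constexpr unsigned pop_count(std::uint32_t x) {
    return x == 0 ? 0u : 1u + pop_count(x & (x - 1));
}

static_assert(rotate_left(0x80000001u, 4) == 0x00000018u, "top bit wraps around");
static_assert(pop_count(0xF0F0u) == 8u, "eight bits set");
```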

@chandlerc
Contributor

After thinking about this a bunch, I'm largely of the opinion that for familiarity it is reasonably important to keep & and | around for bitwise operations on integers of various sorts. I think supporting Carbon's goals of being both performance-oriented and familiar to C++ developers makes this pretty important. And as was pointed out in the question, I think when these do occur in code, they are reasonably dense and benefit from the lexical convenience. Last but not least, I think this will help signal familiarity to C and C++ programmers in helpful ways.

I also don't think this should preclude using & and | for other things in the language -- however this use should be aligned with plausible ways to overload these operators to represent intersection and union semantically.
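
To make the intersection/union alignment concrete, here is a small C++ sketch (not Carbon). `SmallSet` and `singleton` are hypothetical names invented for this example; the point is only that overloading `&` and `|` on a set type stays consistent with their bitwise meaning.

```cpp
#include <cstdint>

// Hypothetical set of integers in [0, 32), stored one bit per element.
struct SmallSet {
    std::uint32_t bits;
    constexpr bool contains(unsigned i) const {
        return i < 32 && ((bits >> i) & 1u) != 0;
    }
};

// Set containing only i (i is assumed to be below 32).
constexpr SmallSet singleton(unsigned i) {
    return SmallSet{std::uint32_t{1} << i};
}

// Intersection is bitwise-and on the underlying representation.
constexpr SmallSet operator&(SmallSet a, SmallSet b) {
    return SmallSet{a.bits & b.bits};
}

// Union is bitwise-or on the underlying representation.
constexpr SmallSet operator|(SmallSet a, SmallSet b) {
    return SmallSet{a.bits | b.bits};
}

// {1, 3} & {3, 5} contains 3 but not 1.
static_assert(((singleton(1) | singleton(3)) & (singleton(3) | singleton(5))).contains(3),
              "3 is in both sets");
```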

In contrast, I think recovering the ~ symbol is extremely valuable for the language and will have an extremely minimal cost to familiarity or our other goals.

I think ^ could go either way and I don't feel particularly strongly about it.

Based on these thoughts, I'll suggest two possible approaches that folks can vote on in subsequent comments...

@chandlerc
Contributor

chandlerc commented Jul 31, 2021

One candidate approach is to try to keep convenient and familiar punctuation syntax for all of these while reclaiming ~. It turns out Go provides a really nice idea of how to do this:

  • a & b - bitwise and
  • a | b - bitwise or
  • a ^ b - bitwise xor
  • ^b - bitwise complement

I quite like this because it follows from complement being the same as xor with all-ones. Since ^ continues to be used for xor, it seems less appealing to give it some other unary meaning. Here, the unary meaning fits together cleanly with the binary.

I did think about a few other punctuation candidates here...

  • ! (since we're going to use not for logical) - seems way too confusing
  • - with some "bits" type, but this just seems confusing and hard to read

One other variation is actually to force writing -1 ^ b explicitly, but that seems unnecessarily frustrating for developers.
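
The identity behind this option can be checked in C++ today. This is a sketch with hypothetical function names: complement is xor with all-ones (written `-1 ^ b` for a signed value), whereas xor with zero leaves the value unchanged.

```cpp
#include <cstdint>

constexpr std::uint32_t kAllOnes = 0xFFFFFFFFu;

// Complement spelled with the existing C++ operator.
constexpr std::uint32_t complement_tilde(std::uint32_t b) { return ~b; }

// The same operation as xor with all-ones, which is what a unary ^b would mean.
constexpr std::uint32_t complement_xor(std::uint32_t b) { return kAllOnes ^ b; }

// The explicit signed spelling mentioned above: -1 has every bit set.
constexpr std::int32_t complement_signed(std::int32_t b) { return -1 ^ b; }

static_assert(complement_tilde(0x0Fu) == complement_xor(0x0Fu), "~b == all-ones ^ b");
static_assert(complement_signed(5) == -6, "-1 ^ 5 == ~5");
static_assert((0u ^ 0x0Fu) == 0x0Fu, "xor with zero is the identity, not complement");
```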

@chandlerc
Contributor

Another candidate approach would be to switch the less common operations to keyword operators. I think my favorite would be:

  • a & b - bitwise and
  • a | b - bitwise or
  • a xor b - bitwise xor
  • compl b - bitwise complement

These have the advantage of exactly matching the alternative operator representations from C++ which seems really nice. They won't collide with C++ variables, etc. And they seem readable and clear.
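
Since these spellings are already valid C++ alternative tokens, the proposed mix can be tried in C++ directly. A sketch with hypothetical function names; some compilers need a conformance flag before they accept the keyword forms.

```cpp
#include <cstdint>

// Symbol-only spelling, as in C++ today.
constexpr std::uint32_t with_symbols(std::uint32_t a, std::uint32_t b) {
    return (a & b) | (~a ^ b);
}

// The mix from the list above: & and | stay symbols, xor and compl become keywords.
constexpr std::uint32_t with_mixed(std::uint32_t a, std::uint32_t b) {
    return (a & b) | (compl a xor b);
}

// Both spellings produce the same tokens, so the results are identical.
static_assert(with_symbols(0x0Fu, 0x3Cu) == with_mixed(0x0Fu, 0x3Cu), "same expression");
```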

If we really want to consider alternatives, bnot is used in at least some other languages. But the vast majority of languages that use a word here use not, which we want to use for the logical operator (and which I think is better there).

@chandlerc
Contributor

FWIW, I mildly prefer the first of these two:

  • consistency
  • useful for overloaded set-like operations
  • reclaiming ^ and using it differently seems difficult without some confusion

But maybe the last point is my lack of imagination. =]

I'd be fine with either though, they both seem solid, reasonably non-inventive, and reasonably friendly to both new programmers and C++ programmers.

@josh11b
Contributor Author

josh11b commented Jul 31, 2021

@chandlerc 0 ^ x == x, not the complement of x

@chandlerc
Contributor

@chandlerc 0 ^ x == x, not the complement of x

Doh, got this backwards. Anyways, 1s ^ x. My point was more that you can use xor for this. LLVM actually does this: https://llvm.org/docs/LangRef.html#xor-instruction

@chandlerc
Contributor

A third option from Kotlin is to use a member function like .inv() instead of compl or unary ^. That could be combined with any desired spelling of xor.

(I forgot about this earlier, sorry.)
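
A rough C++ sketch of what the member-function spelling could look like. `Bits` is a hypothetical wrapper type introduced only for this example; it is not a Carbon or Kotlin API, and only the method name `inv` is borrowed from Kotlin.

```cpp
#include <cstdint>

// Hypothetical wrapper, used only to show the member-function spelling.
struct Bits {
    std::uint32_t value;

    // Complement as a named member instead of a prefix operator.
    constexpr Bits inv() const { return Bits{~value}; }
};

static_assert(Bits{0x0Fu}.inv().value == 0xFFFFFFF0u, "inv() flips every bit");
static_assert(Bits{0x0Fu}.inv().inv().value == 0x0Fu, "inv() twice restores the value");
```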

@josh11b
Contributor Author

josh11b commented Jul 31, 2021

I agree that reclaiming ~ is valuable (especially since it is more commonly used to write destructors in C++), and I don't think bitwise complement and xor between them deserve both ^ and ~. I am fine with either of the two options; I think they are both viable. I think it comes down to whether we have another use in mind for ^.

@chandlerc
Contributor

I think it comes down to whether we have another use in mind for ^.

FWIW, I quite agree. And if we end up using ^ for both and the time comes when a use case really needs it, I'd be willing to churn users with a switch to xor, in the same way I think we should be willing to claim new keywords when good syntax needs them. These are 100% toolable updates and that's part of Carbon's fundamental premise. I think even if we pick ^ today it should be because of a "YAGNI"-style attempt to not presuppose a future use case that we can't even articulate today.

@KateGregory
Contributor

Just to point out, since no-one else has, that C++ allows bitand bitor xor and compl. Searching code bases to see if any of them are used might be interesting. I am fine with ^ for not-related things. IBM keyboards used to do a slightly different symbol for shift-6 (like a sideways 7) that was a mathematical "not".

@josh11b
Contributor Author

josh11b commented Aug 1, 2021

I can accept that argument. I also find ^b similar enough to ¬b (see https://en.wikipedia.org/wiki/List_of_logic_symbols) or "\overline b" (from https://en.wikipedia.org/wiki/Complement_(set_theory)) that it reads fine as bit complement to me (assuming we aren't going to use non-ASCII symbols like ¬).

@jonmeow
Contributor

jonmeow commented Aug 2, 2021

Just to point out, since no-one else has, that C++ allows bitand bitor xor and compl. Searching code bases to see if any of them are used might be interesting.

I believe the practical answer is they're not used: a cursory search shows most uses of xor in C++ files, for example, are actually assembly xor. That's not to say they're not used at all (example), but they're very rare.

Note, this shouldn't be a surprise: and, or, and not are in a similar situation. They are rare, but do exist.

I think it should be assumed that both are essentially unused, <1% of source, possibly <0.1%.


Note, my own leaning would be towards either symbols or keywords, not a mix of both. I think a mix is more likely to cause confusion.

@josh11b
Contributor Author

josh11b commented Aug 2, 2021

We could call ^ the "bit flip operator". ^b flips all of b's bits, a ^ b flips just the bits set to one in the other argument.
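
The "bit flip" reading can be sketched in C++ as follows. `flip` and `flip_all` are hypothetical names; the unary form is simply the binary form with an all-ones mask.

```cpp
#include <cstdint>

// Binary form: flip only the bits of a that are set in mask.
constexpr std::uint32_t flip(std::uint32_t a, std::uint32_t mask) {
    return a ^ mask;
}

// Unary form: flip every bit, i.e. flip with an all-ones mask.
constexpr std::uint32_t flip_all(std::uint32_t b) {
    return flip(b, 0xFFFFFFFFu);
}

// 1010 with mask 0110 becomes 1100: the two middle bits are toggled.
static_assert(flip(0xAu, 0x6u) == 0xCu, "only masked bits change");
// Flipping the same bits twice restores the original value.
static_assert(flip(flip(0xAu, 0x6u), 0x6u) == 0xAu, "flip is its own inverse");
```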

@chandlerc
Contributor

Calling this converged on:

  • a & b - bitwise and
  • a | b - bitwise or
  • a ^ b - bitwise xor
  • ^b - bitwise complement

jonmeow added the "leads question" label (A question for the leads team) Aug 10, 2022
@jonmeow
Contributor

jonmeow commented Aug 11, 2022

I believe this was included in #1191 and no longer needs to be tracked as "Needs proposal".
