RFC: Simpler array hashing #26022

Merged
merged 8 commits into from
Aug 2, 2018

Conversation

mbauman
Member

@mbauman mbauman commented Feb 13, 2018

This is a straw-man implementation of a simpler array hashing scheme. It's very basic and has room for further optimizations, but its intention is to provide a starting point as an alternative to #25822. In short: This hashes the size of the array and then the first three distinct elements and their linear indices and then the last three distinct elements and their linear distance from the end of the array.

The exact scheme here is open to bikeshedding -- we could use more or fewer distinct elements. I use "distinct elements" as a way of ensuring that all relatively empty sparse arrays don't hash similarly, and I hash elements at both the beginning and end of the array because they're the simplest to get to and final elements will be the "most expensive" to discover that they differ if we have to fall back to an equality check. The most complicated part is keeping track of what you hashed in order to prevent running through the entire array twice (once forwards and once backwards).
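
The scheme described above could be sketched roughly as follows. This is not the PR's code: `hash_distinct` and `sketch_hash` are hypothetical names, a small `Vector` stands in for whatever bookkeeping the real implementation uses, and the sketch does not try to avoid visiting elements twice.

```julia
# Hypothetical helper: hash up to `n` distinct elements, each paired with its
# offset in the index sequence `inds`.
function hash_distinct(A, inds, n::Int, h::UInt)
    seen = Any[]                          # tiny, so a linear scan is fine
    for (offset, i) in enumerate(inds)
        length(seen) >= n && break
        x = A[i]
        any(y -> isequal(x, y), seen) && continue
        push!(seen, x)
        h = hash(offset => x, h)          # mix in both position and value
    end
    return h
end

# Hypothetical top-level function: size, then three distinct elements from the
# front, then three distinct elements from the back.
function sketch_hash(A::AbstractArray, h::UInt = zero(UInt))
    h = hash(size(A), h)
    inds = eachindex(IndexLinear(), A)
    h = hash_distinct(A, inds, 3, h)
    h = hash_distinct(A, reverse(inds), 3, h)
    return h
end
```

With this sketch, changing an entry in the middle of a long array leaves the hash unchanged, which is the trade-off discussed in the rest of the thread.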

@mbauman mbauman added the arrays label Feb 13, 2018
@nalimilan
Member

Thanks for this. I certainly appreciate the simplicity of this approach compared with the current one. I think we should keep in mind a third possible approach, though, which is to just drop O(1) hashing for ranges (it's not clear that's very useful in practice) and use a simple solution similar to what was implemented before #16401. As a hybrid solution, we could also continue to hash all elements for multidimensional arrays (as done both before and after #16401), and only change how vectors are hashed, since that's what is needed to hash ranges in O(1) time.

Assuming we adopt the general approach from this PR (i.e. do not hash all elements nor all differences between subsequent elements, but only a part of the information), several other quantities could be included in the hash, with the objective of making collisions less likely when modifying a single entry in the middle of the array:

  • hash the sum of all elements
  • hash the number of nonzero elements
  • hash the indices of series of nonzero elements (i.e. ranges of indices rather than individual indices)

These quantities have the advantage that they can be computed in O(1) time for (integer, at least) ranges and relatively efficiently for sparse matrices. But they require going over the full array, making the computation of hashes O(N) for general arrays while your PR is currently O(1).
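
As a small illustration of why such quantities are cheap for integer ranges, here is a sketch with closed-form helpers. The names `range_sum` and `range_count_nonzero` are hypothetical, the helpers only cover integer unit ranges, and they ignore overflow, which is one of the difficulties raised later in this thread.

```julia
# Hypothetical helper: sum of an integer unit range without visiting its elements.
range_sum(r::AbstractUnitRange{<:Integer}) =
    isempty(r) ? 0 : (first(r) + last(r)) * length(r) ÷ 2

# Hypothetical helper: number of nonzero elements; at most one element can be zero.
range_count_nonzero(r::AbstractUnitRange{<:Integer}) =
    length(r) - (0 in r ? 1 : 0)
```

For a general `Array` the same quantities require a full pass over the data, for example `sum(A)` and `count(!iszero, A)`.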

Overall I think we should identify a few typical use cases to decide which approach is the best one. Without that it is difficult to know what's the best tradeoff between computation cost and probability of collision. In general I'm a bit concerned about the fact that changing the value of entries in the middle of the array wouldn't change the hash: ideally collisions shouldn't happen in such trivial cases IMHO. But maybe that's fine in practice and the reduction in hashing cost is worth it. I'll just leave a pointer to JuliaLang/Distributed.jl#48, which AFAICT isn't a very correct use of array hashing, but which is going to become even more problematic if we make collisions much more likely (it could use a custom algorithm if needed).

Finally, let me note that we still have the possibility of continuing to hash all elements for multidimensional arrays (as was done before and after #16401), and only change how vectors are hashed, since that's what is needed to hash ranges in O(1) time.

Doing some research, I've found only two examples of array hashing in main languages:

# Efficient O(1) method equivalent to the O(N) AbstractArray fallback,
# which works only for ranges with regular step (RangeStepRegular)
function hash_range(r::AbstractRange, h::UInt)
h += hashaa_seed
Member Author

Note to self: I forgot to include the hashaa_seed in the computation

# But we cannot define those here, due to bootstrapping. They're defined later in iterators.jl.
function hashndistinct(A, I, n, hx)
_hashcheckbounds(A, I)
seen = Set()
Member

Would probably be better to use a Vector for this since the number of items is small, and it will avoid hashing them multiple times.

Member Author

Yeah, I was thinking of using a specialized 3-vector to avoid allocations entirely, but figured we could decide on the direction and exact behavior first before optimizing it further.

@JeffreySarnoff
Contributor

JeffreySarnoff commented Feb 14, 2018

This is a good optimization. These are some hash admixes worth consideration.

in some manner, one from one or more of these pairs may help

(a) the shape (dims) # reducingly hash each dim size
(α) the length (prod(dims))

(b) the initial 1st, 2nd, 7th and final 1st (end), 2nd, 7th (end-6) (or missings)
(β) the initial and final two and initial and final fn(length)th element
fn(length) = mod(maxmin(length, 61)...,) + 1

(edited to comport with reality, thank you for the note, Milan)

@nalimilan
Member

@JeffreySarnoff (0), (a) and (α) do not matter for isequal, so they definitely cannot enter the hash.

@mbauman
Member Author

mbauman commented Feb 14, 2018

So the options and their tradeoffs here are:

  • If we don't hash every single element, it's easy to mutate an array and have its hash remain the same. That's just fine, though, since hash collisions are allowed and will fall back to equality checks. We just don't want it to happen too often, since those equality checks could be expensive to repeatedly perform.
  • If we hash every single element, it's very expensive for sparse matrices to compute every single element. Making hashing ranges O(N) might make an otherwise cromulent structure like 1:typemax(Int) stall till the heat death of the universe if you happen to put it in a set.
  • If we do something clever, we have to be doubly clever in order to deal with heterogeneous arrays of things that might not support the cleverness or might overflow/underflow/change precision (e.g., subtraction for diff, addition for sum, promotion and precision difficulties in both cases). The only sort of cleverness I'd advocate for is the kind that only relies upon hashing and equality — like run-length encoding or distinct elements.

I'm becoming more convinced that an O(1) hashing method is the way to go here. I'd happily extend the number of elements to help assuage you.
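
To make the first bullet concrete, here is a minimal sketch showing that hash collisions cost time but not correctness. `Colliding` is a hypothetical type invented for this example; every instance deliberately hashes to the same value, so a `Set` has to rely on `isequal` to tell elements apart.

```julia
# Hypothetical wrapper type whose instances all share one hash value.
struct Colliding
    x::Int
end
Base.hash(::Colliding, h::UInt) = hash(0x1234, h)   # deliberately terrible

# Every insertion and lookup now falls back to equality checks,
# yet membership is still decided correctly.
s = Set(Colliding(i) for i in 1:100)
```

Lookups such as `Colliding(5) in s` still give the right answer; they just degrade towards a linear scan, which is the cost being weighed here.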

@nalimilan
Member

I generally agree, the current approach is too brittle and complex. I'm still a bit sad to move to an approach which makes it so easy to get collisions. Maybe I just need some time to accept this. ;-)

Another less fragile possibility which occurred to me would be to use another property of ranges, which is less strict than computing differences between subsequent elements but also less problematic to check: the fact that ranges are monotonically increasing or decreasing. We could continue to hash all elements for vectors which are not sorted and multidimensional arrays (keeping RLE for efficiency with sparse matrices of course), but use the O(1) approach for sorted vectors (for element types which support isless). In practice, it should be quite rare that a user would hash several ordered vectors of the same length and which start and end with the same elements, so this should dramatically reduce the risk of collisions.

@JeffreySarnoff
Contributor

I agree that losing discrimination is a negative (there are algorithms that thrive on some probable assurance of different things hashing differently, so that things need be visited only once to be useful with other things of a group).

Increasing the span at front and back that are hashed, and speckle-ing some of the ith quartile positions (or a few thereabout) should do a decent job of discrimination, especially where the length>>some mod a_length_relevant_prime is used to pick the span between successive hash-sampled values (or value pairs).

Or for speed, keep a sequence of indices to pluck and let them wrap on the length.

@JeffBezanson
Member

One option might be to use exponentially-spaced indices, and hash log(n) points.
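
A minimal sketch of what exponentially spaced sampling could look like, assuming a 1-based vector. `exp_indices` and `hash_exp_spaced` are hypothetical names, not functions from Base or from this PR.

```julia
# Hypothetical helper: the linear indices 1, 2, 4, 8, ... that do not exceed n.
function exp_indices(n::Integer)
    inds = Int[]
    n < 1 && return inds
    i = 1
    while true
        push!(inds, i)
        i > n - i && break      # 2i would exceed n; written this way to avoid overflow
        i += i
    end
    return inds
end

# Hypothetical sketch: hash only the elements at those indices.
function hash_exp_spaced(A::AbstractVector, h::UInt = zero(UInt))
    h = hash(length(A), h)
    for i in exp_indices(length(A))
        h = hash(A[i], h)
    end
    return h
end
```

Even for a range of `typemax(Int)` elements this touches only 63 entries, but a sparse vector would mostly contribute zeros, which is the concern raised further down.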

@JeffreySarnoff
Contributor

... and interweave those indices, taking them from [1] increasing and from [end] decreasing, to disambiguate small vectors and similar arrays more successfully?

@JeffreySarnoff
Contributor

struct EntityHash{T<:Union{UInt64, Missing}}
    eachentity::UInt64  # every hashable thing has this valued
    collective::T       # an aggregative thing may have this valued
                        #    lazily done only if needed or where
                        #    the e.g. write stream does the calc
end

Things that are items or elements have sizeof(EntityHash) == sizeof(UInt64)
Things that are collectives and are more fully hashed have twice that size

@martinholters
Member

> We could continue to hash all elements for vectors which are not sorted and multidimensional arrays (keeping RLE for efficiency with sparse matrices of course), but use the O(1) approach for sorted vectors (for element types which support isless).

Fails for the totally_not_five from #26034. (Not that that example seems particularly relevant, but the fact that it is so easy to define an object which breaks the approach is concerning.)

@mbauman
Member Author

mbauman commented Feb 16, 2018

Taking log(n) hashes is quite compelling — particularly if we increase distance between hashes as we iterate from the back of the array, since that weights the elements involved in the hash towards the "most expensive" to differentiate with isequal. The major downside I see there is that many sparse matrices will have fewer than log(n) stored values.

What if we used a combination of these approaches?

  • Given array A, consider the list of hashes hxs that contains the hashes of those elements that are distinct from their neighbor, working backwards from the end of the array. That is:
    hxs = [hash(A[end]); [hash(A[i]) for i in lastindex(A)-1:-1:firstindex(A) if hash(A[i]) != hash(A[i+1]) && A[i] != A[i+1]]]
  • hash(A) is sum(hxs[[2^i for i=0:floor(Int, log2(end))]])

Of course, you would never compute it this way, but it can be done iteratively.

This has the slightly strange effect that we compute the hash of every element for Array but don't use them all in the computation, but I think it's way better than taking the difference of every element. The advantages are: fast iteration through sparse arrays (often just alternating between a hash of zero and a hash of a stored value), and we only need to hash O(log(N)) elements in computed arrays where we know successive elements are not equal.
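
One possible iterative reading of the two bullets above is sketched here. The name `hash_pow2_distinct` is hypothetical, the sketch uses `isequal` on neighbors rather than comparing hashes first, and it still walks the whole vector, as the comment itself notes.

```julia
# Hypothetical sketch: walk backwards, number the elements that differ from
# their neighbor, and sum the hashes of the 1st, 2nd, 4th, 8th, ... of them.
function hash_pow2_distinct(A::AbstractVector, h::UInt = zero(UInt))
    isempty(A) && return h
    acc = zero(UInt)
    k = 0                      # count of neighbor-distinct elements seen so far
    prev = nothing
    for i in lastindex(A):-1:firstindex(A)
        x = A[i]
        if k > 0 && isequal(x, prev)
            continue           # same as the neighbor we just visited
        end
        k += 1
        prev = x
        if ispow2(k)
            acc += hash(x)     # wrapping UInt addition, as in sum(hxs[...])
        end
    end
    return hash(acc, h)
end
```

For a mostly empty sparse vector the loop alternates between zero and a stored value, so the selected positions land on stored values far more often than blind sampling would.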

@mbauman
Member Author

mbauman commented Feb 16, 2018

The proposal in my last post still leaves a bad taste in my mouth. If we're going to do log(N), let's make it work for all array types. Here's another idea:

  • Iterate the array backwards
  • Add the hashes of the first 3 (or so) distinct values you see, stopping at index j
  • Launch into a blind log(N) hashing regime starting at index j, adding sum(hash.(A[[j - 2^i for i in 0:floor(Int, log2(end-j))]]))

@StefanKarpinski
Member

I'm a bit concerned that hashing log(N) entries in a sufficiently sparse array will tend to catch none of the non-zeros.

@mbauman
Member Author

mbauman commented Feb 16, 2018

Right — that's my second proposal. In short: Change this PR (which does O(1) hashing of some small number of distinct elements from the front and back) to ignore the forward iteration and augment the backwards iteration with log(N) hashes after it's hashed the small number of distinct elements.

@nalimilan
Member

That's an interesting solution. There are a few things I don't understand:

  • Why do you need to iterate backwards?
  • Why do you compare the hashes of subsequent elements before comparing them directly?
  • Why sum the hashes rather than combine them with hash?

@mbauman
Member Author

mbauman commented Feb 16, 2018

Great questions:

  • I iterate backwards because of the asymmetry between == and hash. In the case of a hash collision, we fall back to ==. By default and most commonly, == iteratively compares its elements walking forwards through the array. This means that — if the arrays are indeed different — it'll probably find that faster. I would bet this is why C# hashes the last 8 elements.
  • I compare hashes because I had been thinking we were already computing them — either as part of the total array hash or in a Set to hold the distinct elements. Might as well use them!
  • I sum h += hash(A[i]) instead of chaining with hash(A[i], h) in order to allow for the reuse of the hashes.

But you're right — it's probably not worth hashing elements that don't get used in the total hash, and we probably don't want to use a Set for such a limited number of elements and comparisons.
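
The difference between the two combination styles in the last bullet can be seen in a tiny sketch. `summed` and `chained` are hypothetical helper names. Summing lets precomputed element hashes be reused, but it is insensitive to element order, whereas chaining feeds each intermediate result into the next call.

```julia
# Hypothetical helpers contrasting the two ways of combining element hashes.
summed(v)  = foldl((h, x) -> h + hash(x), v; init = zero(UInt))  # order-insensitive
chained(v) = foldl((h, x) -> hash(x, h), v; init = zero(UInt))   # order-sensitive

xs = [1, 2, 3]
same_sum = summed(xs) == summed(reverse(xs))   # always true: addition commutes
```

With chaining, a permuted vector will generally produce a different value, which is one reason to hash index-value pairs if sums are used.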

@nalimilan
Member

I think we should proceed with any reasonable solution so that the API is stabilized and BigFloat can be moved to stdlib (#25716). We can always switch to a different approach in the future since it's not breaking.

@JeffreySarnoff
Contributor

> I'm a bit concerned that hashing log(N) entries in a sufficiently sparse array will tend to catch none of the non-zeros.

Unclear that log(N) is the "hatrack" .. the same would be true of sqrt(N) samples.

Milan's push to get this one done so other contingencies can move along makes sense. It all sounds non-blocking and pathway providing.

Maybe it is worthwhile to keep a short vector of linearized indices to nonzero values to provide informed indirection. 32 or 64 Int64s could do that for a sparse array of 10 trillion elements. If the data structure knows the index of its first and its final nonzero entries, two of those indices could be rewritten when that internal state is rewritten. Two can get two more (the next innermost) and the structure may find that useful too, giving the hash at worst 4 nonzero values and their linearized indexings -- 8 nonzero Ints to hash or better

@nalimilan
Member

@mbauman Do you plan to finish this, or do you need help?

@JeffBezanson JeffBezanson added this to the 1.0.x milestone May 30, 2018
@mbauman
Member Author

mbauman commented Jun 29, 2018

Yes, let's resurrect this. I have a WIP branch locally where I started implementing #26022 (comment) a while ago, but I stalled on it (and then forgot about it) because I didn't like how hard it was to either implement or describe. I'll take a second look this afternoon, and see if I can find some simplifications.

@StefanKarpinski
Member

Here's another thought: hash arrays in such a manner that if an array approximates a range, it hashes like it and just take the hit of those two objects being in the same bucket. After all, what are the chances that you're hashing both arrays and ranges together and have an array that is almost equal to a range but not quite exactly equal to it? That's the only bad case with such an approach.

@oscardssmith
Member

Won't that fail because any bucket will have an edge that will have the same precision problems that you were trying to avoid?

…ments

    Goal: Hash approximately log(N) entries with a higher density of hashed elements
    weighted towards the end and special consideration for repeated values. Colliding
    hashes will often subsequently be compared by equality -- and equality between arrays
    works elementwise forwards and is short-circuiting. This means that a collision
    between arrays that differ by elements at the beginning is cheaper than one where the
    difference is towards the end. Furthermore, blindly choosing log(N) entries from a
    sparse array will likely only choose the same element repeatedly (zero in this case).

    To achieve this, we work backwards, starting by hashing the last element of the
    array. After hashing each element, we skip the next `fibskip` elements, where
    `fibskip` is pulled from the Fibonacci sequence -- Fibonacci was chosen as a simple
    ~O(log(N)) algorithm that ensures we don't hit a common divisor of a dimension and
    only end up hashing one slice of the array (as might happen with powers of two).
    Finally, we find the next distinct value from the one we just hashed.
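
The commit message above can be read as the following rough sketch for plain vectors. `fib_hash_sketch` is a hypothetical name, and the sketch simplifies the real change: it only handles `AbstractVector`, uses integer indices directly, and advances the Fibonacci skip after every hashed element.

```julia
# Hypothetical sketch of the backwards, Fibonacci-skipping scheme.
function fib_hash_sketch(A::AbstractVector, h::UInt = zero(UInt))
    h = hash(size(A), h)
    isempty(A) && return h
    i = lastindex(A)
    fibskip, prevfibskip = 1, 0
    while true
        elt = A[i]
        h = hash(i => elt, h)               # hash the index-value pair
        i -= fibskip                        # jump towards the front
        i < firstindex(A) && break
        fibskip, prevfibskip = fibskip + prevfibskip, fibskip
        # Move on to the nearest element that differs from the one just hashed.
        j = findprev(!isequal(elt), A, i)
        j === nothing && break
        i = j
    end
    return h
end
```

Because of the `findprev` step, a vector of zeros with a single stored value still gets that value mixed into the hash, while a range and its `collect`ed copy follow the same path and hash identically.
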
@mbauman
Member Author

mbauman commented Jul 11, 2018

Thanks @nalimilan — I like the design you proposed in #26022 (comment) much better.

fibskip, prevfibskip = fibskip + prevfibskip, fibskip

# Find a key index with a value distinct from `elt` -- might be `keyidx` itself
keyidx = findprev(!isequal(elt), A, keyidx)
Member

So IIUC this finds the first entry after keyidx which differs from the last hashed element. That's a bit different from what I had in mind: I was thinking about finding the first element which differs from the one at keyidx. I'm not sure it's really better, but the idea was that since in a sparse array it's likely that keyidx hits a structural zero, looking for the previous distinct element makes it likely you'll hash a non-zero entry. With your approach, if you hash a zero the first time, you will look for the previous non-zero entry the next time, but if you hit a zero entry the time after that, you'll happily hash it; so you'll end up hashing a zero half of the time, right?

Member Author

Yes, that's exactly right. I think it makes the behavior a little more robust — otherwise the hashes of sparse arrays with a nonzero last element will more likely hash the same. I also think it's most likely that diagonals of sparse matrices are filled.

That said, this is now clearly not hashing enough elements. The test failures are from four-element arrays colliding. Gotta slow down the exponential a little bit.

Member

> Yes, that's exactly right. I think it makes the behavior a little more robust — otherwise the hashes of sparse arrays with a nonzero last element will more likely hash the same. I also think it's most likely that diagonals of sparse matrices are filled.

Sorry, I'm not sure I follow. Could you develop?

> That said, this is now clearly not hashing enough elements. The test failures are from four-element arrays colliding. Gotta slow down the exponential a little bit.

Agreed.

Member

BTW, why use findprev, which requires all this keys vs. linear indices dance, instead of a plain loop? I imagine one reason could be that findprev could have a specialized method for sparse arrays which would skip empty ranges, but currently that's not the case. Is that what you have in mind?

Member Author

Ah, I misunderstood your comment at first — I thought you were suggesting only unique'ing against the very first element. Now I understand that you mean to access the value after each skip, and then hash the next element that's different from it.

There are two reasons to do the keys vs. linear indices dance: one is findprev, but the other is that I want to hash index-value pairs to add a bit more information about the structure of the array. And of course, we cannot introduce a hashing difference between IndexLinear and IndexCartesian arrays. The fact that we then also allow arrays to opt into optimizations via findprev is a nice bonus, especially since it's going to be a pain to completely re-write your own hash optimization that has exactly the same behavior.

@mbauman
Member Author

mbauman commented Jul 13, 2018

Ok, I've slowed down that exponential — by quite a bit. I now only grab the next number in the Fibonacci sequence after every 4096 hashes. I chose this number by considering:

  • The large anticipated penalty for a hash collision. I wanted to maximize the size of the array that gets 100% of its elements hashed… especially in light of the issue "remotecall_fetch vulnerable to hash collisions?" (JuliaLang/Distributed.jl#48).
    • Every element in arrays smaller than length 4096 (64x64 or 8x8x8x8) is included in the hash computation.
    • For an array of size 8192 with no sequentially repeated elements, 75% of its elements are hashed: every other element is hashed in the first half of the array, and every element is hashed in the second half.
  • The time it takes to hash ranges with typemax(Int) elements as they used to be instantaneous — so I wanted to keep this at least somewhat tractable. With this choice on my beefy machine, it takes ~0.5 s to hash (1:typemax(Int64))/pi, and ~10 ms to hash 1:typemax(Int64).
    • For an array of a billion elements with no repeats, 99,915 elements are hashed (10^5/10^9 — 0.01%)
    • For an array of typemax(Int) elements with no repeats, 276,303 elements are hashed (10^5.5/10^19).

This is the balance point that seems reasonable to me — a power of two just because it's slightly cheaper to perform the mod.
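
The coverage figures above can be explored with a simplified counting model. `count_hashed` is a hypothetical helper that assumes no two neighboring elements are equal (so the distinct-element search never jumps) and that the skip starts at one and advances to the next Fibonacci number after every 4096 hashes. It counts positions only and is not the PR's implementation, so its totals may differ slightly from the numbers quoted.

```julia
# Hypothetical model: how many positions get visited for an array of `len`
# elements with no repeated neighbors.
function count_hashed(len::Integer; every::Int = 4096)
    len <= 0 && return 0
    i = len
    fibskip, prevfibskip = 1, 1
    n = 0
    while i >= 1
        n += 1                              # position i is hashed
        if rem(n, every) == 0               # slow down: grow the skip only now
            fibskip, prevfibskip = fibskip + prevfibskip, fibskip
        end
        i -= fibskip
    end
    return n
end
```

In this model a 4096-element array is hashed in full and an 8192-element array has 6144 positions visited (75%), in line with the sub-bullets above; for a billion elements the count stays on the order of 10^5.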

@nalimilan
Member

nalimilan commented Jul 14, 2018

Makes sense. I wonder whether we couldn't find a function which increases slowly for common sizes, but much faster when sizes get really large. Or maybe we could just apply an arbitrary threshold beyond which we stop hashing entries? A Vector{UInt8} with typemax(Int) elements takes 8e6 TB, which hardware won't be able to handle for decades or centuries. It wouldn't be terrible if we stopped at typemax(Int)/1e6 elements.

EDIT: the goal being to ensure even the largest possible range doesn't take one second to hash.

# entirely-distinct 8000-element array will have ~75% of its elements hashed,
# with every other element hashed in the first half of the array. At the same
# time, hashing a `typemax(Int64)`-length Float64 range takes about a second.
if mod(n, 4096) == 0
Member

Since performance is important here, why not replace the mod with explicit bit twiddling? In this case I would suggest replacing this line with:

if (Int==Int64 && n<<52==0) || n<<20==0

That's because 4096 = 2^12 and 52 = 64 - 12 and 20 = 32 - 12. Also the Int==Int64 part will get inlined so this will be effectively replaced by either n<<52==0 or n<<20==0.

Member

If you use rem(n, 4096) == 0 the compiler takes care of the rest.

Member

I always forget about the rem and %
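
For reference, the three formulations discussed in this review thread agree on every `Int`. The function names below are hypothetical, and the shift variant uses `sizeof(Int)` so that one definition covers both 32-bit and 64-bit builds.

```julia
# Hypothetical helpers: three equivalent "is n a multiple of 4096?" checks.
every4096_rem(n::Int)   = rem(n, 4096) == 0
every4096_mask(n::Int)  = (n & 4095) == 0                       # low 12 bits are zero
every4096_shift(n::Int) = (n << (8 * sizeof(Int) - 12)) == 0    # shift away all but the low 12 bits
```

Since 4096 is a constant power of two, writing `rem(n, 4096) == 0` keeps the intent readable and leaves the strength reduction to the compiler, as suggested above.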

@JeffBezanson
Member

Bump. Could we add the test cases from #27865, #26011, and #16401 (comment) and merge this?

@mbauman
Member Author

mbauman commented Jul 26, 2018

Ok, test cases pushed. There are, of course, many other schemes that we could dream up here… but triage was in support of merging this as it stands.

To help avoid JuliaLang/Distributed.jl#48, I'm putting together another PR that will basically make Distributed use its own hashing system — the constraints are a little different there since we don't need to worry about hash equality across different types. I don't think the two need to be merged at the same time, but I should have that ready to go by tomorrow.

@JeffBezanson JeffBezanson merged commit b0bf91e into master Aug 2, 2018
@JeffBezanson JeffBezanson deleted the mb/simplerarrayhashing branch August 2, 2018 15:34
KristofferC pushed a commit that referenced this pull request Feb 11, 2019
Fixes #27865 and fixes #26011.

Fixes #26034
@nalimilan nalimilan mentioned this pull request Oct 9, 2023
5 tasks
nsajko added a commit to nsajko/julia that referenced this pull request Sep 25, 2024
Implementing `widen` isn't a requirement any more, since JuliaLang#26022.
oscardssmith pushed a commit that referenced this pull request Oct 10, 2024
Implementing `widen` isn't a requirement any more, since #26022.
KristofferC pushed a commit that referenced this pull request Oct 18, 2024
Implementing `widen` isn't a requirement any more, since #26022.

(cherry picked from commit e95860c)
udesou added a commit to mmtk/julia that referenced this pull request Oct 22, 2024
* Add filesystem func to transform a path to a URI (#55454)

In a few places across Base and the stdlib, we emit paths that we like
people to be able to click on in their terminal and editor. Up to this
point, we have relied on auto-filepath detection, but this does not
allow for alternative link text, such as contracted paths.

Doing so (via OSC 8 terminal links for example) requires filepath URI
encoding.

This functionality was previously part of a PR modifying stacktrace
printing (#51816), but after that became held up for unrelated reasons
and another PR appeared that would benefit from this utility (#55335),
I've split out this functionality so it can be used before the
stacktrace printing PR is resolved.

* constrain the path argument of `include` functions to `AbstractString` (#55466)

Each `Module` defined with `module` automatically gets an `include`
function with two methods. Each of those two methods takes a file path
as its last argument. Even though the path argument is unconstrained by
dispatch, it's documented as constrained with `::AbstractString`:

https://docs.julialang.org/en/v1.11-dev/base/base/#include

Furthermore, I think that any invocation of `include` with a
non-`AbstractString` path will necessarily throw a `MethodError`
eventually. Thus this change should be harmless.

Adding the type constraint to the path argument is an improvement
because any possible exception would be thrown earlier than before.

Apart from modules defined with `module`, the same issue is present with
the anonymous modules created by `evalfile`, which is also addressed.

Sidenote: `evalfile` seems to be completely untested apart from the test
added here.

Co-authored-by: Florian <[email protected]>

* Mmap: fix grow! for non file IOs (#55849)

Fixes https://github.com/JuliaLang/julia/issues/54203
Requires #55641

Based on
https://github.com/JuliaLang/julia/pull/55641#issuecomment-2334162489
cc. @JakeZw @ronisbr

---------

Co-authored-by: Jameson Nash <[email protected]>

* codegen: split gc roots from other bits on stack (#55767)

In order to help avoid memory provenance issues, and better utilize
stack space (somewhat), and use FCA less, change the preferred
representation of an immutable object to be a pair of
`<packed-data,roots>` values. This packing requires some care at the
boundaries and if the expected field alignment exceeds that of a
pointer. The change is expected to eventually make codegen more flexible
at representing unions of values with both bits and pointer regions.

Eventually we can also have someone improve the late-gc-lowering pass to
take advantage of this increased information accuracy, but currently it
will not be any better than before at laying out the frame.

* Refactoring to be considered before adding MMTk

* Removing jl_gc_notify_image_load, since it's a new function and not part of the refactoring

* Moving gc_enable code to gc-common.c

* Addressing PR comments

* Push resolution of merge conflict

* Removing jl_gc_mark_queue_obj_explicit extern definition from scheduler.c

* Don't need the getter function since it's possible to use jl_small_typeof directly

* WIP: Adding support for MMTk/Immix

* Refactoring to be considered before adding MMTk

* Adding fastpath allocation

* Fixing removed newlines

* Refactoring to be considered before adding MMTk

* Adding a few comments; Moving some functions to be closer together

* Fixing merge conflicts

* Applying changes from refactoring before adding MMTk

* Update TaskLocalRNG docstring according to #49110 (#55863)

Since #49110, which is included in 1.10 and 1.11, spawning a task no
longer advances the parent task's RNG state, so this statement in the
docs was incorrect.

* Root globals in toplevel exprs (#54433)

This fixes #54422, the code here assumes that top level exprs are always
rooted, but I don't see that referenced anywhere else, or guaranteed, so
conservatively always root objects that show up in code.

* codegen: fix alignment typos (#55880)

So easy to type jl_datatype_align to get the natural alignment instead
of julia_alignment to get the actual alignment. This should fix the
Revise workload.

Change is visible with
```
julia> code_llvm(Random.XoshiroSimd.forkRand, (Random.TaskLocalRNG, Base.Val{8}))
```

* Fix some corner cases of `isapprox` with unsigned integers (#55828)

* 🤖 [master] Bump the Pkg stdlib from ef9f76c17 to 51d4910c1 (#55896)

* Profile: fix order of fields in heapsnapshot & improve formatting (#55890)

* Profile: Improve generation of clickable terminal links (#55857)

* inference: add missing `TypeVar` handling for `instanceof_tfunc` (#55884)

I thought these sort of problems had been addressed by d60f92c, but it
seems some were missed. Specifically, `t.a` and `t.b` from `t::Union`
could be `TypeVar`, and if they are passed to a subroutine or recursed
without being unwrapped or rewrapped, errors like JuliaLang/julia#55882
could occur.

This commit resolves the issue by calling `unwraptv` in the `Union`
handling within `instanceof_tfunc`. I also found a similar issue inside
`nfields_tfunc`, so that has also been fixed, and test cases have been
added. While I haven't been able to make up a test case specifically for
the fix in `instanceof_tfunc`, I have confirmed that this commit
certainly fixes the issue reported in JuliaLang/julia#55882.

- fixes JuliaLang/julia#55882

* Install terminfo data under /usr/share/julia (#55881)

Just like all other libraries, we don't want internal Julia files to
mess with system files.

Introduced by https://github.com/JuliaLang/julia/pull/55411.

* expose metric to report reasons why full GCs were triggered (#55826)

Additional GC observability tool.

This will help us to diagnose why some of our servers are triggering so
many full GCs in certain circumstances.

* Revert "Improve printing of several arguments" (#55894)

Reverts JuliaLang/julia#55754 as it overrode some performance heuristics
which appeared to be giving a significant gain/loss in performance:
Closes https://github.com/JuliaLang/julia/issues/55893

* Do not trigger deprecation warnings in `Test.detect_ambiguities` and `Test.detect_unbound_args` (#55869)

#55868

* do not intentionally suppress errors in precompile script from being reported or failing the result (#55909)

I was slightly annoyed that the build was set up to succeed if this
step failed, so I removed the error suppression and fixed up the script
slightly.

* Remove eigvecs method for SymTridiagonal (#55903)

The fallback method does the same, so this specialized method isn't
necessary

* add --trim option for generating smaller binaries (#55047)

This adds a command line option `--trim` that builds images where code
is only included if it is statically reachable from methods marked using
the new function `entrypoint`. Compile-time errors are given for call
sites that are too dynamic to allow trimming the call graph (however
there is an `unsafe` option if you want to try building anyway to see
what happens).

The PR has two other components. One is changes to Base that generally
allow more code to be compiled in this mode. These changes will either
be merged in separate PRs or moved to a separate part of the workflow
(where we will build a custom system image for this purpose). The branch
is set up this way to make it easy to check out and try the
functionality.

The other component is everything in the `juliac/` directory, which
implements a compiler driver script based on this new option, along with
some examples and tests. This will eventually become a package "app"
that depends on PackageCompiler and provides a CLI for all of this
stuff, so it will not be merged here. To try an example:

```
julia contrib/juliac.jl --output-exe hello --trim test/trimming/hello.jl
```

When stripped the resulting executable is currently about 900kb on my
machine.

Also includes a lot of work by @topolarity

---------

Co-authored-by: Gabriel Baraldi <[email protected]>
Co-authored-by: Tim Holy <[email protected]>
Co-authored-by: Cody Tapscott <[email protected]>

* fix rawbigints OOB issues (#55917)

Fixes issues introduced in #50691 and found in #55906:
* use `@inbounds` and `@boundscheck` macros in rawbigints, for catching
OOB with `--check-bounds=yes`
* fix OOB in `truncate`

* prevent loading other extensions when precompiling an extension (#55589)

The current way of loading extensions when precompiling an extension
very easily leads to cycles. For example, if you have more than one
extension and you happen to transitively depend on the triggers of one
of your extensions you will immediately hit a cycle where the extensions
will try to load each other indefinitely. This is an issue because you
cannot directly influence your transitive dependency graph so from this
point of view the current system of loading extensions is "unsound".

The test added here checks this scenario and we can now precompile and
load it without any warnings or issues.

Would have made https://github.com/JuliaLang/julia/issues/55517 a
non-issue.

Fixes https://github.com/JuliaLang/julia/issues/55557

---------

Co-authored-by: KristofferC <[email protected]>

* TOML: Avoid type-pirating `Base.TOML.Parser` (#55892)

Since stdlibs can be duplicated but Base never is, `Base.require_stdlib`
makes type piracy even more complicated than it normally would be.

To adapt, this changes `TOML.Parser` to be a type defined by the TOML
stdlib, so that we can define methods on it without committing
type-piracy and avoid problems like Pkg.jl#4017

Resolves
https://github.com/JuliaLang/Pkg.jl/issues/4017#issuecomment-2377589989

* [FileWatching] fix PollingFileWatcher design and add workaround for a stat bug

What started as an innocent fix for a stat bug on Apple (#48667) turned
into a full blown investigation into the design problems with the libuv
backend for PollingFileWatcher, and writing my own implementation of it
instead which could avoid those single-threaded concurrency bugs.

* [FileWatching] fix FileMonitor similarly and improve pidfile reliability

Previously pidfile used the same poll_interval as sleep to detect if
this code made any concurrency mistakes, but we do not really need to do
that once FileMonitor is fixed to be reliable in the presence of
parallel concurrency (instead of using watch_file).

* [FileWatching] reorganize file and add docs

* Add `--trace-dispatch` (#55848)

* relocation: account for trailing path separator in depot paths (#55355)

Fixes #55340

* change compiler to be stackless (#55575)

This change ensures the compiler uses very little stack, making it
compatible with running on any arbitrary system stack size and depths
much more reliably. It also could be further modified now to easily add
various forms of pause-able/resumable inference, since there is no
implicit state on the stack--everything is local and explicit now.

Whereas before, less than 900 frames would crash in less than a second:
```
$ time ./julia -e 'f(::Val{N}) where {N} = N <= 0 ? 0 : f(Val(N - 1)); f(Val(1000))'
Warning: detected a stack overflow; program state may be corrupted, so further execution might be unreliable.
Internal error: during type inference of
f(Base.Val{1000})
Encountered stack overflow.
This might be caused by recursion over very long tuples or argument lists.

[23763] signal 6: Abort trap: 6
in expression starting at none:1
__pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)
Allocations: 1 (Pool: 1; Big: 0); GC: 0
Abort trap: 6

real	0m0.233s
user	0m0.165s
sys	0m0.049s
```

Now: it is effectively unlimited, as long as you are willing to wait for
it:
```
$ time ./julia -e 'f(::Val{N}) where {N} = N <= 0 ? 0 : f(Val(N - 1)); f(Val(50000))'
info: inference of f(Base.Val{50000}) from f(Base.Val{N}) where {N} exceeding 2500 frames (may be slow).
info: inference of f(Base.Val{50000}) from f(Base.Val{N}) where {N} exceeding 5000 frames (may be slow).
info: inference of f(Base.Val{50000}) from f(Base.Val{N}) where {N} exceeding 10000 frames (may be slow).
info: inference of f(Base.Val{50000}) from f(Base.Val{N}) where {N} exceeding 20000 frames (may be slow).
info: inference of f(Base.Val{50000}) from f(Base.Val{N}) where {N} exceeding 40000 frames (may be slow).
real	7m4.988s

$ time ./julia -e 'f(::Val{N}) where {N} = N <= 0 ? 0 : f(Val(N - 1)); f(Val(1000))'
real	0m0.214s
user	0m0.164s
sys	0m0.044s

$ time ./julia -e '@noinline f(::Val{N}) where {N} = N <= 0 ? GC.safepoint() : f(Val(N - 1)); f(Val(5000))'
info: inference of f(Base.Val{5000}) from f(Base.Val{N}) where {N} exceeding 2500 frames (may be slow).
info: inference of f(Base.Val{5000}) from f(Base.Val{N}) where {N} exceeding 5000 frames (may be slow).
real	0m8.609s
user	0m8.358s
sys	0m0.240s
```
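
The idea of keeping pending work in explicit, heap-allocated state rather than on the native call stack can be sketched with a toy example. This is an illustration only, not the compiler's implementation; `depth_recursive` and `depth_explicit` are hypothetical names.

```julia
# Recursive version: uses one native stack frame per level, so very
# deep inputs can overflow the system stack.
depth_recursive(n) = n <= 0 ? 0 : 1 + depth_recursive(n - 1)

# Explicit version: pending work lives in a heap-allocated Vector,
# so the depth is limited by memory rather than by the stack size.
function depth_explicit(n)
    pending = Int[n]
    result = 0
    while !isempty(pending)
        k = pop!(pending)
        if k > 0
            result += 1
            push!(pending, k - 1)
        end
    end
    return result
end

depth_explicit(1_000_000)
```

Because all state is held in `pending` and `result`, a loop like this could in principle be paused and resumed between iterations.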

* optimizer: simplify the finalizer inlining pass a bit (#55934)

Minor adjustments have been made to the algorithm of the finalizer
inlining pass. Previously, it required that the finalizer registration
dominate all uses, but this is not always necessary as long as the
finalizer inlining point dominates all the uses. So the check has been
relaxed. Other minor fixes have been made as well, but their importance
is low.

* Limit `@inbounds` to indexing in the dual-iterator branch in `copyto_unaliased!` (#55919)

This simplifies the `copyto_unaliased!` implementation where the source
and destination have different `IndexStyle`s, and limits the `@inbounds`
to only the indexing operation. In particular, the iteration over
`eachindex(dest)` is not marked as `@inbounds` anymore. This seems to
help with performance when the destination uses Cartesian indexing.
Reduced implementation of the branch:
```julia
function copyto_proposed!(dest, src)
    axes(dest) == axes(src) || throw(ArgumentError("incompatible sizes"))
    iterdest, itersrc = eachindex(dest), eachindex(src)
    for (destind, srcind) in zip(iterdest, itersrc)
        @inbounds dest[destind] = src[srcind]
    end
    dest
end

function copyto_current!(dest, src)
    axes(dest) == axes(src) || throw(ArgumentError("incompatible sizes"))
    iterdest, itersrc = eachindex(dest), eachindex(src)
    ret = iterate(iterdest)
    @inbounds for a in src
        idx, state = ret::NTuple{2,Any}
        dest[idx] = a
        ret = iterate(iterdest, state)
    end
    dest
end

function copyto_current_limitinbounds!(dest, src)
    axes(dest) == axes(src) || throw(ArgumentError("incompatible sizes"))
    iterdest, itersrc = eachindex(dest), eachindex(src)
    ret = iterate(iterdest)
    for isrc in itersrc
        idx, state = ret::NTuple{2,Any}
        @inbounds dest[idx] = src[isrc]
        ret = iterate(iterdest, state)
    end
    dest
end
```
```julia
julia> a = zeros(40000,4000); b = rand(size(a)...);

julia> av = view(a, UnitRange.(axes(a))...);

julia> @btime copyto_current!($av, $b);
  617.704 ms (0 allocations: 0 bytes)

julia> @btime copyto_current_limitinbounds!($av, $b);
  304.146 ms (0 allocations: 0 bytes)

julia> @btime copyto_proposed!($av, $b);
  240.217 ms (0 allocations: 0 bytes)

julia> versioninfo()
Julia Version 1.12.0-DEV.1260
Commit 4a4ca9c8152 (2024-09-28 01:49 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i5-10310U CPU @ 1.70GHz
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, skylake)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
  JULIA_EDITOR = subl
```
I'm not quite certain why the proposed implementation here
(`copyto_proposed!`) is even faster than
`copyto_current_limitinbounds!`. In any case, `copyto_proposed!` is
easier to read, so I'm not complaining.

This fixes https://github.com/JuliaLang/julia/issues/53158

* Strong zero in Diagonal triple multiplication (#55927)

Currently, triple multiplication with a `LinearAlgebra.BandedMatrix`
sandwiched between two `Diagonal`s isn't associative, as this is
implemented using broadcasting, which doesn't assume a strong zero,
whereas the two-term matrix multiplication does.
```julia
julia> D = Diagonal(StepRangeLen(NaN, 0, 3));

julia> B = Bidiagonal(1:3, 1:2, :U);

julia> D * B * D
3×3 Matrix{Float64}:
 NaN  NaN  NaN
 NaN  NaN  NaN
 NaN  NaN  NaN

julia> (D * B) * D
3×3 Bidiagonal{Float64, Vector{Float64}}:
 NaN    NaN       ⋅ 
    ⋅   NaN    NaN
    ⋅      ⋅   NaN

julia> D * (B * D)
3×3 Bidiagonal{Float64, Vector{Float64}}:
 NaN    NaN       ⋅ 
    ⋅   NaN    NaN
    ⋅      ⋅   NaN
```
This PR ensures that the 3-term multiplication is evaluated as a
sequence of two-term multiplications, which fixes this issue. This also
improves performance, as only the bands need to be evaluated now.
```julia
julia> D = Diagonal(1:1000); B = Bidiagonal(1:1000, 1:999, :U);

julia> @btime $D * $B * $D;
  656.364 μs (11 allocations: 7.63 MiB) # v"1.12.0-DEV.1262"
  2.483 μs (12 allocations: 31.50 KiB) # This PR
```
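
At the scalar level, the difference between an ordinary zero and a strong zero can be sketched as follows. This only illustrates the arithmetic and is not the code path changed by the PR: a floating-point `0.0` propagates `NaN`, while `false` acts as a strong zero in multiplication.

```julia
# An ordinary floating-point zero propagates NaN.
weak = 0.0 * NaN

# `false` acts as a strong zero: the product is zero even for NaN.
strong = false * NaN

(isnan(weak), strong)
```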

* Fix dispatch on `alg` in Float16 Hermitian eigen (#55928)

Currently,
```julia
julia> using LinearAlgebra

julia> A = Hermitian(reshape(Float16[1:16;], 4, 4));

julia> eigen(A).values |> typeof
Vector{Float16} (alias for Array{Float16, 1})

julia> eigen(A, LinearAlgebra.QRIteration()).values |> typeof
Vector{Float32} (alias for Array{Float32, 1})
```
This PR moves the specialization on the `eltype` to an internal method,
so that firstly all `alg`s dispatch to that method, and secondly, there
are no ambiguities introduced by specializing the top-level `eigen`. The
latter currently causes test failures in `StaticArrays`
(https://github.com/JuliaArrays/StaticArrays.jl/actions/runs/11092206012/job/30816955210?pr=1279),
and should be fixed by this PR.

* Remove specialized `ishermitian` method for `Diagonal{<:Real}` (#55948)

The fallback method for `Diagonal{<:Number}` handles this already by
checking that the `diag` is real, so we don't need this additional
specialization.
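
A small sketch of the behaviour described above, using only public `LinearAlgebra` functions (the variable names are illustrative):

```julia
using LinearAlgebra

D_real = Diagonal([1.0, 2.0])
D_complex = Diagonal([1.0 + 0.0im, 2.0 + 1.0im])

# A real diagonal matrix is always Hermitian.
ishermitian(D_real)

# A complex diagonal matrix is Hermitian only if every diagonal
# entry is real, which is the check the fallback relies on.
ishermitian(D_complex)
isreal(diag(D_complex))
```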

* Fix logic in `?` docstring example (#55945)

* fix `unwrap_macrocalls` (#55950)

The implementation of `unwrap_macrocalls` has assumed that what
`:macrocall` wraps is always an `Expr` object, but that is not
necessarily correct:
```julia
julia> Base.@assume_effects :nothrow @show 42
ERROR: LoadError: TypeError: in typeassert, expected Expr, got a value of type Int64
Stacktrace:
 [1] unwrap_macrocalls(ex::Expr)
   @ Base ./expr.jl:906
 [2] var"@assume_effects"(__source__::LineNumberNode, __module__::Module, args::Vararg{Any})
   @ Base ./expr.jl:756
in expression starting at REPL[1]:1
```
This commit addresses this issue.
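
A minimal sketch of the kind of guard involved, assuming a hypothetical helper `peel_macrocalls` (this is not the `Base` implementation): the loop only descends while the current value is an `Expr` with head `:macrocall`, so a non-`Expr` payload such as `42` is returned as-is instead of failing a type assertion.

```julia
# Hypothetical helper, not the Base implementation: peel nested
# `:macrocall` wrappers and return whatever sits inside, Expr or not.
function peel_macrocalls(@nospecialize(x))
    inner = x
    while inner isa Expr && inner.head === :macrocall
        # The last argument of a `:macrocall` is the wrapped expression.
        inner = inner.args[end]
    end
    return inner
end

peel_macrocalls(:(@show 42))          # the literal 42, not an Expr
peel_macrocalls(:(@inline f(x) = x))  # the `f(x) = x` definition Expr
```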

* make faster BigFloats (#55906)

We can coalesce the two required allocations for the MPFR BigFloat API
design into one allocation, hopefully giving an easy performance boost.
It would have been slightly easier and more efficient if MPFR BigFloat
was already a VLA instead of containing a pointer here, but that does
not prevent the optimization.

* Add propagate_inbounds_meta to atomic genericmemory ops (#55902)

`memoryref(mem, i)` will otherwise emit a boundscheck.

```
; │ @ /home/vchuravy/WorkstealingQueues/src/CLL.jl:53 within `setindex_atomic!` @ genericmemory.jl:329
; │┌ @ boot.jl:545 within `memoryref`
    %ptls_field = getelementptr inbounds i8, ptr %tls_pgcstack, i64 16
    %ptls_load = load ptr, ptr %ptls_field, align 8
    %"box::GenericMemoryRef" = call noalias nonnull align 8 dereferenceable(32) ptr @ijl_gc_small_alloc(ptr %ptls_load, i32 552, i32 32, i64 23456076646928) #9
    %"box::GenericMemoryRef.tag_addr" = getelementptr inbounds i64, ptr %"box::GenericMemoryRef", i64 -1
    store atomic i64 23456076646928, ptr %"box::GenericMemoryRef.tag_addr" unordered, align 8
    store ptr %memoryref_data, ptr %"box::GenericMemoryRef", align 8
    %.repack8 = getelementptr inbounds { ptr, ptr }, ptr %"box::GenericMemoryRef", i64 0, i32 1
    store ptr %memoryref_mem, ptr %.repack8, align 8
    call void @ijl_bounds_error_int(ptr nonnull %"box::GenericMemoryRef", i64 %7)
    unreachable
```

For the Julia code:

```julia
function Base.setindex_atomic!(buf::WSBuffer{T}, order::Symbol, val::T, idx::Int64) where T
    @inbounds Base.setindex_atomic!(buf.buffer, order, val,((idx - 1) & buf.mask) + 1)
end
```

from
https://github.com/gbaraldi/WorkstealingQueues.jl/blob/0ebc57237cf0c90feedf99e4338577d04b67805b/src/CLL.jl#L41

* fix rounding mode in construction of `BigFloat` from pi (#55911)

The default argument of the method was outdated, reading the global
default rounding directly, bypassing the `ScopedValue` stuff.

* fix `nonsetable_type_hint_handler` (#55962)

The current implementation is wrong, causing it to display inappropriate
hints like the following:
```julia
julia> s = Some("foo");

julia> s[] = "bar"
ERROR: MethodError: no method matching setindex!(::Some{String}, ::String)
The function `setindex!` exists, but no method is defined for this combination of argument types.
You attempted to index the type String, rather than an instance of the type. Make sure you create the type using its constructor: d = String([...]) rather than d = String
Stacktrace:
 [1] top-level scope
   @ REPL[2]:1
```

* REPL: make UndefVarError aware of imported modules (#55932)

* fix test/staged.jl (#55967)

In particular, the implementation of `overdub_generator54341` was
dangerous. This fixes it up.

* Explicitly store a module's location (#55963)

Revise wants to know what file a module's `module` definition is in.
Currently it does this by looking at the source location for the
implicitly generated `eval` method. This is terrible for two reasons:

1. The method may not exist if the module is a baremodule (which is not
particularly common, which is probably why we haven't seen it).
2. The fact that the implicitly generated `eval` method has this
location information is an implementation detail that I'd like to get
rid of (#55949).

This PR adds explicit file/line info to `Module`, so that Revise doesn't
have to use the hack anymore.

* mergewith: add single argument example to docstring (#55964)

I ran into this edge case. I thought it should be documented.

---------

Co-authored-by: Lilith Orion Hafner <[email protected]>

* [build] avoid libedit linkage and align libccalllazy* SONAMEs (#55968)

While building the 1.11.0-rc4 in Homebrew[^1] in preparation for 1.11.0
release (and to confirm Sequoia successfully builds) I noticed some odd
linkage for our Linux builds, which included:

1. LLVM libraries were linking to `libedit.so`, e.g.
    ```
    Dynamic Section:
      NEEDED       libedit.so.0
      NEEDED       libz.so.1
      NEEDED       libzstd.so.1
      NEEDED       libstdc++.so.6
      NEEDED       libm.so.6
      NEEDED       libgcc_s.so.1
      NEEDED       libc.so.6
      NEEDED       ld-linux-x86-64.so.2
      SONAME       libLLVM-16jl.so
    ```
    CMakeCache.txt showed
    ```
    //Use libedit if available.
    LLVM_ENABLE_LIBEDIT:BOOL=ON
    ```
Which might be overriding `HAVE_LIBEDIT` at
https://github.com/JuliaLang/llvm-project/blob/julia-release/16.x/llvm/cmake/config-ix.cmake#L222-L225.
So just added `LLVM_ENABLE_LIBEDIT`

2. Wasn't sure if there was a reason for this but `libccalllazy*` had
mismatched SONAME:
    ```console
    ❯ objdump -p lib/julia/libccalllazy* | rg '\.so'
    lib/julia/libccalllazybar.so:	file format elf64-x86-64
      NEEDED       ccalllazyfoo.so
      SONAME       ccalllazybar.so
    lib/julia/libccalllazyfoo.so:	file format elf64-x86-64
      SONAME       ccalllazyfoo.so
    ```
    Modifying this, but can drop if intentional.

---

[^1]: https://github.com/Homebrew/homebrew-core/pull/192116

* Add missing `copy!(::AbstractMatrix, ::UniformScaling)` method (#55970)

Hi everyone! First PR to Julia here.

It was noticed in a Slack thread yesterday
that `copy!(A, I)` doesn't work, but `copyto!(A, I)` does. This PR adds
the missing method for `copy!(::AbstractMatrix, ::UniformScaling)`,
which simply defers to `copyto!`, and corresponding tests.

I added a `compat` notice for Julia 1.12.
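
The "defers to `copyto!`" approach can be sketched as below. `copy_scaling!` is a hypothetical name, used here to avoid extending `Base.copy!` with a method that newer Julia versions may already define; the sketch assumes a Julia version where `copyto!(A, I)` is available, as stated above.

```julia
using LinearAlgebra

# Hypothetical helper: forward to the existing `copyto!` method.
copy_scaling!(A::AbstractMatrix, J::UniformScaling) = copyto!(A, J)

A = rand(3, 3)
copy_scaling!(A, 2I)

# A now holds 2 on the diagonal and 0 elsewhere.
A == Matrix(2.0I, 3, 3)
```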

---------

Co-authored-by: Lilith Orion Hafner <[email protected]>

* Add forward progress update to NEWS.md (#54089)

Closes #40009 which was left open because of the needs news tag.

---------

Co-authored-by: Ian Butterworth <[email protected]>

* Fix an intermittent test failure in `core` test (#55973)

The test wants to assert that `Module` is not resolved in `Main`, but
other tests do resolve this identifier, so the test can fail depending
on test order (and I've been seeing such failures on CI recently). Fix
that by running the test in a fresh subprocess.

* fix comma logic in time_print (#55977)

Minor formatting fix

* optimizer: fix up the inlining algorithm to use correct `nargs`/`isva` (#55976)

It appears that inlining.jl was not updated in JuliaLang/julia#54341.
Specifically, using `nargs`/`isva` from `mi.def::Method` in
`ir_prepare_inlining!` causes the following error to occur:
```julia
function generate_lambda_ex(world::UInt, source::LineNumberNode,
                            argnames, spnames, @nospecialize body)
    stub = Core.GeneratedFunctionStub(identity, Core.svec(argnames...), Core.svec(spnames...))
    return stub(world, source, body)
end
function overdubbee54341(a, b)
    return a + b
end
const overdubee_codeinfo54341 = code_lowered(overdubbee54341, Tuple{Any, Any})[1]
function overdub_generator54341(world::UInt, source::LineNumberNode, selftype, fargtypes)
    if length(fargtypes) != 2
        return generate_lambda_ex(world, source,
            (:overdub54341, :args), (), :(error("Wrong number of arguments")))
    else
        return copy(overdubee_codeinfo54341)
    end
end
@eval function overdub54341(args...)
    $(Expr(:meta, :generated, overdub_generator54341))
    $(Expr(:meta, :generated_only))
end
topfunc(x) = overdub54341(x, 2)
```
```julia
julia> topfunc(1)
Internal error: during type inference of
topfunc(Int64)
Encountered unexpected error in runtime:
BoundsError(a=Array{Any, 1}(dims=(2,), mem=Memory{Any}(8, 0x10632e780)[SSAValue(2), SSAValue(3), #<null>, #<null>, #<null>, #<null>, #<null>, #<null>]), i=(3,))
throw_boundserror at ./essentials.jl:14
getindex at ./essentials.jl:909 [inlined]
ssa_substitute_op! at ./compiler/ssair/inlining.jl:1798
ssa_substitute_op! at ./compiler/ssair/inlining.jl:1852
ir_inline_item! at ./compiler/ssair/inlining.jl:386
...
```

This commit updates the abstract interpretation and inlining algorithm
to use the `nargs`/`isva` values held by `CodeInfo`. Similar
modifications have also been made to EscapeAnalysis.jl.

@nanosoldier `runbenchmarks("inference", vs=":master")`

* Add `.zed` directory to `.gitignore` (#55974)

Similar to the `vscode` config directory, we may ignore the `zed`
directory as well.

* typeintersect: reduce unneeded allocations from `merge_env`

`merge_env` and `final_merge_env` could be skipped
for emptiness test or if we know there's only 1 valid Union state.

* typeintersect: trunc env before nested `intersect_all` if valid.

This only covers the simplest cases. We might want a full dependence analysis to keep the env length minimal in the future.

* `@time` actually fix time report commas & add tests (#55982)

https://github.com/JuliaLang/julia/pull/55977 looked simple but wasn't
quite right because of a bad pattern in the lock conflicts report
section.

So fix and add tests.

* adjust EA to JuliaLang/julia#52527 (#55986)

`EnterNode.catch_dest` can now be `0` after the `try`/`catch` elision
feature implemented in JuliaLang/julia#52527, and we actually need to
adjust `EscapeAnalysis.compute_frameinfo` too.

* Improvements to JITLink

Seeing what this will look like, since it has a number of features
(delayed compilation, concurrent compilation) that are starting to
become important, so it would be nice to switch to only supporting one
common implementation of memory management.

Refs #50248

I am expecting https://github.com/llvm/llvm-project/issues/63236 may
cause some problems, since we reconfigured some CI machines to minimize
that issue, but it is still likely relevant.

* rewrite catchjmp asm to use normal relocations instead of manual editing

* add logic to prefer loading modules that are already loaded (#55908)

Iterate over the list of existing loaded modules for PkgId whenever
loading a new module for PkgId, so that we will use that existing
build_id content if it otherwise passes the other stale_checks.

* Apple: fix bus error on smaller readonly file in unix (#55859)

Enables the fix for #28245 in #44354 for Apple now that the Julia bugs are
fixed by #55641 and #55877.

Closes #28245

* Add `Float16` to `Base.HWReal` (#55929)

* docs: make mod an operator (#55988)

* InteractiveUtils: add `@trace_compile` and `@trace_dispatch` (#55915)

* Profile: document heap snapshot viewing tools (#55743)

* [REPL] Fix #55850 by using `safe_realpath` instead of `abspath` in `projname` (#55851)

* optimizer: enable load forwarding with the `finalizer` elision (#55991)

When the finalizer elision pass is used, load forwarding is not
performed currently, regardless of whether the pass succeeds or not. But
this is not necessary, and by keeping the `setfield!` call, we can
safely forward `getfield` even if finalizer elision is tried.

* Avoid `stat`-ing stdlib path if it's unreadable (#55992)

* doc: manual: cmd: fix Markdown in table entry for `--trim` (#55979)

* Avoid conversions to `Float64` in non-literal powers of `Float16` (#55994)

Co-authored-by: Alex Arslan <[email protected]>

* Remove unreachable error branch in memset calls (and in repeat) (#55985)

Some places use the pattern memset(A, v, length(A)), which requires a
conversion UInt(length(A)). This is technically fallible, but can't
actually fail when A is a Memory or Array.
Remove the dead error branch by casting to UInt instead.

Similarly, in repeat(x, r), r is first checked to be nonnegative, then
converted to UInt, then used in multiple calls where it is converted to
UInt each time. Here, only do it once.
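
The difference between a checked conversion and a cast can be sketched as follows. This assumes the cast is written as `n % UInt`, which is one common way to express it in Julia; the exact spelling used in the PR is not shown here.

```julia
n = 5  # stands in for `length(A)`, which is never negative

# Checked conversion: throws InexactError for negative input,
# so the compiler must keep an error branch.
checked = UInt(n)

# Wrapping cast: never throws, so there is no error branch.
unchecked = n % UInt

checked === unchecked
```

For nonnegative inputs the two agree, which is why the swap is safe when the value is known to be a length.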

* fix up docstring of `mod` (#56000)

* fix typos (#56008)

these are all in markdown files

Co-authored-by: spaette <[email protected]>

* Vectorise random vectors of `Float16` (#55997)

* Clarify `div` docstring for floating-point input (#55918)

Closes #55837

This is a variant of the warning found in the `fld` docstring clarifying
floating-point behaviour.
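
A short sketch of the kind of surprise such a warning is about: `0.1` is stored as a value slightly above 1/10, so the exact quotient of the stored operands is slightly below 60.

```julia
q_div = div(6.0, 0.1)   # truncates the exact quotient
q_fld = fld(6.0, 0.1)   # floors the exact quotient
q_float = 6.0 / 0.1     # rounds the quotient to the nearest Float64

(q_div, q_fld, q_float)
```

Here `div` and `fld` both give `59.0`, while plain division rounds to `60.0`.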

* improve getproperty(Pairs) warnings (#55989)

- Only call `depwarn` if the field is `itr` or `data`; otherwise let the field error happen as normal
- Give a more specific deprecation warning.

* Document type-piracy / type-leakage restrictions for `require_stdlib` (#56005)

I was a recent offender in
https://github.com/JuliaLang/Pkg.jl/issues/4017#issuecomment-2377589989

This PR tries to lay down some guidelines for the behavior that stdlibs
and the callers of `require_stdlib` must adhere to in order to avoid
"duplicate stdlib" bugs.

These bugs are particularly nasty because they are experienced
semi-rarely and under pretty specific circumstances (they only occur
when `require_stdlib` loads another copy of a stdlib, often in a
particular order and/or with a particular state of your pre-compile /
loading cache) so they may make it a long way through a pre-release
cycle without an actionable bug report.

* [LinearAlgebra] Remove unreliable doctests (#56011)

The exact textual representation of the output of these doctests depends
on the specific kernel used by the BLAS backend, and can vary between
versions of OpenBLAS (as it did in #41973), or between different CPUs,
which makes these doctests unreliable.

Fix #55998.

* cleanup functions of Hermitian matrices (#55951)

The functions of Hermitian matrices are a bit of a mess. For example, if
we have a Hermitian matrix `a` with negative eigenvalues, `a^0.5`
doesn't produce the `Symmetric` wrapper, but `sqrt(a)` does. On the
other hand, if we have a positive definite `b`, `b^0.5` will be
`Hermitian`, but `sqrt(b)` will be `Symmetric`:
```julia
using LinearAlgebra
a = Hermitian([1.0 2.0;2.0 1.0])
a^0.5
sqrt(a)
b = Hermitian([2.0 1.0; 1.0 2.0])
b^0.5
sqrt(b)
```
This sort of arbitrary assignment of wrappers happens with pretty much
all functions defined there. There are also some oddities, such as `cis`
being the only function defined for `SymTridiagonal`, even though all
`eigen`-based functions work, and `cbrt` being the only function not
defined for complex Hermitian matrices.

I did a cleanup: I defined all functions for `SymTridiagonal` and
`Hermitian{<:Complex}`, and always assigned the appropriate wrapper,
preserving the input one when possible.

There's an inconsistency remaining that I didn't fix, that only `sqrt`
and `log` accept a tolerance argument, as changing that is probably
breaking.

There were also hardly any tests that I could find (only `exp`, `log`,
`cis`, and `sqrt`). I'm happy to add them if it's desired.

* Fix no-arg `ScopedValues.@with` within a scope (#56019)

Fixes https://github.com/JuliaLang/julia/issues/56017

* LinearAlgebra: make matprod_dest public (#55537)

Currently, in a matrix multiplication `A * B`, we use `B` to construct
the destination. However, this may not produce the optimal destination
type, and is essentially single-dispatch. Letting packages specialize
`matprod_dest` would help us obtain the optimal type by dispatching on
both the arguments. This may significantly improve performance in the
matrix multiplication. As an example:
```julia
julia> using LinearAlgebra, FillArrays, SparseArrays

julia> F = Fill(3, 10, 10);

julia> s = sprand(10, 10, 0.1);

julia> @btime $F * $s;
  15.225 μs (10 allocations: 4.14 KiB)

julia> typeof(F * s)
SparseMatrixCSC{Float64, Int64}

julia> nnz(F * s)
80

julia> VERSION
v"1.12.0-DEV.1074"
```
In this case, the destination is a sparse matrix with 80% of its
elements filled and being set one-by-one, which is terrible for
performance. Instead, if we specialize `matprod_dest` to return a dense
destination, we may obtain
```julia
julia> LinearAlgebra.matprod_dest(F::FillArrays.AbstractFill, S::SparseMatrixCSC, ::Type{T}) where {T} = Matrix{T}(undef, size(F,1), size(S,2))

julia> @btime $F * $s;
  754.632 ns (2 allocations: 944 bytes)

julia> typeof(F * s)
Matrix{Float64}
```
Potentially, this may be improved further by specializing `mul!`, but
this is a 20x improvement just by choosing the right destination.

Since this is being made public, we may want to bikeshed on an
appropriate name for the function.

* Sockets: Warn when local network access not granted. (#56023)

Works around https://github.com/JuliaLang/julia/issues/56022

* Update test due to switch to intel syntax by default in #48103 (#55993)

* add require_lock call to maybe_loaded_precompile (#56027)

If we expect this to be a public API
(https://github.com/timholy/Revise.jl for some reason is trying to
access this state), we should lock around it for consistency with the
other similar functions.

Needed for https://github.com/timholy/Revise.jl/pull/856

* fix `power_by_squaring`: use `promote` instead of type inference (#55634)

Fixes #53504

Fixes #55633

* Don't show keymap `@error` for hints (#56041)

It's too disruptive to show errors for hints. The error will still be
shown if tab is pressed.

Helps issues like https://github.com/JuliaLang/julia/issues/56037

* Refactoring to be considered before adding MMTk

* Removing jl_gc_notify_image_load, since it's a new function and not part of the refactoring

* Moving gc_enable code to gc-common.c

* Addressing PR comments

* Push resolution of merge conflict

* Removing jl_gc_mark_queue_obj_explicit extern definition from scheduler.c

* Don't need the getter function since it's possible to use jl_small_typeof directly

* Remove extern from free_stack declaration in julia_internal.h

* Putting everything that is common GC tls into gc-tls-common.h

* Typo

* Adding gc-tls-common.h to Makefile as a public header

* Removing gc-tls-common fields from gc-tls-mmtk.h

* Fix typo in sockets tests. (#56038)

* EA: use `is_mutation_free_argtype` for the escapability check (#56028)

EA has been using `isbitstype` for type-level escapability checks, but a
better criterion (`is_mutation_free`) is available these days, so we
would like to use that instead.

* effects: fix `Base.@_noub_meta` (#56061)

This had the incorrect number of arguments to `Expr(:purity, ...)`
causing it to be silently ignored.

* effects: improve `:noub_if_noinbounds` documentation (#56060)

Just a small touch-up

* Disallow assigning asymmetric values to SymTridiagonal (#56068)

Currently, we can assign an asymmetric value to a `SymTridiagonal`,
which goes against what `setindex!` is expected to do. This is because
`SymTridiagonal` symmetrizes the values along the diagonal, so setting a
diagonal entry to an asymmetric value would lead to a subsequent
`getindex` producing a different result.
```julia
julia> s = SMatrix{2,2}(1:4);

julia> S = SymTridiagonal(fill(s,4), fill(s,3))
4×4 SymTridiagonal{SMatrix{2, 2, Int64, 4}, Vector{SMatrix{2, 2, Int64, 4}}}:
 [1 3; 3 4]  [1 3; 2 4]      ⋅           ⋅     
 [1 2; 3 4]  [1 3; 3 4]  [1 3; 2 4]      ⋅     
     ⋅       [1 2; 3 4]  [1 3; 3 4]  [1 3; 2 4]
     ⋅           ⋅       [1 2; 3 4]  [1 3; 3 4]

julia> S[1,1] = s
2×2 SMatrix{2, 2, Int64, 4} with indices SOneTo(2)×SOneTo(2):
 1  3
 2  4

julia> S[1,1] == s
false

julia> S[1,1]
2×2 Symmetric{Int64, SMatrix{2, 2, Int64, 4}} with indices SOneTo(2)×SOneTo(2):
 1  3
 3  4
```
After this PR,
```julia
julia> S[1,1] = s
ERROR: ArgumentError: cannot set a diagonal entry of a SymTridiagonal to an asymmetric value
```

* Remove unused matrix type params in diag methods (#56048)

These parameters are not used in the method, and are unnecessary for
dispatch.

* LinearAlgebra: diagzero for non-OneTo axes (#55252)

Currently, the off-diagonal zeros for a block-`Diagonal` matrix are
computed using `diagzero`, which calls `zeros` for the sizes of the
elements. This returns an `Array`, unless one specializes `diagzero` for
the custom `Diagonal` matrix type.

This PR defines a `zeroslike` function that dispatches on the axes of
the elements, which lets packages specialize on the axes to return
custom `AbstractArray`s. Choosing to specialize on the `eltype` avoids
the need to specialize on the container, and allows packages to return
appropriate types for custom axis types.

With this,
```julia
julia> LinearAlgebra.zeroslike(::Type{S}, ax::Tuple{SOneTo, Vararg{SOneTo}}) where {S<:SMatrix} = SMatrix{map(length, ax)...}(ntuple(_->zero(eltype(S)), prod(length, ax)))

julia> D = Diagonal(fill(SMatrix{2,3}(1:6), 2))
2×2 Diagonal{SMatrix{2, 3, Int64, 6}, Vector{SMatrix{2, 3, Int64, 6}}}:
 [1 3 5; 2 4 6]        ⋅       
       ⋅         [1 3 5; 2 4 6]

julia> D[1,2] # now an SMatrix
2×3 SMatrix{2, 3, Int64, 6} with indices SOneTo(2)×SOneTo(3):
 0  0  0
 0  0  0

julia> LinearAlgebra.zeroslike(::Type{S}, ax::Tuple{SOneTo, Vararg{SOneTo}}) where {S<:MMatrix} = MMatrix{map(length, ax)...}(ntuple(_->zero(eltype(S)), prod(length, ax)))

julia> D = Diagonal(fill(MMatrix{2,3}(1:6), 2))
2×2 Diagonal{MMatrix{2, 3, Int64, 6}, Vector{MMatrix{2, 3, Int64, 6}}}:
 [1 3 5; 2 4 6]        ⋅       
       ⋅         [1 3 5; 2 4 6]

julia> D[1,2] # now an MMatrix
2×3 MMatrix{2, 3, Int64, 6} with indices SOneTo(2)×SOneTo(3):
 0  0  0
 0  0  0
```
The reason this can't be the default behavior is that we are not
guaranteed that there exists a `similar` method that accepts the
combination of axes. This is why we have to fall back to using the
sizes, unless a specialized method is provided by a package.

One positive outcome of this is that indexing into such a block-diagonal
matrix will now usually be type-stable, which mitigates
https://github.com/JuliaLang/julia/issues/45535 to some extent (although
it doesn't resolve the issue).

I've also updated the `getindex` for `Bidiagonal` to use `diagzero`,
instead of the similarly defined `bidiagzero` function that it was
using. Structured block matrices may now use `diagzero` uniformly to
generate the zero elements.

* Multi-argument `gcdx(a, b, c...)` (#55935)

Previously, `gcdx` only worked for two arguments - but the underlying
idea extends to any (nonzero) number of arguments. Similarly, `gcd`
already works for 1, 2, 3+ arguments.

This PR implements the 1 and 3+ argument versions of `gcdx`, following
the [wiki
page](https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm#The_case_of_more_than_two_numbers)
for the Extended Euclidean algorithm.
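
The reduction to the two-argument case can be sketched as follows. This is a minimal illustration built on the existing two-argument `gcdx`, not the implementation from the PR; the helper name `gcdx3` is hypothetical.

```julia
# Hypothetical helper: Bézout coefficients for three integers, built from
# the two-argument `gcdx`, which returns (d, u, v) with d == u*a + v*b.
function gcdx3(a::Integer, b::Integer, c::Integer)
    d1, u, v = gcdx(a, b)    # d1 == u*a + v*b
    d, s, t = gcdx(d1, c)    # d  == s*d1 + t*c
    # Substitute d1 to express d in terms of a, b and c.
    return d, s * u, s * v, t
end

d, x, y, z = gcdx3(6, 10, 15)
@assert d == gcd(6, 10, 15)
@assert d == x * 6 + y * 10 + z * 15
```

The same substitution step can be repeated to handle any further arguments.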

* Adding jl_full_sweep_reasons since timing.jl depends on it

* Fixing issue with jl_full_sweep_reasons (missing constants)

* fix `_growbeg!` unnecessary resizing (#56029)

This was very explicitly designed such that if there was a bunch of
extra space at the end of the array, we would copy rather than
allocating, but by making `newmemlen` be at least
`overallocation(memlen)` rather than `overallocation(len)`, this branch
was never hit. Found by https://github.com/JuliaLang/julia/issues/56026

* REPL: hide any prints to stdio during `complete_line` (#55959)

* teach llvm-alloc-helpers about `gc_loaded` (#56030)

Combined with https://github.com/JuliaLang/julia/pull/55913, the
compiler is smart enough to fully remove
```julia
function f()
    m = Memory{Int}(undef, 3)
    @inbounds m[1] = 2
    @inbounds m[2] = 2
    @inbounds m[3] = 4
    @inbounds return m[1] + m[2] + m[3]
end
```

* mpfr: prevent changing precision (#56049)

Changing precision requires reallocating the data field, which is better
done by making a new BigFloat (since they are conceptually immutable
anyway). Also do a bit of cleanup while here.

Closes #56044
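
In user code, the same idea amounts to constructing a new value at the desired precision rather than mutating an existing one. A minimal sketch, assuming only the `precision` keyword of the `BigFloat` constructor:

```julia
x = BigFloat(2; precision = 256)

# Build a new BigFloat at a different precision; `x` itself is left untouched.
y = BigFloat(x; precision = 64)

@assert precision(x) == 256
@assert precision(y) == 64
@assert x == y   # 2 is exactly representable at both precisions
```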

* stackwalk: fix jl_thread_suspend_and_get_state race (#56047)

There was a missing re-assignment of old = -1; at the end of that loop
which means in the ABA case, we accidentally actually acquire the lock
on the thread despite not actually having stopped the thread; or in the
counter-case, we try to run through this logic with old==-1 on the next
iteration, and that isn't valid either (jl_thread_suspend_and_get_state
should return failure and the loop will abort too early).

Fix #56046

* irrationals: restrict assume effects annotations to known types (#55886)

Other changes:
* replace `:total` with the less powerful `:foldable`
* add an `<:Integer` dispatch constraint on the `rationalize` method,
closes #55872
* replace `Rational{<:Integer}` with just `Rational`, they're equal

Other issues, related to `BigFloat` precision, are still present in
irrationals.jl, to be fixed by followup PRs, including #55853.

Fixes #55874

* update `hash` doc string: `widen` not required any more (#55867)

Implementing `widen` isn't a requirement any more, since #26022.
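
For context, the user-visible contract that array hashing has to uphold is that arrays which are `isequal` hash identically, regardless of container or element type. A short check of that contract follows; it does not exercise the internals of #26022.

```julia
r = 1:3
v = [1, 2, 3]
f = [1.0, 2.0, 3.0]

# A range and a vector with the same elements are `isequal` and hash alike.
@assert isequal(r, v) && hash(r) == hash(v)

# The same holds across element types whose values are `isequal`.
@assert isequal(v, f) && hash(v) == hash(f)
```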

* Merge `diag` methods for triangular matrices (#56086)

* slightly improve inference in precompilation code (#56084)

Avoids the

```
11: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Base.Precompilation.ExplicitEnv(::String) (84 children)
```

shown in
https://github.com/JuliaLang/julia/issues/56080#issuecomment-2404765120

Co-authored-by: KristofferC

* avoid defining `convert(Vector{String}, ...)` in LibGit2 (#56082)

This is a weird conversion function to define. Seems cleaner to use the
iteration interface for this. Also avoids some invalidations
(https://github.com/JuliaLang/julia/issues/56080#issuecomment-2404765120)

Co-authored-by: KristofferC

* array: inline `convert` where possible (#56034)

This improves a common scenario, where someone wants to `push!` a
poorly-typed object onto a well-typed Vector.

For example:
```julia
const NT = @NamedTuple{x::Int,y::Any}
foo(v::Vector{NT}, x::Int, @nospecialize(y)) = push!(v, (; x, y))
```

The `(; x, y)` is slightly poorly-typed here. It could have any type for
its `.y` field before it is converted inside the `push!` to a NamedTuple
with `y::Any`

Without this PR, the dispatch for this `push!` cannot be inferred:
```julia
julia> code_typed(foo, (Vector{NT}, Int, Any))[1]
 CodeInfo(
1 ─ ...
│   %4 = %new(%3, x, y)::NamedTuple{(:x, :y), <:Tuple{Int64, Any}}
│   %5 = Main.push!(v, %4)::Vector{@NamedTuple{x::Int64, y}}
└──      return %5
) => Vector{@NamedTuple{x::Int64, y}}
```

With this PR, the above dynamic call is fully statically resolved and
inlined (and therefore `--trim` compatible)

* Remove some unnecessary `real` specializations for structured matrices (#56083)

The `real(::AbstractArray{<:Real})` fallback method should handle these
cases correctly.

* Combine `diag` methods for `SymTridiagonal` (#56014)

Currently, there are two branches, one for an `eltype` that is a
`Number`, and the other that deals with generic `eltype`s. They do
similar things, so we may combine these, and use branches wherever
necessary to retain the performance. We also may replace explicit
materialized arrays by generators in `copyto!`. Overall, this improves
performance in `diag` for matrices of matrices, whereas the performance
in the common case of matrices of numbers remains unchanged.
```julia
julia> using StaticArrays, LinearAlgebra

julia> s = SMatrix{2,2}(1:4);

julia> S = SymTridiagonal(fill(s,100), fill(s,99));

julia> @btime diag($S);
  1.292 μs (5 allocations: 7.16 KiB) # nightly, v"1.12.0-DEV.1317"
  685.012 ns (3 allocations: 3.19 KiB) # This PR
```
This PR also allows computing the `diag` for more values of the band
index `n`:
```julia
julia> diag(S,99)
1-element Vector{SMatrix{2, 2, Int64, 4}}:
 [0 0; 0 0]
```
This would work as long as `getindex` works for the `SymTridiagonal` for
that band, and the zero element may be converted to the `eltype`.

* fix `Vararg{T,T} where T` crashing `code_typed` (#56081)

Not sure this is the right place to fix this error, perhaps
`match.spec_types` should always be a tuple of valid types?

fixes #55916

---------

Co-authored-by: Jameson Nash

* [libblastrampoline_jll] Upgrade to v5.11.1 (#56094)

v5.11.1 is a patch release with a couple of RISC-V fixes.

* Revert "REPL: hide any prints to stdio during `complete_line`" (#56102)

* Remove warning from c when binding is ambiguous (#56103)

* make `Base.ANSIIterator` have a concrete field (#56088)

Avoids the invalidation

```
   backedges: 1: superseding sizeof(s::AbstractString) @ Base strings/basic.jl:177 with MethodInstance for sizeof(::AbstractString) (75 children)
```

shown in
https://github.com/JuliaLang/julia/issues/56080#issuecomment-2404765120.

Co-authored-by: KristofferC

* Subtype: some performance tuning. (#56007)

The main motivation of this PR is to fix #55807.
dc689fe8700f70f4a4e2dbaaf270f26b87e79e04 tries to remove the slow
`may_contain_union_decision` check by re-organizing the code path. Now
the fast path has been removed and most of its optimization has been
integrated into the preserved slow path.
Since the slow path stores all inner ∃ decisions on the outermost R
stack, there might be a risk of overflow.
aee69a41441b4306ba3ee5e845bc96cb45d9b327 should fix that concern.

The reported MWE now becomes
```julia
  0.000002 seconds
  0.000040 seconds (105 allocations: 4.828 KiB, 52.00% compilation time)
  0.000023 seconds (105 allocations: 4.828 KiB, 49.36% compilation time)
  0.000026 seconds (105 allocations: 4.828 KiB, 50.38% compilation time)
  0.000027 seconds (105 allocations: 4.828 KiB, 54.95% compilation time)
  0.000019 seconds (106 allocations: 4.922 KiB, 49.73% compilation time)
  0.000024 seconds (105 allocations: 4.828 KiB, 52.24% compilation time)
```

Local bench also shows that 72855cd slightly accelerates
`OmniPackage.jl`'s loading
```julia
julia> @time using OmniPackage
# v1.11rc4
 20.525278 seconds (25.36 M allocations: 1.606 GiB, 8.48% gc time, 12.89% compilation time: 77% of which was recompilation)
# v1.11rc4+aee69a4+72855cd 
 19.527871 seconds (24.92 M allocations: 1.593 GiB, 8.88% gc time, 15.13% compilation time: 82% of which was recompilation)
```

* rearrange jl_delete_thread to be thread-safe (#56097)

Prior to this, especially on macOS, the gc-safepoint here would cause
the process to segfault as we had already freed the current_task state.
Rearrange this code so that the GC interactions (except for the atomic
store to current_task) are all handled before entering GC safe, and then
signaling the thread is deleted (via setting current_task = NULL,
published by jl_unlock_profile_wr to other threads) is last.

```
ERROR: Exception handler triggered on unmanaged thread.
Process 53827 stopped
* thread #5, stop reason = EXC_BAD_ACCESS (code=2, address=0x100018008)
    frame #0: 0x0000000100b74344 libjulia-internal.1.12.0.dylib`jl_delete_thread [inlined] jl_gc_state_set(ptls=0x000000011f8b3200, state='\x02', old_state=<unavailable>) at julia_threads.h:272:9 [opt]
   269 	    assert(old_state != JL_GC_CONCURRENT_COLLECTOR_THREAD);
   270 	    jl_atomic_store_release(&ptls->gc_state, state);
   271 	    if (state == JL_GC_STATE_UNSAFE || old_state == JL_GC_STATE_UNSAFE)
-> 272 	        jl_gc_safepoint_(ptls);
   273 	    return old_state;
   274 	}
   275 	STATIC_INLINE int8_t jl_gc_state_save_and_set(jl_ptls_t ptls,
Target 0: (julia) stopped.
(lldb) up
frame #1: 0x0000000100b74320 libjulia-internal.1.12.0.dylib`jl_delete_thread [inlined] jl_gc_state_save_and_set(ptls=0x000000011f8b3200, state='\x02') at julia_threads.h:278:12 [opt]
   275 	STATIC_INLINE int8_t jl_gc_state_save_and_set(jl_ptls_t ptls,
   276 	                                              int8_t state)
   277 	{
-> 278 	    return jl_gc_state_set(ptls, state, jl_atomic_load_relaxed(&ptls->gc_state));
   279 	}
   280 	#ifdef __clang_gcanalyzer__
   281 	// these might not be a safepoint (if they are no-op safe=>safe transitions), but we have to assume it could be (statically)
(lldb)
frame #2: 0x0000000100b7431c libjulia-internal.1.12.0.dylib`jl_delete_thread(value=0x000000011f8b3200) at threading.c:537:11 [opt]
   534 	    ptls->root_task = NULL;
   535 	    jl_free_thread_gc_state(ptls);
   536 	    // then park in safe-region
-> 537 	    (void)jl_gc_safe_enter(ptls);
   538 	}
```

(test incorporated into https://github.com/JuliaLang/julia/pull/55793)

* OpenBLAS: Use dynamic architecture support on AArch64. (#56107)

We already do so on Yggdrasil, so this just makes both source and binary
builds behave similarly.

Closes https://github.com/JuliaLang/julia/issues/56075

* IRShow: label builtin / intrinsic / dynamic calls in `code_typed` (#56036)

This makes it much easier to spot dynamic dispatches

* 🤖 [master] Bump the Pkg stdlib from 51d4910c1 to fbaa2e337 (#56124)

* Fix type instability of closures capturing types (2) (#40985)

Instead of closures lowering to `typeof` for the types of captured
fields, this introduces a new function `_typeof_captured_variable` that
returns `Type{T}` if `T` is a type (w/o free typevars).
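
The distinction matters because `typeof` applied to a type discards the information about which type it is. The effect can be seen with the existing `Core.Typeof`, used here purely as an illustration; it is not the new `_typeof_captured_variable` function, whose behavior is only summarized above.

```julia
# `typeof` of a type only reports the kind of the type object...
@assert typeof(Int) === DataType

# ...whereas `Core.Typeof` keeps the type itself as a parameter,
# which is what allows specialization on the captured type.
@assert Core.Typeof(Int) === Type{Int}

# For ordinary values the two agree.
@assert Core.Typeof(1) === typeof(1)
```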

- replaces/closes #35970
- fixes #23618

---------

Co-authored-by: Takafumi Arakaki
Co-authored-by: Shuhei Kadowaki

* Remove debug error statement from Makefile. (#56127)

* align markdown table (#56122)

@gbaraldi `#51197`
@spaette `#56008`

Fix innocuous misalignment of the table after those pulls were merged.

* Improve IOBuffer docs (#56024)

Based on the discussion in #55978, I have tried to clarify the
documentation of `IOBuffer`.

* Comment out url and fix typo in stackwalk.c (#56131)

Introduced in #55623

* libgit2: Always use the bundled PCRE library. (#56129)

This is how Yggdrasil builds the library.

* Update JLL build versions (#56133)

This commit encompasses the following changes:
- Updating the JLL build version for Clang, dSFMT, GMP, LibUV,
LibUnwind, LLD, LLVM, libLLVM, MbedTLS, MPFR, OpenBLAS, OpenLibm, p7zip,
PCRE2, SuiteSparse, and Zlib.
- Updating CompilerSupportLibraries to v1.2.0. The library versions
contained in this release of CSL don't differ from v1.1.1, the only
difference is that v1.2.0 includes FreeBSD AArch64.
- Updating nghttp2 from 1.60.0 to 1.63.0. See
[here](https://github.com/nghttp2/nghttp2/releases) for changes between
these versions.
- Adding `aarch64-unknown-freebsd` to the list of triplets to check when
refreshing checksums.

Note that dependencies that link to MbedTLS (Curl, LibSSH2, LibGit2) are
excluded here. They'll be updated once a resolution is reached for the
OpenSSL switching saga. Once that happens, FreeBSD AArch64 should be
able to be built without any dependency source builds.

* typo in `Compiler.Effects` doc string: `checkbounds` -> `boundscheck` (#56140)

Follows up on #56060

* HISTORY: fix missing links (#56137)

* OpenBLAS: Fix cross-compilation detection for source build. (#56139)

We may be cross-compiling Linux-to-Linux, in which case `BUILD_OS` ==
`OS`, so look at `XC_HOST` to determine whether we're cross compiling.

* `diag` for `BandedMatrix`es for off-limit bands (#56065)

Currently, one can only obtain the `diag` for a `BandedMatrix` (such as
a `Diagonal`) when the band index is bounded by the size of the matrix.
This PR relaxes this requirement to match the behavior for arrays, where
`diag` returns an empty vector for a large band index instead of
throwing an error.
```julia
julia> D = Diagonal(ones(4))
4×4 Diagonal{Float64, Vector{Float64}}:
 1.0   ⋅    ⋅    ⋅ 
  ⋅   1.0   ⋅    ⋅ 
  ⋅    ⋅   1.0   ⋅ 
  ⋅    ⋅    ⋅   1.0

julia> diag(D, 10)
Float64[]

julia> diag(Array(D), 10)
Float64[]
```
Something similar for `SymTridiagonal` is being done in
https://github.com/JuliaLang/julia/pull/56014

* Port progress bar improvements from Pkg (#56125)

Includes changes from https://github.com/JuliaLang/Pkg.jl/pull/4038 and
https://github.com/JuliaLang/Pkg.jl/pull/4044.

Co-authored-by: Kristoffer Carlsson

* Add support for LLVM 19 (#55650)

Co-authored-by: Zentrik

* 🤖 [master] Bump the Pkg stdlib from fbaa2e337 to 27c1b1ee5 (#56146)

* HISTORY entry for deletion of `length(::Stateful)` (#55861)

xref #47790

xref #51747

xref #54953

xref #55858

* ntuple: ensure eltype is always `Int` (#55901)

Fixes #55790

* Improve remarks of the alloc opt pass slightly. (#55995)

The Value printer LLVM uses just prints the kind of instruction so it
just shows call.

---------

Co-authored-by: Oscar Smith

* Implement Base.fd() for TCPSocket, UDPSocket, and TCPServer (#53721)

This is quite handy if you want to pass off the file descriptor to a C
library. I also added a warning to the `fd()` docstring to warn folks
about duplicating the file descriptor first.

* Fix `JULIA_CPU_TARGET` being propagated to workers precompiling stdlib pkgimages (#54093)

Apparently (thanks ChatGPT) each line in a makefile is executed in a
separate shell so adding an `export` line on one line does not propagate
to the next line.

* Merge tr methods for triangular matrices (#56154)

Since the methods do identical things, we don't need multiple of these.

* Reduce duplication in triangular indexing methods (#56152)

This uses an orthogonal design to reduce code duplication in the
indexing methods for triangular matrices.

* update LLVM docs (#56162)

dump with raw=true so you don't get random errors, and show how to run
single modules.

---------

Co-authored-by: Valentin Churavy
Co-authored-by: Mosè Giordano
Co-authored-by: Jameson Nash

* Fix zero elements for block-matrix kron involving Diagonal (#55941)

Currently, it's assumed that the zero element is identical across the
matrix, but this is not necessarily the case if the elements are matrices
themselves and have different sizes. This PR ensures that `kron` for a
`Diagonal` has the correct zero elements.
Current:
```julia
julia> D = Diagonal(1:2)
2×2 Diagonal{Int64, UnitRange{Int64}}:
 1  ⋅
 ⋅  2

julia> B = reshape([ones(2,2), ones(3,2), ones(2,3), ones(3,3)], 2, 2);

julia> size.(kron(D, B))
4×4 Matrix{Tuple{Int64, Int64}}:
 (2, 2)  (2, 3)  (2, 2)  (2, 2)
 (3, 2)  (3, 3)  (2, 2)  (2, 2)
 (2, 2)  (2, 2)  (2, 2)  (2, 3)
 (2, 2)  (2, 2)  (3, 2)  (3, 3)
``` 
This PR
```julia
julia> size.(kron(D, B))
4×4 Matrix{Tuple{Int64, Int64}}:
 (2, 2)  (2, 3)  (2, 2)  (2, 3)
 (3, 2)  (3, 3)  (3, 2)  (3, 3)
 (2, 2)  (2, 3)  (2, 2)  (2, 3)
 (3, 2)  (3, 3)  (3, 2)  (3, 3)
```
Note the differences e.g. in the `CartesianIndex(4,1)`,
`CartesianIndex(3,2)` and `CartesianIndex(3,3)` elements.

* Call `MulAddMul` instead of multiplication in _generic_matmatmul! (#56089)

Fix https://github.com/JuliaLang/julia/issues/56085 by calling a newly
created `MulAddMul` object that only wraps the `alpha` (with `beta` set
to `false`). This avoids the explicit multiplication if `alpha` is known
to be `isone`.

* improve `allunique`'s type stability (#56161)

Caught by https://github.com/aviatesk/JET.jl/issues/667.

* Add invalidation barriers for `displaysize` and `implicit_typeinfo` (#56159)

These are invalidated by our own stdlibs (Dates and REPL) unfortunately
so we need to put this barrier in.

This fix is _very_ un-satisfying, because it doesn't do anything to
solve this problem for downstream libraries that use e.g. `displaysize`.
To fix that, I think we need a way to make sure callers get these
invalidation barriers by default...

* Fix markdown list in installation.md (#56165)

Documenter.jl requires all trailing list content to follow the same
indentation as the header. So, in the current view
(https://docs.julialang.org/en/v1/manual/installation/#Command-line-arguments)
the list appears broken.

* [Random] Add more comments and a helper function in Xoshiro code (#56144)

Follow up to #55994 and #55997. This should basically be a
non-functional change and I see no performance difference, but the
comments and the definition of a helper function should make the code
easier to follow (I initially struggled in #55997) and extend to other
types.

* add objects to concisely specify initialization

PerProcess: once per process
PerThread: once per thread id
PerTask: once per task object
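
The "once per process" idea can be sketched with a lock-guarded lazy initializer. This is an illustrative assumption, not the implementation added by this commit: `LazyOnce` is a hypothetical name, and the real objects also cover the per-thread and per-task cases.

```julia
# Hypothetical sketch: run `init` at most once and cache its result.
mutable struct LazyOnce{F}
    init::F
    lock::ReentrantLock
    done::Bool
    value::Any
    LazyOnce(init::F) where {F} = new{F}(init, ReentrantLock(), false, nothing)
end

# Calling the object returns the cached value, initializing it on first use.
function (o::LazyOnce)()
    lock(o.lock) do
        if !o.done
            o.value = o.init()
            o.done = true
        end
        return o.value
    end
end

calls = Ref(0)
answer = LazyOnce(() -> (calls[] += 1; 42))
@assert answer() == 42
@assert answer() == 42
@assert calls[] == 1   # the initializer ran only once
```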

* add precompile support for recording fields to change

Somewhat generalizes our support for changing Ptr to C_NULL. Not
particularly fast, since it is just using the builtins implementation of
setfield, and delaying the actual stores, but it should suffice.

* improve OncePer implementation

Address reviewer feedback, add more fixes and more tests,
rename to add Once prefix.

* fix use-after-free in test (detected in win32 CI)

* Make loading work when stdlib deps are missing in the manifest (#56148)

Closes https://github.com/JuliaLang/julia/issues/56109 

Simulating a bad manifest by having `LibGit2_jll` missing as a dep of
`LibGit2` in my default env, say because the manifest was generated by a
different julia version or different master julia commit.

## This PR, it just works
```
julia> using Revise

julia>
```
i.e.
```
% JULIA_DEBUG=loading ./julia --startup-file=no
julia> using Revise
...
┌ Debug: Stdlib LibGit2 [76f85450-5226-5b5a-8eaa-529ad045b433] is trying to load `LibGit2_jll`
│ which is not listed as a dep in the load path manifests, so resorting to search
│ in the stdlib Project.tomls for true deps
└ @ Base loading.jl:387
┌ Debug: LibGit2 [76f85450-5226-5b5a-8eaa-529ad045b433] indeed depends on LibGit2_jll in project /Users/ian/Documents/GitHub/julia/usr/share/julia/stdlib/v1.12/LibGit2/Project.toml
└ @ Base loading.jl:395
...

julia>
```

## Master
```
julia> using Revise
Info Given Revise was explicitly requested, output will be shown live
ERROR: LoadError: ArgumentError: Package LibGit2 does not have LibGit2_jll in its dependencies:
- Note that the following manifests in the load path were resolved with a potentially
  different DEV version of the current version, which may be the cause of the error.
  Try to re-resolve them in the current version, or consider deleting them if that fails:
    /Users/ian/.julia/environments/v1.12/Manifest.toml
- You may have a partially installed environment. Try `Pkg.instantiate()`
  to ensure all packages in the environment are installed.
- Or, if you have LibGit2 checked out for development and have
  added LibGit2_jll as a dependency but haven't updated your primary
  environment's manifest file, try `Pkg.resolve()`.
- Otherwise you may need to report an issue with LibGit2
...
```

* Remove llvm-muladd pass and move its functionality to llvm-simdloop (#55802)

Closes https://github.com/JuliaLang/julia/issues/55785

I'm not sure we want to backport this as is, because it removes some
functionality (the pass itself), so LLVM.jl and friends might need
annoying version-specific code. We could perhaps keep the code in place
and just not run the pass in a backport.

* Fix implicit `convert(String, ...)` in several places (#56174)

This removes several `convert(String, ...)` from this code, which really
shouldn't be something we invalidate on in the first place (see
https://github.com/JuliaLang/julia/issues/56173) but this is still an
improvement in code quality so let's take it.

* Change annotations to use a NamedTuple (#55741)

Due to popular demand, the type of annotations is to be changed from a
`Tuple{UnitRange{Int}, Pair{Symbol, Any}}` to a `NamedTuple{(:region,
:label, :value), Tuple{UnitRange{Int}, Symbol,
Any}}`.
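
For illustration, a value of the new shape can be built and inspected as below. The field names and types come from the description above; the alias `Annotation` and the concrete region, label, and value are made-up examples.

```julia
# Hypothetical alias for the annotation shape described above.
Annotation = @NamedTuple{region::UnitRange{Int}, label::Symbol, value::Any}

ann = Annotation((1:5, :face, :bold))

# Fields are accessed by name instead of by tuple position.
@assert ann.region == 1:5
@assert ann.label === :face
@assert ann.value === :bold
@assert ann isa NamedTuple{(:region, :label, :value), Tuple{UnitRange{Int}, Symbol, Any}}
```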

This requires the expected code churn to `strings/annotated.jl`, and
some changes to the StyledStrings and JuliaSyntaxHighlighting libraries.

Closes #55249 and closes #55245.

* Getting rid of mmtk_julia.c in the binding and moving it to gc-mmtk.c

* Trying to organize and label the code in gc-mmtk.c

* Remove redundant `convert` in `_setindex!` (#56178)

Follow up to #56034, ref:
https://github.com/JuliaLang/julia/pull/56034#discussion_r1798573573.

---------

Co-authored-by: Cody Tapscott

* Improve type inference of Artifacts.jl (#56118)

This also includes changes that, together with
https://github.com/JuliaPackaging/JLLWrappers.jl/commit/45cc04963f3c99d4eb902f97528fe16fc37002cc,
move the platform selection to compile time.

(this helps juliac a ton)

* Initial support for RISC-V (#56105)

Rebase and extension of @alexfanqi's initial work on porting Julia to
RISC-V. Requires LLVM 19.

Tested on a VisionFive2, built with:

```make
MARCH := rv64gc_zba_zbb
MCPU := sifive-u74

USE_BINARYBUILDER:=0

DEPS_GIT = llvm
override LLVM_VER=19.1.1
override LLVM_BRANCH=julia-release/19.x
override LLVM_SHA1=julia-release/19.x
```

```julia-repl
❯ ./julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.12.0-DEV.1374 (2024-10-14)
 _/ |\__'_|_|_|\__'_|  |  riscv/25092a3982* (fork: 1 commits, 0 days)
|__/                   |

julia> versioninfo(; verbose=true)
Julia Version 1.12.0-DEV.1374
Commit 25092a3982* (2024-10-14 09:57 UTC)
Platform Info:
  OS: Linux (riscv64-unknown-linux-gnu)
  uname: Linux 6.11.3-1-riscv64 #1 SMP Debian 6.11.3-1 (2024-10-10) riscv64 unknown
  CPU: …
KristofferC pushed a commit that referenced this pull request Oct 29, 2024
Implementing `widen` isn't a requirement any more, since #26022.

(cherry picked from commit e95860c)
Labels
arrays [a, r, r, a, y, s] hashing