For VarInfo, fix merge and allow push!!ing new Symbols #690

mhauru · 2024-10-14T16:40:08Z

This introduces two unrelated changes to VarInfo that came up in the context of TuringLang/Turing.jl#2328:

1. Fix how gidset is handled in merge.
2. Allow pushing new values to TypedVarInfo that introduce a new symbol.

The first I think should be uncontroversial. The second is saying that if we have a TypedVarInfo that for instance so far only has vi.metadata = (:x => some_metadata), you should still be able to do push!!(vi, @varname(y), val, dist, gidset), and it should introduce a new entry in the NamedTuple. This is different from how TypedVarInfo is usually created, and might in some cases create Metadata objects with quite narrow element types. However, something like this would be needed for cases where new variables may appear between samples (see the aforementioned PR), and I don't really see a downside to allowing this.

coveralls · 2024-10-14T17:02:17Z

Pull Request Test Coverage Report for Build 11331660892

Details

16 of 16 (100.0%) changed or added relevant lines in 1 file are covered.
25 unchanged lines in 4 files lost coverage.
Overall coverage decreased (-2.0%) to 77.424%

Files with Coverage Reduction	New Missed Lines	%
src/model.jl	1	93.68%
src/varinfo.jl	6	80.24%
src/simple_varinfo.jl	6	85.57%
src/threadsafe.jl	12	55.17%

Totals
Change from base Build 11327761860:	-2.0%
Covered Lines:	3258
Relevant Lines:	4208

💛 - Coveralls

codecov · 2024-10-14T17:04:03Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.12%. Comparing base (1d10278) to head (78f12c5).
Report is 2 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #690      +/-   ##
==========================================
+ Coverage   79.02%   79.12%   +0.10%     
==========================================
  Files          30       30              
  Lines        4200     4211      +11     
==========================================
+ Hits         3319     3332      +13     
+ Misses        881      879       -2

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

coveralls · 2024-10-14T17:05:13Z

Pull Request Test Coverage Report for Build 11347620918

Details

16 of 16 (100.0%) changed or added relevant lines in 1 file are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.1%) to 79.542%

Totals
Change from base Build 11327761860:	0.1%
Covered Lines:	3332
Relevant Lines:	4189

💛 - Coveralls

sunxd3 · 2024-10-15T03:40:44Z

This looks very sensible to me, but I am not familiar with gids, so I couldn't really give a good review.

Maybe @torfjelde can also have a quick look?

mhauru · 2024-10-15T09:48:40Z

I was thinking of this after work, and should have clarified that this still fails:

        untyped_vi = VarInfo()
        untyped_vi = push!!(untyped_vi, @varname(x), 1.0, Normal(), Selector())
        typed_vi = TypedVarInfo(untyped_vi)
        typed_vi = push!!(typed_vi, @varname(y[1]), 2.0, Normal(), Selector())
        typed_vi = push!!(typed_vi, @varname(y[2]), 2.0, MvNormal(0, 1), Selector())

with

ERROR: MethodError: Cannot `convert` an object of type MvNormal{Int64, PDMats.ScalMat{Int64}, FillArrays.Zeros{Int64, 1, Tuple{Base.OneTo{Int64}}}} to an object of type Normal{Float64}

This is what I meant by having very narrow eltypes. I have some ideas (and code in place in VarNamedVector) for how to support the above too, but I think that's a separate issue that needs a different solution, since we do want to keep the eltypes of a TypedVarInfo concrete whenever possible. Also, this is much less important to support.
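The failure above is the ordinary behaviour of a container with a concrete element type, and it can be reproduced without DynamicPPL or Distributions. The sketch below is only a plain-Julia analogue: a `Vector{Float64}` stands in for the vector of distributions, and `push_widen` is a hypothetical helper showing one way to widen the eltype. It is not how VarNamedVector does it.

```julia
# A vector whose eltype was fixed by its first element, standing in for a
# Metadata field such as a Vector{Normal{Float64}}.
dists = [1.0]                       # Vector{Float64}

# Pushing a value that cannot be converted to the eltype throws a MethodError.
failed = try
    push!(dists, "not a Float64")
    false
catch err
    err isa MethodError
end

# Hypothetical widening push: mutate when the value already fits, otherwise
# copy into a vector with a wider eltype and push there.
function push_widen(v::Vector, x)
    x isa eltype(v) && return push!(v, x)
    T = promote_type(eltype(v), typeof(x))
    w = Vector{T}(v)                # copy with the wider eltype
    return push!(w, x)
end

widened = push_widen(dists, "not a Float64")
eltype(widened)                     # Any, so the eltype is no longer concrete
```

The widened vector accepts the new value, but at the cost of an abstract eltype, which is the trade-off mentioned above.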

penelopeysm

Seems like there's a lot of code repetition which could potentially be cleaned up, too.

Would something like this be better? Also unsure about conventions around f! and f!!, return types and mutation. There's probably also a better name for the inner function.

function merge_metadata(metadata_left::Metadata, metadata_right::Metadata)
    # ... (not touching these lines)

    # Range offset.
    offset = 0

    # Initialize metadata struct
    merged_metadata = Metadata(idcs, vns, ranges, vals, dists, gids, orders, flags)

    # Copy everything stored for `vn` in `source` to the end of `dest`, and
    # return the updated range offset.
    function update_metadata_from!(dest, source, vn, offset)
        vals = getindex_internal(source, vn)
        append!(dest.vals, vals)
        r = (offset + 1):(offset + length(vals))
        push!(dest.ranges, r)
        push!(dest.dists, getdist(source, vn))
        push!(dest.gids, source.gids[getidx(source, vn)])
        push!(dest.orders, getorder(source, vn))
        for k in keys(dest.flags)
            push!(dest.flags[k], is_flagged(source, vn, k))
        end
        # `last(r)` equals `offset` when `vals` is empty, where `r[end]` would throw.
        return last(r)
    end

    for (idx, vn) in enumerate(vns_both)
        # `idcs`
        idcs[vn] = idx
        # `vns`
        push!(vns, vn)
        if vn in vns_left && vn in vns_right
            # `vals`: only valid if they're the same length.
            vals_left = getindex_internal(metadata_left, vn)
            vals_right = getindex_internal(metadata_right, vn)
            @assert length(vals_left) == length(vals_right)
            offset = update_metadata_from!(merged_metadata, metadata_right, vn, offset)
        elseif vn in vns_left
            offset = update_metadata_from!(merged_metadata, metadata_left, vn, offset)
        else
            offset = update_metadata_from!(merged_metadata, metadata_right, vn, offset)
        end
    end

    return merged_metadata
end

I ran the test/varinfo.jl tests on this and they passed, although I don't know what other downstream breakage this could cause.

penelopeysm · 2024-10-15T10:40:02Z

Definitely a matter for another PR, but I also feel a bit uncomfortable having "del" and "trans" everywhere: I would feel better if flags was defined as

struct MetadataFlags
    del::BitVector
    trans::BitVector
end
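To make the suggestion a bit more concrete, here is a minimal self-contained sketch of how such a struct could be used. Everything except the struct definition itself is an assumption made for illustration: the zero-argument constructor, `push_flags!`, and this `is_flagged` method are hypothetical and do not match the existing Metadata API.

```julia
# The proposed layout: one named BitVector per flag instead of looking flags
# up by the strings "del" and "trans".
struct MetadataFlags
    del::BitVector
    trans::BitVector
end

# Hypothetical convenience constructor: no variables yet.
MetadataFlags() = MetadataFlags(falses(0), falses(0))

# Hypothetical helper: append the flags for one new variable.
function push_flags!(flags::MetadataFlags; del::Bool=false, trans::Bool=false)
    push!(flags.del, del)
    push!(flags.trans, trans)
    return flags
end

# Hypothetical lookup by field name; an unknown name raises an error instead
# of silently creating or missing a dictionary key.
is_flagged(flags::MetadataFlags, idx::Integer, name::Symbol) = getfield(flags, name)[idx]

flags = MetadataFlags()
push_flags!(flags; trans=true)
is_flagged(flags, 1, :trans)   # true
is_flagged(flags, 1, :del)     # false
```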

torfjelde · 2024-10-15T10:42:36Z

This is different from how TypedVarInfo is usually created, and might in some cases create Metadata objects with quite narrow element types. However, something like this would be needed for cases where new variables may appear between samples (see the aforementioned PR), and I don't really see a downside to allowing this.

I guess I'm somewhat surprised this is needed to achieve feature parity with the current way we're doing things 😕 As in, the issue we've encountered in the mentioned PR doesn't require push to allow adding new named tuple entries (when using TypedVarInfo).

torfjelde · 2024-10-15T10:46:55Z

Seems like there's a lot of code repetition which could potentially be cleaned up, too.

Very much like this, @penelopeysm! But maybe it's easiest to just do a separate PR to keep things + commits nice and simple.

torfjelde

LGTM! Don't have any objections except for a question regarding the push!! implementation. Don't have a strong opinion as to what's correct here, so approving.

Maybe worth bumping patch version?

torfjelde · 2024-10-15T10:44:37Z

src/varinfo.jl

+    end
+
+    sym = getsym(vn)
+    if vi isa TypedVarInfo && ~haskey(vi.metadata, sym)


I guess one downside of this ordeal is that it introduces a discrepancy between push!! and push!, the latter of which cannot handle new named tuple entries.

I'm at peace with discrepancies in how ! and !! work, if the !! allows you to do something that can not be done with mutation, but is in line with the semantics of the ! version as well. Said differently, if the only reason ! doesn't do something is because it can not (since it must rely on mutation), then I think it's fine for !! to handle it.
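The distinction can be sketched in plain Julia. The code below is only an illustration under simplifying assumptions: a NamedTuple of vectors stands in for the metadata of a TypedVarInfo, and `push_value!` and `push_value!!` are hypothetical names, not the DynamicPPL methods.

```julia
# Stand-in for a TypedVarInfo's metadata: one vector of values per symbol.
store = (x = [1.0],)

# Mutating version: it can only grow an entry that already exists, because
# the set of keys is part of the NamedTuple's type.
function push_value!(nt::NamedTuple, sym::Symbol, val)
    haskey(nt, sym) || throw(ArgumentError("no entry for symbol $sym"))
    push!(nt[sym], val)
    return nt
end

# "Mutate if possible" version: identical behaviour wherever `!` works, with
# a fallback that builds a new NamedTuple where mutation cannot do the job.
function push_value!!(nt::NamedTuple, sym::Symbol, val)
    haskey(nt, sym) && return push_value!(nt, sym, val)
    return merge(nt, NamedTuple{(sym,)}(([val],)))
end

same = push_value!!(store, :x, 2.0)    # mutates in place, returns `store` itself
grown = push_value!!(store, :y, 3.0)   # returns a new NamedTuple with a :y entry
```

As with any `!!` function, callers have to use the return value (`vi = push!!(vi, ...)`), since the input is not guaranteed to have been updated in place.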

mhauru · 2024-10-15T14:04:42Z

Seems like there's a lot of code repetition which could potentially be cleaned up, too.

Would something like this be better? [...]

Yeah, I think that would be an improvement. Also agree with @torfjelde that it might be worth a separate PR. Note also that all this stuff is on its way out once the Gibbs sampler stuff gets sorted (the whole Metadata type is due for removal). Not opposed to doing the refactor though, especially if it's low effort or you already have it worked out.

Definitely a matter for another PR, but I also feel a bit uncomfortable having "del" and "trans" everywhere: I would feel better if flags was defined as

Agree on this too, though maybe not worth fixing since Metadata is on its way out anyway?

I guess I'm somewhat surprised this is needed to achieve feature parity with the current way we're doing things 😕 As in, the issue we've encountered in the mentioned PR doesn't require push to allow adding new named tuple entries (when using TypedVarInfo).

I'm not sure if it will be used by the final solution to the Gibbs PR, but I would use it in my current attempt at a fix, where it would allow a sub-VarInfo for one sampler to capture new variables that are introduced but that are not under the domain of that sampler. The sub-VarInfo would then pass the new variables back to the global VarInfo, and at the next iteration the correct sampler would pick them up.

I would usually avoid introducing new features to DynamicPPL when I'm still unsure if the upstream Turing.jl code will actually need them, but to me this sounds like a general improvement to the semantics of VarInfo, even if it ends up going unused in Gibbs.

Maybe worth bumping patch version?

Good point, done.

torfjelde · 2024-10-16T16:20:19Z

I'm not sure if it will be used by the final solution to the Gibbs PR, but I would use it in my current attempt at a fix, where it would allow a sub-VarInfo for one sampler to capture new variables that are introduced but that are not under the domain of that sampler. The sub-VarInfo would then pass the new variables back to the global VarInfo, and at the next iteration the correct sampler would pick them up.

I guess we spoke briefly about this yesterday, but I was more thinking that this wasn't explicitly used currently since the tests would be failing in that case (as the current Gibbs sampler doesn't support this). But yeah, as we concluded yesterday, seems like a strict improvement to add this 👍

* Fix treatment of gid in merge(::Metadata)
* Allowing pushing new symbols to TypedVarInfo
* Bump patch version to 0.30.1

* Allow empty subsets of VarInfos (#692)
* Allow empty subsets of VarInfos
* Run JuliaFormatter
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* For VarInfo, fix merge and allow push!!ing new Symbols (#690)
* Fix treatment of gid in merge(::Metadata)
* Allowing pushing new symbols to TypedVarInfo
* Bump patch version to 0.30.1
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

mhauru requested review from yebai, sunxd3 and penelopeysm October 14, 2024 16:40

mhauru added 2 commits October 14, 2024 17:41

Fix treatment of gid in merge(::Metadata)

bd4baf1

Allowing pushing new symbols to TypedVarInfo

d804ef1

mhauru force-pushed the mhauru/gid-merge-fix branch from dfa6338 to d804ef1 Compare October 14, 2024 16:42

penelopeysm reviewed Oct 15, 2024

View reviewed changes

torfjelde approved these changes Oct 15, 2024

View reviewed changes

Bump patch version to 0.30.1

78f12c5

mhauru enabled auto-merge October 17, 2024 08:06

mhauru added this pull request to the merge queue Oct 17, 2024

Merged via the queue into master with commit 27ba772 Oct 17, 2024
14 checks passed

mhauru deleted the mhauru/gid-merge-fix branch October 17, 2024 08:38

mhauru added a commit that referenced this pull request Oct 17, 2024

For VarInfo, fix merge and allow push!!ing new Symbols (#690)

4650230

* Fix treatment of gid in merge(::Metadata)
* Allowing pushing new symbols to TypedVarInfo
* Bump patch version to 0.30.1

mhauru mentioned this pull request Oct 17, 2024

Backports for 0.28 #694

Merged
