Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std: use BoundedArray less #18067

Merged
merged 8 commits into from
Nov 22, 2023
Merged

std: use BoundedArray less #18067

merged 8 commits into from
Nov 22, 2023

Conversation

andrewrk
Copy link
Member

@andrewrk andrewrk commented Nov 22, 2023

Here are 8 places where I found that the use of BoundedArray was better served by doing something else instead.

In fact, this is all the uses of BoundedArray in the standard library and compiler codebase. I am tempted to remove the API entirely since it in practice leads to less than ideal code, however, I think there is nothing fundamentally wrong with the idea of abstracting over an array and a length, so I think it can remain in the standard library for now.

@andrewrk andrewrk requested a review from squeek502 as a code owner November 22, 2023 04:52
@andrewrk andrewrk added breaking Implementing this issue could cause existing code to no longer compile or have different behavior. standard library This issue involves writing Zig code for the standard library. labels Nov 22, 2023
Comment on lines 636 to 645
/// Initialize with externally-managed memory. The buffer determines the
/// capacity, and the length is set to zero.
pub fn initBuffer(buffer: []T) Self {
return .{
.items = buffer[0..0],
.capacity = buffer.len,
};
}
Copy link
Collaborator

@squeek502 squeek502 Nov 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the resulting ArrayListUnmanaged will be full of footguns, as any function that can resize the list can try to resize or free the current items slice, which will either be illegal behavior (freeing a pointer to a stack-allocated array) or will likely lead to a double free later on (since the caller will likely free the buffer as well).

Would it make sense to have something like a FixedBufferList type for this use-case that can't resize but still has the convenience functions of ArrayList (append, appendSlice, Writer, etc)? Would probably end up being pretty similar to BoundedArray but would have items/capacity like ArrayList instead of having an array as a field.

Maybe the dangerous functions taking an Allocator is enough of a clue that they shouldn't be used after initBuffer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's enough to add to add sentence to the doc comment; something like 'It is undefined behavior to call any method with an allocator parameter on the result, use the *AssumeCapacity variants.'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's what happens in safe build modes:

const std = @import("std");

test "invalid free" {
    var buffer: [1]u8 = undefined;
    var list = std.ArrayListUnmanaged(u8).initBuffer(&buffer);

    try list.append(std.testing.allocator, 100);
    try list.append(std.testing.allocator, 200);
}
Test [1/1] test.invalid free... thread 1811364 panic: Invalid free
/home/andy/dev/zig/lib/std/heap/general_purpose_allocator.zig:567:21: 0x22654f in resizeLarge (test)
                    @panic("Invalid free");
                    ^
/home/andy/dev/zig/lib/std/heap/general_purpose_allocator.zig:735:40: 0x223264 in resize (test)
                return self.resizeLarge(old_mem, log2_old_align, new_size, ret_addr);
                                       ^
/home/andy/dev/zig/lib/std/mem/Allocator.zig:92:30: 0x270f7b in resize__anon_8229 (test)
    return self.vtable.resize(self.ptr, buf, log2_buf_align, new_len, ret_addr);
                             ^
/home/andy/dev/zig/lib/std/array_list.zig:1013:33: 0x2348f4 in ensureTotalCapacityPrecise (test)
            if (allocator.resize(old_memory, new_capacity)) {
                                ^
/home/andy/dev/zig/lib/std/array_list.zig:992:51: 0x22af88 in ensureTotalCapacity (test)
            return self.ensureTotalCapacityPrecise(allocator, better_capacity);
                                                  ^
/home/andy/dev/zig/lib/std/array_list.zig:1045:41: 0x227c7b in addOne (test)
            try self.ensureTotalCapacity(allocator, newlen);
                                        ^
/home/andy/dev/zig/lib/std/array_list.zig:804:49: 0x222d95 in append (test)
            const new_item_ptr = try self.addOne(allocator);
                                                ^
/home/andy/dev/zig/lib/test.zig:8:20: 0x222cab in test.invalid free (test)

I think it's fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note: this should be Slice instead of []T.

@DivergentClouds
Copy link

Just as a note, one of the arguments in favor of #15528 was using bounded arrays. This isn't an argument for or against this PR.

@notcancername
Copy link
Contributor

notcancername commented Nov 22, 2023

Seems like a reasonable change. The initBuffer function has been long overdue. std.BoundedArray should probably be superseded by a bounded alloca as suggested in #225, which would address the issues of arrays and alloca respectively.

@jedisct1
Copy link
Contributor

With slices+usize or ArrayListUnmanaged, the backing buffer and the number of active items are distinct things, the only possible link between them possibly being some unwritten naming convention.

A major advantage of BoundedArray is that the structure can easily and safely be copied. Copying an ArrayListUnmanaged would result in dangling pointers or inconsistencies with different instances backed by the same buffer.

jedisct1 added a commit to jedisct1/zig-hpke that referenced this pull request Nov 22, 2023
@andrewrk andrewrk force-pushed the use-BoundedArray-less branch from eeb07cb to ab2911e Compare November 22, 2023 18:26
@andrewrk andrewrk changed the title std: remove BoundedArray std: use BoundedArray less Nov 22, 2023
This is useful when you want to have an array list backed by a fixed
slice of memory and no Allocator will be used.

It's an alternative to BoundedArray as you will see in the following
commit.
We definitely want ArrayList in the standard library. Do we want
BoundedArray? Maybe, maybe not. But that makes ArrayList a more stable
dependency for std.fs.
* Take advantage of multi-object for loops.
* Remove use of BoundedArray since it had no meaningful impact on safety
  or readability.
* Simplify some complex expressions, such as using `!` to invert a
  boolean value.
@andrewrk andrewrk force-pushed the use-BoundedArray-less branch from ab2911e to 8acae69 Compare November 22, 2023 18:32
In this case it improved maintainability because magic number `4` is no
longer repeated 3 times, and there is no longer a redundant branch in
the loop.
It incorrectly, was returning an error, when it actually wanted to
assert that the array bounds were not exceeded. Fixed by using ArrayList
instead.
This simplifies the logic. For example, in generateExactWidthType, it no
longer has to save a previous length and then use defer to reset the
length after mutating it.
I consider this change to be neutral. It inlines a bit of logic that
previously was abstracted, however, it also eliminates an instance of
`catch unreachable` as well as a clumsy failed pop / append in favor of
writing directly to the appropriate array element.
These arrays don't really all have an upper bound of 16; in fact they
have different upper bounds. Presumably the reason 16 was used for all
of them was to avoid code bloat with BoundedArray. Well, now even more
code bloat has been eliminated because now it's using
`ArrayList([]const u8)` which is certainly instantiated elsewhere.
Furthermore, the different corrected upper bounds can be specified at
each instance of the array list.
@andrewrk andrewrk force-pushed the use-BoundedArray-less branch from 8acae69 to a34a51e Compare November 22, 2023 18:33
@andrewrk andrewrk merged commit e4977f3 into master Nov 22, 2023
10 checks passed
@andrewrk andrewrk deleted the use-BoundedArray-less branch November 22, 2023 18:34
@andrewrk
Copy link
Member Author

@Vexu feel free to take these Aro patches or drop them on the next update. I don't care either way.

@ehaas
Copy link
Contributor

ehaas commented Nov 22, 2023

Whenever I've used std.BoundedArray it's been purely due to convenience / laziness (just a little bit less typing). e.g. the following are basically equivalent:

const std = @import("std");

test {
    var bounded_array: std.BoundedArray(u32, 4) = .{};
    bounded_array.appendAssumeCapacity(42);

    var buf: [16]u8 = undefined;
    var fib = std.heap.FixedBufferAllocator.init(&buf);
    var list = std.ArrayListUnmanaged(u32).initCapacity(fib.allocator(), 4) catch unreachable; // stack allocation already succeeded
    list.appendAssumeCapacity(42);
}

The downside is that it's a new type which makes it difficult to use them interchangeably with ArrayLists, which are much more common, as well as the code smells identified here which are actually better implemented with just arrays. So I like these changes. My one suggestion would be that if std.BoundedArray is kept, it should include a doc comment indicating that it should only be used if one needs the value semantics identified by @jedisct1

@nektro
Copy link
Contributor

nektro commented Nov 22, 2023

something that would really help the bloat is if comptime function instantiation was not tied to the type per se such that std.BoundedArray(u32, 4) and std.BoundedArray(u32, 6) were able to reuse many of their methods

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Implementing this issue could cause existing code to no longer compile or have different behavior. standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants