Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interactive: improvements for chrony story #724

Merged
merged 33 commits into from
May 19, 2022
Merged

Interactive: improvements for chrony story #724

merged 33 commits into from
May 19, 2022

Conversation

sim642
Copy link
Member

@sim642 sim642 commented May 5, 2022

Changes

  1. Fix var_eq and symb_locks such that they are not confused about IndexPI expressions.
  2. Add __goblint_assume_join special Goblint builtin to threadJoins analysis to annotate thread being joined even if Goblint doesn't consider it unique. This can be used as an annotation after a pthread_join loop.
  3. Make realloc argument read and free accesses non-transitive (we really need Redesign of LibraryFunctions specification #719).
  4. Add option sem.unknown_function.read.args for also ignoring read accesses from unknown function arguments (sem.unknown_function.invalidate.args from Add option sem.unknown_function.invalidate.args #707 only concerns invalidation, i.e. writing).

src/analyses/threadJoins.ml Outdated Show resolved Hide resolved
src/cdomains/exp.ml Outdated Show resolved Hide resolved
src/cdomains/exp.ml Outdated Show resolved Hide resolved
Comment on lines 533 to 534
| BinOp ((PlusPI | IndexPI), e1, e2, _) ->
eq_set_clos e1 s (* TODO: what about e2? add to some Index offset to all? *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you maybe comment on the impact that this has now? Are we tracking equalities involving pointer arithmetic? If yes, how confident are we that we do useful things?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These additions aren't changing the equalities that are tracked, but just allows them to be used with array indexing via pointer addition (which CIL sometimes inserts instead of actual Index offset that the analysis already handles).

It's needed for the following pattern from one of the added tests:

  pthread_mutex_lock(&entry->refs_mutex);
  entry[0].refs++; // NORACE
  pthread_mutex_unlock(&entry->refs_mutex);

The symb_locks analysis uses the EqualSet query to look up equalities for entry[0].refs, but the old behavior meant that an empty set was returned, so we couldn't infer that it has the same prefix as the locking expression.

A less trivial case of it comes from the minimized chrony test, where the locking actually happens outside the function where the access is, so there's already an equality ip_addrs = inst->addresses known to the analysis and there's a symbolic lock inst->mutex, but then the access ip_addrs[0] isn't found to be protected because to find that out, the equality needs to be used inside the index expression.
Note that the index itself is completely irrelevant because all that's needed is the common inst prefix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are a bit too eager to declare access as non-racy here, consider the example I added:

void cache_entry_addref(struct cache_entry *entry) {
  pthread_mutex_lock(&entry->refs_mutex);
  entry->refs++; // NORACE
  (*entry).refs++; // NORACE
  entry[1].refs++; // RACE
  pthread_mutex_unlock(&entry->refs_mutex);
}

These accesses at entry[1] are racy, as the protecting mutex for that element is not held (I fixed the loop to only run until 9, so it's not UB). Also passing a t* and then using array access on it is a very common idiom in C afaik.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's quite an evil example, but I'm rather skeptical that we can do much about this. The function in isolation, as presented here, just gets a pointer and then accesses entry[1]. For an arbitrary pointer that may point to any memory and therefore a write to it may invalidate any memory, including any other program variable (any global, any local of any stack frame, any dynamically allocated memory). Of course in this particular case it doesn't, but in Goblint we have no way of knowing that, because for addresses (like the pointer here) we don't keep track of their context (whether they're pointing into an array or not).

I think our much broader assumption is that pointer arithmetic stays "within the direct object" (for the lack of a better definition) regardless of its context, because we simply don't know about the broader memory layout (even if in this case it seems obvious). To make weaker assumptions than that, we would either need some crazy possible memory layout analysis or, lacking that, assume the worst and invalidate everything (not just globals!).

Also passing a t* and then using array access on it is a very common idiom in C afaik.

Is it though? If you want to use it as an array, you might as well pass it as an array (which is a pointer under the hood anyway), to make it clear that it's an array (usually with another length argument), not a pointer to an isolated object.

Now thinking about bad things, if the argument were an array and we copy the array domain into the function and mutated it, do those writes even propagate back to the caller's array domain?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the assumption we should be making is that

    1. pointer arithmetic never leaves the corresponding varinfo (or blob for malloced things),
    1. and never can cross struct boundaries once one has descended (because the compiler may have inserted padding), so no adding magic offsets to get from struct member a to a in the next element of the array unless a is the first struct element.
      That seems not too strong but also not as radical as setting everything to top. (And should be possible as our addresses maintain offsets and have abstract values at the granularity of variables anyway).

I think if we assume anything more, we cannot seriously claim we are sound anymore. I personally would be extremely unhappy with Goblint accepting this program as race-free! (How do @vesalvojdani @stilscher @jerhard feel here?)

Should we put this on the agenda at the Gobcon?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now thinking about bad things, if the argument were an array and we copy the array domain into the function and mutated it, do those writes even propagate back to the caller's array domain?

I'd hope so. We pass a pointer to the array, hence the entire(!) array will go to the callee, we update things there and should then copy the value over in combine?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe this is somehow also related to #434? Because the thing is, this change simply allows the entry[0] or entry[1] expression to preserve its prefix (previously we gave up on index access of pointer). Index access on array type has been previously supported for a long time and this shouldn't be different: symbolic locking happens on the basis of common prefix between the mutex and the access (and we have at least one paper on this!). So I'm quite surprised how this can suddenly be going wrong.

cc: @vesalvojdani

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd hope so. We pass a pointer to the array, hence the entire(!) array will go to the callee, we update things there and should then copy the value over in combine?

I haven't dug into this now, but this is exactly where I'm worried. Sure, if you take a pointer of the array and pass that to a function, it gets passed along as reachable and combined back. But what if you pass the array directly (without making a pointer to it first). In the caller, the local domain contains an array abstraction directly under the array variable name, there isn't an address domain containing an address to the array contents. This is analogous to how an integer variable contains the integer abstraction directly. And when they are passed as arguments, they are copied — there's a copy of the integer's abstraction directly under the callee's argument and copy of the array's abstraction directly under the callee's argument. On combine, the internal integer abstraction is simply removed and so would the callee's (modified) array one. Or do we have some special logic somewhere there specifically for arrays? I don't recall at least.

I personally would be extremely unhappy with Goblint accepting this program as race-free!

Indeed in the non-UB test you added, we should not. It's just the symbolic locks that then might be assuming something stronger.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized there's a granularity difference at play here as well:

  • The cache_entry tests in our suite use an array of structs, where each struct has a mutex protecting its data.
  • The chrony DNS_Async_Instance struct, for which this is needed, is a standalone struct (not one in an array), where the single mutex protects an entire array inside the struct.

Because they are structurally different like that, I'd hope we can actually be sound for the tests you added and the story in chrony. I'll have to dig into how exactly these different granularity locking schemes, and per-field vs per-index symbolic locks are supposed to work, because these cases involve both at the same time (but in different order!).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case for chrony is easier than the more general one, where the unsoundness problem came in. I now fixed that using very ad-hoc handling that's still sufficient for chrony.

@sim642 sim642 merged commit ab3ddff into interactive May 19, 2022
@sim642 sim642 deleted the chrony branch May 19, 2022 14:15
sim642 added a commit that referenced this pull request May 19, 2022
@sim642 sim642 mentioned this pull request May 19, 2022
5 tasks
sim642 added a commit that referenced this pull request May 19, 2022
@sim642 sim642 mentioned this pull request Aug 4, 2022
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants