Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SAT solver cost #820

Merged
merged 9 commits into from
Oct 2, 2018
Merged

SAT solver cost #820

merged 9 commits into from
Oct 2, 2018

Conversation

ampli
Copy link
Member

@ampli ampli commented Oct 2, 2018

Fix the SAT parser disjunct cost to take into account null expression cost. See (1) in this post in issue #188. (It seems I didn't open a new issue for that and instead continued the discussion in the same issue.)

Also fix the total cost per disjunct to be according to the calculation that is done in the classic parser, including the cost-cutoff calculation (see issue #783).

The result is that the reported costs (can be inspected with !disjunct) should now be identical to those reported by the classic parser. Note that because duplicate disjuncts are not discarded, and they may have different costs (in the classic parser only the one with the lowest cost is retained) the same parse can be shown multiple times, with different costs.

I had a difficulty to word the ChangeLog description.
I also used there "Modify" and not "fixed" because I am not convinced yet that the cost-cutoff calculation method that is currently used is "correct" (or "optimal").

I did comparison tests (SAT vs classic parsers) when I produce all the possible solutions for each input sentence. The order of the parses is of course different, and also the SAT parser produces many duplicates. But by using md5 hashes of each parse I validated that they produce exactly the same parses. I didn't try to programmatically match the reported disjunct costs - I only inspected them manually and they seem identical (if you compare to the linkage with the lower total cost among the duplicates).

If you find different costs please tell me and I will try to investigate/fix that.

Some remaining problems (in addition to the open one from issue #188):

  1. I suspect that the cost-guiding solution preference is currently broken (the intention is that the SAT solver will produce solutions with lower cost first.) I still don't know how to fix that.
  2. The reported connector order (as displayed by !disjuncts) is not correct (so the original disjuncts cannot be correctly reconstructed and used). I know how to fix that.
  3. Panic timeout is not implemented (will be fixed when fixing Fixing panic mode #785 is done).
  4. "Robust parsing" (parsing with nulls) is not implemented. I started this work but still need to learn more some SAT related material in order to make an efficient implementation.
  5. Improving the speed in various ways (there is very much room for that).

ampli added 9 commits October 2, 2018 19:59
When distributing over AND_type, attribute the cost to the first
and'ed term only (instead of to each of them).

The cost calculation in generate_satisfaction_for_expression() is left
intact for now, because changing it interferes with the cost cutoff
calculation.
There cost is to be recovered on linkage extraction if they happen to
reside on a participating disjunct.
The SAT-parser already fills in this info.
This fix allows to discard an unintended linkage.
It is to be used to discard linkages with cost > cost_max (not actually
done, at least for now).

In this opportunity an malloc() of the linkage struct is eliminated
by allocating it on the stack.

Also discard the FIXME on create_linkage, because compute_chosen_disjuncts()
doesn't exists now and its replacement process_linkages() is too different.
Not actually used because it is currently incompatible to the classic
parser (in which a total linkage cost can be greater than cost_cutoff).
/* Loop until a good linkage is found.
* Insane (mixed alternatives) linkages are always ignored.
* Disconnected linkages are normally ignored, unless
* !test=linkage-disconnected is used (and they are sane) */
bool linkage_ok;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should almost certainly initialize here, e.g. linkage_ok = false.

@linas linas merged commit 2141564 into opencog:master Oct 2, 2018
@linas
Copy link
Member

linas commented Oct 2, 2018

The changelog entry is fine. A 'corporate-speak' variation of if it might be

Revise the SAT parser cost model to align it with the classic parser.

I'm still nervous about the finer details of how the costs are being computed and handled. I'm working on the theoretical side (you've seen the skippy.pdf, I'm sure) and it's not obvious that what I'm thinking of there actually matches what the code is doing. However, I don't have time to think about this now. Just be aware that, in the long run, costs need to behave as if they were log-probabilities, or rather, as if they were mutual-information. More or less. Details TBD.

@ampli
Copy link
Member Author

ampli commented Oct 2, 2018

It anyway must be reassigned inside the loop for each iteration (line 1584).

However, I admit it is not clear enough. I'm not sure that adding an initial assignment will make it clearer,
since it is not actually needed.

So the question remains: How to make it clearer?

@linas
Copy link
Member

linas commented Oct 2, 2018

It's fine. I'm reading the diffs, not the full code, so things like loop structure are not particularly visible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants