Connector enumeration + length-limit setup #673
Conversation
Add printing pass and word numbers.
- The enumeration itself is not used yet.
- More shortcuts per connector descriptor are to be added.
- The SAT parser code doesn't compile with these changes - to be fixed.
It is zeroed by zero_connector_table(). New connectors are inserted at the front of the list.
Move tableNext into it from the connector structure. Also tune it to be 32 bytes. uc_hash/uc_num and str_hash can be defined later with typedef, to enable creating a version that supports more connector types.
Also, use actual connector descriptor table entries (less code, and more efficient code).
Define 3 new ones: connector_get_lc_start(), connector_get_uc_start(), connector_uc_hash().
uc_num is the ordinal number of a UC connector. Use this accessor when the code depends on the ordinal number and not a cached hash value.
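For illustration only, a minimal sketch of how the descriptor-table pointer and such an accessor might fit together; the field names and layout here are assumptions, not the PR's actual structs:

```c
#include <stdint.h>

/* Hypothetical sketch: field names are illustrative, not the PR's layout. */
typedef struct condesc_struct
{
	const char *string;   /* full connector name, e.g. "Wd" */
	uint32_t str_hash;    /* cached hash of the full string */
	uint32_t uc_num;      /* ordinal number of the UC part (a "perfect hash") */
	uint8_t lc_start;     /* offset of the lower-case part, 0 if none */
} condesc_t;

typedef struct Connector_struct
{
	const condesc_t *desc;  /* pointer into the shared descriptor table */
	/* ... per-sentence fields ... */
} Connector;

/* Accessor: read-only, hence no matching _set function is needed. */
static inline uint32_t connector_uc_num(const Connector *c)
{
	return c->desc->uc_num;
}
```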
On the test sentence: The problem is, or rather one of the problems, for there are many, a sizeable proportion of which are continually clogging up the civil, commercial, and criminal courts in all areas of the Galaxy, and especially, where possible, the more corrupt ones, this. It runs 2.5% faster when doing this packing.
... instead of for every connector.
This prevents overly long lines. No need for _get, as we don't have or need connector set functions.
This gains a few percent of performance on long sentences.
... at verbosity=102. It may provide insights into how effective the implementation is and how to improve it.
The string-hash uses the same definition just because the alignment allows for that.
Apply them in the order they appear in the dict, to allow more specific definitions to override previous ones.
Also add a legend.
Also refactor the connector descriptor table code.
- Change a local link-parser alias "lg" to link-parser.
- Break some long commands with backslash/newline.
- Add info on -v=102 (print connectors).
- Update info on debug configuration.
- Fix markdown of intra-word underline (with it and without it there is a problem in some markdown readers...).
- Fix info rot in the example of sane_linkage_morphism().
The first linkage contains special.n instead of special.a. The reason is that the expected linkage now doesn't appear in the first 100 linkages, which is the default in the LG Python binding (it appears as number 108...). Fix it by setting the limit to 300, in the hope that this is future-proof enough.
} else {
	for (l=e->u.l; l!=NULL; l=l->next) {
		build_connector_set_from_expression(conset, l->e);
		get_connectors_from_expression(conlist, cl_size, l->e);
	}
I think it would be slightly faster to return cl_size as a return value, rather than passing a pointer. When you increment a value at a location, the compiler has to assume a worst-case of possible aliasing, and constantly write it out to memory (cache). By contrast, a return value will usually stay in registers, maybe be written to stack occasionally.
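To make the point concrete, a small standalone illustration (the function names here are made up, not from the PR): the pointer-out form forces conservative aliasing assumptions, while the return-value form lets the count stay in a register.

```c
#include <stddef.h>

/* Pointer-out form: the compiler must assume *count may alias other memory,
 * so the increment is typically written back to memory on every iteration. */
static void count_evens_ptr(const int *a, size_t n, size_t *count)
{
	for (size_t i = 0; i < n; i++)
		if ((a[i] & 1) == 0) (*count)++;
}

/* Return-value form: the running count can live in a register for the
 * whole loop and is materialized only at the return. */
static size_t count_evens_ret(const int *a, size_t n, size_t count)
{
	for (size_t i = 0; i < n; i++)
		if ((a[i] & 1) == 0) count++;
	return count;
}
```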
You are right. I will fix that.
}

/* ======================================================== */
static bool connector_encode_lc(const char *lc_string, condesc_t *desc) |
Can you provide a description of what this does? It seems to be taking the lower-case part of a connector, normally an 8-bit quantity, and packing it into a 7-bit thing .. that seems like a lot of work, for not much savings. Since it is lower-case, you could, in principle, pack down to 5 bits, for more savings ... but is it really worth the effort? or am I missing something?
OK, never mind. I see what you are doing .. you have to copy the LC chars into the u64 anyway, and you have to compute the mask anyway, and so at almost no extra cost, you can shrink to 7 bits. But then, at almost no extra cost, you can subtract 0x60 and pack in 5 bits. On the other hand, I do not believe we have more than maybe 5 or 6 lower-case letters... so this is not all that necessary...
But then, at almost no extra cost, you can subtract 0x60 and pack in 5 bits. On the other hand, I do not believe we have more than maybe 5 or 6 lower-case letters... so this is not all that necessary...
The LC part is not really only lower-case letters. It may also contain digits. So we have 36 possible characters, which can be packed into 6 bits. However, this is more complex than just removing the most significant bit, since there is a gap between the values of the lower-case letters and the digits.
Not doing this packing means a maximum of 8 characters in the LC part, and I agree 9 is not a big improvement and may not even be worth the slight overhead of doing it once. The reason I included it is that I copied it from old code I wrote, which included packing the UC part in 6 bits (the same packing code can be used). Packing the UC part is not needed now because the connector enumeration idea is better.
So I will remove this packing (or ifdef it out).
Also note that all[*] the connector-related computations, including LC packing, are now done only once - at dictionary-read time - instead of per sentence (or even per connector) as before.
[*] Except for short_length/all_short_connectors, which could also be computed and cached only when they change, but I didn't implement that.
OK. Maybe it's not that important.
But then, at almost no extra cost, you can subtract 0x60 and pack in 5 bits. On the other hand, I do not believe we have more than maybe 5 or 6 lower-case letters... so this is not all that necessary...
With such a subtraction method (e.g. subtracting 64 and taking the absolute value) it is possible to pack in 6 bits instead of 7 (I was wrong in saying this is more complex due to the gap), and also the longest LC part is 5 - the version (and domain) encoding. So with this packing, 32 bits can hold 5 characters. However, this will prevent versions like V5v4v10.
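For concreteness, here is a sketch of the kind of 7-bit packing under discussion; it is illustrative only, and not the PR's actual connector_encode_lc():

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch only. Pack an LC string into one 64-bit word,
 * 7 bits per character, so 9 characters fit (7*9 = 63 bits) instead
 * of the 8 that plain bytes would allow. */
static bool pack_lc_7bit(const char *lc, uint64_t *word, uint64_t *mask)
{
	*word = 0;
	*mask = 0;
	for (int i = 0; lc[i] != '\0'; i++)
	{
		if (i >= 9) return false;                  /* LC part too long */
		uint64_t c = (uint64_t)(lc[i] & 0x7F);
		if (lc[i] == '*')
			c = 0;                                 /* wildcard: leave mask bits 0 */
		else
			*mask |= (uint64_t)0x7F << (7 * i);    /* these 7 bits must match */
		*word |= c << (7 * i);
	}
	return true;
}
/* Two LC parts then match iff ((a.word ^ b.word) & a.mask & b.mask) == 0. */
```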
...
BTW, I once hacked the dict reading and added #define for dict definitions (a very small change) (# is already a special character for the dict).
Another BTW, so as not to open an issue for it: I noted that reading the id dict fails with: Connectors beginning with "ID" are forbidden. The easiest fix for that is to forbid only connectors in the format IDX, where X is a capital-letter string (as only that may clash with idiom connectors). Is such a fix fine?
ID: apparently, it's needed, so that we can set the Indonesian locale. So yes.
We do need version strings like V5v4v10 ...
Note that the statistically-generated dictionaries currently use only upper-case letters, and have zero lower-case letters in them. And there are a lot of them: I haven't counted recently, but maybe hundreds of thousands (or more). The goal of learning is to shrink this number, but ...
 * If 0, short_length (a Parse_Option) is used. If
 * all_short==true (a Parse_Option), length_limit
 * is clipped to short_length. */
char head_depended; /* 'h' for head, 'd' for depended, or '\0' if none */
typo: "dependent", not "depended"
I will fix it.
const char *constring, int hash)
{
	*h = (condesc_t *)malloc(sizeof(condesc_t));
	memset(*h, 0, sizeof(condesc_t));
I'm sort-of surprised by this malloc. Why not just change condesc_table_alloc so that it mallocs an array of condesc_t instead of an array of pointers to condesc_t? That way, you can skip this smaller alloc entirely.
I don't know in advance how many different connectors are in the dict, so this hash table can grow. I didn't want to pre-allocate an array that cannot grow.
However, I now see it can be done in another, more efficient, way:
Make a big malloc for the initial guessed number of connectors, as set in dictionary_six_str(). If exceeded, realloc to double the space. Use it as a condesc_t pool. Optionally, at the end, realloc to the exact used amount, in order to free the extra space. Another possibility is to allocate in groups of condesc_t elements, say 256; that way almost all of the malloc overhead is saved.
The above realloc way is simpler. I can implement it.
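A minimal sketch of the realloc-doubling pool described above (the type and function names are made up for illustration):

```c
#include <stdlib.h>
#include <string.h>

typedef struct { char opaque[32]; } condesc_t;  /* stand-in; the real struct is 32 bytes */

typedef struct
{
	condesc_t *block;
	size_t size;   /* allocated elements */
	size_t used;   /* elements handed out */
} condesc_pool;

/* Hand out the next free, zeroed slot, doubling the pool when exhausted. */
static condesc_t *pool_next(condesc_pool *p)
{
	if (p->used == p->size)
	{
		condesc_t *nb = realloc(p->block, 2 * p->size * sizeof(condesc_t));
		if (nb == NULL) return NULL;
		p->block = nb;
		p->size *= 2;
	}
	condesc_t *slot = &p->block[p->used++];
	memset(slot, 0, sizeof *slot);
	return slot;
}
```

One caveat of such a pool: growing it with realloc() can move the existing elements, so anything that refers to them across a growth step should store indices into the pool (or be rebuilt) rather than raw pointers.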
Sorry, I did not realize it was a hash table. It seemed to be getting used as a sequential list. I suppose I misread the code.
Um, hang on. condesc_t doesn't have a next pointer in it, so it cannot be chained, so it cannot be used in a hash table, unless you can guarantee zero collisions..... A hash table with zero-or-one items in each bucket ... Hmmm. Are you using some trick to spill to the next empty bucket? There was some old code that did that.
OK, so you are saying that condesc_t is large enough that it is better to have an array of pointers to them, rather than an array of them?
Drawbacks of not using an array of pointers:
- The size of condesc_t is 32, so a table of, say, 20K items (I think this is what it may end up being for "ru") will have a significant memory overhead. Since it is too large for an L2 CPU cache, searching in it may cause several times more CPU cache misses (with 8 pointers per cache line you normally have no more than 3 misses).
- The sorting, which is essential here, is more costly (the element contents need to be swapped during the sort).
However, I didn't make a benchmark that compares the different ways of doing it.
There is an additional way of doing it that may be worth exploring: use the dict-creating code. The idea is just to put the connector strings into their own dict, and enumerate them by listing this dict's words (similarly to what the !!* command does).
OK, that's OK. You can leave this alone; I was just reading through the code, got a little confused, and needed to ask a few questions. But this seems OK as-is; no change needed.
On a different note, the tests/memleak unit test crashes with a memory double-free. That's important.
n = make_or_node(&dict->exp_list, plu, min);
}
else
{
	dict_error(dict, "Unknown connector direction type.");
	dict_error(dict, "Unknown connector direction type '%c'.");
	return NULL;
The connector direction is never given.
Indeed.
Since the second argument of dict_error2() is never used, I can change this call to dict_error2(), and change dict_error2() to print an additional value nicely.
A better alternative would be to change dict_error() to a printf-like function.
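The printf-like alternative could look roughly like this; a sketch under assumed names (dict_error_fmt is hypothetical, and the real dict_error() also prints the dictionary context around the error point):

```c
#include <stdarg.h>
#include <stdio.h>

typedef struct Dictionary_s *Dictionary;  /* stand-in for the real type */

/* Sketch of a printf-like dictionary error reporter. */
static void dict_error_fmt(Dictionary dict, const char *fmt, ...)
{
	va_list args;
	char msg[256];

	va_start(args, fmt);
	vsnprintf(msg, sizeof(msg), fmt, args);
	va_end(args);

	(void)dict;  /* the real version would use it to print context */
	fprintf(stderr, "Error: %s\n", msg);
}
```

The call above could then pass the offending character directly, e.g. dict_error_fmt(dict, "Unknown connector direction type '%c'.", dir), where dir is whatever variable holds the direction character.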
Um, hang on. condesc_t doesn't have a next pointer in it, so it cannot be chained, so it cannot be used in a hash table, unless you can guarantee zero collisions..... A hash table with zero-or-one items in each bucket ... Hmmm. Are you using some trick to spill to the next empty bucket? There was some old code that did that.
It is similar to the string_set code, in that it uses empty-bucket probing (but in my code I use a constant stride of 1, since it is more cache friendly; see below). It is also how it is done in the fast-matcher connector lists per word.
This can be improved by using Robin Hood placement (which I use in a WIP to replace the memoizing table of do_count()), and I intend to try it here too.
The problem with using a next pointer for chaining is that it is very cache unfriendly, due to random memory access (the chained element has a very low chance of being in a nearby memory location). On the other hand, searching in an "in place" hash table sequentially accesses a relatively small memory region (on average very few elements need to be inspected, and the worst case is small if the table is kept, e.g., under 50% used), and due to the sequential access the CPU knows to prefetch the next memory to be accessed.
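For concreteness, a generic sketch of the stride-1 empty-bucket probing described here (not the actual table code from the PR):

```c
#include <stddef.h>
#include <string.h>

typedef struct { const char *string; /* ... */ } condesc_t;  /* stand-in */

/* Stride-1 open-addressing lookup: probe consecutive buckets until the
 * key or an empty bucket is found. The probes touch adjacent memory, so
 * they stay within a few cache lines and are easy for the CPU to prefetch.
 * Assumes size is a power of 2 and the table is never full (e.g. kept
 * under 50% used), so an empty bucket is always reached. */
static condesc_t **table_find(condesc_t **table, size_t size,
                              const char *str, unsigned hash)
{
	for (size_t i = hash & (size - 1); ; i = (i + 1) & (size - 1))
	{
		if (table[i] == NULL) return &table[i];  /* free slot: not found */
		if (strcmp(table[i]->string, str) == 0) return &table[i];
	}
}
```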
#define CACHELINE 64
size_t dsize = dcnt * sizeof(Disjunct);
dsize = (dsize+CACHELINE)&~(CACHELINE-1); /* Align connector block. */
Here, you align, but then below, you add csize of unknown alignment. I'm sort-of confused, because you then use what malloc provided, without actually aligning -- malloc might provide an address that is not 64-byte aligned.
The memory arrangement in this allocation is:
DISJUNCT_ARRAY ALIGNMENT_PAD CONNECTOR_ARRAY
The Connector struct is 32 bytes (and can be shrunk to 16, 8 and even 4 bytes - I would like to test all of them).
In order not to split connectors over two cache lines, I want CONNECTOR_ARRAY to start at a 64-byte boundary. I get this here by adding an alignment pad at the end of the disjunct allocation.
(As far as I remember, the Disjunct struct size is something like 44 bytes, so there is no point in aligning its start, and it doesn't usually end at the desired boundary.)
So it seems fine to me.
If desired, I can add a description like the above to the function comment.
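A sketch of the layout described above, using the variable names from the quoted hunk where possible (the stand-in struct sizes follow the numbers mentioned in this thread; the wrapper function is hypothetical):

```c
#include <stdlib.h>

#define CACHELINE 64

/* Stand-in types; the real Disjunct/Connector are the library's structs. */
typedef struct { char opaque[56]; } Disjunct;
typedef struct { char opaque[32]; } Connector;

/* One block: disjunct array, pad up to a cache-line boundary, connector
 * array. The rounding matches the quoted hunk. Note: the connector array
 * is 64-byte aligned only relative to the block start; absolute alignment
 * (the reviewer's point) would additionally need an aligned allocator
 * such as posix_memalign() or aligned_alloc(). */
static void *alloc_disjuncts_and_connectors(size_t dcnt, size_t ccnt,
                                            Disjunct **dp, Connector **cp)
{
	size_t dsize = dcnt * sizeof(Disjunct);
	dsize = (dsize + CACHELINE) & ~(CACHELINE - 1); /* pad disjunct block */
	size_t csize = ccnt * sizeof(Connector);

	char *block = malloc(dsize + csize);
	if (block == NULL) return NULL;
	*dp = (Disjunct *)block;
	*cp = (Connector *)(block + dsize);
	return block;
}
```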
As far as I remember, the Disjunct struct size is something like 44
(It is actually 56.)
OK. If the code looks right to you, then OK; it looked strange to me but I did not try very hard to figure it out.
I hope it will be clearer after I add comments to it.
Per discussion at PR opencog#673.
Address problems discussed in #673
I could not find (in order to link to it here) the issue in which I originally proposed the connector enumeration idea (you named this idea "enumeration" there).
For length-limit setup, see issue #632.
Main changes in this PR:
- While reading the dictionary, put the fixed properties of each connector in a central table, and precompute all properties whose values were previously computed for each parsed sentence.
- In the connector structure, use a pointer to this table (called in the code the connector descriptor table). This also enables using the slot number of a connector in this table as a "perfect hash".
- short_length/all_short_connectors parse options).
- some connectors in the English dict. I didn't check whether it may improve the result of the power pruning, but it reduces its CPU time.

Some additional changes:
- a WIP).

Additional possible improvements using the new infrastructure:
- a thread_local pointer is added, to point to the connector enumeration table.
- improve the match cache hit ratio).