forked from SWI-Prolog/packages-semweb
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathTODO
100 lines (76 loc) · 3.69 KB
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
---+ Document hookable rdf_load/2
---+ Concurrent version of RDF-DB.
---++ Dealing with predicate clouds:
* Predicate clouds represent a set of interlinked predicates, regardless
of the generation.
* A predicate-cloud has
- A linked list of lifespan+reachability-matrices
* If two predicate clouds, both having predicates needs merging:
- Register that repeated querying is needed with alternate hashes.
Measurements using old subPropetyOf indexing and none (repeated queries):
- Using bench.pl, create(1000000) (1M), 100,000 subjects, 10 preds.
- All preds linked to p_root
- rdf(+,+,-) (first) all (non-root) predicates: 10.7 --> 2.3 sec. (old-->new)
- rdf_has(+,+,+) (failing) on root 1.5 --> 7.5.
Could implement as rdf(+,-,+). Would yield about 2.4 sec.
---++ Transaction semantics
* All goals in a transaction see the common database at the
generation of the transaction start and all modifications inside
the transaction.
* All goals in a _nested_ transaction see
- Common DB until start of outer transaction
- Changes made by the parent
- Changes inside the nested transaction
* Update semantics (TBD: Check with implementation)
- A transaction maintains a list of affected triples.
- A transaction is assigned a transaction generation, which count
from near the maximum generation.
- Each added triple is born at the transaction-generation
- Each triples deleted died at the transaction-generation
- If a transaction is _committed_ we scan the affected triples and
- Each added triple is born at the current generation
- Each deleted triple died at the current generation
- If a transaction is _rolled_back_
- Each added triple is born at the max generation
- GC will take care of them
- Each deleted triple died at the max generation (as initial)
* Transaction generation
- Must be far enough away from `now'; allocated near the end.
- All triples in a transaction are from the same generation.
- Nested transactions need a generation > parent
- Multiple threads may run multiple transactions concurrently.
- Groups: max-threads, max-nested? Ok: 1000*1000 = 1m is still
nothing.
- Outer transaction visible: generation of start
- Inner transaction visible: generation of start of outer +
generations of parent transactions.
---+ Adding triples
- Consider distribution of work between link_triple() and
add_triples(). There is a lot of room for optimization here.
- Speedup atom-table handling in Prolog
- Atomic atom-reference update
- Eventually: lock-free access to existing atoms
---+ Deleting triples
- Set deleted generation to current; increment current generation.
- GC does the remainder of the admin updates.
---+ Duplicates:
- Test for duplicates; set flag on all of them. [OK]
- Avoid duplicate answers by maintaining a table, only adding [OK]
triples that are flagged as duplicates.
- Implement duplicate elimination purely on new_answer() [OK]
---+ Hash-table design
- Implement optimization (like for triples for additional tables)
- resource-db
- predicate-db
- graph-db
- Perform expensive tests for need to rehash in a thread.
---+ Enhance loading
- Initial state: loading. Calls to add_triples simply load
triples; no indexing or duplicates are maintained.
- First query creates the hash it requires at the appropriate size.
- First delete or resized hash-table starts GC.
- Duplicate admin is started if new_answer() finds too many
duplicates. It is created using a thread.
---+ References
Lock-free library: http://www.liblfds.org/
Useful stuff: http://eternallyconfuzzled.com/tuts/datastructures/