Notes
Connor Masterson
ideas on compression to reduce peak memory usage
1. change EdgeList to store (NodeID, DestID - NodeID) to compress the EdgeList
- with potentially large NodeIDs, the difference can end up being smaller than the
DestID itself. e.g. instead of EdgePair <123456789, 123456799> => <123456789, 10>.
but this adds an encoding and decoding step, and it could hurt performance if
DestID - NodeID comes out large: that case wouldn't yield much compression
while still requiring the encoding and decoding. could turn out to be unreliable.
- this is really just a more specific application of difference encoding (idea 2).
changing the format of the EdgeList will require decoding later on.
2. difference encoding
- {v0 - v, v1 - v0, v2 - v1, ...} could be used to compress the CSR: a series of
k-bit blocks. the first block can be negative, so it needs a sign bit as well
as the continue bit and the data bits. stick with byte codes, not nibble codes:
byte alignment speeds up decoding. hopefully decoding can be parallelized.
see the first sketch after this list.
(run-length encoded byte codes require more storage but speed up decoding.
more details can be found in section 8 of Julian Shun's thesis.)
NOTE: makes more sense if DestID and NodeID are of type int.
3. reference encoding
- could be good for vertices with high degrees and similar adjacency lists,
but it would add extra computation, specifically in the form of comparisons.
it might also require more jumping around in memory if adjacency lists are
constantly pointing at other adjacency lists (pointer chasing). see the
second sketch after this list.
4. reordering for locality
- apparently finding a good ordering is NP-hard, so we'll get back to this later.
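a minimal sketch of what the difference encoding in idea 2 computes (the first delta, taken against the vertex id itself, is also exactly the pair compression of idea 1). none of this is gapbs code; the int64 output is just for illustration, since in practice each delta would immediately be split into byte blocks as sketched further down.

#include <cstdint>
#include <vector>

// turn vertex v's sorted adjacency list into the delta series
// {v0 - v, v1 - v0, v2 - v1, ...}. only the first delta can be
// negative, which is why only the first byte block needs a sign bit.
std::vector<int64_t> ToDeltas(int64_t v, const std::vector<int64_t> &neighs) {
  std::vector<int64_t> deltas;
  int64_t prev = v;  // the decoder must also start from v to undo this
  for (int64_t n : neighs) {
    deltas.push_back(n - prev);
    prev = n;
  }
  return deltas;
}
// e.g. v = 100, neighs = {90, 110, 120}  =>  deltas = {-10, 20, 10}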
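and a rough sketch of what reference encoding in idea 3 could look like; the struct and function here are hypothetical (not gapbs), assuming sorted adjacency lists. vertex cur's list is stored as a reference vertex plus one copy flag per entry of the reference's list plus leftover extras, which makes both the extra comparisons and the pointer chasing mentioned above visible.

#include <cstdint>
#include <vector>

using NodeID = int32_t;

struct RefEncoded {
  NodeID ref;                  // vertex whose adjacency list we reference
  std::vector<bool> copy;      // copy[k]: keep the k-th neighbor of ref
  std::vector<NodeID> extras;  // neighbors of cur that ref doesn't have
};

// one merge-style pass of comparisons over the two sorted lists
RefEncoded RefEncode(NodeID ref, const std::vector<NodeID> &refList,
                     const std::vector<NodeID> &cur) {
  RefEncoded out{ref, std::vector<bool>(refList.size(), false), {}};
  size_t i = 0, j = 0;
  while (j < cur.size()) {
    if (i < refList.size() && refList[i] == cur[j]) {
      out.copy[i] = true; i++; j++;       // shared neighbor costs one bit
    } else if (i < refList.size() && refList[i] < cur[j]) {
      i++;                                // ref-only neighbor is skipped
    } else {
      out.extras.push_back(cur[j]); j++;  // cur-only neighbor stored whole
    }
  }
  return out;
}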
pseudo code for building the graph in place and minimizing peak memory usage.
will attempt to do this with difference encoding. if this were implemented,
it would require that proper decoding be implemented as well. hopefully
that decoding won't become a bottleneck.
void MakeCSR(const EdgeList &el, bool transpose, DestID_*** index, DestID_** neighs) {
  // this part stays the same
  pvector<NodeID_> degrees = CountDegrees(el, transpose);
  pvector<SGOffset> offsets = ParallelPrefixSum(degrees);
  *neighs = new DestID_[offsets[num_nodes_]];
  *index = CSRGraph<NodeID_, DestID_>::GenIndex(offsets, *neighs);
  // this part is where the difference encoding happens. a running `prev`
  // carried across iterations would race under the parallel for (and the
  // fetch_and_add fills lists in nondeterministic order anyway), so delta
  // each stored neighbor against its own list's vertex id instead
  #pragma omp parallel for
  for (auto it = el.begin(); it < el.end(); it++) {
    Edge e = *it;
    if (symmetrize_ || (!symmetrize_ && !transpose))
      (*neighs)[fetch_and_add(offsets[e.u], 1)] = e.v - e.u;
    if (symmetrize_ || (!symmetrize_ && transpose)) {
      NodeID_ source = GetSource(e);  // this way we don't call it twice
      (*neighs)[fetch_and_add(offsets[static_cast<NodeID_>(e.v)], 1)] =
          source - static_cast<NodeID_>(e.v);
    }
  }
}
notes on this implementation: this trades peak memory usage for execution time, because
decoding must now be implemented. hopefully the speedup gained from going to disk less
often outweighs the slowdown incurred by decoding. whether this yields a net speedup may
depend on the system it runs on; perhaps some heuristic based on the specs of the system
could decide which behavior the program chooses. also, for decoding, the base value each
entry was delta'd against (here the owning vertex's id) has to be known by whatever
function decodes the *neighs array. also, this does not actually split the delta into
byte blocks. that requires the help of another function, written in pseudo code below.
void splitIntoByteBlocks(std::vector<uint8_t> &neighs, size_t &index, int v) {
  const uint8_t continueBit = 0x80;  // 0b10000000
  const uint8_t signBit     = 0x40;  // 0b01000000
  // work from the lsb to the msb (see the follow up below)
  unsigned int mag = (v < 0) ? -static_cast<unsigned int>(v)
                             : static_cast<unsigned int>(v);
  uint8_t b = mag & 0x3F;            // first block: the 6 low data bits
  if (v < 0)
    b |= signBit;                    // only the first block carries the sign
  mag >>= 6;
  if (mag > 0)
    b |= continueBit;                // more blocks follow
  // inserting mid-array shifts everything after index; hard to parallelize
  neighs.insert(neighs.begin() + index++, b);
  while (mag > 0) {
    b = mag & 0x7F;                  // later blocks: 7 data bits each
    mag >>= 7;
    if (mag > 0)
      b |= continueBit;
    neighs.insert(neighs.begin() + index++, b);
  }
}
notes on this helper function: this function, while helping to save space, will seriously
hurt performance. this is because inserting elements into an array requires that
everything after the insertion point be shifted over. this problem would be alleviated
if the neighbors array were instead a linked list. because of this, it was not added
into the above pseudo code for MakeCSR().
follow up/revision: the blocks have to be produced from the lsb to the msb (the version
above already does this). the base case is a value that fits in the 6 data bits of the
first block (|v| <= 63, i.e. 0b00111111): that one and only block just carries the sign
bit, otherwise the while loop emits the remaining 7-bit blocks from there.
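for completeness, a hedged sketch of the matching decoder (not in the original notes): it reads one signed value starting at neighs[index] and advances index past the blocks it consumed. the hoped-for parallel decoding would be per vertex, since each adjacency list's byte stream decodes independently once its offset is known.

#include <cstdint>
#include <vector>

int decodeByteBlocks(const std::vector<uint8_t> &neighs, size_t &index) {
  const uint8_t continueBit = 0x80;  // 0b10000000
  const uint8_t signBit     = 0x40;  // 0b01000000
  uint8_t b = neighs[index++];
  const bool negative = (b & signBit) != 0;
  unsigned int mag = b & 0x3F;       // 6 data bits in the first block
  int shift = 6;
  while (b & continueBit) {          // keep reading while blocks continue
    b = neighs[index++];
    mag |= static_cast<unsigned int>(b & 0x7F) << shift;
    shift += 7;
  }
  return negative ? -static_cast<int>(mag) : static_cast<int>(mag);
}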
notes from talk on 7-6-20
- read the data twice
- DCSR (doubly compressed sparse row)
- static vs. dynamic vs. streaming graphs (data set too big to fit in memory)
- vector size vs. capacity
- first touch policy: memory pages get allocated close to whichever thread touches the data first
- the edge list becomes the neighbors (col) array
- static/dynamic conversion cast
- row -> index, col -> neighs // wikipedia terms -> our terms
- in place method 3: do it in vectors, a vector of vectors
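a minimal sketch of the first touch point above, and how it interacts with vector size vs. capacity (assumptions: a NUMA machine and OpenMP; none of this is gapbs code). pages are physically placed on the NUMA node of the thread that first writes them, so the initializing loop should use the same schedule as the later compute loops. note that constructing a std::vector with a size value-initializes every element, first-touching all pages from one thread; raw allocation (or reserve without resize) avoids that.

#include <cstdint>

int main() {
  const int64_t n = int64_t(1) << 28;
  double *a = new double[n];  // pages reserved here, but not yet touched
  #pragma omp parallel for schedule(static)
  for (int64_t i = 0; i < n; i++)
    a[i] = 0.0;  // first touch: each page lands near the thread that wrote it
  // later loops using the same static schedule mostly see node-local pages
  delete[] a;
  return 0;
}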
PseudoCode 2.0
void MakeCSRInPlace(EdgeList &el, bool transpose, DestID_*** index, DestID_** neighs) {
  // overwrite the EdgeList's own storage with the neighbors array as the
  // edges are read (so el can't be const here). each Edge occupies two
  // DestID_-sized slots, so even writing both directions of an edge never
  // overruns the edges that haven't been read yet
  DestID_ *overWriteEL = reinterpret_cast<DestID_*>(el.data());
  for (auto it = el.begin(); it < el.end(); it++) {
    Edge e = *it;
    if (symmetrize_ || (!symmetrize_ && !transpose))
      // was: (*neighs)[fetch_and_add(offsets[e.u], 1)] = e.v;
      *overWriteEL++ = e.v;
    if (symmetrize_ || (!symmetrize_ && transpose))
      // was: (*neighs)[fetch_and_add(offsets[static_cast<NodeID_>(e.v)], 1)] = GetSource(e);
      *overWriteEL++ = GetSource(e);
  }
}
Notes 8-19-20
symmetrize options
- two offsets
- check if an edge already exists during the first pass, add it to some other pvector
- build the graph one way, check which inverse edges exist, add the missing ones to a
list, then add them back in and sort again
Scott's proposed algorithm
1. for all edges, local search to check if an inverse is needed
2. make a list of needed edge additions
3. adjust offsets, fill in neighbors from back to front
4. re-sort local neighbors (this could be done in the prev step)
Pseudocode for Scott's algorithm 8-21-2020
1.
pvector<Edge> neededInverseEdges
for v in vertices:
- for n in outneighs(v):
- - if v is not an outneigh of n: // the local search over n's list
- - - add Edge(n, v) to neededInverseEdges
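a hedged C++ sketch of that pass, assuming a gapbs-style CSRGraph g with sorted out-neighbor lists, so the local search over n's list can be a binary search:

#include <algorithm>
// assumes gapbs types: Graph, NodeID, Edge, pvector

pvector<Edge> FindNeededInverseEdges(const Graph &g) {
  pvector<Edge> needed;
  for (NodeID v = 0; v < g.num_nodes(); v++) {
    for (NodeID n : g.out_neigh(v)) {
      auto nn = g.out_neigh(n);
      if (!std::binary_search(nn.begin(), nn.end(), v))  // is v an out-neighbor of n?
        needed.push_back(Edge(n, v));                    // inverse of (v, n) is missing
    }
  }
  return needed;
}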
new algorithm to get around missingInv getting too big
1. same pass over the edges, identifying needed inverses
- store a count of needed inverses for each vertex
2. space out the existing edge list to accommodate them
3. pass over the spaced out list and add the inverses
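a rough sketch of step 1's counting pass, under the same gapbs-style assumptions as the sketch above. steps 2 and 3 would prefix-sum these counts into new offsets, slide each old adjacency list into its new slot (back to front, so nothing gets clobbered), then append the inverses and locally sort each list.

// assumes gapbs types: Graph, NodeID, pvector; sorted out-neighbor lists

pvector<NodeID> CountNeededInverses(const Graph &g) {
  pvector<NodeID> invCount(g.num_nodes(), 0);
  for (NodeID v = 0; v < g.num_nodes(); v++) {
    for (NodeID n : g.out_neigh(v)) {
      auto nn = g.out_neigh(n);
      if (!std::binary_search(nn.begin(), nn.end(), v))
        invCount[n]++;  // n's spaced out list needs one slot for Edge(n, v)
    }
  }
  return invCount;  // per-vertex counts instead of a potentially huge missingInv list
}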