===============
* Parallel LSQR v3 (PLSQR3)
*
* (Hwang-Ho) He Huang
* Liqiang Wang
* Department of Computer Science, University of Wyoming
*
* John M. Dennis
* National Center for Atmospheric Research. Boulder, CO
*
* En-Jui Lee ([email protected])
* Po Chen ([email protected] ; [email protected])
* Department of Geology and Geophysics
* University of Wyoming
*
* Last update: 9/24/2013
*
* References:
*
* En-Jui Lee, He Huang, John M. Dennis, Po Chen, Liqiang Wang,
* An optimized parallel LSQR algorithm for seismic tomography,
* Computers & Geosciences, Volume 61, December 2013, Pages 184-197,
* ISSN 0098-3004, http://dx.doi.org/10.1016/j.cageo.2013.08.013.
* (http://www.sciencedirect.com/science/article/pii/S0098300413002409)
*
* Huang, H., Dennis, J.M., Wang, L., Chen, P., 2013.
* A scalable parallel LSQR algorithm for solving large-scale linear system for tomographic problems: a case study in seismic tomography.
* In: Proceedings of the 2013 International Conference on Computational Science (ICCS). Procedia Computer Science.
*
* He Huang, Liqiang Wang, En-Jui Lee, and Po Chen.
* An MPI-CUDA Implementation and Optimization for Parallel Sparse Equations and Least Squares (LSQR).
* In the 2012 International Conference on Computational Science (ICCS) (main track).
* Procedia Computer Science, Elsevier, 2012.
*
=================
||PLSQR3 Manual||
=================
source codes of PLSQR3 : PLSQR3.2013.09/source
tools for running PLSQR3 : PLSQR3.2013.09/PLSQR3_tools
example dataset for PLSQR3 testing : PLSQR3.2013.09/data
A. Input files of kernel matrix
(1). Kernel matrix data (ONLY store non-zero elements in kernel matrix)
1-based column and row indexing, sorted by COLUMN
Format: binary
(could use programs in PLSQR3.2013.09/PLSQR3_tools/kernel_format to generate inputs)
Example:(in ASCII)
rowIdx(int) colIdx(int) value(double)
4 8 7.708820e-01
5 8 9.082630e-01
3 10 2.271540e-01
7 25 6.604270e-01
1 26 6.365470e-01
9 26 4.711560e-01
..........
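
The record layout above can be sketched in Python as follows. This is only an illustration: the README does not state the byte order or padding of the binary file, so the 16-byte little-endian record (int32, int32, float64) used here is an assumption, and pack_records / unpack_records are hypothetical helper names, not part of PLSQR3_tools.

```python
import struct

# Assumed layout (not confirmed by this README): int32 row, int32 column,
# float64 value, little-endian, no padding -> 16 bytes per record.
RECORD = struct.Struct("<iid")


def pack_records(triples):
    """Serialize (row, col, value) triples, sorted by column and then by row."""
    ordered = sorted(triples, key=lambda t: (t[1], t[0]))
    return b"".join(RECORD.pack(r, c, v) for r, c, v in ordered)


def unpack_records(blob):
    """Decode a byte string back into a list of (row, col, value) triples."""
    return [RECORD.unpack_from(blob, off)
            for off in range(0, len(blob), RECORD.size)]


# A few entries from the ASCII example above, given out of order on purpose.
sample = [(1, 26, 6.365470e-01), (4, 8, 7.708820e-01), (5, 8, 9.082630e-01)]
blob = pack_records(sample)
decoded = unpack_records(blob)
for row, col, value in decoded:
    print(row, col, value)
```

Check the actual layout against the programs in PLSQR3.2013.09/PLSQR3_tools/kernel_format before relying on this sketch.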
(2). Information file of kernel matrix data
1st column is column index;
2nd column is the number of nonzeros in this column;
3rd column is the displacement (offset) index of this column's first nonzero, counted from the beginning of the data file.
(if the third column is zero, this column contains no nonzeros)
Format: ASCII
Example:
(int) (int) (long long)
......
21 0 0
22 1 1
23 1 2
24 1 3
......
684 6 655
685 4 661
686 5 665
687 0 0
......
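
In the example above, each offset equals the previous offset plus the previous nonzero count (655 + 6 = 661, 661 + 4 = 665), which suggests that offsets are 1-based record indices. The sketch below builds the three columns under that inferred assumption; column_info is a hypothetical helper and not one of the PLSQR3 tools.

```python
def column_info(nnz_per_column, first_column=1, first_offset=1):
    """Build (column, nnz, offset) rows for the kernel information file.

    Offsets count records from 1; a column with no nonzeros gets offset 0.
    """
    rows = []
    next_offset = first_offset
    for col, nnz in enumerate(nnz_per_column, start=first_column):
        rows.append((col, nnz, next_offset if nnz > 0 else 0))
        next_offset += nnz
    return rows


# Columns 684-687 from the example above, which start at record 655.
info = column_info([6, 4, 5, 0], first_column=684, first_offset=655)
for col, nnz, offset in info:
    print(col, nnz, offset)
```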
B. Input files of damping matrix
(1). Row-sorted damping matrix data (ONLY store non-zero elements in damping matrix)
1-based column and row indexing, sorted by ROW
Format: binary
Example:(in ASCII)
rowIdx(int) colIdx(int) val(double)
1 1 1.0
2 1 1.0
2 2 -2.0
2 3 1.0
3 1 1.0
3 10 -2.0
3 19 1.0
4 1 1.0
.........
(2). Column-sorted damping matrix data (ONLY store non-zero elements)
1-based column and row indexing, sorted by COLUMN
Format: binary
Example:(in ASCII)
rowIdx(int) colIdx(int) val(double)
1 1 1.0
2 1 1.0
3 1 1.0
4 1 1.0
5 1 0.5
6 1 0.5
7 1 0.5
2 2 -2.0
8 2 1.0
.........
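
The row-sorted and column-sorted files hold the same entries in different orders. A minimal sketch of the reordering, using the first rows of the B(1) example, is given below. Ordering ties by row within a column is an assumption that matches the example listing; to_column_sorted is a hypothetical name.

```python
def to_column_sorted(triples):
    """Reorder (row, col, value) triples by column, then by row."""
    return sorted(triples, key=lambda t: (t[1], t[0]))


# Rows 1-3 of the row-sorted damping example in B(1).
row_sorted = [
    (1, 1, 1.0),
    (2, 1, 1.0), (2, 2, -2.0), (2, 3, 1.0),
    (3, 1, 1.0), (3, 10, -2.0), (3, 19, 1.0),
]
col_sorted = to_column_sorted(row_sorted)
for row, col, val in col_sorted:
    print(row, col, val)
```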
(3). Number of non-zero in each row for row-based damping matrix data
Format: binary (double)
Example: (in ASCII)
nnzPerRow(int)
1
3
3
3
.....
(4). Number of non-zero in each column for column-based damping matrix data
Format: binary (double)
Example:(in ASCII)
nnzPerColumn(int)
7
8
11
11
.......
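
The per-row and per-column counts in (3) and (4) can be derived from the nonzero entries themselves. The sketch below counts them for the first three rows of the B(1) example; nnz_counts is a hypothetical helper, and writing the counts in the binary format expected by PLSQR3 is not shown.

```python
def nnz_counts(triples, size, axis):
    """Count nonzeros per row (axis=0) or per column (axis=1).

    Indices in the triples are 1-based; the returned list is 0-based.
    """
    counts = [0] * size
    for entry in triples:
        counts[entry[axis] - 1] += 1
    return counts


# Rows 1-3 of the row-sorted damping example in B(1).
entries = [
    (1, 1, 1.0),
    (2, 1, 1.0), (2, 2, -2.0), (2, 3, 1.0),
    (3, 1, 1.0), (3, 10, -2.0), (3, 19, 1.0),
]
nnz_per_row = nnz_counts(entries, size=3, axis=0)
nnz_per_col = nnz_counts(entries, size=19, axis=1)
print(nnz_per_row)
```

The per-row result agrees with the first three values listed in (3).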
C. Input of the measurement vector
Measurement values that correspond to the kernel matrix (the values that correspond to the damping matrix
are zeros and will be generated by the program)
Format: ASCII
Example:
measurement(ASCII)
-0.9897
-1.8150
0.0829
-0.2884
-0.6363
........
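
As an illustration of the note above, the following sketch reads an ASCII measurement list and appends one zero per damping row. Placing the kernel rows first and the damping rows after them is an assumption made for this sketch; PLSQR3 builds this part internally, and read_measurements / full_rhs are hypothetical names.

```python
import io


def read_measurements(stream):
    """Read one floating-point measurement per non-empty line."""
    return [float(line) for line in stream if line.strip()]


def full_rhs(measurements, damping_rows):
    """Append one zero per damping row to the kernel measurements."""
    return list(measurements) + [0.0] * damping_rows


# First three values from the example above, read from an in-memory file.
text = io.StringIO("-0.9897\n-1.8150\n0.0829\n")
b_kernel = read_measurements(text)
b_full = full_rhs(b_kernel, damping_rows=2)
print(b_full)
```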
D. Execution command
mpiexec -np 16 /EXE/PATH/PLSQR3 -dir /YOUR/DATA/PATH -ker_f matrix_bycol.mm2.bin -ker_i matrix_bycol.mm2.info -damp_f damp_row_data.bin -damp_f_bycol damp_col_data.bin -damp_i damp_row_info.bin -damp_i_bycol damp_col_info.bin -b_k measurement.list -row_k 100 -row_d 1910672 -col 302940 -itn 100 -row_ptn damp_row.index -col_ptn ker_col.index
-dir: data directory, all the data files must be in this directory
-ker_f: kernel binary file, sort by column (details in A(1))
-ker_i: kernel information (details in A(2))
-damp_f: damping binary file, sort by row (details in B(1))
-damp_f_bycol: damping binary file, sort by column (details in B(2))
-damp_i: damping information for row-sorted damping matrix (details in B(3))
-damp_i_bycol: damping information for column-sorted damping matrix (details in B(4))
-b_k: measurement vector (details in C)
-row_k: kernel row number
-row_d: damping row number
-col: column number
-itn: iteration number
-row_ptn: optional, row partition file (details in E(3)); if omitted, an even partition is used
-col_ptn: optional, column partition file (details in E(3)); if omitted, an even partition is used
Note: if -row_ptn and -col_ptn are not provided, the program partitions rows and columns evenly.
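
For scripted runs, the command line can be assembled from the options listed above. The sketch below only builds and prints the argument list; it does not launch anything. The file names are the ones used in the example command, and plsqr3_command is a hypothetical helper that is not part of this package.

```python
import shlex


def plsqr3_command(exe, nproc, data_dir, row_k, row_d, col, itn,
                   row_ptn=None, col_ptn=None):
    """Return the mpiexec argument list for one PLSQR3 run."""
    cmd = [
        "mpiexec", "-np", str(nproc), exe,
        "-dir", data_dir,
        "-ker_f", "matrix_bycol.mm2.bin",
        "-ker_i", "matrix_bycol.mm2.info",
        "-damp_f", "damp_row_data.bin",
        "-damp_f_bycol", "damp_col_data.bin",
        "-damp_i", "damp_row_info.bin",
        "-damp_i_bycol", "damp_col_info.bin",
        "-b_k", "measurement.list",
        "-row_k", str(row_k),
        "-row_d", str(row_d),
        "-col", str(col),
        "-itn", str(itn),
    ]
    # The partition files are optional; leave them out for an even partition.
    if row_ptn is not None:
        cmd += ["-row_ptn", row_ptn]
    if col_ptn is not None:
        cmd += ["-col_ptn", col_ptn]
    return cmd


cmd = plsqr3_command("/EXE/PATH/PLSQR3", 16, "/YOUR/DATA/PATH",
                     row_k=100, row_d=1910672, col=302940, itn=100)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run if the run is to be started from Python.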
E. Other programs
(1). convert kernel matrix to PLSQR3 input format
source codes: PLSQR3.2013.09/PLSQR3_tools/kernel_format
1.1. convert ASCII kernel (row) files to binary (input for next step)
execution command: ker_ascii2bin kernel_list
mpiexec -np 16 ker_ascii2bin ker.list
input "kernel_list": the 1st row is the number of kernel files; the remaining rows are the names of the kernel files.
input example:
100
AZ.BZN_CI.PER_BB.APBPnz
AZ.CPE_CI.BFS_BB.APBPnz
AZ.CPE_CI.MUR_BB.APBPnz
AZ.CRY_CI.BAR_BB.APBPnz
....
format of kernel files(ASCII):
colIdx ix iy iz values
4494 39 46 1 -8.181353e-07
4495 40 46 1 -1.029945e-06
4496 41 46 1 -1.101910e-06
4497 42 46 1 -1.090375e-06
.......
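
A line of the ASCII kernel file can be parsed as sketched below. The field names follow the header shown above; KernelEntry and parse_kernel_line are hypothetical names used only for this illustration, and reading ix, iy, iz as integer grid indices is an assumption.

```python
from typing import NamedTuple


class KernelEntry(NamedTuple):
    col_idx: int   # column index in the kernel matrix
    ix: int        # grid indices (assumed meaning of ix, iy, iz)
    iy: int
    iz: int
    value: float   # kernel value


def parse_kernel_line(line):
    """Parse one 'colIdx ix iy iz value' line of an ASCII kernel file."""
    col, ix, iy, iz, value = line.split()
    return KernelEntry(int(col), int(ix), int(iy), int(iz), float(value))


entry = parse_kernel_line("4494 39 46 1 -8.181353e-07")
print(entry)
```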
1.2. collect column information
execution command: Ker2PLSQR3_preprocess kernel/path/ binary_kernel_list matrix_column_number
Ker2PLSQR3_preprocess PLSQR3.2013.09/data ker_bin.list 302940
input "binary_kernel_list": 1st column is the name of the binary kernel file; 2nd column is its number of non-zero elements
input example:
AZ.BZN_CI.PER_BB.APBPnz.bin 8437
AZ.CPE_CI.BFS_BB.APBPnz.bin 13062
AZ.CPE_CI.MUR_BB.APBPnz.bin 8957
AZ.CRY_CI.BAR_BB.APBPnz.bin 12193
AZ.CRY_CI.SDR_BB.APBPnz.bin 10485
......
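
The list format above can be read as sketched below, for example to total the non-zero elements over all kernel files before preprocessing. parse_binary_kernel_list is a hypothetical helper, not one of the PLSQR3 tools.

```python
def parse_binary_kernel_list(text):
    """Return (file_name, nnz) pairs from a binary kernel list."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            entries.append((fields[0], int(fields[1])))
    return entries


# First two lines of the example above.
listing = (
    "AZ.BZN_CI.PER_BB.APBPnz.bin 8437\n"
    "AZ.CPE_CI.BFS_BB.APBPnz.bin 13062\n"
)
kernels = parse_binary_kernel_list(listing)
total_nnz = sum(nnz for _, nnz in kernels)
print(total_nnz)
```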
1.3. convert binary files to PLSQR3 input format
execution command: Ker2PLSQR3 kernel/path binary_kernel_list matrix_column_number output_of_1.2
Ker2PLSQR3 PLSQR3.2013.09/data ker_bin.list 302940 PLSQR3.2013.09/data/col_info.txt
outputs are input files of PLSQR3
(2). reordered damping matrix for PLSQR
source code: PLSQR3.2013.09/PLSQR3_tools/damping_format
execution command: damping_binary.py 1 1 damp 99 153 10 2 1.0 1.0 1.0 1.0 1.0 1.0 1.0
NOTE: this code only generates identity & Laplacian damping
(3). load balancing
source code: PLSQR3.2013.09/PLSQR3_tools/load_balancing
execution command: load_balance_col_nz kernel_info_file info_file_of_col-sorted_damping col_number damping_row_number kernel_non-zero_number damping_non-zero_number col_ratio elem_ratio processor_number
load_balance_col_nz matrix_bycol_v4.mm2.info damp_v7I_D001_S001_col_data.bin 38093195 261330576 24384107533 818542016 2.125 1.45 640
there are two output files:
ker_col.index: kernel column and vector x partition
damp_row.index : damping row and vector y partition
In the "ker_col.index" file, each row stores the column range of the kernel matrix for one core. For example,
1 12729592 ==> column range of the kernel matrix (vector x) for the first core
12729593 16414717 ==> column range of the kernel matrix (vector x) for the second core
Note that the value starts from 1.
In the "damp_row.index" file, each row stores the row range of the damping matrix for one core. For example,
1 83530320 ==> row range of the damping matrix (vector y) for the first core
83530321 109251895 ==> row range of the damping matrix (vector y) for the second core
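
A partition file can be sanity-checked before a run with a sketch like the one below. It assumes that each line of the file holds just the two integers (first and last index) and that the "==>" text above is explanation rather than file content; parse_partition and is_contiguous are hypothetical names.

```python
def parse_partition(text):
    """Return (first, last) index ranges, one per core."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            ranges.append((int(fields[0]), int(fields[1])))
    return ranges


def is_contiguous(ranges, total=None):
    """Check that the ranges start at 1, do not overlap, and leave no gaps.

    If total is given, the last range must also end at that index.
    """
    expected = 1
    for first, last in ranges:
        if first != expected or last < first:
            return False
        expected = last + 1
    return total is None or expected - 1 == total


# The two ker_col.index rows from the example above.
ker_col = parse_partition("1 12729592\n12729593 16414717\n")
print(is_contiguous(ker_col))
```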