Format change for XDENS #160

bast · 2018-03-28T09:59:03Z

I would like to make it possible to read in density matrices in a "compressed" format where we skip zeros and very small numbers, in other words some sparse format. This would make it possible to put benchmarks into this repository and not explode disk space.

bast · 2018-04-03T16:26:51Z

I have renamed the issue subject since a more general solution to my problem is to change the format of XDENS. My problem is that we have gigantic files but most numbers in there are insignificant for tests and benchmarks.

My suggestion:

we change the XDENS format but don't break calculations with old/present format
we document and version the format
first line of XDENS would contain the format version
the file would then list matrix dimensions
followed by matrix row, matrix column, and the value

The advantage of this format is that one could leave out insignificant values and only provide the significant numbers. With this we could put benchmarks into the repo.

Suggestions/opinions?

bast · 2018-04-03T16:29:02Z

The code would check if first line contains a version. If not, default to old code. If yes, engage new read-in which allows leaving out all the zeros.

bast · 2018-04-03T16:39:08Z

The other bonus would be that you actually see which matrix row and column a number corresponds to. Now you just see a series of numbers and have no idea to what they belong to unless you look into the code.

heikef · 2018-04-05T16:38:41Z

We have to keep in mind that the structure for open-shell cases in XDENS might vary. Did you consider this? What sort of format do you have in mind? If we can switch it on/off and use it only for benchmarking purpose eg. sharing the benchmark it might be indeed useful. Going completely for the new format option is something that needs a bit more time from my point of view. All scripts/interfaces to other QM codes use the current format.

heikef · 2018-04-05T16:40:37Z

Your idea sounds reasonable, I do not understand yet what new format you have in mind and how and if it would affect applications.

heikef · 2018-04-05T16:43:11Z

We have an XDENS related pull request. This is probably related to this issue.
#162
It is about reading in very large XDENS files. So for this type of situation the modifications suggested by @bast could indeed be very useful.

bast · 2018-04-05T21:27:39Z

Well in this issue I want to replace very large XDENS files by small files with essentially the same result. Please note that most numbers in XDENS files contribute nothing to the final result. The proposal that I suggest would not break existing formats. The code would check whether this is old or new format and be able to read both.

heikef · 2018-04-06T14:58:52Z

I like your idea. If this works properly we would not need to merge the pull request #162. Again: did you consider open-shell as well? Or would this just be an convenient closed-shell option? I know in principle it should not matter but as far as I remember the ordering on XDENS changes for open-shell cases so I have to ask this question to be on the save side.
Otherwise, assuming that the old stuff, which is most likely needed for interfacing to other programs is not destroyed we should go for the option you suggest. Again: what is different in your format? To me it sound it is "just" a different way to read in things throwing immediately small numbers away.
In this case we have to keep in mind that the slicing procedure partly depends on very small numbers so perhaps you want to remove / ignore stuff that is needed.

bast · 2018-04-06T15:03:58Z

In the new format the order of elements will not matter.

heikef · 2018-04-06T15:15:36Z

What about open-shell?

heikef · 2018-04-06T15:16:06Z

The order of basis functions matters.

bast · 2018-04-06T16:54:16Z

Not for my suggested format :-)

heikef · 2018-04-08T16:41:25Z

As indicated above it is not clear to me what your new format exactly is. Time for a call next week. :)

bast · 2018-04-08T16:48:44Z

OK now XDENS is just a sequence of numbers

number
number
...
number

Problems:

you need all of them but most will be tiny
you need to know the order

What I suggest is:

format version on first line, followed by (row, column, number)
1 2 number
3 4 number
137 137 number

In this case we give 3 elements (1, 2), (3, 4), and (137, 137).

Advantages:

you don't have to list all (all elements not given default to zero)
they can come in any order you like

heikef · 2018-04-08T17:04:43Z

We have currently an XDENS file. Do you want to read and modify it in the alternative format? If yes you need to change probably as well the code inside gimic to deal with the new xdens file. What about the slicing procedure for the integration? The precision there probably will depend on the defined threshhold for the numbers to throw away.

bast · 2018-04-08T17:10:09Z

Yes, this would require modifying the code reading it. The computations will of course depend on the cutoff but one does not have to use a cutoff. My suggestion only enables to use a cutoff.

heikef · 2018-04-09T15:31:39Z

That is fine. We can go for it.

bast changed the title ~~Implement reading of sparse matrices~~ Format change for XDENS Apr 3, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Format change for XDENS #160

Format change for XDENS #160

bast commented Mar 28, 2018

bast commented Apr 3, 2018

bast commented Apr 3, 2018

bast commented Apr 3, 2018

heikef commented Apr 5, 2018

heikef commented Apr 5, 2018

heikef commented Apr 5, 2018 •

edited

Loading

bast commented Apr 5, 2018

heikef commented Apr 6, 2018

bast commented Apr 6, 2018

heikef commented Apr 6, 2018

heikef commented Apr 6, 2018

bast commented Apr 6, 2018

heikef commented Apr 8, 2018

bast commented Apr 8, 2018

heikef commented Apr 8, 2018

bast commented Apr 8, 2018

heikef commented Apr 9, 2018

Format change for XDENS #160

Format change for XDENS #160

Comments

bast commented Mar 28, 2018

bast commented Apr 3, 2018

bast commented Apr 3, 2018

bast commented Apr 3, 2018

heikef commented Apr 5, 2018

heikef commented Apr 5, 2018

heikef commented Apr 5, 2018 • edited Loading

bast commented Apr 5, 2018

heikef commented Apr 6, 2018

bast commented Apr 6, 2018

heikef commented Apr 6, 2018

heikef commented Apr 6, 2018

bast commented Apr 6, 2018

heikef commented Apr 8, 2018

bast commented Apr 8, 2018

heikef commented Apr 8, 2018

bast commented Apr 8, 2018

heikef commented Apr 9, 2018

heikef commented Apr 5, 2018 •

edited

Loading