Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Format change for XDENS #160

Open
bast opened this issue Mar 28, 2018 · 17 comments
Open

Format change for XDENS #160

bast opened this issue Mar 28, 2018 · 17 comments

Comments

@bast
Copy link
Member

bast commented Mar 28, 2018

I would like to make it possible to read in density matrices in a "compressed" format where we skip zeros and very small numbers, in other words some sparse format. This would make it possible to put benchmarks into this repository and not explode disk space.

@bast bast changed the title Implement reading of sparse matrices Format change for XDENS Apr 3, 2018
@bast
Copy link
Member Author

bast commented Apr 3, 2018

I have renamed the issue subject since a more general solution to my problem is to change the format of XDENS. My problem is that we have gigantic files but most numbers in there are insignificant for tests and benchmarks.

My suggestion:

  • we change the XDENS format but don't break calculations with old/present format
  • we document and version the format
  • first line of XDENS would contain the format version
  • the file would then list matrix dimensions
  • followed by matrix row, matrix column, and the value

The advantage of this format is that one could leave out insignificant values and only provide the significant numbers. With this we could put benchmarks into the repo.

Suggestions/opinions?

@bast
Copy link
Member Author

bast commented Apr 3, 2018

The code would check if first line contains a version. If not, default to old code. If yes, engage new read-in which allows leaving out all the zeros.

@bast
Copy link
Member Author

bast commented Apr 3, 2018

The other bonus would be that you actually see which matrix row and column a number corresponds to. Now you just see a series of numbers and have no idea to what they belong to unless you look into the code.

@heikef
Copy link
Contributor

heikef commented Apr 5, 2018

We have to keep in mind that the structure for open-shell cases in XDENS might vary. Did you consider this? What sort of format do you have in mind? If we can switch it on/off and use it only for benchmarking purpose eg. sharing the benchmark it might be indeed useful. Going completely for the new format option is something that needs a bit more time from my point of view. All scripts/interfaces to other QM codes use the current format.

@heikef
Copy link
Contributor

heikef commented Apr 5, 2018

Your idea sounds reasonable, I do not understand yet what new format you have in mind and how and if it would affect applications.

@heikef
Copy link
Contributor

heikef commented Apr 5, 2018

We have an XDENS related pull request. This is probably related to this issue.
#162
It is about reading in very large XDENS files. So for this type of situation the modifications suggested by @bast could indeed be very useful.

@bast
Copy link
Member Author

bast commented Apr 5, 2018

Well in this issue I want to replace very large XDENS files by small files with essentially the same result. Please note that most numbers in XDENS files contribute nothing to the final result. The proposal that I suggest would not break existing formats. The code would check whether this is old or new format and be able to read both.

@heikef
Copy link
Contributor

heikef commented Apr 6, 2018

I like your idea. If this works properly we would not need to merge the pull request #162. Again: did you consider open-shell as well? Or would this just be an convenient closed-shell option? I know in principle it should not matter but as far as I remember the ordering on XDENS changes for open-shell cases so I have to ask this question to be on the save side.
Otherwise, assuming that the old stuff, which is most likely needed for interfacing to other programs is not destroyed we should go for the option you suggest. Again: what is different in your format? To me it sound it is "just" a different way to read in things throwing immediately small numbers away.
In this case we have to keep in mind that the slicing procedure partly depends on very small numbers so perhaps you want to remove / ignore stuff that is needed.

@bast
Copy link
Member Author

bast commented Apr 6, 2018

In the new format the order of elements will not matter.

@heikef
Copy link
Contributor

heikef commented Apr 6, 2018

What about open-shell?

@heikef
Copy link
Contributor

heikef commented Apr 6, 2018

The order of basis functions matters.

@bast
Copy link
Member Author

bast commented Apr 6, 2018

Not for my suggested format :-)

@heikef
Copy link
Contributor

heikef commented Apr 8, 2018

As indicated above it is not clear to me what your new format exactly is. Time for a call next week. :)

@bast
Copy link
Member Author

bast commented Apr 8, 2018

OK now XDENS is just a sequence of numbers

number
number
...
number

Problems:

  • you need all of them but most will be tiny
  • you need to know the order

What I suggest is:

format version on first line, followed by (row, column, number)
1 2 number
3 4 number
137 137 number

In this case we give 3 elements (1, 2), (3, 4), and (137, 137).

Advantages:

  • you don't have to list all (all elements not given default to zero)
  • they can come in any order you like

@heikef
Copy link
Contributor

heikef commented Apr 8, 2018

We have currently an XDENS file. Do you want to read and modify it in the alternative format? If yes you need to change probably as well the code inside gimic to deal with the new xdens file. What about the slicing procedure for the integration? The precision there probably will depend on the defined threshhold for the numbers to throw away.

@bast
Copy link
Member Author

bast commented Apr 8, 2018

Yes, this would require modifying the code reading it. The computations will of course depend on the cutoff but one does not have to use a cutoff. My suggestion only enables to use a cutoff.

@heikef
Copy link
Contributor

heikef commented Apr 9, 2018

That is fine. We can go for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants