Si maker redesign #275

SiebeLeDe · 2024-06-29T20:21:34Z

Redesigns the SI maker to include an XYZ format, which is only useful for single-point, geometry optimization, and transition state search calculations.

Includes a protocol interface with a single "format" method so that other formatters can be added without changing the design.
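
As a rough illustration of the protocol idea, a structural interface with a single `format` method could look like the sketch below. The name `WordFormatter` appears in this pull request, but the signature of `format`, the `Atom` record, `PlainXYZFormatter`, and `render` are assumptions made for illustration and are not the actual tcutility implementation.

```python
from typing import Protocol, Sequence, Tuple, runtime_checkable

Atom = Tuple[str, float, float, float]  # hypothetical (symbol, x, y, z) record


@runtime_checkable
class WordFormatter(Protocol):
    """Any object with a matching ``format`` method satisfies this protocol."""

    def format(self, title: str, atoms: Sequence[Atom]) -> str: ...


class PlainXYZFormatter:
    """Hypothetical formatter: a title line followed by one line per atom."""

    def format(self, title: str, atoms: Sequence[Atom]) -> str:
        lines = [title]
        for symbol, x, y, z in atoms:
            lines.append(f"{symbol} {x:.4f} {y:.4f} {z:.4f}")
        return "\n".join(lines)


def render(formatter: WordFormatter, title: str, atoms: Sequence[Atom]) -> str:
    # The caller depends only on the protocol, not on a concrete class.
    return formatter.format(title, atoms)


water = [("O", 0.0, 0.0, 0.1173), ("H", 0.0, 0.7572, -0.4692)]
text = render(PlainXYZFormatter(), "water", water)
```

Because the protocol is structural, `PlainXYZFormatter` does not need to inherit from `WordFormatter`; a new formatter only has to provide a compatible `format` method.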

Furthermore, this pull request includes various type fixes (including failing imports and log iterator typing in the log module), and documentation has been added for multi_keys.

src/tcutility/analysis/report/formatters/xyz.py

YHordijk · 2024-07-02T15:42:40Z

src/tcutility/analysis/report/report.py

+
+
+class SI:
+    def __init__(self, path: Union[str, pl.Path], append_mode: bool = False, font: str = "Arial", format: WordFormatter = XYZFormatter()) -> None:


The format argument here now defaults to the XYZFormatter. The issue I see with this is that an SI can contain more than just the xyz-coordinates. For example, we might want to add tables, figures, text, etc. It might be better for each type of section to have its own formatter. For example, the SI class can have xyz_formatter, table_formatter, figure_formatter, etc. arguments. This way we can easily swap between different styles of tables and xyz-coordinate sections.

Aha, I see what you mean, and I agree that we should take into account different ways of writing the SI, such as a table or figure formatter. Do you have ideas about what the arguments would be when writing a table or figure?

I ask because one option is to introduce methods such as "add_table" and "add_figure", each having a default formatter similar to the xyz formatter.
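
One way to picture the two suggestions combined (a separate formatter argument per section type, plus `add_*` methods with defaults) is the sketch below. Every class name and signature in it is hypothetical and not taken from tcutility, and it collects plain strings instead of writing a Word document.

```python
from typing import List, Optional, Sequence


class DefaultXYZFormatter:
    """Hypothetical default: title followed by the given coordinate lines."""

    def format(self, title: str, lines: Sequence[str]) -> str:
        return "\n".join([title, *lines])


class DefaultTableFormatter:
    """Hypothetical default: separated rows under a numbered caption."""

    separator = "\t"

    def format(self, number: int, caption: str, rows: Sequence[Sequence[str]]) -> str:
        body = [self.separator.join(row) for row in rows]
        return "\n".join([f"Table S{number}. {caption}", *body])


class CommaTableFormatter(DefaultTableFormatter):
    """Hypothetical alternative style that only changes the separator."""

    separator = ","


class SISketch:
    def __init__(self, xyz_formatter: Optional[DefaultXYZFormatter] = None,
                 table_formatter: Optional[DefaultTableFormatter] = None) -> None:
        # Defaults are created per instance instead of in the signature,
        # so two SISketch objects never share one formatter instance.
        self.xyz_formatter = xyz_formatter if xyz_formatter is not None else DefaultXYZFormatter()
        self.table_formatter = table_formatter if table_formatter is not None else DefaultTableFormatter()
        self.sections: List[str] = []
        self.table_number = 0

    def add_xyz(self, title: str, lines: Sequence[str]) -> None:
        self.sections.append(self.xyz_formatter.format(title, lines))

    def add_table(self, caption: str, rows: Sequence[Sequence[str]]) -> None:
        self.table_number += 1
        self.sections.append(self.table_formatter.format(self.table_number, caption, rows))


si = SISketch()
si.add_xyz("water", ["O 0.0 0.0 0.1"])
si.add_table("Energies", [["XC", "E"], ["OLYP", "-1.0"]])

# Swapping the table style does not touch the xyz section.
si_comma = SISketch(table_formatter=CommaTableFormatter())
si_comma.add_table("Energies", [["XC", "E"], ["OLYP", "-1.0"]])
```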

The way I do it now is that the SI class has the add_table method which returns a DocxTable object (which we could rename DocxTableFormatter). The add_table method requires no further arguments, but we could add, for example, the number of the table.
This class is then responsible for writing the tables and has a few methods that should make it easy to add data to and format the table. The idea would then be to add a bunch of these that have different styles, for example maybe some of them have no lines on top, or use a different font, etc.
For figures, it would be a similar story. We could make a DocxFigureFormatter class that could have different styles. It could have methods such as add_picture(self, path: str, index: str), where index is, for example, one of the letters you commonly see in multi-panel figures (A, B, C, ...).
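
To make the `add_picture(path, index)` idea a bit more concrete, here is a minimal sketch of how panel labels might be tracked. The class `FigureFormatterSketch`, its `caption` method, and the file names are hypothetical; no images are read and nothing is written to a document.

```python
import string
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FigureFormatterSketch:
    """Hypothetical stand-in for the proposed figure formatter."""

    figure_number: int
    panels: List[Tuple[str, str]] = field(default_factory=list)

    def add_picture(self, path: str, index: Optional[str] = None) -> None:
        # Fall back to A, B, C, ... when no panel label is given
        # (this simple fallback only covers the first 26 panels).
        if index is None:
            index = string.ascii_uppercase[len(self.panels)]
        self.panels.append((index, path))

    def caption(self, text: str) -> str:
        labels = ", ".join(index for index, _ in self.panels)
        return f"Figure S{self.figure_number} ({labels}). {text}"


fig = FigureFormatterSketch(figure_number=1)
fig.add_picture("reactant.png")
fig.add_picture("product.png")
fig.add_picture("ts.png", index="TS")
caption = fig.caption("Stationary points.")
```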

SI class

import os
from typing import Union

import docx
import htmldocx
from docx.shared import Pt
from tcutility import results

# DocxTable is defined in the next section; helpers such as add_bookmark,
# ensure_list and parse_text are not shown in this comment.


class SI:
    def __init__(self, file='test.docx', overwrite=False):
        self.file = file
        if not os.path.exists(file) or overwrite:
            self.doc = docx.Document()
        else:
            self.doc = docx.Document(file)

        self.doc.styles['Normal'].font.name = 'Times New Roman'
        self.doc.styles['Normal'].font.size = Pt(12)
        self.doc.styles['Normal'].paragraph_format.space_after = 0
        self.html_parser = htmldocx.HtmlToDocx()
        self.figure_number = 1
        self.table_number = 0

    def __enter__(self):
        return self

    def __exit__(self, *args):
        for section in self.doc.sections:
            # 1.9 cm in EMU, the length unit python-docx uses (360,000 EMU per cm)
            section.left_margin = int(1.9 * 360_000)
            section.right_margin = int(1.9 * 360_000)
        self.doc.save(self.file)

    def add_table(self):
        self.table_number += 1
        return DocxTable(self.doc, table_number=self.table_number)

    def add_pictures(self, paths, caption=None, width=None, height=None):
        p = self.doc.add_paragraph()
        r = p.add_run()
        # width and height are given in cm and converted to EMU
        width = int(width * 360_000) if width else None
        height = int(height * 360_000) if height else None
        for path in ensure_list(paths):
            r.add_picture(path, width=width, height=height)

        # caption defaults to None, so fall back to an empty string
        self.html_parser.add_html_to_document(f'Figure S{self.figure_number}. ' + (caption or ''), self.doc)
        for run in self.doc.paragraphs[-1].runs:
            if run.text.startswith('Figure'):
                add_bookmark(run, f'Figure S{self.figure_number}')
                break
        self.figure_number += 1

    def add_page_break(self):
        self.doc.add_page_break()

    def write_paragraph(self, paragraph, text):
        for txt, settings in parse_text(text):
            run = paragraph.add_run(txt)
            for key, value in settings.items():
                setattr(run.font, key, value)
        return paragraph

    def add_xyz(self, obj: Union[str, dict], title: str):
        """
        Add the coordinates and information about a calculation to the SI.
        It will add the electronic bond energy, Gibb's free energy, enthalpy and imaginary mode, as well as the coordinates of the molecule.

        Args:
            obj: a string specifying a calculation directory or a `TCutility.results.Result` object from a calculation.
            title: title to be written before the coordinates and information.
        """
        if isinstance(obj, str):
            obj = results.read(obj)

        # title is always bold
        s = f"{title} "

        # add electronic energy. E should be bold and italics. Unit will be kcal mol^-1
        E = str(round(obj.properties.energy.bond, 1)).replace("-", "—")
        s += f"E = {E} kcal mol—1 "

        # add Gibbs and enthalpy if we have them
        if obj.properties.energy.gibbs:
            G = str(round(obj.properties.energy.gibbs, 1)).replace("-", "—")
            s += f"G = {G} kcal mol—1 "
        if obj.properties.energy.enthalpy:
            H = str(round(obj.properties.energy.enthalpy, 1)).replace("-", "—")
            s += f"H = {H} kcal mol—1 "

        # add imaginary frequency if we have one
        if obj.properties.vibrations.number_of_imaginary_modes == 1:
            freq = abs(round(obj.properties.vibrations.frequencies[0]))
            s += f"νimag = {freq}i cm—1"

        # remove trailing line breaks
        s = s.removesuffix(" ")

        # coords should be written in mono-type font with 8 decimals and 4 spaces between each coordinate
        s += "<pre>"
        for atom in obj.molecule.output:
            s += f"{atom.symbol:2} {atom.coords[0]: .8f} {atom.coords[1]: .8f} {atom.coords[2]: .8f} "
        s += "</pre>"

        self.html_parser.add_html_to_document(s, self.doc)
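
The coordinate line in `add_xyz` relies on the format spec `{: .8f}`: the space flag reserves one column for the sign, so positive and negative values line up as long as they have the same number of integer digits. A standalone sketch of the same spec follows; the function name `format_atom_line` is hypothetical, and the four-space separator is an assumption taken from the code comment.

```python
from typing import Sequence


def format_atom_line(symbol: str, coords: Sequence[float]) -> str:
    """Format one atom as 'symbol x y z' with 8 decimals per coordinate."""
    x, y, z = coords
    sep = " " * 4  # assumed separator, following the comment in add_xyz
    # "{: .8f}" prints a leading space for positive numbers and "-" for negatives
    return f"{symbol:2}{sep}{x: .8f}{sep}{y: .8f}{sep}{z: .8f}"


lines = [
    format_atom_line("C", (0.0, 1.25, -1.25)),
    format_atom_line("Cl", (-1.5, 0.0, 3.0)),
]
```

Both lines come out with the same width, which is what makes the block readable in a monospaced font.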

DocxTable class

import os

import docx
import htmldocx
from docx.enum.table import WD_ALIGN_VERTICAL, WD_TABLE_ALIGNMENT
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.shared import Pt

# helpers such as add_bookmark, set_cell_border, color_cell and
# set_repeat_table_header are not shown in this comment.


class DocxTable:
    def __init__(self, file='test.docx', table_number='x', font_size=Pt(10.5)):
        self.file = file
        # when an open document is passed in, the owner of that document saves it
        self.dont_save = False
        if isinstance(file, docx.document.Document):
            self.dont_save = True
            self.doc = file
        else:
            if not os.path.exists(file):
                self.doc = docx.Document()
            else:
                self.doc = docx.Document(file)

        self.caption = ''
        self.table_number = table_number
        self.font_size = font_size
        self.columns = []
        self.column_options = []
        self.rows = []
        self.mergers = []
        self.html_parser = htmldocx.HtmlToDocx()

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.write()
        for run in self.doc.paragraphs[-1].runs:
            if run.text.startswith('Table'):
                add_bookmark(run, f'Table S{self.table_number}')
                break

    def add_column(self, name, **kwargs):
        self.columns.append(['single', name, kwargs])
        self.column_options.append(kwargs)

    def add_column_group(self, group_name, column_names, **kwargs):
        self.columns.append(['grouped', group_name, column_names])
        [self.column_options.append(kwargs) for _ in column_names]

    def add_row(self, data):
        self.rows.append(['data', data])

    def add_header_row(self, name):
        self.rows.append(['header', name])

    def add_empty_row(self):
        self.rows.append(['empty'])

    def merge_cells(self, x, y):
        if isinstance(x, int):
            x = (x, x)
        if isinstance(y, int):
            y = (y, y)
        self.mergers.append([x, y])

    def _correct_size(self):
        # every single column and every group is followed by a spacing column, except the last one
        num_cols = 2 * len([col for col in self.columns if col[0] == 'single']) + sum([len(col[2]) + 1 for col in self.columns if col[0] == 'grouped']) - 1
        # two header rows plus the user-defined rows
        num_rows = 2 + len(self.rows)
        for _ in range(num_cols - len(self.tab.columns)):
            self.tab.add_column(int(360_000 * .1))
        for _ in range(num_rows - len(self.tab.rows)):
            self.tab.add_row()

    def write(self):
        self.html_parser.add_html_to_document(f'Table S{self.table_number}. ' + self.caption, self.doc)
        self.doc.paragraphs[-1].alignment = WD_ALIGN_PARAGRAPH.JUSTIFY
        self.tab = self.doc.add_table(1, 1)
        self._correct_size()
        for x, y in self.mergers:
            self.tab.cell(x[0], y[0]).merge(self.tab.cell(x[1], y[1]))

        num_cols = 2 * len([col for col in self.columns if col[0] == 'single']) + sum([len(col[2]) + 1 for col in self.columns if col[0] == 'grouped']) - 1
        num_rows = 2 + len(self.rows)
        # create a table
        self.tab.alignment = WD_TABLE_ALIGNMENT.CENTER

        # write the column headers
        spacing_columns = []
        col_idx = 0
        for col in self.columns:
            if col[0] == 'single':
                self.write_cell(1, col_idx, col[1], bold=True, font_size=self.font_size)
                spacing_columns.append(col_idx + 1)
                col_idx += 2
            if col[0] == 'grouped':
                cell = self.write_cell(0, (col_idx, col_idx + len(col[2]) - 1), col[1], bold=True, bottom={'sz': 12, 'val': 'single', 'color': '#000000'}, font_size=self.font_size)
                for i, val in enumerate(col[2]):
                    self.write_cell(1, col_idx + i, val, bold=True, font_size=self.font_size)
                spacing_columns.append(col_idx + len(col[2]))
                col_idx += len(col[2]) + 1

        # set the lines for the top and bottom header rows
        for i in range(num_cols):
            set_cell_border(self.tab.cell(0, i), top={'sz': 12, 'val': 'single', 'color': '#000000'})
            set_cell_border(self.tab.cell(1, i), bottom={'sz': 12, 'val': 'single', 'color': '#000000'})

        for j, row in enumerate(self.rows):
            if row[0] == 'data':
                for i in range(num_cols):
                    if i in spacing_columns:
                        continue
                    num_spacing_past = len([k for k in spacing_columns if (k - 1) < i])
                    self.write_cell(j + 2, i, row[1][i - num_spacing_past], font_size=self.font_size, **self.column_options[i - num_spacing_past])
            if row[0] == 'header':
                cell = self.write_cell(j + 2, (0, num_cols-1), row[1], bold=True, top={'sz': 12, 'val': 'single', 'color': '#000000'}, bottom={'sz': 12, 'val': 'single', 'color': '#000000'}, bkgr_color='F2F2F2', font_size=self.font_size)
            if row[0] == 'empty':
                for i in range(num_cols):
                    self.write_cell(j + 2, i, '')

        set_repeat_table_header(self.tab.rows[0])
        set_repeat_table_header(self.tab.rows[1])
        if not self.dont_save:
            self.doc.save(self.file)

    def write_cell(self, row, col, text, alignment='center', vert_alignment='center', bold=None, italic=None, bkgr_color=None, font_size=None, **kwargs):
        if isinstance(row, int) and isinstance(col, int):
            cell = self.tab.cell(row, col)
        else:
            if isinstance(row, int):
                row = (row, row)
            if isinstance(col, int):
                col = (col, col)
            cell = self.tab.cell(row[0], col[0]).merge(self.tab.cell(row[1], col[1]))

        # do not overwrite a cell that already has content
        if len(cell.paragraphs[0].runs) > 0:
            return

        alignment = {
            'center': WD_ALIGN_PARAGRAPH.CENTER,
            'left': WD_ALIGN_PARAGRAPH.LEFT,
            'right': WD_ALIGN_PARAGRAPH.RIGHT,
        }[alignment]
        vert_alignment = {
            'center': WD_ALIGN_VERTICAL.CENTER,
            'top': WD_ALIGN_VERTICAL.TOP,
            'bottom': WD_ALIGN_VERTICAL.BOTTOM,
        }[vert_alignment]

        self.html_parser.add_html_to_cell(text.replace('-', '–'), cell)
        cell.paragraphs[0].alignment = alignment
        cell.vertical_alignment = vert_alignment
        for run in cell.paragraphs[0].runs:
            if bold:
                run.bold = bold
            if italic:
                run.italic = italic
            if font_size:
                run.font.size = font_size
        set_cell_border(cell, **kwargs)
        if bkgr_color is not None:
            color_cell(cell, bkgr_color)
        return cell
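
The column count used in `_correct_size` and `write` can be checked in isolation. The sketch below copies that expression into a hypothetical standalone function, `count_columns`, and applies it to the column layout from the example usage further down: one single column and two groups of four.

```python
def count_columns(columns) -> int:
    """Number of physical table columns, including the spacing columns."""
    singles = sum(1 for col in columns if col[0] == "single")
    # each group contributes its sub-columns plus one spacing column
    grouped = sum(len(col[2]) + 1 for col in columns if col[0] == "grouped")
    # each single column also brings one spacing column; the trailing spacer is dropped
    return 2 * singles + grouped - 1


radicals = ["CH3*", "NH2*", "OH*", "SH*"]
columns = [
    ["single", "XC", {}],
    ["grouped", "C2H2", radicals],
    ["grouped", "C2H4", radicals],
]
num_cols = count_columns(columns)
```

For this layout the result is 11: nine data columns (1 + 4 + 4) and two spacing columns between the three blocks.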

Example usage

import random
from tcutility import formula

xcs = ['OLYP', 'CAM-B3LYP', 'BMK', 'M06-2X', 'MN12-SX']
radicals = ['CH3*', 'NH2*', 'OH*', 'SH*']
substrates = ['C2H2', 'C2H4']

# create a new SI and add a new table to it
with SI('example.docx', overwrite=True) as main:
    with main.add_table() as table:
        # set the table caption
        table.caption = 'This is an example Table create with TCutility.report. Calculated at the QRO-CCSD(T)/CBS+ level of theory. Energies given in kcal mol-1.'

        # set up the column headers
        table.add_column('XC')
        table.add_column_group(formula.molecule('C2H2'), [formula.molecule(rad) for rad in radicals])
        table.add_column_group(formula.molecule('C2H4'), [formula.molecule(rad) for rad in radicals])

        # make up some random data
        for xc in xcs:
            # we can add header rows
            if xc == 'M06-2X':
                table.add_header_row('Below is M06-2X')

            row = [xc]
            for substrate in substrates:
                for radical in radicals:
                    row.append(f'{2*random.random() - 1: .2f}')

            # and normal data rows
            table.add_row(row)

            # and also empty rows
            if xc == 'M06-2X':
                table.add_empty_row()

src/tcutility/analysis/report/report.py

YHordijk · 2024-07-03T09:09:23Z

Great additions! I think it will soon be in a ready state

SiebeLeDe and others added 19 commits May 26, 2024 20:30

Added format write as dependency injection. Also fixed the gibbs free energy and enthalpy order (now enthalpy is first)

4a6b9c6

Added platformdirs >4.2.1

4989eba

Simplified writing XYZ files

759f29d

Added an example for report

852544d

Renamed report to _report to avoid naming conflicts and restructured formatters

7983899

Shifting character widths, heights etc. to the data module. Also reorganized the data module to be more structured and easier to read.

904638e

Addd a protocol class for abstract Word Formatters and an implementation for a XYZ writer. This class will be used to format the words in the input text. The class should have "write" method

a06069b

Simplified the SI writer for coordinates

70e39c7

added a checker if the calculation is suitable for a xyz writer

939bbd7

Does not work well. It prints the whole results object. I want to print only the value of the property key and vibrations

39cb9d1

Improved xyz formatting writer

67af4cc

Better formatting imaginary frequencies and titles

3bb0084

added more multikeys example to the docs

2e373c5

fixed reference to tcutility typing module vs python typing module

9985a4c

Various type fixes

17536f4

Improved the handling of title and make examples up to date

ce4a6b0

Better folder management and fixed import issues

cf42a43

line too long fix

ead3197

Updated workflow files

575805d

SiebeLeDe requested a review from YHordijk July 2, 2024 14:40