Si maker redesign #275

Open
wants to merge 25 commits into
base: main

SiebeLeDe
Contributor

Redesigns the SI maker to include an xyz format, which is only useful for single-point, geometry-optimization, and transition-state-search calculations.

Includes a protocol interface with solely the "format" method so that other formatters can be made without having to change the design.
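As a rough illustration of the idea (not the code in this pull request), a protocol with a single `format` method could look like the sketch below. The `format` signature, the dictionary layout of `obj`, and the `TitleOnlyFormatter` class are assumptions made for the example; only the names `WordFormatter` and `XYZFormatter` appear in the diff.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class WordFormatter(Protocol):
    """Anything with a `format` method counts as a formatter (assumed signature)."""

    def format(self, obj: dict) -> str: ...


class XYZFormatter:
    """Writes a title followed by one line per atom (hypothetical data layout)."""

    def format(self, obj: dict) -> str:
        lines = [obj["title"]]
        for symbol, (x, y, z) in obj["atoms"]:
            lines.append(f"{symbol:2}    {x: .8f}    {y: .8f}    {z: .8f}")
        return "\n".join(lines)


class TitleOnlyFormatter:
    """A second, hypothetical formatter; no inheritance is needed."""

    def format(self, obj: dict) -> str:
        return obj["title"]


def render(obj: dict, formatter: WordFormatter) -> str:
    # The caller depends only on the protocol, not on a concrete class
    return formatter.format(obj)
```

Because the protocol is structural, a new formatter only has to provide `format`; the calling code does not change.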

Furthermore, this pull request includes various type fixes (failing imports and the log-iterator typing in the log module), and documentation has been added to the multi_keys.

@SiebeLeDe SiebeLeDe requested a review from YHordijk July 2, 2024 14:40


class SI:
    def __init__(self, path: Union[str, pl.Path], append_mode: bool = False, font: str = "Arial", format: WordFormatter = XYZFormatter()) -> None:
Contributor

The format argument here now defaults to the XYZFormatter. The issue I see is that an SI can contain more than just xyz coordinates: we might also want to add tables, figures, text, etc. It might be better for each type of section to have its own formatter. For example, the SI class could take xyz_formatter, table_formatter, figure_formatter, etc. arguments. That way we can easily swap between different styles of tables and xyz-coordinate sections.
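A minimal sketch of that suggestion, assuming each section type gets its own formatter argument. All class names here (`SketchSI`, `PlainXYZFormatter`, `TabTableFormatter`, `CsvTableFormatter`, `PathFigureFormatter`) and the plain-string output are hypothetical stand-ins; the real class writes to a Word document.

```python
from typing import List, Optional, Protocol, Sequence


class SectionFormatter(Protocol):
    def format(self, data: Sequence) -> str: ...


class PlainXYZFormatter:
    def format(self, data: Sequence) -> str:
        return "\n".join(f"{s:2} {x: .4f} {y: .4f} {z: .4f}" for s, x, y, z in data)


class TabTableFormatter:
    def format(self, data: Sequence) -> str:
        return "\n".join("\t".join(map(str, row)) for row in data)


class CsvTableFormatter:
    def format(self, data: Sequence) -> str:
        return "\n".join(",".join(map(str, row)) for row in data)


class PathFigureFormatter:
    def format(self, data: Sequence) -> str:
        return " ".join(f"[{path}]" for path in data)


class SketchSI:
    def __init__(
        self,
        xyz_formatter: Optional[SectionFormatter] = None,
        table_formatter: Optional[SectionFormatter] = None,
        figure_formatter: Optional[SectionFormatter] = None,
    ) -> None:
        # Defaults are created here so each SketchSI gets its own instances
        self.xyz_formatter = xyz_formatter or PlainXYZFormatter()
        self.table_formatter = table_formatter or TabTableFormatter()
        self.figure_formatter = figure_formatter or PathFigureFormatter()
        self.sections: List[str] = []

    def add_xyz(self, atoms: Sequence) -> None:
        self.sections.append(self.xyz_formatter.format(atoms))

    def add_table(self, rows: Sequence) -> None:
        self.sections.append(self.table_formatter.format(rows))

    def add_figure(self, paths: Sequence) -> None:
        self.sections.append(self.figure_formatter.format(paths))
```

Swapping the table style is then a matter of passing a different `table_formatter`, while the xyz and figure sections keep their defaults.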

Contributor Author

Aha, I see what you mean, and I agree that we should take into account different ways of writing the SI, such as a table or figure formatter. Do you have ideas about what the arguments would be when writing a table or figure?

I ask because one option is to introduce methods such as "add_table" and "add_figure", each with a default formatter similar to the xyz formatter.

Contributor

The way I do it now is that the SI class has the add_table method which returns a DocxTable object (which we could rename DocxTableFormatter). The add_table method requires no further arguments, but we could add, for example, the number of the table.
This class is then responsible for writing the tables and has a few methods that should make it easy to add data to and format the table. The idea would then be to add a bunch of these that have different styles, for example maybe some of them have no lines on top, or use a different font, etc.
For figures, it would be a similar story. We could make a DocxFigureFormatter class that could have different styles. It could have methods such as add_picture(self, path: str, index: str), where index is, for example, one of the letters you commonly see in multi-panel figures (A, B, C, ...).
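To make the figure idea concrete, here is a small sketch of how such a class might collect panels. It assumes nothing about python-docx: the class name `FigureFormatterSketch`, the automatic lettering, and the string output are all illustrative, and only the `add_picture(path, index)` shape comes from the comment above.

```python
from dataclasses import dataclass, field
from string import ascii_uppercase
from typing import List, Optional, Tuple


@dataclass
class FigureFormatterSketch:
    figure_number: int
    caption: str = ""
    panels: List[Tuple[str, str]] = field(default_factory=list)

    def add_picture(self, path: str, index: Optional[str] = None) -> None:
        # Without an explicit index, fall back to the next letter (A, B, C, ...);
        # this simple scheme supports at most 26 automatically lettered panels
        if index is None:
            index = ascii_uppercase[len(self.panels)]
        self.panels.append((index, path))

    def format(self) -> str:
        # Caption in the same "Figure S<n>." style used by SI.add_pictures
        labels = ", ".join(f"({index}) {path}" for index, path in self.panels)
        return f"<b>Figure S{self.figure_number}.</b> {self.caption} [{labels}]"
```

A different style would then be a subclass or sibling class that overrides `format`, in the same way different table styles are envisioned.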

SI class
class SI:
    def __init__(self, file='test.docx', overwrite=False):
        self.file = file
        if not os.path.exists(file) or overwrite:
            self.doc = docx.Document()
        else:
            self.doc = docx.Document(file)

        self.doc.styles['Normal'].font.name = 'Times New Roman'
        self.doc.styles['Normal'].font.size = Pt(12)
        self.doc.styles['Normal'].paragraph_format.space_after = 0
        self.html_parser = htmldocx.HtmlToDocx()

        self.figure_number = 1
        self.table_number = 0

    def __enter__(self):
        return self

    def __exit__(self, *args):
        for section in self.doc.sections:
            # 1.9 cm expressed in EMU (360,000 EMU per cm)
            section.left_margin = int(1.9 * 360_000)
            section.right_margin = int(1.9 * 360_000)

        self.doc.save(self.file)

    def add_table(self):
        self.table_number += 1
        return DocxTable(self.doc, table_number=self.table_number)

    def add_pictures(self, paths, caption=None, width=None, height=None):
        p = self.doc.add_paragraph()
        r = p.add_run()

        width = int(width * 360_000) if width else None
        height = int(height * 360_000) if height else None
        for path in ensure_list(paths):
            r.add_picture(path, width=width, height=height)

        # caption defaults to None, so fall back to an empty string
        self.html_parser.add_html_to_document(f'<b>Figure S{self.figure_number}.</b> ' + (caption or ''), self.doc)
        for run in self.doc.paragraphs[-1].runs:
            if run.text.startswith('Figure'):
                add_bookmark(run, f'Figure S{self.figure_number}')
                break
        self.figure_number += 1

    def add_page_break(self):
        self.doc.add_page_break()

    def write_paragraph(self, paragraph, text):
        for txt, settings in parse_text(text):
            run = paragraph.add_run(txt)
            for key, value in settings.items():
                setattr(run.font, key, value)
        return paragraph

    def add_xyz(self, obj: 'str | dict', title: str):
        """
        Add the coordinates and information about a calculation to the SI.
        It will add the electronic bond energy, Gibbs free energy, enthalpy and imaginary mode, as well as the coordinates of the molecule.

        Args:
            obj: a string specifying a calculation directory or a `TCutility.results.Result` object from a calculation.
            title: title to be written before the coordinates and information.
        """
        if isinstance(obj, str):
            obj = results.read(obj)

        # title is always bold
        s = f"<b>{title}</b><br>"

        # add electronic energy. E should be bold and italics. Unit will be kcal mol^-1
        E = str(round(obj.properties.energy.bond, 1)).replace("-", "—")
        s += f"<b><i>E</i></b> = {E} kcal mol<sup>—1</sup><br>"

        # add Gibbs and enthalpy if we have them
        if obj.properties.energy.gibbs:
            G = str(round(obj.properties.energy.gibbs, 1)).replace("-", "—")
            s += f"<b><i>G</i></b> = {G} kcal mol<sup>—1</sup><br>"
        if obj.properties.energy.enthalpy:
            H = str(round(obj.properties.energy.enthalpy, 1)).replace("-", "—")
            s += f"<b><i>H</i></b> = {H} kcal mol<sup>—1</sup><br>"

        # add imaginary frequency if we have one
        if obj.properties.vibrations.number_of_imaginary_modes == 1:
            freq = abs(round(obj.properties.vibrations.frequencies[0]))
            s += f"<b><i>ν<sub>imag</sub></i></b> = {freq}<i>i</i> cm<sup>—1</sup>"

        # remove trailing line breaks
        s = s.removesuffix("<br>")

        # coords should be written in mono-type font with 8 decimals and 4 spaces between each coordinate
        s += "<pre>"
        for atom in obj.molecule.output:
            s += f"{atom.symbol:2}    {atom.coords[0]: .8f}    {atom.coords[1]: .8f}    {atom.coords[2]: .8f}<br>"
        s += "</pre>"
        self.html_parser.add_html_to_document(s, self.doc)
DocxTable class
class DocxTable:
    def __init__(self, file='test.docx', table_number='x', font_size=Pt(10.5)):
        self.file = file
        self.dont_save = False
        if isinstance(file, docx.document.Document):
            self.dont_save = True
            self.doc = file
        else:
            if not os.path.exists(file):
                self.doc = docx.Document()
            else:
                self.doc = docx.Document(file)

        self.caption = ''
        self.table_number = table_number
        self.font_size = font_size
        self.columns = []
        self.column_options = []
        self.rows = []
        self.mergers = []
        self.html_parser = htmldocx.HtmlToDocx()

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.write()
        for run in self.doc.paragraphs[-1].runs:
            if run.text.startswith('Table'):
                add_bookmark(run, f'Table S{self.table_number}')
                break


    def add_column(self, name, **kwargs):
        self.columns.append(['single', name, kwargs])
        self.column_options.append(kwargs)

    def add_column_group(self, group_name, column_names, **kwargs):
        self.columns.append(['grouped', group_name, column_names])
        self.column_options.extend(kwargs for _ in column_names)

    def add_row(self, data):
        self.rows.append(['data', data])

    def add_header_row(self, name):
        self.rows.append(['header', name])

    def add_empty_row(self):
        self.rows.append(['empty'])

    def merge_cells(self, x, y):
        if isinstance(x, int):
            x = (x, x)
        if isinstance(y, int):
            y = (y, y)

        self.mergers.append([x, y])

    def _correct_size(self):
        num_cols = 2 * len([col for col in self.columns if col[0] == 'single']) + sum([len(col[2]) + 1 for col in self.columns if col[0] == 'grouped']) - 1
        num_rows = 2 + len(self.rows)

        for _ in range(num_cols - len(self.tab.columns)):
            self.tab.add_column(int(360_000 * .1))
        for _ in range(num_rows - len(self.tab.rows)):
            self.tab.add_row()

    def write(self):
        self.html_parser.add_html_to_document(f'<b>Table S{self.table_number}.</b> ' + self.caption, self.doc)
        self.doc.paragraphs[-1].alignment = WD_ALIGN_PARAGRAPH.JUSTIFY

        self.tab = self.doc.add_table(1, 1)
        self._correct_size()

        for x, y in self.mergers:
            self.tab.cell(x[0], y[0]).merge(self.tab.cell(x[1], y[1]))

        num_cols = 2 * len([col for col in self.columns if col[0] == 'single']) + sum([len(col[2]) + 1 for col in self.columns if col[0] == 'grouped']) - 1
        num_rows = 2 + len(self.rows)
        # create a table
        self.tab.alignment = WD_TABLE_ALIGNMENT.CENTER

        # write the column headers
        spacing_columns = []
        col_idx = 0
        for col in self.columns:
            if col[0] == 'single':
                self.write_cell(1, col_idx, col[1], bold=True, font_size=self.font_size)
                spacing_columns.append(col_idx + 1)
                col_idx += 2

            if col[0] == 'grouped':
                cell = self.write_cell(0, (col_idx, col_idx + len(col[2]) - 1), col[1], 
                                  bold=True,
                                  bottom={'sz': 12, 'val': 'single', 'color': '#000000'},
                                  font_size=self.font_size)

                for i, val in enumerate(col[2]):
                    self.write_cell(1, col_idx + i, val, bold=True, font_size=self.font_size)

                spacing_columns.append(col_idx + len(col[2]))
                col_idx += len(col[2]) + 1

        # set the lines for the top and bottom header rows
        for i in range(num_cols):
            set_cell_border(self.tab.cell(0, i), top={'sz': 12, 'val': 'single', 'color': '#000000'})
            set_cell_border(self.tab.cell(1, i), bottom={'sz': 12, 'val': 'single', 'color': '#000000'})

        for j, row in enumerate(self.rows):
            if row[0] == 'data':
                for i in range(num_cols):
                    if i in spacing_columns:
                        continue
                    num_spacing_past = len([k for k in spacing_columns if (k - 1) < i])
                    self.write_cell(j + 2, i, row[1][i - num_spacing_past], font_size=self.font_size, **self.column_options[i - num_spacing_past])

            if row[0] == 'header':
                cell = self.write_cell(j + 2, (0, num_cols-1), row[1],
                                  bold=True,
                                  top={'sz': 12, 'val': 'single', 'color': '#000000'},
                                  bottom={'sz': 12, 'val': 'single', 'color': '#000000'},
                                  bkgr_color='F2F2F2',
                                  font_size=self.font_size)

            if row[0] == 'empty':
                for i in range(num_cols):
                    self.write_cell(j + 2, i, '')

        set_repeat_table_header(self.tab.rows[0])
        set_repeat_table_header(self.tab.rows[1])

        if not self.dont_save:
            self.doc.save(self.file)

    def write_cell(self, row, col, text, alignment='center', vert_alignment='center', bold=None, italic=None, bkgr_color=None, font_size=None, **kwargs):
        if isinstance(row, int) and isinstance(col, int):
            cell = self.tab.cell(row, col)
        else:
            if isinstance(row, int):
                row = (row, row)
            if isinstance(col, int):
                col = (col, col)
            cell = self.tab.cell(row[0], col[0]).merge(self.tab.cell(row[1], col[1]))

        if len(cell.paragraphs[0].runs) > 0:
            return

        alignment = {
            'center': WD_ALIGN_PARAGRAPH.CENTER,
            'left': WD_ALIGN_PARAGRAPH.LEFT,
            'right': WD_ALIGN_PARAGRAPH.RIGHT,
        }[alignment]

        vert_alignment = {
            'center': WD_ALIGN_VERTICAL.CENTER,
            'top': WD_ALIGN_VERTICAL.TOP,
            'bottom': WD_ALIGN_VERTICAL.BOTTOM,
        }[vert_alignment]

        self.html_parser.add_html_to_cell(text.replace('-', '–'), cell)


        cell.paragraphs[0].alignment = alignment
        cell.vertical_alignment = vert_alignment
        for run in cell.paragraphs[0].runs:
            if bold:
                run.bold = bold
            if italic:
                run.italic = italic
            if font_size:
                run.font.size = font_size

        set_cell_border(cell, **kwargs)
        if bkgr_color is not None:
            color_cell(cell, bkgr_color)

        return cell
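The column arithmetic in `_correct_size` and `write` is easier to follow in isolation. The sketch below reproduces it without python-docx, under the reading that every single column or column group is followed by one narrow spacer column, except the last. The function name `table_layout` and the tuple-based input are illustrative only.

```python
from typing import List, Sequence, Tuple


def table_layout(columns: Sequence[tuple]) -> Tuple[int, List[int], List[int]]:
    """Return (num_cols, data_columns, spacer_columns) for a column spec.

    Each entry is ('single', name) or ('grouped', group_name, [sub_names]).
    """
    data_columns: List[int] = []
    spacers: List[int] = []
    col_idx = 0
    for col in columns:
        width = 1 if col[0] == 'single' else len(col[2])
        data_columns.extend(range(col_idx, col_idx + width))
        spacers.append(col_idx + width)  # spacer directly after this block
        col_idx += width + 1
    # The spacer after the final block is not part of the table
    num_cols = col_idx - 1
    return num_cols, data_columns, [s for s in spacers if s < num_cols]


# One single column plus two groups of four sub-columns
spec = [
    ('single', 'XC'),
    ('grouped', 'C2H2', ['CH3*', 'NH2*', 'OH*', 'SH*']),
    ('grouped', 'C2H4', ['CH3*', 'NH2*', 'OH*', 'SH*']),
]
num_cols, data_columns, spacer_columns = table_layout(spec)
print(num_cols, data_columns, spacer_columns)
```

For this layout the total is 2·1 + (4 + 1) + (4 + 1) − 1 = 11 columns, with the nine data values of each row landing in columns 0, 2 to 5, and 7 to 10.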
Example usage
import random
from tcutility import formula

# SI is the class defined above

xcs = ['OLYP', 'CAM-B3LYP', 'BMK', 'M06-2X', 'MN12-SX']
radicals = ['CH3*', 'NH2*', 'OH*', 'SH*']
substrates = ['C2H2', 'C2H4']

# create a new SI and add a new table to it
with SI('example.docx', overwrite=True) as main:
    with main.add_table() as table:
        # set the table caption
        table.caption = 'This is an example Table created with TCutility.report. Calculated at the QRO-CCSD(T)/CBS+ level of theory. Energies given in kcal mol<sup>-1</sup>.'

        # set up the column headers
        table.add_column('XC')
        table.add_column_group(formula.molecule('C2H2'), [formula.molecule(rad) for rad in radicals])
        table.add_column_group(formula.molecule('C2H4'), [formula.molecule(rad) for rad in radicals])

        # make up some random data
        for xc in xcs:
            # we can add header rows
            if xc == 'M06-2X':
                table.add_header_row('Below is M06-2X')

            row = [xc]
            for substrate in substrates:
                for radical in radicals:
                    row.append(f'{2*random.random() - 1: .2f}')

            # and normal data rows
            table.add_row(row)

            # and also empty rows
            if xc == 'M06-2X':
                table.add_empty_row()
[Image: Screenshot 2024-07-17 at 12 01 50]

Contributor

YHordijk commented Jul 3, 2024

Great additions! I think it will soon be in a ready state.
