Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Postgres performance #361

Merged
merged 10 commits into from
Aug 4, 2021
Merged

Postgres performance #361

merged 10 commits into from
Aug 4, 2021

Conversation

max-hassani
Copy link
Member

@max-hassani max-hassani commented Jun 28, 2021

The new class is intended to monitor important parameters of pyiron central database. Here, it is assumed that the database is a Postgresql database. The final data is presented as pandas dataframe.

the use case should be :

from pyiron_base import DatabaseStatistics
db_stat = DatabaseStatistics()
df = db_stat.performance()

The references for the queries can be found:
[1] https://russ.garrett.co.uk/2015/10/02/postgres-monitoring-cheatsheet/
[2] https://pgdash.io/blog/essential-postgres-monitoring-part2.html
[3] https://wiki.postgresql.org/wiki/Index_Maintenance

@max-hassani
Copy link
Member Author

Up to an extent, I converted the sql queries to sqlalechemy expressions, but some of them are really long and complicated to convert. But if the reviewers see it essential, I will try to convert all the queries to sqlalchemy expressions.

@max-hassani
Copy link
Member Author

Of course a unittest will be added. I have to first figure out how to create a postgres database via a github workflow.

@stale
Copy link

stale bot commented Jul 13, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jul 13, 2021
@stale stale bot closed this Jul 27, 2021
"""
stmt = select(func.count()).select_from(self._stat_view)
result = conn.execute(stmt)
self._performance_dict['total num. connection'] = result.fetchone()[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this code hard to read. From my perspective it would be easier to return the entry rather than writing to the dictionary. This also simplifies debugging later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean the whole code or this function in particular? There are multiple functions which modifies the self._performance_dict. Some of them, like here and here would be more complex to be returned an entry. But definitely can be done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would like each function to return a sub dictionary and then use dict.update() to merge them in the end. https://python-reference.readthedocs.io/en/latest/docs/dict/update.html This way we can debug the individual functions, look at the output they return. Following the zen of python: Explicit is better than implicit. https://www.python.org/dev/peps/pep-0020/

Copy link
Member Author

@max-hassani max-hassani Jul 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the links. Now each method returns a dictionary which is used at the end to update self._performance_dict.

@jan-janssen
Copy link
Member

@muh-hassani I added a few comments but apart from those I guess it would be great to merge this functionality soon.

@max-hassani max-hassani reopened this Jul 28, 2021
@stale stale bot removed the stale label Jul 28, 2021
@jan-janssen
Copy link
Member

I like the class still I can not stop wondering if it would not be easier for the user to have just one function?

from pyiron_base import get_database_statistics
get_database_statistics()

This function could be defined as:

def get_database_statistics():
    return DatabaseStatistics().performance()

@max-hassani
Copy link
Member Author

I like the class still I can not stop wondering if it would not be easier for the user to have just one function?

from pyiron_base import get_database_statistics
get_database_statistics()

This function could be defined as:

def get_database_statistics():
    return DatabaseStatistics().performance()

I agree, it makes the usage much easier. I have done it, the way you suggested.
I also made two methods as separate functions, as they did not need to be part of the class. You can find them here and here

@max-hassani
Copy link
Member Author

@jan-janssen, if you approve the current status of this PR, I can merge it.

Copy link
Member

@jan-janssen jan-janssen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jan-janssen, if you approve the current status of this PR, I can merge it.

Looks good to me.

@jan-janssen jan-janssen marked this pull request as ready for review August 3, 2021 13:44
@max-hassani max-hassani merged commit e3aa971 into master Aug 4, 2021
@delete-merged-branch delete-merged-branch bot deleted the postgres_performance branch August 4, 2021 09:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants