A brief introduction to my background is available on my blog, which showcases several cheminformatics, machine learning and data science projects in drug discovery built with various software toolkits. My latest project is an update of an earlier, longer post about small molecules in the ChEMBL database. It has been split into four shorter posts covering data storage (1), preprocessing (2), machine learning model building (3) and model evaluation (4).
Other projects I've worked on over the past year or so include:
- Cytochrome P450 and approved drugs - CYP3A4 and 2D6 inhibitors
- Tree series in machine learning on ChEMBL-derived data (decision tree 1, decision tree 2, decision tree 3, random forest, random forest classifier, boosted trees)
- Working with scaffolds in small molecules - Manipulating SMILES strings
- Molecular visualisation (Molviz) web application - Using Shiny for Python web application framework (interactive data table part)
- Shinylive app in Python - Embedding app in Quarto document (app embedded in web page) & using pyodide.http to import CSV files
- Small molecules in ChEMBL database: 1 - Parquet file in Polars dataframe library; 2 - Preprocessing data in Polars dataframe library; 3 - Building logistic regression model using scikit-learn; 4 - Evaluating logistic regression model in scikit-learn (older posts on cross-validation & hyper-parameter tuning and on re-training & re-evaluation with scikit-learn are pending future updates)
Open-source contributions: practical_cheminformatics_tutorials, chembl_downloader