Build data set #21
Hello! We want to use your model, but currently don't understand how to build the dataset to train the predictor and generator. Can you share the code to create the training data from the CIF files?
@qoffee @Yong-Q Sorry for the late reply. I've pushed the data preparation notebook file in this commit. I must admit that the process of constructing the dataset for the generator, predictor, and reinforcement learning is quite complex. To help with this, I've shared the Google Drive link containing the data we used. Since the generator is already pretrained with this dataset, you won't need to train it again. However, if you want to run reinforcement learning with different properties, you can do so after training a new predictor to predict the desired properties.
@hspark1212 Thanks. I looked at the notebook, but it is still not clear to me how to get the correct input in the format:
Hi @qoffee, unfortunately, converting a CIF file to the required input format presents several challenges: the structure in the CIF file needs to be decomposed into a topology and building blocks that are compatible with the PORMAKE building blocks, because these are represented as categorical variables. As outlined in the paper, the dataset was created through the following procedures:

Topology: sourced from the RCSR database.

Thanks,
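A minimal sketch of what "represented as categorical variables" can look like in practice, assuming each topology (or building block) is identified only by its name: build a vocabulary that maps every distinct name to an integer id, reserving the first ids for padding and unknown names. The names `pcu`, `dia`, and `sql` are just RCSR-style illustrations, and `build_vocab`/`encode` are hypothetical helpers, not functions from this repository or from PORMAKE.

```python
from typing import Dict, Iterable, Tuple

SPECIALS: Tuple[str, ...] = ("[PAD]", "[UNK]")


def build_vocab(names: Iterable[str], specials: Tuple[str, ...] = SPECIALS) -> Dict[str, int]:
    """Map every distinct name to an integer id; special tokens take the first ids."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for name in sorted(set(names)):  # sorted so the ids are reproducible across runs
        vocab.setdefault(name, len(vocab))
    return vocab


def encode(name: str, vocab: Dict[str, int]) -> int:
    """Return the categorical id of `name`, falling back to [UNK] for unseen names."""
    return vocab.get(name, vocab["[UNK]"])


# Illustrative RCSR-style topology names (duplicates are collapsed)
topo_vocab = build_vocab(["pcu", "dia", "pcu", "sql"])
print(topo_vocab)  # {'[PAD]': 0, '[UNK]': 1, 'dia': 2, 'pcu': 3, 'sql': 4}
print(encode("pcu", topo_vocab), encode("xyz", topo_vocab))  # 3 1
```

The same two helpers would work for building-block names; the important design point is that the vocabulary used at training time must be saved and reused unchanged at inference time, otherwise the ids no longer mean the same thing.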
The database may not be the most important part. What is needed is the process of going from a CIF file to its extracted features. The topology and node CIF files can be obtained through PORMAKE, but the other features, such as the conversion to SMILES format, matter more, as does the numerical encoding (digitization) of the topology/node.
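For the SMILES part, one common way to digitize a string is a regex tokenizer followed by a vocabulary lookup. The sketch below uses a pattern adapted from the one published by Schwaller et al. for reaction transformers; it is an assumption, not necessarily the tokenizer this project uses, and the linker SMILES (with `*` marking hypothetical connection points) is made up for illustration.

```python
import re
from typing import Dict, List

# Regex SMILES tokenizer, adapted from Schwaller et al.; bracket atoms stay one token.
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens, refusing to silently drop characters."""
    tokens = SMILES_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def encode_tokens(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to integer ids, using [UNK] for tokens missing from the vocabulary."""
    return [vocab.get(t, vocab["[UNK]"]) for t in tokens]


# Hypothetical linker: terephthalate with '*' as attachment points
linker = "*C(=O)c1ccc(cc1)C(=O)*"
tokens = tokenize_smiles(linker)

# Build a toy vocabulary in order of first appearance
vocab = {"[PAD]": 0, "[UNK]": 1}
for tok in tokens:
    vocab.setdefault(tok, len(vocab))
ids = encode_tokens(tokens, vocab)
```

Checking that the joined tokens reproduce the input is a cheap way to catch atoms the pattern does not cover (for example, an element symbol outside the list), which would otherwise vanish from the encoded sequence without any error.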
Can you share your approach to building the dataset? For example, how do you write the JSON from a CIF structure?
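One possible shape for such a JSON file, assuming one record per CIF keyed by its file name. Every field name (`cif_id`, `topo`, `nodes`, `linker_smiles`, `target`) and both helpers are hypothetical placeholders, not the schema the project's notebook produces, and the feature extraction itself (decomposing the CIF into a topology and building blocks) is assumed to have already happened.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def make_record(cif_path: str, topo: str, node_bbs: List[str], linker_smiles: str,
                target: float, topo_vocab: Dict[str, int],
                bb_vocab: Dict[str, int]) -> dict:
    """Assemble one training example; all field names are hypothetical placeholders."""
    return {
        "cif_id": Path(cif_path).stem,  # "data/mof_0001.cif" -> "mof_0001"
        "topo": topo_vocab.get(topo, topo_vocab["[UNK]"]),  # categorical topology id
        "nodes": [bb_vocab.get(bb, bb_vocab["[UNK]"]) for bb in node_bbs],  # building-block ids
        "linker_smiles": linker_smiles,  # kept as text; tokenized at load time
        "target": target,  # property value the predictor should learn
    }


def write_dataset(records: List[dict], out_path: Path) -> None:
    """Write all records as one JSON object keyed by cif_id."""
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump({r["cif_id"]: r for r in records}, f, indent=2)


# Hypothetical vocabularies and values, for illustration only
topo_vocab = {"[PAD]": 0, "[UNK]": 1, "pcu": 2}
bb_vocab = {"[PAD]": 0, "[UNK]": 1, "N1": 2}
rec = make_record("data/mof_0001.cif", "pcu", ["N1", "N999"],
                  "*C(=O)c1ccc(cc1)C(=O)*", 0.42, topo_vocab, bb_vocab)

# Round-trip through a temporary file to check the output is valid JSON
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "train.json"
    write_dataset([rec], out)
    loaded = json.loads(out.read_text(encoding="utf-8"))
```

Keeping the raw SMILES in the record and tokenizing at load time means the token vocabulary can change without rewriting the file; storing precomputed ids instead trades that flexibility for faster loading.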