Import new data set from PubChem #158
Additional Details: The approach to importing the data set in the legacy version required a parser, some filters, and a post-processor. The parser (MoleculeSDFCombinedParser) imports all of the data from PubChem using an SDF file and generates two text files of molecule data: collection-molecules.txt and other-molecules.txt. collection-molecules.txt contains molecule data for the collection boxes, while other-molecules.txt holds data for the other molecules that can be built in the sim. See #153 (comment) for details on how to read these entries. At this point, we need to filter out molecules that we don't want to build (for either pedagogical or memory reasons); MoleculeKitFilterer and MoleculeDuplicateNameFilter handle this for us. The last step is MoleculePreprocessing, which generates the structural format for our molecules in a serialized format. See Action Items:
Here is a zip file of the BAM legacy source code with the relevant content described above: build-a-molecule-java.zip
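For anyone porting the tooling, the filter stage of the legacy pipeline described above (parse, filter, post-process) could be sketched roughly as follows. This is a minimal illustration only, not the legacy Java logic: the record shape (`name`, `cid`, `atomCount`), the function names, the size limit, and the use of `JSON.stringify` as a stand-in for the serialized structure format are all assumptions made for the example. The real entry format is the one described in #153 (comment).

```javascript
// Hypothetical record shape: { name: string, cid: number, atomCount: number }.
// This is NOT the real entry format used by the sim; it is assumed for illustration.

// Keep only the first molecule seen for each name (case-insensitive),
// loosely mirroring the role of a duplicate-name filter.
function filterDuplicateNames( molecules ) {
  const seen = new Set();
  return molecules.filter( molecule => {
    const key = molecule.name.toLowerCase();
    if ( seen.has( key ) ) {
      return false;
    }
    seen.add( key );
    return true;
  } );
}

// Drop molecules above an atom-count limit. This is a stand-in for whatever
// pedagogical or memory criteria the real filters apply.
function filterBySize( molecules, maxAtoms ) {
  return molecules.filter( molecule => molecule.atomCount <= maxAtoms );
}

// Run the stages in order: parsed records -> filters -> serialized strings.
// JSON.stringify is a placeholder for the real serialized structure format.
function runPipeline( parsedMolecules, maxAtoms ) {
  const filtered = filterBySize( filterDuplicateNames( parsedMolecules ), maxAtoms );
  return filtered.map( molecule => JSON.stringify( molecule ) );
}

// Example usage with placeholder records (the cid values are not real PubChem IDs).
const exampleOutput = runPipeline( [
  { name: 'Water', cid: 1, atomCount: 3 },
  { name: 'water', cid: 2, atomCount: 3 },
  { name: 'Large placeholder molecule', cid: 3, atomCount: 50 }
], 10 );
console.log( exampleOutput.length );
```

Keeping each filter as a separate pure function means filters can be added, removed, or reordered without touching the parser or the post-processor, which may make the port easier to document.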
This sim requires that all possible molecules and molecule structures are defined prior to being built. This data is stored in js/data and was derived from PubChem. Taking a look at js/data/, we see the current data set comprises:
- collectionMoleculesData.js: Shortlist of PubChem molecules used for collection boxes.
- otherMoleculesData.js: Responsible for all PubChem-related data, with entries that can be read as described in More examples of incorrect nomenclature #153 (comment).
- structuresData.js: Responsible for all possible structures. These structures may or may not have a corresponding entry in collectionMoleculesData.js.
The tools used to generate this data set have not yet been fully ported from Java and would require additional documentation. This includes the filtering that removes any molecules not desired for this sim. During the design meeting on 01/31/20, it was decided to postpone this work until after publication of this sim.
Assigning to @ariel-phet for prioritization and assignment.