Updating the database with the new feature names as they become available. #19
Comments
While importing the latest USGS feature names extracted on November 7, 2024, I noticed that the approval status column, in addition to […] My recommendation is to retain these entries in the database and treat them as historically active, since they were used at some point and may have historical relevance. I suggest adding a column to the feature name table to indicate that these features are currently dropped, and passing this information to Solr so that users are informed of their dropped status.
5- Import entities marked |
As new feature names become available, users will extract USGS terms and execute a pipeline command to check for any names that need to be added to the database. The CSV file used in this process must contain the following six columns: Feature_ID, Clean_Feature_Name, Target, Feature_Type, Approval_Date, and Approval_Status.
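A minimal sketch of how the input file might be validated before any processing, assuming a plain CSV with a header row. Only the six column names come from the description above; the function name `read_feature_rows`, the error handling, and the sample row are hypothetical and not taken from the pipeline's code.

```python
import csv
import io

# The six columns named in this issue; everything else here is illustrative.
REQUIRED_COLUMNS = [
    "Feature_ID",
    "Clean_Feature_Name",
    "Target",
    "Feature_Type",
    "Approval_Date",
    "Approval_Status",
]


def read_feature_rows(csv_file):
    """Read rows from an open CSV file, failing early if a column is missing."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"CSV is missing required columns: {', '.join(missing)}")
    return list(reader)


# Usage with an in-memory file standing in for the USGS extract (made-up row).
sample = io.StringIO(
    "Feature_ID,Clean_Feature_Name,Target,Feature_Type,Approval_Date,Approval_Status\n"
    "1,Example,Moon,Crater,2024-01-01,Approved\n"
)
rows = read_feature_rows(sample)
```

Failing on a missing column up front keeps a malformed extract from being partially imported.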
1- The pipeline reads this file and considers only rows where Approval_Status is set to "Approved". It then compares each Feature_ID with existing entities in the database to identify which rows need to be added.
2- When adding a feature name, the pipeline first checks for any new targets associated with these names and inserts those targets if they don’t already exist. Next, it verifies if the feature type is present for the specified target, adding it if it’s missing. The pipeline also updates a separate table containing unique feature names without links to targets or feature types. If a feature name has not been used before, it is inserted into this table; if it has been used for another celestial body, it will not be added again.
3- Additional checks are in place for multi-word and ambiguous feature names. Specifically, the pipeline manages cases where one word in a multi-word name may represent another feature name (e.g., “C Herschel,” “C Herschel C”). It also identifies cases where a feature name is used for multiple celestial bodies (e.g., “Herschel Crater” on Mars, the Moon, and Mimas). The pipeline will update the database for these names as needed. However, if a name has other contextual uses (e.g., “Herschel” also refers to an asteroid), the pipeline will generate a log message. In this case, a power user will need to review and identify any additional contexts. The pipeline includes a list of new feature names for the power user to check. Once any new context is identified, the developer should be informed so they can add it to the database.
4- Update repo with the latest USGS feature names.
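Steps 1 to 3 above could be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline's implementation: the database is replaced by in-memory sets, the function names (`select_new_rows`, `plan_inserts`, `names_on_multiple_targets`) are hypothetical, and the sample rows, IDs, and the "Dropped" status value are made up. Checks for multi-word names and for other contexts (such as asteroids) are left out.

```python
from collections import defaultdict


def select_new_rows(rows, existing_ids):
    """Step 1: keep approved rows whose Feature_ID is not yet in the database."""
    return [
        row for row in rows
        if row["Approval_Status"].strip() == "Approved"
        and row["Feature_ID"] not in existing_ids
    ]


def plan_inserts(new_rows, known_targets, known_types, known_names):
    """Step 2: list the targets, (target, type) pairs, and unique names to add.

    known_types is a set of (target, feature_type) tuples.
    """
    new_targets, new_types, new_names = [], [], []
    for row in new_rows:
        target = row["Target"]
        pair = (target, row["Feature_Type"])
        name = row["Clean_Feature_Name"]
        if target not in known_targets and target not in new_targets:
            new_targets.append(target)
        if pair not in known_types and pair not in new_types:
            new_types.append(pair)
        # A name already used on another body is not added a second time.
        if name not in known_names and name not in new_names:
            new_names.append(name)
    return new_targets, new_types, new_names


def names_on_multiple_targets(rows):
    """Step 3 (in part): find names shared by more than one celestial body."""
    targets_by_name = defaultdict(set)
    for row in rows:
        targets_by_name[row["Clean_Feature_Name"]].add(row["Target"])
    return {
        name: sorted(targets)
        for name, targets in targets_by_name.items()
        if len(targets) > 1
    }


# Made-up rows; only the column names come from the issue text.
rows = [
    {"Feature_ID": "1", "Clean_Feature_Name": "Herschel", "Target": "Moon",
     "Feature_Type": "Crater", "Approval_Status": "Approved"},
    {"Feature_ID": "2", "Clean_Feature_Name": "Herschel", "Target": "Mars",
     "Feature_Type": "Crater", "Approval_Status": "Approved"},
    {"Feature_ID": "3", "Clean_Feature_Name": "Example", "Target": "Mimas",
     "Feature_Type": "Crater", "Approval_Status": "Dropped"},
]

new_rows = select_new_rows(rows, existing_ids={"1"})
new_targets, new_types, new_names = plan_inserts(
    new_rows,
    known_targets={"Moon"},
    known_types={("Moon", "Crater")},
    known_names={"Herschel"},
)
shared = names_on_multiple_targets(rows)
```

Here only the row with Feature_ID "2" is selected: row "1" already exists and row "3" is not approved. "Mars" and its crater type are queued for insertion, while "Herschel" is not added again to the unique-name table because it is already known.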