This is a collection of DDL files that define the database schema used by the Data Management System (DMS) component of the Proteomics Research Information Storage and Management system (PRISM) at the Pacific Northwest National Laboratory (PNNL). PRISM incorporates a diverse set of analysis tools and supports a wide range of operations, leveraging a state machine that is accessible to independent, distributed computational nodes. The system has scaled well as data volume has increased since 2002, while remaining adaptable enough to incorporate new and improved data analysis tools for more effective proteomics research.
For more information, see the manuscript "PRISM: a data management system for high-throughput proteomics", published in the journal Proteomics in 2006 (PMID: 16470653).
The original database schema for DMS is in the Microsoft SQL Server format, and can be found in the DBSchema_DMS repo on GitHub.
Migration of the SQL Server tables, views, stored procedures and functions to PostgreSQL was completed in August 2024.
The following table describes the PostgreSQL schemas for DMS, along with the Source SQL Server Database for each schema.
Database | Schema | Description | Source SQL Server DB |
---|---|---|---|
dms | public | Main DMS tables | DMS5 |
dms | cap | Dataset capture and archive tasks | DMS_Capture |
dms | sw | Software analysis jobs | DMS_Pipeline |
dms | dpkg | Data packages | DMS_Data_Package |
dms | mc | Manager parameters | Manager_Control |
dms | ont | Ontology tables | Ontology_Lookup |
dms | pc | Protein Collections (FASTA files) | Protein_Sequences |
dms | logdms | DMS historic logs | DMSHistoricLog |
dms | logcap | Capture task historic logs | DMSHistoricLogCapture |
dms | logsw | Software analysis historic logs | DMSHistoricLogPipeline |
mts | public | MTS metadata DBs | MTS_Master |
mts | mtmain | MTS metadata DBs | MT_Main |
mts | prismifc | MTS metadata DBs | Prism_IFC |
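Because what were separate SQL Server databases become schemas within a single PostgreSQL database, objects are referenced with schema-qualified names. The following queries illustrate the idea; the table names shown here are hypothetical examples, not a guarantee of the actual schema:

```sql
-- Hypothetical schema-qualified queries against the dms database
-- (table names are examples only and may not match the actual schema)
SELECT COUNT(*) FROM public.t_dataset;  -- main DMS tables (formerly DMS5)
SELECT COUNT(*) FROM cap.t_tasks;       -- capture tasks (formerly DMS_Capture)
SELECT COUNT(*) FROM sw.t_jobs;         -- pipeline jobs (formerly DMS_Pipeline)
```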
- Use SSMS to script out tables, indexes, triggers, relationships, and views from SQL Server
- Use the DB Schema Export Tool (DB_Schema_Export_Tool) to pre-process the scripted DDL to rename columns and skip unwanted tables and views
  - In the DB-Schema-Export-Tool repo on GitHub
  - Three input files:
    - Schema DDL file from step 1
      - Defined with the `ExistingDDL=Database.sql` parameter
    - Text file defining the source and target table names, along with primary key columns
      - Also defines tables to skip
      - Defined with the `DataTables=Database_Tables.tsv` parameter in the ExportOptions parameter file
    - Text file defining the source and target column names, optionally defining columns to skip
      - Defined with the `ColumnMap=Database_Table_Columns.tsv` parameter in the ExportOptions parameter file
- Use the sqlserver2pgsql tool to convert the DDL to PostgreSQL syntax, including use of the `citext` data type
  - sqlserver2pgsql is a Perl script
    - Originally from https://github.com/dalibo/sqlserver2pgsql
    - Now in the sqlserver2pgsql repo on GitHub
  - It creates updated DDL files (see the example below)
  - It also creates a text file (named Database_ColumnNameMap.txt) listing the old and new table and column names
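To illustrate the kind of transformation involved, the following sketch shows a scripted SQL Server table alongside a PostgreSQL equivalent; the table and column names are hypothetical, and the actual output from sqlserver2pgsql will differ in its details:

```sql
-- SQL Server DDL as scripted by SSMS (hypothetical table)
CREATE TABLE [dbo].[T_Example_Experiments] (
    [Experiment_ID] [int] IDENTITY(1, 1) NOT NULL,
    [Experiment_Name] [varchar](128) NOT NULL,
    [Created] [datetime] NOT NULL,
    CONSTRAINT [PK_T_Example_Experiments] PRIMARY KEY CLUSTERED ([Experiment_ID])
);

-- A PostgreSQL equivalent of the same table; text columns use citext
-- (case-insensitive text; requires the citext extension) and the identity
-- column accepts explicit values so existing IDs can be loaded during migration
CREATE TABLE public.t_example_experiments (
    experiment_id int NOT NULL GENERATED BY DEFAULT AS IDENTITY,
    experiment_name citext NOT NULL,
    created timestamp NOT NULL,
    CONSTRAINT pk_t_example_experiments PRIMARY KEY (experiment_id)
);
```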
- Use the PgSql View Creator Helper program (PgSqlViewCreatorHelper) to update the DDL for views (see the sketch below)
  - See the PgSQL-View-Creator-Helper repo on GitHub
  - This program also creates a merged ColumnNameMap file (Database_ColumnNameMap_merged.txt) that combines two map files from the previous steps:
    - The ColumnNameMap.txt file created by sqlserver2pgsql
    - The ColumnMap file provided to the DB Schema Export Tool, listing source and target column names
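Continuing the hypothetical example, a view might be rewritten along these lines (a sketch only; the actual views and name mappings are more involved):

```sql
-- SQL Server view definition (hypothetical)
CREATE VIEW [dbo].[V_Example_Experiments] AS
SELECT [Experiment_ID] AS [ID],
       [Experiment_Name] AS [Experiment]
FROM [dbo].[T_Example_Experiments];

-- PostgreSQL version, with table and column names updated using the
-- name mappings tracked in the merged ColumnNameMap file
CREATE OR REPLACE VIEW public.v_example_experiments AS
SELECT experiment_id AS id,
       experiment_name AS experiment
FROM public.t_example_experiments;
```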
- Create tables and views in the PostgreSQL database
  - Tables and views can be manually updated, if necessary
- Script out the schema for tables, views, etc. in the PostgreSQL database
  - Again, use the DB Schema Export Tool, but this time the source database is the PostgreSQL server
  - Store the scripted objects in a git repo
    - See the DBSchema_DMS repo
- Transfer table data
  - Use the DB Schema Export Tool to export data from all of the tables in SQL Server
    - Three input files:
      - Text file defining the source and target table names, along with primary key columns; also defines tables to skip
      - Text file defining the source and target column names, optionally defining columns to skip
      - Text file with table names, defining the order in which data should be exported from tables
        - The export order also dictates the order in which tables will be listed in the shell script created for importing data into the target database
    - The program will create one file per table, with the data for that table scripted as `INSERT ... ON CONFLICT DO UPDATE` statements (see the example below)
    - It also creates a shell script for loading the data for each table
  - Import the data into the PostgreSQL database by running the shell script
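For reference, the scripted data files contain upsert-style statements along the lines of the following sketch (hypothetical names and values; the tool's actual formatting differs):

```sql
-- Hypothetical scripted data for one row of the example table above
INSERT INTO public.t_example_experiments (experiment_id, experiment_name, created)
VALUES (1001, 'QC_Shew_24_01', '2024-05-15 10:30:00')
ON CONFLICT (experiment_id)
DO UPDATE SET
    experiment_name = EXCLUDED.experiment_name,
    created = EXCLUDED.created;
```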
- Append new table data
  - New data added to tables in SQL Server can be added to the PostgreSQL database
  - Use the DB Schema Export Tool, with a parameter file that has TableDataDateFilter defined
    - This points to a tab-delimited text file with columns SourceTableName, DateColumnName, and MinimumDate (see the sketch below)
  - The program will create one file per table
    - It also creates a shell script for loading the data for each table
  - Import the data into the PostgreSQL database by running the shell script
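Conceptually, the date filter limits the exported rows to those at or after the minimum date, similar to the following sketch (the table, column, and date shown are hypothetical; the actual filtering is handled by the DB Schema Export Tool):

```sql
-- Hypothetical TableDataDateFilter entry:
--   SourceTableName = T_Log_Entries, DateColumnName = Entered, MinimumDate = 2024-01-01
-- Only rows that would match a filter like this are exported:
SELECT *
FROM dbo.T_Log_Entries
WHERE Entered >= '2024-01-01';
```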
- Convert stored procedures
  - Use SQL Server Management Studio (SSMS) to script out stored procedures and user-defined functions from a database
  - Convert the scripted DDL using the SQLServer Stored Procedure Converter (SQLServer_Stored_Procedure_Converter.exe), as sketched below
    - See the SQLServer-Stored-Procedure-Converter repo on GitHub
    - Uses the merged ColumnNameMap file (Database_ColumnNameMap_merged.txt) created by the PgSql View Creator Helper to rename tables and columns referenced in the stored procedures
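As a minimal sketch of the kind of conversion involved (hypothetical object names; the converter's actual output differs and typically needs manual cleanup), compare a T-SQL procedure with a PL/pgSQL counterpart:

```sql
-- SQL Server stored procedure (T-SQL), with hypothetical names
CREATE PROCEDURE [dbo].[GetExperimentCount]
    @CampaignName varchar(64)
AS
BEGIN
    SELECT COUNT(*) AS Experiment_Count
    FROM dbo.T_Example_Experiments
    WHERE Experiment_Name LIKE @CampaignName + '%';
END;

-- A hand-written PostgreSQL (PL/pgSQL) counterpart, using the renamed
-- (snake_case) table and column identifiers
CREATE OR REPLACE FUNCTION public.get_experiment_count(_campaignname text)
RETURNS integer
LANGUAGE plpgsql
AS
$$
DECLARE
    _count integer;
BEGIN
    SELECT COUNT(*)
    INTO _count
    FROM public.t_example_experiments
    WHERE experiment_name LIKE _campaignname || '%';

    RETURN _count;
END
$$;
```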
Licensed under the Apache License, Version 2.0; you may not use this program except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0