Releases: BenediktKleppmann/DS4DM-Backend
Final Release
This release includes some minor changes and bugfixes.
The changes affect the following program components:
- empty-string handling
- column name transformations
- parameter naming
- modifications around the table fusion
- increased configurability due to additional function parameters
- result-formatting
Unconstrained Table Extension and Correlation-based Table Extension
This release adds two new Backend API functions: the Unconstrained Table Extension and the Correlation-based Table Extension.
Background:
The DS4DM Backend API already offered functions for keyword-based Search and correspondence-based Search. These two search functions are used for the Constrained Table Extension.
In the Constrained Table Extension, the user provides a table and specifies the name of an additional column to be added to it. The keyword-based and correspondence-based Search functions search through a repository of many tables and find the right data for populating the specified extension column of the provided table.
Unconstrained Table Extension:
Whereas the Constrained Table Extension extends a provided table with exactly one additional column (see above), the Unconstrained Table Extension extends the provided table with as many columns as possible.
The restriction is that each new extension column must have a minimum density of 10%.
The two extensions also differ in where the fusion happens. For the Constrained Table Extension, the Backend API functions search for the correct data and the correspondences needed for fusing the data and populating the extension column; the actual fusion and population, however, happen in the front-end. This gives the user greater control over the fusion process. For the Unconstrained Table Extension, on the other hand, the fusion and population are done in the backend, as there are too many variables involved for a user to manage the fusion process effectively.
The details of the implementation are discussed on the website http://web.informatik.uni-mannheim.de/ds4dm/
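As an informal illustration of the 10% density restriction, here is a minimal Python sketch. It assumes that "density" means the share of non-empty cells in a column; the names density and keep_dense_columns are hypothetical and are not part of the Backend API.

```python
from typing import Dict, List, Optional

# A column is a list of cell values; None or "" marks a missing value.
Column = List[Optional[str]]

MIN_DENSITY = 0.10  # the 10% threshold stated in the release notes


def density(column: Column) -> float:
    """Share of cells that are neither None nor blank (assumed definition)."""
    if not column:
        return 0.0
    filled = sum(
        1 for cell in column if cell is not None and str(cell).strip() != ""
    )
    return filled / len(column)


def keep_dense_columns(
    candidates: Dict[str, Column], min_density: float = MIN_DENSITY
) -> Dict[str, Column]:
    """Keep only the candidate extension columns that reach min_density."""
    return {
        name: col for name, col in candidates.items() if density(col) >= min_density
    }


# Hypothetical candidate columns for a table with ten rows.
candidates = {
    "population": ["3.6M", "1.8M", "", None, "1.5M", "", "", "", "", "0.6M"],
    "mayor": [None] * 10,
}
dense = keep_dense_columns(candidates)
```

In this example "population" has four filled cells out of ten and is kept, while the entirely empty "mayor" column is dropped.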
Correlation-based Table Extension:
The Unconstrained Table Extension - mentioned above - will extend a table with over 100 columns if a big repository of tables is used. This can be overwhelming for the user. Therefore another Backend API function was developed: the Correlation-based Table Extension.
The Correlation-based Table Extension extends a table with columns that correlate with a user-specified Attribute (the 'correlation attribute'). In the Backend this is done by first extending the table with the Unconstrained Table Extension and then running Correlation-based Filtering on the extended Table.
The details of the implementation are discussed on the website http://web.informatik.uni-mannheim.de/ds4dm/
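The two-step idea (extend first, then filter by correlation) can be sketched roughly as follows. The release notes do not say which correlation measure or cut-off the Backend uses, so the Pearson coefficient, the 0.5 threshold, and the function names are all assumptions made for illustration, and only numeric columns are handled.

```python
import math
from typing import Dict, List, Optional

NumColumn = List[Optional[float]]


def pearson(xs: NumColumn, ys: NumColumn) -> Optional[float]:
    """Pearson correlation over the rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


def filter_by_correlation(
    extended: Dict[str, NumColumn],
    correlation_attribute: str,
    threshold: float = 0.5,  # assumed cut-off, not taken from the Backend
) -> Dict[str, NumColumn]:
    """Keep the correlation attribute plus the columns that correlate with it."""
    target = extended[correlation_attribute]
    kept = {correlation_attribute: target}
    for name, column in extended.items():
        if name == correlation_attribute:
            continue
        r = pearson(target, column)
        if r is not None and abs(r) >= threshold:
            kept[name] = column
    return kept


# Stand-in for a table that was already extended in the first step.
extended_table = {
    "gdp": [1.0, 2.0, 3.0, 4.0],
    "exports": [2.0, 4.0, 6.0, 8.0],
    "noise": [3.0, 1.0, 1.0, 3.0],
}
filtered = filter_by_correlation(extended_table, "gdp")
```

Here "exports" moves in step with "gdp" and is kept, while "noise" is uncorrelated with it and is filtered out.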
Attached binaries:
The following two .jar files are too big for GitHub, which is why they are not included in their proper folder location.
To run the program, copy them to the folder DS4DM-Webservice/DS4DM_webservice/lib/
Create-and-use-multiple-repositories Release
Previously the DataSearch functionality of the Backend only ran on a pre-processed repository of webtables.
With this release, the Backend has been extended to allow the user to select a repository of tables for the DataSearch to run on.
Additionally, all the functionality for creating and maintaining repositories of tables has been added. This includes the functions getRepositoryNames, uploadTable and bulkUploadTable.
If you try to upload a table (using uploadTable or bulkUploadTable) to a repository that doesn't exist yet, the repository structure is created automatically. Furthermore, when a table is uploaded, the pre-processing steps are executed automatically: the indexing and the finding of instance- and schema-correspondences with other tables in the repository.
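The create-on-first-upload behaviour described above can be pictured with a small in-memory sketch. This is not the Backend's implementation: the RepositoryStore class and its method signatures are hypothetical, and the "pre-processing" is reduced to a toy header index that reports tables sharing a column name, as a crude stand-in for real indexing and correspondence finding.

```python
from collections import defaultdict
from typing import Dict, List


class RepositoryStore:
    """Toy in-memory model of repositories of tables (illustration only)."""

    def __init__(self) -> None:
        # repository name -> table name -> column headers
        self._repos: Dict[str, Dict[str, List[str]]] = {}
        # repository name -> lowercased header -> names of tables using it
        self._index: Dict[str, Dict[str, List[str]]] = {}

    def get_repository_names(self) -> List[str]:
        return sorted(self._repos)

    def upload_table(
        self, repository: str, table_name: str, headers: List[str]
    ) -> List[str]:
        """Store a table and return the tables that share a header with it."""
        if repository not in self._repos:
            # The repository structure is created on the first upload.
            self._repos[repository] = {}
            self._index[repository] = defaultdict(list)
        self._repos[repository][table_name] = headers
        related = set()
        for header in headers:
            key = header.strip().lower()
            related.update(self._index[repository][key])
            self._index[repository][key].append(table_name)
        related.discard(table_name)
        return sorted(related)

    def bulk_upload_table(
        self, repository: str, tables: Dict[str, List[str]]
    ) -> None:
        """Upload several tables, one after another."""
        for name, headers in tables.items():
            self.upload_table(repository, name, headers)


store = RepositoryStore()
first = store.upload_table("cities", "t1", ["City", "Population"])
second = store.upload_table("cities", "t2", ["city", "Mayor"])
```

The first upload creates the "cities" repository and finds no related tables; the second upload reports "t1" because both tables have a "city" column.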
Release_Year2
initial commit