Add pandas revised #228

Merged: 3 commits, Apr 4, 2017

JoostJM (Collaborator) commented Mar 28, 2017

Supersedes #225

Pandas is added in this PR as an optional feature; results are converted to a pandas Series only just before they are returned to the user. To illustrate the use of pandas in PyRadiomics, a new batch example is added, and the helloRadiomics script and notebook are updated with the additional call to enable pandas (commented out by default).

This PR also slightly changes how general info is stored: data types are now retained. This mainly affects how the result is stored when using the JSON format, where the settings will also be stored as a JSON-formatted dictionary (i.e. a nested dictionary) instead of as a plain string.
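
As a rough illustration of the difference in the JSON output, the sketch below serializes a made-up result twice: once with the settings flattened to a string, and once with the settings kept as a dictionary. The key `general_info_Settings` and the setting names are placeholders chosen for illustration, not necessarily the exact keys PyRadiomics writes.

```python
import json

# Hypothetical settings; names and values are illustrative only.
settings = {"binWidth": 25, "resampledPixelSpacing": None, "verbose": True}

# Sketch of the old behaviour: settings flattened to a plain string.
as_string = json.dumps({"general_info_Settings": str(settings)})

# Sketch of the new behaviour: settings keep their types and nest in the JSON.
as_nested = json.dumps({"general_info_Settings": settings})

restored_string = json.loads(as_string)["general_info_Settings"]
restored_nested = json.loads(as_nested)["general_info_Settings"]

print(type(restored_string).__name__)   # str
print(type(restored_nested).__name__)   # dict
print(restored_nested["binWidth"] + 5)  # 30, the integer type survived
```

With the nested form, a consumer can read individual settings directly after `json.loads`, without parsing a string representation first.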

Finally, this PR changes the format of the input CSV for the batch command line script and the batch example scripts. This CSV should now contain a header line and may have a variable number of columns (all of which are copied to the output). The header must include 'Image' and 'Mask', identifying the columns that hold the file locations of the image and mask, respectively.

The replacement was originally added to prevent column misalignment when output is formatted as CSV (with a `,` delimiter). However, this replacement is not necessary, because the csv writers handle such values by quoting them. Furthermore, dropping it allows these values to be stored in their original type, which enables storage as JSON-formatted strings when using the JSON format.
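
The quoting behaviour mentioned above can be checked with Python's standard `csv` module. This is a minimal sketch with a made-up row whose second value contains commas; it is not taken from the PyRadiomics scripts.

```python
import csv
import io

# Hypothetical row: the second value contains the delimiter itself.
row = ["case_1", "{'binWidth': 25, 'verbose': True}"]

buffer = io.StringIO()
# The default quoting mode (QUOTE_MINIMAL) quotes fields that contain ','.
csv.writer(buffer).writerow(row)
line = buffer.getvalue()

# Reading the line back yields the original two columns, so nothing shifts.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed == row)  # True
```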
JoostJM added 2 commits March 29, 2017 17:25
Affects batchprocessing.py and commandlinebatch.py (used from the command line as `pyradiomicsbatch`). The CSV file that defines all combinations of image and mask should now start with a header line, which should contain at least the columns 'Image' and 'Mask' (specifying the location of the image and mask, respectively). Additional columns can be added; these will be copied to the output.

Additionally, fix some small bugs in the command line scripts, and add an option to shorten the image and mask paths in the output to just the file name (by default, the full path as defined in the CSV is stored).
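
A minimal sketch of how such an input file could be read, assuming only the rules described above (a header with at least 'Image' and 'Mask', extra columns carried through, optional shortening of paths in the output). The helper names `load_cases` and `output_record`, the `shorten_paths` flag, the 'Patient' column, and the file paths are all hypothetical and are not the names used in the scripts.

```python
import csv
import io
import os

# Hypothetical input file contents; paths and the 'Patient' column are made up.
input_csv = io.StringIO(
    "Patient,Image,Mask\n"
    "p001,/data/p001/image.nrrd,/data/p001/mask.nrrd\n"
    "p002,/data/p002/image.nrrd,/data/p002/mask.nrrd\n"
)

def load_cases(handle):
    """Read all rows, checking that the header has 'Image' and 'Mask'."""
    reader = csv.DictReader(handle)
    missing = {"Image", "Mask"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError("Input CSV lacks column(s): %s" % sorted(missing))
    return [dict(row) for row in reader]

def output_record(case, shorten_paths=False):
    """Copy every input column to the output, optionally shortening paths."""
    record = dict(case)  # extra columns (here 'Patient') are carried along
    if shorten_paths:
        # Keep only the file name; the full path stays available in `case`.
        record["Image"] = os.path.basename(case["Image"])
        record["Mask"] = os.path.basename(case["Mask"])
    return record

cases = load_cases(input_csv)
print(output_record(cases[0], shorten_paths=True))
```

Shortening is applied to a copy of the row, so the full paths remain available for loading the image and mask.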
Add an example to show how pandas can be used to handle data returned by PyRadiomics.
Pandas is not needed to handle data inside PyRadiomics, and therefore does not have to be a dependency.
The result from pyradiomics can be converted to pandas by using it to instantiate a pandas Series.

The example shows how to use pandas for batchprocessing and storing the result, which users can adapt to suit their own needs.
JoostJM force-pushed the add-pandas-revised branch from ffb9bb5 to c5fab34 on March 29, 2017 15:46
JoostJM (Collaborator, Author) commented Mar 29, 2017

I removed the use of pandas from the source code entirely and instead provided an example version of the batchprocessing script that uses it.

This is because adding pandas as an optional feature in PyRadiomics requires a lot of code to check whether pandas can and should be used, whereas the actual use is only a simple conversion just before the result is returned.

This conversion is done in this new example (which assumes the user has pandas installed), with additional code showing how to combine results (pandas Series) into a pandas DataFrame and how to use pandas for reading / writing csv.
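
The conversion and combination steps described above might look roughly like the following. This is a sketch, not the example script from the PR: the result dictionaries, feature names, and case labels are made up, and an in-memory buffer stands in for a CSV file on disk. It assumes pandas is installed.

```python
import io

import pandas as pd

# Hypothetical per-case results standing in for the dictionaries returned by
# feature extraction; the feature names and values are made up.
results = [
    {"Image": "image_1.nrrd", "Mask": "mask_1.nrrd", "firstorder_Mean": 12.5},
    {"Image": "image_2.nrrd", "Mask": "mask_2.nrrd", "firstorder_Mean": 8.0},
]

# Each result instantiates a Series; its name later labels the DataFrame row.
series = [
    pd.Series(result, name="case_%d" % number)
    for number, result in enumerate(results, start=1)
]

# Combine the Series into one DataFrame: one row per case, one column per key.
frame = pd.DataFrame(series)

# Write to CSV and read it back; the buffer stands in for a file on disk.
buffer = io.StringIO()
frame.to_csv(buffer, index_label="Case")
buffer.seek(0)
restored = pd.read_csv(buffer, index_col="Case")
print(restored)
```

Keeping one Series per case makes it straightforward to append further cases before building the DataFrame.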

JoostJM merged commit 3ac3e35 into AIM-Harvard:master on Apr 4, 2017
JoostJM deleted the add-pandas-revised branch on April 4, 2017 07:07