Merge pull request #24 from Vincent-Maladiere/add_seer_preprocessing
Add SEER preprocessing
Vincent-Maladiere authored Dec 16, 2023
2 parents cd545b8 + 7ff9d74 commit bac9af1
Showing 12 changed files with 561 additions and 12 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -161,5 +161,8 @@ cython_debug/
#.idea/

.vscode

doc/generated/


# This dataset should not be redistributed, because users have to sign an agreement.
hazardous/data/seer_cancer_cardio_raw_data.txt
Binary file added doc/_static/seer_extract.png
Binary file added doc/_static/seer_home.png
1 change: 1 addition & 0 deletions doc/api.rst
@@ -40,6 +40,7 @@ Datasets
:nosignatures:

data.make_synthetic_competing_weibull
data.load_seer


Inverse Probability Censoring Weight
46 changes: 46 additions & 0 deletions doc/downloading_seer.rst
@@ -0,0 +1,46 @@

.. _downloading_seer:

How to get the SEER dataset
===========================

.. currentmodule:: hazardous

SEER (Surveillance, Epidemiology, and End Results) is a reference dataset for cancer statistics in the US, commonly used in competing risks analysis.

Below is a quick guide to obtain it.
Note that you will need a **Windows computer** (or emulator) to download the files.

1. Head to the `SEER data access webpage <https://seerdataaccess.cancer.gov/seer-data-access>`_, enter your email address and click on "Research Data Requests".
2. Fill out the form.
3. Confirm your email address, then wait a few days for the confirmation email.
4. After receiving the confirmation email, go to the `SEER*Stat Software webpage <https://seer.cancer.gov/seerstat/software/>`_ and download the software for Windows.
5. Open it and sign in with the SEER credentials you received by email.

.. image:: _static/seer_home.png
   :width: 300


.. raw:: html

   <br></br>


6. Use SEER*Stat to open the ``data/seer.sl`` file.

.. image:: _static/seer_extract.png
   :width: 400


.. raw:: html

   <br></br>


You should get a ``.txt`` file, which can be loaded using :func:`hazardous.data.load_seer`.


.. raw:: html

   <br></br>

1 change: 1 addition & 0 deletions doc/index.rst
@@ -40,3 +40,4 @@ of this library at this time.

api
auto_examples/index
downloading_seer
3 changes: 2 additions & 1 deletion hazardous/data/__init__.py
@@ -1,3 +1,4 @@
 from ._competing_weibull import make_synthetic_competing_weibull
+from ._seer import load_seer

-__all__ = ["make_synthetic_competing_weibull"]
+__all__ = ["make_synthetic_competing_weibull", "load_seer"]
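
With this change, `load_seer` becomes importable from `hazardous.data`. The following is a minimal sketch of how the exported file might be checked and loaded, not the library's documented usage. The helper names `check_seer_file` and `load_seer_checked` are hypothetical, the default path is simply the one listed in the `.gitignore` entry above, and passing the path as the first positional argument to `load_seer` is an assumption, since its signature is not shown in this commit.

```python
from pathlib import Path

# Path taken from the .gitignore entry in this commit; adjust it to wherever
# you saved the SEER*Stat export.
SEER_PATH = Path("hazardous/data/seer_cancer_cardio_raw_data.txt")


def check_seer_file(path):
    """Return `path` as a Path if it points to a non-empty file, else raise.

    Hypothetical helper, not part of hazardous.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found. Export it with SEER*Stat first; the file "
            "cannot be redistributed with the repository."
        )
    if path.stat().st_size == 0:
        raise ValueError(f"{path} is empty; the export may have failed.")
    return path


def load_seer_checked(path=SEER_PATH):
    """Validate the export, then hand it to hazardous.data.load_seer."""
    # Imported lazily so check_seer_file works without hazardous installed.
    from hazardous.data import load_seer

    # Assumption: load_seer accepts the file path as its first positional
    # argument. Check the function's docstring before relying on this.
    return load_seer(check_seer_file(path))
```

Checking the file before calling the loader gives a clearer error when the raw data is missing, which is the expected state of a fresh clone because the dataset is deliberately excluded from version control.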