This repository contains the basic outlier detection algorithm that we use to find the weirdest SDSS galaxies.

dalya/WeirdestGalaxies

Outlier Detection Algorithm on Galaxy Spectra

This repository contains the basic outlier detection algorithm that we used to find the weirdest galaxies in the Sloan Digital Sky Survey (SDSS). We used an unsupervised Random Forest (RF) algorithm to assign a similarity measure (equivalently, a distance) to every pair of galaxy spectra in the SDSS. We then used the resulting distance matrix to find the galaxies with the largest average distance from the rest of the sample, and defined them as outliers.
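The scoring step described above can be sketched as follows. This is a minimal illustration, assuming a symmetric pairwise distance matrix with zeros on the diagonal (for instance, one minus an RF similarity matrix). The function names `outlier_scores` and `top_outliers` are hypothetical and are not taken from the repository's notebook.

```python
import numpy as np


def outlier_scores(dist):
    """Mean distance from each object to all *other* objects.

    dist: symmetric (n, n) array of pairwise distances, zeros on the diagonal.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Subtract the self-distance so it does not enter the average.
    return (dist.sum(axis=1) - np.diag(dist)) / (n - 1)


def top_outliers(dist, k):
    """Indices of the k objects with the largest mean distance, weirdest first."""
    scores = outlier_scores(dist)
    return np.argsort(scores)[::-1][:k]


# Toy example: four points on a line, the last one far from the others.
x = np.array([0.0, 0.1, 0.2, 5.0])
D = np.abs(x[:, None] - x[None, :])
print(top_outliers(D, 1))  # -> [3]
```

Averaging over all other objects (rather than, say, using the nearest neighbour only) rewards objects that are far from the bulk of the sample as a whole, which matches the definition of an outlier given above.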

The repository contains an IPython notebook with step-by-step instructions for detecting outliers in simulated 2D data. If you have trouble constructing the input data on which to perform outlier detection, please let us know ([email protected]) and we will help!

Authors

  • Dalya Baron (TAU)
  • Dovi Poznanski (TAU)

Requirements

The code requires the following packages:

  • Python 2.7
  • numpy 1.11.1
  • matplotlib 1.5.3
  • scikit-learn 0.17.1
  • (In the future, a development version of scikit-learn will be required to run some of the algorithms.)

Weirdest SDSS Galaxies

Additional information about the outlier detection algorithm and its implementation on SDSS galaxy spectra:

Credits

Our work is based on the study by Shi & Horvath (2006), Unsupervised Learning with Random Forest Predictors, with a few modifications that are necessary to detect outliers optimally in galaxy spectra.
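For orientation, the standard Shi & Horvath (2006) construction can be sketched with scikit-learn: synthetic objects are drawn from the product of the real data's per-feature (marginal) distributions, a Random Forest is trained to separate real from synthetic objects, and the similarity of two real objects is the fraction of trees in which they land in the same leaf. This sketch does not include the authors' modifications for galaxy spectra, and the function names `synthetic_from_marginals` and `rf_similarity` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def synthetic_from_marginals(X, rng):
    """Draw each feature independently (with replacement) from its empirical
    marginal: per-feature distributions survive, correlations do not."""
    n = X.shape[0]
    return np.column_stack([rng.choice(X[:, j], size=n, replace=True)
                            for j in range(X.shape[1])])


def rf_similarity(X, n_trees=100, seed=0):
    """Pairwise RF proximity between the rows of X (n_objects x n_features)."""
    rng = np.random.RandomState(seed)
    X_syn = synthetic_from_marginals(X, rng)
    # Real objects are labelled 1, synthetic objects 0.
    data = np.vstack([X, X_syn])
    labels = np.concatenate([np.ones(len(X)), np.zeros(len(X_syn))])
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(data, labels)
    # leaves[i, t] is the index of the leaf that real object i reaches in tree t.
    leaves = forest.apply(X)
    sim = np.zeros((len(X), len(X)))
    for t in range(leaves.shape[1]):
        col = leaves[:, t]
        sim += (col[:, None] == col[None, :]).astype(float)
    # Fraction of trees in which each pair of objects shares a leaf.
    return sim / leaves.shape[1]


# Usage on simulated 2D data (values are illustrative only).
rng = np.random.RandomState(1)
X = rng.normal(size=(200, 2))
S = rf_similarity(X, n_trees=50)
D = 1.0 - S  # distance matrix for outlier scoring
```

The similarity matrix grows as the square of the number of objects, so this direct form is only practical for modest sample sizes.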
