This is an ongoing project. The goal is to implement the Scatter/Gather browsing method efficiently for a large number of documents. A user interface will also be provided, through which the user can browse the collection interactively. The Scatter/Gather method has to be responsive enough to give a good user experience.
For the time being, only a toolset package has been implemented, for experimentation purposes. The package includes the corpus module, which provides some operations on the document collection, and the clustering module, a wrapper that lets some clustering algorithms be applied quickly to the document collection.
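
To make the browsing loop concrete, here is a minimal sketch of one Scatter/Gather round in plain Python. It is not the toolset's API: the function names (`vectorize`, `scatter`, `gather`, `summarize`) are hypothetical, the documents are represented as simple bag-of-words counts, and a basic k-means-style loop with deterministic seeding stands in for whatever clustering algorithm the clustering module actually wraps.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a document into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def scatter(doc_ids, vectors, k, iterations=5):
    """Scatter: split doc_ids into at most k clusters (k-means-style, assumed)."""
    k = min(k, len(doc_ids))
    if k == 0:
        return []
    # Deterministic seeding: pick k evenly spaced documents as initial centroids.
    centroids = [Counter(vectors[doc_ids[i * len(doc_ids) // k]]) for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assign every document to its most similar centroid.
        for d in doc_ids:
            best = max(range(k), key=lambda c: cosine(vectors[d], centroids[c]))
            clusters[best].append(d)
        # Recompute each centroid as the summed term counts of its members.
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = sum((vectors[d] for d in members), Counter())
    return [members for members in clusters if members]


def gather(clusters, selected):
    """Gather: merge the clusters the user selected into one sub-collection."""
    return [d for i in selected for d in clusters[i]]


def summarize(cluster, vectors, n=3):
    """Describe a cluster by its n most frequent terms."""
    total = sum((vectors[d] for d in cluster), Counter())
    return [term for term, _ in total.most_common(n)]
```

A browsing session would alternate the two steps: call `scatter` on the whole collection, show the user a `summarize` line per cluster, call `gather` on the clusters they pick, and then `scatter` the gathered documents again to get a finer view. Because each round re-clusters on demand, the clustering step is the part that has to be fast for the interface to feel responsive.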