This repository has been archived by the owner on Dec 31, 2021. It is now read-only.

nsndswap

A Python program that parses this website and this one, and outputs gexf files.

How to use

Run run.sh. For the most part, the copious logs can be ignored.

Output files

The following datasets are output to the output/ directory:

  • homestuck contains details of all songs on the first page (Homestuck soundtrack, unofficialmspafans, Homestuck Gaiden, and miscellaneous other things).
  • canwc contains details of all songs on the second page (the Cool and New Web Comic soundtrack).
  • viko contains a few additions I maintain (the raw form is in viko_nsnd.py).
  • everything contains all the previous files' data.

These are dumped in six formats:

  • .gexf - directed graphs of references
  • .txt - simple ad hoc plain-text format
  • .titles.txt - the titles, one per line
  • .reverse.txt - the format in .txt, but showing incoming references rather than outgoing
  • .unknown.txt - titles, one per line, of things which are referenced, but which don't have reference lists of their own (useful for checking for name misspellings and such)
  • .pkl - pickled version of the Web itself (subject to change, obviously)
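
As a rough illustration of how the .gexf output might be consumed, the sketch below reads a GEXF file with Python's standard library and lists the most frequently referenced titles. It is not part of nsndswap: the function names are hypothetical, and it assumes that each edge points from the referencing song (source) to the referenced song (target) and that nodes carry a label attribute holding the title.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip the XML namespace so the code works across GEXF versions."""
    return tag.rsplit("}", 1)[-1]


def read_gexf_edges(source):
    """Return ({node id: label}, [(source id, target id), ...]) from a GEXF file.

    `source` may be a path or an open file object.
    """
    root = ET.parse(source).getroot()
    labels = {}
    edges = []
    for element in root.iter():
        name = local_name(element.tag)
        if name == "node":
            node_id = element.get("id")
            # Fall back to the id if a node has no label (assumption).
            labels[node_id] = element.get("label", node_id)
        elif name == "edge":
            edges.append((element.get("source"), element.get("target")))
    return labels, edges


def most_referenced(labels, edges, n=5):
    """Return the n titles with the most incoming edges, with their counts."""
    counts = Counter(target for _, target in edges)
    return [(labels.get(node_id, node_id), count)
            for node_id, count in counts.most_common(n)]
```

Assuming the combined dataset is written to output/everything.gexf (the exact file name is a guess based on the dataset and extension names above), `most_referenced(*read_gexf_edges("output/everything.gexf"))` would give the five most referenced titles.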