Merge pull request #18 from nlevitt/0.3
0.3
nlevitt committed May 24, 2016
2 parents 3fca524 + 571ab75 commit ffd0fce
Showing 14 changed files with 588 additions and 675 deletions.
22 changes: 16 additions & 6 deletions .travis.yml
@@ -1,8 +1,18 @@
# http://docs.travis-ci.com/user/migrating-from-legacy/
sudo: false

language: python
python:
- "2.6"
- "2.7"
install: pip install -r requirements.txt
script:
- py.test --doctest-modules -v surt/
- pylint --disable=all --enable=W0312 --reports=n surt/
- 2.6
- 2.7
- 3.3
- 3.4
- 3.5
- 3.5-dev
- nightly
- pypy
- pypy3

install: pip install . pytest pytest-cov
script: py.test -v --cov=surt tests/

28 changes: 0 additions & 28 deletions README.md

This file was deleted.

37 changes: 37 additions & 0 deletions README.rst
@@ -0,0 +1,37 @@
Sort-friendly URI Reordering Transform (SURT) python package.

Usage:

::

    >>> from surt import surt
    >>> surt("http://archive.org/goo/?a=2&b&a=1")
    'org,archive)/goo?a=1&a=2&b'

Installation:

::

    pip install surt

Or install the dev version from git:

::

    pip install git+https://github.com/internetarchive/surt.git#egg=surt

More information about SURTs:
http://crawler.archive.org/articles/user\_manual/glossary.html#surt

This is mostly a python port of the webarchive-commons org.archive.url
package. The original java version of the org.archive.url package is
here:
https://github.com/iipc/webarchive-commons/tree/master/src/main/java/org/archive/url

This module depends on the ``tldextract`` module to query the Public
Suffix List. ``tldextract`` can be installed via ``pip``

|Build Status|

.. |Build Status| image:: https://travis-ci.org/internetarchive/surt.svg
   :target: https://travis-ci.org/internetarchive/surt
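
The README example above shows the general shape of a SURT: the host labels are reversed and comma-separated, a `)` closes the host part, and the query parameters are sorted. The sketch below reproduces only that one example using the standard library. `toy_surt` is a hypothetical name, and this is not the surt package's algorithm, which applies many more canonicalization rules (for instance the `www` stripping seen in the doctests further down).

```python
from urllib.parse import urlsplit


def toy_surt(url):
    """Simplified illustration of the SURT shape; NOT the surt package's logic."""
    parts = urlsplit(url)

    # Reverse the host labels: archive.org -> org,archive
    host = ",".join(reversed(parts.hostname.split(".")))

    # Drop a trailing slash, but keep a bare "/" path as-is.
    path = parts.path
    if len(path) > 1 and path.endswith("/"):
        path = path[:-1]

    # Sort the raw query parameters lexicographically.
    query = "&".join(sorted(parts.query.split("&"))) if parts.query else ""

    result = host + ")" + path
    if query:
        result += "?" + query
    return result


print(toy_surt("http://archive.org/goo/?a=2&b&a=1"))
# prints: org,archive)/goo?a=1&a=2&b
```

For real use, call `surt()` from the package itself, as in the README.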
3 changes: 0 additions & 3 deletions requirements.txt

This file was deleted.

29 changes: 25 additions & 4 deletions setup.py
@@ -1,18 +1,39 @@
from setuptools import setup
from setuptools.command.test import test as TestCommand

class PyTest(TestCommand):
    def finalize_options(self):
        TestCommand.finalize_options(self)
        self.test_suite = True

    def run_tests(self):
        import pytest
        import sys
        cmdline = ' -v --cov surt tests/'
        errcode = pytest.main(cmdline)
        sys.exit(errcode)


setup(name='surt',
version='0.2',
version='0.3.0',
author='rajbot',
author_email='[email protected]',
classifiers=[
'License :: OSI Approved :: GNU Affero General Public License v3',
],
description='Sort-friendly URI Reordering Transform (SURT) python package.',
long_description=open('README.md').read(),
url='https://github.com/rajbot/surt',
long_description=open('README.rst').read(),
url='https://github.com/internetarchive/surt',
zip_safe=True,
install_requires=[
'tldextract',
'six',
'tldextract>=2.0',
],
provides=[ 'surt' ],
packages=[ 'surt' ],
scripts=[],
# Tests
tests_require=[ 'pytest', 'pytest-cov' ],
test_suite='',
cmdclass={'test': PyTest},
)
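
The `run_tests` method in this diff hands its options to `pytest.main` as a single string. `pytest.main` also accepts a list of argument strings, and, to the best of my knowledge, later pytest releases no longer accept a plain string, so this is worth checking against the pytest version in use. A minimal sketch of the list form follows; `pytest_args` and the module-level `run_tests` are hypothetical helpers, not part of this commit.

```python
import shlex


def pytest_args(cmdline):
    """Split a shell-style option string into a list of arguments."""
    return shlex.split(cmdline)


def run_tests(cmdline=' -v --cov surt tests/'):
    # Imported lazily, as in the PyTest command in setup.py, so pytest is
    # only needed when the tests are actually run.
    import pytest
    return pytest.main(pytest_args(cmdline))


print(pytest_args(' -v --cov surt tests/'))
# prints: ['-v', '--cov', 'surt', 'tests/']
```

Only the splitting step is exercised here; calling `run_tests()` needs pytest, pytest-cov and the `tests/` directory from the repository.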
37 changes: 6 additions & 31 deletions surt/DefaultIAURLCanonicalizer.py
100755 → 100644
@@ -21,45 +21,20 @@

"""This is a python port of DefaultIAURLCanonicalizer.java:
http://archive-access.svn.sourceforge.net/viewvc/archive-access/trunk/archive-access/projects/archive-commons/src/main/java/org/archive/url/DefaultIAURLCanonicalizer.java?view=markup
The doctests are copied from DefaultIAURLCanonicalizerTest.java:
http://archive-access.svn.sourceforge.net/viewvc/archive-access/trunk/archive-access/projects/archive-commons/src/test/java/org/archive/url/DefaultIAURLCanonicalizerTest.java?view=markup
"""
from __future__ import absolute_import

import surt.GoogleURLCanonicalizer
import surt.IAURLCanonicalizer

import GoogleURLCanonicalizer
import IAURLCanonicalizer

# canonicalize()
#_______________________________________________________________________________
def canonicalize(url, **options):
"""The input url is a handyurl instance
These doctests are from DefaultIAURLCanonicalizerTest.java:
>>> from handyurl import handyurl
>>> canonicalize(handyurl.parse("http://www.alexa.com/")).getURLString()
'http://alexa.com/'
>>> canonicalize(handyurl.parse("http://archive.org/index.html")).getURLString()
'http://archive.org/index.html'
>>> canonicalize(handyurl.parse("http://archive.org/index.html?")).getURLString()
'http://archive.org/index.html'
>>> canonicalize(handyurl.parse("http://archive.org/index.html?a=b")).getURLString()
'http://archive.org/index.html?a=b'
>>> canonicalize(handyurl.parse("http://archive.org/index.html?b=b&a=b")).getURLString()
'http://archive.org/index.html?a=b&b=b'
>>> canonicalize(handyurl.parse("http://archive.org/index.html?b=a&b=b&a=b")).getURLString()
'http://archive.org/index.html?a=b&b=a&b=b'
>>> canonicalize(handyurl.parse("http://www34.archive.org/index.html?b=a&b=b&a=b")).getURLString()
'http://archive.org/index.html?a=b&b=a&b=b'
"""

url = GoogleURLCanonicalizer.canonicalize(url, **options)
url = IAURLCanonicalizer.canonicalize(url, **options)
url = surt.GoogleURLCanonicalizer.canonicalize(url, **options)
url = surt.IAURLCanonicalizer.canonicalize(url, **options)

return url

# main()
#_______________________________________________________________________________
if __name__ == "__main__":
import doctest
doctest.testmod()