v1.3 - see CHANGELOG.md
xnl-h4ck3r committed Feb 19, 2024
1 parent 53040b2 commit 5483c2a
Showing 4 changed files with 41 additions and 20 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,11 @@
 ## Changelog
 
+- v1.3
+
+  - New
+
+    - Add argument `-fnp`/`--fragment-not-param`. If passed the URL fragments `#` will NOT be treated in the same way as parameters, e.g. if a link has a filter keyword and a fragment (or param) the link is usually kept, but if this argument is passed and a link has a filter word and fragment, the link will be removed. Also, if this arg is passed and `-iq` / `--ignore-querystring` is used, the fragment will NOT be removed from links if no query string is in the link.
+
 - v1.2
 
   - Changed
31 changes: 16 additions & 15 deletions README.md
@@ -1,6 +1,6 @@
<center><img src="https://github.com/xnl-h4ck3r/urless/blob/main/urless/images/title.png"></center>

-## About - v1.2
+## About - v1.3

This is a tool used to de-clutter a list of URLs.
As a starting point, I took the amazing tool [uro](https://github.com/s0md3v/uro/) by Somdev Sangwan. But I wanted to change a few things, make some improvements (like deal with GUIDs) and make it more customizable.
@@ -25,20 +25,21 @@ pipx install git+https://github.com/xnl-h4ck3r/urless.git

## Usage

| Argument | Long Argument | Description |
| -------- | -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| -i | --input | A file of URLs to de-clutter. |
| -o | --output | The output file that will contain the de-cluttered list of URLs (default: output.txt). If piped to another program, output will be written to STDOUT instead. |
| -fk | --filter-keywords | A comma separated list of keywords to exclude links (if there no parameters). This will override the `FILTER_KEYWORDS` list specified in config.yml |
| -fe | --filter-extensions | A comma separated list of file extensions to exclude. This will override the `FILTER_EXTENSIONS` list specified in `config.yml` |
| -ks | --keep-slash | A trailing slash at the end of a URL in input will not be removed. Therefore there may be identical URLs output, one with and one without a trailing slash. |
| -khw | --keep-human-written | By default, any URL with a path part that contains 3 or more dashes (-) are removed because it is assumed to be human written content (e.g. blog post), and not interesting. Passing this argument will keep them in the output. |
| -kym | --keep-yyyymm | By default, any URL with a path containing 3 /YYYY/MM (where YYYY is a year and MM month) are removed because it is assumed to be blog/news content, and not interesting. Passing this argument will keep them in the output. |
| -rcid | --regex-custom-id | **USE WITH CAUTION!** Regex for a Custom ID that your target uses. Ensure the value is passed in quotes. See the section below for more details on this. |
| -iq | --ignore-querystring | Remove the query string (including URL fragments `#`) so output is unique paths only. |
| -lang | --language | If passed and there are multiple URLs with different language codes as a part of the path, only one version of the URL will be output. The codes are specified in the `LANGUAGE` section of `config.yml`. |
| -nb | --no-banner | Hides the tool banner (it is hidden by default if you pipe input to urless) output. |
| -v | --verbose | Verbose output |
| Argument | Long Argument | Description |
| -------- | -------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| -i | --input | A file of URLs to de-clutter. |
| -o | --output | The output file that will contain the de-cluttered list of URLs (default: output.txt). If piped to another program, output will be written to STDOUT instead. |
| -fk | --filter-keywords | A comma separated list of keywords to exclude links (if there no parameters). This will override the `FILTER_KEYWORDS` list specified in config.yml |
| -fe | --filter-extensions | A comma separated list of file extensions to exclude. This will override the `FILTER_EXTENSIONS` list specified in `config.yml` |
| -ks | --keep-slash | A trailing slash at the end of a URL in input will not be removed. Therefore there may be identical URLs output, one with and one without a trailing slash. |
| -khw | --keep-human-written | By default, any URL with a path part that contains 3 or more dashes (-) are removed because it is assumed to be human written content (e.g. blog post), and not interesting. Passing this argument will keep them in the output. |
| -kym | --keep-yyyymm | By default, any URL with a path containing 3 /YYYY/MM (where YYYY is a year and MM month) are removed because it is assumed to be blog/news content, and not interesting. Passing this argument will keep them in the output. |
| -rcid | --regex-custom-id | **USE WITH CAUTION!** Regex for a Custom ID that your target uses. Ensure the value is passed in quotes. See the section below for more details on this. |
| -iq | --ignore-querystring | Remove the query string (including URL fragments `#`) so output is unique paths only. |
| -fnp | --fragment-not-param | Don't treat URL fragments `#` in the same way as parameters, e.g. if a link has a filter keyword and a fragment (or param) the link is usually kept, but if this argument is passed and a link has a filter word and fragment, the link will be removed. Also, if this arg is passed and `-iq` / `--ignore-querystring` is used, the fragment will NOT be removed from links if no query string is in the link. |
| -lang | --language | If passed and there are multiple URLs with different language codes as a part of the path, only one version of the URL will be output. The codes are specified in the `LANGUAGE` section of `config.yml`. |
| -nb | --no-banner | Hides the tool banner (it is hidden by default if you pipe input to urless) output. |
| -v | --verbose | Verbose output |
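
The keep/remove rule in the `-fnp` row can be read as the small decision function below. This is only a sketch, not code from urless: `should_keep` is a hypothetical helper, and matching a keyword as a substring of the path is an assumption made for illustration.

```python
from urllib.parse import urlparse


def should_keep(url, filter_keywords, fragment_not_param=False):
    """Hypothetical helper: apply the filter-keyword rule described for -fnp."""
    parsed = urlparse(url)
    # By default a fragment counts as a parameter; with -fnp it does not.
    has_params = bool(parsed.query) or (
        bool(parsed.fragment) and not fragment_not_param
    )
    # Assumption: a keyword "matches" if it appears anywhere in the path.
    has_keyword = any(word in parsed.path for word in filter_keywords)
    # A link with a filter keyword is only removed when it has no parameters.
    return not has_keyword or has_params
```

Under these assumptions, `https://example.com/blog#intro` with keyword `blog` is kept by default and removed once `fragment_not_param=True`.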

## What does it do exactly?

2 changes: 1 addition & 1 deletion urless/__init__.py
@@ -1 +1 @@
-__version__="1.2"
+__version__="1.3"
22 changes: 18 additions & 4 deletions urless/urless.py
@@ -379,9 +379,14 @@ def processUrl(line):
     # Build the path and parameters
     path, params = parsed.path, paramsToDict(parsed.query)
 
-    # If there is a fragment, add as the last parameter with a name but with value {EMPTY} that doesn't add an = afterwards
+    # If there is a fragment...
+    # if arg -fnp / --fragment-not-param was passed, change the path to include the hash,
+    # else, add as the last parameter with a name but with value {EMPTY} that doesn't add an = afterwards
     if parsed.fragment:
-        params['#'+parsed.fragment] = '{EMPTY}'
+        if args.fragment_not_param:
+            path = path+'#'+parsed.fragment
+        else:
+            params['#'+parsed.fragment] = '{EMPTY}'
 
     # Add the host to the map if it hasn't already been seen
     if host not in urlmap:
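
To see the effect of this hunk in isolation, here is a minimal standalone sketch. `split_path_and_params` is a hypothetical name, and `urllib.parse.parse_qsl` stands in for the tool's own `paramsToDict`, whose exact behaviour is not shown in this diff.

```python
from urllib.parse import urlparse, parse_qsl


def split_path_and_params(url, fragment_not_param=False):
    """Hypothetical stand-in for the fragment handling in processUrl."""
    parsed = urlparse(url)
    # Assumption: parse_qsl approximates the tool's paramsToDict helper.
    path = parsed.path
    params = dict(parse_qsl(parsed.query, keep_blank_values=True))
    if parsed.fragment:
        if fragment_not_param:
            # -fnp: the fragment becomes part of the path
            path = path + '#' + parsed.fragment
        else:
            # default: the fragment is stored as a value-less last parameter
            params['#' + parsed.fragment] = '{EMPTY}'
    return path, params
```

With the flag off, `/a?x=1#top` yields a path of `/a` and an extra `#top` parameter; with it on, the path becomes `/a#top` and only `x` remains a parameter.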
@@ -435,9 +440,12 @@ def processLine(line):
     else:
         line = line.rstrip('\n').rstrip('/')
 
-    # If the -iq / --ignore-querystring argument was passed, remove any querystring and fragment
+    # If the -iq / --ignore-querystring argument was passed, remove any querystring and fragment (unless -fnp is passed, in which case the fragment is only removed if a query string exists too)
     if args.ignore_querystring:
-        line = line.split('?')[0].split('#')[0]
+        if args.fragment_not_param:
+            line = line.split('?')[0]
+        else:
+            line = line.split('?')[0].split('#')[0]
     return line
 
 def processInput():
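
The interaction between `-iq` and `-fnp` in this hunk can be checked with a small standalone sketch. `strip_query` is a hypothetical name, and the two boolean parameters stand in for `args.ignore_querystring` and `args.fragment_not_param`.

```python
def strip_query(line, ignore_querystring=False, fragment_not_param=False):
    """Hypothetical stand-in for the -iq branch of processLine."""
    if ignore_querystring:
        if fragment_not_param:
            # Only cut at '?': a fragment survives when there is no query string
            line = line.split('?')[0]
        else:
            # Cut at '?' and then at '#': query string and fragment both go
            line = line.split('?')[0].split('#')[0]
    return line
```

Because a query string normally precedes the fragment in a URL, cutting at `?` also drops any fragment that follows it, which is why the fragment is kept only when no query string is present.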
@@ -673,6 +681,12 @@ def main():
         action='store_true',
         help='Remove the query string (including URL fragments `#`) so output is unique paths only.',
     )
+    parser.add_argument(
+        '-fnp',
+        '--fragment-not-param',
+        action='store_true',
+        help='Don\'t treat URL fragments `#` in the same way as parameters, e.g. if a link has a filter keyword and a fragment (or param) it is usually kept, but if this argument is passed and a link has a filter word and fragment, it will be removed.',
+    )
     parser.add_argument(
         '-lang',
         '--language',
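
For readers wondering how `--fragment-not-param` ends up as `args.fragment_not_param` in the hunks above: `argparse` derives the attribute name from the first long option, replacing dashes with underscores. A minimal sketch follows, with a cut-down parser that is hypothetical (`build_parser` and the `prog` name are not from urless) and declares only the two flags involved.

```python
import argparse


def build_parser():
    """Hypothetical cut-down parser with only the two flags discussed here."""
    parser = argparse.ArgumentParser(prog='urless-sketch')
    parser.add_argument('-iq', '--ignore-querystring', action='store_true')
    parser.add_argument('-fnp', '--fragment-not-param', action='store_true')
    return parser


# Both flags default to False and become True when passed.
args = build_parser().parse_args(['-iq', '-fnp'])
defaults = build_parser().parse_args([])
```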
