Added probe to identify copyright year #1955

Open
wants to merge 3 commits into
base: dev
Conversation

@nyxgeek nyxgeek commented Oct 17, 2024

Added copyright probe, useful for identifying old software

  • If a copyright indicator is found near a year, the probe prints those years (e.g. [Copyright: 2004])
  • If no copyright indicator is found, the probe prints any years it finds in the range 1990-2024 (e.g. [Possible Years: 2012 2014])
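
The two-tier behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names (summarizeYears, collect) and both patterns are assumptions, and the patterns are deliberately simpler than the PR's regex (they accept any year from 1900 to 2099 rather than 1990-2024).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Both patterns are illustrative simplifications, not the regex from the PR.
var (
	// A copyright indicator followed, after optional whitespace, by a four-digit year.
	markedYear = regexp.MustCompile(`(?i)(?:copyright|\(c\)|©|&copy;|&#169;)\s*((?:19|20)[0-9]{2})`)
	// Any standalone four-digit year starting with 19 or 20.
	bareYear = regexp.MustCompile(`\b((?:19|20)[0-9]{2})\b`)
)

// collect returns the first capture group of every match, in order, without duplicates.
func collect(re *regexp.Regexp, text string) []string {
	seen := map[string]bool{}
	var out []string
	for _, m := range re.FindAllStringSubmatch(text, -1) {
		if !seen[m[1]] {
			seen[m[1]] = true
			out = append(out, m[1])
		}
	}
	return out
}

// summarizeYears (hypothetical helper) prefers years that sit next to a
// copyright indicator and falls back to any years found in the text.
func summarizeYears(text string) string {
	if years := collect(markedYear, text); len(years) > 0 {
		return "[Copyright: " + strings.Join(years, " ") + "]"
	}
	if years := collect(bareYear, text); len(years) > 0 {
		return "[Possible Years: " + strings.Join(years, " ") + "]"
	}
	return ""
}

func main() {
	fmt.Println(summarizeYears("© 2004 Example Corp"))         // [Copyright: 2004]
	fmt.Println(summarizeYears("Released 2012, updated 2014")) // [Possible Years: 2012 2014]
}
```

Keeping the fallback in a separate tier means a page with a real copyright notice is never reported with the weaker "Possible Years" label.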

Closes #1965

@GeorginaReeder

Thanks for your contribution, @nyxgeek!

We also have a Discord server, which you’re more than welcome to join. It's a great place to connect with fellow contributors and stay updated with the latest developments!

@ehsandeep ehsandeep changed the base branch from main to dev October 20, 2024 22:02
Member

@ehsandeep ehsandeep left a comment

merge conflict + lint fail

@nyxgeek
Author

nyxgeek commented Oct 22, 2024

Updated and tested, should be good.

@dogancanbakir dogancanbakir linked an issue Oct 22, 2024 that may be closed by this pull request
Member

@dogancanbakir dogancanbakir left a comment

Thanks for the PR! I left some comments.

"strings"
)

var crreYear = regexp.MustCompile(`(?:copyright|Copyright|COPYRIGHT|\(C\)|\(c\)|©|&copy;|&#169;)?\s*(?:[a-zA-Z0-9 ,-]+\s*)?[\s,]*(199[0-9]|20[0-1][0-9]|202[0-4])[\s,<-]+(?:copyright|Copyright|COPYRIGHT|\(C\)|\(c\)|©|&copy;|&#169;|199[0-9]|20[0-1][0-9]|202[0-4])?`)
Member

The regex hard-codes 2024 as the latest year. The upper bound must be dynamic; as written, it will not detect "© 2025 Dummy Media Group. All Rights Reserved.", for example.

Author

That makes sense. I'll extend it through 2029 if that is acceptable. I'm trying to avoid false positives, so I want to keep the range realistic.
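
One possible way to reconcile both concerns in this thread (a dynamic upper bound and a realistic range) is to match any four-digit year and then filter numerically against the current year. The sketch below is an assumption about how that could look, not what the PR implements; yearsInRange and the one-year tolerance in main are hypothetical choices.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"time"
)

// fourDigitYear matches any standalone year from 1900 to 2099; the
// realistic range is enforced numerically rather than in the pattern.
var fourDigitYear = regexp.MustCompile(`\b(?:19|20)[0-9]{2}\b`)

// yearsInRange (hypothetical helper) returns the years in text that fall
// within [minYear, maxYear], in order of appearance.
func yearsInRange(text string, minYear, maxYear int) []int {
	var years []int
	for _, s := range fourDigitYear.FindAllString(text, -1) {
		y, err := strconv.Atoi(s)
		if err != nil {
			continue
		}
		if y >= minYear && y <= maxYear {
			years = append(years, y)
		}
	}
	return years
}

func main() {
	// Assumption: allow one year past the current one, since some sites
	// publish next year's notice early.
	maxYear := time.Now().Year() + 1
	fmt.Println(yearsInRange("© 2025 Dummy Media Group. All Rights Reserved.", 1990, maxYear))
}
```

Because the bound is computed at run time, the pattern never needs editing as years pass, while out-of-range numbers are still rejected.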

}
}

green := "\033[32m"
Member

We're using aurora for colored output. We can do the same here.



// Apply regex to extract the years and check for indicators
matches := crreYear.FindAllStringSubmatch(textContent, -1)
Member

Do we expect multiple copyright notices on a page? If not, we should rethink the post-processing.

Author

The regex will match strings like Copyright 2024, as well as Copyright 1995-2001, in which case it will display both dates.
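
To illustrate why a page can yield more than one year, here is a small sketch of handling both a single year and a range. The pattern is a simplified stand-in and not the regex from this PR; noticeYears and extractNoticeYears are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// noticeYears is a simplified stand-in: an indicator, a year, and an
// optional "-YYYY" range end captured in a second group.
var noticeYears = regexp.MustCompile(`(?i)(?:copyright|\(c\)|©)\s*((?:19|20)[0-9]{2})(?:\s*-\s*((?:19|20)[0-9]{2}))?`)

// extractNoticeYears (hypothetical helper) returns every year found in
// copyright notices, including both ends of a range.
func extractNoticeYears(text string) []string {
	var years []string
	for _, m := range noticeYears.FindAllStringSubmatch(text, -1) {
		years = append(years, m[1])
		if m[2] != "" { // the range end is empty when the notice has a single year
			years = append(years, m[2])
		}
	}
	return years
}

func main() {
	fmt.Println(extractNoticeYears("Copyright 1995-2001 Foo. Copyright 2024 Bar."))
	// [1995 2001 2024]
}
```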

@@ -1800,6 +1801,21 @@ retry:
builder.WriteRune(']')
}

var copyright string
if httpx.CanHaveTitleTag(resp.GetHeaderPart("Content-Type", ";")) {
copyright = httpx.ExtractCopyright(resp) // This will return a space-delimited string of years
Member

Why do we extract the copyright text here and not under the scanopts.OutputCopyright if block?

Author

I basically just copied the functions that exist for Title extraction and changed them to handle copyright instead. If there is a better way, or if I mis-copied the pattern from Title extraction, I'm happy to change it.
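
The reviewer's suggestion, as I read it, is to skip the extraction work entirely unless the output option is enabled. A rough sketch of that gating follows; every type and function here is a hypothetical stand-in (scanOptions for scanopts, extractCopyright for httpx.ExtractCopyright, looksLikeHTML for the content-type check), not httpx's actual code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// scanOptions and response are hypothetical stand-ins for the real types.
type scanOptions struct {
	OutputCopyright bool
}

type response struct {
	ContentType string
	Body        string
}

var (
	anyYear      = regexp.MustCompile(`(?:19|20)[0-9]{2}`)
	extractCalls int // counts extraction runs, only to make the gating visible
)

// extractCopyright stands in for the real extraction; it returns the first year found.
func extractCopyright(resp response) string {
	extractCalls++
	return anyYear.FindString(resp.Body)
}

// looksLikeHTML is a placeholder content-type check, not the real one.
func looksLikeHTML(contentType string) bool {
	return strings.Contains(contentType, "html")
}

// copyrightField runs extraction only when the option is enabled.
func copyrightField(opts scanOptions, resp response) string {
	if !opts.OutputCopyright || !looksLikeHTML(resp.ContentType) {
		return ""
	}
	return extractCopyright(resp)
}

func main() {
	resp := response{ContentType: "text/html", Body: "© 2004 Example Corp"}
	fmt.Println(copyrightField(scanOptions{OutputCopyright: false}, resp) == "") // true
	fmt.Println(copyrightField(scanOptions{OutputCopyright: true}, resp))        // 2004
}
```

Checking the flag first keeps the regex off the hot path for users who never request the copyright field.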

@Mzack9999
Member

Wouldn't this be better as a nuclei template?

@nyxgeek
Author

nyxgeek commented Nov 8, 2024

Wouldn't this be better as a nuclei template?

I don't use nuclei, but I do use my fork of httpx all the time on giant internal pentests because it's super easy to find the old software with this feature.

Development

Successfully merging this pull request may close these issues.

Add probe to identify copyright year
5 participants