Extract library and refactor (#7)

* add a library for splam detection
  well tested and somewhat documented, not wired in yet
* add a section about library to readme
* integrate new library
* integrate locator
* update docs
* limit the number of events in a short time period
  reload ham and spam in watcher tests
* add test for events debouncing
* drop legacy deps from bot to updaters
  this is part of detector now
* add test for forwarded ban, drop dead code
* add watch interval parameter
* minor logging improvements
* minor simplifications
* update comments
* extend logging
* fix test
* convert server to chi, add middlewares for safety
* document code, use common terms
* update lib docs
* rename to make it more consistent, add events package comment
* drop separate super user file, add code comments
* remote deployment on tags only
umputun · Dec 10, 2023 · 616a283
1 parent 5436760
Showing 311 changed files with 14,196 additions and 35,377 deletions.
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -20,7 +20,6 @@ jobs:
       run: |
         go test -v -timeout=100s -covermode=count -coverprofile=$GITHUB_WORKSPACE/profile.cov_tmp ./...
         cat $GITHUB_WORKSPACE/profile.cov_tmp | grep -v "mocks" | grep -v "_mock" > $GITHUB_WORKSPACE/profile.cov
-      working-directory: app
       env:
         GO111MODULE: on
         TZ: "America/Chicago"
@@ -87,7 +86,7 @@ jobs:
             -t ${USERNAME}/tg-spam:${ref} -t ${USERNAME}/tg-spam:latest .
 
     - name: remote deployment from master
-      if: ${{ github.ref == 'refs/heads/master' }}
+      if: ${{ startsWith(github.ref, 'refs/tags/') }}
       env:
         UPDATER_KEY: ${{ secrets.UPDATER_KEY }}
       run: curl https://radio-t.com/updater/update/tg-spam/${UPDATER_KEY}
diff --git a/Makefile b/Makefile
@@ -11,7 +11,7 @@ docker:
 	docker build -t umputun/tg-spam .
 
 race_test:
-	cd app && go test -race -mod=vendor -timeout=60s -count 1 ./...
+	go test -race -mod=vendor -timeout=60s -count 1 ./...
 
 prep_site:
 	cp -fv README.md site/docs/index.md

diff --git a/README.md b/README.md
@@ -86,7 +86,7 @@ To allow such a feature, some parameters in `admin` section must be specified:
 
 ### Updating spam samples dynamically
 
-The bot can be configured to update spam samples dynamically. To enable this feature, reporting to the admin chat must be enabled (see `--admin.url=, [$ADMIN_URL]` above. If any of privileged users (`--super=, [$SUPER_USER]`) forward a message to admin chat, the bot will add this message to the internal spam samples file (`spam-dynamic.txt`) and reload it. This allows the bot to learn new spam patterns on the fly.
+The bot can be configured to update spam samples dynamically. To enable this feature, reporting to the admin chat must be enabled (see `--admin.url=, [$ADMIN_URL]` above. If any of privileged users (`--super=, [$SUPER_USER]`) forwards a message to admin chat, the bot will add this message to the internal spam samples file (`spam-dynamic.txt`) and reload it. This allows the bot to learn new spam patterns on the fly. In addition, the bot will do the best to remove the original spam message from the group and ban the user who sent it. This is not always possible, as the forwarding strips the original user id. To address this limitation, tg-spam keeps the list of latest messages (in fact, it stores hashes) associated with the user id and the message id. This information is used to find the original message and ban the user. 
 
 Note: if the bot is running in docker container, `--files.dynamic-spam=, [$FILES_DYNAMIC_SPAM]` must be set to the mapped volume's location to stay persistent after container restart.
 
@@ -128,6 +128,7 @@ Use this token to access the HTTP API:
 
 ```
       --testing-id=           testing ids, allow bot to reply to them [$TESTING_ID]
+      --history-duration=     history duration (default: 1h) [$HISTORY_DURATION]
       --super=                super-users [$SUPER_USER]
       --no-spam-reply         do not reply to spam messages [$NO_SPAM_REPLY]
       --similarity-threshold= spam threshold (default: 0.5) [$SIMILARITY_THRESHOLD]
@@ -176,4 +177,42 @@ message:
 Help Options:
   -h, --help                  Show this help message
 
-```
+```
+
+## Using tg-spam as a library
+
+The bot can be used as a library as well. To do so, import the `github.com/umputun/tg-spam/lib` package and create a new instance of the `Detector` struct. Then, call the `Check` method with the message and userID to check. The method will return `true` if the message is spam and `false` otherwise. In addition, the `Check` method will return the list of applied rules as well as the spam-related details.
+
+For more details see the [TBD]()
+
+Example:
+
+```go
+package main
+
+import (
+	"io"
+
+	tgspam "github.com/umputun/tg-spam/lib"
+)
+
+func main() {
+	detector := tgspam.NewDetector(tgspam.Config{
+		SimilarityThreshold: 0.5,
+		MinMsgLen:           50,
+		MaxEmoji:            2,
+		FirstMessageOnly:    false,
+		HTTPClient:          &http.Client{Timeout: 30 * time.Second},
+	})
+
+	// prepare samples and exclude tokens
+	spamSample := bytes.NewBufferString("this is spam\nwin a prize\n") // need io.Reader, in real life it will be a file
+	hamSample := bytes.NewBufferString("this is ham\n")
+	excludeTokens := bytes.NewBufferString(`"a", "the"`)
+
+	// load samples
+	detector.LoadSamples(excludeTokens, []io.Reader{spamSample}, []io.Reader{hamSample})
+
+	isSpam, details := detector.Check("this is spam", 123456)
+}
+```
diff --git a/app/bot/bot.go b/app/bot/bot.go
@@ -9,6 +9,11 @@ import (
 
 //go:generate moq --out mocks/http_client.go --pkg mocks --skip-ensure . HTTPClient:HTTPClient
 
+// PermanentBanDuration defines duration of permanent ban:
+// If user is restricted for more than 366 days or less than 30 seconds from the current time,
+// they are considered to be restricted forever.
+var PermanentBanDuration = time.Hour * 24 * 400
+
 // Response describes bot's reaction on particular message
 type Response struct {
 	Text          string

diff --git a/app/bot/file_watcher.go b/app/bot/file_watcher.go
diff --git a/app/bot/file_watcher_test.go b/app/bot/file_watcher_test.go