Extract library and refactor (#7)
* add a library for spam detection

well tested and somewhat documented, not wired in yet

* add a section about library to readme

* integrate new library

* integrate locator

* update docs

* limit the number of events in a short time period
reload ham and spam in watcher tests

* add test for events debouncing

* drop legacy deps from bot to updaters

this is part of detector now

* add test for forwarded ban, drop dead code

* add watch interval parameter

* minor logging improvements

* minor simplifications

* update comments

* extend logging

* fix test

* convert server to chi, add middlewares for safety

* document code, use common terms

* update lib docs

* rename to make it more consistent, add events package comment

* drop separate super user file, add code comments

* remote deployment on tags only
umputun authored Dec 10, 2023
1 parent 5436760 commit 616a283
Showing 311 changed files with 14,196 additions and 35,377 deletions.
3 changes: 1 addition & 2 deletions .github/workflows/ci.yml
@@ -20,7 +20,6 @@ jobs:
run: |
go test -v -timeout=100s -covermode=count -coverprofile=$GITHUB_WORKSPACE/profile.cov_tmp ./...
cat $GITHUB_WORKSPACE/profile.cov_tmp | grep -v "mocks" | grep -v "_mock" > $GITHUB_WORKSPACE/profile.cov
-working-directory: app
env:
GO111MODULE: on
TZ: "America/Chicago"
@@ -87,7 +86,7 @@ jobs:
-t ${USERNAME}/tg-spam:${ref} -t ${USERNAME}/tg-spam:latest .
- name: remote deployment from master
-if: ${{ github.ref == 'refs/heads/master' }}
+if: ${{ startsWith(github.ref, 'refs/tags/') }}
env:
UPDATER_KEY: ${{ secrets.UPDATER_KEY }}
run: curl https://radio-t.com/updater/update/tg-spam/${UPDATER_KEY}
2 changes: 1 addition & 1 deletion Makefile
@@ -11,7 +11,7 @@ docker:
docker build -t umputun/tg-spam .

race_test:
-cd app && go test -race -mod=vendor -timeout=60s -count 1 ./...
+go test -race -mod=vendor -timeout=60s -count 1 ./...

prep_site:
cp -fv README.md site/docs/index.md
43 changes: 41 additions & 2 deletions README.md
@@ -86,7 +86,7 @@ To allow such a feature, some parameters in `admin` section must be specified:

### Updating spam samples dynamically

-The bot can be configured to update spam samples dynamically. To enable this feature, reporting to the admin chat must be enabled (see `--admin.url=, [$ADMIN_URL]` above. If any of privileged users (`--super=, [$SUPER_USER]`) forward a message to admin chat, the bot will add this message to the internal spam samples file (`spam-dynamic.txt`) and reload it. This allows the bot to learn new spam patterns on the fly.
+The bot can be configured to update spam samples dynamically. To enable this feature, reporting to the admin chat must be enabled (see `--admin.url=, [$ADMIN_URL]` above). If any of the privileged users (`--super=, [$SUPER_USER]`) forwards a message to the admin chat, the bot will add this message to the internal spam samples file (`spam-dynamic.txt`) and reload it. This allows the bot to learn new spam patterns on the fly. In addition, the bot will do its best to remove the original spam message from the group and ban the user who sent it. This is not always possible, as forwarding strips the original user id. To address this limitation, tg-spam keeps a list of the latest messages (in fact, it stores hashes) associated with the user id and the message id. This information is used to find the original message and ban the user.

Note: if the bot is running in docker container, `--files.dynamic-spam=, [$FILES_DYNAMIC_SPAM]` must be set to the mapped volume's location to stay persistent after container restart.

@@ -128,6 +128,7 @@ Use this token to access the HTTP API:

```
--testing-id= testing ids, allow bot to reply to them [$TESTING_ID]
+--history-duration= history duration (default: 1h) [$HISTORY_DURATION]
--super= super-users [$SUPER_USER]
--no-spam-reply do not reply to spam messages [$NO_SPAM_REPLY]
--similarity-threshold= spam threshold (default: 0.5) [$SIMILARITY_THRESHOLD]
@@ -176,4 +177,42 @@ message:
Help Options:
-h, --help Show this help message
```

## Using tg-spam as a library

The bot can be used as a library as well. To do so, import the `github.com/umputun/tg-spam/lib` package and create a new instance of the `Detector` struct. Then, call the `Check` method with the message and userID to check. The method will return `true` if the message is spam and `false` otherwise. In addition, the `Check` method will return the list of applied rules as well as the spam-related details.

For more details see the [TBD]()

Example:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"time"

	tgspam "github.com/umputun/tg-spam/lib"
)

func main() {
	detector := tgspam.NewDetector(tgspam.Config{
		SimilarityThreshold: 0.5,
		MinMsgLen:           50,
		MaxEmoji:            2,
		FirstMessageOnly:    false,
		HTTPClient:          &http.Client{Timeout: 30 * time.Second},
	})

	// prepare samples and exclude tokens
	spamSample := bytes.NewBufferString("this is spam\nwin a prize\n") // need io.Reader, in real life it will be a file
	hamSample := bytes.NewBufferString("this is ham\n")
	excludeTokens := bytes.NewBufferString(`"a", "the"`)

	// load samples
	detector.LoadSamples(excludeTokens, []io.Reader{spamSample}, []io.Reader{hamSample})

	isSpam, details := detector.Check("this is spam", 123456)
	fmt.Println(isSpam, details) // use the results so the example compiles
}
```
5 changes: 5 additions & 0 deletions app/bot/bot.go
@@ -9,6 +9,11 @@ import (

//go:generate moq --out mocks/http_client.go --pkg mocks --skip-ensure . HTTPClient:HTTPClient

// PermanentBanDuration defines duration of permanent ban:
// If user is restricted for more than 366 days or less than 30 seconds from the current time,
// they are considered to be restricted forever.
var PermanentBanDuration = time.Hour * 24 * 400

// Response describes bot's reaction on particular message
type Response struct {
Text string
111 changes: 0 additions & 111 deletions app/bot/file_watcher.go

This file was deleted.

138 changes: 0 additions & 138 deletions app/bot/file_watcher_test.go

This file was deleted.
