feat(matches): improved match object
diego-ninja committed Dec 11, 2024
1 parent 35d8020 commit 64f25c2
Showing 27 changed files with 461 additions and 120 deletions.
Binary file added .github/assets/logo.png
16 changes: 13 additions & 3 deletions README.md
@@ -1,12 +1,16 @@
# 💀 Censor - Profanity and word filtering library for Laravel 10+
<p align="center">
<img src="./.github/assets/logo.png" alt="Laravel Devices Logo"/>
</p>

[![Laravel Package](https://img.shields.io/badge/Laravel%2010+%20Package-red?logo=laravel&logoColor=white)](https://www.laravel.com)
[![Latest Version on Packagist](https://img.shields.io/packagist/v/diego-ninja/laravel-censor.svg?style=flat&color=blue)](https://packagist.org/packages/diego-ninja/laravel-censor)
[![Total Downloads](https://img.shields.io/packagist/dt/diego-ninja/laravel-censor.svg?style=flat&color=blue)](https://packagist.org/packages/diego-ninja/laravel-censor)
![PHP Version](https://img.shields.io/packagist/php-v/diego-ninja/laravel-censor.svg?style=flat&color=blue)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
![GitHub last commit](https://img.shields.io/github/last-commit/diego-ninja/laravel-censor?color=blue)
[![Hits-of-Code](https://hitsofcode.com/github/diego-ninja/laravel-censor?branch=main&label=hits-of-code)](https://hitsofcode.com/github/diego-ninja/laravel-censor/view?branch=main&label=hits-of-code)
[![wakatime](https://wakatime.com/badge/user/bd65f055-c9f3-4f73-92aa-3c9810f70cc3/project/f5c4a047-d754-4ef3-b7b0-89ff0099a601.svg)](https://wakatime.com/badge/user/bd65f055-c9f3-4f73-92aa-3c9810f70cc3/project/f5c4a047-d754-4ef3-b7b0-89ff0099a601)
[![PHPStan Level][ico-phpstan]][link-phpstan]

# Introduction

@@ -19,11 +23,14 @@ This documentation has been generated almost in its entirety using 🦠 [Claude
- Multiple profanity checking services support (Local, [PurgoMalum](https://www.purgomalum.com/), [Azure AI](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/content-safety/), [Perspective AI](https://perspectiveapi.com/), [Tisane AI](https://tisane.ai/))
- Multi-language support
- Whitelist functionality
- Character replacement options
- Different detection strategies (exact with trie, pattern, n-gram, variation, repeated chars, levenshtein)
- Laravel Facade and helper functions
- Laravel controller
- Custom validation rule
- Configurable dictionaries
- Character substitution detection

## Planned Features
- Unicode support

## 📦 Installation

@@ -279,3 +286,6 @@ Special thanks to:
- All the contributors and testers who have helped to improve this project through their contributions.

If you find this project useful, please consider giving it a ⭐ on GitHub!

[ico-phpstan]: https://img.shields.io/badge/phpstan-max-blue?logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAYAAABzenr0AAAGb0lEQVR42u1Xe1BUZRS/y4Kg8oiR3FCCBUySESZBRCiaBnmEsOzeSzsg+KxYYO9dEEftNRqZjx40FRZkTpqmOz5S2LsXlEZBciatkQnHDGYaGdFy1EpGMHl/p/PdFlt2rk5O+J9n5nA/vtf5ned3lnlISpRhafBlLRLHCtJGVrB/ZBDsaw2lUqzReGAC46DstTYfnSCGUjaaDvgxACo6j3vUenNdImeRXqdnWV5az5rrnzeZznj8J+E5Ftsclhf3s4J4CS/oRx5Bvon8ZU65FGYQxAwcf85a7CeRz+C41THejueydCZ7AAK34nwv3kHP/oUKdOL4K7258fF7Cud427O48RQeGkIGJ77N8fZqlrcfRP4d/x90WQfHXLeBt9dTrSlwl3V65ynWLM1SEA2qbNQckbe4Xmww10Hmy3shid0CMcmlEJtSDsl5VZBdfAgMvI3uuR+moJqN6LaxmpsOBeLCDmTifCB92RcQmbAUJvtqALc5sQr8p86gYBCcFdBq9wOin7NQax6ewlB6rqLZHf23FP10y3lj6uJtEBg2HxiVCtzd3SEwMBCio6Nh9uzZ4O/vLwOZ4OUNM2NyIGPFrvuzBG//lRPs+VQ2k1ki+ePkd84bskz7YFpYgizEz88P8vPzYffu3dDS0gJNTU1QXV0NqampRK1WIwgfiE4qhOyig0rC+pCvK8QUoML7uJVHA5kcQUp3DSpqWjc3d/Dy8oKioiLo6uqCoaEhuHb1KvT09AAhBFpbW4lOpyMyyIBQSCmoUQLQzgniNvz+obB2HS2RwBgE6dOxCyJogmNkP2u1Wrhw4QJ03+iGrR9XEd3CTNBn6eCbo40wPDwMdXV1BF1DVG5qiEtboxSUP6J71+D3NwUAhLOIRQzm7lnnhYUv7QFv/yDZ/Lm5ubK2DVI9iZ8bR8JDtEB57lNzENQN6OjoIGlpabIVZsYaMTO+hrikRRA1JxmSX9hE7/sJtVyF38tKsUCVZxBhz9jI3wGT/QJlADzPAyXrnj0kInzGHQCRMyOg/ed2uHjxIuE4TgYQHq2DLJqumashY+lnsMC4GVC5do6XVuK9l+4SkN8y+GfYeVJn2g++U7QygPT0dBgYGIDvT58mnF5PQcjC83PzSF9fH7S1tZGEhAQZQOT8JaA317oIkM6jS8uVLSDzOQqg23Uh+MlkOf00Gg0cP34c+vv74URzM9n41gby/rvvkc7OThlATU3NCGYJUXt4QaLuTYwBcTSOBmj1RD7D4Tsix4ByOjZRF/zgupDEbgZ3j4ly/qekpND0o5aQ44HS4OAgsVqtI1gTZO01IbG0aP1bknnxCDUvArHi+B0lJSlzglTFYO2udF3Ql9TCrHn5oEIreHp6QlRUFJSUlJCqqipSWVlJ8vLyCGYIFS7HS3zGa87mv4lcjLwLlStlLTKYYUUAlvrlDGcW45wKxXX6aqHZNutM+1oQBHFTewAKkoH4+vqCj48PYAGS5yb5amjNoO+CU2SL53NKpDD0vxHHmOJir7L5xUvZgm0us2R142ScOIyVqYvlpWU4XoHIP8DXL2b+wjdWeXh6U2FjmIIKmbWAYPFRMus62h/geIvjOQYlpuDysQrLL6Ger49HgW8jqvXUhI7UvDb9iaSTDqHtyItiF5Suw5ewF/Nd8VJ6zlhsn06bEhwX4NyfCvuGEeRpTmh4mkG68yDpyuzB9EUcjU5awbAgncPlAeSdAQER0zCndzqVbeXC4qDsMpvGEYBXRnsDx4N3Auf1FCTjTIaVtY/QTmd0I8bBVm1kejEubUfO01vqImn3c49X7qpeqI9inIgtbpxK3YrKfIJCt+OeV2nfUVFR4ca4EkVENyA7gkYcMfB1R5MMmxZ7
ez/2KF5SSN1yV+158UPsJT0ZBcI2bRLtIXGoYu5FerOUiJe1OfsL3XEWH43l2KS+iJF9+S4FpcNgsc+j8cT8H4o1bfPg/qkLt50uJ1RzdMsGg0UqwfEN114Pwb1CtWTGg+Y9U5ClK9x7xUWI7BI5VQVp0AVcQ3bZkQhmnEgdHhKyNSZe16crtBIlc7sIb6cRLft2PCgoKGjijBDtjrAQ7a3EdMsxzIRflAFIhPb6mHYmYwX+WBlPQgskhgVryyJCQyNyBLsBQdQ6fgsQhyt6MSOOsWZ7gbH8wETmgRKAijatNL8Ngm0xx4tLcsps0Wzx4al0jXlI40B/A3pa144MDtSgAAAAAElFTkSuQmCC
[link-phpstan]: https://phpstan.org/
6 changes: 4 additions & 2 deletions composer.json
@@ -22,7 +22,8 @@
         "illuminate/support": "^10.0|^11.0",
         "symfony/string": "^7.2",
         "google/cloud-language": "^0.34.1",
-        "aws/aws-sdk-php": "^3.334"
+        "aws/aws-sdk-php": "^3.334",
+        "lorisleiva/laravel-actions": "^2.8"
     },
     "require-dev": {
         "phpunit/phpunit": "^11",
@@ -35,7 +36,8 @@
         "phpstan/phpstan": "^2",
         "phpstan/phpstan-deprecation-rules": "^2",
         "phpstan/phpstan-strict-rules": "^2",
-        "laravel/octane": "^2.6"
+        "laravel/octane": "^2.6",
+        "wulfheart/laravel-actions-ide-helper": "^0.8.0"
     },
     "autoload": {
         "psr-4": {
17 changes: 9 additions & 8 deletions helpers.php
@@ -1,24 +1,25 @@
 <?php

-use Ninja\Censor\Checkers\Contracts\ProfanityChecker;
+use Ninja\Censor\Actions\CheckAction;
+use Ninja\Censor\Actions\CleanAction;
 use Ninja\Censor\Result\Contracts\Result;

 if (! function_exists('is_offensive')) {
     function is_offensive(string $text): bool
     {
-        /** @var ProfanityChecker $service */
-        $service = app(ProfanityChecker::class);
+        /** @var Result $result */
+        $result = CheckAction::run($text);

-        return $service->check($text)->offensive();
+        return $result->offensive();
     }
 }

 if (! function_exists('clean')) {
     function clean(string $text): string
     {
-        /** @var ProfanityChecker $service */
-        $service = app(ProfanityChecker::class);
-
-        return $service->check($text)->replaced();
+        /** @var string $result */
+        $result = CleanAction::run($text);
+
+        return $result;
     }
 }
7 changes: 7 additions & 0 deletions routes/censor.php
@@ -0,0 +1,7 @@
+<?php
+
+use Illuminate\Support\Facades\Route;
+use Ninja\Censor\Actions\CheckAction;
+
+Route::post('censor/check', CheckAction::class)
+    ->name('censor.check');
38 changes: 38 additions & 0 deletions src/Actions/CheckAction.php
@@ -0,0 +1,38 @@
+<?php
+
+namespace Ninja\Censor\Actions;
+
+use Illuminate\Http\Request;
+use Lorisleiva\Actions\Concerns\AsAction;
+use Ninja\Censor\Checkers\Contracts\ProfanityChecker;
+use Ninja\Censor\Http\Resources\CensorResultResource;
+use Ninja\Censor\Result\Contracts\Result;
+
+final readonly class CheckAction
+{
+    use AsAction;
+
+    public function __construct(private ProfanityChecker $checker) {}
+
+    public function handle(string $text): Result
+    {
+        return $this->checker->check($text);
+    }
+
+    public function asController(Request $request): Result
+    {
+        if (! $request->has('text')) {
+            abort(400, 'Missing text parameter');
+        }
+
+        /** @var string $text */
+        $text = $request->input('text');
+
+        return $this->handle($text);
+    }
+
+    public function jsonResponse(Result $result): CensorResultResource
+    {
+        return new CensorResultResource($result);
+    }
+}
18 changes: 18 additions & 0 deletions src/Actions/CleanAction.php
@@ -0,0 +1,18 @@
+<?php
+
+namespace Ninja\Censor\Actions;
+
+use Lorisleiva\Actions\Concerns\AsAction;
+use Ninja\Censor\Checkers\Contracts\ProfanityChecker;
+
+final readonly class CleanAction
+{
+    use AsAction;
+
+    public function __construct(private ProfanityChecker $checker) {}
+
+    public function handle(string $text): string
+    {
+        return $this->checker->check($text)->replaced();
+    }
+}
2 changes: 2 additions & 0 deletions src/CensorServiceProvider.php
@@ -48,6 +48,8 @@ public function boot(): void
             },
             message: 'The :attribute contains offensive language.'
         );
+
+        $this->loadRoutesFrom(__DIR__.'/../routes/censor.php');
     }

     public function register(): void
4 changes: 2 additions & 2 deletions src/Checkers/Censor.php
@@ -75,8 +75,8 @@ private function mergeResults(array $results, string $originalText): Result
             ->withOriginalText($originalText)
             ->withWords(array_unique($processedWords))
             ->withReplaced($matches->clean($originalText))
-            ->withScore($matches->score($originalText))
-            ->withOffensive($matches->offensive($originalText))
+            ->withScore($matches->score())
+            ->withOffensive($matches->offensive())
             ->withConfidence($matches->confidence())
             ->build();
     }
74 changes: 20 additions & 54 deletions src/Collections/MatchCollection.php
@@ -3,7 +3,6 @@
 namespace Ninja\Censor\Collections;

 use Illuminate\Support\Collection;
-use Ninja\Censor\Enums\MatchType;
 use Ninja\Censor\ValueObject\Coincidence;
 use Ninja\Censor\ValueObject\Confidence;
 use Ninja\Censor\ValueObject\Score;
@@ -15,29 +14,40 @@ class MatchCollection extends Collection
 {
     public function addCoincidence(Coincidence $coincidence): void
     {
-        if ($this->contains($coincidence)) {
-            return;
+        if (! $this->contains(fn (Coincidence $existingItem) => $existingItem->word() === $coincidence->word())) {
+            $this->add($coincidence);
         }
-
-        $this->add($coincidence);
     }

-    public function score(string $text): Score
+    public function score(): Score
     {
-        return $this->calculate($text);
+        if ($this->isEmpty()) {
+            return new Score(0.0);
+        }
+
+        /** @var float $score */
+        $score = $this->sum(fn (Coincidence $match) => $match->score()->value());
+
+        return new Score(min(1.0, $score));
     }

     public function confidence(): Confidence
     {
-        return Confidence::calculate($this);
+        if ($this->isEmpty()) {
+            return new Confidence(0.0);
+        }
+
+        return new Confidence(
+            (float) $this->average(fn (Coincidence $match) => $match->confidence()->value())
+        );
     }

-    public function offensive(string $text): bool
+    public function offensive(): bool
     {
         /** @var float $threshold */
         $threshold = config('censor.threshold_score', 0.5);

-        return $this->isNotEmpty() && ($this->score($text)->value() >= $threshold);
+        return $this->isNotEmpty() && ($this->score()->value() >= $threshold);
     }

     /**
@@ -104,48 +114,4 @@ public function clean(string $text): string

         return $result;
     }
-
-    private function calculate(string $text): Score
-    {
-        if ($this->isEmpty()) {
-            return new Score(0.0);
-        }
-
-        $totalWords = count(explode(' ', $text));
-        $offensiveWords = 0;
-        $weightedScore = 0.0;
-        $coveredWords = [];
-
-        foreach ($this as $match) {
-            $words = explode(' ', $match->word);
-            $newWords = array_diff($words, $coveredWords);
-            if (empty($newWords)) {
-                continue;
-            }
-
-            $coveredWords = array_merge($coveredWords, $words);
-
-            $typeWeight = match ($match->type) {
-                MatchType::Exact => 2.0,
-                MatchType::Trie => 1.8,
-                MatchType::Pattern => 1.5,
-                MatchType::NGram => 1.3,
-                default => $match->type->weight()
-            };
-
-            $lengthMultiplier = count($words) > 1 ? 1.5 : 1.2;
-            $weightedScore += $typeWeight * $lengthMultiplier * count($words);
-            $offensiveWords += count($words);
-        }
-
-        if ($offensiveWords === 0) {
-            return new Score(0.0);
-        }
-
-        $baseScore = $weightedScore / max($totalWords, 1);
-        $densityMultiplier = min(2.0, 1 + ($offensiveWords / max($totalWords, 1)));
-
-        return new Score(min(1.0, $baseScore * $densityMultiplier));
-
-    }
 }
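
Note on the `MatchCollection` change: the new `score()`, `confidence()` and `offensive()` no longer take the text; they aggregate values already stored on each match, and `addCoincidence()` now deduplicates by word. The following is a minimal standalone sketch of that aggregation, not the package's code. `MatchSketch`, `addUnique`, `aggregateScore`, `aggregateConfidence` and `isOffensive` are hypothetical names, plain floats stand in for the `Score` and `Confidence` value objects, and the threshold is passed as an argument instead of being read from config.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for the Coincidence value object:
 * a word plus a per-match score and confidence, both assumed to be in [0, 1].
 */
final readonly class MatchSketch
{
    public function __construct(
        public string $word,
        public float $score,
        public float $confidence,
    ) {}
}

/**
 * Append a match only if no existing match has the same word.
 *
 * @param  list<MatchSketch>  $matches
 * @return list<MatchSketch>
 */
function addUnique(array $matches, MatchSketch $candidate): array
{
    foreach ($matches as $existing) {
        if ($existing->word === $candidate->word) {
            return $matches; // word already recorded, keep the first match
        }
    }

    $matches[] = $candidate;

    return $matches;
}

/** Sum of per-match scores, capped at 1.0; 0.0 for an empty list. */
function aggregateScore(array $matches): float
{
    if ($matches === []) {
        return 0.0;
    }

    $sum = array_sum(array_map(fn (MatchSketch $m): float => $m->score, $matches));

    return min(1.0, $sum);
}

/** Arithmetic mean of per-match confidences; 0.0 for an empty list. */
function aggregateConfidence(array $matches): float
{
    if ($matches === []) {
        return 0.0;
    }

    $sum = array_sum(array_map(fn (MatchSketch $m): float => $m->confidence, $matches));

    return $sum / count($matches);
}

/** Offensive when at least one match exists and the capped sum reaches the threshold. */
function isOffensive(array $matches, float $threshold = 0.5): bool
{
    return $matches !== [] && aggregateScore($matches) >= $threshold;
}
```

One consequence worth checking in review: because the scores are summed, several low-scoring matches can cross the threshold together even when none would do so alone.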
7 changes: 7 additions & 0 deletions src/Detection/Strategy/AbstractStrategy.php
@@ -0,0 +1,7 @@
+<?php
+
+namespace Ninja\Censor\Detection\Strategy;
+
+use Ninja\Censor\Detection\Contracts\DetectionStrategy;
+
+abstract class AbstractStrategy implements DetectionStrategy {}
14 changes: 11 additions & 3 deletions src/Detection/Strategy/AffixStrategy.php
@@ -3,11 +3,11 @@
 namespace Ninja\Censor\Detection\Strategy;

 use Ninja\Censor\Collections\MatchCollection;
-use Ninja\Censor\Detection\Contracts\DetectionStrategy;
 use Ninja\Censor\Enums\MatchType;
+use Ninja\Censor\Support\Calculator;
 use Ninja\Censor\ValueObject\Coincidence;

-final class AffixStrategy implements DetectionStrategy
+final class AffixStrategy extends AbstractStrategy
 {
     /** @var array<string, array<string>> */
     private array $cache = [];
@@ -44,7 +44,15 @@ public function detect(string $text, iterable $words): MatchCollection
             /** @var string $textWord */
             $lowerTextWord = mb_strtolower($textWord);
             if (isset($index[$lowerTextWord])) {
-                $matches->addCoincidence(new Coincidence($textWord, MatchType::Variation));
+                $matches->addCoincidence(
+                    new Coincidence(
+                        word: $textWord,
+                        type: MatchType::Variation,
+                        score: Calculator::score($text, $textWord, MatchType::Variation),
+                        confidence: Calculator::confidence($text, $textWord, MatchType::Variation),
+                        context: ['original' => $textWord, 'variation' => $index[$lowerTextWord]]
+                    )
+                );
             }
         }

16 changes: 12 additions & 4 deletions src/Detection/Strategy/IndexStrategy.php
@@ -3,14 +3,14 @@
 namespace Ninja\Censor\Detection\Strategy;

 use Ninja\Censor\Collections\MatchCollection;
-use Ninja\Censor\Detection\Contracts\DetectionStrategy;
 use Ninja\Censor\Enums\MatchType;
 use Ninja\Censor\Index\TrieIndex;
+use Ninja\Censor\Support\Calculator;
 use Ninja\Censor\ValueObject\Coincidence;

-final readonly class IndexStrategy implements DetectionStrategy
+final class IndexStrategy extends AbstractStrategy
 {
-    public function __construct(private TrieIndex $index) {}
+    public function __construct(private readonly TrieIndex $index) {}

     public function detect(string $text, iterable $words): MatchCollection
     {
@@ -23,7 +23,15 @@ public function detect(string $text, iterable $words): MatchCollection
             if ($pos !== false) {
                 $originalWord = mb_substr($text, $pos, mb_strlen($word));
                 if ($originalWord === $word) {
-                    $matches->addCoincidence(new Coincidence($word, MatchType::Trie));
+                    $matches->addCoincidence(
+                        new Coincidence(
+                            word: $word,
+                            type: MatchType::Trie,
+                            score: Calculator::score($text, $word, MatchType::Trie),
+                            confidence: Calculator::confidence($text, $word, MatchType::Trie),
+                            context: ['original' => $text]
+                        )
+                    );
                 }
             }
         }
16 changes: 13 additions & 3 deletions src/Detection/Strategy/LevenshteinStrategy.php
@@ -3,12 +3,12 @@
 namespace Ninja\Censor\Detection\Strategy;

 use Ninja\Censor\Collections\MatchCollection;
-use Ninja\Censor\Detection\Contracts\DetectionStrategy;
 use Ninja\Censor\Detection\OptimizedLevenshtein;
 use Ninja\Censor\Enums\MatchType;
+use Ninja\Censor\Support\Calculator;
 use Ninja\Censor\ValueObject\Coincidence;

-final readonly class LevenshteinStrategy implements DetectionStrategy
+final class LevenshteinStrategy extends AbstractStrategy
 {
     private int $threshold;

@@ -35,7 +35,17 @@ public function detect(string $text, iterable $words): MatchCollection
         foreach ($textWords as $textWord) {
             $similarWords = $levenshtein->findSimilar($textWord, $this->threshold);
             if (! empty($similarWords)) {
-                $matches->addCoincidence(new Coincidence($textWord, MatchType::Levenshtein));
+                $matches->addCoincidence(
+                    new Coincidence(
+                        word: $textWord,
+                        type: MatchType::Levenshtein,
+                        score: Calculator::score($text, $textWord, MatchType::Levenshtein),
+                        confidence: Calculator::confidence($text, $textWord, MatchType::Levenshtein),
+                        context: [
+                            'similar_words' => $similarWords,
+                        ]
+                    )
+                );
             }

         }
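
Note on the Levenshtein hunk: the match now carries the `similar_words` it was derived from. The internals of `OptimizedLevenshtein::findSimilar()` are not part of this diff, so the sketch below only illustrates the general idea of threshold-based edit-distance matching, using PHP's built-in `levenshtein()` as a stand-in. `findNearMatches` and its parameters are hypothetical names, and the brute-force loop is an assumption made for clarity, not a claim about how the package does it.

```php
<?php

declare(strict_types=1);

/**
 * For each whitespace-separated word in $text, collect the dictionary
 * entries whose edit distance to that word is at most $threshold.
 *
 * Note: levenshtein() and strtolower() work on bytes, so this sketch is
 * only reliable for ASCII input.
 *
 * @param  list<string>  $dictionary
 * @return array<string, list<string>>  text word => similar dictionary entries
 */
function findNearMatches(string $text, array $dictionary, int $threshold = 1): array
{
    $found = [];

    // Split on runs of whitespace; fall back to an empty list if preg_split fails.
    $textWords = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    foreach ($textWords as $textWord) {
        $similar = [];

        foreach ($dictionary as $entry) {
            if (levenshtein($textWord, strtolower($entry)) <= $threshold) {
                $similar[] = $entry;
            }
        }

        if ($similar !== []) {
            $found[$textWord] = $similar;
        }
    }

    return $found;
}
```

Each non-empty entry in the returned array corresponds to one place where the strategy would add a match, with the inner list playing the role of the `similar_words` context.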