Restructure package (cont.) (#36)
* fix: fix wrong PIL import

* feat: add cast for better typing

* feat: clean `CustomCollator` (mostly style edits)

* style: clean colpali_processing_utils and add better typing

* feat: factorize the ColPali processing utils in CustomCollator

* feat: factorize the ColIdefics processing utils in CustomCollator

* feat: restructure the `models` module

* feat: big refactor of the collator classes

* style: tweak bi-encoder losses

* feat: add ColPaliConfig

* doc: tweaks

* build: remove all `import *`

* feat: deprecate `TextRetrieverCollator`

* feat: remove redundant `tokenizer` attribute from `BaseVisualRetrieverProcessor`

* fix: address Manu's comments

* fix: fix typos in `ColIdefics2Processor`

* fix: fix HardNegCollator + style tweaks

* doc: tweak

* feat: deprecate HardNegDocmatixCollator

* feat: revert removing abstract attribute `tokenizer` from BaseVisualRetrieverProcessor

* doc: fix typos

* feat: update `__init__.py` files

* feat: fix typing for `ColPaliProcessor.from_pretrained`

* feat: add better typing and remove prints from CustomEvaluator

* feat: rename CustomEvaluator to CustomRetrievalEvaluator

* feat: tweak `get_torch_device`

* feat: turn `main_input_name` into ClassVar in ColPali

* feat: better `from_pretrained` methods

* feat: use PaliGemma tokenizer in `process_queries`

* feat: modify the processor classes

* feat: deprecate ColPaliConfig

* feat: rename ColPaliProcessor init arg

* feat: better `CustomRetrievalEvaluator`

* feat: move `CustomRetrievalEvaluator` in `evaluation` module

* feat: add input length guardrail in `CustomRetrievalEvaluator`

* feat: add tests for ColPali

* feat: add `hf_token` arg to `ColPaliProcessor`

* Revert "feat: use PaliGemma tokenizer in `process_queries`"

This reverts commit 7ec95cb.

* feat: reduce mock images' size

* build: remove `.vscode/`

* feat: revert `embedding_dim` attribute to `dim` in ColPali

* feat: put all model directories in 1st level of `models` module

* build: update module path for models in config files

* feat: sort models module by vlm backbone

* fix: fix imports in tests

* feat: rename all Idefics* classes to Idefics2*

* feat: add missing processors for Bi* models

* untested: processor is inherited directly

* feat: inherit processor directly in ColIdefics2Processor

* doc: update docstrings in processor classes

* build: loosen dev deps

* fix: add missing casts in processor tests

* feat: restructure test file structure

* fix: fix wrong init in Bi* processors

* rename

* fix: add texts query to list

* fix: ruff

* feat: remove unused __future__ imports

* build: move pytest config to pyproject

* feat: add logging in `get_torch_device`

* feat: set default device to cpu in `test_retrieval_evaluator.py`

* build: add "Ruff" and "Test" CI pipelines

* build: add missing `pillow` dep

* build: update ruff config in pyproject

* build: move `mteb` to compulsory deps + format pyproject

* build: tweak project details in pyproject

* build: remove black and use ruff formatter instead

* build: add missing HF_TOKEN secret in test CI

* feat: remove all `|` union type hints for Python 3.9 compatibility

* feat: tweak ColPaliProcessor test

* feat: add test for ColPali collator

* build: remove `.python-version`

* fix: fix typo in `compute_hardnegs.py`

* build: unfreeze the numpy dep and make it compulsory

* feat: deprecate `mteb` metrics and remove `mteb` dep

* feat: tweak `CustomRetrievalEvaluator.evaluate`

* feat: rename `CustomRetrievalEvaluator` to `RetrievalScorer` + tweaks

* feat: add `CustomRetrievalEvaluator` as a `mteb` wrapper + update `ColModelTraining`

* chore: update CHANGELOG

* Add scorer in processor (#46)

* add: scorer in processor

* fix: lint

* fix: tests

* fix: bugs

* fix: tests pass

* fix: lint

* fix: address Tony's comments

* style: lint

* fix: fix wrong typing in processor classes

* fix: fix wrong `score` method override in processors

---------

Co-authored-by: ManuelFay <[email protected]>
Co-authored-by: Manuel Faysse <[email protected]>
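The commit message mentions tweaking `get_torch_device`, adding logging to it, and defaulting the retrieval-evaluator test to CPU; the 0.2.2 changelog entry also removes forced "cuda" usage. A minimal sketch of such a device-resolution helper, assuming an `"auto"` mode that prefers CUDA, then Apple MPS, then CPU. The signature and fallback order are assumptions rather than the exact `colpali_engine` code, and the sketch returns `"cpu"` when PyTorch is not installed so it runs anywhere.

```python
import importlib.util
import logging

logger = logging.getLogger(__name__)


def get_torch_device(device: str = "auto") -> str:
    """Resolve "auto" to an available accelerator; return explicit devices unchanged.

    Assumed fallback order: CUDA, then Apple MPS, then CPU. Returns "cpu" if
    PyTorch is not installed, so this sketch runs without it.
    """
    if device != "auto":
        return device
    if importlib.util.find_spec("torch") is None:
        resolved = "cpu"
    else:
        import torch

        mps = getattr(torch.backends, "mps", None)  # absent in older PyTorch releases
        if torch.cuda.is_available():
            resolved = "cuda:0"
        elif mps is not None and mps.is_available():
            resolved = "mps"
        else:
            resolved = "cpu"
    logger.info("Using device: %s", resolved)
    return resolved
```

Explicit device strings pass through unchanged, so a test can pin `device="cpu"` regardless of the hardware it runs on.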
3 people authored Sep 10, 2024
1 parent 0eb0878 commit 2c75550
Showing 60 changed files with 981 additions and 586 deletions.
13 changes: 13 additions & 0 deletions .github/workflows/ruff.yml
@@ -0,0 +1,13 @@
```yaml
name: Ruff
on:
  push:
    branches:
      - main
  pull_request:
jobs:
  ruff:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: "Linting & Flaking"
        uses: chartboost/ruff-action@v1
```
33 changes: 33 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,33 @@
```yaml
name: Test

on:
  push:
    branches:
      - main
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11", "3.12"]

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"
      - name: Run tests with pytest (except "slow" tests)
        run: |
          pytest -m "not slow"
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
```
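The Test workflow runs the suite on Python 3.9 through 3.12, which is why the commit removes `|` union hints: PEP 604 unions such as `str | None` are evaluated when a function is defined and raise `TypeError` on Python 3.9 unless annotation evaluation is postponed with `from __future__ import annotations`. A minimal sketch of the 3.9-compatible spelling, using a hypothetical helper `normalize_queries` that is not taken from the codebase:

```python
from typing import List, Optional, Union


# Python 3.9-compatible annotations: `Union[...]` / `Optional[...]` instead of
# the PEP 604 forms `str | List[str]` and `str | None`, which fail at
# definition time on 3.9 without `from __future__ import annotations`.
def normalize_queries(queries: Union[str, List[str]], prefix: Optional[str] = None) -> List[str]:
    """Hypothetical helper: accept one query or a list of queries, return a list."""
    if isinstance(queries, str):
        queries = [queries]
    if prefix is None:
        return list(queries)
    return [prefix + q for q in queries]
```

The same function signature written with `|` would only import cleanly on Python 3.10+.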
1 change: 1 addition & 0 deletions .gitignore
```diff
@@ -1,6 +1,7 @@
 # Custom
 !*/configs/data/
 .DS_Store
+/.vscode/
 /data/
 /logs/
 /models/
```
1 change: 0 additions & 1 deletion .python-version

This file was deleted.

50 changes: 37 additions & 13 deletions CHANGELOG.md
```diff
@@ -5,36 +5,60 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](http://keepachangelog.com/)
 and this project adheres to [Semantic Versioning](http://semver.org/).
 
-## Unreleased
+## [0.3.0] - 2024-09-10
 
+✨ This release is an exhaustive package refactor, making ColPali more modular and easier to use.
+
+🚨 It is **NOT** backward-compatible with previous versions.
+
 ### Added
 
-- feat: Deprecate `interpretability` and `eval_manager` modules
-- feat: Deprecate unused util modules
-- feat: Revamp module organization
-- feat: Restructure the `utils` module
-- feat: Move `ColModelTraining` module
-- feat: Lint code + tweaks
-- feat: deprecated a lot of unused modules and legacy code
+- Restructure the `utils` module
+- Restructure the model training code
+- Add custom `Processor` classes to easily process images and/or queries
+- Enable module-level imports
+- Add scoring to processor
+- Add `CustomRetrievalEvaluator`
+- Add missing typing
+- Add tests for model, processor, scorer, and collator
+- Lint `Changelog`
+- Add missing docstrings
+- Add "Ruff" and "Test" CI pipelines
 
 ### Changed
 
-- doc: Lint Changelog
-- doc: Tweak README
-- feat: The processing function in `colpali_engine.utils.processing_utils.colpali_processing_utils` `process_queries` has a changed API and does not require a Mock Image anymore.
+- Restructure all modules to closely follow the [`transformers`](https://github.com/huggingface/transformers) architecture
+- Hugely simplify the collator implementation to make it model-agnostic
+- `ColPaliProcessor`'s `process_queries` doesn't need a mock image input anymore
+- Clean `pyproject.toml`
+- Loosen the required dependencies
+- Replace `black` with the `ruff` linter
 
+### Removed
+
+- Deprecate `interpretability` and `eval_manager` modules
+- Deprecate unused utils
+- Deprecate `TextRetrieverCollator`
+- Deprecate `HardNegDocmatixCollator`
+
+### Fixed
+
+- Fix wrong PIL import
+- Fix dependency issues
+
 ## [0.2.2] - 2024-09-06
 
 ### Fixed
 
 - Remove forced "cuda" usage in Retrieval Evaluator
 
 ## [0.2.1] - 2024-09-02
 
 Patch a misalignment between the query preprocessing helper function and the training scheme.
 
 ### Fixed
-- Add 10 extra pad tokens by default to the query to act as reasoning buffers. This was added in the collator but not the external helper function for inference purposes.
+
+- Add 10 extra pad tokens by default to the query to act as reasoning buffers. This was added in the collator but not the external helper function for inference purposes.
 
 ## [0.2.0] - 2024-08-29
 
```
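The 0.2.1 entry describes a train/inference mismatch: the collator appended 10 pad tokens to each query as buffers, but the standalone inference helper did not. A minimal sketch of the fix pattern, assuming the buffer is appended as literal pad-token strings before tokenization; `augment_query`, `process_queries_for_inference`, and the `"<pad>"` string are hypothetical. The point of the pattern is that training and inference call one shared function, so they cannot drift apart.

```python
from typing import List

# Hypothetical values for illustration only: the actual pad-token string and
# any query prefix depend on the model's tokenizer and the library version.
PAD_TOKEN = "<pad>"
NUM_BUFFER_TOKENS = 10


def augment_query(query: str, pad_token: str = PAD_TOKEN, num_tokens: int = NUM_BUFFER_TOKENS) -> str:
    """Append `num_tokens` pad tokens to a query to serve as extra buffer tokens."""
    return query + pad_token * num_tokens


def process_queries_for_inference(queries: List[str]) -> List[str]:
    """Hypothetical inference helper: reuses `augment_query`, the same function
    a training collator would call, so both paths produce identical inputs."""
    return [augment_query(q) for q in queries]
```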
9 changes: 9 additions & 0 deletions colpali_engine/__init__.py
@@ -0,0 +1,9 @@
```python
from .models import (
    BiIdefics2,
    BiPali,
    BiPaliProj,
    ColIdefics2,
    ColIdefics2Processor,
    ColPali,
    ColPaliProcessor,
)
```
215 changes: 0 additions & 215 deletions colpali_engine/collators/custom_collator.py

This file was deleted.
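Per the changelog, the deleted `CustomCollator` gives way to a much simpler, model-agnostic collator implementation. A minimal sketch of that idea, assuming the collator depends only on a processor exposing `process_queries` (named in the changelog) and `process_images` (assumed by analogy), and that query outputs are stored under `query_`-prefixed keys. `ModelAgnosticCollator`, `ToyProcessor`, and the `"query"`/`"image"` example keys are hypothetical, not the library's classes.

```python
from typing import Any, Dict, List, Protocol


class RetrievalProcessor(Protocol):
    """Hypothetical processor interface the collator relies on."""

    def process_queries(self, queries: List[str]) -> Dict[str, Any]: ...

    def process_images(self, images: List[Any]) -> Dict[str, Any]: ...


class ModelAgnosticCollator:
    """Hypothetical collator: all model-specific preprocessing lives in the processor."""

    def __init__(self, processor: RetrievalProcessor) -> None:
        self.processor = processor

    def __call__(self, examples: List[Dict[str, Any]]) -> Dict[str, Any]:
        queries = [ex["query"] for ex in examples]
        images = [ex["image"] for ex in examples]
        batch = dict(self.processor.process_images(images))
        # Prefix the query outputs so document and query inputs share one dict.
        for key, value in self.processor.process_queries(queries).items():
            batch[f"query_{key}"] = value
        return batch


class ToyProcessor:
    """Stand-in processor for demonstration; a real one would tokenize and build tensors."""

    def process_queries(self, queries: List[str]) -> Dict[str, Any]:
        return {"input_ids": [q.upper() for q in queries]}

    def process_images(self, images: List[Any]) -> Dict[str, Any]:
        return {"pixel_values": list(images)}
```

Swapping in a different model's processor requires no collator changes, which is the sense in which such a collator is model-agnostic.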
