Adding Differential Binarization model from PaddleOCR to Keras3 #1739

Open
wants to merge 19 commits into
base: master

gowthamkpr
Collaborator

This adds the Differential Binarization model for text detection.

Implemented the architecture based on ResNet50_vd from PaddleOCR and ported the weights.

@mattdangerw mattdangerw changed the base branch from master to keras-hub August 6, 2024 17:36
@mattdangerw
Member

Let's split this up. Start with ResNetVD backbone?

Some notes...

  • Remove the aliases. One ResNetVDBackbone can handle all of these with different presets.
  • Provide conversion scripts as scripts, not Colabs.
  • Follow the local style for backbones as closely as possible. See some of the comments on "Add VGG16 and VGG19 backbone" (#1737).
  • Keep models a flat directory. No backbones/xx etc.
  • Add some tests.
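
The first note (one backbone class, many presets) can be illustrated with a small framework-free sketch. Everything named here is hypothetical: `ToyResNetVDBackbone`, the preset names, and the `block_type` strings are placeholders for illustration, not the KerasHub API. Only the per-stage block counts follow the standard ResNet-18/34/50/101 layouts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResNetVDConfig:
    """Hypothetical config: everything that differs between variants."""

    stackwise_num_blocks: tuple
    block_type: str


# Hypothetical preset registry. The block counts per stage are the usual
# ResNet-18/34/50/101 layouts; the names are made up for this sketch.
PRESETS = {
    "resnet_vd_18": ResNetVDConfig((2, 2, 2, 2), "basic_block_vd"),
    "resnet_vd_34": ResNetVDConfig((3, 4, 6, 3), "basic_block_vd"),
    "resnet_vd_50": ResNetVDConfig((3, 4, 6, 3), "bottleneck_block_vd"),
    "resnet_vd_101": ResNetVDConfig((3, 4, 23, 3), "bottleneck_block_vd"),
}


class ToyResNetVDBackbone:
    """Single class; variants are selected by config, not by subclass alias."""

    def __init__(self, stackwise_num_blocks, block_type):
        self.stackwise_num_blocks = tuple(stackwise_num_blocks)
        self.block_type = block_type

    @classmethod
    def from_preset(cls, name):
        if name not in PRESETS:
            raise ValueError(f"Unknown preset: {name!r}")
        config = PRESETS[name]
        return cls(config.stackwise_num_blocks, config.block_type)


backbone = ToyResNetVDBackbone.from_preset("resnet_vd_50")
```

With this shape, adding a new depth means adding one registry entry rather than a new alias class.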

@divyashreepathihalli
Collaborator

@gowthamkpr is the PR ready for review?

Collaborator

@divyashreepathihalli divyashreepathihalli left a comment

Thanks for the PR! I have left a reorganization comment.

example for structuring the code - https://github.com/keras-team/keras-hub/tree/master/keras_hub/src/models/sam

@@ -0,0 +1,243 @@
# Copyright 2024 The KerasNLP Authors
Collaborator

Rename the folder to `differential_binarization` and the file to `differential_binarization.py`.

backbone = backbone

inputs = backbone.input
x = backbone.pyramid_outputs
Collaborator

Please create a file `differential_binarization_backbone.py` and move the `diffbin_fpn_model` and backbone code into it. In that file, rename the backbone you are using here to `image_encoder`. The task model should contain the preprocessor, the backbone, and the task head.

from keras import ops


class DiceLoss:
Collaborator

add test coverage for the losses here
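
As a rough idea of what such coverage could look like, here is a framework-free sketch. It assumes the loss follows the common masked Dice formulation, `1 - 2 * intersection / union`; the function below is a stand-in written for this example, not the `DiceLoss` class from the PR, which operates on tensors via `keras.ops`.

```python
def dice_loss(y_true, y_pred, mask=None, eps=1e-6):
    """Masked Dice loss over flat lists of floats (illustrative stand-in)."""
    if mask is None:
        mask = [1.0] * len(y_true)
    intersection = sum(t * p * m for t, p, m in zip(y_true, y_pred, mask))
    union = (
        sum(t * m for t, m in zip(y_true, mask))
        + sum(p * m for p, m in zip(y_pred, mask))
        + eps
    )
    return 1.0 - 2.0 * intersection / union


def test_perfect_prediction_is_near_zero():
    assert dice_loss([1.0, 0.0, 1.0], [1.0, 0.0, 1.0]) < 1e-5


def test_disjoint_prediction_is_one():
    assert dice_loss([1.0, 0.0], [0.0, 1.0]) == 1.0


def test_masked_pixels_are_ignored():
    # The second pixel is wrong, but the mask removes it from the loss.
    assert dice_loss([1.0, 0.0], [1.0, 1.0], mask=[1.0, 0.0]) < 1e-5


test_perfect_prediction_is_near_zero()
test_disjoint_prediction_is_one()
test_masked_pixels_are_ignored()
```

The three cases (perfect overlap, no overlap, masked-out error) are the boundary behaviors most worth pinning down for the real class as well.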

@gowthamkpr gowthamkpr changed the base branch from keras-hub to master October 22, 2024 20:24
@divyashreepathihalli
Collaborator

Hi @gowthamkpr! can you please refactor the code to KerasHub style?

  • Add a preprocessor flow
  • subclass the image segmenter model for the task class
  • add preset class
  • add standard test routines

@gowthamkpr
Collaborator Author

> Hi @gowthamkpr! can you please refactor the code to KerasHub style?

I've refactored using SAM as an example.

> • Add a preprocessor flow

I've added DifferentialBinarizationPreprocessor and DifferentialBinarizationImageConverter.

> • subclass image segmenter model for the task class

I've subclassed ImageSegmenter, but I kept the custom compile() method, since we need a different loss than the one used in ImageSegmenter's compile().

> • add preset class

Done. The model is not yet on Kaggle, so I've disabled the presets test for now.

> • add standard test routines

Done. I'm not sure whether there are additional standard test routines, beyond the ones used for SAM, that should be run.

Collaborator

@divyashreepathihalli divyashreepathihalli left a comment

Thanks Gowtham! left a few comments!

56,
256,
),
run_mixed_precision_check=False,
Collaborator

does the mixed precision check pass?

Collaborator Author

No. I tried adding an explicit dtype argument, but the problem remains: the mixed precision check inspects every sublayer of the model, and the ResNet backbone, which is instantiated separately, therefore still has the wrong dtype.
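
The failure mode described here can be reproduced with a toy model that has nothing to do with Keras internals. `ToyLayer` and `layers_with_wrong_dtype` are hypothetical names invented for this sketch; it only shows why a check that walks every sublayer flags an encoder that was built on its own before being handed to the outer model.

```python
class ToyLayer:
    """Minimal stand-in for a layer that carries its own dtype setting."""

    def __init__(self, name, dtype="float32", sublayers=()):
        self.name = name
        self.dtype = dtype
        self.sublayers = list(sublayers)

    def flatten(self):
        """Yield this layer and, recursively, all of its sublayers."""
        yield self
        for sub in self.sublayers:
            yield from sub.flatten()


def layers_with_wrong_dtype(model, expected_dtype):
    """Return the names of all (sub)layers whose dtype differs."""
    return [
        layer.name
        for layer in model.flatten()
        if layer.dtype != expected_dtype
    ]


# The image encoder is created first, with the default dtype.
image_encoder = ToyLayer("resnet_image_encoder")

# Passing a dtype to the outer model does not change the encoder,
# because the encoder already exists by the time it is passed in.
model = ToyLayer(
    "diffbin_backbone", dtype="mixed_float16", sublayers=[image_encoder]
)

mismatched = layers_with_wrong_dtype(model, "mixed_float16")
```

In this toy setup the mismatch goes away only if the encoder itself is constructed with the target dtype, which is an assumption about how the real check could be satisfied, not a confirmed fix.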

instance.
head_kernel_list: list of ints. The number of filters for probability
and threshold maps. Defaults to [3, 2, 2].
step_function_k: float. `k` parameter used within the differential
Collaborator

I don't think `step_function_k` is an arg we want to expose.

Collaborator Author

Removed
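
For context on what `k` controls: in the binarization step of the original DB paper, the probability map `P` and the threshold map `T` are combined as `B = 1 / (1 + exp(-k * (P - T)))`, with `k = 50` reported as the value used there. The scalar sketch below assumes that formulation; the function name is hypothetical and the PR's code applies the same idea to whole tensors.

```python
import math


def differentiable_binarization(p, t, k=50.0):
    """Approximate step function: near 1 where p > t, near 0 where p < t.

    p: probability-map value in [0, 1]
    t: threshold-map value in [0, 1]
    k: steepness; larger k makes the sigmoid closer to a hard threshold.
    """
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


on_boundary = differentiable_binarization(0.5, 0.5)  # p == t
inside_text = differentiable_binarization(0.9, 0.3)  # p well above t
background = differentiable_binarization(0.1, 0.3)   # p below t
```

Because `k` only sets how sharply the sigmoid approximates a hard threshold, fixing it internally rather than exposing it as a constructor argument loses little flexibility.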

Args:
backbone: A `keras_hub.models.DifferentialBinarizationBackbone`
instance.
head_kernel_list: list of ints. The number of filters for probability
Collaborator

Let's move the head code to the backbone.
Rename this class to DifferentialBinarizationOCR and have it take in just the preprocessor and the backbone.

Collaborator Author

Done.

Collaborator

@divyashreepathihalli divyashreepathihalli left a comment

Thanks for the PR Gowtham! Left a few comments. Can you please also add a demo colab in the PR description to verify the model is working before merging?

pyramid network.

Args:
image_encoder: A `keras_hub.models.ResNetBackbone` instance.
Collaborator

add all args in docstring



def diffbin_fpn_model(inputs, out_channels, dtype=None):
in2 = layers.Conv2D(
Collaborator

What is `in2`? Can we rename this to something more readable?

)

outputs = {
"probability_maps": probability_maps,
Collaborator

It looks like `probability_maps` and `threshold_maps` are identical. What is the difference?


@keras_hub_export("keras_hub.layers.DifferentialBinarizationImageConverter")
class DifferentialBinarizationImageConverter(ImageConverter):
backbone_cls = DifferentialBinarizationBackbone
Collaborator

there should be some resizing/rescaling ops here right?
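
To make the rescaling part of this question concrete, here is a small arithmetic sketch. It assumes the converter applies a per-channel `x * scale + offset` to raw 0-255 pixels and that the ported weights expect ImageNet mean/std normalization; both are assumptions for illustration, not something the PR confirms, and the helper names are hypothetical.

```python
# Assumed normalization constants (the usual ImageNet RGB statistics).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def scale_and_offset(mean, std):
    """Fold (x / 255 - mean) / std into a single x * scale + offset."""
    scale = tuple(1.0 / (255.0 * s) for s in std)
    offset = tuple(-m / s for m, s in zip(mean, std))
    return scale, offset


def convert_pixel(rgb, scale, offset):
    """Apply the per-channel affine transform to one RGB pixel."""
    return tuple(v * s + o for v, s, o in zip(rgb, scale, offset))


scale, offset = scale_and_offset(IMAGENET_MEAN, IMAGENET_STD)

# A pixel sitting exactly at the dataset mean should map to roughly zero.
mean_pixel = tuple(255.0 * m for m in IMAGENET_MEAN)
normalized = convert_pixel(mean_pixel, scale, offset)
```

Expressing the normalization as one scale and one offset per channel keeps it to a single multiply-add, which is the form a converter configured with scale and offset values could consume directly.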



@keras_hub_export("keras_hub.models.DifferentialBinarizationOCR")
class DifferentialBinarizationOCR(ImageSegmenter):
Collaborator

We need to add a new base class for OCR; I don't think ImageSegmenter is a good one. Do you have a specific reason for choosing to subclass ImageSegmenter?
