
Sorting/layout issues when Y coordinates don't exactly match #526

Open
lluchez opened this issue Nov 8, 2023 · 1 comment


lluchez commented Nov 8, 2023

Hi,

We've been using an old version of this gem (1.4.1) for a little while now and we are looking to upgrade to the latest version. That upgrade broke some of our specs and when looking deeper, it seems like the logic around PageLayout changed.

It might just be bad luck, but the use of `round` here (for the X and Y coords) creates issues when the PDF draws text runs at slightly different Y coordinates: two runs that visually share a line can round to different rows.

Below is an example:
[image]
In this case, the texts in those boxes/rectangles are slightly lower than the labels from that form, causing some of those texts to be generated on another line:

Claim Number:           PHNP1610102                                     Contact:
Insured:                                                                Phone:
                         Fairfield Boys Club
Address 1:                                                              Email:
                         c/o Bejo Nanni, Treasurer
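To illustrate the rounding problem, here is a minimal sketch. The row height of 12 points and the Y values are assumed for the example; pdf-reader computes its own `row_multiplier` and `@y_offset` from the page content.

```ruby
# Hypothetical row height in points (an assumption for this example only).
ROW_MULTIPLIER = 12.0

# Mirrors the shape of the row calculation in PageLayout#to_s:
# divide the Y coordinate by the row height, then round.
def row_for(y, y_offset = 0.0)
  ((y - y_offset) / ROW_MULTIPLIER).round
end

label_y = 703.0 # baseline of a form label
value_y = 701.0 # baseline of the boxed text, 2 points lower

puts row_for(label_y) # 703 / 12 = 58.58, rounds up
puts row_for(value_y) # 701 / 12 = 58.42, rounds down
```

A 2 point difference is enough to land the two runs on different rows whenever the values sit on either side of a rounding boundary.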

Another example:
[image]

We could monkey patch or fork the repo to make those changes, but please see below the code that we're going to be using. I can create a PR if this repo is still well maintained. Please let me know.

PageLayout

class PDF::Reader
  class PageLayout

    def to_s
      return "" if @runs.empty?
      return "" if row_count == 0
      first_run_at_new_y = nil # remembering a previous run at a new Y coordinate

      page = row_count.times.map { |i| " " * col_count }
      @runs.each do |run|
        x_pos = ((run.x - @x_offset) / col_multiplier).round
        y_ref_run = run # line added
        if first_run_at_new_y && run.similar_y_coord?(first_run_at_new_y) # line added
          y_ref_run = first_run_at_new_y # line added
        else # line added
          first_run_at_new_y = run # line added
        end # line added
        y_pos = row_count - ((y_ref_run.y - @y_offset) / row_multiplier).round # line updated
        if y_pos <= row_count && y_pos >= 0 && x_pos <= col_count && x_pos >= 0
          local_string_insert(page[y_pos-1], run.text, x_pos)
        end
      end
      interesting_rows(page).map(&:rstrip).join("\n")
    end

  end
end
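The anchoring idea in the patch above can be sketched on its own, outside of pdf-reader. `Run` and `group_rows` are hypothetical names used only for this illustration, and the sketch assumes the runs arrive already sorted from top to bottom.

```ruby
# Hypothetical stand-in for PDF::Reader::TextRun, for illustration only.
Run = Struct.new(:text, :x, :y, :font_size) do
  def similar_y_coord?(other, threshold = nil)
    threshold ||= [font_size, other.font_size].min / 3.0
    (y - other.y).abs < threshold
  end
end

# Groups runs into rows. The first run seen at a new Y becomes the anchor,
# and later runs with a similar Y are placed on the anchor's row.
def group_rows(runs)
  anchor = nil
  runs.each_with_object([]) do |run, rows|
    if anchor && run.similar_y_coord?(anchor)
      rows.last << run
    else
      anchor = run
      rows << [run]
    end
  end
end

runs = [
  Run.new("Insured:", 10, 703.0, 10.0),
  Run.new("Fairfield Boys Club", 150, 701.0, 10.0),
  Run.new("Address 1:", 10, 689.0, 10.0)
]
group_rows(runs).each { |row| puts row.map(&:text).join(" | ") }
```

Comparing every run against the anchor, rather than against the previous run, keeps a long chain of slightly descending runs from drifting onto one row.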

TextRun

class PDF::Reader
  class TextRun

    # def <=>(other)
    #   if similar_y_coord?(other)
    #     x <=> other.x
    #   else
    #     other.y <=> y
    #   end
    # end

    def similar_y_coord?(other, threshold = nil)
      # Arbitrary threshold. It could probably safely be bumped to a higher value (dividing by 2, for instance).
      threshold = threshold || [self.font_size, other.font_size].min / 3
      (self.y - other.y).abs < threshold
    end

  end
end
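For a feel of what the threshold does, here is a standalone sketch of the same comparison. `similar_y?` is a hypothetical helper, not part of pdf-reader, and the coordinates and font sizes are made up. It divides by `3.0` on the assumption that font sizes could be integers, in which case `/ 3` would truncate.

```ruby
# Hypothetical helper mirroring the comparison in similar_y_coord?.
# The default threshold is a third of the smaller font size.
def similar_y?(y_a, size_a, y_b, size_b, threshold = nil)
  threshold ||= [size_a, size_b].min / 3.0
  (y_a - y_b).abs < threshold
end

# Smaller font is 10pt, so the default threshold is about 3.33 points.
puts similar_y?(703.0, 10.0, 701.0, 12.0)      # 2.0 apart, within the threshold
puts similar_y?(703.0, 10.0, 699.0, 12.0)      # 4.0 apart, outside the threshold
puts similar_y?(703.0, 10.0, 699.0, 12.0, 5.0) # explicit, looser threshold
```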

Thank you.

EDIT: I updated the code above to properly handle multiple text runs that would otherwise have been drawn on the following line.

yob (Owner) commented Nov 18, 2023

Thanks for the well-written report.

If you have the code in a fork for your own use, I'd love a PR that I can play with, check our spec suite, etc ❤️
