Can we replace/blur some text in a PDF without breaking any layout or formatting? #31
Comments
Replacing text depends on whether the replacement text fits into the place of the replaced text. If so, then yes, this is possible. As for blurring text: this may be possible by using PDF transparency and overlaying suitable graphics. However, this would only blur the text visually; text selection or text extraction would still find the text underneath.
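To make the "fits into the place" condition concrete, here is a minimal sketch of a width check. It does not use HexaPDF; the `GLYPH_WIDTHS` table, the default width and the helper names `text_width` and `fits?` are all hypothetical, and a real check would read the advance widths from the font actually used on the page.

```ruby
# Hypothetical advance widths in 1/1000 of the font size.
# The values are illustrative only; real ones come from the font in use.
GLYPH_WIDTHS = { 'i' => 278, 'l' => 222, 'm' => 833, 'W' => 944, ' ' => 278 }.freeze
DEFAULT_WIDTH = 500 # assumed fallback for characters missing from the table

# Width of the text in page units at the given font size.
def text_width(text, font_size)
  text.each_char.sum { |ch| GLYPH_WIDTHS.fetch(ch, DEFAULT_WIDTH) } * font_size / 1000.0
end

# The replacement "fits" if it is no wider than the text it replaces.
def fits?(original, replacement, font_size)
  text_width(replacement, font_size) <= text_width(original, font_size)
end

puts fits?('mmm', 'iii', 12) # narrow replacement for wide text
puts fits?('iii', 'mmm', 12) # wide replacement for narrow text
```

This only compares widths on a single line. It ignores kerning, character spacing and any transformation applied to the text, so treat it as a first approximation.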
@gettalong I tried this for blurring, but I didn't understand how to replace text using HexaPDF.
require 'hexapdf'
class ShowTextProcessor < HexaPDF::Content::Processor
  def initialize(page, to_hide_arr)
    super()
    # Draw on an overlay so the rectangles end up on top of the page content.
    @canvas = page.canvas(type: :overlay)
    @to_hide_arr = to_hide_arr
  end

  def show_text(str)
    boxes = decode_text_with_positioning(str)
    return if boxes.string.empty?
    # Only matches when the whole decoded string equals one of the search strings.
    return unless @to_hide_arr.include?(boxes.string)

    # The rectangles are filled, so set the fill color (not the stroke color).
    @canvas.fill_color(0, 0, 0)
    boxes.each do |box|
      x, y = *box.lower_left
      tx, ty = *box.upper_right
      @canvas.rectangle(x, y, tx - x, ty - y).fill
    end
  end
  alias :show_text_with_positioning :show_text
end
file_name = ARGV[0]
strings_to_black = ARGV[1].split("|")
doc = HexaPDF::Document.open(file_name)
puts "Blacken strings [#{strings_to_black}], inside [#{file_name}]."
doc.pages.each do |page|
  processor = ShowTextProcessor.new(page, strings_to_black)
  page.process_contents(processor)
end
# Strip only the trailing ".pdf" so paths containing other dots stay intact.
new_file_name = "#{file_name.sub(/\.pdf\z/i, '')}_updated.pdf"
doc.write(new_file_name, optimize: true)
puts "Writing updated file [#{new_file_name}]."
The check for the boxes in [...] If you want to black out a whole [...] Generally, text may be rotated, skewed, etc., so if you want to make sure to get the correct area, you need to iterate over the GlyphBoxes and use the [...] And note that the decoded string [...]
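The point about rotated or skewed text can be illustrated without HexaPDF. The sketch below uses a hypothetical `Box` struct that only mimics the four corner accessors of a glyph box. It compares the area of the true outline with the area of the rectangle spanned by `lower_left` and `upper_right`, which is what the `rectangle(x, y, tx - x, ty - y)` call in the code above covers.

```ruby
# Hypothetical stand-in for a glyph box: four corners in page coordinates.
Box = Struct.new(:lower_left, :lower_right, :upper_right, :upper_left) do
  def corners
    [lower_left, lower_right, upper_right, upper_left]
  end
end

# Build a width x height box rotated by `degrees` around its lower-left corner.
def rotated_box(x, y, width, height, degrees)
  rad = degrees * Math::PI / 180
  cos = Math.cos(rad)
  sin = Math.sin(rad)
  corner = ->(dx, dy) { [x + dx * cos - dy * sin, y + dx * sin + dy * cos] }
  Box.new(corner.(0, 0), corner.(width, 0), corner.(width, height), corner.(0, height))
end

# Area of the polygon through all four corners (shoelace formula).
def polygon_area(points)
  twice_area = points.each_with_index.sum do |(x1, y1), i|
    x2, y2 = points[(i + 1) % points.size]
    x1 * y2 - x2 * y1
  end
  twice_area.abs / 2.0
end

# Area of the axis-aligned rectangle spanned by lower_left and upper_right only.
def diagonal_rect_area(box)
  x, y = box.lower_left
  tx, ty = box.upper_right
  ((tx - x) * (ty - y)).abs
end

box = rotated_box(100, 100, 10, 2, 45)
puts polygon_area(box.corners).round(2)
puts diagonal_rect_area(box).round(2)
```

For a 10 x 2 box rotated by 45 degrees, the outline covers about 20 square units while the diagonal rectangle covers about 48, and it is not even the same region. Presumably the fix on a real canvas is to fill a closed path through all four corners instead of calling `rectangle`.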
@gettalong Can we do something to black out strings in the decoded string? If we could treat the whole page content as a single boxes element string, we could iterate through the boxes and black out specific ones.
@gettalong I found a temporary workaround with the following code:
require 'hexapdf'
class ShowTextProcessor < HexaPDF::Content::Processor
  def initialize(page, to_hide_arr)
    super()
    @canvas = page.canvas(type: :overlay)
    @to_hide_arr = to_hide_arr
    @boxeslist = []
  end

  # Collect every glyph box of the page in content-stream order.
  def show_text(str)
    boxes = decode_text_with_positioning(str)
    boxes.each do |box|
      @boxeslist << box
    end
  end

  # Cover each run of consecutive glyph boxes whose joined text equals a search string.
  # Assumes every glyph box decodes to exactly one character.
  def blackout_text
    @to_hide_arr.each do |hide_item|
      @boxeslist.each_index do |index|
        if hide_item == sum_string(index, hide_item.length)
          blackout_array(index, hide_item.length)
        end
      end
    end
  end

  # Fill the rectangles of `length` glyph boxes, starting at start_ind.
  def blackout_array(start_ind, length)
    # Note: this fills with white; use fill_color(0, 0, 0) for black boxes.
    @canvas.fill_color(255, 255, 255)
    @boxeslist[start_ind, length].each do |box|
      x, y = *box.lower_left
      tx, ty = *box.upper_right
      @canvas.rectangle(x, y, tx - x, ty - y).fill
    end
  end

  # Concatenate the strings of `length` glyph boxes, starting at start_ind.
  def sum_string(start_ind, length)
    @boxeslist[start_ind, length].map(&:string).join
  end
  alias :show_text_with_positioning :show_text
end
file_name = ARGV[0]
strings_to_black = ARGV[1].split("|")
doc = HexaPDF::Document.open(file_name)
puts "Blacken strings [#{strings_to_black}], inside [#{file_name}]."
doc.pages.each do |page|
  processor = ShowTextProcessor.new(page, strings_to_black)
  page.process_contents(processor)
  # All glyph boxes of the page are collected now, so search and cover the matches.
  processor.blackout_text
end
# Strip only the trailing ".pdf" so paths containing other dots stay intact.
new_file_name = "#{file_name.sub(/\.pdf\z/i, '')}_updated.pdf"
doc.write(new_file_name, optimize: true)
puts "Writing updated file [#{new_file_name}]."
This code should work as long as the PDF writer laid out the text in linear order. However, since you don't use the positional information when concatenating the strings, there will be cases where it fails. One example is text that the PDF writer outputs in non-linear order; another is spaces between words not being output at all. In the general case it should work fine, because linear text output is an easy way to save space when writing a content stream.
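As a rough illustration of what using the positional information could look like, the sketch below orders glyphs by position before joining them and inserts a space where the horizontal gap is large. The `Glyph` struct, the method `join_by_position` and both thresholds are assumptions made for this example. They are not part of HexaPDF, and the sketch only handles horizontal, unrotated text.

```ruby
# Hypothetical stand-in for a glyph box: decoded string plus position and width.
Glyph = Struct.new(:string, :x, :y, :width)

# Join glyphs in reading order instead of content-stream order.
# line_tolerance: maximum y difference for two glyphs to share a line (assumed value).
# space_gap: horizontal gap above which a space is inserted (assumed value).
def join_by_position(glyphs, line_tolerance: 2.0, space_gap: 1.0)
  # Sort top to bottom, then start a new line whenever y changes noticeably.
  lines = glyphs.sort_by { |g| -g.y }
                .slice_when { |a, b| (a.y - b.y).abs > line_tolerance }
  lines.map do |line|
    sorted = line.sort_by(&:x)
    sorted.each_cons(2).reduce(sorted.first.string.dup) do |text, (prev, cur)|
      gap = cur.x - (prev.x + prev.width)
      text << ' ' if gap > space_gap # the writer skipped the space glyph
      text << cur.string
    end
  end.join("\n")
end

# Glyphs deliberately listed out of reading order, with no space glyph.
glyphs = [
  Glyph.new('b', 6, 100, 5),
  Glyph.new('a', 0, 100, 5),
  Glyph.new('c', 20, 100, 5),
  Glyph.new('d', 0, 80, 5)
]
puts join_by_position(glyphs)
```

Searching in the joined text and then mapping the match back to the sorted glyphs would cover the two failure cases mentioned above, at the cost of choosing tolerances that suit the document.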
@gettalong Thanks for the detailed explanation.