how to detect vector shapes in pdf pages #9728

Closed
darkworks opened this issue May 16, 2018 · 3 comments

@darkworks

Attach (recommended) or Link to PDF file here:
test4.pdf

Configuration:

  • Web browser and its version: chrome latest
  • Operating system and its version: win 7
  • PDF.js version: latest from mozilla cdn
  • Is a browser extension: no

I can detect text and images in PDF pages, but I do not know how to detect vector shapes. I checked the wiki pages but found nothing useful on the topic.

Is there a way to detect whether a page contains vector shapes? In the attached PDF, page 1 has text, page 2 has vector shapes, and page 3 has an image. I am stuck on page 2: I cannot tell whether the page contains vector shapes or not.

Examples:

To detect text on page 1, I am doing something like this:

      page.getTextContent({ normalizeWhitespace: true }).then(function (textContent) {
        // Each item is one run of text found on the page.
        textContent.items.forEach(function (textItem) {
          // console.log(textItem.str);
        });
      });

To detect images on page 3, I am doing something like this:

      var ops = await page.getOperatorList();
      for (let j = 0; j < ops.fnArray.length; j++) {
        // Image paint operators carry the image object id as their first argument.
        if (ops.fnArray[j] === pdfjsLib.OPS.paintJpegXObject || ops.fnArray[j] === pdfjsLib.OPS.paintImageXObject) {
          var op = ops.argsArray[j][0];
          var img = page.objs.get(op);
        }
      }

Page 2 is the problem.

Thanks

@timvandermeij
Contributor

Vector shapes are not a single entity in PDF documents; they are composed of various operations, such as line drawing and filling (see https://github.com/mozilla/pdf.js/blob/master/src/shared/util.js#L174 for the list of existing operations). Hence, there is unfortunately no easy way to extract vector shapes.
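Because shapes are spread over several operations, one rough option is to check whether a page's operator list contains any path operators at all. The sketch below is a heuristic, not an official PDF.js API: `hasVectorPaths` is a hypothetical helper, and it assumes the OPS table exposes `constructPath` and the stroke/fill operators from the list linked above. PDF.js typically batches `moveTo`, `lineTo`, `rectangle` and similar operators into a single `constructPath` entry, so those individual codes may never show up in `fnArray` directly. The OPS table is passed in as a parameter so the function can also be tried with a stub.

```javascript
// Hypothetical helper: returns true if the operator list contains any
// path construction or path painting operator. This is a heuristic:
// clipping paths also use constructPath, so false positives are possible.
function hasVectorPaths(operatorList, OPS) {
  const pathOpNames = [
    "constructPath",
    "stroke",
    "closeStroke",
    "fill",
    "eoFill",
    "fillStroke",
    "eoFillStroke",
    "closeFillStroke",
    "closeEOFillStroke",
  ];
  // Skip names that the given OPS table does not define.
  const pathOps = new Set(
    pathOpNames.map((name) => OPS[name]).filter((code) => code !== undefined)
  );
  return operatorList.fnArray.some((fn) => pathOps.has(fn));
}

// Usage inside an async function (assumes `page` came from pdfDocument.getPage(n)):
//   const ops = await page.getOperatorList();
//   const pageHasPaths = hasVectorPaths(ops, pdfjsLib.OPS);
```

This only answers "does the page draw any paths"; it does not say which shapes they form, and shapes drawn as annotations are not covered by it.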

@darkworks
Author

darkworks commented May 17, 2018

OK, thanks. Then how can I just detect whether a page has vectors or not? I tried something like this:

      if (ops.fnArray[j] === pdfjsLib.OPS.rectangle || ops.fnArray[j] === pdfjsLib.OPS.closePath) {
        var op = ops.argsArray[j][0];
      }

That did not work, so I looped over the operator list to see what was inside, like this:

      var ops = await page.getOperatorList();
      for (let j = 0; j < ops.fnArray.length; j++) {
        console.log(ops.fnArray[j]);
      }

which outputs:

78
79

Then I checked https://github.com/mozilla/pdf.js/blob/master/src/shared/util.js#L174, which lists:

  beginAnnotations: 78,
  endAnnotations: 79,

As you can see, page 2 of my test PDF has one rectangle and one circle, so I was expecting some rectangle and circle codes.

Any idea what I am doing wrong?
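For debugging output like the 78 and 79 above, the manual lookup can be replaced by a reverse map from operator code to name. This is only a small sketch: `buildOpNames` is a hypothetical helper, and it assumes the OPS table is a plain object mapping names to numeric codes, as in the linked `util.js`.

```javascript
// Hypothetical helper: builds a reverse map from operator code to operator name.
function buildOpNames(OPS) {
  const names = new Map();
  for (const [name, code] of Object.entries(OPS)) {
    names.set(code, name);
  }
  return names;
}

// Usage inside an async function (assumes `page` came from pdfDocument.getPage(n)):
//   const opNames = buildOpNames(pdfjsLib.OPS);
//   const ops = await page.getOperatorList();
//   ops.fnArray.forEach((fn) => console.log(opNames.get(fn) || fn));
```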

@timvandermeij
Contributor

In that case the rectangle and circle are drawn onto the annotation layer, where they are already rendered as SVG elements. Refer to https://github.com/mozilla/pdf.js/blob/master/src/display/annotation_layer.js#L898-L938 for how they're drawn.
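Since these shapes are annotations, one way to detect them might be to inspect the page's annotation data rather than the operator list. The following is a minimal sketch, not the maintainers' recommended method: `findShapeAnnotations` and `SHAPE_SUBTYPES` are hypothetical names, and the code assumes each object returned by `page.getAnnotations()` carries a `subtype` string matching the PDF annotation subtype (for example `Square` or `Circle`).

```javascript
// Annotation subtypes from the PDF specification that represent drawn shapes.
const SHAPE_SUBTYPES = new Set([
  "Square",
  "Circle",
  "Line",
  "Polygon",
  "PolyLine",
  "Ink",
]);

// Hypothetical helper: keeps only the annotations whose subtype is a shape.
// Assumes each annotation object has a `subtype` string property.
function findShapeAnnotations(annotations) {
  return annotations.filter((annotation) => SHAPE_SUBTYPES.has(annotation.subtype));
}

// Usage inside an async function (assumes `page` came from pdfDocument.getPage(n)):
//   const annotations = await page.getAnnotations();
//   const pageHasShapeAnnotations = findShapeAnnotations(annotations).length > 0;
```

This only covers shapes stored as annotations; shapes drawn in the page content stream would still have to be found through the operator list.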
