How to get comments from DOCX? #443

Closed
luigi-asprino opened this issue Dec 15, 2023 Discussed in #430 · 11 comments

@luigi-asprino
Member

Discussed in #430

Originally posted by kvistgaard November 18, 2023
From what I tried so far, it seems they are not accessible.
Yet, since they are what I mostly need to get from MS Word documents, I'm hoping that there is a way (I saw such an option for spreadsheets) or that it can be implemented.

@luigi-asprino luigi-asprino added the Feature New feature or request label Dec 15, 2023
@kvistgaard
Contributor

@luigi-asprino, any updates on this?

@luigi-asprino
Member Author

7912bb9 implements the extension to extract comment documents.
Comments are interpreted as containers with three slots holding the id, the author and the text of the comment. Each comment container is attached to the paragraph the comment refers to.

See this docx and its RDF counterpart

@luigi-asprino luigi-asprino added this to the v1.0.0 milestone Aug 5, 2024
luigi-asprino added a commit that referenced this issue Aug 5, 2024
@kvistgaard
Contributor

kvistgaard commented Aug 5, 2024

@luigi-asprino excellent, I'll give it a try very soon. At first glance, it's not obvious how a comment is linked to the text it comments on, or how a thread is represented, e.g. commentY isResponseTo commentX.

@kvistgaard
Contributor

> 7912bb9 implements the extension to extract comment documents.

Now I see that it's for 1.0. I've been trying with the latest release, 0.9.0. When will 1.0 be released?

@luigi-asprino
Member Author

You can try it out with the pre-release v1.0-DEV.4 that has just been created.

https://github.com/SPARQL-Anything/sparql.anything/releases/tag/v1.0-DEV.4

@kvistgaard
Contributor

Thanks. Just tested it. Works great. Excellent work.
Do you have any thoughts on the threads?

@luigi-asprino
Member Author

I am reopening this issue to try to make comment threads clearer.

@luigi-asprino luigi-asprino reopened this Aug 10, 2024
@luigi-asprino
Member Author

At the moment, comments in the same thread are attached as consecutive slots of the paragraph's container.

Suppose you have a paragraph "Paragraph1" with two comments ("This is a comment" and "This is a reply").

This results in two slots, rdf:_2 and rdf:_3, that reference the comments:

<http://www.example.org/document/paragraph/2>
        rdf:type  xyz:Paragraph;
        rdf:_1    "Paragraph1";
        rdf:_2    <http://www.example.org/document/Comment_0>;
        rdf:_3    <http://www.example.org/document/Comment_1> .

<http://www.example.org/document/Comment_1>
        rdf:type  xyz:Comment;
        rdf:_1    <http://www.example.org/document/Comment_1/Author>;
        rdf:_2    <http://www.example.org/document/Comment_1/CommentText>;
        rdf:_3    <http://www.example.org/document/Comment_1/CommentId>.

<http://www.example.org/document/Comment_1/CommentId>
        rdf:type  xyz:CommentId;
        rdf:_1    "1" .

<http://www.example.org/document/Comment_1/CommentText>
        rdf:type  xyz:CommentText;
        rdf:_1    "This is a reply" .

<http://www.example.org/document/Comment_1/Author>
        rdf:type  xyz:CommentAuthor;
        rdf:_1    "Luigi Asprino" .


<http://www.example.org/document/Comment_0>
        rdf:type  xyz:Comment;
        rdf:_1    <http://www.example.org/document/Comment_0/Author>;
        rdf:_2    <http://www.example.org/document/Comment_0/CommentText>;
        rdf:_3    <http://www.example.org/document/Comment_0/CommentId>.


<http://www.example.org/document/Comment_0/CommentId>
        rdf:type  xyz:CommentId;
        rdf:_1    "0" .

<http://www.example.org/document/Comment_0/CommentText>
        rdf:type  xyz:CommentText;
        rdf:_1    "This is a comment" .

<http://www.example.org/document/Comment_0/Author>
        rdf:type  xyz:CommentAuthor;
        rdf:_1    "Luigi Asprino" .
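
To make the structure above easier to follow, here is a minimal sketch that walks it in plain Python. It is not part of SPARQL Anything: the triples are hand-copied from the example as string tuples, no RDF library is involved, and the helper names (`comment_triples`, `slots`, `rdf_type`, `comments_of`) are hypothetical. It resolves each comment's fields by the `rdf:type` of the slot value rather than by slot position.

```python
# Hypothetical, stdlib-only model of the example triples above.
DOC = "http://www.example.org/document/"
PARAGRAPH = DOC + "paragraph/2"


def comment_triples(n, author, text):
    """Build the triples of one comment container, mirroring the example."""
    c = f"{DOC}Comment_{n}"
    return [
        (c, "rdf:type", "xyz:Comment"),
        (c, "rdf:_1", c + "/Author"),
        (c, "rdf:_2", c + "/CommentText"),
        (c, "rdf:_3", c + "/CommentId"),
        (c + "/Author", "rdf:type", "xyz:CommentAuthor"),
        (c + "/Author", "rdf:_1", author),
        (c + "/CommentText", "rdf:type", "xyz:CommentText"),
        (c + "/CommentText", "rdf:_1", text),
        (c + "/CommentId", "rdf:type", "xyz:CommentId"),
        (c + "/CommentId", "rdf:_1", str(n)),
    ]


TRIPLES = [
    (PARAGRAPH, "rdf:type", "xyz:Paragraph"),
    (PARAGRAPH, "rdf:_1", "Paragraph1"),
    (PARAGRAPH, "rdf:_2", DOC + "Comment_0"),
    (PARAGRAPH, "rdf:_3", DOC + "Comment_1"),
]
TRIPLES += comment_triples(0, "Luigi Asprino", "This is a comment")
TRIPLES += comment_triples(1, "Luigi Asprino", "This is a reply")


def slots(subject):
    """Return the values of rdf:_1, rdf:_2, ... of a subject, in slot order."""
    found = [
        (int(p[len("rdf:_"):]), o)
        for s, p, o in TRIPLES
        if s == subject and p.startswith("rdf:_")
    ]
    return [o for _, o in sorted(found)]


def rdf_type(subject):
    """Return the rdf:type of a subject, or None (e.g. for plain literals)."""
    return next((o for s, p, o in TRIPLES if s == subject and p == "rdf:type"), None)


def comments_of(paragraph):
    """Collect the comments attached to a paragraph, in slot order."""
    result = []
    for member in slots(paragraph):
        if rdf_type(member) != "xyz:Comment":
            continue  # skip the paragraph text and any other slot value
        fields = {rdf_type(part): slots(part)[0] for part in slots(member)}
        result.append({
            "id": fields["xyz:CommentId"],
            "author": fields["xyz:CommentAuthor"],
            "text": fields["xyz:CommentText"],
        })
    return result
```

Calling `comments_of(PARAGRAPH)` returns the two comments in the order of their slots, which is the only place where the thread order is currently recorded.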

A possible solution would be to add each comment's position within its thread (the thread comment number) as an extra slot of the comment:

<http://www.example.org/document/Comment_1>
        rdf:type  xyz:Comment;
        rdf:_1    <http://www.example.org/document/Comment_1/Author>;
        rdf:_2    <http://www.example.org/document/Comment_1/CommentText>;
        rdf:_3    <http://www.example.org/document/Comment_1/CommentId>;
        rdf:_4    <http://www.example.org/document/Comment_1/ThreadCommentNumber> .

<http://www.example.org/document/Comment_1/ThreadCommentNumber>
        rdf:type  xyz:ThreadCommentNumber;
        rdf:_1    "2"^^xsd:int .

<http://www.example.org/document/Comment_0>
        rdf:type  xyz:Comment;
        rdf:_1    <http://www.example.org/document/Comment_0/Author>;
        rdf:_2    <http://www.example.org/document/Comment_0/CommentText>;
        rdf:_3    <http://www.example.org/document/Comment_0/CommentId>;
        rdf:_4    <http://www.example.org/document/Comment_0/ThreadCommentNumber> .

<http://www.example.org/document/Comment_0/ThreadCommentNumber>
        rdf:type  xyz:ThreadCommentNumber;
        rdf:_1    "1"^^xsd:int .


luigi-asprino added a commit that referenced this issue Aug 10, 2024
@kvistgaard
Contributor

I was imagining something more in the style of sioc:has_reply + sioc:Thread, but I guess what you suggest would work equally well.

@luigi-asprino
Member Author

The relationship between comments and their replies is implicit in the order of the comments. Therefore, sioc:has_reply + sioc:Thread can be materialised with a SPARQL CONSTRUCT query if necessary. This is in line with the SPARQL Anything philosophy of using the minimum number of operations to turn data into RDF and leaving any further transformation to the user.
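
As an illustration of how reply links could be derived from the order alone, here is a small sketch of the logic in plain Python. In practice this would be written as a SPARQL CONSTRUCT query; the Python version only shows the rule. It assumes that each comment carries a thread comment number as proposed earlier in this thread, that a comment numbered 1 starts a new thread, and that every later comment is linked to the first comment of its thread. The names `Comment` and `reply_links` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Comment(NamedTuple):
    iri: str
    thread_number: int  # assumed value of the proposed ThreadCommentNumber slot


def reply_links(comments_in_slot_order: List[Comment]) -> List[Tuple[str, str]]:
    """Return (thread_root, reply) pairs for the comments of one paragraph.

    Assumption: a comment numbered 1 opens a thread, and each following
    comment replies to the most recent thread root.
    """
    links = []
    root = None
    for comment in comments_in_slot_order:
        if comment.thread_number == 1 or root is None:
            root = comment.iri  # start of a new thread
        else:
            links.append((root, comment.iri))  # root has_reply comment
    return links


example = [
    Comment("http://www.example.org/document/Comment_0", 1),
    Comment("http://www.example.org/document/Comment_1", 2),
]
links = reply_links(example)
```

Each resulting pair could then be emitted as a `sioc:has_reply` triple. Linking replies to the preceding comment instead of the thread root would be an equally valid reading; the choice here is only for illustration.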

@kvistgaard
Contributor

@luigi-asprino Currently, the document part (paragraph, heading) on which a comment is made is nicely linked with the comment. Is there also a way to extract the highlighted span of text within that part to which the comment is attached?
