Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some notes about thornyreader crengine fork #267

Closed
poire-z opened this issue Mar 6, 2019 · 15 comments
Closed

Some notes about thornyreader crengine fork #267

poire-z opened this issue Mar 6, 2019 · 15 comments

Comments

@poire-z
Copy link
Contributor

poire-z commented Mar 6, 2019

New active fork of crengine found by @pazos at koreader/koreader#4745 (comment):

https://github.com/irrationalunzen/thornyreader thornyreader

Spent the evening reading git log -p... so sharing some observations:

  • There had been some huge reformatting (indent, brackets...), renames (like tinyNodeCollection renamed to CrDomBase), reordering of sections of code, removal (many source files or code they didn't need was removed, the most surprising is all the crengine caching that I find is essential). All this makes the possibly interesting patches unappliable without much work. I've often been tempted to do that, for easier walking into the code - but never did, and I'm glad I didn't, now :)
  • They have a CreBrigde.cpp that acts a bit like our cre.cpp.
  • They don't seem to use much of crengine like we do - and it feels like it's used mosly as a parser and renderer to lay out text, and they pick up "boxes" (hitbox) of text like may be they do for the other engines (mupdf, djvu), and have some common code do more stuff with that - unlike us that delegate more stuff to crengine (like finding the nodes that is rendered at x/y full coordinantes). It seems they work page per page: render 1 or 2 pages, get the bitmap and some boxes, and that's nearly all.
  • They do a lot more dedicated XML re-parsing - and many checks for specific tags in the XML parsing code (like, if it's a "span", ignore it, if it's a "p", remove some last space) - which seems very ugly. It really feels like they didn't get the full picture of crengine, and did some small hacks here and there to quickly solve some localized problems. Or may be it's all their frontend/common code (that I didn't look at) that required that.
  • I didn't seen any real fix to the initial codebase (that was the main motivation for going thru all their commits - but it was huge, so I may have missed them). Their upstream starting commit (and possibly last sync) was buggins/coolreader@1e07d15.

Some noticable stuff they added/changed (in bold, the things that could be of interest to us):
(the commit links I linked to is usually the most interesting one in the set - but the features come each with many many commits and later fixes).

(so, in bold, the things that could be of interest to us - even if I'm not particularly interested by any of them :) but I'll surely be reading that RTL stuff - just to see how easy or risky it would be in regards to all the LTR text fixes we added).

@Frenzie
Copy link
Member

Frenzie commented Mar 7, 2019

That was fast. :-)

smart txt detection (?)

Not sure what you mean by that but also see our #39.

Probably not usable as is - and there are lots of arabic char codes, so they do the positionning/sizing/rendering manually - while we could probably have harfbuzz do some of that, as it already renders/draw individual words in rtl. But it's definitely something to read to see how we could start to go at that.

Yeah, plus HarfBuzz does it for Persian, etc., etc.

@poire-z
Copy link
Contributor Author

poire-z commented Mar 7, 2019

smart txt detection (?)

Not sure, not my words :) CONFIG_CRENGINE_TXT_SMART_FORMAT and DOC_FLAG_TXT_NO_SMART_FORMAT were added in their first commit, and possibly disappeared later. Stuff about txt content parsing, checking more stuff like nb chars per line, some kind of reflowing, wrapping to 80 columns...

Didn't mention it above, but they did indeed many updates/fixes to the little document formats, like mobi (they included another library to do that), txt, rtf, docx...
Which might explain why they didn't care much about the more complex CSS/HTML renderings like tables, etc...

@Frenzie
Copy link
Member

Frenzie commented Apr 24, 2019

@poire-z The repo seems to have vanished. O_o Do you still have a local copy?

@xelxebar
Copy link
Contributor

FWIW, it looks like the author's GitHub account is still active. I just tried sending an email, requesting temporary access to the repo.

@Frenzie
Copy link
Member

Frenzie commented Apr 24, 2019

@xelxebar Besides giving some indications as to which part of the code to look at it may not have been super helpful for complex rendering changes like bidi or vertical judging by @poire-z's analysis above, but it seemed nice to have a relevant example in any case.

@poire-z
Copy link
Contributor Author

poire-z commented Apr 24, 2019

Yes, I do have a local copy.
I dunno what I'm allowed to do with it... If the original author has removed it, it would feel awkward to make it available again.
I could share here as attachment the 2 or 3 files related to RTL (they would probably miss some stuff in other files - but that code should really not be used as is anyway).

@link2xt
Copy link

link2xt commented May 12, 2019

Here is another copy of thornyreader: https://github.com/readera/thornyreader
It is used in the ReadEra EReader, see https://readera.ru/en/reader-book or https://readera.ru/reader-book for direct APK links.

@poire-z You are allowed to make it available again, since Cool Reader is released under the GPL license. By the way, thornyreader has license problems, LICENSE files contains GNU AGPL which is not compatible with the GPL license of Cool Reader. And the source code of ReadEra is not published.

@link2xt
Copy link

link2xt commented May 12, 2019

@poire-z You may want to write the author first anyway, of course, maybe he has reasons to try to remove this repository. I guess it was just moved to readera organization without announcement.

@poire-z
Copy link
Contributor Author

poire-z commented May 12, 2019

Here is another copy of thornyreader: https://github.com/readera/thornyreader

Thanks for finding that one.
But it's actually lagging by 669 commits from the repo I cloned some months ago.
But that's good, as I guess I can share a patch file with the 516 commits involving the crengine/ subdirectory, that the really interested person can apply on their clone of that readera/thornyreader repo:
thornyreader_crengine_after_20180207_af19707.patch.zip
(Made with git format-patch --stdout af19707 -- crengine if that matters)

@xelxebar
Copy link
Contributor

FWIW, I did contact the original author and got a reply. Apparently, they were developing thornyreader for their employer and got strong-armed into taking the repo down. It seems like they personally would like the repo to be available, but it's unclear to me whether the patchset is legally usable. The GPL only guarantees source availability once the binaries have been distributed, no?

@link2xt
Copy link

link2xt commented May 13, 2019

@xelxebar

The GPL only guarantees source availability once the binaries have been distributed, no?

Yes. But the binaries are distributed both on https://readera.ru/reader-book and https://play.google.com/store/apps/details?id=org.readera

ReadEra uses Cool Reader based engine, which is licensed under GPL. GPL does not require authors to publish the source, but any user can demand the source code from them. And since binaries are freely distributed, you can easily become the user.

EDIT: I would not contact the ReadEra with these demands, though. Better contact the authors directly to avoid any consequences for them (though not following the GPL is obviously their employer's fault).

@link2xt
Copy link

link2xt commented May 13, 2019

It seems like they personally would like the repo to be available, but it's unclear to me whether the patchset is legally usable.

It may be not legally usable in one corner case: if the authors took some code that was licensed under the GPL-incompatible license and used it in their GPL software fork. That would be illegal in the first place so the GPL cannot make this combined codebase legal even if the users were not aware of the origin of the code.

@poire-z
Copy link
Contributor Author

poire-z commented May 13, 2019

As I mentionned in the first post, there is really not much we'd want to borrow, or even could borrow (because their use of crengine looked so different than ours). So, no real need to get answers to these legal questions.
The patches are just something one can have a look at to see how these features have been done on this other planet, like we would probably look at some other software written in a different language - some kind of tourism.
(The only thing that felt like something we could possibly pick is the DOCX support - but I don't think there's that much interest in having it.)

@poire-z
Copy link
Contributor Author

poire-z commented Nov 22, 2019

Now that we got DocX support from upstream, and our own RTL/Bidi support, nothing much to pick from that fork.
Just cut and pasting the "Epub3 series metadata support" logic - in case we some day need to look at these other fields and properties:

//epub3 series metadata
if(series.empty()) {
    struct titleitem { lString16 title; lString16 id; };
    LVArray<titleitem> titles;
    for (int i = 1; i < 20; i++) {
        lString16 xpath = lString16("package/metadata/title[") << fmt::decimal(i) << "]";
        ldomNode *item = dom->nodeFromXPath(xpath);
        if (!item)
            break;
        titleitem curr;
        curr.id = item->getAttributeValue("id");
        curr.title = item->getText().trim();
        titles.add(curr);
    }
    lString16 series_id;
    for (int i = 1; i < 20; i++) {
        lString16 xpath = lString16("package/metadata/meta[") << fmt::decimal(i) << "]";
        ldomNode *item = dom->nodeFromXPath(xpath);
        if (!item)
            break;
        lString16 property = item->getAttributeValue("property");
        lString16 content = item->getText().trim();
        if (property == "collection-type" && content == "series") {
            series_id = item->getAttributeValue("refines");
            if (series_id.startsWith("#"))
                series_id = series_id.substr(1, series_id.length() - 1);
            break;
        }
    }
    if (!series_id.empty()) {
        for (int i = 0; i < titles.length(); i++) {
            titleitem curr = titles.get(i);
            if (curr.id == series_id) {
                series = curr.title;
                break;
            }
        }
        for (int i = 1; i < 20; i++) {
            lString16 xpath = lString16("package/metadata/meta[") << fmt::decimal(i) << "]";
            ldomNode *item = dom->nodeFromXPath(xpath);
            if (!item)
                break;
            lString16 refines = item->getAttributeValue("refines");
            lString16 property = item->getAttributeValue("property");
            lString16 content = item->getText().trim();
            if (refines == series_id && property == "group-positon") {
                series_number = content.atoi();
                break;
            }
        }
    }
}
//if series still empty: last attempt, for property="belongs-to-collection"
if(series.empty()) {
    lString16 series_id;
    for (int i = 1; i < 20; i++) {
        lString16 xpath = lString16("package/metadata/meta[") << fmt::decimal(i) << "]";
        ldomNode *item = dom->nodeFromXPath(xpath);
        if (!item)
            break;
        lString16 property = item->getAttributeValue("property");
        if (property == "belongs-to-collection") {
            series = item->getText().trim();
            series_id = item->getAttributeValue("id");
            break;
        }
    }
    if(!series_id.empty() && series_number == 0) {
        for (int i = 1; i < 20; i++) {
            lString16 xpath = lString16("package/metadata/meta[") << fmt::decimal(i) << "]";
            ldomNode *item = dom->nodeFromXPath(xpath);
            lString16 property = item->getAttributeValue("property");
            lString16 id = item->getAttributeValue("refines");
            if (id.startsWith("#"))
                id = id.substr(1, id.length() - 1);
            if (id == series_id && property == "group-position") {
                series_number = item->getText().trim().atoi();
                break;
            }
        }
    }
}

@poire-z
Copy link
Contributor Author

poire-z commented Aug 24, 2020

All the interesting stuff in bold has since been implemented via some original work.
So nothing more to look at for consideration in this no more available fork.

@poire-z poire-z closed this as completed Aug 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants