
add batch func #2

Open · wants to merge 5 commits into main

Conversation

@zengzzzzz commented Mar 2, 2023

Please review it, thanks.
1. self.new_epub is not used, so delete it.
2. Add a batch function with multiprocessing.
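A minimal sketch of the multiprocessing idea, assuming a hypothetical translate_text helper in place of the real API call (an illustration, not the PR's actual make.py changes):

from multiprocessing import Pool

def translate_text(text):
    # Hypothetical stand-in for the real ChatGPT API call.
    return "translated: " + text

def translate_batch(paragraphs, max_workers=4):
    # A bounded pool: at most max_workers processes run at once.
    with Pool(processes=max_workers) as pool:
        return pool.map(translate_text, paragraphs)

if __name__ == "__main__":
    print(translate_batch(["first paragraph", "second paragraph"]))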

@yihong0618 (Owner)

@zengzzzzz thanks, will take a look tonight.

@yihong0618 (Owner) left a comment

Looks very good and clean for now, will test in my env tonight.

Thanks a lot.

@yihong0618 (Owner)

There is a small error in one place; I've fixed it on main. You can merge main in first.

@zengzzzzz (Author)

> There is a small error in one place; I've fixed it on main. You can merge main in first.

OK, sorry for the trouble. Thank you.

@zengzzzzz (Author)

I probably don't have permission to merge the branch. Could you please merge it for me?

@yihong0618 (Owner)

@zengzzzzz I've fixed it. Pull the code and take a look at the href issue I commented on.

@zengzzzzz (Author)

> @zengzzzzz I've fixed it. Pull the code and take a look at the href issue I commented on.

OK.

@zengzzzzz (Author)

@yihong0618 I fixed the href error, please check it again. Thanks.

@yihong0618 (Owner)

@zengzzzzz I feel that some restrictions should be added; right now, running it on my computer hangs everything.

@yihong0618 (Owner)

Tested with test_books/lemo.epub and found some problems:

  1. the whole terminal hangs (I am using kitty)
  2. it seems to run at the same speed as the non-batch version
  3. an error occurred: [screenshot of the error]

@yihong0618 (Owner)

IMO we'd better use the <p1> + <p2> + <p3> -> ChatGPT API -> <p1_t> + <p2_t> + <p3_t> batch approach, so that it works well for every user.

@zengzzzzz (Author)

> IMO we'd better use the <p1> + <p2> + <p3> -> ChatGPT API -> <p1_t> + <p2_t> + <p3_t> batch approach, so that it works well for every user.

I will try this approach.
1. I will add restrictions (e.g. max workers) to avoid hanging.
2. Tested on my machine, it is faster, but its speed depends on the API's response time.
3. This seems to be an error generated by OpenAI; I will verify it.

@zengzzzzz (Author)

> IMO we'd better use the <p1> + <p2> + <p3> -> ChatGPT API -> <p1_t> + <p2_t> + <p3_t> batch approach, so that it works well for every user.

You are right. If we want everyone to use it well, we should try the method you mentioned.

@zengzzzzz (Author)

> IMO we'd better use the <p1> + <p2> + <p3> -> ChatGPT API -> <p1_t> + <p2_t> + <p3_t> batch approach, so that it works well for every user.

I think the method you mentioned has these advantages:
1. fewer tokens, lower cost
2. less likely to get stuck
3. everyone can use it well
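A rough sketch of that concatenation approach, assuming a hypothetical translate_text helper that sends a single request; a real model reply would need more robust parsing than this marker-based split:

def translate_in_one_call(paragraphs, translate_text):
    # Join the batch with <pN> markers so one API request covers all
    # paragraphs, then split the reply back into per-paragraph pieces.
    prompt = "\n".join(f"<p{n}> {p}" for n, p in enumerate(paragraphs, 1))
    reply = translate_text(prompt)
    translated = {}
    for line in reply.splitlines():
        head, sep, text = line.partition("> ")
        if sep and head.startswith("<p"):
            translated[head + ">"] = text
    # Fall back to the original text if a marker is missing in the reply.
    return [translated.get(f"<p{n}>", p) for n, p in enumerate(paragraphs, 1)]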

@zengzzzzz commented Mar 4, 2023

I implemented the batch translate method, but there are still some flaws in the output format, so I decided to use it in my own project. If you are interested or need it, you can refer to the following implementation:

from bs4 import BeautifulSoup as bs
from ebooklib import ITEM_DOCUMENT, epub

# Method of the book-translating class; relies on self.origin_book,
# self.batch_size, self.translate_model, and self.epub_name.
def translate_book(self):
    new_book = epub.EpubBook()
    new_book.metadata = self.origin_book.metadata
    new_book.spine = self.origin_book.spine
    new_book.toc = self.origin_book.toc
    batch_p = []
    for i in self.origin_book.get_items():
        if i.get_type() == ITEM_DOCUMENT:
            soup = bs(i.content, "html.parser")
            p_list = soup.find_all("p")
            for p in p_list:
                if p.text and not p.text.isdigit():
                    batch_p.append(p)
                    if len(batch_p) == self.batch_size:
                        # Translate a full batch in one call and append
                        # each translation after its source paragraph.
                        translated_batch = self.translate_model.translate(
                            [p.text for p in batch_p]
                        )
                        for j, c_p in enumerate(batch_p):
                            c_p.string = c_p.text + translated_batch[j]
                        batch_p = []
            # Process any remaining paragraphs in the last batch.
            if batch_p:
                translated_batch = self.translate_model.translate(
                    [p.text for p in batch_p]
                )
                for j, c_p in enumerate(batch_p):
                    c_p.string = c_p.text + translated_batch[j]
                batch_p = []
            i.content = soup.prettify().encode()
        new_book.add_item(i)
    name = self.epub_name.split(".")[0]
    epub.write_epub(f"{name}_translated.epub", new_book, {})

@yihong0618 (Owner)

Thanks, I will keep this PR open; if I use your code, I will commit it here.
Thanks again.

@zengzzzzz (Author)

> Thanks, I will keep this PR open; if I use your code, I will commit it here. Thanks again.

Thank you, bro.

@DennySORA

I think parts of this could be switched to an async model, on top of what you have here.

After all, most of the current bottleneck is in the OpenAI API and the network.

@zengzzzzz commented Mar 4, 2023

> I think parts of this could be switched to an async model, on top of what you have here.
>
> After all, most of the current bottleneck is in the OpenAI API and the network.

You are probably right. One reason for not using async, multiprocessing, etc. is the QPS limitation of the OpenAI API (20 requests per minute), which makes the performance improvement less noticeable.
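For context, a tiny sketch of a client-side throttle for such a per-minute cap; the 20-requests-per-minute figure is taken from the comment above, and this is illustrative rather than code from the PR:

import time

class MinuteRateLimiter:
    # Allow at most `limit` calls to start in any rolling 60-second window.
    def __init__(self, limit=20):
        self.limit = limit
        self.starts = []  # timestamps of recent call starts

    def wait(self):
        now = time.monotonic()
        self.starts = [t for t in self.starts if now - t < 60]
        if len(self.starts) >= self.limit:
            # Sleep until the oldest call in the window is a minute old.
            time.sleep(60 - (now - self.starts[0]))
        self.starts.append(time.monotonic())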

@tianshanghong

> You are probably right. One reason for not using async, multiprocessing, etc. is the QPS limitation of the OpenAI API (20 requests per minute), which makes the performance improvement less noticeable.

It looks like the rate limit from OpenAI may not be a problem now. For pay-as-you-go users (after 48 hours), OpenAI allows 3,500 requests per minute and 90,000 tokens per minute.

I created a new PR #62 to support batch processing with asyncio.
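For reference, a minimal asyncio sketch in the same spirit, using a semaphore to cap in-flight requests; translate_async is a hypothetical stand-in, not the actual code from PR #62:

import asyncio

async def translate_async(text):
    # Hypothetical stand-in for an async ChatGPT API call.
    await asyncio.sleep(0.1)
    return "translated: " + text

async def translate_all(paragraphs, max_concurrency=10):
    # Cap concurrent requests so we stay under the per-minute budget.
    sem = asyncio.Semaphore(max_concurrency)

    async def worker(text):
        async with sem:
            return await translate_async(text)

    return await asyncio.gather(*(worker(p) for p in paragraphs))

if __name__ == "__main__":
    print(asyncio.run(translate_all(["first paragraph", "second paragraph"])))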

wayhome pushed a commit to wayhome/bilingual_book_maker that referenced this pull request Aug 29, 2024

1. Supported customizing the translation text position and deleting original content. resolved yihong0618#7
2. Supported excluding original content by keyword and regular expression. resolved yihong0618#2
3. Added Baidu and Youdao translation engines. resolved yihong0618#3
4. Changed to save translated ebooks as a new book in the Calibre library.
5. Supported customizing the color of the translation text.
6. Supported customizing the ChatGPT prompt. resolved yihong0618#4
7. Ignored translating phonetic symbols (e.g. Japanese). fixed yihong0618#3
8. Added Spanish as a supported interface language. resolved yihong0618#5
9. Added diagnosis information to the log.
10. Added a "lang" attribute to the translation element.
11. Fixed plugin icon disappearance when changing the Calibre interface language.
12. Improved the functionality to extract original text.