add batch func #2
base: main
Conversation
@zengzzzzz thanks, will take a look tonight.
Looks very good and clean for now, will test in my env tonight.
Thanks a lot.
There is a small error in one place; I fixed it on main, so you can merge that in first.
OK, sorry for the trouble, and thanks.
I don't think I have permission to merge the branch. Could you help merge it?
@zengzzzzz I fixed it; pull the code and take a look at the href issue I commented on.
ok
@yihong0618 I fixed the href error, please check it again, thanks.
@zengzzzzz I think some limits should be added; right now, running it on my machine just hangs.
IMO we'd better use the
I will try this way.
You are right. If we want everyone to be able to use it well, we should try the method you mentioned.
I think the method you mentioned has these advantages.
I implemented the batch translate method, but there are still some flaws in the formatting, so I decided to use it in my own project. If you are interested or need it, you can refer to the following implementation:

```python
def translate_book(self):
    new_book = epub.EpubBook()
    new_book.metadata = self.origin_book.metadata
    new_book.spine = self.origin_book.spine
    new_book.toc = self.origin_book.toc
    batch_p = []
    batch_count = 0
    for i in self.origin_book.get_items():
        if i.get_type() == 9:  # ITEM_DOCUMENT
            soup = bs(i.content, "html.parser")
            p_list = soup.findAll("p")
            for p in p_list:
                if p.text and not p.text.isdigit():
                    batch_p.append(p)
                    batch_count += 1
                    if batch_count == self.batch_size:
                        translated_batch = self.translate_model.translate(
                            [p.text for p in batch_p]
                        )
                        for j, c_p in enumerate(batch_p):
                            c_p.string = c_p.text + translated_batch[j]
                        batch_p = []
                        batch_count = 0
            # Process any remaining paragraphs in the last batch
            if batch_p:
                translated_batch = self.translate_model.translate(
                    [p.text for p in batch_p]
                )
                for j, c_p in enumerate(batch_p):
                    c_p.string = c_p.text + translated_batch[j]
                batch_p = []
                batch_count = 0
            i.content = soup.prettify().encode()
        new_book.add_item(i)
    name = self.epub_name.split(".")[0]
    epub.write_epub(f"{name}_translated.epub", new_book, {})
```
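The core pattern in the code above (accumulate items, flush when the batch is full, then flush the final partial batch) can be sketched as a small standalone helper. This is an illustrative sketch, not part of the PR; the name `chunked` is hypothetical.

```python
def chunked(items, batch_size):
    """Yield successive batches of at most batch_size items.

    The final batch may be smaller than batch_size, mirroring the
    "process any remaining paragraphs" step in the PR code above.
    """
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the last partial batch
        yield batch
```

Using such a helper would let the translate-and-write-back step live in one place instead of being duplicated for the loop body and the trailing batch.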
Thanks, will keep this PR; if I use your code I will commit on this.
Thank you, bro.
I think parts of this could be changed to an async model, on top of your change. After all, most of the bottleneck right now is in the OpenAI API and the network.
You are probably right. One reason for not using async or multiprocessing etc. is the QPS limit of the OpenAI API (20 per minute), which makes the performance improvement less noticeable.
It looks like the rate limit from OpenAI may not be a problem now. For pay-as-you-go users (after 48 hours), OpenAI allows 3,500 requests per minute and 90,000 tokens per minute. I created a new PR #62 to support batch processing with
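The async approach discussed here can be sketched with `asyncio` and a semaphore to cap in-flight requests. This is a minimal illustration, not the PR's actual code: `translate_one` is a hypothetical stand-in for a real OpenAI API call, and `max_concurrency` would be tuned to the rate limits mentioned above.

```python
import asyncio

async def translate_one(text, sem):
    # Hypothetical stand-in for an OpenAI API call; the semaphore
    # limits how many requests are in flight at once.
    async with sem:
        await asyncio.sleep(0)   # simulate network I/O
        return text.upper()      # placeholder "translation"

async def translate_batch(texts, max_concurrency=10):
    sem = asyncio.Semaphore(max_concurrency)
    # gather preserves input order, so results line up with texts
    return await asyncio.gather(*(translate_one(t, sem) for t in texts))

results = asyncio.run(translate_batch(["hello", "world"]))
```

Since the bottleneck is network latency rather than CPU, overlapping requests this way can cut total wall-clock time roughly in proportion to the concurrency, as long as the rate limit is respected.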
1. Supported customizing the translation text position and deleting original content. resolved yihong0618#7
2. Supported excluding original content by keyword and regular expression. resolved yihong0618#2
3. Added Baidu and Youdao translation engines. resolved yihong0618#3
4. Changed to save translated ebooks as a new book in the Calibre library.
5. Supported customizing the color of translation text.
6. Supported customizing the ChatGPT prompt word. resolved yihong0618#4
7. Ignored translating phonetic symbols (e.g. Japanese). fixed yihong0618#3
8. Added Spanish as a supported interface language. resolved yihong0618#5
9. Added diagnosis information to the log.
10. Added a "lang" attribute to the translation element.
11. Fixed plugin icon disappearance when changing the Calibre interface language.
12. Improved the functionality to extract original text.
Please review it, thanks.
1. self.new_epub was unused, so deleted it
2. Added batch func with multiprocessing
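The multiprocessing approach this PR describes can be sketched as a worker pool mapped over paragraphs. This is a hedged illustration, not the PR's actual implementation: `translate_paragraph` is a hypothetical placeholder for the real translation call, and a pool of workers stands in for the batch function.

```python
from multiprocessing import Pool

def translate_paragraph(text):
    # Hypothetical stand-in for the real translation call;
    # reversing the string just makes results easy to verify.
    return text[::-1]

def translate_batch_mp(paragraphs, workers=2):
    # Each worker process handles a share of the paragraphs;
    # map preserves input order.
    with Pool(workers) as pool:
        return pool.map(translate_paragraph, paragraphs)

if __name__ == "__main__":
    translate_batch_mp(["hello", "world"])
```

Note that for API-bound work like OpenAI calls, processes mostly buy concurrency rather than CPU parallelism, so an async or thread-based design (as in the discussion above) achieves a similar effect with less overhead.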