How to get more than 15 minutes computation time? #2715

Closed
Paebbels opened this issue Mar 10, 2017 · 25 comments

@Paebbels

Thanks for fixing #2697.

We now know that building the documentation of "poc-library" takes more than 15 minutes on the RTD servers. I know that our 750-page PDF documentation is not the smallest, but my local machine needs only 10 minutes to build everything (HTML, LaTeX, PDF).

What are the criteria to get more runtime?

We currently estimate that the documentation for our project is 30-40% complete. Is there any way to speed up the pickle process? As far as I have noticed, the build gets slower as we add more internal links. It also got slower when I started to add text to the Python docstrings.

Are there other open source projects compiling on RTD with a similar size?

I was actually working on a new language extension for Sphinx, to add VHDL support to the tool.

Kind regards
Patrick

@humitos added the "Support" (support question) label Mar 11, 2017
@humitos
Member

humitos commented Mar 11, 2017

I think this is a good question for @ericholscher and/or @agjohnson.

@ericholscher
Member

It seems like a worthy project -- I've raised the limit to an hour. We do ask projects that use a lot of resources to contribute to the project either with time or money -- you can help with development by reading through our contributing docs: http://docs.readthedocs.io/en/latest/contribute.html -- also you can contribute money here monthly: https://readthedocs.org/accounts/gold/subscription/ or once off: https://readthedocs.org/sustainability/

@Paebbels
Author

Thank you @ericholscher.

Currently, I can't offer a (money) donation as we have no funding for our open source project. I'm working on finding sponsors for the project :).

I'll try to keep the number of builds low, as we have two repositories: an internal one with lots of commits and the external one, visible on GitHub. We push changes maybe once a week. The doc builds are limited to the master and release branches.

I don't know if I can contribute to RTD itself, but I'm currently working on a new language plugin for Sphinx. It will hopefully bring as many benefits to Sphinx as the Python domain.

So 6 months ago I started to write a parser for VHDL, that will extract the necessary information for an upcoming vhdl domain. VHDL is one of the hardest languages to parse, so it will take me some time to finish it.

https://github.com/Paebbels/pyVHDLParser

Kind regards
Patrick

@agjohnson removed the "Support" (support question) label Mar 14, 2017
@ericholscher
Member

Cool stuff. @agjohnson has done a good bit of work around domains, and they're definitely a bit underspecified. You might look at https://github.com/rtfd/sphinxcontrib-dotnetdomain which I believe is reasonably complete and actually has tests :)

@Paebbels
Author

Oh I see. The domain just uses static ReST. So my solution also covers something like autoapi :)

@Paebbels
Author

Paebbels commented May 6, 2018

@ericholscher
There has been a pause in the development of PoC... So no ReadTheDocs builds have been run. Now I found a release plan and agreement to continue the work on PoC. Therefore I would like to continue using ReadTheDocs. As far as I can see, the limit for PoC has been set back to 15 minutes.

Can you explain this?
It would have been good to get a notification when our agreement ended.


On my local machine, the build needs 32 minutes to compile. Currently I cannot say why each compilation step invalidates the pickled results. Every run reads the sources from scratch.

I was thinking that, as I now have a well-paid job, I might spend some money on ReadTheDocs, even if it's still open source work. What monthly rate do you expect for 45 to 60 minutes of compile time for my account?

In addition, I would expect support by email or Skype.

@humitos
Member

humitos commented May 8, 2018

> I was thinking that, as I now have a well-paid job, I might spend some money on ReadTheDocs, even if it's still open source work. What monthly rate do you expect for 45 to 60 minutes of compile time for my account?

Maybe it's worth taking a look at https://readthedocs.com/pricing/

It also has email support :)

I'm not sure what the monthly rate for 45/60 minutes is, but you can send an email to the address listed there. Thanks!

@Paebbels
Author

Paebbels commented May 8, 2018

Sorry, but paying $50/month for an open source project is way too much!
I don't want to buy RTFD shares.

I was thinking of at most $20/month to become a Gold User.

@stsewd
Member

stsewd commented May 8, 2018

@Paebbels the $50/month is for the readthedocs.com service. A Gold membership starts at $5/month: https://readthedocs.org/accounts/gold/subscription/

@stsewd
Member

stsewd commented May 8, 2018

Just for clarification, https://readthedocs.org (org) is the open source project, and https://readthedocs.com is the commercial project (which isn't open source).

@Paebbels
Author

Paebbels commented May 8, 2018

That's clear. That's why I'm asking what is needed to get 45-60 minutes of compilation time for my open source project. I'm not paying for a commercial account to run my open source projects.

@humitos
Member

humitos commented May 8, 2018

@Paebbels I pointed to .com since I understood that there were two different repositories: one private and one public. Private repos are not supported in .org

Regarding the Gold membership, we are currently working on making it clearer and more transparent. There is a PR in progress at #4063

Also, from what I know, there is no direct support over email for Gold members. I suppose the GitHub issue tracker is the only place to get help in that case.

Anyway, I will mention this case to the rest of the team so we can analyse it together and it's not only my point of view. I will get back to you with more information. Thanks.

@davidfischer
Contributor

@Paebbels, sorry for the confusion. Our goal is not to bill open source projects and you shouldn't have to pay to host an open source project on Read the Docs. We don't intend to change that. Read the Docs is trying to clear up some of our messaging and I'm actively working on this.

Essentially our paid plans (readthedocs.com) are mostly for companies who want to host documentation on closed source stuff. They also get dedicated support channels outside of the github issue tracker and typically faster response times.

We are also working on detailing some of the benefits of Gold Members. These include a mention on our supporters page and an ad-free experience (see #4063). This is totally optional however and we aren't trying to guilt people into going Gold.

Your build time has been increased to 60 minutes. However, I am seeing errors related to too much memory. That's more curious because generally memory consumption in sphinx is pretty small (a couple hundred megs). Do you know why there would be more memory usage?

@Paebbels
Author

@davidfischer
I'm currently dividing the project into two parts:

  • The IP core library with VHDL contents
  • The Python infrastructure to automate simulations

On the one hand I want to minimize ReadTheDocs compile times, but the main reason is that others should be able to use the Python scripts I developed for PoC.

I think when the split is done, we can investigate much better which part of the documentation is stressing RTFD so much. The Python infrastructure makes heavy use of autoapi and autoprogram, whereas the IP core library part has lots of documents containing a lot of source code.

As far as I can see, the current setup produces around 240 MB of output.

I also have no clue why it's invalidating all cached data for each backend run: HTML, Single-HTML, Pickle, PDF. I suspect it's due to an extension.
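
One thing that may be worth ruling out locally is whether the builders share a doctree cache. sphinx-build keeps the parsed sources (the "doctree pickles") in a .doctrees directory under the output directory unless the -d option points somewhere else, so builders that write to separate output directories each parse the sources from scratch. The sketch below only assembles the command lines. The "docs" and "_build" paths and the sphinx_commands helper are placeholders, not part of PoC or of the Read the Docs build setup, and an extension that rewrites source files on every run would still force a re-read.

```python
import sys


def sphinx_commands(source_dir, build_dir, builders):
    """Return one `python -m sphinx` command per builder, all sharing one doctree cache."""
    doctree_dir = f"{build_dir}/doctrees"
    return [
        [
            sys.executable, "-m", "sphinx",
            "-b", builder,        # output format for this run
            "-d", doctree_dir,    # shared cache of parsed sources
            source_dir,
            f"{build_dir}/{builder}",
        ]
        for builder in builders
    ]


# Placeholder paths; print the commands instead of running them.
for command in sphinx_commands("docs", "_build", ["html", "singlehtml", "latex"]):
    print(" ".join(command))
    # To actually build: subprocess.run(command, check=True)
```

If the second and third runs still report that every source file is being read, the cache is probably being invalidated by something in conf.py or by an extension rather than by the directory layout.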

@davidfischer
Contributor

> As far as I can see, the current setup produces around 240 MB of output.

The size of the output does not necessarily yield larger memory consumption. We have projects that write very large output but consume normal amounts of memory. 240MB of memory would be totally fine but something is consuming large amounts. I haven't profiled it.

> I suspect it's due to an extension.

Seems likely.

@Paebbels
Author

How can I profile it?
I have a PyCharm installation. Can I start Sphinx (the Python module) directly in PyCharm, without the sphinx-build.exe wrapper? I know how to set up environments and other settings for such runs.

@davidfischer
Contributor

At the simplest, you could just monitor its memory consumption with top or the like. That will tell you whether it is using a lot locally. As to figuring out what is taking all the memory, you might have to use cProfile or something similar.
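
A minimal sketch of the cProfile route, using only the standard library plus tracemalloc for a rough memory figure. The helper names profile_call and profile_sphinx_build are made up for this example, and the Sphinx entry point is an assumption that holds for Sphinx 1.7 or newer, where sphinx.cmd.build.main accepts the same arguments as the sphinx-build command. Note that tracemalloc only counts memory allocated through Python, so the number will be lower than what top or the task manager shows, and tracing slows the build down noticeably.

```python
import cProfile
import io
import pstats
import tracemalloc


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and tracemalloc.

    Returns (result, peak_bytes, report), where report lists the `top`
    entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    tracemalloc.start()
    try:
        result = profiler.runcall(func, *args, **kwargs)
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(top)
    return result, peak_bytes, buffer.getvalue()


def profile_sphinx_build(source_dir, out_dir, builder="html"):
    """Profile one Sphinx build in-process (hypothetical helper)."""
    # Assumption: Sphinx >= 1.7, which provides sphinx.cmd.build.main(argv).
    from sphinx.cmd.build import main as sphinx_main

    return profile_call(sphinx_main, ["-b", builder, source_dir, out_dir])
```

Calling profile_sphinx_build("docs", "_build/html") (placeholder paths) returns the Sphinx exit code, the peak traced memory in bytes, and a text report. Running Sphinx without the wrapper should also be possible with python -m sphinx, which a PyCharm run configuration can target by module name.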

@Paebbels
Author

@davidfischer how much memory can be used in an RTFD run?

Here are some numbers for the Python Infrastructure part. So this is measured without all the documentation for the IP cores.
Reading sources: ~100 MB
Writing results: ~150 MB
This was measured with Windows task manager for python.exe; build target was set to html.

The generated HTML output is around 70 MB in size. The index lists ~6,800 entries.

PyCharm lists these project statistics:

  • 14 packages
  • 133 modules
  • 751 classes
  • 25,300 LoC / 15,300 source lines / 5,800 comment lines / 4,100 blank lines

The project makes heavy use of inheritance.


Is this still a mid-sized project or is there something big in it?

@davidfischer
Contributor

I'm going to give the project a build and see what I can see. In general, everything you're saying seems pretty normal although that is pretty large in terms of classes for autoapi. The out of memory error happens on the singlehtmllocalmedia builder and not the regular html one so I'm going to check that as well.

Our default memory limit is ~250MB (I believe it's a soft limit at 200MB and a hard limit a bit above that).

@davidfischer
Contributor

davidfischer commented May 15, 2018

The single HTML builder uses significantly more memory in my tests. The regular HTML builder was ~250MB but the singlehtml builder peaked at 620MB.
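
A comparison like this can be approximated locally by running each builder in its own process and reading that child's peak resident set size. The sketch below is Unix only and is not how the numbers above were obtained. The helper names and the "html" and "singlehtml" defaults are assumptions for illustration, and Read the Docs runs its own variants of these builders, so local figures will not match exactly.

```python
import os
import sys


def run_and_measure(cmd):
    """Run cmd to completion; return (exit_code, peak_rss) for that child alone.

    Unix only. ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    """
    pid = os.spawnvp(os.P_NOWAIT, cmd[0], cmd)
    _, status, usage = os.wait4(pid, 0)
    if os.WIFEXITED(status):
        exit_code = os.WEXITSTATUS(status)
    else:
        # Killed by a signal, for example by the kernel's out-of-memory killer.
        exit_code = -os.WTERMSIG(status)
    return exit_code, usage.ru_maxrss


def compare_builders(source_dir, build_dir, builders=("html", "singlehtml")):
    """Build once per builder and collect (exit_code, peak_rss) for each."""
    results = {}
    for builder in builders:
        cmd = [sys.executable, "-m", "sphinx", "-b", builder,
               source_dir, os.path.join(build_dir, builder)]
        results[builder] = run_and_measure(cmd)
    return results
```

For example, compare_builders("docs", "_build") with placeholder paths returns a dictionary keyed by builder name. Using os.wait4 rather than a process-wide resource counter keeps each measurement tied to a single build.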

I know there is an effort to make the singlehtml builder optional in #3220 but that isn't done yet.

The short-term solution here is to just increase your memory limit. I'm going to do that now, to 750MB. I think this memory consumption is just due to the large size of your project all being included via autoapi. If you give your build another try, I expect it to succeed.

@Paebbels
Author

Paebbels commented May 16, 2018

Wow, that's a lot of memory! Thanks for investigating.

It would be great to be able to turn it off. We are fine with PDF (or ePub) downloads. The single-file HTML version still cannot be used standalone, because it needs JavaScript files and images (at least, my output directory contains more than just one HTML file).

@Paebbels
Author

I started the build process again. It shows another error but still related to not enough memory. It's a libc error directly from the system bubbling through Python.

Exception occurred:
  File "/usr/lib/python3.5/subprocess.py", line 1490, in _execute_child
    restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory

Source: https://readthedocs.org/projects/poc-library/builds/7196859/

Judging by the log, the build was close to finishing.

Any ideas?

@davidfischer
Contributor

Jeez. That looks like an out of memory issue on the docker container as a whole.

@Paebbels
Author

Paebbels commented May 18, 2018

I'm back from a business trip, so I can go on and split the code into two repos.

Any idea when #3220 will be usable?
Could this help to improve the RTFD environment: readthedocs/readthedocs-docker-images#29?

I could use Travis-CI to build the documentation, but I cannot deploy to RTFD, right?

@davidfischer
Contributor

> I could use Travis-CI to build the documentation, but I cannot deploy to RTFD, right?

That is correct.
