Hosting: manual integrations via build contract #10127
This is a really interesting example. I like how it basically just duct tapes the pieces together, and doesn't require any user interaction. There's a bunch of design discussions downstream of this, but just having an example of how this might work is great. It's a lot less hacky if we ship Let's chat more about this tomorrow, since I don't know what level of feedback is particularly useful at this stage, but I think we can get pretty close to something we could share with users as a demo at least.
Instead of prepending all the commands with the `PATH=` variable, we pass the environment variable directly to the Docker container. This allows us to run multi-line shell scripts without failing with weird syntax errors. Besides, the implementation is cleaner since all the environment variables are passed to the commands in the same way. I added some _default paths_ that I found by checking the local Docker container. I'm also passing the user's path, depending on whether we are working locally as root or in production. This is not 100% complete and there may be some other issues that I'm not seeing yet, but I think it's a first step towards behaving the way our users expect. Closes #10103
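To illustrate the difference, here is a minimal sketch of passing variables through `docker exec -e` instead of prefixing each command with `PATH=...`. The helper name `docker_exec_args`, the container name, and the default paths are hypothetical and only for illustration; this is not the code from this PR.

```python
def docker_exec_args(container, script, environment):
    """Build a `docker exec` argument list, passing each variable via `-e`."""
    args = ["docker", "exec"]
    for key, value in sorted(environment.items()):
        args += ["-e", f"{key}={value}"]
    # The whole (possibly multi-line) script goes to one shell invocation,
    # so nothing has to be prepended to the individual commands.
    args += [container, "/bin/sh", "-c", script]
    return args


# Hypothetical default paths; the real ones come from inspecting the image.
DEFAULT_PATHS = ["/usr/local/bin", "/usr/bin", "/bin"]
environment = {"PATH": ":".join(DEFAULT_PATHS), "READTHEDOCS": "True"}
script = "echo one\necho two"
args = docker_exec_args("build-1234", script, environment)
```

The resulting list can be handed to `subprocess.run(args)` unchanged, and the multi-line script stays intact as a single argument.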
Locally it tries to revert back 🤷
…s/readthedocs.org into humitos/hosting-integrations
`X-RTD-Hosting-Integration: true/false` This can be used from CloudFlare to decide whether or not to inject a `<script>` into the resulting HTML.
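The header check could be sketched as follows. In practice this logic would live at the edge (CloudFlare or NGINX) rather than in Python; the function name `inject_client` and the script URL are assumptions made only to show the decision.

```python
# Hypothetical script location, used only for this sketch.
SCRIPT_TAG = '<script async src="/_/static/readthedocs-client.js"></script>'


def inject_client(html, headers):
    """Insert the client script before </head> when the header says so."""
    flag = headers.get("X-RTD-Hosting-Integration", "false")
    if flag.lower() != "true":
        return html
    position = html.find("</head>")
    if position == -1:
        # No <head> to inject into: leave the document untouched.
        return html
    return html[:position] + SCRIPT_TAG + html[position:]


page = "<html><head><title>Docs</title></head><body></body></html>"
enabled = inject_client(page, {"X-RTD-Hosting-Integration": "true"})
disabled = inject_client(page, {"X-RTD-Hosting-Integration": "false"})
```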
We touched on this in our call already, but I want to make my position clear about this here. IMO, we have to force the injection of our own JavaScript file. Otherwise, many projects won't include our JavaScript and most of the good features we offer won't be present in their documentation. Even people wanting to integrate with us will face a bigger barrier to doing so, commercial customers in particular. Besides, on the community side, most projects won't have ads, which is not a minor issue. I think if we don't inject the JavaScript automatically, we won't differentiate ourselves from GitHub Pages or any other plain-text hosting, I'd say.
I updated the description of this PR with a lot more context and a more polished idea of how all these pieces fit together.
Overall, this implementation looks great, especially considering the boundaries on goals. I think it's helpful to get some experimentation here on the backend implementation, and have some direction at a high level on how we'll inject JS. All of my questions, and all the places I want to put a bit more thought on patterns and API design are in your v2 list. The goals seem great, and for that, I'd 100% skip the V2 parts of the JS implementation -- HTML templates, API overrides, etc. This is where I'll have the most technical input, there are features from Webpack we'll use here, and extra libraries we'll rely on. It's still fuzzy to me why a Do you have a better idea now of what configuration options we need in a contract file? The features list in the contract file still stands out -- and I'm just talking in the context of a contract file, not feature enabling/disabling API design or patterns for the next iteration of this work. A contract file may help with reproducible builds somewhat, but there are also very real cases where configuring features at the request level will be much better UX:
I see some benefit to controlling features at the request level and configuring them in the admin dashboard. We'd use the same data structure, but included in the documentation via a
I'd agree, and I think we can very easily keep the footprint of this file very minimal as well.
readthedocs/doc_builder/templates/doc_builder/readthedocs-data.html
This space is reserved for all the configuration that only the doctool knows at build time (or that needs to be generated dynamically). I understand why you are not finding it required and that's probably because all the data that I put as example there is not mandatory. We need more feedback from other doctools (and users!) here to find out if there are things we strictly require from them. I don't have that information yet and we should keep trying out other doctools as I was doing previously and keeping this in mind. For now, I refactored a little the
@ericholscher and I briefly touched on this idea on Thursday. I thought it could be a good idea to have the ability to decide things dynamically at request time. However, I didn't come up with good arguments while talking about this. Also, Eric mentioned it was simpler to just serve a I think the questions you are raising here are important and may make us use an endpoint as you are suggesting. If that's the case tho, all the That said, I understand that you agree with 80% of this proposal --without considering the specific JavaScript technical aspects. That missing 20% would be:
This is a really simple, beautiful implementation. I think we can trim down the amount in this PR already, and get something shipped.
Do we want to ship this with the app, or version it directly on S3, like we do with the ad client? I think keeping it deployable outside of the application seems right to me.
https://github.com/readthedocs/ethical-ad-client/tags
https://media.ethicalads.io/media/client/v1.4.0/ethicalads.min.js
I haven't thought too much about this and I don't have experience deploying the script with a different process than the normal one. I'm not sure about the pros/cons here.
I put this file here because we need it for development as well. We could just put this file in the local MinIO S3, tho, as well.
Note this file is generated by running `npm run build` from another repository.
We also need to think about which version we are going to serve by default. The latest one? Would people be able to pin to a particular version? Are we going to support multiple versions at the same time? How do we deploy new features to those that are pinned to an older version? Do we care? Too many questions 🤷🏼
I think we always deploy the latest, but if the goal is for other people to integrate the library, then it can be versioned based on their needs. The versioning is primarily valuable for strict SRI protection, like PyPI does, to validate the hash of the library hasn't changed for security reasons.
I think we should definitely deploy the client outside of our application. We don't need to decide on a proper deployment pattern yet though, but I think we should keep it out of the application from the start.
For now, we can just manually upload it to S3, and use that everywhere?
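As a rough illustration of why pinned versions matter for SRI: the integrity value is derived from the exact bytes of the file, so any change to a published version changes the hash. A minimal sketch of computing such a value follows; the function name `sri_hash` and the sample file contents are made up for the example.

```python
import base64
import hashlib


def sri_hash(content: bytes) -> str:
    """Return a Subresource Integrity value (sha384) for the given bytes."""
    digest = hashlib.sha384(content).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


# Two hypothetical builds of the client: different bytes, different hash.
v1 = sri_hash(b"console.log('client v1');")
v2 = sri_hash(b"console.log('client v2');")
```

The returned string is what would go in the `integrity` attribute of a `<script>` tag that points at a versioned URL.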
# Copy the YAML data into `Version.build_data`.
# It will be saved when the API is hit.
# This data will be used by the `/_/readthedocs-config.json` API endpoint.
self.data.version.build_data = data
Do we want to actually parse the data into memory, or just store the file contents directly as a string? I think we just want to store a string to start? I'd like to avoid as much YAML parsing as possible...
I also wonder if we should make this file JSON, instead of YAML? If the goal is for it to be aligned with the JSON data returned via the API, I think that makes more sense. But if it's closer to our `.readthedocs.yaml` config, then YAML makes sense 🤔
> Do we want to actually parse the data into memory, or just store the file contents directly as a string? I think we just want to store a string to start? I'd like to avoid as much YAML parsing as possible...
I think we will need to parse it so we can validate it at some point, anyways.
This also allows us to use a JSON field in the database that we can query in the future, looking for answers.
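A minimal sketch of what validating the parsed data could look like once it is a plain dictionary. The contract is not defined yet, so the keys `version` and `features` below are purely hypothetical placeholders, as is the function name.

```python
# Hypothetical contract: the real fields are still to be defined.
EXPECTED_KEYS = {"version": int, "features": dict}


def validate_build_data(data):
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(data, dict):
        return ["top-level value must be a mapping"]
    errors = []
    for key, expected_type in EXPECTED_KEYS.items():
        if key not in data:
            errors.append(f"missing key: {key}")
        elif not isinstance(data[key], expected_type):
            errors.append(f"{key} must be of type {expected_type.__name__}")
    return errors


ok = validate_build_data({"version": 1, "features": {}})
bad = validate_build_data({"version": "1"})
```

Because the validated value is a plain dictionary, it can be stored as-is in a JSON field and queried later.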
> I also wonder if we should make this file JSON, instead of YAML? If the goal is for it to be aligned with the JSON data returned via the API, I think that makes more sense. But if it's closer to our `.readthedocs.yaml` config, then YAML makes sense 🤔
I decided to use YAML here on purpose. It's a lot easier to write than JSON, a lot less nitpicky (e.g. JSON doesn't allow a trailing comma after the last element of a list), supports comments, works better with more data types, and more. The structure is going to be just a dictionary; YAML is going to be just the representation/serialization of it.
In particular, being able to put comments in `.readthedocs.yaml` is important for ourselves and our users as well. That was one of the reasons why I picked YAML for this file as well. Otherwise, you end up with things like this in JSON:
readthedocs.org/readthedocs/proxito/views/hosting.py
Lines 36 to 39 in 48de597
"comment": (
    "THIS RESPONSE IS IN ALPHA FOR TEST PURPOSES ONLY"
    " AND IT'S GOING TO CHANGE COMPLETELY -- DO NOT USE IT!"
),
Yea.. let's use YAML for anything human-writable, and JSON for machine-writable 👍
readthedocs/proxito/views/hosting.py
# TODO: why the UnresolvedURL object is not injected in the `request` by the middleware.
# Is is fine to calculate it here?
unresolved_url = unresolver.unresolve_url(request.headers.get("Referer"))
version = unresolved_url.version
@humitos I see we're using the referer here. Is this because we don't want to pass in the domain explicitly as a GET arg? I think we likely want to actually use the request.host here, instead of a header that the browser is setting.
Not strong preference. We can use something different here. I used the `Referer` because it was what I had at hand and easy to get.
A `url=` GET attribute probably seems fine as well, since it's what we are using in other API endpoints (analytics and embed api, for example). Referer also makes testing harder since "copying and pasting a URL doesn't work" --I've ended up hardcoding the URL in the code because of this 😅
I think we should combine both, `url=` GET attribute with a validation with `request.host` on that URL.
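The combination suggested above could be sketched like this, using only the standard library. The function name `url_matches_host` is hypothetical, and the port handling is deliberately naive (it does not cover IPv6 literals).

```python
from urllib.parse import urlsplit


def url_matches_host(url, request_host):
    """Check that the `url=` value points at the host serving the request."""
    if not url:
        return False
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    # Drop an optional ":port" suffix from the Host value before comparing.
    host = request_host.split(":")[0].lower()
    return parts.hostname == host


same = url_matches_host("https://docs.example.com/en/latest/", "docs.example.com")
other = url_matches_host("https://evil.example.org/en/latest/", "docs.example.com")
```

A view would reject the request when this returns `False`, and only then hand the URL to the unresolver.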
# TODO: define how it will be the exact JSON object returned here
# NOTE: we could use the APIv3 serializers for some of these objects if we want to keep consistency.
# However, those may require some extra db calls that we probably want to avoid.
data = {
It seems like we're re-creating the footer API, but with JSON instead of HTML. That seems reasonable, but I do think we'll want to think more about what this API is returning.
It seems like we actually want the client to be able to tell us exactly what data points they want? Something similar to what we're doing with apiv3 with the `expand` and `fields` parameters? That way we give the caller control over how much data they want...
That might be an over-optimization at this stage, but I think we want to design this with expandability and performance in mind.
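For reference, a `fields`-style filter can be very small. The sketch below is not how APIv3 implements it; the function name and the sample response keys are assumptions chosen only to show the idea of letting the caller pick top-level keys.

```python
def select_fields(data, fields_param):
    """Keep only the top-level keys named in a comma-separated parameter."""
    if not fields_param:
        # No filter requested: return everything.
        return dict(data)
    wanted = {name.strip() for name in fields_param.split(",") if name.strip()}
    return {key: value for key, value in data.items() if key in wanted}


# Hypothetical response shape.
response = {"project": "docs", "versions": ["latest", "stable"], "features": {}}
only_versions = select_fields(response, "versions")
```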
> I do think we'll want to think more about what this API is returning.
Yes. This is one part of the design decisions I mentioned we have to sit down, talk together, think about and decide what's the best structure.
> It seems like we actually want the client to be able to tell us exactly what data points they want? Something similar to what we're doing with apiv3 with the `expand` and `fields` parameters? That way we give the caller control over how much data they want...
This is a good point to start considering and thinking about, yeah. However, I don't think we will require that complexity because:
- the API endpoint will be cached
- most of the values are static
- there are a few db queries only (`Project` and `Project.versions(active=True)`)
- we could decide whether or not to return some specific data based on the feature being enabled. If `non_latest_version_warning` is disabled, we don't calculate/query nor return the list of active versions this feature consumes in the client.
In any case, I think we will arrive at the best answer by exploration, experimentation and brainstorming. Also, once we deploy this, we will have some data in NR to know how much time this endpoint takes on a real production db.
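The last point in the list above can be sketched as follows. The shape of the response and the `features` mapping are assumptions; `fetch_active_versions` stands in for whatever query returns the active versions, and is only called when the feature needs it.

```python
def build_config(project_slug, features, fetch_active_versions):
    """Build the response, skipping the versions query when it is not needed."""
    data = {"project": project_slug, "features": features}
    if features.get("non_latest_version_warning"):
        # Only pay for the query when the client will consume the result.
        data["versions"] = fetch_active_versions()
    return data


calls = []


def fake_active_versions():
    """Stand-in for the database query; records that it was called."""
    calls.append("query")
    return ["latest", "stable"]


without = build_config("docs", {"non_latest_version_warning": False}, fake_active_versions)
calls_after_disabled = len(calls)
with_versions = build_config("docs", {"non_latest_version_warning": True}, fake_active_versions)
```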
Yea, I don't think the performance of this approach is inherently bad, but more that we might want to make this look more like an API for additional UX options in the future. No user will expect passing GET args to a `.json` file. However, if the file doesn't grow too large, always returning all of it isn't too bad. But I do expect we'll feel constrained by the design in the future, as we want to make this endpoint smarter, so starting with an API instead of a JSON file seems like the proper design to me.
To be clear, I'm fine shipping this as a test for now.
But I do think over time if this is the "one endpoint to get all the data you need in the page", there will be demands on this functionality that are resource intensive, vary by user, etc. That is where we'll start to get value from the API approach.
Good point about the .json name in the URL endpoint. We should definitely change that as first step.
I will think a little more about the expand argument. I don't have a strong opinion on that yet 😅
When we get to discussing the API structure in the next phase, this would be a good conversation. This API could help us avoid APIv3 complexity.
Also on the difference between this and our API, would it make sense to use DRF APIv3 resources directly in the response here, or make this an APIv3 endpoint? This way we're not making a 4th API version of sorts. The response for versions could be identical to the APIv3 response for versions as well.
We'll come back to the data security question around private projects/versions pretty quickly here, but we already need to address it for this API endpoint and private projects.
I'm good to ship this initial experimentation next week so we can play with it in production.
Alternatively, we could just use defaults in the `readthedocs-client`, and not require a server response for initial testing? We could also just return some JSON data via a CF Worker to make it faster to iterate with this, instead of requiring a full Django deploy.
I'd prefer if we can get this from the server itself. This will also help me to understand how much time this endpoint takes to return via NR and also confirm that everything is in its place.
This is a good option in case we need it, yeah. I'm not sure how many times we will be changing this response. It's definitely gonna change, and big, probably. But maybe it changes once and it's done. The main changes are gonna happen in the js client, without requiring too many changes in the API response, I'd guess. In any case, we can keep doing all the testing of this stuff locally while developing. We can also deploy more often if required 😉
But then I can't play with it :) I guess there's always ngrok, or I just get it running locally :)
He he. Note that you can add/update data from the build process itself, by creating a YAML file 😉. That will give us some extra control to perform changes in the API response without deploying 😏
So, we can change the js client by just uploading a new js file to S3. We can change the API response by generating a YAML file in the build process
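As a sketch of the second half of that, a build step could write the file like this. The path `$READTHEDOCS_OUTPUT/html/readthedocs-build.yaml` comes from the PR description; the function name and the data keys are hypothetical since the contract is not defined yet. The sketch serializes with `json` because YAML 1.2 is designed as a superset of JSON, which avoids needing a YAML library in the build.

```python
import json
import os
from pathlib import Path


def write_build_contract(data, output_dir=None):
    """Write `readthedocs-build.yaml` under `<output>/html/` and return its path."""
    output_dir = output_dir or os.environ["READTHEDOCS_OUTPUT"]
    target = Path(output_dir) / "html" / "readthedocs-build.yaml"
    target.parent.mkdir(parents=True, exist_ok=True)
    # JSON output is also readable as a YAML flow-style document.
    target.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    return target


# Hypothetical payload: the real fields are part of the contract to be defined.
EXAMPLE_DATA = {"version": 1, "features": {"non_latest_version_warning": True}}
```

Calling `write_build_contract(EXAMPLE_DATA)` from a custom build command would be enough to change what the endpoint returns on the next build, without a deploy.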
…mitos/hosting-integrations
Latest updates:
I'm going to merge this PR once all tests pass.
I don't understand why I'm receiving |
There is one call to `open` that we should replace with `safe_open` before deploying this change
return

try:
    with open(yaml_path, "r") as f:
We should use `readthedocs.core.utils.filesystem.safe_open` here
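For context on why a plain `open` is risky with user-generated build output, here is a sketch of the kind of checks a safer open can perform: refuse symlinks and refuse paths outside an allowed directory. This is not the implementation of `safe_open`; the function `open_inside` and its behavior are assumptions for illustration only.

```python
from pathlib import Path


def open_inside(path, base_dir, mode="r"):
    """Open `path` only if it is a non-symlink located under `base_dir`."""
    base = Path(base_dir).resolve()
    candidate = Path(path)
    if candidate.is_symlink():
        raise ValueError(f"refusing to open symlink: {candidate}")
    # Resolving also follows symlinked parent directories, so a path that
    # escapes through one of them fails the containment check below.
    resolved = candidate.resolve()
    if base not in resolved.parents:
        raise ValueError(f"path is outside of {base}: {resolved}")
    return open(resolved, mode)
```

A user-controlled `readthedocs-build.yaml` that is a symlink to a file outside the build output would be rejected before any content is read.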
# TODO: why the UnresolvedURL object is not injected in the `request` by the middleware.
# Is is fine to calculate it here?
Only the unresolved domain is injected for all proxito views, the unresolved URL is needed only when serving docs, so you need to call it manually only when you need it. And you are parsing the URL from a query param, so you need to call it manually anyway.
Integration and design for the future of the Read the Docs features integrations.
Requires generating the `readthedocs-client.js` from https://github.com/humitos/readthedocs-client and copying to this repository and then collecting the statics:
cp ../readthedocs-client/dist/readthedocs-client.js readthedocs/core/static/core/js/readthedocs-client.js
inv docker.manage 'collectstatic --noinput'
Goals
`readthedocs-sphinx-ext`, `readthedocs-sphinx-search`)
`READTHEDOCS` environment variables)
Workflow
`build.jobs` or custom commands via `build.commands`) [1]
`$READTHEDOCS_OUTPUT/html/readthedocs-build.yaml` file at build time with a pretty known structure (the "build contract" that we need to define, but we have an idea of what fields are required already) [2]
`readthedocs-build.yaml` and stores its data into the `Version` object. This will be used to communicate data from the builder to the frontend.
`readthedocs-client.js`)
`readthedocs-client.js` file loads all the extra required JS and CSS files dynamically.
`/_/readthedocs-config.json`) to get all the config required to perform the hosting integrations. The data generated at build time via `readthedocs-build.yaml` is also returned by this endpoint.
Note that some of these "hosting integrations" are already implemented in our Sphinx extensions and they have to be rewritten in JavaScript. I migrated some basic ones for now and put them into the client repository as an example.
Integration points from a user's perspective
Nothing.
Optionally, the user could configure the build process to generate the `readthedocs-build.yaml` at build time with a specific structure (to be defined) to communicate some extra data that's only known at build time.
v2 and future projections
`semver`)
`readthedocs-build.yaml` and enforce a specific structure
Related
`build.commands` #9755
External related issues
Footnotes
1. Read the Docs passes data to the doctool via environment variables, exactly as we are doing now. However, we may want to add some extra variables (see Build: expose VCS-related environment variables #9423).
2. This YAML file is the way for the doctool to pass data back to Read the Docs and configure its integrations.
3. It's done at NGINX for now, but it can be feature flagged and done at CloudFlare.