-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Log git checkout
and expose to users
#3520
Conversation
@@ -173,7 +189,7 @@ def checkout(self, identifier=None): | |||
self.fetch() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably needs to pass the env
here, also 😞
I think we should be able to set a per-command option to choose whether to record that call ever. It seems like we haven't needed that in the past, but for the VCS stuff we'll want to have a subset of things that aren't recorded. One question, is there a reason we really don't want all the VCS commands recorded? It might be extra info, but it doesn't seem inherently bad to show them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should remove the BaseCLI:run()
command from the VCS tools, and move to always just proxying those commands to the Environment:run()
. We really shouldn't have two different ways of running commands in our build process, and the environments are a better abstraction.
readthedocs/projects/tasks.py
Outdated
@@ -593,7 +593,7 @@ def update_imported_docs(version_pk): | |||
version_slug = version.slug | |||
version_repo = project.vcs_repo(version_slug) | |||
|
|||
ret_dict['checkout'] = version_repo.checkout(version.identifier) | |||
ret_dict['checkout'] = version_repo.checkout(version.identifier, env) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the setup_env
here, or could we just create a new vcs_env
inside checkout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems we could pass it into the original call to project.vcs_repo(version_slug)
in the line above, and stored on the repo object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not suggesting this btw, just thinking out loud. Is it worth having a vcs_env
we use for all commands, or is passing in the setup_env
correct? I'm not sure!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My initial thought was to just use the setup_env
. The difference between setup_env
and build_env
is just that one is executed on the host, and one in the vm. Separating this further might make for more places to check for env exception vs env failure vs env pass as well, so reusing setup_env
ends up cleaner too.
Do you have a scenario that you think it might help to have a separate env?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think passing env into the VCS backend instantiation is the correct pattern. I envisioned completely removing the run()
method on the vcs backends, as we shouldn't be duplicating code there. Perhaps there are some commands that are not important to the user to expose in the UI and we can not record those, but executing all commands through the setup_env
is important for this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The setup_env
is not used for VCS checkouts since they are using their own run
. In fact, the function/task didn't receive the setup_env
, I had to added to it.
That could be a problem because that task (update_imported_docs
) is called from a couple of places where a setup_env
doesn't exist (e.g. from core/views/hooks.py
). This could be one of the reason of why a vcs_env
could be a better approach: we could create it inside the Project.vcs_repo
instead of passing around through all the code or forcing other calls to create this environment just to log the commands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something that we need to consider if we want to create a new vcs_env
is that the envs are created with this dependencies:
self.project
self.version
self.config
(from YAML)self.build
record
env_vars
. Fromget_env_vars
, which depends onself.version
self.project
self.config
In case we want to create the environment inside the Project.vcs_repo
will need to consider this dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a look at this, and I found that the only missing piece here is the build_pk
(originally passed from core.utils.trigger_build
after creating an Build object).
Solution 1
So, I'd say that this method update_imported_docs
could receive the version_pk
and build_pk
(but when called from UpdateDocsTask
we will be repeating a couple of actions like parsing the YAML and more) and create our own vcs_env
for this. If build_pk
is None
we won't record
(False
) the commads since we don't know where to record them. This will allow other uses of this function to keep working.
Solution 2
Refactor all the things!
- move
update_imported_docs
as a method ofUpdateDocsTask
- fix
test_celery.py:test_update_imported_docs
- refactor/remove
core.view.hooks._build_url
function that uses the functionupdate_imported_docs
- refactor/remove
github_build
,gitlab_build
andbitbucket_build
functions that use the_build_url
but the docstrings say those functions are deprecated in favor ofreadthedocs.restapi.views.integrations
classes (these classes usecore.views.hooks._build_version
which usestrigger_build
which usesUpdateDocsTask
--so we are safe)
I think the last solution is a better plan, but I may missing a lot of things in the middle :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We found that there are users using the old webhook endpoints. Only projects with Feature.ALLOW_DEPRECATED_WEBHOOK
can hit these ones. But anyway, we can't remove them yet.
So, the refactor would include changing calls to projects.tasks.update_imported_docs
,
by core.utils.trigger_build
which will end up calling UpdateDocsTask
in a proper way and "everything should keep working"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need think more about why we are calling into this function from other places, and build a supported API for that use case.
I believe all the other calls are so that we can re-sync the checkout on disk and the DB (to update versions on a push, for example). We should find a way to make that supported explicitly in the build tooling I think, so that we don't have to maintain a bunch of different logic. I think the best approach here is to have some kind of mechanism on the doc builds to say "Do a resync of all the on-disk stuff, but don't actually build the docs", which is mostly what your Solution 2 is talking about.
readthedocs/projects/tasks.py
Outdated
@@ -335,7 +335,7 @@ def setup_vcs(self): | |||
self.setup_env.update_build(state=BUILD_STATE_CLONING) | |||
|
|||
self._log(msg='Updating docs from VCS') | |||
update_imported_docs(self.version.pk) | |||
update_imported_docs(self.version.pk, self.setup_env) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would like to see kwargs here. I'd love to get to a world where we're explicitly using kwargs in all function/method calls, at least until we can add type checking to the calls :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also just drop this arg. Other methods use the class local self.setup_env
, so it might make sense to continue assuming this state exists.
I don't particularly like this pattern, but its at least consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update_imported_docs
is a function outside this UpdateDocsTask
class (it's used from other places). See #3520 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The output is a huge improvement, I'm really excited to have this output included and unified with the rest of the command output finally!
I'll echo the sentiment to continue cleaning up the vcs backend code and completely remove the secondary run()
method. I'll also echo the uncertainty of displaying all the commands, could you post some screenshots of the commands all executed through the build environment?
readthedocs/projects/tasks.py
Outdated
@@ -335,7 +335,7 @@ def setup_vcs(self): | |||
self.setup_env.update_build(state=BUILD_STATE_CLONING) | |||
|
|||
self._log(msg='Updating docs from VCS') | |||
update_imported_docs(self.version.pk) | |||
update_imported_docs(self.version.pk, self.setup_env) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also just drop this arg. Other methods use the class local self.setup_env
, so it might make sense to continue assuming this state exists.
I don't particularly like this pattern, but its at least consistent.
readthedocs/projects/tasks.py
Outdated
@@ -593,7 +593,7 @@ def update_imported_docs(version_pk): | |||
version_slug = version.slug | |||
version_repo = project.vcs_repo(version_slug) | |||
|
|||
ret_dict['checkout'] = version_repo.checkout(version.identifier) | |||
ret_dict['checkout'] = version_repo.checkout(version.identifier, env) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think passing env into the VCS backend instantiation is the correct pattern. I envisioned completely removing the run()
method on the vcs backends, as we shouldn't be duplicating code there. Perhaps there are some commands that are not important to the user to expose in the UI and we can not record those, but executing all commands through the setup_env
is important for this feature.
code, out, err = build_cmd.exit_code, build_cmd.output, build_cmd.error | ||
else: | ||
code, out, err = self.run('git', 'checkout', | ||
'--force', '--quiet', revision) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on your screenshots, --quiet
is not what we want it seems. The output without --quiet
was really helpful. So 👍 on removing that from the command.
One more thought:
We probably won't want to support this. In the case of our commercial hosting, this step is specifically kept out of the Docker build environment so that the build environment has no read access to the source deploy keys. |
@agjohnson I'd really like to run the VCS calls inside Docker on the .org though, so I think having it be configurable by a setting or similar is good. |
I'm working on running all the commands inside a
|
I just updated this PR with several changes implementing the idea that I commented you both. This is not a final solution but some code to discuss over something concrete. Summarizing, the idea is to have a This code will probably needs some polishing and arrangement between us, but it shows the idea I think. @agjohnson screenshot for all the VCS commands recorded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments where I have doubts or where I wanted to explain a little more about that change.
Anyway, this code still need work for sure but I will wait for your comments before continue working on this.
def _log_warn_only(self, msg): | ||
log.warn(LOG_TEMPLATE.format( | ||
project=self.project.slug, | ||
version='latest', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this, but I don't want to depend on a version
instance to just get the slug. I would like to think in a better way to implement it, though
# :'( | ||
log.warn(LOG_TEMPLATE.format( | ||
project=self.project.slug, | ||
version=self.version.slug, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one can use the version object because the *BuildEnvironment
are instantiated with this object. No real problem here.
@@ -163,7 +163,7 @@ def save_environment_json(self): | |||
with open(self.environment_json_path(), 'w') as fpath: | |||
# Compatibility for Py2 and Py3. ``io.TextIOWrapper`` expects | |||
# unicode but ``json.dumps`` returns str in Py2. | |||
fpath.write(unicode(json.dumps(data))) | |||
fpath.write(json.dumps(data)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Someone proposed to use six.text_type
, but I didn't try it yet
""" | ||
|
||
max_retries = 5 | ||
default_retry_delay = (7 * 60) | ||
name = __name__ + '.update_docs' | ||
|
||
# TODO: the argument from the __init__ are used only in tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noted this because I found these non-required attributes kind of confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps some docs should exist on these? Optionally, we could use a separate pattern for this, such as a separate class method used just for testing. I have no strong opinions here though, just throwing out some options.
readthedocs/projects/tasks.py
Outdated
version_repo = self.project.vcs_repo( | ||
self.version.slug, | ||
# When ``sync_only`` we don't a setup_env | ||
getattr(self, 'setup_env', None), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also kind of hacky: sync_repo
can be executed before (sync_only=True) or after the run_setup
(the one that creates the self.setup_env
we want to use here in case it exists)
@@ -21,7 +22,7 @@ class Backend(BaseVCS): | |||
|
|||
def update(self): | |||
super(Backend, self).update() | |||
retcode = self.run('bzr', 'status')[0] | |||
retcode = self.run('bzr', 'status', warn_only=True)[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This command is used to check the existence of the repository and call pull
or clone
for example.
@@ -66,21 +66,21 @@ def checkout_revision(self, revision=None): | |||
revision = 'origin/%s' % branch | |||
|
|||
code, out, err = self.run('git', 'checkout', | |||
'--force', '--quiet', revision) | |||
'--force', revision) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are exposing the output to the user, I removed all the quiet
arguments.
@@ -23,7 +24,7 @@ def pull(self): | |||
(pull_retcode, _, _) = self.run('hg', 'pull') | |||
if pull_retcode != 0: | |||
raise RepositoryError | |||
(update_retcode, stdout, stderr) = self.run('hg', 'update', '-C') | |||
(update_retcode, stdout, stderr) = self.run('hg', 'update', '--clean') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expand the arguments to use the long form which is clearer.
|
||
""" | ||
Base for VCS Classes. | ||
|
||
Built on top of the BaseCLI. | ||
VCS commands are ran inside a ``LocalEnvironment``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, could be any Environment
not just Local
. By default, is LocalEnvironment
environment = os.environ.copy() | ||
|
||
# TODO: kind of a hack | ||
del environment['PATH'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to delete this one because there is an assert
in the code that doesn't allow the PATH
in the environment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably best anyways? I think we want tight control over this.
Now that I've seen how many commands are raised to the user, I feel like there is perhaps some filtering of commands that we need to do. I think it's important to reduce redundancy in the implementations and use a central execution pattern, but it probably isn't useful for users to see all of these commands. For instance, if we do a This will probably also get around the issue that a command like |
To add another place where we are running commands outside a Build and even outside an Environment, I just found this: (used for symlinks, for example) |
Yeah, that's the other one that needs to be unified. Let's address that separately though. I'll open a ticket for that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! I could use some more explanation around moving the update docs task into the main build task, as it seems like a conditional use of the main task. Maybe all that is required here is an abstraction though.
We already discussed the addition of a record
on run()
, but i agree the verbose cli options is a necessary addition for all of the vcs commands. It makes the user view more obvious and makes it easier for development as well.
|
||
class LocalEnvironment(BaseEnvironment): | ||
|
||
# TODO: BuildCommand name doesn't make sense here, should be just Command |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, this meaning has changed now. 👍 on changing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, maybe this is a later refactor after all. If we generalize the naming, we should probably also move out of doc_builder/
app.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened a ticket for this: #3541
kwargs['environment'] = self.environment | ||
|
||
# TODO: ``build_env`` is passed as ``kwargs`` when it's called from a ``*BuildEnvironment`` | ||
# kwargs['build_env'] = self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite follow this comment, but I assume this note makes more sense for you :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to leave that comment to discuss a little about this.
build_env
was self
, but now if it comes, it's from the overriden run
method from the class that is calling this method.
Sounds a little elaborated, that's why I'd appreciate your thoughts on this.
self.commands.append(build_cmd) | ||
build_cmd.run() | ||
|
||
# TODO: I don't like how it's handled this entry point here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is your concern here? I can see how warn_only
would be confusing as we're now considering several different return states:
- Command passed, report
- Command failed, report
- Command failed, report and fail the build
- Command passed, don't report
- Command failed, don't report
- Command failed, don't report but fail the build (this probably shouldn't be used ever, but maybe there are steps that should raise a general error?)
- Command failed, but report as pass? (I don't know if this still exists though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This TODO comment was moved accidentally. It belongs to this line: https://github.com/rtfd/readthedocs.org/pull/3520/files/7f6c098983acb4bb3afc90e5c780af7dae9ec9d8#r163242699
I agree with the return states, but I'd say that the last two shouldn't exist at all. The rest, are needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Command failed, but report as pass? (I don't know if this still exists though)
This is tricky, but it does exist. Although, we are not using it but maybe we need it.
For example, hg tags
command. I suppose that we want to record=True
it but we want to warn_only=True
also, since it could fail but we don't want the Build to fail. Besides, if it's recorded and failed (exit status != 0) it will be presented in red to the user and we don't want that.
Is that a good example of this use case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sumarizing, I think we want the warn_only
commands to be recorded but don't interfers in the build status nor in the command itself (when presented to the user).
Maybe what we need to change is the BuildCommandResultMixin.successful
property to something like:
return self.exit_code == 0 and self.warn_only == False
but that would involve to save the warn_only
in the database also and I don't know if this is getting more and more complicated :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another possible idea is that if warn_only
we record it, but we set the exit_code
as 0
(this could end up with other problems like the Latex build fail and we will never know it, even when opening the Build interface since it won't be marked as failed -in red).
So, add another attribute? warn_only
and force_success
😛 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Its sounding like we also want a force_success
option as well here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meh, I just realized that we need something like record_as_success
instead of force_success
because we are using git status
exit code to decide if clone
or pull
, so we need the real exit code in the flow but we want to save a fake one in the databse...
if not warn_only: | ||
# TODO: maybe receive ``record`` as an attribute for skip/record | ||
# just specific commands but not all of them ran under the | ||
# *BuildEnvironment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This note sounds like we want a record
kwarg on command execution? I don't quite understand the comment on *BuildEnvironment
though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hehe.
Yes, I was thinking on having the record
attribute passed in the run
method to override the one from the __init__
from *BuildEnvironment
classes (now we have LocalEnvironment
and (Local/Docker)BuildEnvironment
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, i think a record
arg on run()
makes sense. Given __init__(record=False, ...)
, and run(record=True, ...)
, what do think we should do? I'd probably say don't record if __init__(record=False)
, thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to follow what run(record=)
is set independently how was the class initialized.
Why? Because sometimes the LocalEnvironment
is initialized far away (in another file and just passed as argument) from the code where the command is ran and you want/need to override the default record.
I implemented this at https://github.com/rtfd/readthedocs.org/blob/e1766a4d1061f7e97c75039b84a196088b111164/readthedocs/doc_builder/environments.py#L310-L312
# TODO: do we want to save commands that FAILED but not raised and | ||
# exception? This will cause the first `git status` (when | ||
# importing) to fail and be marked with RED in the Build command | ||
# details |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raised in general review of this, I'd say having a record
option that wouldn't record this command at all is a better pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need both: record=False
and warn_only=True
In fact, I'm thinking that record=False
should implicit mean warn_only=True
because the False
case doesn't make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, good point. I think you're right.
""" | ||
|
||
max_retries = 5 | ||
default_retry_delay = (7 * 60) | ||
name = __name__ + '.update_docs' | ||
|
||
# TODO: the argument from the __init__ are used only in tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps some docs should exist on these? Optionally, we could use a separate pattern for this, such as a separate class method used just for testing. I have no strong opinions here though, just throwing out some options.
readthedocs/projects/tasks.py
Outdated
""" | ||
Run a documentation build. | ||
Run a documentation sync or sync n' build. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does sync
mean in this context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sync
means: "update docs from VCS (sync tags/branches and trigger stable
build if the stable version has changed)"
Where sync
is not a good summary for that :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Roger, this makes sense.
We do describe a lot of things as syncing
(we also overuse the words backend
and environment
). This doesn't need to change though, I just was looking for some clarification to help understand the changes.
readthedocs/projects/tasks.py
Outdated
@@ -149,9 +145,15 @@ def run(self, pk, version_pk=None, build_pk=None, record=True, | |||
self.build_force = force | |||
self.config = None | |||
|
|||
if sync_only: | |||
self.sync_repo() | |||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding my note above on the significance of sync
, is there a strong reason to make this task conditionally operate like this? It seems like this should be a separate task instead?
If the purpose of this is code reuse, perhaps a mixin class that two tasks can share would make more sense.
If the sync operation requires the same amount of arguments/etc, then it's probably not a strong case for task isolation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something that I talked to @ericholscher, and I saw it very clear at the beginning while we were talking, but the implementation is not as good as I imagined.
I wanted to avoid the needing of handling the LocalEnvironment inside the update_imported_docs
and also try to avoid the "jump to a middle step of the flow": update_imported_docs
is like the third step of the whole UpdateDocsTask
and was used from outside also -I didn't like that.
Besides, the use of this sync_only
from outside was done only from https://github.com/rtfd/readthedocs.org/blob/7f6c098983acb4bb3afc90e5c780af7dae9ec9d8/readthedocs/core/views/hooks.py#L106 and that endpoint is deprecated. So, I'd say that we shouldn't be calling this anymore in the future.
Regarding your proposals and considering what I wrote on this comment, maybe a Mixin class from where two different Task class inherit is the best approach: we share code, it's cleaner and easy to refactor/remove when finally we don't use the sync task alone anymore.
I will work on this implementation and see how it goes, but it should work :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a commit with this change at c79f9f1
Let me know if it's clearer for you now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Another option could be to start using celery's builtin task chaining. Without looking deeply into this I don't know if this would clear up any confusion around the patterns we're using here. For now this accomplishes what we want however, perhaps moving to more common celery patterns can be a later batch of work.
try: | ||
# Hit the API ``sync_versions`` which may trigger a new build | ||
# for the stable version | ||
api_v2.project(self.project.pk).sync_versions.post(version_post_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I waffle on making this API driven, or just using executing this as a broadcast task directly. It seems calling the API is an unnecessary level of abstraction. Do you have any strong opinions here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I don't.
I didn't write this code, I moved from update_imported_docs
and I don't fully understand it --I just added the comment :) So, I don't have a strong opinion on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if there is historical significance here either. We should probably just address in a separate issue.
environment = os.environ.copy() | ||
|
||
# TODO: kind of a hack | ||
del environment['PATH'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably best anyways? I think we want tight control over this.
# exception? This will cause the first `git status` (when | ||
# importing) to fail and be marked with RED in the Build command | ||
# details | ||
self.record_command(build_cmd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like that because to record a command we need to be inside a build, but this LocalEnvironment
knows nothings about a build. So, for this class I override it with just a pass
and the *BuildEnvironment
has its own to really record the commands
self.commands.append(build_cmd) | ||
build_cmd.run() | ||
|
||
if record and not warn_only: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not saving the warn_only
commands involves another problem: "latex command are not being recorded" :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of warn_only
should produce two outcomes for commands:
warn_only=False
means that commands that fail will raise exceptions and halt the buildwarn_only=True
means that commands that fail will be recorded as failing in the database, but an exception won't be raised. The build will continue after this.
So, I don't think we want warn_only
to imply that a task is not recorded, record=False
should be used explicitly if we want a command that can fail and doesn't need to be recorded.
67c74fa
to
d83515f
Compare
In case we want to apply the same method that I used at #3560 (applying the styling in another PR to simplify the review), I wrote this command to do that in an easier way:
It runs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! I'm going to give this some QA before merge, but I just raised a couple of questions on some of the notes we left in code. We can probably update this to make sure our future selves know exactly what needs to change.
""" | ||
if record is None: | ||
# ``self.record`` only exists when called from ``*BuildEnvironment`` | ||
record = getattr(self, 'record', False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raised in another comment of mine, it seems we probably want to respect the class record
if it is False
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do that, once you initialize the class with False
there is no way to change the behaviour for a specific command. I supposed that we want to override the default value for a specific command.
This would be confusing from my point of view:
- initialize the class with
False
- call this method with
record=True
- it won't be recorded :/
To me, it's the same (opposite) case as "initialize the class with True
but this and this command I don't want to record, so I call it False
"
@@ -310,7 +416,9 @@ def __init__(self, project=None, version=None, build=None, config=None, | |||
self.environment = environment or {} | |||
self.update_on_success = update_on_success | |||
|
|||
# TODO: remove this one, comes from super |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anything blocking this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Done
if record: | ||
# TODO: I don't like how it's handled this entry point here since | ||
# this class should know nothing about a BuildCommand (which are the | ||
# only ones that can be saved/recorded) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure we discussed this already, so apologies if we're rehashing this :)
What would be a better pattern to use here? Or is there something more specific we can assign to ourselves with this TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, we don't have an strong opinion in how to improve this but I was thinking on something like how the signal works: pre_something
, post_something
, etc. The behaviour will be the same, but it will be correctly named instead of using record
for something that know nothing about recording the commands.
So, we could call methods like pre_run_command
and post_run_command
so inherited classes can do whatever they want (in this example, the post_run_command
will record it)
What do you think? In case you like it, another PR or the same one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, interesting thought, I do like the plan. I don't think we need to do this now, but we could always turn this todo into a ticket. I have a "Refactoring" milestone that we can assign this to if so -- this would be a great place for people to jump in and clean up code.
These are two separated tasks that share some code by inheriting from SyncRepositoryMixin. The final goal (as ``core.views.hooks._build_url`` is *DEPRECATED*) is to finally remove the SyncRepositoryTask and merge that code into UpdateDocsTask.
Either `record=False` or `warn_only=True` commands are not recorded. This attributes can be set at initialization time or when calling the `run()` method.
We need the real exit_code for some command since there are decisions based on that code, but we want to record it as success. So, the fake exit_code is saved in the database but the real exit_code is used in the whole flow of the building process.
83d2c26
to
30b7cef
Compare
I've been taking a look at the code and I found some consideration that we need to take a look:
BaseVCS
class has it ownsrun
method that works in a similar way thanBuildEnvironment.run
log_tmpl
string differscwd
pathBaseVCS
are not logged as aBuildCommand
BuildCommand
needs to be instantiated withbuild_env
to make be able to.save()
/record the command for the currentBuild
object.build_env
could beLocalEnvironment
orDockerEnvironment
Depending on how much we want to refactor, a minimalistic solution would involve:
setup_env
toupdate_imported_docs
version_repo.checkout
checkout
consider use if thatenv
if notNone
env.run
with propercwd
This is the what this PR implements but, this leaves a mix of calls to
run
method betweenBaseVCS
andLocalEnvironment
for the same purpose: update the repository's code. So, maybe we should consider to just pass thesetup_env
to the init ofBaseVCS
and execute everything undersetup_env
.The problem of using always the
setup_env
(or maybe a new one calledvcs_env
) is that therecord
affects all the commands ran under that env and we want to record onlyclone
andcheckout
commands --therun
method could receive an optionalrecord
to override the default behaviour.It's a good starting point to discuss a little about this.
Closes #3397