Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task to remove orphan symlinks #3543

Merged
merged 7 commits into from
Mar 23, 2018
Merged

Conversation

humitos
Copy link
Member

@humitos humitos commented Jan 23, 2018

This is an initial implementation of this task to remove orpahn symlinks, which should be ran periodically every 1h aprox.

In the ticket, I also suggested that the Edit action doesn't make sense to me and should be reconsidered.

Finally, if we consider to remove the Edit action, this task should be run just once and should be enough.

Closes #3493

for domain_path in [symlink.PROJECT_CNAME_ROOT, symlink.CNAME_ROOT]:
for domain in os.listdir(domain_path):
try:
Domain.objects.get(domain=domain)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A different approach on the implementation, making just one query to database could be like this:

            valid_cnames = set(Domain.objects.all().values_list('domain', flat=True))
            orphan_cnames = set(os.listdir(domain_path)) - valid_cnames
            for cname in orphan_cnames:
                os.unlink(orphan_domain_path)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach, it saves a lot of queries.

@ericholscher
Copy link
Member

The issue link here is wrong. Would be good to know more about the background here. I imagine there are places we should fixing this logic in the code too.

@humitos
Copy link
Member Author

humitos commented Jan 26, 2018

@ericholscher I fixed the issue link in the description. There are some context in the issue.

I imagine there are places we should fixing this logic in the code too.

I'm not very familiarized about how symlinks work, but this PR is half of the solution that it's described in the issue.

Copy link
Contributor

@agjohnson agjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests look good! I do prefer the second query you posted, so I'd say 👍 on moving to that query.

for domain_path in [symlink.PROJECT_CNAME_ROOT, symlink.CNAME_ROOT]:
for domain in os.listdir(domain_path):
try:
Domain.objects.get(domain=domain)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach, it saves a lot of queries.

@humitos
Copy link
Member Author

humitos commented Mar 9, 2018

@agjohnson I changed the query and did a quick test locally: all the cnames that comes with the test data (djangokong.com for example) were removed but temporal.humitos.ar (the one that I have been using for .com) was kept there.

[09/Mar/2018 08:15:27] readthedocs.projects.tasks:795[6934]: INFO Unlinking orphan CNAME: /home/humitos/rtfd/code/readthedocs.org/private_cname_project/djangokong.com

Please, take a new a look at this and after merged I will create a new PR under -ops to configure the periodic task.

@agjohnson
Copy link
Contributor

Any reason to not set the task up as a scheduled task in the application? I think we can keep this out of ops.

@humitos
Copy link
Member Author

humitos commented Mar 9, 2018

No strong reason, really. I though we were setup them only in the ops repo as I think we don't have any other scheduled task in this repository. If we have, can you point me where they are? Otherwise, can you give an example (from the web) where/how I should configure this?

@agjohnson
Copy link
Contributor

We have scheduled tasks set up in our corporate repo. The ops tasks are just for tasks that only make sense in production. This is a task that would be useful for a development instance.

@humitos humitos force-pushed the humitos/symlink/remove-orphans branch from 2555fe0 to 55463c0 Compare March 10, 2018 01:34
@humitos
Copy link
Member Author

humitos commented Mar 13, 2018

I define the task here under CELERYBEAT setting and opened a PR in the ops repo to extend the basic definition from here.

This should be ready to be merged.

Copy link
Contributor

@agjohnson agjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops! One thing i didn't catch here was that this isn't calling broadcast to all webs. Celery beat only runs one instance per cluster, not on each instance. The task will need to call broadcast instead.

The changes look okay at this point, but the call logic and tests will change with this.

@humitos humitos force-pushed the humitos/symlink/remove-orphans branch from 55463c0 to b847b4b Compare March 22, 2018 21:27
@humitos
Copy link
Member Author

humitos commented Mar 22, 2018

I just pushed the changes requested.

I didn't find any example for celery beat and broadcast. So, I created a new task just to broadcast the one that I already had.

The broadcast function could be a decorator (maybe) in the future if we need to do this multiple times.

@agjohnson
Copy link
Contributor

I resolved conflicts here, going to wait on tests to merge though

@agjohnson
Copy link
Contributor

Same issue with linting. Merging!

@agjohnson agjohnson merged commit da56a7d into master Mar 23, 2018
@agjohnson agjohnson deleted the humitos/symlink/remove-orphans branch March 23, 2018 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants