Async deliverability checking? #104
Comments
I actually started working on that a while ago in https://github.com/JoshData/python-email-validator/tree/async. But my standards are higher now for completing the work: There has to be a complete set of tests and the code has to be clear and documented. And if the work is started over, I also really really want to avoid duplicated logic by not having separate functions. So yes but with those caveats. |
I just realized the branch didn't actually have my async work on it. I've fixed it now and tried to bring it up to date with other changes that I did since I started working on it. It's not in a working state though. |
I will share with you my async implementation; feel free to use it or get inspired by it. The method first performs one DNS request for MX records, optimistically assuming the domain has them. If there is no MX answer (an implicit MX), it performs two DNS requests in parallel for A/AAAA records. A Null MX (an exchange of ".") is treated as undeliverable. In contrast to
# SPDX-License-Identifier: 0BSD OR CC0-1.0
import logging
from operator import attrgetter

from anyio import create_task_group
from dns.asyncresolver import Resolver
from dns.exception import DNSException, Timeout
from dns.rdatatype import RdataType
from dns.resolver import NXDOMAIN, NoAnswer, NoNameservers
from email_validator import validate_email

resolver = Resolver()

...

info = validate_email(email, check_deliverability=False)
domain = info.ascii_domain
success = False

async with create_task_group() as tg:

    async def task(rd: RdataType):
        nonlocal success
        try:
            answer = await resolver.resolve(domain, rd)
            rrset = answer.rrset
        except NoAnswer:
            rrset = None
        except NXDOMAIN:
            return  # domain does not exist, skip further checks
        except (NoNameservers, Timeout):
            raise  # something's wrong on our side
        except DNSException:
            # some other error, log and proceed gracefully
            logging.exception('DNS error for %r (%r)', domain, rd)
            rrset = None

        if rd == RdataType.MX:
            if not rrset:
                # on implicit mx, try a/aaaa
                tg.start_soon(task, RdataType.A)
                tg.start_soon(task, RdataType.AAAA)
                return
            # mx - treat not-null answer as success
            # sort answers by preference in descending order
            rrset_by_preference = sorted(rrset, key=attrgetter('preference'), reverse=True)
            exchange = str(rrset_by_preference[0].exchange)
            success = exchange != '.'
        else:
            # a/aaaa - treat any answer as success and cancel other tasks
            if rrset:
                success = True
                tg.cancel_scope.cancel()

    tg.start_soon(task, RdataType.MX)
Thanks for sharing! The branch currently has an async implementation that seems to be working. It doesn't run DNS queries in parallel though. I'd be curious to see if it improves performance in real world scenarios. I might try it although I don't know when I'll have time to. |
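To make the sequential-versus-parallel comparison concrete, here is a minimal sketch. It is not the code from the branch or from the comment above: `FakeResolver`, `sequential`, and `parallel` are hypothetical names, and the resolver only simulates a round trip with `asyncio.sleep` instead of touching the network. It uses plain asyncio for brevity.

```python
import asyncio


class FakeResolver:
    """Hypothetical stand-in for a DNS resolver; performs no network access."""

    def __init__(self, delay=0.01):
        self.delay = delay
        self.in_flight = 0       # queries currently awaiting an answer
        self.max_in_flight = 0   # highest concurrency observed so far

    async def resolve(self, domain, rdtype):
        self.in_flight += 1
        self.max_in_flight = max(self.max_in_flight, self.in_flight)
        try:
            await asyncio.sleep(self.delay)  # simulated round trip
            return f"{rdtype} answer for {domain}"
        finally:
            self.in_flight -= 1


async def sequential(resolver, domain):
    # The second query starts only after the first one has finished.
    a = await resolver.resolve(domain, "A")
    aaaa = await resolver.resolve(domain, "AAAA")
    return [a, aaaa]


async def parallel(resolver, domain):
    # Both queries are started together and awaited as a group.
    return list(await asyncio.gather(
        resolver.resolve(domain, "A"),
        resolver.resolve(domain, "AAAA"),
    ))


if __name__ == "__main__":
    seq_resolver, par_resolver = FakeResolver(), FakeResolver()
    print(asyncio.run(sequential(seq_resolver, "example.com")))
    print(asyncio.run(parallel(par_resolver, "example.com")))
    print(seq_resolver.max_in_flight, par_resolver.max_in_flight)
```

With the simulated delay, the sequential version waits for two round trips and the parallel one for roughly one. Whether real lookups see a similar gain depends on resolver caching and network latency, which this sketch does not model.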
Bump! |
I'd appreciate anyone testing out the |
I have taken a look at the code and the only thing that stands out is that this async implementation only supports asyncio and not trio. I know that there are many people who prefer to use trio, and libraries should generally be async-platform agnostic (but it's your decision at the end of the day). anyio is a nice package that lets you support both at once (although I am not sure if it will work with this Future use case). There is also a small chance that an asyncio Future will work out of the box with trio - I haven't tested the code, I just read it. But maybe the Future dependency is not needed at all? Maybe just return an object and let the _async method handle both cases and only await if needed. Aside from that, looks good 🙂 |
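The "only await if needed" idea could look roughly like the sketch below. It is an illustration under assumptions, not the library's code: `interpret`, `validate_sync`, `validate_async`, and the two fake resolvers are hypothetical names, and the fake resolvers return canned MX exchanges. The async path relies only on `await` and `inspect.isawaitable`, so it does not need an asyncio Future; `asyncio.run` appears only in the demo at the bottom.

```python
import asyncio
import inspect

NULL_MX = "."  # RFC 7505: a lone "." exchange means the domain accepts no mail


def interpret(exchanges):
    """Shared decision logic, written once for both the sync and async paths."""
    return bool(exchanges) and exchanges != [NULL_MX]


def validate_sync(resolve, domain):
    """Blocking entry point: `resolve` returns the MX exchanges directly."""
    return interpret(resolve(domain, "MX"))


async def validate_async(resolve, domain):
    """Async entry point: await the result only if `resolve` returned an awaitable."""
    result = resolve(domain, "MX")
    if inspect.isawaitable(result):
        result = await result
    return interpret(result)


# Hypothetical resolvers with canned answers; no network access.
def fake_sync_resolve(domain, rdtype):
    return ["."] if domain == "nomail.example" else ["mx.example.com."]


async def fake_async_resolve(domain, rdtype):
    return fake_sync_resolve(domain, rdtype)


if __name__ == "__main__":
    print(validate_sync(fake_sync_resolve, "example.com"))
    print(asyncio.run(validate_async(fake_async_resolve, "example.com")))
    print(asyncio.run(validate_async(fake_async_resolve, "nomail.example")))
```

The design choice here is that the decision logic lives in one plain function, while the two thin entry points differ only in how they obtain the resolver's answer.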
Thanks for the feedback! Makes sense. I'll take a look. |
FWIW you might be able to use |
Oh interesting. I need to make time to make some test scripts and try some of the other frameworks. Probably won't happen soon. |
@JoshData Any insight on when we can expect the |
It's hard to see a time when I would be able to get back to this. And it doesn't help that it's a high-risk change (i.e. unexpected breakage in non-async uses). |
"But my standards are higher now for completing the work" I think this is a good example of how becoming too idealistic prevents you from doing meaningful work. |
As someone who maintains a somewhat widely used Python package, but certainly not as widely used as this package, which seems to be raking in 24 million downloads a month, there is a lot that goes into maintaining a package to ensure:
So I sympathise with Josh. |
Exactly. There's no way to do it that won't risk making my life harder. 😀 |
Hi,
Thank you for creating this excellent library.
Would you accept a PR that adds async methods for deliverability checking? A quick look suggests it would entail a new validate_email_deliverability function, with some duplicated logic, and a new validate_email function which could probably share almost all logic with the existing one.
Thanks.