Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async deliverability checking? #104

Open
magnuswatn opened this issue Apr 13, 2023 · 15 comments
Open

Async deliverability checking? #104

magnuswatn opened this issue Apr 13, 2023 · 15 comments

Comments

@magnuswatn
Copy link

Hi,

Thank you for creating this excellent library.

Would you accept a PR that adds async methods for deliverability checking? A quick look suggest it would entail a new validate_email_deliverability function, with some duplicated logic, and a new validate_email function which could probably share almost all logic with the existing one.

Thanks.

@JoshData
Copy link
Owner

JoshData commented Apr 13, 2023

I actually started working on that a while ago in https://github.com/JoshData/python-email-validator/tree/async. But my standards are higher now for completing the work: There has to be a complete set of tests and the code has to be clear and documented. And if the work is started over, I also really really want to avoid duplicated logic by not having separate functions. So yes but with those caveats.

@JoshData
Copy link
Owner

I just realized the branch didn't actually have my async work on it. I've fixed it now and tried to bring it up to date with other changes that I did since I started working on it. It's not in a working state though.

@Zaczero
Copy link

Zaczero commented Mar 4, 2024

I will share with you my async implementation; feel free to use it or get inspired by it.

The method first performs one DNS request for MX records, optimistically assuming it won't be a Null MX. If it happens to be a Null MX, it will perform two DNS requests in parallel for A/AAAA records.

In contrast to python-email-validator, it doesn't check for SPF records. In my opinion, such assumptions are incorrect.

# SPDX-License-Identifier: 0BSD OR CC0-1.0

import logging
from operator import attrgetter
from anyio import create_task_group
from dns.asyncresolver import Resolver
from dns.exception import DNSException, Timeout
from dns.rdatatype import RdataType
from dns.resolver import NXDOMAIN, NoAnswer, NoNameservers
from email_validator import validate_email 

resolver = Resolver()

...

info = validate_email(email, check_deliverability=False)
domain = info.ascii_domain
success = False

async with create_task_group() as tg:

    async def task(rd: RdataType):
        nonlocal success

        try:
            answer = await resolver.resolve(domain, rd)
            rrset = answer.rrset
        except NoAnswer:
            rrset = None
        except NXDOMAIN:
            return  # domain does not exist, skip further checks
        except (NoNameservers, Timeout):
            raise  # something's wrong on our side
        except DNSException:
            # some other error, log and proceed gracefully
            logging.exception('DNS error for %r (%r)', domain, rd)
            rrset = None

        if rd == RdataType.MX:
            if not rrset:
                # on implicit mx, try a/aaaa
                tg.start_soon(task, RdataType.A)
                tg.start_soon(task, RdataType.AAAA)
                return

            # mx - treat not-null answer as success
            # sort answers by preference in descending order
            rrset_by_preference = sorted(rrset, key=attrgetter('preference'), reverse=True)
            exchange = str(rrset_by_preference[0].exchange)
            success = exchange != '.'
        else:
            # a/aaaa - treat any answer as success and cancel other tasks
            if rrset:
                success = True
                tg.cancel_scope.cancel()

    tg.start_soon(task, RdataType.MX)

@JoshData
Copy link
Owner

JoshData commented Mar 5, 2024

Thanks for sharing! The branch currently has an async implementation that seems to be working. It doesn't run DNS queries in parallel though. I'd be curious to see if it improves performance in real world scenarios. I might try it although I don't know when I'll have time to.

@mrdeveloperdude
Copy link

Bump!

@JoshData
Copy link
Owner

I'd appreciate anyone testing out the async branch before I merge it.

@Zaczero
Copy link

Zaczero commented Mar 12, 2024

I have taken a look at the code and the only thing that stands out is that this async implementation only supports asyncio and not trio. I know that there are many people who prefer to use trio and libraries should generally be async platform agnostic (but it's your decision at the end of the day). anyio is a nice package that lets you support both at once (although I am not sure if it will work with this Future use case).

There is also a small chance that asyncio Future will work out of the box with trio - I haven't tested the code, I just read it.

But maybe the future dependency is not needed at all? Maybe just return an object and let the _async method handle both cases and only await if needed.

Aside of that, looks good 🙂

@JoshData
Copy link
Owner

Thanks for the feedback! Makes sense. I'll take a look.

@tamird
Copy link
Contributor

tamird commented May 9, 2024

FWIW you might be able to use collections.abc.Awaitable instead of asyncio.Future.

@JoshData
Copy link
Owner

Oh interesting.

I need to make time to make some test scripts and try some of the other frameworks. Probably won't happen soon.

@umarbutler
Copy link

@JoshData Any insight on when we can expect the async branch to be merged? I'm happy to test it out myself if that's necessary.

@JoshData
Copy link
Owner

It's hard to see a time when I would be able to get back to this. And it doesn't help that it's a high-risk change (i.e. unexpected breakage in non-async uses).

@Zaczero
Copy link

Zaczero commented Jan 16, 2025

"But my standards are higher now for completing the work" I think this is a good example of how becoming too idealistic prevents you from doing meaningful work.

@umarbutler
Copy link

"But my standards are higher now for completing the work" I think this is a good example of how becoming too idealistic prevents you from doing meaningful work.

As someone who maintains a somewhat widely used Python package, but certainly not as widely used as this package which seems to be racking in 24 million downloads a month, there is a lot that goes into maintaining a package to ensure:

  • You don’t shoot yourself in the foot by introducing features that are not battle-hardened enough and end up causing mayhem.
  • The features you add are added in such a way as to not end up with a ridiculously complex and often duplicative API (see, eg, Langchain).
  • You don’t add unnecessary dependencies to the stack such that you end up with a very fragile dependency tree that’s one breaking change away from destroying everything you hold dear.

So I sympathise with Josh.

@JoshData
Copy link
Owner

Exactly. There's no way to do it in a way that won't be have a risk of making my life harder. 😀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants