
Pickling Error when using 20.2.0 release with multiprocessing Pool #301

Closed
Farfigschitter opened this issue Feb 1, 2021 · 4 comments

Farfigschitter commented Feb 1, 2021

After upgrading structlog from 20.1.0 to 20.2.0, I started getting errors when calling a function that uses logging in parallel with a multiprocessing Pool.
Environment: Python 3.7, structlog 20.2.0

Example Code:

from multiprocessing import Pool
import structlog
import uuid

class Someclass:
    def __init__(self):
        self.log = structlog.get_logger(self.__class__.__name__).bind(service='someservice')

    def example(self):
        log = self.log.bind(func_call_id=str(uuid.uuid4()))
        log.msg('example message')
        return


def func_call(obj):
    obj.example()
    return


if __name__ == "__main__":
    threads = 2
    obj = Someclass()
    pool = Pool(processes=threads)
    args = [obj]*threads
    pool.map(func_call, args)

Expected output (as with structlog 20.1.0):

2021-02-01 13:08.11 example message                func_call_id=02d66c3c-5f55-4336-acba-3db29be474cc service=someservice
2021-02-01 13:08.11 example message                func_call_id=9b1786f5-140d-460c-a1a9-eda2b9a50c9f service=someservice

Process finished with exit code 0

Actual output (with structlog 20.2.0):

Traceback (most recent call last):
  File "D:/Projects/example/example.py", line 25, in <module>
    pool.map(func_call, args)
  File "C:\Users\ae\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\ae\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 657, in get
    raise self._value
  File "C:\Users\ae\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 431, in _handle_tasks
    put(task)
  File "C:\Users\ae\AppData\Local\Programs\Python\Python37\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "C:\Users\ae\AppData\Local\Programs\Python\Python37\lib\multiprocessing\reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'structlog._log_levels.BoundLoggerFilteringAtNotset'>: attribute lookup BoundLoggerFilteringAtNotset on structlog._log_levels failed

Process finished with exit code 1

hynek (Owner) commented Feb 18, 2021

The problem is that the default logger changed, as called out in https://github.com/hynek/structlog/blob/master/CHANGELOG.rst#backward-incompatible-changes-1

I will look into making the new one pickleable without compromising its performance, but in the meantime you can get the old behavior by calling structlog.configure(wrapper_class=structlog.BoundLogger).
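
For illustration, a minimal sketch of that workaround (an editorial addition, not part of the original comment), assuming structlog 20.2.0 with the default logger factory and processors:

import pickle

import structlog

# Revert to the generic wrapper class that 20.1.0 used by default.
# Unlike the filtering bound logger classes introduced in 20.2.0, it
# can be pickled, so multiprocessing can ship bound loggers to workers.
structlog.configure(wrapper_class=structlog.BoundLogger)

log = structlog.get_logger('Someclass').bind(service='someservice')

# Should succeed again instead of raising PicklingError (it did under 20.1.0).
pickle.dumps(log)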

hynek (Owner) commented Feb 18, 2021

I believe this has been fixed by 334f029. Ping this issue if not, thanks.
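
As a quick check (an editorial sketch, assuming a release that includes the fix, e.g. 21.1.0 or later), the default filtering bound logger should now round-trip through pickle:

import pickle

import structlog

log = structlog.get_logger('check').bind(service='someservice')

# With the fix in place this should no longer raise PicklingError.
restored = pickle.loads(pickle.dumps(log))
restored.info('pickled and restored')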

hynek closed this as completed Feb 18, 2021

9dogs commented Apr 23, 2021

I have a similar issue on 21.1.0 when cache_logger_on_first_use is set to True:
AttributeError: Can't pickle local object 'BoundLoggerLazyProxy.bind.<locals>.finalized_bind'

Example code below:

import asyncio
from concurrent.futures.process import ProcessPoolExecutor

import structlog

structlog.configure(
    # cache_logger_on_first_use=True  # uncomment to get an error
)


class TestClass:
    def __init__(self):
        self.logger = structlog.get_logger('test')

    def sync_method(self):
        self.logger.info('Hello from sync_method', val=1)

    async def main(self):
        self.logger.info('Hello from main')
        loop = asyncio.get_running_loop()
        with ProcessPoolExecutor() as pool:
            await loop.run_in_executor(pool, self.sync_method)


if __name__ == '__main__':
    print(structlog.__version__)
    test_class = TestClass()
    asyncio.run(test_class.main())

hynek (Owner) commented Apr 23, 2021

I'm afraid that can't be fixed unless you find a way to pickle finalized_bind. AFAICT your options are:

  • don't use the logger before it moves into the new process
  • don't use cache_logger_on_first_use.

Barring any logging hot loops, I suspect the second option is good enough.
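
For reference, a minimal sketch of the first option applied to the example above (an editorial illustration, not from the thread): keep cache_logger_on_first_use=True, but don't call the logger in the parent process before handing the object to the executor, so the lazy proxy is never finalized into the unpicklable local closure.

import asyncio
from concurrent.futures.process import ProcessPoolExecutor

import structlog

structlog.configure(cache_logger_on_first_use=True)


class TestClass:
    def __init__(self):
        self.logger = structlog.get_logger('test')

    def sync_method(self):
        # First use of the logger happens here, inside the worker process,
        # so the caching closure is only ever created on that side.
        self.logger.info('Hello from sync_method', val=1)

    async def main(self):
        # Intentionally no self.logger call here before the hand-off.
        loop = asyncio.get_running_loop()
        with ProcessPoolExecutor() as pool:
            await loop.run_in_executor(pool, self.sync_method)


if __name__ == '__main__':
    asyncio.run(TestClass().main())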

hynek added a commit that referenced this issue May 6, 2021