
Clarify support for forking process models #3307

Open
PeterJCLaw opened this issue May 10, 2023 · 13 comments

@PeterJCLaw

https://opentelemetry-python.readthedocs.io/en/latest/examples/fork-process-model/README.html indicates that BatchSpanProcessor does not support forking process models, including those used by Gunicorn and uWSGI. However, those docs (#1609) pre-date other changes which appear to have added support for forking process models, including:

which seem specifically targeted at resolving the described issues.

What is the current expected state of support for the various forking models?
Either way, updating the docs to clarify the actual situation would be appreciated.
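For context, the workaround those docs currently describe is to defer all SDK setup into a gunicorn post_fork hook so that it runs in each worker rather than in the master. A rough sketch (using the console exporter for brevity; not the docs' exact code):

# gunicorn.conf.py -- configure the SDK only after each worker is forked, so
# BatchSpanProcessor's background thread is created in the worker process
# rather than only in the master.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter


def post_fork(server, worker):
    server.log.info("Worker spawned (pid: %s)", worker.pid)
    provider = TracerProvider()
    provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)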

Possibly related issues:

@aabmass
Member

aabmass commented May 11, 2023

Thanks for opening this issue, I think it is an important topic.

Afaik, things should be working for tracing at this point. Metrics are a little more complicated and #2837 is definitely still valid. It would be great if we could clean up old issues/documentation and would definitely appreciate help on this if you have the time @PeterJCLaw.

@srikanthccv last worked on this, do you know if any of the linked bugs can be closed out now?

@srikanthccv
Member

@PeterJCLaw
Author

Thanks for those additional details.

  • I think the 4th linked issue can be closed; however, it's better to verify once before closing.

What does verification look like here? Given the nature of this as a forking issue I'd kinda assumed the failure mode was race conditions and the like -- timing-based and hard to reproduce, or to be sure are fixed, just from testing. Is there a known reproduction of the failure mode, or something we can do to check that it's working as intended?
(Or is it a case of code inspection?)

@srikanthccv
Member

The verification can be pretty simple. If you run a simple gunicorn-based application with multiple workers (without following the special instructions mentioned in the docs), it should export the traces/metrics/logs. If they are not exported (to the collector, console, etc.), then the issue still exists.
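For example, something like the following (a minimal sketch assuming Flask and the console exporter), started as gunicorn --workers=2 app:app with no fork-specific configuration:

# app.py -- minimal check: no post_fork hook, no preload tricks.
from flask import Flask
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

app = Flask(__name__)


@app.route("/")
def index():
    # If forking is handled correctly, spans from every worker process should
    # show up on the console, not just spans from the master.
    with tracer.start_as_current_span("index"):
        return "ok"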

@PeterJCLaw
Author

PeterJCLaw commented May 11, 2023

Hrm, ok. I'm currently running gunicorn with the default worker type and that does seem to work. My reading of gunicorn's docs indicates that the master process doesn't actually load the application though -- only the workers do (after the fork happens), unless you use the --preload option (which I'm currently not doing).

I'll see if I can find some time to validate the behaviours under gunicorn & uWSGI.

Edit for clarity: my use of gunicorn without --preload works fine -- I get tracing reported even though I'm using BatchSpanProcessor.

@srikanthccv
Member

I am confused now. Is it working or not working with gunicorn?

@PeterJCLaw
Author

My current setup is using gunicorn without --preload and that seems to work.
Not using --preload under gunicorn seems to be the default, though the OpenTelemetry docs seem to imply that they're expecting the inverse -- that gunicorn will preload before forking.

@PeterJCLaw
Author

I've just tested a toy application under gunicorn and uWSGI, with and without their --preload and --lazy-apps modes respectively, using https://gist.github.com/PeterJCLaw/5e6fd585d0c620496c82ca926ce50f67. All four configurations seem to work ok -- I get traces printed out to the console which seem reasonable.

I'm not sure how to validate that the desired batching is happening, though I can see that the traces are often not printed until several requests later which suggests that it is working.

@srikanthccv
Member

> I've just tested a toy application under gunicorn and uWSGI, with and without their --preload and --lazy-apps modes respectively, using https://gist.github.com/PeterJCLaw/5e6fd585d0c620496c82ca926ce50f67. All four configurations seem to work ok -- I get traces printed out to the console which seem reasonable.

Thank you, this confirms the above statement that the traces and logs batch processors work in all of these configurations.

> I'm not sure how to validate that the desired batching is happening, though I can see that the traces are often not printed until several requests later which suggests that it is working.

They are emitted when the queue becomes full or the batch timeout elapses. I think this is working fine.
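For anyone wanting to check this locally, the thresholds are configurable on the processor; the values below should be the SDK defaults (shown explicitly), and lowering schedule_delay_millis makes the console output appear almost immediately after each request:

from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Spans are flushed when the batch fills up or the schedule delay elapses,
# whichever comes first.
span_processor = BatchSpanProcessor(
    ConsoleSpanExporter(),
    max_queue_size=2048,         # spans buffered before new ones are dropped
    schedule_delay_millis=5000,  # periodic flush interval in milliseconds
    max_export_batch_size=512,   # maximum spans sent per export call
)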

@PeterJCLaw
Author

PeterJCLaw commented Jun 6, 2023

Unfortunately, while it looks like the BatchSpanProcessor does actually work fine, the PeriodicExportingMetricReader appears not to. I've not tested it under all the various configurations previously discussed, however I have observed it not working correctly in uWSGI's default mode (i.e. where it imports the WSGI app before forking). In that case the ticker thread only exists in the main process, presumably meaning that metrics are only collected from that main process. I'd guess this is due to PeriodicExportingMetricReader using the approach from #2242 rather than the enhanced approach introduced in #2277.

Edit: using uWSGI's --lazy-apps to delay the import does get PeriodicExportingMetricReader running in all the worker processes.
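For reference, my understanding of the "enhanced approach" from #2277 is that it re-creates the background thread in each child via os.register_at_fork. A simplified illustration of the technique (not the SDK's actual code):

import os
import threading
import time


class ForkAwareWorker:
    # Toy stand-in for a processor/reader that owns a background export thread.

    def __init__(self):
        self._start_worker()
        # Threads do not survive a fork, so restart the worker thread in every
        # forked child. POSIX-only; guarded for platforms without the hook.
        if hasattr(os, "register_at_fork"):
            os.register_at_fork(after_in_child=self._start_worker)

    def _start_worker(self):
        self._thread = threading.Thread(target=self._run, name="exporter", daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            time.sleep(5)
            # ...collect and export whatever accumulated since the last tick...

Note that this only helps if the server actually runs Python's after-fork hooks in the children, which is where uWSGI's --py-call-osafterfork (below) comes in.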

@PeterJCLaw
Author

PeterJCLaw commented Jun 6, 2023

Hrm, the uWSGI side of this may be somewhat resolvable by users. Ish. Via https://stackoverflow.com/a/72601192 and unbit/uwsgi#2388 it seems that uWSGI has some extra options which can be used to enable os.register_at_fork to work -- specifically --py-call-osafterfork. There's also --enable-threads, which seems to be needed for workers to be able to run threads. Unfortunately, in local testing I'm seeing that using those two options together results in uWSGI not serving any requests (at all), while using just --py-call-osafterfork seems to lock up one of the two workers I'm running with. Not sure yet if this is a uWSGI issue generally (it definitely feels like it could be) or something to do with threads. I doubt it's OpenTelemetry-specific. Posting here as hopefully useful to others hitting this.

@SharifiFaranak

+1 to updating docs here since BatchSpanProcessor is working well with Gunicorn. It cost me more than a few hours trying to get the post_fork hook to work (which didn't end up working) when it wasn't necessary.

@alexmojaki

It seems that BatchSpanProcessor works fine, but not OTLPSpanExporter (with --preload). Given test.py containing:

from flask import Flask
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, SimpleSpanProcessor, ConsoleSpanExporter

# BatchSpanProcessor works fine
# exporter = ConsoleSpanExporter()
# span_processor = BatchSpanProcessor(exporter)

# But OTLPSpanExporter does not
exporter = OTLPSpanExporter()
span_processor = SimpleSpanProcessor(exporter)

tracer_provider = TracerProvider()
tracer_provider.add_span_processor(span_processor)

app = Flask(__name__)
FlaskInstrumentor().instrument_app(app, tracer_provider=tracer_provider)


@app.route('/')
def hello():
    return 'Hello!'

and running gunicorn --workers=2 --preload test:app, visiting http://127.0.0.1:8000 causes an error:

objc[37636]: +[NSCharacterSet initialize] may have been in progress in another thread when fork() was called.
objc[37636]: +[NSCharacterSet initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
[2024-10-14 13:50:45 +0200] [37634] [ERROR] Worker (pid:37636) was sent SIGKILL! Perhaps out of memory?
