
Query debugging tracer #2010

Closed
wants to merge 6 commits into from

@jtao15 (Member) commented Nov 13, 2019

Add a query debugging tracer for Presto (see #1826).
This PR includes tracer creation and propagation, along with some basic tracer events.

Tracer high-level design

Objectives

  • Develop an extensible tracer framework that allows the Presto engine to emit tracing events during query execution
  • Implement propagation of the tracer to connectors, so that events can be published from within connectors
  • Add useful debugging events to this framework

Key design points

  • Tracers track interesting execution actions per query

  • Context dependency
    Tracers keep track of the execution context (queryId, stageId, taskId, worker nodeId, etc.) so that tracer events can be uniquely identified.

  • Introduction of tracers
    EventListenerManager is injected into the tracer as an event consumer (emitter) when the tracer is created by TracerFactory.
    Tracers defined in the SPI package are wrapped in ConnectorEmitter, so a ConnectorTracer can be created with an emitter without explicitly referencing an engine tracer.


  • Hierarchical
    A query executes hierarchically, so tracers are created and propagated at each level. A QueryTracer is created on the coordinator before the query is planned. StageTracers are created from the context of their parent QueryTracer, with a stageId assigned. TaskTracers are then created from their parent StageTracers.
    On workers, TaskTracers are created from scratch with the task context (queryId, stageId, taskId) assigned. DriverTracers and OperatorTracers are then created as execution propagates.

    [Diagram: tracer propagation]

  • ConnectorTracers are created by ConnectorTracerFactory with a ConnectorTracerEmitter (a wrapper around the engine tracer). Within connectors, the ConnectorTracer is propagated through the connector context. That context is extensible, so other information can be added for future development without an interface change.

  • Tracer events are semi-structured. Structured fields include execution object IDs, event type, worker node ID, timestamp, etc. All other information is kept in a generic payload (implementation yet to be decided).

  • TracerEventType and ConnectorEventType enums implement the same interface, EventTypeSupplier. This lets connectors define their own event types, prefixed with the connector name.
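The following is a minimal sketch of the shared event-type interface and the semi-structured event shape. The getEventType() method, the HiveEventType enum and the TracerEvent fields are assumptions for illustration, not the PR's actual API. The "Hive." prefix follows the naming used in the proposed events below.

```java
import java.time.Instant;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical shared interface: every event type enum supplies its own event name.
interface EventTypeSupplier
{
    String getEventType();
}

// Engine-side event types use the bare constant name.
enum TracerEventType
        implements EventTypeSupplier
{
    PLAN_QUERY_START,
    PLAN_QUERY_END;

    @Override
    public String getEventType()
    {
        return name();
    }
}

// Hypothetical connector-side event types, prefixed with the connector name.
enum HiveEventType
        implements EventTypeSupplier
{
    LIST_FILE_STATES_START,
    LIST_FILE_STATES_END;

    @Override
    public String getEventType()
    {
        return "Hive." + name();
    }
}

// Semi-structured event: a few structured fields plus a generic payload map.
final class TracerEvent
{
    private final String queryId;
    private final String nodeId;
    private final String eventType;
    private final Instant timestamp;
    private final Map<String, Object> payload;

    TracerEvent(String queryId, String nodeId, EventTypeSupplier type, Instant timestamp, Map<String, Object> payload)
    {
        this.queryId = queryId;
        this.nodeId = nodeId;
        this.eventType = type.getEventType();
        this.timestamp = timestamp;
        // A null payload is treated as empty; otherwise keep an unmodifiable copy.
        this.payload = payload == null
                ? Collections.emptyMap()
                : Collections.unmodifiableMap(new LinkedHashMap<>(payload));
    }

    String getQueryId() { return queryId; }

    String getNodeId() { return nodeId; }

    String getEventType() { return eventType; }

    Instant getTimestamp() { return timestamp; }

    Map<String, Object> getPayload() { return payload; }
}
```

Because engine and connector enums share one interface, the event class and listeners only need to handle EventTypeSupplier, never a specific enum.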

Proposed event details

| Coordinator/Worker | EngineTracer/ConnectorTracer | Execution unit | Event | Payload (default: null) | Count for sample query (SELECT * FROM hive.u_name.test_table) | Possibly multiple per execution unit | Test |
|---|---|---|---|---|---|---|---|
| Coordinator | EngineTracer | Query | PLAN_QUERY_START | | 1 | | TestTracer |
| | | | PLAN_QUERY_END | | 1 | | TestTracer |
| | | | QUERY_STATE_CHANGE_EVENTS (QUEUED, PLANNING, STARTING, etc.) | | 1 | | |
| | | Stage | STAGE_STATE_CHANGE_EVENT (PLANNED, SCHEDULING, RUNNING, etc.) | | 2 | | TestSqlStageExecution |
| | | Task | SCHEDULE_TASK_WITH_SPLITS | { "splits": splits(Multimap<PlanNodeId, Split>) } | 2 | | TestSqlStageExecution |
| | | | ADD_SPLITS_TO_TASK | { "splits": splits(Multimap<PlanNodeId, Split>) } | 0 | TRUE | |
| | | | SEND_UPDATE_TASK_REQUEST_START | { "request": request(Request.toString()) } | 5 | TRUE | TestHttpRemoteTask |
| | | | SEND_UPDATE_TASK_REQUEST_END | { "response": value(TaskInfo.toString()) } or { "error": cause(Throwable.toString()) } | 5 | TRUE | TestHttpRemoteTask |
| | ConnectorTracer | | Hive.LIST_FILE_STATES_START | { "path": path(Path.toString()) } | 1 | | TestBackgroundHiveSplitLoader |
| | | | Hive.LIST_FILE_STATES_END | { "path": path(Path.toString()) } | 1 | | TestBackgroundHiveSplitLoader |
| | | | Hive.ORC_OPEN_FILE_START | | 0 | | |
| | | | Hive.ORC_OPEN_FILE_END | | 0 | | |
| Worker | EngineTracer | Task | CREATE_LOCAL_PLAN_START | | 2 | | |
| | | | CREATE_LOCAL_PLAN_END | | 2 | | |
| | | | TASK_STATE_CHANGE_EVENTS (PLANNED, RUNNING, FINISHED, etc.) | | 2 | | TestSqlTask |
| | | Split/Driver | CREATED_DRIVER | | 18 | | |
| | | | SPLIT_DRIVER_CREATED | | 18 | | TestTaskExecutor |
| | | | SPLIT_ADDED | | | | |
| | | | SPLIT_SCHEDULED | | 40 | TRUE | TestTaskExecutor |
| | | | SPLIT_UNSCHEDULED | | | | |
| | | | SPLIT_STARTS_WAITING | | 2 | TRUE | TestTaskExecutor |
| | | | SPLIT_ENDS_WAITING | | | | |
| | | | SPLIT_BLOCKED | | 18 | TRUE | |
| | | | SPLIT_UNBLOCKED | | 20 | TRUE | |
| | | | SPLIT_DESTROY_INVOKED | | 18 | | TestTaskExecutor |
| | | | SPLIT_FINISHED | | 17 | | TestTaskExecutor |
| | | Operator | ADD_SPLIT_TO_OPERATOR | | 2 | | TestDriver |
| | ConnectorTracer | | Orc.READ_ORC_TAIL_START | { "path": path(orcDataSource.getId().toString()) } | 2 | | TestOrcReaderPositions |
| | | | Orc.READ_ORC_TAIL_END | { "path": path(orcDataSource.getId().toString()), "orc tail": buffer(Slice.toString()) } | 2 | | TestOrcReaderPositions |
| | | | Orc.READ_STRIPE_START | { "stripe": stripeInformation(StripeInformation.toString()) } | 2 | | TestOrcReaderPositions |
| | | | Orc.READ_STRIPE_END | { "stripe": stripeInformation(StripeInformation.toString()) } | 2 | | TestOrcReaderPositions |
| | | | Orc.READ_ORC_COMPLETE_FOOTER_START | | 0 | | |
| | | | Orc.READ_ORC_COMPLETE_FOOTER_END | { "orc footer": completeFoorterSlice(Slice.toString()) } | 0 | | |
| | | | Orc.READ_BLOCK_START | | 2 | | TestOrcReaderPositions |
| | | | Orc.READ_BLOCK_END | { "block": block(Block.toString()) } | 2 | | TestOrcReaderPositions |

@cla-bot commented Nov 13, 2019

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Jiapeng Tao.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@phd3 (Member) left a comment

@jtao15 One high-level comment before reviewing: can you please split the work into separate commits? For example: introduction of the Tracer, propagation in the engine, propagation to connectors, and separate commits for different categories of events.

pom.xml (outdated, resolved)
presto-hive/pom.xml (outdated, resolved)
@phd3 (Member) left a comment

Thanks for working on this. I've added some comments.

private final String mockProperty;

@JsonCreator
public MockSplit(@JsonProperty("mockProperty") String mockProperty)
Member

Why this change?

Member Author

JsonCodec throws an error when it converts an empty split to a JSON payload, so I added a dummy field here.

page = page.getLoadedPage();
if (tracer.isEnabled()) {
Map<String, Object> payload = new HashMap<>();
payload.put("page", page.toString());
Member

We don't need the page here. Remove.

Member Author

I think page.toString() will not include the extra data, and we may want to know which page it is when an event listener receives a PageLoaded event.

@phd3 (Member) left a comment

Looking good! I have some top-level and inline comments.

  1. Restricting access to the tracer object

We want connectors to be able to emit events while having only limited access to the engine-side tracer framework. Right now in the PR, the ConnectorTracerFactory::createTracer method takes the engine tracer itself. I see that ConnectorTracer has only one constructor, which requires the private engine tracer object. However, this pattern does not stop a connector from storing the tracer as another member of its class and then invoking methods on the engine tracer. For example, HiveConnectorTracer could store the engine tracer as a member variable. It could then use it to create a new tracer, or to emit events that are not engine-specific without proper namespacing.

I was thinking about another approach to get around this issue. Instead of taking the tracer as an argument, ConnectorTracerFactory::createTracer would take a ConnectorEventEmitter, which encapsulates the Tracer object and restricts access to it. The emitter also takes care of namespacing the events.

It can look like the following:

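import java.util.Map;

import static java.util.Objects.requireNonNull;

public final class ConnectorEventEmitter
{
    private final String connectorName; // I think you have this as "source" right now
    private final Tracer tracer;

    public ConnectorEventEmitter(String connectorName, Tracer tracer)
    {
        this.connectorName = requireNonNull(connectorName, "connectorName is null");
        this.tracer = requireNonNull(tracer, "tracer is null");
    }

    // Namespaces the event with the connector name before handing it to the engine tracer
    public void emitEvent(String actionType, Map<String, Object> payload)
    {
        String eventName = connectorName + "." + actionType;
        tracer.emitEvent(eventName, payload);
    }

    // if required
    public boolean isEnabled()
    {
        return tracer.isEnabled();
    }
}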

The ConnectorTracerFactory would look like the following:

public interface ConnectorTracerFactory
{
    ConnectorTracer createConnectorTracer(ConnectorEventEmitter emitter);
}

public interface ConnectorTracer {}

In this case, the connector tracers can store a reference to ConnectorEventEmitter, and invoke emitEvent when required. This also helps namespace the types.

  2. Encapsulating the connector tracer providers:

It seems that we are adding ConnectorTracerFactories to the SplitManager for coordinator-side execution. I think there is a better way to do this. Presto's codebase follows a design pattern in which engine-specific "manager" objects register and keep track of connector-specific objects. For example, SplitManager stores a map of catalog-specific ConnectorSplitManager objects, and PageSourceManager similarly keeps a map of ConnectorPageSourceProviders. We can follow that pattern here as well. We would register the connector-specific factories/providers (i.e. a map from catalogName to ConnectorTracerProvider) in another engine-specific object, say "ConnectorTracerManager", and inject that object wherever it is required. For instance, it could be passed to the SplitManager or PageSourceManager constructor.
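The following is a minimal sketch of the suggested manager, modeled loosely on the per-catalog maps described above. ConnectorTracerManager, ConnectorTracerProvider and all method names are hypothetical. Catalog names are plain strings, and the provider takes no arguments, to keep the sketch self-contained.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

import static java.util.Objects.requireNonNull;

// Hypothetical SPI types. In the proposal the provider would receive a ConnectorEventEmitter,
// but arguments are omitted here to keep the sketch self-contained.
interface ConnectorTracer {}

interface ConnectorTracerProvider
{
    ConnectorTracer createConnectorTracer();
}

// Engine-side registry keyed by catalog name, in the spirit of SplitManager / PageSourceManager.
final class ConnectorTracerManager
{
    private final Map<String, ConnectorTracerProvider> providers = new ConcurrentHashMap<>();

    void addConnectorTracerProvider(String catalogName, ConnectorTracerProvider provider)
    {
        requireNonNull(catalogName, "catalogName is null");
        requireNonNull(provider, "provider is null");
        if (providers.putIfAbsent(catalogName, provider) != null) {
            throw new IllegalArgumentException("ConnectorTracerProvider already registered for catalog " + catalogName);
        }
    }

    void removeConnectorTracerProvider(String catalogName)
    {
        providers.remove(catalogName);
    }

    // Catalogs without a registered provider get no connector tracer.
    Optional<ConnectorTracer> createConnectorTracer(String catalogName)
    {
        return Optional.ofNullable(providers.get(catalogName))
                .map(ConnectorTracerProvider::createConnectorTracer);
    }
}
```

With this in place, SplitManager or PageSourceManager would depend only on the manager, and would never see connector factories directly.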

  3. Thanks for posting the event details. I've added my thoughts on what we should add in this first pass, and would love to hear others' comments.

For events that depend on state changes, let's use the enum .name() and .values() methods instead of manually mapping every state.
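The following is a minimal sketch of deriving state-change event names from the enum itself rather than from a hand-written mapping. QueryState here is a hypothetical stand-in with a few states (the engine's actual enum may differ). The "PREFIX_STATE" naming scheme is also an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for an engine state enum.
enum QueryState
{
    QUEUED, PLANNING, STARTING, RUNNING, FINISHED, FAILED
}

final class StateChangeEvents
{
    private StateChangeEvents() {}

    // Event name for a single state, e.g. "QUERY_STATE_CHANGE_RUNNING".
    static String eventName(String prefix, Enum<?> state)
    {
        return prefix + "_" + state.name();
    }

    // Every possible event name, derived from an enum's values() array instead of a manual list,
    // so newly added states are picked up automatically.
    static List<String> allEventNames(String prefix, Enum<?>[] values)
    {
        return Arrays.stream(values)
                .map(state -> eventName(prefix, state))
                .collect(Collectors.toList());
    }
}
```

A caller would pass QueryState.values() (or any other state enum's values()) to allEventNames.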

  • QUERY:
    (prepareQuery parses the query. Logical and distributed planning happens in SqlQueryExecution. We should have the following events.)
    parsing start and end
    logical plan start and end
    distributed planning start and end
    Query state change events

  • STAGE:
    Stage state change events
    Task sending updates
    (ADD_SPLITS_TO_TASK and SCHEDULE_TASK_WITH_SPLITS <-- not sure how useful they are, given how the updates are scheduled.)

  • TASK:
    Task state change events
    also record when the task execution is created
    LocalExecution Plan start and end

  • PIPELINE:
    none
    (I'd drop the event for adding pipeline context: I don't think there is anything latency-critical here)

  • SPLIT RUNNER:
    Driver Creation
    Scheduled, blocked, finished, destroyed etc

  • Operator:
    We're mostly interested in page sources. I'd say the implementations in TableScanOperator may just load/propagate lazy blocks, so we can skip tracking that information for now.

  • Hive Connector

    • Directory listing events (start and end look good. Note that there's also a CachingDirectoryLister, but it's okay not to care about it here.)
    • I believe we're yet to add the page source events in the PR.
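Many of the events above come in start/end pairs (parsing, planning, directory listing). The following is a minimal sketch of a helper that emits both events around an action. It uses a hypothetical minimal Tracer interface, and the "_START"/"_END" suffixes are an assumed naming scheme. On failure, the end event carries an "error" entry, mirroring the SEND_UPDATE_TASK_REQUEST_END payload in the proposal's table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical minimal tracer interface; the PR's actual Tracer API may differ.
interface Tracer
{
    boolean isEnabled();

    void emitEvent(String eventType, Map<String, Object> payload);
}

final class TracedAction
{
    private TracedAction() {}

    // Emits <name>_START, runs the action, then emits <name>_END.
    // If the action throws, the END event carries an "error" entry and the exception is rethrown.
    static <T> T trace(Tracer tracer, String name, Supplier<T> action)
    {
        if (!tracer.isEnabled()) {
            return action.get();
        }
        tracer.emitEvent(name + "_START", new HashMap<>());
        T result;
        try {
            result = action.get();
        }
        catch (RuntimeException e) {
            Map<String, Object> payload = new HashMap<>();
            payload.put("error", e.toString());
            tracer.emitEvent(name + "_END", payload);
            throw e;
        }
        tracer.emitEvent(name + "_END", new HashMap<>());
        return result;
    }
}
```

Checking isEnabled() first means that no payloads are built when tracing is off.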

Also, please look at the Travis failures.

Block block = blockReader.readBlock();
if (loadFully) {
block = block.getLoadedBlock();
}
if (block != null) {
Member

Last commit: we can inline toString into the toJson() call and remove the block != null check here. By the way, we'll just get a Block object address here, right?

Member Author

I've removed the check. BlockInfo is needed here because a lambda expression can only reference local variables that are final or effectively final.
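In the snippet above, block is reassigned, so it is not effectively final and a lambda cannot capture it directly. The following is a minimal sketch of the usual workaround: copy the value into a new final local before building a lazily evaluated payload. A String stands in for Block, and the Supplier-based payload is an assumption for illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.Supplier;

final class LambdaCaptureExample
{
    private LambdaCaptureExample() {}

    static Supplier<Map<String, Object>> blockPayload(String rawBlock, boolean loadFully)
    {
        String block = rawBlock;
        if (loadFully) {
            block = block + " (loaded)"; // stand-in for block.getLoadedBlock()
        }
        // return () -> Collections.singletonMap("block", block); // would not compile: block is reassigned
        final String blockInfo = block; // a final copy can be captured by the lambda
        return () -> Collections.singletonMap("block", blockInfo);
    }
}
```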

presto-orc/src/main/java/io/prestosql/orc/OrcReader.java (outdated, resolved)
@jtao15 force-pushed the query-debug-tracing branch 5 times, most recently from e6abebe to 60442d8 (December 18, 2019 00:54)
@jtao15 (Member Author) commented Dec 18, 2019

Thanks for the comments, @phd3. I've addressed them.
@martint, I appreciated the suggestions we discussed offline and have incorporated them into this PR as well. Feel free to leave any comments or feedback. Thank you!

@martint (Member) left a comment

Some initial comments for the "Tracer introduction" commit.

public class TracerEvent
{
private final String nodeId;
private final String uri;
Member

What URI does this represent?

Member Author

It's the URI of the node where the tracer events are sent.

@@ -26,4 +26,8 @@ default void queryCompleted(QueryCompletedEvent queryCompletedEvent)
default void splitCompleted(SplitCompletedEvent splitCompletedEvent)
{
}

default void tracerEventOccurred(TracerEvent tracerEvent)
Member

This method name sounds funny (I don't have any suggestions for alternative names yet; let me think about it).

@jtao15 force-pushed the query-debug-tracing branch 3 times, most recently from 08e2f38 to a44de48 (January 17, 2020 08:33)
@phd3 requested a review from @martint on January 20, 2020 20:18
@phd3 (Member) commented Jan 23, 2020

@jtao15 can you please fix the failing test?

@tooptoop4 (Contributor)

@jtao15 conflicts

@RosterIn

@jtao15 This could be awesome! Is there any chance you will continue this?

@bitsondatadev (Member)

👋 @jtao15 - this PR is inactive and doesn't seem to be under development, and it might already be implemented. If you'd like to continue work on this at any point in the future, feel free to re-open.

6 participants