-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good so far.
And it is really good that we got rid off slog :)
b801f26
to
ab089f9
Compare
client/telemetry/src/layer.rs
Outdated
// NOTE: this is a hack to inject the span id into the json | ||
let mut message = format!(r#"{{"id":{},"#, (*self.0).id); | ||
message.push_str(&value[1..]); | ||
(*self.0).json = Some(message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bkchr 😱 😱 😱
I think I should open an issue on tracing to see if they plan to do something to allow passing anything through the key-values of the log. If we had that, I could serialize the json later when I actually have that id.
Linking this here: old-storyai/tracing-wasm#13 TL;DR Telemetry in the browser needs some update. It works already! But the logs look completely different because it's not using the same library than before. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for all the hard work. 👍
Some last nitpicks are left and than we are good to merge this IMHO.
client/telemetry/src/lib.rs
Outdated
for addr in addresses { | ||
if let Some(node) = node_pool.get_mut(&addr) { | ||
node.telemetry_connection_notifier | ||
.push(connection_notifier.clone()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.push(connection_notifier.clone()); | |
.push(connection_notifier); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't! It's in a loop
client/telemetry/src/lib.rs
Outdated
let input = if let Some(x) = input { | ||
x | ||
} else { | ||
log::error!( | ||
target: "telemetry", | ||
"The telemetry seems to be polled multiple times simultaneously" | ||
"Unexpected end of stream. This is a bug.", | ||
); | ||
// Returning `Pending` here means that we may never get polled again, but this is | ||
// ok because we're in a situation where something else is actually currently doing | ||
// the polling. | ||
return Poll::Pending; | ||
return; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be an expect, as we hold sender/receiver and this can not happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in f462418
client/telemetry/src/lib.rs
Outdated
let (id, verbosity, message) = if let Some(x) = input { | ||
x | ||
} else { | ||
log::error!( | ||
target: "telemetry", | ||
"Unexpected end of stream. This is a bug.", | ||
); | ||
return; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in f462418
client/telemetry/src/lib.rs
Outdated
addr, | ||
verbosity, | ||
); | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return; | |
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😱 Fixed in f6ec36d
client/telemetry/src/lib.rs
Outdated
Node::new(transport.clone(), addr.clone(), Vec::new(), Vec::new()) | ||
}); | ||
|
||
node.connection_messages.extend(connection_message.clone()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
node.connection_messages.extend(connection_message.clone()); | |
node.connection_messages.extend(connection_message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't, it's in a loop
client/telemetry/src/lib.rs
Outdated
let mut node_pool: HashMap<Multiaddr, _> = HashMap::new(); | ||
|
||
// initialize the telemetry nodes and register new notifiers | ||
async fn process_register( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect 👍
Now just move this and the other function below out of the run
function :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 24596be
(...like that??)
client/telemetry/src/lib.rs
Outdated
@@ -16,339 +16,486 @@ | |||
// You should have received a copy of the GNU General Public License | |||
// along with this program. If not, see <https://www.gnu.org/licenses/>. | |||
|
|||
//! Telemetry utilities. | |||
//! To start using this module, please initialize the global logger from `sc-tracing`. This will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would really like to have these docs more a high level description on how the system works.
This means:
- How does it work with the span
- How does the communication works with the telemetry worker
- etc
- I just imagine some sort of "diagram" explaining it :D
However, I will not make this a hard requirement to merge this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this look better? acf1ec8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More doc bc36136
bot merge |
Checks failed; merge aborted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was half-way through when I noticed the PR was merged. Feel free to open a followup with the suggested changes (and let me know if I should do the second half).
disable_log_color, | ||
})?; | ||
let mut logger = GlobalLoggerBuilder::new(self.log_filters()?); | ||
logger.with_log_reloading(!self.is_log_filter_reloading_disabled()?); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logger.with_log_reloading(!self.is_log_filter_reloading_disabled()?); | |
let reloadable = !self.is_log_filter_reloading_disabled().expect("Hardcoded Ok; qed"); | |
logger.with_log_reloading(reloadable); |
…but I think the booleans should be flipped so we'd have logger.with_log_reloading(self.log_reloading)
.
The origin of these funny booleans isn't this PR, but maybe we can avoid extending the awkwardness and have to do one bool flip here, and then another one in GlobalLoggerBuilder::with_log_reloading()
. I'd rename GlobalLoggerBuilder.disable_log_reloading
to just log_reloading: bool
and work backwards from there.
|
||
/// Get a new [`TelemetryHandle`]. | ||
/// | ||
/// This is used when you want to register a new telemetry for a Substrate node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A new telemetry… what? Telemetry endpoint? Telemetry worker? Telemetry channel? Please elaborate (same on handle()
).
let telemetry_span = if config.telemetry_endpoints.is_some() { | ||
Some(TelemetrySpan::new()) | ||
} else { | ||
None | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let telemetry_span = if config.telemetry_endpoints.is_some() { | |
Some(TelemetrySpan::new()) | |
} else { | |
None | |
}; | |
let telemetry_span = config.telemetry_endpoints.as_ref().map(|_| TelemetrySpan::new() ); |
(same for full client)
match self.telemetry_endpoints.as_ref() { | ||
// Don't initialise telemetry if `telemetry_endpoints` == Some([]) | ||
Some(endpoints) if !endpoints.is_empty() => Some(endpoints), | ||
_ => None, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
match self.telemetry_endpoints.as_ref() { | |
// Don't initialise telemetry if `telemetry_endpoints` == Some([]) | |
Some(endpoints) if !endpoints.is_empty() => Some(endpoints), | |
_ => None, | |
} | |
// Don't initialise telemetry if `telemetry_endpoints` == Some([]) | |
self.telemetry_endpoints | |
.as_ref() | |
.filter(|endpoints| endpoints.is_empty())``` |
@@ -228,14 +233,17 @@ pub struct TaskManager { | |||
/// terminates and gracefully shutdown. Also ends the parent `future()` if a child's essential | |||
/// task fails. | |||
children: Vec<TaskManager>, | |||
/// A telemetry handle used to enter the telemetry span when a task is spawned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit confusing; it's a TelemetrySpan
, not a TelemetryHandle
. Can you reword this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh no, that's because I changed the implementation at some point. Before that the handle also handled the span
|
||
> **Note**: Cloning the [`Telemetry`] and polling from multiple clones has an unspecified behaviour. | ||
Substrate's nodes initialize/register to the [`TelemetryWorker`] using a [`TelemetryHandle`]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Substrate's nodes initialize/register to the [`TelemetryWorker`] using a [`TelemetryHandle`]. | |
Substrate's nodes initialize/register with the [`TelemetryWorker`] using a [`TelemetryHandle`]. |
> **Note**: Cloning the [`Telemetry`] and polling from multiple clones has an unspecified behaviour. | ||
Substrate's nodes initialize/register to the [`TelemetryWorker`] using a [`TelemetryHandle`]. | ||
This handle can be cloned and passed around. It uses an asynchronous channel to communicate with | ||
the running [`TelemetryWorker`] dedicated to registration. Registering a telemetry can happen at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the running [`TelemetryWorker`] dedicated to registration. Registering a telemetry can happen at | |
the running [`TelemetryWorker`] dedicated to registration. Registering a telemetry source can happen at |
?
/// The URL string can be either a URL or a multiaddress. | ||
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq, Hash)] | ||
pub struct TelemetryEndpoints( | ||
#[serde(deserialize_with = "url_or_multiaddr_deser")] pub(crate) Vec<(Multiaddr, u8)>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[serde(deserialize_with = "url_or_multiaddr_deser")] pub(crate) Vec<(Multiaddr, u8)>, | |
#[serde(deserialize_with = "url_or_multiaddr_deser")] | |
pub(crate) Vec<(Multiaddr, u8)>, |
impl TelemetryLayer { | ||
/// Create a new [`TelemetryLayer`] and [`TelemetryWorker`]. | ||
/// | ||
/// If not provided, the `buffer_size` will be 16 by default. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// If not provided, the `buffer_size` will be 16 by default. | |
/// The `buffer_size` defaults to 16. |
worker: Option<worker::TelemetryWorker>, | ||
/// Receives log entries for them to be dispatched to the worker. | ||
receiver: mpsc::Receiver<async_record::AsyncRecord>, | ||
/// It should be ran as a background task using the [`TelemetryWorker::run`] method. This method |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// It should be ran as a background task using the [`TelemetryWorker::run`] method. This method | |
/// It should run as a background task using the [`TelemetryWorker::run`] method. This method |
@dvdplm np I will make another PR |
* Adapt service creation to new substrate API * Fix test * WIP * Revert "WIP" This reverts commit 816a363. * WIP * Adapt to current code * Fix tests * Yet another fix * CLEANUP * WIP * WIP * WIP * Adapt code to changes on substrate * Adapt code * Introduce kick. * Fixes * WIP * WIP * WIP * WIP * Bump * Update sp-io * WIP Co-authored-by: Gav Wood <[email protected]> Co-authored-by: Parity Benchmarking Bot <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more comments.
One slight concern I have that I'm not sure is valid is: there are a few places where you note that it's important to not call things out of order or at most once. It's a bit of a code smell and I wonder if we can do better than that, OnceCell
-something-something. More something to ponder than a request for change.
* master: Companion for sub/7040 (#1719) Companion PR for paritytech/substrate#7463 (#1948) Bump serde from 1.0.119 to 1.0.120 (#2292)
Sorry I forgot to answer you. Now that the telemetry can be started in the middle of anything I think the only thing that remains is if you start the telemetry twice for the same substrate node. It's not really breaking, it will just extend the existing connections if necessary but I think it will add another connection message so every time the connection re-establishes it will send 2 connection messages instead of one. I could fix that to make it override the previous connection message. |
* Adapt service creation to new substrate API * Fix test * WIP * Revert "WIP" This reverts commit 816a3633e91abc943b12b2bfa77ce98b959e78b2. * WIP * Adapt to current code * Fix tests * Yet another fix * CLEANUP * WIP * WIP * WIP * Adapt code to changes on substrate * Adapt code * Introduce kick. * Fixes * WIP * WIP * WIP * WIP * Bump * Update sp-io * WIP Co-authored-by: Gav Wood <[email protected]> Co-authored-by: Parity Benchmarking Bot <[email protected]>
* Adapt service creation to new substrate API * Fix test * WIP * Revert "WIP" This reverts commit 816a3633e91abc943b12b2bfa77ce98b959e78b2. * WIP * Adapt to current code * Fix tests * Yet another fix * CLEANUP * WIP * WIP * WIP * Adapt code to changes on substrate * Adapt code * Introduce kick. * Fixes * WIP * WIP * WIP * WIP * Bump * Update sp-io * WIP Co-authored-by: Gav Wood <[email protected]> Co-authored-by: Parity Benchmarking Bot <[email protected]>
polkadot companion: paritytech/polkadot#1948