fix(instrumentation-http)!: drop url.parse in favor of URL constructor #5091

Merged

pichlermarc
Member

Which problem is this PR solving?

Important

Note for reviewers: I noticed that there are quite a few edge cases with this. I tried to address all of them, but please review this in depth and try things out if possible.

See #5060: some characters cause an error in url.parse, which causes requests to fail. This PR replaces usages of url.parse with the URL constructor.

#5085 prepared for this by removing one occurrence; this PR removes the rest. Please review this PR in depth. The URL constructor and url.parse behave very differently, and we rely quite a bit on the old behavior. The URL constructor also throws more often than url.parse did. We therefore need a bit more defensive code and some quite ugly workarounds. Please let me know if you know of more elegant ways to accomplish this.
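
To illustrate the kind of defensive code this implies, here is a minimal sketch. It is not the code from this PR, and `tryParseUrl` is a hypothetical helper name. It shows that the URL constructor throws for a relative request target unless a base is supplied, so callers have to guard the call.

```typescript
// Hypothetical helper (not part of this PR): wrap the URL constructor so that
// unparseable input yields undefined instead of an exception.
function tryParseUrl(input: string, base?: string): URL | undefined {
  try {
    // The WHATWG URL constructor throws a TypeError when the input cannot be
    // parsed, e.g. a relative reference without a base.
    return new URL(input, base);
  } catch {
    return undefined;
  }
}

// A relative request target cannot be parsed on its own.
console.log(tryParseUrl('/users?id=1')); // undefined

// With an (assumed) base, the same target parses.
console.log(tryParseUrl('/users?id=1', 'http://localhost')?.pathname); // "/users"

// Absolute URLs parse without a base.
console.log(tryParseUrl('http://example.com:8080/x')?.host); // "example.com:8080"
```

Returning `undefined` rather than rethrowing is one possible design choice; it keeps instrumentation from failing the request it observes.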

Fixes #5060

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

  • Unit tests
  • Manual testing

codecov bot commented Oct 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.17%. Comparing base (030aff3) to head (601c64a).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5091      +/-   ##
==========================================
- Coverage   93.18%   93.17%   -0.02%     
==========================================
  Files         315      315              
  Lines        8086     8086              
  Branches     1617     1617              
==========================================
- Hits         7535     7534       -1     
- Misses        551      552       +1     

see 1 file with indirect coverage changes

@pichlermarc pichlermarc marked this pull request as ready for review October 25, 2024 14:52
@pichlermarc pichlermarc requested a review from a team as a code owner October 25, 2024 14:52
requestUrl?.hostname ||
host?.replace(/^(.*)(:[0-9]{1,5})/, '$1') ||
'localhost';
const host = headers.host;
Member

Is the host header guaranteed to be there? What about the previous check that got the host from the URL? It is no longer there for the host and hostname calculation. Maybe I'm missing something; I'm still going through the changes.

Member Author

@pichlermarc pichlermarc Nov 6, 2024

AFAIK, IncomingMessage.url never includes the host or protocol (see https://nodejs.org/en/learn/modules/anatomy-of-an-http-transaction#method-url-and-headers; the actual API docs don't seem to state that, though).

Assuming this is true, the old implementation, which parsed the URL with url.parse on L679, could never get a hostname from the URL. On L680 of the old code, we therefore always fell back to IncomingMessage.headers.host, because requestUrl.hostname was always null.

For the new implementation that means: we don't try to parse the URL to get the host because it'll never be there; our only chance to get it is the host header. If it's not there, we fall back to localhost as the old code did.
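
The fallback described above can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `hostnameFromHostHeader` is a hypothetical helper name, the `headers` object stands in for `IncomingMessage.headers`, and building a base from the header to parse the path is only one possible approach.

```typescript
// Hypothetical helper: derive the hostname from the Host header only,
// falling back to 'localhost' when the header is missing or empty.
function hostnameFromHostHeader(host: string | undefined): string {
  // Same pattern as in the diff: drop a ":port" suffix if present.
  return host?.replace(/^(.*)(:[0-9]{1,5})/, '$1') || 'localhost';
}

// Assumed stand-ins for IncomingMessage.url and IncomingMessage.headers.
const requestTarget = '/users?id=1';
const headers: { host?: string } = { host: 'example.com:8080' };

console.log(hostnameFromHostHeader(headers.host)); // "example.com"
console.log(hostnameFromHostHeader(undefined));    // "localhost"

// The request target carries only path and query, so parsing it with the
// URL constructor needs a base (here built from the header, as an assumption).
const parsed = new URL(requestTarget, `http://${headers.host ?? 'localhost'}`);
console.log(parsed.pathname); // "/users"
console.log(parsed.search);   // "?id=1"
```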

@pichlermarc pichlermarc added bug Something isn't working priority:p1 Bugs which cause problems in end-user applications such as crashes, data inconsistencies, etc pkg:instrumentation-http labels Nov 5, 2024
@@ -7,10 +7,16 @@ All notable changes to experimental packages in this project will be documented

### :boom: Breaking Change

* fix(instrumentation-http): drop url.parse in favor of URL constructor [#5091](https://github.com/open-telemetry/opentelemetry-js/pull/5091) @pichlermarc
* (user-facing): signature of `getRequestInfo()` now requires a `DiagLogger` to be passed at the first position
Member Author

@pichlermarc pichlermarc Nov 6, 2024

Note for reviewers: we're exporting lots of utils. I don't think this is intentional, but it's still technically a breaking change.

Member

@hectorhdzg hectorhdzg left a comment

LGTM

host?.replace(/^(.*)(:[0-9]{1,5})/, '$1') ||
'localhost';
const host = headers.host;
const hostname = host?.replace(/^(.*)(:[0-9]{1,5})/, '$1') || 'localhost';
Contributor

Suggested change
- const hostname = host?.replace(/^(.*)(:[0-9]{1,5})/, '$1') || 'localhost';
+ const hostname = host?.replace(/^(.*)(:[0-9]{1,5})$/, '$1') || 'localhost';

Perhaps anchor at the end of the string? I'm not sure if an IPv6 address (with :[0-9]+ segments) could get in here as host. If not, ignore this comment.

Ah, I see that this is just re-using code that was already there. I think you can ignore this comment.

Member Author

> Ah, I see that this is just re-using code that was already there. I think you can ignore this comment.

Yep, I just re-used it 🙂 Let's keep it for now and adjust it later if necessary.
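
For reference, the difference between the two patterns discussed in this thread can be sketched as below. `stripPort` is a hypothetical helper, and whether a bracketed IPv6 literal without a port actually reaches this code as the host header is an assumption, as the reviewer notes.

```typescript
// The pattern currently in the code, and the end-anchored variant suggested above.
const unanchored = /^(.*)(:[0-9]{1,5})/;
const anchored = /^(.*)(:[0-9]{1,5})$/;

// Hypothetical helper: apply one of the patterns to a Host header value.
function stripPort(host: string, pattern: RegExp): string {
  return host.replace(pattern, '$1');
}

// Both patterns agree when a port is present.
console.log(stripPort('example.com:8080', unanchored)); // "example.com"
console.log(stripPort('example.com:8080', anchored));   // "example.com"
console.log(stripPort('[::1]:8080', unanchored));       // "[::1]"
console.log(stripPort('[::1]:8080', anchored));         // "[::1]"

// They differ for a bracketed IPv6 literal without a port: the unanchored
// pattern treats ":1" inside the address as a port and removes it.
console.log(stripPort('[::1]', unanchored)); // "[:]"
console.log(stripPort('[::1]', anchored));   // "[::1]"
```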

@pichlermarc pichlermarc changed the title fix(instrumentation-http): drop url.parse in favor of URL constructor fix(instrumentation-http)!: drop url.parse in favor of URL constructor Nov 8, 2024
@pichlermarc pichlermarc added this pull request to the merge queue Nov 8, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Nov 8, 2024
@pichlermarc pichlermarc enabled auto-merge November 8, 2024 16:51
@pichlermarc pichlermarc added this pull request to the merge queue Nov 8, 2024
Merged via the queue into open-telemetry:main with commit 87bd98e Nov 8, 2024
21 checks passed
@pichlermarc pichlermarc deleted the fix/non-ascii-requests branch November 8, 2024 17:01
povilasv pushed a commit to povilasv/opentelemetry-js that referenced this pull request Nov 11, 2024
Development

Successfully merging this pull request may close these issues.

instrumentation-http uses the wrong method to parse URLs for outgoing requests
4 participants