fix(openai): safe require tiktoken for webpack bundlers #4433

Merged
sabrenner merged 6 commits into master from sabrenner/openai-tiktoken-bundling-fix
Jun 25, 2024


sabrenner
Collaborator

@sabrenner sabrenner commented Jun 24, 2024

What does this PR do?

This is an add-on to #4366, which added support for using tiktoken, when it is installed, to capture token metrics for the openai integration. However, that change does not play nicely with webpack, which the popular framework Next.js uses under the hood. The workaround is to set resolve.fallback.tiktoken = false in webpack.config.js or next.config.js. To make using the tracer as seamless as possible, this fix instead wraps the require in a function, so that tiktoken is no longer required directly and the bundler does not try to resolve it.
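
A minimal sketch of the wrapped-require approach described above. The helper name safeRequire and the variable names are illustrative assumptions and may not match the code in this PR. The idea is that the module name reaches require as a function argument rather than as a string literal, so a bundler that resolves literal require calls at build time has nothing to resolve for tiktoken:

```javascript
'use strict'

// Hypothetical helper (the name is an assumption, not necessarily what this PR uses).
// The module id is passed in as a variable, so there is no literal
// require('tiktoken') in the source for a bundler to resolve statically.
function safeRequire (id) {
  try {
    return require(id)
  } catch {
    // The optional dependency is not installed (or failed to load).
    return null
  }
}

// tiktoken is optional: when it is missing, callers fall back to estimating tokens.
const tiktoken = safeRequire('tiktoken')
const encodingForModel = tiktoken && tiktoken.encoding_for_model
```

When tiktoken is not installed, encodingForModel is null and the integration can fall back to its own token estimation. A bundler may still emit a warning about a non-literal require, which is an assumption worth checking against the bundler in use.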

Additionally, this PR cleans up some test logic and adds tiktoken as a dev dependency for regression testing.

Motivation

Fixes #4424

The following webpack error would come up when bundling:

⚠ ../../node_modules/dd-trace/packages/datadog-plugin-openai/src/index.js
Module not found: Can't resolve 'tiktoken' in 'node_modules/dd-trace/packages/datadog-plugin-openai/src'
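
For tracer versions without this fix, the resolve.fallback.tiktoken = false workaround mentioned in the description might look like the following in a Next.js project. This is a sketch that assumes webpack 5 and a CommonJS next.config.js; any existing fallback entries are preserved:

```javascript
// next.config.js (sketch of the workaround, assuming webpack 5)
module.exports = {
  webpack: (config) => {
    config.resolve = config.resolve || {}
    // Tell webpack to treat 'tiktoken' as an empty module instead of failing to resolve it.
    config.resolve.fallback = {
      ...(config.resolve.fallback || {}),
      tiktoken: false
    }
    return config
  }
}
```

In a plain webpack.config.js, the same setting would go under the top-level resolve.fallback key.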

Testing

Unit

Added tiktoken as a dev dependency for the OpenAI tests. All tests now run against it, so any changes to that library or to how we use it (e.g., typos) show up as failing tests. To test the in-house estimations, I moved the estimateTokens function into its own file and added a small set of regression tests to assert basic behavior. This can be cleaned up further in a future PR that refactors the OpenAI integration and its tests.
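
To illustrate the two code paths these tests cover, here is a rough sketch of counting tokens with tiktoken when it is available and estimating otherwise. The estimateTokens body below is a hypothetical stand-in (about four characters per token), not the in-house implementation, and countTokens is an illustrative name. The only tiktoken calls assumed are encoding_for_model, encode, and free:

```javascript
'use strict'

// Hypothetical stand-in for the in-house estimator: roughly 4 characters per token.
// The real estimateTokens in dd-trace may use a different heuristic.
function estimateTokens (text) {
  return Math.ceil(text.length / 4)
}

// encodingForModel is expected to be tiktoken's encoding_for_model,
// or null/undefined when tiktoken is not installed.
function countTokens (text, model, encodingForModel) {
  if (typeof encodingForModel === 'function') {
    try {
      const encoder = encodingForModel(model)
      try {
        return { count: encoder.encode(text).length, estimated: false }
      } finally {
        // tiktoken encoders hold native memory and should be freed after use.
        if (typeof encoder.free === 'function') encoder.free()
      }
    } catch {
      // Unknown model or encoder failure: fall through to the estimate.
    }
  }
  return { count: estimateTokens(text), estimated: true }
}
```

Keeping the estimator in its own function, as the PR does by moving it to its own file, lets the fallback path be tested without tiktoken in the picture.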

Local

Ran webpack on a test script importing dd-trace, before and after this fix. The error appeared before the fix and not after. I wondered why our Next.js tests did not catch this; I believe it is because of how those tests are set up: they use a server.js file, which I'm not sure gets compiled with webpack.


github-actions bot commented Jun 24, 2024

Overall package size

Self size: 6.71 MB
Deduped: 61.97 MB
No deduping: 62.25 MB

Dependency sizes

| name | version | self size | total size |
|------|---------|-----------|------------|
| @datadog/native-appsec | 8.0.1 | 15.59 MB | 15.6 MB |
| @datadog/native-iast-taint-tracking | 2.1.0 | 14.91 MB | 14.92 MB |
| @datadog/pprof | 5.3.0 | 9.85 MB | 10.22 MB |
| protobufjs | 7.2.5 | 2.77 MB | 6.56 MB |
| @datadog/native-iast-rewriter | 2.3.1 | 2.15 MB | 2.24 MB |
| @opentelemetry/core | 1.14.0 | 872.87 kB | 1.47 MB |
| @datadog/native-metrics | 2.0.0 | 898.77 kB | 1.3 MB |
| @opentelemetry/api | 1.8.0 | 1.21 MB | 1.21 MB |
| import-in-the-middle | 1.8.1 | 71.67 kB | 741.34 kB |
| msgpack-lite | 0.1.26 | 201.16 kB | 281.59 kB |
| opentracing | 0.14.7 | 194.81 kB | 194.81 kB |
| semver | 7.5.4 | 93.4 kB | 123.8 kB |
| pprof-format | 2.1.0 | 111.69 kB | 111.69 kB |
| @datadog/sketches-js | 2.1.0 | 109.9 kB | 109.9 kB |
| lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB |
| lru-cache | 7.14.0 | 74.95 kB | 74.95 kB |
| ignore | 5.2.4 | 51.22 kB | 51.22 kB |
| int64-buffer | 0.1.10 | 49.18 kB | 49.18 kB |
| shell-quote | 1.8.1 | 44.96 kB | 44.96 kB |
| istanbul-lib-coverage | 3.2.0 | 29.34 kB | 29.34 kB |
| tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB |
| limiter | 1.1.5 | 23.17 kB | 23.17 kB |
| dc-polyfill | 0.1.4 | 23.1 kB | 23.1 kB |
| retry | 0.13.1 | 18.85 kB | 18.85 kB |
| jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB |
| crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB |
| path-to-regexp | 0.1.7 | 6.78 kB | 6.78 kB |
| koalas | 1.0.2 | 6.47 kB | 6.47 kB |
| module-details-from-path | 1.0.3 | 4.47 kB | 4.47 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe


pr-commenter bot commented Jun 24, 2024

Benchmarks

Benchmark execution time: 2024-06-25 19:04:47

Comparing candidate commit 5099972 in PR branch sabrenner/openai-tiktoken-bundling-fix with baseline commit fec9a91 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 260 metrics, 6 unstable metrics.

@sabrenner sabrenner marked this pull request as ready for review June 24, 2024 18:57
@sabrenner sabrenner requested review from a team as code owners June 24, 2024 18:57
Member

@tlhunter tlhunter left a comment

Sam and I spoke out of band. This will need a test, as the encodingForModel typo showed.


codecov bot commented Jun 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 65.42%. Comparing base (b1f1f85) to head (5099972).
Report is 15 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #4433       +/-   ##
===========================================
- Coverage   92.64%   65.42%   -27.23%     
===========================================
  Files         116       95       -21     
  Lines        4173     2756     -1417     
  Branches       33       33               
===========================================
- Hits         3866     1803     -2063     
- Misses        307      953      +646     


@sabrenner sabrenner merged commit 170e337 into master Jun 25, 2024
141 checks passed
@sabrenner sabrenner deleted the sabrenner/openai-tiktoken-bundling-fix branch June 25, 2024 20:15
juan-fernandez pushed a commit that referenced this pull request Jul 10, 2024
* fix

* change typo

* add tiktoken as dev dependency for testing

* change tests to check tiktoken usage

* test in-house estimator separately

* add tiktoken license to third party
This was referenced Jul 10, 2024
juan-fernandez pushed a commit that referenced this pull request Jul 11, 2024
* fix

* change typo

* add tiktoken as dev dependency for testing

* change tests to check tiktoken usage

* test in-house estimator separately

* add tiktoken license to third party
Successfully merging this pull request may close these issues.

Module not found: Can't resolve 'tiktoken'
2 participants