
feat(traces): evaluation annotations on traces for associating spans with eval metrics #1693

Merged: 22 commits into main on Dec 1, 2023

Conversation

mikeldking
Contributor

@mikeldking mikeldking commented Nov 1, 2023

resolves #1691

This adds the ability to associate eval results with trace datasets. Notably, it tracks a single eval run in a new TraceEvaluations object, which contains the eval results as well as the name of the eval.

Considerations

  • Made a SpanEvaluations contain information about a single eval - this leaves room for metadata to be associated with the eval, such as the model used.
  • Made the evaluations on the TraceDataset a list - this makes appending easy and allows multiple evaluations to be run (including duplicates). While duplicates could later be "squashed", a list doesn't cause data loss in any true sense, so it is a bit more future-proof than, say, a dict.
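The considerations above can be sketched as a minimal data model. This is an illustrative shape only, not Phoenix's actual API: the names and fields are assumptions, and plain dicts stand in for the pandas DataFrames the PR actually uses.

```python
# Hypothetical sketch of the shape described in this PR. SpanEvaluations
# bundles one eval's name with its per-span results, leaving room for
# metadata such as the model used; the trace dataset keeps a *list* of
# evaluations so repeated runs append rather than overwrite each other.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class SpanEvaluations:
    eval_name: str                      # e.g. "hallucination"
    results: Dict[str, dict]            # span_id -> {"label": ..., "score": ...}
    metadata: Dict[str, str] = field(default_factory=dict)  # e.g. {"model": "gpt-4"}


@dataclass
class TraceDataset:
    spans: Dict[str, dict]              # span_id -> span attributes
    evaluations: List[SpanEvaluations] = field(default_factory=list)

    def append_evaluations(self, evals: SpanEvaluations) -> None:
        # A list (rather than a dict keyed by eval_name) keeps duplicate
        # runs side by side instead of silently squashing earlier results.
        self.evaluations.append(evals)


ds = TraceDataset(spans={"span-1": {"name": "llm"}})
ds.append_evaluations(
    SpanEvaluations("hallucination", {"span-1": {"label": "factual", "score": 1.0}})
)
ds.append_evaluations(
    SpanEvaluations("hallucination", {"span-1": {"label": "factual", "score": 1.0}})
)
print(len(ds.evaluations))  # 2: both runs retained
```

The list-over-dict choice is what makes the duplicate-run case lossless: a dict keyed by eval name would overwrite the first run when the second is appended.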


@mikeldking mikeldking force-pushed the 1691-eval-annotations-in-dataset branch from eda9807 to ea0b642 Compare November 2, 2023 05:03
@mikeldking mikeldking changed the title feat(traces): evaluation annotations on traces for associating spans with evaluations feat(traces): evaluation annotations on traces for associating spans with eval metrics Nov 2, 2023
@mikeldking mikeldking marked this pull request as ready for review November 2, 2023 14:45
@mikeldking mikeldking requested a review from RogerHYang November 3, 2023 01:58
src/phoenix/trace/spans_dataframe_utils.py Outdated Show resolved Hide resolved
src/phoenix/trace/trace_dataset.py Outdated Show resolved Hide resolved
src/phoenix/trace/trace_eval_dataset.py Outdated Show resolved Hide resolved
src/phoenix/trace/trace_eval_dataset.py Outdated Show resolved Hide resolved
@axiomofjoy
Contributor

Looking good so far.

@axiomofjoy
Contributor

I like the name "run" for the output from an evaluation rather than "dataset". It makes it more clear that there can be multiple.

@mikeldking
Contributor Author

> I like the name "run" for the output from an evaluation rather than "dataset". It makes it more clear that there can be multiple.

Naming is hard. On one hand I do like consistency of language, but I hear you: it's a set of EvaluationResults.

@mikeldking mikeldking marked this pull request as draft November 30, 2023 22:58
src/phoenix/trace/trace_dataset.py Outdated Show resolved Hide resolved
src/phoenix/trace/trace_dataset.py Outdated Show resolved Hide resolved
src/phoenix/trace/trace_dataset.py Outdated Show resolved Hide resolved
@mikeldking mikeldking marked this pull request as ready for review December 1, 2023 03:56
src/phoenix/trace/trace_evaluations.py Outdated Show resolved Hide resolved
src/phoenix/trace/trace_evaluations.py Outdated Show resolved Hide resolved
src/phoenix/trace/trace_evaluations.py Outdated Show resolved Hide resolved
src/phoenix/trace/trace_evaluations.py Outdated Show resolved Hide resolved
src/phoenix/trace/trace_dataset.py Outdated Show resolved Hide resolved
src/phoenix/trace/trace_dataset.py Outdated Show resolved Hide resolved
src/phoenix/trace/trace_dataset.py Show resolved Hide resolved
src/phoenix/trace/trace_evaluations.py Outdated Show resolved Hide resolved
@mikeldking mikeldking force-pushed the 1691-eval-annotations-in-dataset branch from c6449a4 to 0bd2737 Compare December 1, 2023 20:31
@mikeldking mikeldking force-pushed the 1691-eval-annotations-in-dataset branch from 1a84535 to 7ea193b Compare December 1, 2023 20:58
@mikeldking mikeldking merged commit a218a65 into main Dec 1, 2023
10 checks passed
@mikeldking mikeldking deleted the 1691-eval-annotations-in-dataset branch December 1, 2023 21:21
mikeldking added a commit that referenced this pull request Dec 1, 2023
…with eval metrics (#1693)

* feat: initial associations of evaluations to traces

* add some documentation

* wip: add dataframe utils

* Switch to a single evaluation per dataframe

* make copy the default

* fix doc string

* fix name

* fix notebook

* Add immutability

* remove value from being required

* fix tutorials formatting

* make type a string to see if it fixes tests

* fix test to handle un-parsable

* Update src/phoenix/trace/trace_eval_dataset.py

Co-authored-by: Xander Song <[email protected]>

* Update src/phoenix/trace/trace_eval_dataset.py

Co-authored-by: Xander Song <[email protected]>

* change to trace_evaluations

* cleanup

* Fix formatting

* pr comments

* cleanup notebook

* make sure columns are dropped

* remove unused test

---------

Co-authored-by: Xander Song <[email protected]>
mikeldking added a commit that referenced this pull request Dec 1, 2023
…with eval metrics (#1693)
mikeldking added a commit that referenced this pull request Dec 4, 2023
* fix: trace dataset to disc

* feat(traces): evaluation annotations on traces for associating spans with eval metrics (#1693)

* delete the metadata

* optimize removal of metadata

* shallow copy of dataframe

---------

Co-authored-by: Xander Song <[email protected]>
jlopatec pushed a commit to jlopatec/phoenix that referenced this pull request Dec 4, 2023
Successfully merging this pull request may close these issues.

[traces][rag] Add the ability to store evaluations alongside a trace dataset
3 participants