

figure out encodings for columns #10

Open
WeihaoGe1009 opened this issue Aug 13, 2024 · 0 comments


@WeihaoGe1009
Collaborator

Not all columns need to be encoded by LLM tokenizers.

author: definitely an encoder or an index
time: no need to feed it into the model at the current stage, but we can compute some descriptive statistics
title, selftext: input values; LLM tokenizer
note: intermediate value? LLM tokenizer
jurisdictions: MultiLabelBinarizer
relevance: one-hot
poster's legal status: one-hot
predictions (misconception and unclear knowledge): LLM tokenizer or MultiLabelBinarizer?
category: one-hot
background: LLM tokenizer, one-hot, or MultiLabelBinarizer?
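A minimal sketch of the non-text encodings listed above (integer index for author, multi-hot for jurisdictions, one-hot for relevance, legal status, and category). It assumes each post is a dict with hypothetical keys `author`, `jurisdictions`, `relevance`, `legal_status`, and `category`; the example values are made up. It is written in plain Python so it runs without dependencies; scikit-learn's `LabelEncoder`, `OneHotEncoder`, and `MultiLabelBinarizer` produce the same kinds of encodings.

```python
from typing import Dict, List, Sequence


def fit_index(values: Sequence[str]) -> Dict[str, int]:
    """Map each distinct value to an integer index (what a label encoder does)."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def one_hot(value: str, index: Dict[str, int]) -> List[int]:
    """Single-label column -> vector with one 1; unseen values map to all zeros."""
    vec = [0] * len(index)
    if value in index:
        vec[index[value]] = 1
    return vec


def multi_hot(labels: Sequence[str], index: Dict[str, int]) -> List[int]:
    """Multi-label column (e.g. jurisdictions) -> a 1 for every label present."""
    vec = [0] * len(index)
    for label in labels:
        if label in index:
            vec[index[label]] = 1
    return vec


# Hypothetical rows: field names and values are illustrative only.
posts = [
    {"author": "u1", "jurisdictions": ["US", "US-CA"], "relevance": "relevant",
     "legal_status": "tenant", "category": "housing"},
    {"author": "u2", "jurisdictions": ["UK"], "relevance": "irrelevant",
     "legal_status": "employee", "category": "employment"},
    {"author": "u1", "jurisdictions": ["US"], "relevance": "relevant",
     "legal_status": "tenant", "category": "housing"},
]

# Fit the vocabularies (here on the same rows, for brevity).
author_index = fit_index([p["author"] for p in posts])
jurisdiction_index = fit_index([j for p in posts for j in p["jurisdictions"]])
single_label_indices = {
    col: fit_index([p[col] for p in posts])
    for col in ("relevance", "legal_status", "category")
}


def encode(post: dict) -> dict:
    """Encode one post; raises KeyError for an author not seen during fitting."""
    features = {
        "author_id": author_index[post["author"]],
        "jurisdictions": multi_hot(post["jurisdictions"], jurisdiction_index),
    }
    for col, idx in single_label_indices.items():
        features[col] = one_hot(post[col], idx)
    return features


encoded = [encode(p) for p in posts]
```

In a real pipeline the vocabularies would be fitted on the training split only and reused for validation and test data, so unseen labels fall through to the all-zero vector (or an explicit "unknown" bucket).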

How about prompting? Tokenize everything with the LLM? Or some other way to predict and then decode?
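One way to read the prompting option: serialize the free-text columns into a prompt, let the LLM generate the label as text, then decode that text back onto a fixed label set. The sketch below assumes hypothetical `title`/`selftext` keys and a made-up category list, and leaves out the actual model call; the substring matching in `decode_labels` is a simplistic stand-in (e.g. "employment" would also match inside "unemployment").

```python
from typing import Dict, List, Sequence


def build_prompt(post: Dict[str, str], label_names: Sequence[str]) -> str:
    """Serialize the free-text columns into one prompt that asks for a category."""
    return (
        "Classify the following legal question.\n"
        f"Title: {post['title']}\n"
        f"Post: {post['selftext']}\n"
        f"Choose from: {', '.join(label_names)}\n"
        "Category:"
    )


def decode_labels(generated: str, label_names: Sequence[str]) -> List[str]:
    """Map free-form model output back onto the known labels (case-insensitive).

    Returns every label mentioned, so it also works for multi-label columns
    such as the predictions column.
    """
    text = generated.lower()
    return [name for name in label_names if name.lower() in text]


labels = ["housing", "employment"]
prompt = build_prompt(
    {"title": "Deposit not returned", "selftext": "My landlord kept it."}, labels
)
```

Compared with the tabular encodings, this keeps every column in the LLM's text space, at the cost of needing a decoding step (and some tolerance for outputs that match no label).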

WeihaoGe1009 self-assigned this Aug 13, 2024