Introduce LLM-based single-table model. #129
Codecov Report
@@ Coverage Diff @@
## main #129 +/- ##
==========================================
- Coverage 80.35% 79.87% -0.48%
==========================================
Files 66 69 +3
Lines 3003 3250 +247
==========================================
+ Hits 2413 2596 +183
- Misses 590 654 +64
Some of the comments are still incomplete; I will add them as soon as possible. The unit test coverage is also insufficient, so I will add more test cases. Once both are done, I will set the PR status to Ready. Other developers are welcome to help improve either of these.
for more information, see https://pre-commit.ci
We should use snake_case for the filename.
tests/models/test_singletableGPT.py
We should use snake_case for the filename, maybe test_singletable_gpt.py?
This makes sense, I'll change the filename.
Split part of the single table gpt model as a base class
the metadata.
"""

off_table_features = []
The introduction of the variable off_table_features is an interesting idea. :)
How about releasing 0.2.0 for it?
Good idea. I need to make a few more changes (Google Colab examples, README updates...) before releasing.
Description
LLMs have long been used to understand and generate various types of data, and they also show some capability in tabular data generation. Moreover, they offer abilities that traditional methods (GAN-based or statistical) cannot achieve.
In this PR, we introduce sdgx.models.LLM.single_table.SingleTableGPT.SingleTableGPTModel, our first synthetic data generation model integrating an LLM.
Motivation and Context
Compared with existing models, SingleTableGPTModel implements two new features:
In addition, SingleTableGPTModel can generate data directly, without complicated and time-consuming steps such as manual labeling and feature engineering. This saves operators considerable time and lets them focus on creative work.
How has this been tested?
We currently provide some test cases at tests/models/test_singletableGPT.py. The test file includes saved content returned by GPT, so the unit tests do not request GPT repeatedly and avoid consuming a large number of tokens.
I will continue to improve these test cases.
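As a rough illustration of testing against saved GPT output instead of live requests, here is a minimal sketch. It assumes the model returns CSV-like text; the names `CANNED_RESPONSE` and `parse_llm_rows` and the sample data are hypothetical and are not taken from this PR or from sdgx.

```python
import csv
import io

# Hypothetical saved reply standing in for real GPT output (illustrative data only).
CANNED_RESPONSE = """age,city,income
34,Berlin,52000
27,Lisbon,31000
"""


def parse_llm_rows(response_text, expected_columns):
    """Parse CSV-like LLM output into a list of dict rows.

    Header lines and rows with the wrong number of fields are skipped,
    since free-form model output is not guaranteed to be well formed.
    """
    reader = csv.reader(io.StringIO(response_text.strip()))
    rows = []
    for fields in reader:
        fields = [f.strip() for f in fields]
        if fields == expected_columns:
            continue  # header line repeated by the model
        if len(fields) != len(expected_columns):
            continue  # malformed or empty row
        rows.append(dict(zip(expected_columns, fields)))
    return rows


def test_parse_canned_response():
    # No network call: the test only exercises parsing of the saved reply.
    rows = parse_llm_rows(CANNED_RESPONSE, ["age", "city", "income"])
    assert len(rows) == 2
    assert rows[0] == {"age": "34", "city": "Berlin", "income": "52000"}
```

Keeping the saved reply in the test module makes the test deterministic and free to run, at the cost of not detecting changes in the live model's output format.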
Types of changes
Checklist: