DB query optimization and reducing sqlalchemy logs #575

Merged

shreyas-damle (Collaborator) commented Sep 26, 2024

This includes four changes:

  1. Query optimization to reduce the execution time for loading the UI.
  2. SQLAlchemy logging is now enabled only when the logging level is set to debug. See logs below.
  3. Added a timeit function for the 4 UI entry-point functions. It shows how long backend processing took to generate data for the dashboard (loader and retriever) and the app details page (loader and retriever). Time is tracked only when the logging level is debug.
  4. Fixed an import warning.
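
For illustration, a debug-only timing decorator along the lines of item 3 could look like the sketch below. This is not the code from the PR: the logger name and the stand-in function body are assumptions, and only the function name `get_all_loader_apps` comes from the results table.

```python
import functools
import logging
import time

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("pebblo_timing_sketch")


def timeit(func):
    """Log how long func took, but only when DEBUG logging is enabled."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Skip the timing work entirely unless debug logging is on.
        if not logger.isEnabledFor(logging.DEBUG):
            return func(*args, **kwargs)
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            logger.debug("%s took %.4f seconds", func.__name__, elapsed)

    return wrapper


@timeit
def get_all_loader_apps():
    # Stand-in body; the real function reads from the database.
    return ["app-1", "app-2"]
```

Checking `isEnabledFor` before starting the clock keeps the wrapper close to free at the info level, which matches the stated intent that time tracking happens only in debug mode.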

Query Optimisation Results:
Here’s a table comparing the execution times before and after:

| Function Name | Execution Time (Before) | Execution Time (After) |
| --- | --- | --- |
| get_all_loader_apps | 6.1778 seconds | 1.0898 seconds |
| get_all_retriever_apps | 0.1952 seconds | 0.0177 seconds |
| get_loader_app_details | 2.8017 seconds | 0.2574 seconds |
| get_retriever_app_details | 0.0142 seconds | 0.0140 seconds |
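
The original PR title mentions using the `in` construct and reducing the number of queries. The sketch below shows that general idea with the standard-library `sqlite3` module rather than SQLAlchemy. The `app` and `document` tables, their columns, and both function names are hypothetical and are not Pebblo's schema or code.

```python
import sqlite3

# Hypothetical two-table schema, used only to illustrate the idea.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE app (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE document (id INTEGER PRIMARY KEY, app_id INTEGER, title TEXT);
    INSERT INTO app VALUES (1, 'loader-a'), (2, 'loader-b');
    INSERT INTO document VALUES (1, 1, 'doc-1'), (2, 1, 'doc-2'), (3, 2, 'doc-3');
    """
)


def documents_per_app_naive(conn):
    """One query for the apps, then one more query per app (N + 1 queries)."""
    apps = conn.execute("SELECT id, name FROM app ORDER BY id").fetchall()
    result = {}
    for app_id, name in apps:
        rows = conn.execute(
            "SELECT title FROM document WHERE app_id = ? ORDER BY id", (app_id,)
        ).fetchall()
        result[name] = [title for (title,) in rows]
    return result


def documents_per_app_batched(conn):
    """One query for the apps, then a single IN query for all their documents."""
    apps = conn.execute("SELECT id, name FROM app ORDER BY id").fetchall()
    result = {name: [] for _, name in apps}
    if not apps:
        return result
    names = dict(apps)  # maps app id to app name
    placeholders = ", ".join("?" for _ in apps)
    rows = conn.execute(
        f"SELECT app_id, title FROM document "
        f"WHERE app_id IN ({placeholders}) ORDER BY id",
        [app_id for app_id, _ in apps],
    )
    for app_id, title in rows:
        result[names[app_id]].append(title)
    return result
```

Both functions return the same mapping. The batched version issues two queries regardless of how many apps exist, which is the kind of change that would plausibly explain the drop in the loader timings, although the PR text does not spell out the exact queries that were rewritten.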

SQLAlchemy logging improvement:
With logging level info:

(venv-3-11) ➜  pebblo git:(shreyas-db-query-optimization) ✗ pebblo
DeprecationWarning: 'file' in storage type is deprecated, use 'db' instead
Pebblo server version 0.1.19 starting ...
Downloading topic, entity classifier models ...
Initializing topic classifier ...
 30%|██████████████████████████████████████████████████▍                                                                                                                     | 3/10 [00:02<00:05,  1.35it/s]/Users/shreyasdamle/work/cloud_defense/pebblo/venv-3-11/lib/python3.11/site-packages/huggingface_hub/file_download.py:1142: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Some weights of the model checkpoint at daxa-ai/pebblo-classifier were not used when initializing DistilBertForSequenceClassification: ['classifier.lora_A.default.weight', 'classifier.lora_B.default.weight', 'pre_classifier.lora_A.default.weight', 'pre_classifier.lora_B.default.weight']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Initializing topic classifier ... done
Initializing entity classifier ...
Initializing entity classifier ... done
 70%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌                                                  | 7/10 [00:04<00:01,  1.59it/s]Some weights of the model checkpoint at daxa-ai/pebblo-classifier were not used when initializing DistilBertForSequenceClassification: ['classifier.lora_A.default.weight', 'classifier.lora_B.default.weight', 'pre_classifier.lora_A.default.weight', 'pre_classifier.lora_B.default.weight']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of the model checkpoint at daxa-ai/pebblo-classifier were not used when initializing DistilBertForSequenceClassification: ['classifier.lora_A.default.weight', 'classifier.lora_B.default.weight', 'pre_classifier.lora_A.default.weight', 'pre_classifier.lora_B.default.weight']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of the model checkpoint at daxa-ai/pebblo-classifier were not used when initializing DistilBertForSequenceClassification: ['classifier.lora_A.default.weight', 'classifier.lora_B.default.weight', 'pre_classifier.lora_A.default.weight', 'pre_classifier.lora_B.default.weight']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:06<00:00,  1.53it/s]
2024-09-26 17:22:24.932 - pebblo.app.config.service - INFO - Starting Pebblo Server with config {'daemon': {'host': 'localhost', 'port': 8000}, 'reports': {'format': 'pdf', 'renderer': 'xhtml2pdf', 'cacheDir': '~/.pebblo', 'anonymizeSnippets': False}, 'classifier': {'mode': 'all', 'anonymizeSnippets': None}, 'logging': {'level': 'INFO', 'file': '/tmp/logs/pebblo.log', 'maxFileSize': 8388608, 'backupCount': 3}, 'storage': {'type': 'file', 'db': None, 'location': '/Users/shreyasdamle/work/cloud_defense/shreyas-damle/pebblo', 'name': 'pebblo'}}
2024-09-26 17:22:24,945 - uvicorn.error - INFO - Started server process [85821]
2024-09-26 17:22:24,945 - uvicorn.error - INFO - Waiting for application startup.
2024-09-26 17:22:24,945 - uvicorn.error - INFO - Application startup complete.
2024-09-26 17:22:24,950 - uvicorn.error - INFO - Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)

With logging level as debug:

(venv-3-11) ➜  pebblo git:(shreyas-db-query-optimization) ✗ pebblo --config pebblo/app/config/config.yaml
Pebblo server version 0.1.19 starting ...
Downloading topic, entity classifier models ...
Initializing topic classifier ...
 30%|██████████████████████████████████████████████████▍                                                                                                                     | 3/10 [00:02<00:04,  1.50it/s]/Users/shreyasdamle/work/cloud_defense/pebblo/venv-3-11/lib/python3.11/site-packages/huggingface_hub/file_download.py:1142: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Some weights of the model checkpoint at daxa-ai/pebblo-classifier were not used when initializing DistilBertForSequenceClassification: ['classifier.lora_A.default.weight', 'classifier.lora_B.default.weight', 'pre_classifier.lora_A.default.weight', 'pre_classifier.lora_B.default.weight']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Initializing topic classifier ... done
Initializing entity classifier ...
Initializing entity classifier ... done
 70%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌                                                  | 7/10 [00:04<00:01,  1.67it/s]2024-09-26 17:23:39,904 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-09-26 17:23:39,904 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("aiapp")
2024-09-26 17:23:39,904 INFO sqlalchemy.engine.Engine [raw sql] ()
2024-09-26 17:23:39,904 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("aidataloader")
2024-09-26 17:23:39,904 INFO sqlalchemy.engine.Engine [raw sql] ()
2024-09-26 17:23:39,904 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("airetrieval")
2024-09-26 17:23:39,904 INFO sqlalchemy.engine.Engine [raw sql] ()
2024-09-26 17:23:39,905 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("aidatasource")
2024-09-26 17:23:39,905 INFO sqlalchemy.engine.Engine [raw sql] ()
2024-09-26 17:23:39,905 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("aidocument")
2024-09-26 17:23:39,905 INFO sqlalchemy.engine.Engine [raw sql] ()
2024-09-26 17:23:39,905 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("aisnippets")
2024-09-26 17:23:39,905 INFO sqlalchemy.engine.Engine [raw sql] ()
2024-09-26 17:23:39,905 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("aiuser")
2024-09-26 17:23:39,905 INFO sqlalchemy.engine.Engine [raw sql] ()
2024-09-26 17:23:39,905 INFO sqlalchemy.engine.Engine COMMIT
Some weights of the model checkpoint at daxa-ai/pebblo-classifier were not used when initializing DistilBertForSequenceClassification: ['classifier.lora_A.default.weight', 'classifier.lora_B.default.weight', 'pre_classifier.lora_A.default.weight', 'pre_classifier.lora_B.default.weight']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of the model checkpoint at daxa-ai/pebblo-classifier were not used when initializing DistilBertForSequenceClassification: ['classifier.lora_A.default.weight', 'classifier.lora_B.default.weight', 'pre_classifier.lora_A.default.weight', 'pre_classifier.lora_B.default.weight']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of the model checkpoint at daxa-ai/pebblo-classifier were not used when initializing DistilBertForSequenceClassification: ['classifier.lora_A.default.weight', 'classifier.lora_B.default.weight', 'pre_classifier.lora_A.default.weight', 'pre_classifier.lora_B.default.weight']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:06<00:00,  1.55it/s]
2024-09-26 17:23:41.844 - pebblo.app.config.service - INFO - Starting Pebblo Server with config {'daemon': {'host': 'localhost', 'port': 8000}, 'reports': {'format': 'pdf', 'renderer': 'xhtml2pdf', 'cacheDir': '~/.pebblo', 'anonymizeSnippets': False}, 'classifier': {'mode': 'all', 'anonymizeSnippets': None}, 'logging': {'level': 'DEBUG', 'file': '/tmp/logs/pebblo.log', 'maxFileSize': 8388608, 'backupCount': 3}, 'storage': {'type': 'db', 'db': 'sqlite', 'location': '/Users/shreyasdamle/work/cloud_defense/shreyas-damle/pebblo', 'name': 'pebblo'}}
2024-09-26 17:23:41,856 - uvicorn.error - INFO - Started server process [85955]
2024-09-26 17:23:41,856 - uvicorn.error - INFO - Waiting for application startup.
2024-09-26 17:23:41,856 - uvicorn.error - INFO - Application startup complete.
2024-09-26 17:23:41,858 - uvicorn.error - INFO - Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
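
One way to get the behavior shown in the two runs above is to tie the level of the `sqlalchemy.engine` logger to the application's configured level. The helper below is a minimal sketch under that assumption. The function name is hypothetical, and the PR may instead toggle the `echo` flag when creating the engine, which the description does not show.

```python
import logging


def configure_sqlalchemy_logging(app_level: str) -> None:
    """Emit SQL statements only when the application itself runs at DEBUG."""
    # SQLAlchemy writes statements to the "sqlalchemy.engine" logger at INFO,
    # so raising that logger to WARNING silences them.
    sa_level = logging.INFO if app_level.upper() == "DEBUG" else logging.WARNING
    logging.getLogger("sqlalchemy.engine").setLevel(sa_level)


# Example: matches the 'logging': {'level': 'INFO'} config in the first run.
configure_sqlalchemy_logging("INFO")
```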

Fixed the following import warning, which appeared while running unit tests:

pebblo/app/models/sqltables.py:14
  /Users/shreyasdamle/work/cloud_defense/shreyas-damle/pebblo/pebblo/app/models/sqltables.py:14: MovedIn20Warning: The ``declarative_base()`` function is now available as sqlalchemy.orm.declarative_base(). (deprecated since: 2.0) (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)
    Base = declarative_base()

@shreyas-damle shreyas-damle changed the title Used in construct and reduced number of queries performed while getti… DB query optimisation and reducing sqlalchemy logs Sep 26, 2024
@shreyas-damle shreyas-damle requested a review from srics September 26, 2024 12:16
@shreyas-damle shreyas-damle force-pushed the shreyas-db-query-optimization branch from d526db2 to fe53ef1 Compare September 26, 2024 15:45
@sridhar-daxa sridhar-daxa changed the title DB query optimisation and reducing sqlalchemy logs DB query optimization and reducing sqlalchemy logs Sep 27, 2024
Review comment on pebblo/app/storage/sqlite_db.py (outdated, resolved)
Raj725 previously approved these changes Sep 27, 2024
@shreyas-damle shreyas-damle merged commit c68f56e into daxa-ai:main Sep 30, 2024
16 checks passed
4 participants