[Feature] Automatic Video Analysis with NLP 📺 #4323

Merged
merged 65 commits into from
Feb 28, 2023
eed3c3d
i18n
martinb-ai Jan 16, 2023
737cb89
large commit
martinb-ai Jan 16, 2023
6792f8e
fixes with vids
martinb-ai Jan 16, 2023
6a30094
working
martinb-ai Jan 18, 2023
f7f1ef2
more features
martinb-ai Jan 18, 2023
0d377ce
working summarizer
martinb-ai Jan 18, 2023
74cf852
working with urls
martinb-ai Jan 20, 2023
1a96579
many upgrades
martinb-ai Jan 20, 2023
2b848f8
git ignore cache
martinb-ai Feb 21, 2023
57c30f4
fb model
martinb-ai Feb 21, 2023
94b5002
fix chucks
martinb-ai Feb 21, 2023
42690f2
tqdm
martinb-ai Feb 21, 2023
1e67353
trailmap
martinb-ai Feb 23, 2023
b06419b
poetry
martinb-ai Feb 23, 2023
b716a5c
sentiment analysis
martinb-ai Feb 24, 2023
c628aac
refactor
martinb-ai Feb 24, 2023
b211fca
changing to base model default
martinb-ai Feb 24, 2023
effe1b3
rounding
martinb-ai Feb 24, 2023
ca7f251
Merge branch 'develop' into feature/whisper
martinb-ai Feb 24, 2023
1a3e899
renewing poetry
martinb-ai Feb 24, 2023
0a56643
fixed poetry
martinb-ai Feb 24, 2023
7f5b4f9
spacing
martinb-ai Feb 24, 2023
962c397
black
martinb-ai Feb 24, 2023
f0712ae
linting v2
martinb-ai Feb 24, 2023
87d1309
silly imports
martinb-ai Feb 24, 2023
b053bfa
adding tempfile
martinb-ai Feb 24, 2023
7393c85
reordering imports
martinb-ai Feb 24, 2023
14e7532
Merge branch 'develop' into feature/whisper
martinb-ai Feb 24, 2023
80d1eb7
Merge branch 'develop' into feature/whisper
martinb-ai Feb 24, 2023
fee073c
Merge branch 'develop' into feature/whisper
jmaslek Feb 24, 2023
e02e6ff
update spec
jmaslek Feb 24, 2023
062d318
add hidden import
jmaslek Feb 24, 2023
7485cdc
add hidden imports
jmaslek Feb 24, 2023
fd61422
imports again
jmaslek Feb 24, 2023
65c0383
valid url check
martinb-ai Feb 24, 2023
5463761
new lines and prompt
martinb-ai Feb 24, 2023
7a0151a
split imports
martinb-ai Feb 24, 2023
95bf5e9
more spec
jmaslek Feb 24, 2023
dbff054
hook test for pyinstaller
tehcoderer Feb 27, 2023
081defc
Update hook-whisper.py
tehcoderer Feb 27, 2023
695f4e3
oops lol
tehcoderer Feb 27, 2023
8f68393
catch for error message
martinb-ai Feb 27, 2023
345c6e7
fix for frozendict
andrewkenreich Feb 27, 2023
60857ef
caching search and prompting on hub
martinb-ai Feb 27, 2023
b395fbc
whisper model prompts
martinb-ai Feb 27, 2023
af09546
sdk updates
martinb-ai Feb 28, 2023
3d9bc4c
Merge branch 'develop' into feature/whisper
tehcoderer Feb 28, 2023
857fdc9
update dep files
tehcoderer Feb 28, 2023
4f210a0
Update trail_map_forecasting.csv
tehcoderer Feb 28, 2023
c308fcc
import ordering pylint?
martinb-ai Feb 28, 2023
546687d
fix trailing comma
tehcoderer Feb 28, 2023
f894187
Merge branch 'feature/whisper' of https://github.com/OpenBB-finance/O…
tehcoderer Feb 28, 2023
9655b88
sorted trailmaps
tehcoderer Feb 28, 2023
c76484e
update deps
tehcoderer Feb 28, 2023
115ea8e
req files
tehcoderer Feb 28, 2023
1a1c32a
ruff
jmaslek Feb 28, 2023
0131988
mypy
jmaslek Feb 28, 2023
c53723c
ruff v2..
martinb-ai Feb 28, 2023
8c7a84e
ruff v3
martinb-ai Feb 28, 2023
3ca3405
util linting changes
martinb-ai Feb 28, 2023
afb4883
linting for controller
martinb-ai Feb 28, 2023
f39e297
Merge branch 'develop' into feature/whisper
martinb-ai Feb 28, 2023
f384fe2
test
jmaslek Feb 28, 2023
427cab0
see if this sticks
jmaslek Feb 28, 2023
f1a993a
Fix linters. Add --video if not provided
jmaslek Feb 28, 2023
2 changes: 1 addition & 1 deletion .github/workflows/intel_macos_build.yml
Expand Up @@ -52,7 +52,7 @@ jobs:
environment-file: build/conda/conda-3-9-env-full.yaml
activate-environment: build_env
use-only-tar-bz2: true # Needed for caching for some reason
- name: Run Poetry
- name: Install requirements
run: |
pip list
python -m pip install -r requirements-full.txt
4 changes: 4 additions & 0 deletions .gitignore
Expand Up @@ -62,5 +62,9 @@ darts_logs/
custom_imports/*.csv
custom_imports/*/*.csv

# cache
cache/

# lightning logs
lightning_logs/

35 changes: 35 additions & 0 deletions build/pyinstaller/hooks/hook-whisper.py
@@ -0,0 +1,35 @@
import importlib.util

import importlib_metadata
from PyInstaller.utils.hooks import copy_metadata

datas = copy_metadata("transformers")
datas += copy_metadata("tokenizers")
datas += copy_metadata("tqdm")
datas += copy_metadata("regex")
datas += copy_metadata("requests")
datas += copy_metadata("packaging")
datas += copy_metadata("filelock")
datas += copy_metadata("numpy")
datas += copy_metadata("torch")

candidates = [
    "tensorflow",
    "tensorflow-cpu",
    "tensorflow-gpu",
    "tf-nightly",
    "tf-nightly-cpu",
    "tf-nightly-gpu",
    "intel-tensorflow",
    "intel-tensorflow-avx512",
    "tensorflow-rocm",
    "tensorflow-macos",
    "tensorflow-aarch64",
]
for candidate in candidates:
    try:
        if importlib.util.find_spec(candidate):
            datas += copy_metadata(candidate)
            break
    except importlib_metadata.PackageNotFoundError:
        pass
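
Review note: `importlib.util.find_spec` resolves importable module names, whereas most entries in `candidates` are distribution names (for example `tensorflow-cpu`), so the loop may only ever match the plain `tensorflow` entry. The following is a minimal sketch of a lookup keyed on distribution metadata instead, assuming the standard-library `importlib.metadata` is acceptable inside the hook. `first_installed` and its `version_lookup` parameter are hypothetical names used for illustration and are not part of this PR.

```python
from importlib import metadata
from typing import Callable, Iterable, Optional


def first_installed(
    candidates: Iterable[str],
    version_lookup: Callable[[str], str] = metadata.version,
) -> Optional[str]:
    """Return the first distribution name with installed metadata, else None.

    `version_lookup` is injectable (an assumption for testability); by default
    it is importlib.metadata.version, which raises PackageNotFoundError when
    the distribution is not installed.
    """
    for name in candidates:
        try:
            version_lookup(name)
        except metadata.PackageNotFoundError:
            continue
        return name
    return None
```

In the hook, a non-`None` result could then be passed to `copy_metadata`, which keeps the "first match wins" behaviour of the original loop.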
18 changes: 15 additions & 3 deletions build/pyinstaller/terminal.spec
Expand Up @@ -60,6 +60,8 @@ added_files = [
(os.path.join(pathex, "user_agent"), "user_agent"),
(os.path.join(pathex, "vaderSentiment"), "vaderSentiment"),
(os.path.join(pathex, "prophet"), "prophet"),
(os.path.join(pathex, "whisper"), "whisper"),
(os.path.join(pathex, "transformers"), "transformers"),
(
os.path.join(pathex, "linearmodels", "datasets"),
os.path.join("linearmodels", "datasets"),
Expand All @@ -77,10 +79,14 @@ added_files = [
(os.path.join(pathex, "blib2to3", "PatternGrammar.txt"), "blib2to3"),
]
if is_win:
added_files.append(
(os.path.join(f"{os.path.dirname(scipy.__file__)}.libs"), "scipy.libs/"),
added_files.extend(
[
(os.path.join(f"{os.path.dirname(scipy.__file__)}.libs"), "scipy.libs/"),
(os.path.join(pathex, "frozendict", "version.py"), "frozendict"),
]
)


# Python libraries that are explicitly pulled into the bundle
hidden_imports = [
"sklearn.utils._cython_blas",
Expand All @@ -98,15 +104,21 @@ hidden_imports = [
"statsmodels",
"user_agent",
"vaderSentiment",
"pyEX",
"feedparser",
"_sysconfigdata__darwin_darwin",
"prophet",
"debugpy",
"scipy.sparse.linalg._isolve._iterative",
"whisper",
"transformers",
"yt_dlp",
"textwrap3",
]


if is_win:
hidden_imports.append("frozendict")

analysis_kwargs = dict(
scripts=[os.path.join(os.getcwd(), "terminal.py")],
pathex=[pathex, "."],
1 change: 1 addition & 0 deletions openbb_terminal/core/config/paths.py
Expand Up @@ -32,3 +32,4 @@ def get_user_data_directory():
USER_REPORTS_DIRECTORY = USER_DATA_DIRECTORY / "reports"
USER_CUSTOM_REPORTS_DIRECTORY = USER_DATA_DIRECTORY / "reports" / "custom reports"
USER_FORECAST_MODELS_DIRECTORY = USER_DATA_DIRECTORY / "exports" / "forecast_models"
USER_FORECAST_WHISPER_DIRECTORY = USER_DATA_DIRECTORY / "exports" / "whisper"
4 changes: 4 additions & 0 deletions openbb_terminal/core/sdk/models/economy_sdk_model.py
Expand Up @@ -13,6 +13,8 @@ class EconomyRoot(Category):
`bigmac`: Display Big Mac Index for given countries\n
`bigmac_chart`: Display Big Mac Index for given countries\n
`country_codes`: Get available country codes for Bigmac index\n
`cpi`: Obtain CPI data from FRED. [Source: FRED]\n
`cpi_chart`: Plot CPI data. [Source: FRED]\n
`currencies`: Scrape data for global currencies\n
`events`: Get economic calendar for countries between specified dates\n
`fred`: Get Series data. [Source: FRED]\n
Expand Down Expand Up @@ -52,6 +54,8 @@ def __init__(self):
self.bigmac = lib.economy_nasdaq_model.get_big_mac_indices
self.bigmac_chart = lib.economy_nasdaq_view.display_big_mac_index
self.country_codes = lib.economy_nasdaq_model.get_country_codes
self.cpi = lib.economy_fred_model.get_cpi
self.cpi_chart = lib.economy_fred_view.plot_cpi
self.currencies = lib.economy_wsj_model.global_currencies
self.events = lib.economy_nasdaq_model.get_economic_calendar
self.fred = lib.economy_fred_model.get_aggregated_series_data
1 change: 1 addition & 0 deletions openbb_terminal/core/sdk/models/forecast_sdk_model.py
Expand Up @@ -150,3 +150,4 @@ def __init__(self):
self.theta_chart = lib.forecast_theta_view.display_theta_forecast
self.trans = lib.forecast_trans_model.get_trans_data
self.trans_chart = lib.forecast_trans_view.display_trans_forecast
self.whisper = lib.forecast_whisper_model.transcribe_and_summarize
1 change: 1 addition & 0 deletions openbb_terminal/core/sdk/sdk_init.py
Expand Up @@ -465,6 +465,7 @@
theta_view as forecast_theta_view,
trans_model as forecast_trans_model,
trans_view as forecast_trans_view,
whisper_model as forecast_whisper_model,
)
except ImportError:
FORECASTING_TOOLKIT_ENABLED = False
2 changes: 1 addition & 1 deletion openbb_terminal/core/sdk/trail_map.csv
Expand Up @@ -181,6 +181,7 @@ econometrics.root,econometrics_model.get_root,econometrics_view.display_root
economy.available_indices,economy_yfinance_model.get_available_indices,
economy.bigmac,economy_nasdaq_model.get_big_mac_indices,economy_nasdaq_view.display_big_mac_index
economy.country_codes,economy_nasdaq_model.get_country_codes,
economy.cpi,economy_fred_model.get_cpi,economy_fred_view.plot_cpi
economy.currencies,economy_wsj_model.global_currencies,
economy.events,economy_nasdaq_model.get_economic_calendar,
economy.fred,economy_fred_model.get_aggregated_series_data,economy_fred_view.display_fred_series
Expand All @@ -203,7 +204,6 @@ economy.search_index,economy_yfinance_model.get_search_indices,
economy.spectrum,economy_finviz_view.display_spectrum,
economy.treasury,economy_econdb_model.get_treasuries,economy_econdb_view.show_treasuries
economy.treasury_maturities,economy_econdb_model.get_treasury_maturities,
economy.cpi,economy_fred_model.get_cpi,economy_fred_view.plot_cpi,
economy.usbonds,economy_wsj_model.us_bonds,
economy.valuation,economy_finviz_model.get_valuation_data,
etf.candle,stocks_helper.display_candle,
1 change: 1 addition & 0 deletions openbb_terminal/core/sdk/trail_map_forecasting.csv
Expand Up @@ -37,3 +37,4 @@ forecast.tcn,forecast_tcn_model.get_tcn_data,forecast_tcn_view.display_tcn_forec
forecast.tft,forecast_tft_model.get_tft_data,forecast_tft_view.display_tft_forecast
forecast.theta,forecast_theta_model.get_theta_data,forecast_theta_view.display_theta_forecast
forecast.trans,forecast_trans_model.get_trans_data,forecast_trans_view.display_trans_forecast
forecast.whisper,forecast_whisper_model.transcribe_and_summarize,
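
Review note: the new `forecast.whisper` row leaves its last column empty, which appears to mean the command exposes a model function but no chart view. The sketch below shows how such a row splits, assuming the three-column layout (trail, model, view) inferred from the rows in this diff. `parse_trail_row` is a hypothetical helper, not the SDK's actual loader. It also shows why a stray trailing comma on a row that already has three columns, like the `economy.cpi` row removed from `trail_map.csv`, matters: it produces a fourth, empty field.

```python
import csv
import io
from typing import List, Optional, Tuple


def parse_trail_row(line: str) -> Tuple[str, str, Optional[str]]:
    """Split one trail-map line into (trail, model, view).

    The view is returned as None when the third column is blank.
    Raises ValueError when the line does not have exactly three columns.
    """
    fields: List[str] = next(csv.reader(io.StringIO(line)))
    if len(fields) != 3:
        raise ValueError(f"expected 3 columns, got {len(fields)}: {line!r}")
    trail, model, view = fields
    return trail, model, view or None
```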
173 changes: 161 additions & 12 deletions openbb_terminal/forecast/forecast_controller.py
Expand Up @@ -25,17 +25,61 @@
"Please install the forecast version of the terminal. Instructions can be found "
"under the python tab: https://docs.openbb.co/terminal/quickstart/installation"
)

try:
    import whisper
    import transformers
    from whisper.tokenizer import LANGUAGES, TO_LANGUAGE_CODE
    from openbb_terminal.forecast.whisper_utils import str2bool

    transformers_ver = transformers.__version__
    # if imports are successful, set flag to True
    WHISPER_AVAILABLE = True

except ModuleNotFoundError:
    raise ModuleNotFoundError(
        "Please use poetry to install latest whisper model and dependencies. \n"
        "poetry install -E forecast \n"
        "\n"
        "If you are not using poetry, please install whisper model. Instructions can be found here: \n"
        "https://github.com/openai/whisper \n"
        "Please install the transformers library with the following command: \n"
        "pip install transformers \n"
    )

import pandas as pd
import psutil


# ignore pylint(ungrouped-imports)
# pylint: disable=ungrouped-imports

from openbb_terminal import feature_flags as obbff
from openbb_terminal.common import common_model

from openbb_terminal.core.config.paths import (
USER_CUSTOM_IMPORTS_DIRECTORY,
USER_EXPORTS_DIRECTORY,
USER_FORECAST_WHISPER_DIRECTORY,
)
from openbb_terminal.custom_prompt_toolkit import NestedCompleter
from openbb_terminal.decorators import log_start_end

from openbb_terminal.helper_funcs import (
check_positive,
check_positive_float,
NO_EXPORT,
EXPORT_ONLY_FIGURES_ALLOWED,
EXPORT_ONLY_RAW_DATA_ALLOWED,
log_and_raise,
valid_date,
parse_and_split_input,
)

from openbb_terminal.menu import session
from openbb_terminal.parent_classes import BaseController
from openbb_terminal.rich_config import console, MenuText

from openbb_terminal.forecast import (
anom_view,
autoarima_view,
Expand All @@ -60,19 +104,8 @@
tft_view,
theta_view,
trans_view,
whisper_model,
)
from openbb_terminal.helper_funcs import (
EXPORT_ONLY_FIGURES_ALLOWED,
EXPORT_ONLY_RAW_DATA_ALLOWED,
NO_EXPORT,
check_positive,
check_positive_float,
log_and_raise,
valid_date,
)
from openbb_terminal.menu import session
from openbb_terminal.parent_classes import BaseController
from openbb_terminal.rich_config import MenuText, console

logger = logging.getLogger(__name__)
empty_df = pd.DataFrame()
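
Review note: because the `except ModuleNotFoundError` branch re-raises, `WHISPER_AVAILABLE` can never be `False` once this module has loaded, so the flag passed to `mt.add_cmd("whisper", WHISPER_AVAILABLE)` is always `True`. If the intent is to disable the menu entry when the optional packages are missing, one alternative is to probe for them without importing. This is only a sketch under that assumption; `module_available` is a hypothetical helper and not part of the PR.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True when a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


# The flag is False instead of the import raising when either package is missing.
WHISPER_AVAILABLE = module_available("whisper") and module_available("transformers")
```

With a flag computed this way, the heavy imports would still need to be guarded (for example, performed inside `call_whisper`), since probing alone does not load the packages.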
Expand Down Expand Up @@ -133,6 +166,7 @@ class ForecastController(BaseController):
"which",
"nhits",
"anom",
"whisper",
]
pandas_plot_choices = [
"line",
Expand Down Expand Up @@ -228,6 +262,22 @@ def get_dataset_columns(self):
for column in dataframe.columns
}

    def parse_input(self, an_input: str) -> List:
        """Parse controller input

        Overrides the parent class function to handle YouTube video URL conventions.
        See `BaseController.parse_input()` for details.
        """
        # Filtering out YouTube video parameters like "v=" and removing the domain name
        youtube_filter = r"(youtube\.com/watch\?v=)"

        custom_filters = [youtube_filter]

        commands = parse_and_split_input(
            an_input=an_input.replace("https://", ""), custom_filters=custom_filters
        )
        return commands

def update_runtime_choices(self):
# Load in any newly exported files
self.DATA_FILES = forecast_model.get_default_files()
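
Review note: the two preprocessing steps in `parse_input` can be exercised in isolation. The sketch below reproduces only the scheme stripping and the regular expression from the diff; it does not reimplement `parse_and_split_input`, whose handling of `custom_filters` is not shown here. `strip_scheme` and `has_youtube_watch_url` are hypothetical helper names.

```python
import re

# Same pattern as `youtube_filter` in the diff.
YOUTUBE_FILTER = r"(youtube\.com/watch\?v=)"


def strip_scheme(an_input: str) -> str:
    """Remove every "https://" occurrence, as parse_input does before splitting."""
    return an_input.replace("https://", "")


def has_youtube_watch_url(an_input: str) -> bool:
    """Return True when the input contains a youtube.com/watch?v= fragment."""
    return re.search(YOUTUBE_FILTER, an_input) is not None
```

Note that, as written, the pattern does not match short links such as `youtu.be/<id>`, and `http://` URLs keep their scheme, so those forms may still be affected by the terminal's command splitting.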
Expand Down Expand Up @@ -309,6 +359,9 @@ def print_help(self):
mt.add_raw("\n")
mt.add_info("_anomaly_")
mt.add_cmd("anom", self.files)
mt.add_raw("\n")
mt.add_info("_misc_")
mt.add_cmd("whisper", WHISPER_AVAILABLE)

console.print(text=mt.menu_text, menu="Forecast")

Expand Down Expand Up @@ -3219,3 +3272,99 @@ def call_anom(self, other_args: List[str]):
start_date=ns_parser.s_start_date,
end_date=ns_parser.s_end_date,
)

    @log_start_end(log=logger)
    def call_whisper(self, other_args: List[str]):
        """Utilize Whisper Model to transcribe a video. Currently only supports YouTube URLs"""
        parser = argparse.ArgumentParser(
            formatter_class=argparse.ArgumentDefaultsHelpFormatter,
            add_help=False,
            prog="whisper",
            description="""
            Utilize Whisper Model to transcribe a video. Currently only supports YouTube URLs:
            https://github.com/openai/whisper
            """,
        )
        parser.add_argument(
            "--video",
            dest="video",
            type=str,
            default="",
            help="video URLs to transcribe",
        )
        parser.add_argument(
            "--model_name",
            dest="model_name",
            choices=whisper.available_models(),
            default="base",
            help="name of the Whisper model to use",
        )
        parser.add_argument(
            "--subtitles_format",
            dest="subtitles_format",
            type=str,
            choices=["vtt", "srt"],
            help="the subtitle format to output",
        )
        parser.add_argument(
            "--verbose",
            dest="verbose",
            type=str2bool,
            default=False,
            help="Whether to print out the progress and debug messages",
        )
        parser.add_argument(
            "--task",
            dest="task",
            type=str,
            choices=["transcribe", "translate"],
            help="whether to perform X->X speech recognition ('transcribe') or X->English translation ('translate')",
        )
        parser.add_argument(
            "--language",
            dest="language",
            type=str,
            default=None,
            choices=sorted(LANGUAGES.keys())
            + sorted([k.title() for k in TO_LANGUAGE_CODE.keys()]),
            help="language spoken in the audio, skip to perform language detection",
        )
        parser.add_argument(
            "--breaklines",
            dest="breaklines",
            type=int,
            default=0,
            help="Whether to break lines into a bottom-heavy pyramid shape if line length exceeds N characters. 0 disables line breaking.",
        )
        parser.add_argument(
            "--save",
            dest="save",
            type=str,
            default=USER_FORECAST_WHISPER_DIRECTORY,
            help="Directory to save the subtitles file",
        )

        parser = self.add_standard_args(
            parser,
        )
        if other_args and "--video" not in other_args:
            other_args.insert(0, "--video")
        ns_parser = self.parse_known_args_and_warn(
            parser,
            other_args,
        )

        if ns_parser:
            if ns_parser.save is None:
                ns_parser.save = USER_FORECAST_WHISPER_DIRECTORY

            whisper_model.transcribe_and_summarize(
                video=ns_parser.video,
                model_name=ns_parser.model_name,
                subtitles_format=ns_parser.subtitles_format,
                verbose=ns_parser.verbose,
                task=ns_parser.task,
                language=ns_parser.language,
                breaklines=ns_parser.breaklines,
                output_dir=ns_parser.save,
            )
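
Review note: the `--video` auto-insert lets users type `whisper <url>` without the flag. The sketch below isolates that behaviour with a trimmed-down parser (only two of the arguments from the diff, and plain `argparse` rather than `parse_known_args_and_warn`). `build_parser` and `with_video_flag` are hypothetical names. It also illustrates an edge case worth checking: when the first token is another option, the inserted `--video` has no value to consume.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Trimmed stand-in for the whisper parser: only --video and --model_name."""
    parser = argparse.ArgumentParser(prog="whisper", add_help=False)
    parser.add_argument("--video", dest="video", type=str, default="")
    parser.add_argument("--model_name", dest="model_name", default="base")
    return parser


def with_video_flag(other_args: List[str]) -> List[str]:
    """Prepend --video when arguments are given without it, as call_whisper does."""
    args = list(other_args)  # copy so the caller's list is not mutated
    if args and "--video" not in args:
        args.insert(0, "--video")
    return args


if __name__ == "__main__":
    ns = build_parser().parse_args(with_video_flag(["youtube.com/watch?v=abc123"]))
    print(ns.video, ns.model_name)
```

Under plain `argparse`, `with_video_flag(["--model_name", "small"])` yields `["--video", "--model_name", "small"]`, which fails with "expected one argument" for `--video`; the terminal's own wrapper may report this differently.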