📈 퍼스트펭귄

Extracting KOSPI-related keywords with a keyword extraction task and predicting the KOSPI index

0. Archive

1. Team

Members

고우진_T4006	김상윤_T4036	현승엽_T4231

Contribution

Member	Contribution
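고우진 (PM)	Literature review; embedding model implementation and training; price prediction model implementation and training
김상윤	Data construction and processing; embedding model implementation and training; demo development; batch serving setup
현승엽	Literature review; data EDA; search volume data collection; embedding model implementation and training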

2. Process

3. Data

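Namuwiki text: taken from the dump file uploaded to Hugging Face

Seed keywords: keywords closely related to KOSPI, chosen from the economic keywords provided by Statistics Korea (통계청), from papers, and from web searches

Naver search volume: collected with the Naver Developers DataLab API

KOSPI index: the KOSPI series provided by Yahoo Finance (ticker: ^KS11), collected with the yfinance library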
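The two collection steps above can be sketched as follows. This is a minimal illustration, not the project's actual collection code: the function names are hypothetical, the request fields follow the public DataLab search trend API as documented by Naver (one keyword group per keyword, at most five groups per request), and `yfinance` is imported lazily so that the request-building part runs with the standard library alone.

```python
import json
import urllib.request

# Search trend endpoint of the Naver DataLab API.
DATALAB_URL = "https://openapi.naver.com/v1/datalab/search"


def build_datalab_payload(keywords, start_date, end_date, time_unit="date"):
    """Build the JSON body for one search trend request.

    Each keyword becomes its own group so that the response holds one
    series per keyword. Dates are "YYYY-MM-DD" strings.
    """
    if not 1 <= len(keywords) <= 5:
        raise ValueError("this sketch sends between 1 and 5 keyword groups per request")
    return {
        "startDate": start_date,
        "endDate": end_date,
        "timeUnit": time_unit,  # "date", "week", or "month"
        "keywordGroups": [{"groupName": kw, "keywords": [kw]} for kw in keywords],
    }


def build_datalab_request(payload, client_id, client_secret):
    """Wrap the payload in a POST request carrying the API credentials."""
    return urllib.request.Request(
        DATALAB_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-Naver-Client-Id": client_id,
            "X-Naver-Client-Secret": client_secret,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_kospi_close(start, end):
    """Download KOSPI closing values for [start, end) from Yahoo Finance."""
    import yfinance as yf  # third-party; imported here so the rest runs without it

    frame = yf.download("^KS11", start=start, end=end)
    return frame["Close"]
```

Sending the request with `urllib.request.urlopen(request)` requires valid credentials issued by Naver Developers. Seed lists longer than five keywords would need to be split across several requests.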

4. Model

For Text Embedding

KLUE RoBERTa large (Link)

A RoBERTa language model pre-trained on Korean data (KLUE)

KPF-BERT (Link)

A model trained on about 40 million news articles, covering roughly 20 years, compiled by the Korea Press Foundation (한국언론진흥재단)

KB-ALBERT (Link)

Google's ALBERT further trained on a large amount of Korean data specialized for the economics/finance domain
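As an illustration of how text embeddings can drive keyword extraction, the sketch below ranks candidate words by their mean cosine similarity to a set of seed keywords. This scoring rule is an assumption made for the example; the README does not state how the project scores candidates. The vectors are tiny made-up stand-ins for the embeddings that one of the models above would produce, and `cosine` and `rank_candidates` are hypothetical helper names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(seed_vectors, candidate_vectors, top_k=3):
    """Return the top_k (word, score) pairs, scored by mean similarity to the seeds."""
    scores = {}
    for word, vec in candidate_vectors.items():
        sims = [cosine(vec, seed) for seed in seed_vectors.values()]
        scores[word] = sum(sims) / len(sims)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]


# Toy 2-dimensional vectors, invented for illustration only.
seeds = {"interest rate": [1.0, 0.0]}
candidates = {"base rate": [0.9, 0.1], "football": [0.0, 1.0]}
ranked = rank_candidates(seeds, candidates, top_k=2)
```

In this toy example "base rate" ranks above "football" because its vector points in nearly the same direction as the seed. Real embeddings have hundreds of dimensions, but the ranking logic stays the same.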

For Predicting KOSPI index

LSTM
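An LSTM consumes fixed-length sequences, so a price series is usually cut into sliding windows before training. The sketch below shows one common way to prepare such inputs. The window length, the min-max scaling, and the function names are assumptions for illustration, and the README does not describe the project's actual preprocessing or features.

```python
def min_max_scale(values):
    """Scale values linearly into [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def make_windows(series, window):
    """Split a series into (inputs, targets) pairs for next-step prediction.

    Each input holds `window` consecutive values and its target is the
    value that immediately follows.
    """
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets


# Made-up closing values, for illustration only.
closes = [2400.0, 2410.0, 2390.0, 2420.0, 2430.0]
inputs, targets = make_windows(closes, window=3)
# inputs  -> [[2400.0, 2410.0, 2390.0], [2410.0, 2390.0, 2420.0]]
# targets -> [2420.0, 2430.0]
```

When scaling is used, the minimum and maximum should come from the training portion only, so that no information from the test period leaks into training.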

5. Demo

Service architecture

🖥️ Web example (Streamlit)

6. How to Use

File Directory

├── codes
│   ├── corr_given_time.py
│   ├── get_anual.py
│   └── inference_price.py
├── dags
│   └── operator_dag.py
├── data
│   ├── 2016keyword.csv
│   ├── 2017keyword.csv
│   ├── 2018keyword.csv
│   ├── 2019keyword.csv
│   ├── 2020keyword.csv
│   ├── 2021keyword.csv
│   ├── 2022keyword.csv
│   ├── ensemble_tomorrow_price.txt
│   ├── final_candi_list.csv
│   ├── final_candi_search_volume.json
│   └── predict_past.csv
├── pages
│   ├── get_keywords.py
│   └── price_inference.py
├── .gitignore
├── README.md
├── main.py
└── requirements.txt

Virtual environment

# Create a virtual environment
python3 -m venv $ENV_NAME
# Activate the virtual environment
source $ENV_NAME/bin/activate
# Install the libraries
pip3 install --upgrade pip
pip3 install -r requirements.txt
# Exit the virtual environment
deactivate

Streamlit

streamlit run main.py

Airflow

# Set the base directory (AIRFLOW_HOME) using an absolute path
export AIRFLOW_HOME=~/nlp02
# Initialize the Airflow DB -> creates the default files
airflow db init
airflow users create --username admin --password 1234 --firstname boocam --lastname kim --role Admin --email [email protected]
airflow webserver --port 8080

# Run the scheduler (in a separate terminal)
export AIRFLOW_HOME=~/nlp02
airflow scheduler
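For orientation, a batch-serving DAG picked up by the scheduler could look roughly like the sketch below. It is not the content of `dags/operator_dag.py`: the DAG id, task id, schedule, retry settings, and `PROJECT_DIR` are all hypothetical, and it assumes that `codes/inference_price.py` can be run as a standalone script. The Airflow imports sit inside `build_dag` so the module can be read without Airflow installed; a real DAG file would assign `dag = build_dag()` at module level so the scheduler can discover it.

```python
from datetime import datetime, timedelta

DAG_ID = "kospi_batch_inference"   # hypothetical DAG name
PROJECT_DIR = "/path/to/project"   # placeholder; replace with the repository path
DEFAULT_ARGS = {"retries": 1, "retry_delay": timedelta(minutes=5)}

# BashOperator runs in a temporary directory by default, so change into the project first.
INFERENCE_COMMAND = f"cd {PROJECT_DIR} && python codes/inference_price.py"


def build_dag():
    """Create a daily DAG with a single task that runs the price inference script."""
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    dag = DAG(
        dag_id=DAG_ID,
        default_args=DEFAULT_ARGS,
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",  # Airflow 2.x argument; newer releases use `schedule`
        catchup=False,               # do not backfill runs for past dates
    )
    BashOperator(
        task_id="predict_next_close",
        bash_command=INFERENCE_COMMAND,
        dag=dag,
    )
    return dag
```

Setting `catchup=False` keeps the scheduler from running the task once for every day since `start_date`, which is usually what a daily prediction job wants.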

Repository: boostcampaitech4lv23nlp1/final-project-level3-nlp-02
