
This project was re-uploaded for employers to view and will be deleted soon. Original repo is private.


TeddTech/NLP_Firm_Prediction


Sauder School of Business Capstone Project 2018

Ted Thompson    

Python

Overview

The UBC Sauder School of Business group has a large database of U.S. Securities and Exchange Commission (SEC) filings, Wharton financial fundamental data, and stock price history. In this project, we build a framework for leveraging SEC filings to obtain industry intelligence. Specifically, we address two prediction problems: classifying firm survival and predicting firm performance.

Data

This is a data-intensive project; all files required to run it are in the /data/ folder.

Many of these files can and should be changed depending on the target of your analysis. Please see the data README for more information.

Scripts

This project can be executed in two main ways: with Make, or by running the Python scripts individually. Each approach has its own documentation.

A brief overview of the data pipeline:

[Workflow diagram]

Tests

An overview of all unit tests, covering every function in the project, is available in the tests documentation.

Techniques

Natural Language Processing:

Topic Analysis: We use Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF) to model the topics found in item1 and item7 of the SEC filings.

Word distribution testing with 10 topics: the figure below shows the word distribution of CV-LDA on 1,000 filings for the selected topic, topic 2 (medical).

[Figure: CV-LDA word distribution, topic 2 (medical)]

Sentiment Analysis: We extract polarity, subjectivity, and certainty scores from item1 and item7 of the SEC filings.

An example can be seen below:

[Figure: sentiment analysis example]
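The three scores can be illustrated with a minimal lexicon-based sketch. This is not the project's implementation: the word lists are tiny hypothetical placeholders, and the scoring formulas (net positive share for polarity, share of opinion words for subjectivity, net assertive share for certainty) are assumptions chosen for clarity.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-lexicons for illustration only; a real analysis would use
# curated, ideally finance-specific, word lists.
POSITIVE = {"growth", "increase", "improved", "strong", "profitable"}
NEGATIVE = {"decline", "loss", "impairment", "weak", "litigation"}
UNCERTAIN = {"may", "might", "could", "approximately", "uncertain", "possibly"}
CERTAIN = {"will", "definitely", "clearly", "always", "certainly"}


@dataclass
class SentimentScores:
    polarity: float      # -1 (all negative hits) .. +1 (all positive hits)
    subjectivity: float  # share of tokens found in either polarity lexicon
    certainty: float     # -1 (all hedging hits) .. +1 (all assertive hits)


def score(text: str) -> SentimentScores:
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    unc = sum(t in UNCERTAIN for t in tokens)
    cer = sum(t in CERTAIN for t in tokens)
    polarity = (pos - neg) / (pos + neg) if pos + neg else 0.0
    subjectivity = (pos + neg) / len(tokens) if tokens else 0.0
    certainty = (cer - unc) / (cer + unc) if cer + unc else 0.0
    return SentimentScores(polarity, subjectivity, certainty)


print(score("Revenue growth was strong, but litigation may cause a loss."))
# SentimentScores(polarity=0.0, subjectivity=0.4, certainty=-1.0)
```

Libraries such as TextBlob expose polarity and subjectivity scores directly; a certainty score generally needs a dedicated word list like the one sketched here.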

Dependencies

Contributing

To contribute to this project, please see our Contributing Guidelines.

Code of Conduct

In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.

For more information, please see our Code of Conduct.
