Sauder School of Business Capstone Project 2018

Ted Thompson

Overview

The UBC Sauder School of Business group has a large database combining U.S. Securities and Exchange Commission (SEC) filings, Wharton financial fundamentals data, and stock price history. In this project, we build a framework for leveraging SEC filings to obtain industry intelligence. Specifically, we address two prediction problems: classifying firm survival and predicting firm performance.

Data

This is a data-intensive project. All files required to run it are in the /data/ folder.

Many of these files can and should be changed depending on the target of your analysis. Please see the data README here for more information.

Scripts

This project can be executed in two main ways: by using Make (documented here) or by running the Python scripts individually (documented here).

A brief overview of the data pipeline can be seen below:

Tests

An overview of the unit tests for every function in the project can be found here.

Techniques

Natural Language Processing:

Topic Analysis: We use Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF) to model the topics found in item1 and item7 of the SEC filings.

Word distribution testing with 10 topics: Below is the word distribution of CV-LDA with 1000 filings. The selected topic is topic 2 - medical.

Sentiment Analysis: We extract polarity, subjectivity, and certainty scores from item1 and item7 of the SEC filings.

An example is shown below:
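As a self-contained illustration of what such scores look like (not the project's actual implementation, which may rely on a sentiment library), the sketch below computes the three scores from a tiny hand-made lexicon. The lexicon entries, the cue-word sets, and the function name score_text are all hypothetical and chosen only for the example.

```python
import re

# Hypothetical mini-lexicon: word -> (polarity in [-1, 1], subjectivity in [0, 1]).
LEXICON = {
    "growth": (0.5, 0.3),
    "strong": (0.6, 0.7),
    "improved": (0.5, 0.5),
    "loss": (-0.5, 0.3),
    "weak": (-0.6, 0.7),
    "decline": (-0.5, 0.4),
}
# Hypothetical cue words for a crude certainty score.
HEDGES = {"may", "might", "could", "possibly", "uncertain"}
ASSERTIVES = {"will", "must", "certainly", "definitely"}


def score_text(text):
    """Return polarity, subjectivity and certainty scores for a passage."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = [LEXICON[t] for t in tokens if t in LEXICON]

    # Average the lexicon values over matched words; 0.0 if nothing matches.
    polarity = sum(p for p, _ in hits) / len(hits) if hits else 0.0
    subjectivity = sum(s for _, s in hits) / len(hits) if hits else 0.0

    # Certainty in [-1, 1]: +1 if only assertive cues, -1 if only hedges.
    hedges = sum(t in HEDGES for t in tokens)
    assertives = sum(t in ASSERTIVES for t in tokens)
    cues = hedges + assertives
    certainty = (assertives - hedges) / cues if cues else 0.0

    return {
        "polarity": polarity,
        "subjectivity": subjectivity,
        "certainty": certainty,
    }


print(score_text("Revenue growth was strong and we will expand."))
print(score_text("Sales may decline and results could be weak."))
```

The first sentence scores as positive and certain, the second as negative and hedged. A production lexicon would be far larger and would handle negation and intensifiers, which this sketch ignores.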

Dependencies

Contributing

To contribute to this project, please see our Contributing Guidelines.

Code of Conduct

In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.

For more information, please see our Code of Conduct.

TeddTech/NLP_Firm_Prediction
