JPMC6: Company Relationship Analysis Tool

Project Overview

The Company Relationship Analysis Tool is an AI-driven project that uses historical stock price movements to uncover relationships between US companies. We experiment with clustering companies by industry sector and by stock price correlation, then train stock price movement prediction models on those clusters, with the aim of improving predictive accuracy and surfacing inter-company relationships.

Objectives and Goals

Identify Intercompany Relationships: Highlight partnerships, shared dependencies, or financial risks.
Enhance Stock Price Prediction: Use advanced clustering and regression techniques to refine predictions.
Visualize Relationships: Employ heatmaps and hierarchical clustering dendrograms to represent correlations and relationships.
Business Impact:
- Investment Insights: Detect partnerships and acquisition opportunities.
- Risk Mitigation: Uncover vulnerabilities in shared dependencies.
- Market Expansion: Facilitate partnerships or mergers.
- Customer Insights: Identify shared customer bases for targeted marketing.
- Crisis Management: Anticipate cascading effects during disruptions.

Methodology

Data Collection and Preparation

Data Source: Historical stock price data from Yahoo Finance for all S&P 500 companies.
Time Period: 4 years of data (September 2020 - September 2024).
Features:
- Daily percent returns calculated to capture relative price movements.
Preprocessing:
- Aggregated data into prices, volume, and daily returns.
- Removed rows with NaNs to ensure data integrity.
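As a rough illustration of the return calculation and NaN removal described above, the following sketch uses pandas on a toy price table. The function name `daily_percent_returns`, the tickers `AAA` and `BBB`, and the prices are hypothetical and are not taken from preprocess.ipynb.

```python
import pandas as pd


def daily_percent_returns(prices: pd.DataFrame) -> pd.DataFrame:
    """Turn date-indexed closing prices (one column per ticker) into
    daily percent returns and drop every row that contains a NaN."""
    # fill_method=None: do not forward-fill gaps before differencing.
    returns = prices.pct_change(fill_method=None) * 100.0
    # The first row has no previous day, so it is NaN and gets dropped here.
    return returns.dropna(how="any")


# Toy prices for two made-up tickers (not real market data).
prices = pd.DataFrame(
    {"AAA": [100.0, 102.0, 101.0], "BBB": [50.0, 50.0, 55.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)
returns = daily_percent_returns(prices)
print(returns)
```

Working with percent returns rather than raw prices puts companies with very different share prices on a comparable scale before any correlation is computed.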

Analysis Techniques

Heatmaps:
- Created absolute correlation heatmaps to visualize similarities between companies.
- Generated dissimilarity heatmaps to explore distances between company clusters.
Hierarchical Clustering:
- Used dissimilarity matrices and various linkage methods (e.g., single, complete, ward) to form clusters.
- Visualized clusters with dendrograms to understand company relationships.
Silhouette Analysis:
- Evaluated cluster quality by measuring cohesion and separation.
- Experimented with different numbers of clusters and linkage methods for optimal results.
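The analysis steps above can be sketched end to end with SciPy and scikit-learn. This is a minimal sketch under assumptions, not the notebooks' actual code: the dissimilarity is assumed to be 1 minus the absolute Pearson correlation, the helper name `cluster_by_correlation` is hypothetical, and the data is synthetic. The sketch defaults to complete linkage because ward linkage formally assumes Euclidean distances.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.metrics import silhouette_score


def cluster_by_correlation(returns, n_clusters, method="complete"):
    """Cluster the columns of a returns table; return (labels, silhouette)."""
    # Assumed dissimilarity: 1 - |Pearson correlation| between return series.
    dissim = 1.0 - returns.corr().abs().to_numpy()
    # Clean up floating-point noise: symmetric, non-negative, zero diagonal.
    dissim = np.clip((dissim + dissim.T) / 2.0, 0.0, None)
    np.fill_diagonal(dissim, 0.0)
    # SciPy's linkage expects the condensed (upper-triangle) distance vector.
    tree = linkage(squareform(dissim, checks=False), method=method)
    # Cut the dendrogram so that at most n_clusters flat clusters remain.
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    # Silhouette on the precomputed dissimilarities (cohesion vs. separation).
    score = silhouette_score(dissim, labels, metric="precomputed")
    return pd.Series(labels, index=returns.columns), score


# Synthetic returns: two groups of three series, each driven by its own factor.
rng = np.random.default_rng(0)
factor_a, factor_b = rng.normal(size=(2, 250))
toy = pd.DataFrame(
    {
        "A1": factor_a + 0.3 * rng.normal(size=250),
        "A2": factor_a + 0.3 * rng.normal(size=250),
        "A3": factor_a + 0.3 * rng.normal(size=250),
        "B1": factor_b + 0.3 * rng.normal(size=250),
        "B2": factor_b + 0.3 * rng.normal(size=250),
        "B3": factor_b + 0.3 * rng.normal(size=250),
    }
)
labels, score = cluster_by_correlation(toy, n_clusters=2)
print(labels.to_dict(), round(score, 3))
```

Repeating the call over a range of `n_clusters` values and linkage methods, and keeping the combination with the highest silhouette score, is one way to run the kind of search described above.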

Modeling

Baseline Model: Predicts the mean of the training set.
XGBoost Models:
- 3 strategies:
  - Trained on all companies and all stock data.
  - Trained a model per cluster (companies clustered by their industry).
  - Trained a model per cluster (companies clustered by hierarchical clustering).
- Used Monday-Thursday data as features to predict Friday returns.
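To make the Monday-Thursday feature layout and the mean baseline concrete, here is a small sketch for a single ticker. The helper `weekly_samples`, the toy return values, and the chronological train/test split are assumptions for illustration; the notebooks may build their samples differently.

```python
import numpy as np
import pandas as pd


def weekly_samples(returns: pd.Series) -> pd.DataFrame:
    """Reshape a date-indexed daily return series into one row per week,
    keeping only weeks that have all five trading days."""
    frame = returns.to_frame("ret")
    frame["weekday"] = frame.index.dayofweek        # Monday=0 ... Friday=4
    frame["week"] = frame.index.to_period("W-FRI")  # weeks that end on Friday
    wide = frame.pivot(index="week", columns="weekday", values="ret")
    wide = wide.reindex(columns=range(5)).dropna()  # drop incomplete weeks
    wide.columns = ["mon", "tue", "wed", "thu", "fri"]
    return wide


# Toy data: 20 consecutive business days starting on a Monday.
dates = pd.bdate_range("2024-01-01", periods=20)
toy_returns = pd.Series(np.arange(20, dtype=float), index=dates)

samples = weekly_samples(toy_returns)
features = samples[["mon", "tue", "wed", "thu"]]    # model inputs
target = samples["fri"]                             # value to predict

# Assumed chronological split: the first three weeks train, the last one tests.
y_train, y_test = target.iloc[:3], target.iloc[3:]

# Baseline: always predict the mean Friday return seen in training.
baseline_pred = y_train.mean()
baseline_rmse = float(np.sqrt(np.mean((y_test.to_numpy() - baseline_pred) ** 2)))
print(samples.shape, baseline_pred, baseline_rmse)
```

A regressor such as `xgboost.XGBRegressor` would be fit on the training rows of `features` and `y_train` in place of the constant prediction, once per strategy: on all companies together, or separately within each cluster.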

Results and Key Findings

Heatmap Insights:
- Companies within the same sector (e.g., technology, healthcare) show higher correlations.
- Cross-sector relationships highlight unique interdependencies.
Clustering Performance:
- Hierarchical Clustering:
  - Dendrograms provided a detailed view of intercompany relationships.
  - Optimal clusters identified using silhouette analysis.
- Industry-Based Clustering:
  - Produced intuitive results but lacked predictive improvement.
Model Evaluation:
- Baseline Model:
  - RMSE: 4.605
- No Clustering:
  - RMSE: 1.797, R²: 0.85
- Sector-Based Clustering:
  - Average RMSE: 1.786, R²: 0.833
- Hierarchical Clustering:
  - RMSE: 1.779, R²: 0.453

Key Insight: While clustering provided valuable insights into company relationships, it did not significantly improve predictive accuracy. The simplest model (no clustering) performed best overall.

Potential Next Steps

Model Improvements:
- Fine-tune and train the no-clustering model further.
- Experiment with LightGBM and CatBoost for faster training and improved accuracy.
Enhanced Data Integration:
- Incorporate news and social media sentiment analysis.
- Implement dynamic retraining with updated data.
User Interaction:
- Develop interactive interfaces for analysts, including "what-if" scenario tools and company relationship visualization.

Installation

Prerequisites

Python 3.8+
Jupyter Notebook
Required libraries:
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
- xgboost
- yfinance

Step-by-Step Instructions

Clone the repository:

```
git clone https://github.com/your-repo-url.git
cd your-repo-folder
```

Set up a virtual environment (optional but recommended):

```
python3 -m venv env
source env/bin/activate  # On Windows, use `env\Scripts\activate`
```

Install dependencies:
```
pip install -r requirements.txt
```
Launch Jupyter Notebook:
```
jupyter notebook
```

Usage

Preprocessing Data

Open preprocess.ipynb and follow the steps to preprocess stock price data.
Download the historical stock price data from Yahoo Finance (or another API) as part of this step.

Exploratory Analysis

Use matrices_and_heatmaps.ipynb for exploratory data analysis.
Visualize correlation matrices and hierarchical clusters.

Model Training

Train models using:
- naive_xgboost.ipynb for a single model on all data.
- task3a-linkage-silhouette.ipynb to experiment with clustering methods and silhouette analysis.
Evaluate model performance using RMSE and R² metrics.
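For reference, both metrics can be computed in a few lines of NumPy. This is a generic sketch of the standard definitions with made-up numbers, and the function names `rmse` and `r_squared` are illustrative; the notebooks may rely on scikit-learn's metric functions instead.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: the typical size of a prediction error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    return float(1.0 - ss_res / ss_tot)


# Made-up values, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 3.0, 3.5]
model_rmse = rmse(y_true, y_pred)
model_r2 = r_squared(y_true, y_pred)

# Predicting the mean of the evaluated targets yields an R² of exactly 0.
mean_r2 = r_squared(y_true, [2.5, 2.5, 2.5, 2.5])
print(model_rmse, model_r2, mean_r2)
```

RMSE is expressed in the same units as the target, while R² compares the model against a constant prediction at the mean of the evaluated targets, so the two metrics can rank models differently.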

Results Interpretation

Visualize and analyze results from clustering and XGBoost models in task2_matrices_and_heatmaps_with_task3a.ipynb.

License

Apache License 2.0

Credits and Acknowledgments

Team Members

Sara Deshmukh (Rutgers University - New Brunswick)
Victoria Kim (Virginia Tech)
Alaina Lin (Brown University)
Chelsey Parker (Georgia State University)
Raj Rana (Stevens Institute of Technology)

Advisors

Kassie Papasotiriou
Annita Vapsi
Antony Papadimitriou

Teaching Assistants

Samy Lokanandi
Jesse Dylan Ward

Tools and Libraries

Yahoo Finance API: For retrieving historical stock price data.
Python Libraries:
- pandas: Data manipulation and analysis.
- numpy: Numerical computing.
- scikit-learn: Machine learning and data
