I’m an experienced data scientist with almost a decade of experience successfully applying scientific methods across different disciplines in both industry and academia. I hold three degrees: two in Geophysics (Ph.D. and BSc.) and one in Risk and Uncertainty Management (MRes.), awarded by the University of Liverpool (Liverpool, UK). Python is my language of choice, given its wide support and integration within the machine learning community, but I also frequently use shell (Bash/Zsh/PowerShell), SQL, HTML/CSS, and Rust, and have some experience with JavaScript, Ruby, C++, C#, Perl, Fortran, and MATLAB.
My passion for data science stems from my curiosity about the natural phenomena that shape our planet and the data that can help us comprehend them better. That’s why I obtained a Ph.D. in geophysics, where I investigated seismic waves and their interactions with the Earth’s structure. I utilized Python and MATLAB to process and examine large datasets of seismic recordings (digital time series) from around the world. As a researcher, I authored three peer-reviewed publications in reputable journals.
After completing my degree, I joined a leading seismological observatory, the University of Utah Seismograph Stations, as a researcher, where I continued to work on seismic data analysis and earthquake magnitude modeling. I collaborated with scientists from the USGS, AFRL, LLNL, and other institutions to enhance our understanding of earthquake hazards and risks.
I then decided to switch gears and explore other domains where data science can make an impact. I joined Bayer Crop Science as a data scientist, where I worked on models that forecast corn seed yield from historical growth data and other agronomic factors. I used Python, SQL, and Domino to manipulate and explore large datasets of crop measurements, weather data, soil data, and satellite imagery. I built the predictive models using scikit-learn.
My most recent position was as a Data Scientist at Coyote Logistics, a third-party logistics (3PL) service provider, where I worked on sophisticated predictive models that estimate potential distributions of market cost for a given load of freight. I used Python, SQL (and NoSQL), and Databricks to ingest and transform data from various sources such as carriers, shippers, brokers, and third-party APIs. I also used LightGBM, scikit-learn, and PyTorch to build and fit models that capture the uncertainty and variability of the freight market.
Here are some of the projects I've worked on or I'm currently working on:
- Freight Market Cost Distribution: I was the team lead of the spot pricing team, which developed a cutting-edge machine learning system that used a custom Boosted Gradient Random Forest algorithm and LightGBM / H2O.ai Gradient Boosting models, integrated through a live API model service. The model API produced a predicted distribution showing the most probable range of costs to transport different types of freight equipment and cargo to and from distribution centers across the USA, Canada, and Mexico. This model helped us generate millions of dollars of extra revenue, as it enabled faster and more intelligent buying and brokering negotiations at scale.
- Corn Yield Prediction: Created new random forest models, and improved existing ones, to predict corn seed yield across the world, leveraging multi-year historical yields and agronomic features (e.g., weather conditions, soil conditions, and field clusters defined using remote sensing data).
- Earthquake Magnitude Estimation: Created physics-informed empirical models of amplitude decay over distance via non-parametric inversions, pre-conditioned using events with known earthquake magnitudes. This method was applied to both local (Richter) and moment magnitude estimation, with the latter focusing on small magnitudes, which are difficult to obtain via conventional physical modeling. This work resulted in two peer-reviewed publications, in the Bulletin of the Seismological Society of America and Seismological Research Letters.
- Portfolio Website: https://sgjholt.github.io - I built a portfolio/blog-style website using the Jekyll framework (based on Ruby), which is hosted on GitHub Pages. The website uses GitHub Actions to automatically redeploy the code when a change is pushed to the default branch. Here's a link to the repo: https://github.com/sgjholt/sgjholt.github.io
Here are some of the skills I've learned or improved during my data science journey:
- Programming Languages: Python (advanced), SQL (intermediate), MATLAB (intermediate)
- Data Analysis Tools: Pandas (advanced), NumPy (advanced), SciPy (advanced), PySpark (intermediate), Polars (intermediate)
- Data Visualization Tools: Matplotlib (advanced), Seaborn (advanced), Plotly (advanced), Streamlit (intermediate)
- Machine Learning Tools: scikit-learn (advanced), statsmodels (intermediate), PyTorch (basic)
- Cloud Computing Platforms: Azure (advanced), AWS (beginner)
- Data Engineering Tools: Databricks, Airflow (beginner), Docker (beginner), Kubernetes (beginner)
- Version Control Systems: Git, GitHub, Azure DevOps