Welcome to the Storytelling repository! This repository is dedicated to the Storytelling discipline taken in the 1st semester of 2024 in the Data Science and Humanistic Artificial Intelligence undergraduate program at the Pontifical Catholic University of São Paulo (PUC-SP).
We extend our sincere gratitude to Professor ✨ Rooney Ribeiro Albuquerque Coelho for his invaluable guidance and expertise throughout this course. His dedication to excellence in teaching has been instrumental in shaping our understanding of storytelling and data science.
- About Storytelling
- Projects
- Resources
- Readings
- Python Libraries for Data Science and Artificial Intelligence
- How to Run the Code
- Contributing
- License
Storytelling is the art of telling stories, essential in Data Science and Artificial Intelligence for effectively communicating findings and insights.
Through Storytelling, students learn to transform raw data into stories that inform, persuade, and inspire. This is crucial because while data analysis can reveal valuable insights, these insights are useless if they cannot be effectively communicated.
The Storytelling discipline in the Data Science and Artificial Intelligence undergraduate program at PUC-SP aims to equip students with the necessary skills to tell effective stories with data.
-
Key Storytelling concepts include:
-
Narrative: The structure and flow of the story you are telling with your data.
-
Data Visualization: The graphical representation of data to highlight trends and patterns.
-
Context: The background information that helps frame and interpret the data.
-
Simplicity: The ability to convey complex information in a simple and easy-to-understand manner.
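As a minimal sketch of how these four concepts can show up in a single chart, the example below uses Matplotlib. The monthly figures, the "April campaign" event, and the helper name `build_story_chart` are all hypothetical and exist only for illustration; they are not taken from the course projects.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

# Hypothetical monthly figures, invented purely for illustration.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
visitors = [120, 135, 128, 190, 210, 230]


def build_story_chart(months, visitors, event_index):
    """Draw one line chart that combines narrative, context, and a simple layout."""
    positions = list(range(len(months)))
    fig, ax = plt.subplots(figsize=(6, 3.5))

    # Data visualization: a single line makes the trend easy to see.
    ax.plot(positions, visitors, marker="o", color="tab:blue")
    ax.set_xticks(positions)
    ax.set_xticklabels(months)

    # Narrative: the title states the takeaway instead of naming the variable.
    ax.set_title("Visitors grew after the April campaign")

    # Context: an annotation explains why the series changes.
    ax.annotate(
        "Campaign launch",
        xy=(event_index, visitors[event_index]),
        xytext=(event_index - 2, visitors[event_index] + 25),
        arrowprops={"arrowstyle": "->"},
    )

    # Simplicity: remove frame lines that carry no information.
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.set_ylabel("Visitors")
    fig.tight_layout()
    return fig, ax


fig, ax = build_story_chart(months, visitors, event_index=3)
```

Calling `fig.savefig("story_chart.png")` afterwards would write the chart to disk; the file name is only an example.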
-
Here you will find the projects developed during the course. Each project is an opportunity to apply the concepts learned in the classroom and to tell compelling stories with data.
✨ Highlighted Projects
This repository also contains links to the main libraries and platforms used during the course, which were essential for developing our projects and improving our skills in data science and storytelling.
If you're looking for more resources related to storytelling and data science, check out the following:
✨ Additional Resources
Additionally, we provide files of the books studied during the program. These readings complement our classroom learning in the art and science of storytelling.
-
Python Data Science Handbook, Essential Tools for Working with Data - Jake VanderPlas
-
Vector Quantized Predictive Models for Planning - Scales to DeepMind Lab
Fundamental libraries for data manipulation, mathematical computations, and general support:
-
NumPy: Python library used for working with arrays. It also has functions for working in the domain of linear algebra, Fourier transform, and matrices.
-
Pandas: Library for data manipulation and analysis, providing data structures like DataFrames for handling tabular data.
-
Matplotlib: Comprehensive library for creating static, animated, and interactive visualizations in Python.
-
Seaborn: High-level interface for creating informative and attractive statistical graphics.
-
SciPy: Library for scientific and technical computing, providing functions for optimization, integration, interpolation, eigenvalue problems, and other mathematical tasks.
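To give a feel for how the first two libraries in this list work together, here is a small sketch that summarizes a table with Pandas and then standardizes a column with NumPy. The survey data and column names are hypothetical, made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical survey data, invented for illustration only.
df = pd.DataFrame(
    {
        "city": ["Campinas", "Campinas", "Santos", "Santos", "Sorocaba"],
        "score": [7.0, 9.0, 6.0, 8.0, 5.0],
    }
)

# Pandas: group the tabular data and compute one summary value per city.
mean_by_city = df.groupby("city")["score"].mean()

# NumPy: vectorized arithmetic on the underlying array (z-scores here,
# using the population standard deviation that ndarray.std() returns by default).
scores = df["score"].to_numpy()
z_scores = (scores - scores.mean()) / scores.std()

print(mean_by_city)
print(np.round(z_scores, 2))
```

The grouped means are the kind of compact summary that usually feeds a chart, while the z-scores make values comparable across different scales.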
Libraries for statistical data analysis and modeling:
-
Statsmodels: Provides classes and functions for estimating statistical models, as well as conducting statistical tests.
-
Pingouin: Statistical tests, effect sizes, and Bayesian analysis in Python.
-
PyMC: Probabilistic programming framework for Bayesian statistical modeling and machine learning.
-
scipy.stats: SciPy submodule for statistical analysis and hypothesis testing, including probability distributions, statistical tests, and more.
-
Reliability: Python package for reliability analysis and statistical modeling.
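As an illustration of the kind of hypothesis test these libraries support, the sketch below runs Welch's t-test with `scipy.stats.ttest_ind`. The two groups are simulated with a fixed seed; the means, spread, and sample sizes are assumptions chosen only for the example.

```python
import numpy as np
from scipy import stats

# Simulated samples (hypothetical): two groups with different true means.
rng = np.random.default_rng(seed=42)
group_a = rng.normal(loc=50.0, scale=5.0, size=40)
group_b = rng.normal(loc=55.0, scale=5.0, size=40)

# Welch's t-test: compares the two means without assuming equal variances.
result = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```

A small p-value suggests the difference in means is unlikely under the null hypothesis of equal means; how to phrase that finding for an audience is where the storytelling comes in.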
Widely used tools for machine learning and neural networks:
-
Scikit-learn: A simple and efficient tool for data mining and data analysis, featuring a wide variety of machine learning algorithms.
-
TensorFlow: An open-source framework for machine learning and deep learning, often used for training neural networks.
-
Keras: A high-level neural networks API that runs on top of TensorFlow, designed for fast prototyping.
-
PyTorch: A deep learning framework for flexibility and performance in neural network modeling.
-
LightGBM: A gradient boosting framework that is particularly effective with large datasets.
-
XGBoost: A powerful, efficient implementation of gradient boosting algorithms.
-
CatBoost: A gradient boosting library with built-in handling of categorical features, used to create high-performance machine learning models.
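For a sense of the typical workflow with these tools, here is a minimal Scikit-learn sketch: split the data, fit a pipeline, and score it on held-out samples. The choice of the bundled Iris dataset and of logistic regression is an assumption made for brevity, not something prescribed by the course.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples to estimate generalization.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Pipeline: standardize the features, then fit a linear classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Mean accuracy on the held-out samples.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Wrapping the scaler and the classifier in one pipeline keeps the scaling statistics from leaking information out of the test split.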
Libraries for working with natural language:
-
SpaCy: Industrial-strength NLP library, known for its speed and accuracy.
-
Transformers: Hugging Face library of pretrained state-of-the-art natural language processing (NLP) models, including BERT, GPT-2, and more.
-
NLTK: A toolkit for working with human language data, providing easy access to text processing libraries.
-
Gensim: Topic modeling and document similarity analysis.
Libraries and tools for working with image data:
-
OpenCV: A library for computer vision and machine learning, offering tools for image processing and manipulation.
-
torchvision (PyTorch Vision): Datasets, pretrained models, and image transformations for computer vision, integrated with PyTorch.
-
TensorFlow Image (tf.image): Image processing functions for TensorFlow, including resizing, cropping, and filtering.
-
Keras Applications: Pre-trained deep learning models for computer vision tasks, such as image classification.
-
Albumentations: An image augmentation library that provides various transformations for image preprocessing.
-
SimpleCV: A framework for building computer vision applications using Python.
Libraries for creating graphs, dashboards, and interactive maps:
-
Matplotlib: A comprehensive library for creating static, animated, and interactive visualizations.
-
Seaborn: High-level interface for drawing attractive statistical graphics.
-
Plotly: A graphing library for creating interactive visualizations.
-
Bokeh: Visualization library for creating interactive plots and dashboards.
-
Altair: Declarative visualization library for statistical data visualization.
-
Dash: A framework for creating interactive, web-based data dashboards.
Libraries for geospatial data analysis and mapping:
-
Geopandas: Python library for working with geospatial data, including tools for geometric operations and map visualization.
-
Shapely: A library for manipulation and analysis of geometric shapes.
-
Folium: A library for creating interactive maps with Leaflet.js.
-
Kepler.gl: A powerful geospatial data visualization tool for large-scale data exploration.
-
Cartopy: A library for cartographic projections and geospatial data visualization.
-
Pyproj: A library for performing coordinate transformations and projections.
-
Rasterio: Library for reading and writing geospatial raster data.
-
OSMnx: Tools for downloading and analyzing street networks from OpenStreetMap.
-
Geopy: Geocoding library for performing forward and reverse geocoding.
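The geometric operations these libraries build on can be sketched with Shapely alone. The square "campus" polygon and the points below are hypothetical planar coordinates, not real locations.

```python
from shapely.geometry import Point, Polygon

# Hypothetical area of interest: a 4 x 4 square in planar coordinates.
campus = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])

# Hypothetical points to test against the polygon.
inside_point = Point(1, 1)
outside_point = Point(5, 5)

# Containment test: is each point strictly inside the polygon?
is_inside = campus.contains(inside_point)
is_outside_inside = campus.contains(outside_point)

# Basic measurements: polygon area and straight-line distance between points.
area = campus.area
distance = inside_point.distance(Point(4, 5))

print(is_inside, is_outside_inside, area, distance)
```

Geopandas applies the same operations to whole columns of geometries, which is what makes spatial joins and map layers possible.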
Tools for preparing technical documents:
-
Overleaf: Online LaTeX editor for collaborative writing of technical documents.
-
Jupyter Notebooks: Web-based interactive environment for data analysis, combining code and rich text.
Libraries that simplify model training:
-
H2O.ai: Open-source machine learning and AutoML platform for building models at scale.
-
TPOT: AutoML tool based on genetic algorithms to optimize machine learning pipelines.
-
Auto-sklearn: AutoML system for scikit-learn that automatically selects models and tunes hyperparameters.
Libraries for time series data:
-
Statsmodels: Provides tools for time series analysis, regression, and statistical modeling.
-
Prophet: Forecasting tool from Facebook (Meta) for time series with trend and seasonality.
-
Darts: Library for time series forecasting with a unified interface to both classical statistical models and deep learning models.
Tools for business intelligence, reporting, and dashboarding:
- Power BI: Business analytics tool for creating interactive reports and dashboards from data.
Additional tools for specialized analyses:
-
Orange: Open-source data visualization and analysis tool, designed for both novice and expert users.
-
BeautifulSoup: Library for parsing HTML and XML documents and extracting data.
-
Scrapy: Framework for building web scrapers and extracting data from websites.
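As a small sketch of HTML parsing with BeautifulSoup, the example below extracts a heading and the links from an inline HTML string. The markup and the `example.com` URLs are placeholders written for this example; no network request is made.

```python
from bs4 import BeautifulSoup

# Placeholder HTML, written inline so the example needs no network access.
html = """
<html><body>
  <h1>Course readings</h1>
  <ul>
    <li><a href="https://example.com/handbook">Handbook</a></li>
    <li><a href="https://example.com/slides">Slides</a></li>
    <li><a>Link without a target</a></li>
  </ul>
</body></html>
"""

# Parse with the parser from the Python standard library.
soup = BeautifulSoup(html, "html.parser")

# Extract the page heading as plain text.
title = soup.find("h1").get_text(strip=True)

# Collect (text, URL) pairs, skipping anchors that have no href attribute.
links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]

print(title)
print(links)
```

For crawling many pages rather than parsing one document, a framework such as Scrapy is usually the better fit.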
- Clone this repository:
git clone https://github.com/seu-repositorio/storytelling-2024.git
- Install the dependencies:
pip install -r requirements.txt
- Run the scripts in the main directory:
python main_script.py
We welcome contributions to this project! If you'd like to contribute, please follow these steps:
- Fork the repository: Click the "Fork" button at the top of the repository page to create your own copy of the project.
- Clone the repository: Clone your forked repository to your local machine using the following command:
git clone https://github.com/your-username/your-forked-repository.git