- This is a repo that contains general development tools that I use, like, or want to explore
- currently has two sections:
- general resources
- training and general ed
-
Geocoding / geographical
- Kepler.gl: dd for geocoding: https://kepler.gl/
- MovingPandas -> https://anitagraser.github.io/movingpandas/
- GeoPandas
- geopy
- geog (https://pypi.org/project/geog/)
- Collection of Python geospatial ->
- Article about using SingleStore https://medium.com/@VeryFatBoy/using-singlestore-as-a-geospatial-database-28ddf92684af
- Enrichment of geographic data:
- General data by zipcode
- zipcode / zcta / judicial crosswalk for ACS data: https://github.com/censusreporter/acs-aggregate/tree/master/crosswalks
- zipcode to FIPS crosswalk: https://www.huduser.gov/portal/datasets/usps_crosswalk.html (select ZIP-COUNTY in dropdown)
- https://censusreporter.org/
- https://github.com/censusreporter/census-api/blob/master/API.md
- Demographics
- Economics
- Families
- Housing
- Social
- Health insurance
- Poverty (food stamps / SNAP)
- Historical
- Weather
- Historical weather data -> https://www.visualcrossing.com/weather-api
- COVID
- https://apidocs.covidactnow.org/#register // historical COVID data by state or FIPS code
- Health + Disparities ->
- https://www.neighborhoodatlas.medicine.wisc.edu/
- Created my own github repo for easy access to data files: https://github.com/hantswilliams/countyhealthrankings
- https://www.countyhealthrankings.org/
- Food Environments (USDA)
- Food Retail Stores
- Example NY -> https://catalog.data.gov/ne/dataset/retail-food-stores
- Regulated Agencies
- https://www.ttb.gov/ (wine, alcohol, fuel, guns, etc...)
- Examples:
- Washington DC liquor -> https://opendata.dc.gov/datasets/liquor-licenses/explore?location=38.902497%2C-77.008884%2C12.73
-
Data Storage
- De-centralized cloud storage
- https://www.storj.io/
- Very cool - I would want to use this for encrypting messages in the future
- Data reading/writing made easy (GCP, Azure, AWS -> )
-
AWS specific / labs
- Data Wrangler - https://github.com/awslabs/aws-data-wrangler
- Labs- https://github.com/awslabs
-
Deployment
- Quick deployment of localhost test -> https://ngrok.com/
-
Diagrams:
- AWS diagram creator https://alanblackmore.medium.com/aws-diagram-creator-8f596052952c
-
Data
- Onboarding and Ingestion (ETL/ELT)
- Flatfile Data Onboarding platform // https://flatfile.io/
- Fivetran Cloud data integration platform // https://fivetran.com/
- Matillion Cloud data integration platform
- Apache Gobblin Open Source distributed data integration framework
- Singer "Open Source standard for writing scripts that move data" // https://www.singer.io/
- Meltano Open Source ELT for the DataOps
- Airbyte Open Source data integration platform // https://airbyte.io/
- Stitch Simple, extensible Cloud ETL platform (Talend) // https://www.stitchdata.com/
- Hevo No-code data pipeline as a service
- Apache Hop Open Source data integration platform project
- Meroxa Real-time data ingestion infrastructure
- Portable Cloud Hosted ELT Platform
- Talend, StreamSets, Alooma (Google), Xplenty, Striim, Panoply, Stambia, HVR
- Transformations
- dbt -> transformations / https://www.getdbt.com/
- apache beam
- Data Lineage
- Pachyderm
- Data management
- Dataframe.ai - https://josephmoon.medium.com/dataframe-ai-a-comprehensive-data-context-management-tool-for-modern-data-teams-df47c8a1ce17 (dataframe.ai)
- Have also built a search-like tool called WHALE -> https://github.com/hyperqueryhq/whale
- Data Catalogs
- Good comparison - DataHub / Atlas / Amundsen -> https://medium.com/@gosin/finding-the-right-data-catalog-solution-a265a4b3c0c3
- https://medium.com/hipay-tech/setting-up-a-data-discovery-tool-why-and-which-solution-to-choose-5e03fcbed458
- OpenMetadata (https://blog.open-metadata.org/openmetadata-0-8-0-release-ca09bd2fbf54)
-
APIs
- Multiple/APIs services (get multiple)
- Public API search (new): https://publicapis.sznm.dev/
- https://listt.xyz/
- https://m3o.com/
- Communication focused: https://www.nylas.com/
- Payments
- Stripe Connect - https://stripe.com/connect
- Stripe Payment Link - https://stripe.com/payments/payment-links
- Security
- Typing biometric - https://www.typingdna.com/
- Scheduling
- Speech/NLP
- Food
- News
- Signatures
-
Frontend
- Multi-deployment
- flutter: https://flutter.dev/
- flutterflow - https://flutterflow.io/
-
Authorization
-
Automation
- Airflow
-
Notebook tools
- Hex // https://hex.tech/
- Deepnote //
- Polynote - https://medium.com/dataseries/netflixs-polynote-is-a-new-open-source-framework-to-build-better-data-science-notebooks-4bdab6b8d0ae
- Resources / collections:
-
Statistics and Data Manipulation
- Summary Statistics - Python sidetable - https://levelup.gitconnected.com/sidetable-an-efficient-tool-to-summarize-pandas-dataframe-330958528a82
- Test selection
- Dates
- Timeseries
- Pingouin - stats - https://github.com/raphaelvallat/pingouin
- Repeated Measures (highly reputable professor): https://www.drizopoulos.com/courses/EMC/CE08.pdf
- Some of the best book(s)
- Handbook of Parametric and Nonparametric Statistical Procedures, Fifth Edition
- Data checks/schema/types:
-
Linux
- Command cheatsheets
-
Visualization
- Superset
- Streamlit
- Streamlit with Sweetviz - https://discuss.streamlit.io/t/this-is-how-to-use-sweetviz-with-streamlit/10897
-
SaaS starter kits
- Next JS
- Enterprise: https://nextlessjs.com/
- Free (same author): https://github.com/ixartz/Next-js-Boilerplate
- Codebase generator (you get to choose backend, frontend framework, deployment, etc...)
- Other paid:
- Free
- Hants Ordering:
- Free:
- https://nextacular.co/
- Billing: Stripe
- Documentation: Limited // work in progress // https://docs.nextacular.co/
- Deployment: vercel (auto SSL)
- Databases: only relational (SQL/PostgreSQL/Aurora)
- Pros: multi-domain; DB -> relational; teams + workspaces; Stripe; Tailwind; email handling
- https://www.saasstarterkit.com/
- Billing: Stripe
- Documentation: Good // mostly fleshed out // https://docs.saasstarterkit.com/docs/intro/welcome/
- Deployment: on your own
- Databases: relational (postgres) and non-relational (mongoDB)
- Pros: ML example built in; on-boarding; Docker; Stripe; AWS APIs
- Paid:
- https://bedrock.mxstbr.com/ (396 p/project)
- https://serverless.page/ (199 lifetime)
- https://reactapp.dev/ (19 lifetime)
- https://saasrock.com/ (149 lifetime)
- https://nextlessjs.com/ (699 p/project)
-
Low Code / No code
- Backend
- Supabase
- Parse - https://parseplatform.org/
- Appwrite - https://appwrite.io/
- Nhost - https://nhost.io
- Hasura
- Pocketbase - https://github.com/pocketbase/pocketbase
- Frontend
- AppGyver
-
Databases (DBs)
- Modern:
- FaunaDB - https://medium.com/codesphere-cloud/is-faunadb-the-next-big-database-technology-4c5a67915d6e
- SingleStore - https://www.singlestore.com
- Apache Druid / would view this as a competitor to SingleStore - https://druid.apache.org/
- RethinkDB - live DB - https://rethinkdb.com
- Firebase
- DynamoDB
-
Frontend - Styles
- Fonts / Styles
- Examples of good stuff
-
Open source alternatives:
- Airflow:
- Calendly:
-
Documentation
- Docusaurus - https://docusaurus.io
- Docz - https://www.docz.site
- API documentation
- readme.com / https://docs.readme.com/
- redoc (redocly) / https://redocly.com/ - https://github.com/Redocly/redoc
-
UUID - url friendly
-
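One possible way to get a URL-friendly UUID using only the Python standard library. This is a minimal sketch, not taken from any tool listed here; the helper names `to_url_friendly` and `from_url_friendly` are made up for illustration. It encodes the 16 raw UUID bytes as URL-safe base64 and drops the padding, which gives a 22-character token instead of the usual 36-character string.

```python
import base64
import uuid


def to_url_friendly(u: uuid.UUID) -> str:
    """Encode the UUID's 16 raw bytes as URL-safe base64 without '=' padding."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def from_url_friendly(token: str) -> uuid.UUID:
    """Restore the stripped padding and decode the token back into a UUID."""
    padded = token + "=" * (-len(token) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))


if __name__ == "__main__":
    original = uuid.uuid4()
    token = to_url_friendly(original)
    print(token)
    # The round trip should give back the same UUID.
    assert from_url_friendly(token) == original
```

The token only uses letters, digits, `-`, and `_`, so it can go into a URL path without escaping.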
Search tools:
- Meilisearch search tool
- Typesense https://typesense.org/
- DeepHaven - https://deephaven.io
- Elasticsearch - https://diawahad.medium.com/elasticsearch-the-open-source-distributed-restful-json-based-search-engine-ready-for-the-big-640430fd655b
-
Testing
- Chaos Engineering
-
Monitoring tools
- Uptime Kuma - https://github.com/louislam/uptime-kuma
-
Security
- Cloud checks
- Scout Suite (cloud specific)
- Identity management
- Secrets management
- Doppler - https://www.doppler.com
- Vault
- Dealing with multi-cloud account / going back and forth
- Pen testing
- Metasploit (now managed by Rapid7) - https://github.com/rapid7/metasploit-framework
- Hackable/vulnerable VM for testing - Metasploitable - https://github.com/rapid7/metasploitable3
- Onion browser - https://github.com/OnionBrowser/OnionBrowser
- python based
- non-python based
- https://github.com/enaqx/awesome-pentest (most starred)
- https://github.com/coreb1t/awesome-pentest-cheat-sheets
- https://github.com/jesusprubio/awesome-nodejs-pentest
- https://github.com/CyberSecurityUP/Awesome-Cloud-PenTest
- https://github.com/CyberSecurityUP/Awesome-PenTest-Practice
-
CAAS (cloud as a service?):
- my own (based on pulumi and aws):
- Openstack: https://www.openstack.org/software/
- Cloudstack: https://cloudstack.apache.org/
- OpenNebula: https://opennebula.io/evaluate-opennebula/#try_now
-
IAAS:
- Pulumi
- Terraform related tools:
- Infracost // https://github.com/infracost/infracost
- Brainboard - automatically create Terraform // https://www.brainboard.co/
- Checkov - looks for config errors
- Ansible
- devOps examples: https://github.com/geerlingguy/ansible-for-devops
- playbook best examples: https://github.com/ansible/ansible-examples
-
Data stacks
- Article - https://www.datafold.com/blog/the-modern-data-stack-open-source-edition
- GitHub - good list -> https://github.com/victorcouste/data-tools#ingestion
-
Webscraping
-
Kubernetes tools
- Helm
- Knative - https://knative.dev/docs/
- Kubeflow
- Crossplane - https://crossplane.io
-
ML
-
Primary: TensorFlow (Google)
-
Primary: PyTorch (Facebook)
-
Cheat sheet
-
Labeling tool
-
Bias
-
Drift/Monitoring
-
Scary Usecases
-
Explainable
-
Forecasting
-
Features
-
NLP - language translation
- Meta open source fairseq: https://github.com/facebookresearch/fairseq/tree/nllb
-
NLP
- spaCy
- NER annotation - https://github.com/vopani/waveton/tree/main/apps/data_apps/ner_annotation
-
NLP + Science Journals
- Pubmed Analysis - https://github.com/bepnye/EBM-NLP
- Taking articles and converting to JSON -> https://github.com/allenai/s2orc-doc2json
- Similarities across papers - https://github.com/stephenleo/stripnet
- Analyze PDFs -> tables, figures, etc... = https://github.com/Layout-Parser/layout-parser
- NLP with graph and human search - https://github.com/boopalanjayaraman/athena - https://www.tigergraph.com/graph-for-all/winners/third-place-most-impactful/
-
Premade ML/AI
-
AutoML
- MLJar
- https://auto.gluon.ai/stable/index.html
- http://epistasislab.github.io/tpot/
- https://github.com/mljar/mljar-supervised
- AWS solution - AWS Sagemaker Autopilot - https://aws.amazon.com/sagemaker/autopilot/
-
Recommendation
- Microsoft best practices - https://github.com/microsoft/recommenders
- Surprise - http://surpriselib.com/
- Collaborative
- Good walkthrough of collaborative - https://sunjackson.github.io/2016/05/30/3cca3bba88363e21bbe6e536e4178018/
- https://github.com/benfred/implicit
-
Session Based Recommendations
- Competition: https://recsys.acm.org/challenges/, http://www.recsyschallenge.com/2022/
- Python Notebooks for Session-based:
- NextItNet, GRU4Rec, Caser https://github.com/microsoft/recommenders/blob/main/examples/00_quick_start/sequential_recsys_amazondataset.ipynb
- SR-GNN https://colab.research.google.com/drive/1XwQ0njqSZL8vbHJMnRRHlH4ar0kYYFVz?usp=sharing
- SASRec https://github.com/microsoft/recommenders/blob/main/examples/00_quick_start/sasrec_amazon.ipynb
- Transformers4Rec https://github.com/NVIDIA-Merlin/Transformers4Rec/blob/main/examples/tutorial/03-Session-based-recsys.ipynb
- Python Libraries for Session-based:
-
Computer Vision
- Text to Pictures -> https://github.com/lucidrains/DALLE2-pytorch
- Text to Picture (Free) -> https://github.com/Yutong-Zhou-cv/Awesome-Text-to-Image
- Image augmentation (FB) -> https://github.com/facebookresearch/AugLy
- Bunch of random faces for training -> https://github.com/microsoft/FaceSynthetics
- Simple facial recognition -> https://github.com/ageitgey/face_recognition
- Real time tracking -> https://github.com/tryolabs/norfair
- Item tracking --> DEEPSORT --> https://learnopencv.com/understanding-multiple-object-tracking-using-deepsort/
- Digital Cloning Examples
- Toolkit: https://github.com/titanlambda/identity-cloning-toolkit-ICT
- Voice cloning: https://github.com/CorentinJ/Real-Time-Voice-Cloning
- Voice cloning: https://github.com/BenAAndrew/Voice-Cloning-App
- Paid service: https://www.synthesia.io/
-
GitHub - Awesome ML repos
General ML - https://github.com/EthicalML/awesome-production-machine-learning
-
General jupyter resources 1 - https://github.com/markusschanta/awesome-jupyter
-
General jupyter resources 2 - https://github.com/ml-tooling/best-of-jupyter
-
Deep learning (fakes, audio video, pose, etc...) - https://github.com/tugstugi/dl-colab-notebooks
-
-
Notebook examples
- General Notebooks/starter templates (GoogleSheets,Airtable,Sendgrid,Slack,etc..) https://github.com/jupyter-naas/awesome-notebooks
- https://github.com/trekhleb/homemade-machine-learning
- https://github.com/lazyprogrammer/machine_learning_examples
- https://github.com/susanli2016/Machine-Learning-with-Python
- Tensorflow -- https://github.com/aymericdamien/TensorFlow-Examples
- Pytorch -- https://github.com/pytorch/examples
- AWS sagemaker examples -- https://github.com/aws/amazon-sagemaker-examples
-
-
Datasets for testing:
-
Free Cloud Resources
- ORACLE - https://www.oracle.com/cloud/free/#always-free
- VERCEL - has always free
- HEROKU
-
Resources
- Blogs
- Podcasts:
-
Training and learning stuff:
-
Arch design / good book - https://bytebytego.com/ - system design, the big archive - https://media-exp2.licdn.com/dms/document/C561FAQHBrAeW02s_yw/feedshare-document-pdf-analyzed/0/1655057292746?e=1655942400&v=beta&t=PhyXPS-R-LcFIPYLrkfgiEXoiLKxNhSwIRg2nlClaSA
-
Quick read on architecture -> https://orkhanscience.medium.com/software-architecture-patterns-5-mins-read-e9e3c8eb47d2
-
ML learning with scikit learn: https://courses.dataschool.io/introduction-to-machine-learning-with-scikit-learn
-
More AI visual example -> http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
-
ML fun example -> https://medium.com/mlearning-ai/if-i-buy-a-diaper-i-will-surely-pick-up-a-beer-e692895a0c65
-
SSH tunneling -> https://goteleport.com/blog/ssh-tunneling-explained/
-
Stanford ML systems -> https://stanford-cs329s.github.io/syllabus.html
-
Explainable AI -> https://www.aidancooper.co.uk/a-non-technical-guide-to-interpreting-shap-analyses/
-
ML cheatsheet - https://github.com/soulmachine/machine-learning-cheat-sheet
-
Kaggle Allstar book: https://github.com/abhishekkrthakur/approachingalmost
-
MIT - Assortment of tools / instructions; https://github.com/shervinea/mit-15-003-data-science-tools
-
Powerful python: https://powerfulpython.com/bootcamp/
-
Interpretable ML: https://christophm.github.io/interpretable-ml-book/index.html
-
Resource of resources: https://www.linkedin.com/posts/vipulppatel_ultimate-guide-to-ai-data-science-machine-activity-6860243081434828800--w_7
-
Udemy course I like: https://www.udemy.com/user/kirilleremenko/
-
SQL training - forget where I found this:
- SQL Fiddle - a playground environment that lets you create tables and run SQL queries in the browser (http://sqlfiddle.com/)
- SQL Bolt - An interactive tutorial great for beginners (https://sqlbolt.com/)
- Select Star SQL - An interactive tutorial (https://selectstarsql.com/)
- SQL Murder Mystery - For intermediate/advanced SQL. Solve a murder mystery by running SQL queries (https://lnkd.in/en3-VnT9)
- SQL Indexing for Devs - Indexing is an important concept for making SQL queries more efficient. This blog series provides a good introduction (https://lnkd.in/egBCqJPa)
- SQL Zoo - Another interactive tutorial (https://lnkd.in/eeGAxE7q)
- The SQL Tutorial for Data Analysis - Another great tutorial that segments topics by beginner, intermediate, and advanced (https://lnkd.in/eUq7VvMp)
- Other sql to review -
- Databases & SQL for DS - IBM - https://www.coursera.org/learn/sql-data-science
- Learn SQL Basics for DS - https://www.coursera.org/specializations/learn-sql-basics-data-science
- SQL Cookbook - O'Reilly
- SQL 57 Practice Problems – Sylvia Vasilik
- SQL for Data Analytics – by Packt
-
Python Basics / Intro level
- Official Python: https://bugs.python.org/file47781/Tutorial_EDIT.pdf
- Pandas tutorials: https://pandastutor.com/index.html
-
Good jupyter notebooks for learning // https://github.com/milaan9?tab=repositories (02_python_datatypes; 04_python_dictionaries)
- Stanford:
- https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1194/readings/python-review.pdf
- https://stanfordpython.com/#/lectures
- Stanford: https://stanfordpython.com/lecture/lecture-2-full.pdf
- Cloud computing intro: https://web.stanford.edu/class/cs349d/docs/L01_overview.pdf
- Harvard:
- UCSF:
- ECU:
- Duke:
- Emertxe Information Technologies: https://www.slideshare.net/EmertxeSlides/presentations/5
- Other python to review:
- 2022 Complete Python Bootcamp - https://www.udemy.com/course/complete-python-bootcamp
- Complete Python Developer - https://www.udemy.com/course/complete-python-developer-zero-to-mastery
- Python Crash Course – 2nd Edition book by Eric Matthes
- Python Cookbook: Recipes for Mastering Python 3 by O’Reilly
- Elements of Programming Interviews in Python: The insider’s Guide by Adnan, Amit
-
stats to review:
- Intro to Statistics - https://www.udacity.com/course/intro-to-statistics--st101
- Intro to Inferential Statistics - https://www.udacity.com/course/intro-to-inferential-statistics--ud201
- Statistics and probability by Khan Academy - https://www.khanacademy.org/math/statistics-probability
- Statistics in Plain English by Timothy C. Urdan
- Head First Statistics: A Brain-Friendly Guide by Dawn Griffiths
- ISLR
- ESLR
- blog - https://lnkd.in/gpYNgFBn
-