I'm starting to dig more deeply into Dask user experience. I'd like to experiment with a more data-driven approach based on the various places where we interact with Dask users, specifically StackOverflow, Discourse and Github.
As a first step in that direction I've been playing around with scraping all StackOverflow questions and running some basic analyses on that data. Pretty rudimentary still atm but thought I'd share here for visibility and feedback.
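For anyone who wants to try something similar, here is a minimal sketch of one way to pull the questions. It assumes the public Stack Exchange API (`/2.3/questions`, filtered with `tagged=dask`) rather than HTML scraping, and it is not necessarily what the linked notebook does. The helper names (`build_url`, `fetch_page`, `extract_rows`, `top_by_views`) and the shortened keys `views` and `created` are made up for illustration; `title`, `view_count`, `tags` and `creation_date` are fields of the API's question objects.

```python
import gzip
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://api.stackexchange.com/2.3/questions"


def build_url(page, tag="dask", pagesize=100):
    """Build the request URL for one page of questions carrying `tag`."""
    params = {
        "site": "stackoverflow",
        "tagged": tag,
        "sort": "creation",
        "order": "asc",
        "page": page,
        "pagesize": pagesize,
    }
    return f"{API_URL}?{urlencode(params)}"


def fetch_page(page):
    """Download one page of results (needs network access; not called here)."""
    with urlopen(build_url(page)) as resp:
        raw = resp.read()
    # The API compresses its responses; gzip data starts with bytes 1f 8b.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    return json.loads(raw)


def extract_rows(payload):
    """Keep only the fields used in the analyses (hypothetical row layout)."""
    return [
        {
            "title": q["title"],
            "views": q["view_count"],
            "tags": q["tags"],
            "created": q["creation_date"],  # Unix epoch seconds
        }
        for q in payload.get("items", [])
    ]


def top_by_views(rows, n=10):
    """Return the n most viewed questions, most viewed first."""
    return sorted(rows, key=lambda r: r["views"], reverse=True)[:n]
```

A full run would call `fetch_page` for page 1, 2, 3, ... until the response's `has_more` flag is false, and would need to respect the API's rate limits.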
Some Interesting Findings
Most popular questions (by views)
Interesting to see a mix of pandas-related questions (expected) as well as Parquet and Spark-related questions. A list like this could be used to inform Dask messaging and content creation on the blog or social channels. These are clearly the topics folks are interested in most.
Most popular tags
These are the top 10 most popular "sub-tags". Assuming that all "pandas"-tagged questions are DataFrame questions, that would mean that almost half of the questions (48.7%) are related to Dask DataFrame.
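The share above is a simple count over tag lists. A rough sketch of that arithmetic, assuming each question is represented as a list of its tags; the function names and the toy data are invented for illustration and are not the real results:

```python
from collections import Counter


def sub_tag_counts(questions, main_tag="dask"):
    """Count how many questions carry each tag other than `main_tag`."""
    counts = Counter()
    for tags in questions:
        # set() guards against a tag being listed twice on one question.
        counts.update(t for t in set(tags) if t != main_tag)
    return counts


def tag_share(questions, tag):
    """Percentage of questions that carry `tag`."""
    if not questions:
        return 0.0
    hits = sum(1 for tags in questions if tag in tags)
    return 100 * hits / len(questions)


# Toy data, made up for illustration only.
toy = [
    ["dask", "pandas"],
    ["dask", "parquet"],
    ["dask", "pandas", "python"],
    ["dask", "dask-distributed"],
]

print(sub_tag_counts(toy).most_common(3))
print(f"{tag_share(toy, 'pandas'):.1f}% of toy questions are tagged pandas")
```

Note that this only measures how often a tag appears. Reading the "pandas" share as the Dask DataFrame share still rests on the assumption stated above.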
Possible Next Steps
It would be interesting to:
Run topic modelling on the full questions to get more fine-grained understanding of what folks are running into
Include timestamp data to see how questions and usage change over time
Build out analyses for other platforms and compare -- I'd be curious to see if particular platforms see more activity around particular Dask Collections, for example.
...much more to dig into, I'm sure...
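As a starting point for the timestamp idea, here is a small sketch that buckets questions by calendar month. It assumes each question is a dict with a `created` key holding Unix epoch seconds (the format the Stack Exchange API uses for `creation_date`); the key name and the function name are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def questions_per_month(rows):
    """Count questions per calendar month (UTC), ordered chronologically."""
    counts = Counter()
    for row in rows:
        created = datetime.fromtimestamp(row["created"], tz=timezone.utc)
        counts[created.strftime("%Y-%m")] += 1
    # "YYYY-MM" strings sort chronologically, so a plain sort is enough.
    return dict(sorted(counts.items()))
```

The same grouping could be done per tag to see whether, say, DataFrame questions grow faster than the rest.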
I'm going to keep chugging ahead here but would welcome any feedback and further discussion. If anyone's interested in helping me build out analyses for Discourse and Github, I'd be very happy to join forces.
avriiil changed the title from "Analysing Dask user questions to" to "Analysing Dask user questions to better understand user pain" on Nov 10, 2022
avriiil changed the title from "Analysing Dask user questions to better understand user pain" to "Analysing Dask user questions to better understand usage / user pain" on Nov 10, 2022
Thanks for doing this @rrpelgrim. I'd also suggest that you read through a bunch of the top issues (maybe 100 or so?) to get a feel for them. I think that getting dirty with the actual questions is important to understand things. I welcome the quantitative analysis that you've done here, and I'd encourage you to get in and get qualitative as well. I suspect that it gives a different kind of knowledge.
NOTEBOOK: https://github.com/rrpelgrim/dask-stackoverflow/blob/main/scrape-stackoverflow.ipynb