diff --git a/docs/_freeze/tutorials/data-platforms/clickhouse/execute-results/html.json b/docs/_freeze/tutorials/data-platforms/clickhouse/execute-results/html.json new file mode 100644 index 000000000000..0bdba951bd4c --- /dev/null +++ b/docs/_freeze/tutorials/data-platforms/clickhouse/execute-results/html.json @@ -0,0 +1,16 @@ +{ + "hash": "9165d1720fe0c8a1bb6597adcb1ed61d", + "result": { + "engine": "jupyter", + "markdown": "---\ntitle: ClickHouse\nfreeze: auto\n---\n\n[Ibis](https://ibis-project.com) supports reading and querying data using\n[ClickHouse](https://clickhouse.com/) as a backend.\n\nIn this example we'll demonstrate using Ibis to connect to a ClickHouse server,\nand to execute a few queries.\n\n::: {#518cc416 .cell execution_count=1}\n``` {.python .cell-code}\nfrom ibis.interactive import *\n```\n:::\n\n\n## Creating a Connection\n\nFirst we need to connect Ibis to a running ClickHouse server.\n\nIn this example we'll run queries against the publicly available [ClickHouse\nplayground](https://clickhouse.com/docs/en/getting-started/playground) server.\n\nTo run against your own ClickHouse server you'd only need to change the\nconnection details.\n\n::: {#cde12fc5 .cell execution_count=2}\n``` {.python .cell-code}\ncon = ibis.connect(\"clickhouse://play@play.clickhouse.com:443\")\n```\n:::\n\n\n## Listing available tables\n\nThe ClickHouse playground server has a number of interesting datasets\navailable. 
To see them, we can examine the tables via the `.tables` attribute.\n\nThis shows a list of all tables available:\n\n::: {#489c4882 .cell execution_count=3}\n``` {.python .cell-code}\ncon.tables\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```\nTables\n------\n- actors\n- benchmark_results\n- benchmark_runs\n- cell_towers\n- checks\n- cisco_umbrella\n- covid\n- dish\n- dns\n- dns2\n- food_facts\n- github_events\n- hackernews\n- hits\n- lineorder\n- loc_stats\n- menu\n- menu_item\n- menu_item_denorm\n- menu_page\n- minicrawl\n- ontime\n- opensky\n- pypi\n- query_metrics_v2\n- rdns\n- recipes\n- repos\n- repos_raw\n- run_attributes_v1\n- search_clickhouse_stackoverflow\n- search_stackoverflow\n- stackoverflow\n- stock\n- tranco\n- trips\n- uk_price_paid\n- uk_price_paid_updater\n- wikistat\n- workflow_jobs\n```\n:::\n:::\n\n\n## Inspecting a Table\n\nLet's take a look at the `hackernews` table. This table contains all posts and\ncomments on [Hacker News](https://news.ycombinator.com/).\n\nWe can access the table by attribute as `con.tables.hackernews`.\n\n::: {#d93dd2dd .cell execution_count=4}\n``` {.python .cell-code}\nt = con.tables.hackernews\n```\n:::\n\n\nWe can then take a peek at the first few rows using the `.head()` method.\n\n::: {#d2d8613b .cell execution_count=5}\n``` {.python .cell-code}\nt.head()\n```\n\n::: {.cell-output .cell-output-display execution_count=5}\n```{=html}\n
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓\n┃ id      ┃ deleted ┃ type    ┃ by          ┃ time                ┃ text    ┃ dead   ┃ parent  ┃ poll    ┃ kids                 ┃ url                                                                           ┃ score  ┃ title                                                       ┃ parts           ┃ descendants ┃\n┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩\n│ !uint32 │ !uint8  │ !string │ !string     │ !timestamp(0)       │ !string │ !uint8 │ !uint32 │ !uint32 │ !array<!uint32>      │ !string                                                                       │ !int32 │ !string                                                     │ !array<!uint32> │ !int32      │\n├─────────┼─────────┼─────────┼─────────────┼─────────────────────┼─────────┼────────┼─────────┼─────────┼──────────────────────┼───────────────────────────────────────────────────────────────────────────────┼────────┼─────────────────────────────────────────────────────────────┼─────────────────┼─────────────┤\n│       1 │       0 │ story   │ pg          │ 2006-10-09 18:21:51 │ ~       │      0 │       0 │       0 │ [15, 234509, ... +2] │ http://ycombinator.com                                                        │     57 │ Y Combinator                                                │ []              │          15 │\n│       2 │       0 │ story   │ phyllis     │ 2006-10-09 18:30:28 │ ~       │      0 │       0 │       0 │ []                   │ http://www.paulgraham.com/mit.html                                            │     16 │ A Student's Guide to Startups                               │ []              │           0 │\n│       3 │       0 │ story   │ phyllis     │ 2006-10-09 18:40:33 │ ~       │      0 │       0 │       0 │ [531602]             │ http://www.foundersatwork.com/stevewozniak.html                               │      7 │ Woz Interview: the early days of Apple                      │ []              │           0 │\n│       4 │       0 │ story   │ onebeerdave │ 2006-10-09 18:47:42 │ ~       │      0 │       0 │       0 │ []                   │ http://avc.blogs.com/a_vc/2006/10/the_nyc_develop.html                        │      5 │ NYC Developer Dilemma                                       │ []              │           0 │\n│       5 │       0 │ story   │ perler      │ 2006-10-09 18:51:04 │ ~       │      0 │       0 │       0 │ []                   │ http://www.techcrunch.com/2006/10/09/google-youtube-sign-more-separate-deals/ │      7 │ Google, YouTube acquisition announcement could come tonight │ []              │           0 │\n└─────────┴─────────┴─────────┴─────────────┴─────────────────────┴─────────┴────────┴─────────┴─────────┴──────────────────────┴───────────────────────────────────────────────────────────────────────────────┴────────┴─────────────────────────────────────────────────────────────┴─────────────────┴─────────────┘\n
\n```\n:::\n:::\n\n\n## Finding the highest scoring posts\n\nHere we find the top 5 posts by score.\n\nPosts, unlike comments, have a title, so we:\n\n- `filter` out rows that lack a title\n- `select` only the columns we're interested in\n- `order_by` score, descending\n- `limit` to the top 5 rows\n\n::: {#fc032d84 .cell execution_count=6}\n``` {.python .cell-code}\ntop_posts_by_score = (\n    t.filter(_.title != \"\")\n    .select(\"title\", \"score\")\n    .order_by(ibis.desc(\"score\"))\n    .limit(5)\n)\n\ntop_posts_by_score\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n```{=html}\n
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓\n┃ title                       ┃ score  ┃\n┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩\n│ !string                     │ !int32 │\n├─────────────────────────────┼────────┤\n│ Stephen Hawking has died    │   6015 │\n│ A Message to Our Customers  │   5771 │\n│ Steve Jobs has passed away. │   4338 │\n│ Bram Moolenaar has died     │   4310 │\n│ Mechanical Watch            │   4298 │\n└─────────────────────────────┴────────┘\n
\n```\n:::\n:::\n\n\n## Finding the most prolific commenters\n\nHere we find the top 5 commenters by number of comments made.\n\nTo do this we:\n\n- `filter` out rows with no author\n- `group_by` author\n- `count` all the rows in each group\n- `order_by` the counts, descending\n- `limit` to the top 5 rows\n\n::: {#95a37395 .cell execution_count=7}\n``` {.python .cell-code}\ntop_commenters = (\n    t.filter(_.by != \"\")\n    .group_by(\"by\")\n    .agg(count=_.count())\n    .order_by(ibis.desc(\"count\"))\n    .limit(5)\n)\n\ntop_commenters\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n```{=html}\n
┏━━━━━━━━━━━━━━┳━━━━━━━┓\n┃ by           ┃ count ┃\n┡━━━━━━━━━━━━━━╇━━━━━━━┩\n│ !string      │ int64 │\n├──────────────┼───────┤\n│ dang         │ 64937 │\n│ tptacek      │ 61479 │\n│ jacquesm     │ 56408 │\n│ pjmlp        │ 54785 │\n│ dragonwriter │ 51150 │\n└──────────────┴───────┘\n
\n```\n:::\n:::\n\n\nThis query could also be expressed using the `.topk` method, which is\na shorthand for the above:\n\n::: {#da5c0545 .cell execution_count=8}\n``` {.python .cell-code}\n# This is a shorthand for the above\ntop_commenters = t.filter(_.by != \"\").by.topk(5)\n\ntop_commenters\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n```{=html}\n
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┓\n┃ by           ┃ Count(by) ┃\n┡━━━━━━━━━━━━━━╇━━━━━━━━━━━┩\n│ !string      │ int64     │\n├──────────────┼───────────┤\n│ dang         │     64937 │\n│ tptacek      │     61479 │\n│ jacquesm     │     56408 │\n│ pjmlp        │     54785 │\n│ dragonwriter │     51150 │\n└──────────────┴───────────┘\n
\n```\n:::\n:::\n\n\n## Finding top commenters by score\n\nHere we find the top 5 commenters with the highest cumulative scores. In this\ncase the `.topk` shorthand won't work and we'll need to write out the full\n`group_by` -> `agg` -> `order_by` -> `limit` pipeline.\n\n::: {#cdcf83c2 .cell execution_count=9}\n``` {.python .cell-code}\ntop_commenters_by_score = (\n    t.filter(_.by != \"\")\n    .group_by(\"by\")\n    .agg(total_score=_.score.sum())\n    .order_by(ibis.desc(\"total_score\"))\n    .limit(5)\n)\n\ntop_commenters_by_score\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\n```{=html}\n
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓\n┃ by           ┃ total_score ┃\n┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩\n│ !string      │ int64       │\n├──────────────┼─────────────┤\n│ ingve        │      405248 │\n│ tosh         │      301694 │\n│ pseudolus    │      295067 │\n│ Tomte        │      276776 │\n│ todsacerdoti │      269524 │\n└──────────────┴─────────────┘\n
\n```\n:::\n:::\n\n\n## Next Steps\n\nThere are lots of other interesting queries one might ask of this dataset.\n\nA few examples:\n\n- What posts had the most comments?\n- How do post scores fluctuate over time?\n- What day of the week has the highest average post score? What day has the lowest?\n\nTo learn more about how to use Ibis with ClickHouse, see [the\ndocumentation](https://ibis-project.org/backends/ClickHouse/).\n\n", + "supporting": [ + "clickhouse_files" + ], + "filters": [], + "includes": { + "include-in-header": [ + "\n\n\n" + ] + } + } +} \ No newline at end of file diff --git a/docs/tutorials/data-platforms/clickhouse.qmd b/docs/tutorials/data-platforms/clickhouse.qmd index 00fbe18eea34..acc1848abda6 100644 --- a/docs/tutorials/data-platforms/clickhouse.qmd +++ b/docs/tutorials/data-platforms/clickhouse.qmd @@ -1,5 +1,6 @@ --- title: ClickHouse +freeze: auto --- [Ibis](https://ibis-project.com) supports reading and querying data using