diff --git a/docs/_freeze/tutorials/data-platforms/clickhouse/execute-results/html.json b/docs/_freeze/tutorials/data-platforms/clickhouse/execute-results/html.json new file mode 100644 index 000000000000..0bdba951bd4c --- /dev/null +++ b/docs/_freeze/tutorials/data-platforms/clickhouse/execute-results/html.json @@ -0,0 +1,16 @@ +{ + "hash": "9165d1720fe0c8a1bb6597adcb1ed61d", + "result": { + "engine": "jupyter", + "markdown": "---\ntitle: ClickHouse\nfreeze: auto\n---\n\n[Ibis](https://ibis-project.com) supports reading and querying data using\n[ClickHouse](https://clickhouse.com/) as a backend.\n\nIn this example we'll demonstrate using Ibis to connect to a ClickHouse server,\nand to execute a few queries.\n\n::: {#518cc416 .cell execution_count=1}\n``` {.python .cell-code}\nfrom ibis.interactive import *\n```\n:::\n\n\n## Creating a Connection\n\nFirst we need to connect Ibis to a running ClickHouse server.\n\nIn this example we'll run queries against the publicly available [ClickHouse\nplayground](https://clickhouse.com/docs/en/getting-started/playground) server.\n\nTo run against your own ClickHouse server you'd only need to change the\nconnection details.\n\n::: {#cde12fc5 .cell execution_count=2}\n``` {.python .cell-code}\ncon = ibis.connect(\"clickhouse://play@play.clickhouse.com:443\")\n```\n:::\n\n\n## Listing available tables\n\nThe ClickHouse playground server has a number of interesting datasets\navailable. 
To see them, we can examine the tables via the `.tables` attribute.\n\nThis shows a list of all tables available:\n\n::: {#489c4882 .cell execution_count=3}\n``` {.python .cell-code}\ncon.tables\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```\nTables\n------\n- actors\n- benchmark_results\n- benchmark_runs\n- cell_towers\n- checks\n- cisco_umbrella\n- covid\n- dish\n- dns\n- dns2\n- food_facts\n- github_events\n- hackernews\n- hits\n- lineorder\n- loc_stats\n- menu\n- menu_item\n- menu_item_denorm\n- menu_page\n- minicrawl\n- ontime\n- opensky\n- pypi\n- query_metrics_v2\n- rdns\n- recipes\n- repos\n- repos_raw\n- run_attributes_v1\n- search_clickhouse_stackoverflow\n- search_stackoverflow\n- stackoverflow\n- stock\n- tranco\n- trips\n- uk_price_paid\n- uk_price_paid_updater\n- wikistat\n- workflow_jobs\n```\n:::\n:::\n\n\n## Inspecting a Table\n\nLet's take a look at the `hackernews` table. This table contains all posts and\ncomments on [Hacker News](https://news.ycombinator.com/).\n\nWe can access the table by attribute as `con.tables.hackernews`.\n\n::: {#d93dd2dd .cell execution_count=4}\n``` {.python .cell-code}\nt = con.tables.hackernews\n```\n:::\n\n\nWe can then take a peek at the first few rows using the `.head()` method.\n\n::: {#d2d8613b .cell execution_count=5}\n``` {.python .cell-code}\nt.head()\n```\n\n::: {.cell-output .cell-output-display execution_count=5}\n```{=html}\n
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓\n┃ id ┃ deleted ┃ type ┃ by ┃ time ┃ text ┃ dead ┃ parent ┃ poll ┃ kids ┃ url ┃ score ┃ title ┃ parts ┃ descendants ┃\n┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩\n│ !uint32 │ !uint8 │ !string │ !string │ !timestamp(0) │ !string │ !uint8 │ !uint32 │ !uint32 │ !array<!uint32> │ !string │ !int32 │ !string │ !array<!uint32> │ !int32 │\n├─────────┼─────────┼─────────┼─────────────┼─────────────────────┼─────────┼────────┼─────────┼─────────┼──────────────────────┼───────────────────────────────────────────────────────────────────────────────┼────────┼─────────────────────────────────────────────────────────────┼─────────────────┼─────────────┤\n│ 1 │ 0 │ story │ pg │ 2006-10-09 18:21:51 │ ~ │ 0 │ 0 │ 0 │ [15, 234509, ... 
+2] │ http://ycombinator.com │ 57 │ Y Combinator │ [] │ 15 │\n│ 2 │ 0 │ story │ phyllis │ 2006-10-09 18:30:28 │ ~ │ 0 │ 0 │ 0 │ [] │ http://www.paulgraham.com/mit.html │ 16 │ A Student's Guide to Startups │ [] │ 0 │\n│ 3 │ 0 │ story │ phyllis │ 2006-10-09 18:40:33 │ ~ │ 0 │ 0 │ 0 │ [531602] │ http://www.foundersatwork.com/stevewozniak.html │ 7 │ Woz Interview: the early days of Apple │ [] │ 0 │\n│ 4 │ 0 │ story │ onebeerdave │ 2006-10-09 18:47:42 │ ~ │ 0 │ 0 │ 0 │ [] │ http://avc.blogs.com/a_vc/2006/10/the_nyc_develop.html │ 5 │ NYC Developer Dilemma │ [] │ 0 │\n│ 5 │ 0 │ story │ perler │ 2006-10-09 18:51:04 │ ~ │ 0 │ 0 │ 0 │ [] │ http://www.techcrunch.com/2006/10/09/google-youtube-sign-more-separate-deals/ │ 7 │ Google, YouTube acquisition announcement could come tonight │ [] │ 0 │\n└─────────┴─────────┴─────────┴─────────────┴─────────────────────┴─────────┴────────┴─────────┴─────────┴──────────────────────┴───────────────────────────────────────────────────────────────────────────────┴────────┴─────────────────────────────────────────────────────────────┴─────────────────┴─────────────┘\n\n```\n:::\n:::\n\n\n## Finding the highest scoring posts\n\nHere we find the top 5 posts by score.\n\nOnly posts have a title (comments don't), so we:\n\n- `filter` out rows that lack a title\n- `select` only the columns we're interested in\n- `order_by` score, descending\n- `limit` to the top 5 rows\n\n::: {#fc032d84 .cell execution_count=6}\n``` {.python .cell-code}\ntop_posts_by_score = (\n t.filter(_.title != \"\")\n .select(\"title\", \"score\")\n .order_by(ibis.desc(\"score\"))\n .limit(5)\n)\n\ntop_posts_by_score\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n```{=html}\n
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓\n┃ title ┃ score ┃\n┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩\n│ !string │ !int32 │\n├─────────────────────────────┼────────┤\n│ Stephen Hawking has died │ 6015 │\n│ A Message to Our Customers │ 5771 │\n│ Steve Jobs has passed away. │ 4338 │\n│ Bram Moolenaar has died │ 4310 │\n│ Mechanical Watch │ 4298 │\n└─────────────────────────────┴────────┘\n\n```\n:::\n:::\n\n\n## Finding the most prolific commenters\n\nHere we find the top 5 commenters by number of comments made.\n\nTo do this we:\n\n- `filter` out rows with no author\n- `group_by` author\n- `count` all the rows in each group\n- `order_by` the counts, descending\n- `limit` to the top 5 rows\n\n::: {#95a37395 .cell execution_count=7}\n``` {.python .cell-code}\ntop_commenters = (\n t.filter(_.by != \"\")\n .group_by(\"by\")\n .agg(count=_.count())\n .order_by(ibis.desc(\"count\"))\n .limit(5)\n)\n\ntop_commenters\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n```{=html}\n
┏━━━━━━━━━━━━━━┳━━━━━━━┓\n┃ by ┃ count ┃\n┡━━━━━━━━━━━━━━╇━━━━━━━┩\n│ !string │ int64 │\n├──────────────┼───────┤\n│ dang │ 64937 │\n│ tptacek │ 61479 │\n│ jacquesm │ 56408 │\n│ pjmlp │ 54785 │\n│ dragonwriter │ 51150 │\n└──────────────┴───────┘\n\n```\n:::\n:::\n\n\nThis query could also be expressed using the `.topk` method, which is\na shorthand for the above:\n\n::: {#da5c0545 .cell execution_count=8}\n``` {.python .cell-code}\n# This is a shorthand for the above\ntop_commenters = t.filter(_.by != \"\").by.topk(5)\n\ntop_commenters\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n```{=html}\n
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┓\n┃ by ┃ Count(by) ┃\n┡━━━━━━━━━━━━━━╇━━━━━━━━━━━┩\n│ !string │ int64 │\n├──────────────┼───────────┤\n│ dang │ 64937 │\n│ tptacek │ 61479 │\n│ jacquesm │ 56408 │\n│ pjmlp │ 54785 │\n│ dragonwriter │ 51150 │\n└──────────────┴───────────┘\n\n```\n:::\n:::\n\n\n## Finding top commenters by score\n\nHere we find the top 5 commenters with the highest cumulative scores. In this\ncase the `.topk` shorthand won't work and we'll need to write out the full\n`group_by` -> `agg` -> `order_by` -> `limit` pipeline.\n\n::: {#cdcf83c2 .cell execution_count=9}\n``` {.python .cell-code}\ntop_commenters_by_score = (\n t.filter(_.by != \"\")\n .group_by(\"by\")\n .agg(total_score=_.score.sum())\n .order_by(ibis.desc(\"total_score\"))\n .limit(5)\n)\n\ntop_commenters_by_score\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\n```{=html}\n
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓\n┃ by ┃ total_score ┃\n┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩\n│ !string │ int64 │\n├──────────────┼─────────────┤\n│ ingve │ 405248 │\n│ tosh │ 301694 │\n│ pseudolus │ 295067 │\n│ Tomte │ 276776 │\n│ todsacerdoti │ 269524 │\n└──────────────┴─────────────┘\n\n```\n:::\n:::\n\n\n## Next Steps\n\nThere are lots of other interesting queries one might ask of this dataset.\n\nA few examples:\n\n- What posts had the most comments?\n- How do post scores fluctuate over time?\n- What day of the week has the highest average post score? What day has the lowest?\n\nTo learn more about how to use Ibis with ClickHouse, see [the\ndocumentation](https://ibis-project.org/backends/ClickHouse/).\n\n", + "supporting": [ + "clickhouse_files" + ], + "filters": [], + "includes": { + "include-in-header": [ + "\n\n\n" + ] + } + } +} \ No newline at end of file diff --git a/docs/tutorials/data-platforms/clickhouse.qmd b/docs/tutorials/data-platforms/clickhouse.qmd index 00fbe18eea34..acc1848abda6 100644 --- a/docs/tutorials/data-platforms/clickhouse.qmd +++ b/docs/tutorials/data-platforms/clickhouse.qmd @@ -1,5 +1,6 @@ --- title: ClickHouse +freeze: auto --- [Ibis](https://ibis-project.com) supports reading and querying data using