SSB Package Statistics Viewer

This project provides a tool to download, process, and interactively explore statistics about public packages using the Libraries.io API. It fetches data about all public packages associated with Statistics Norway (SSB) and presents the results in an interactive table.

If you're just interested in the processed results, visit the GitHub Pages deployment.

Features

  • Package Data Fetching: Fetches data about all public packages associated with Statistics Norway from Libraries.io.
  • Interactive Table: Displays package data in a dynamic, searchable, and sortable table using Tabulator.js.
  • CSV Download: Allows users to download the dataset as a CSV file for offline use.
  • DuckDB Integration: Easily query and sample the data in the DuckDB Web Shell for further analysis.

Requirements

To fetch and process data using the Libraries.io API:

  1. Libraries.io API Key:

    • You'll need a valid API key from Libraries.io. You can sign up for one on the Libraries.io website.
    • Add your API key to the data-fetching script (fetch_data.py).
  2. Python Environment:

    • Install the required Python dependencies:
      pip install pandas requests
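
As an alternative to pasting the key into the script, the key from step 1 could be read from an environment variable. The sketch below is only an illustration: the variable name LIBRARIES_IO_API_KEY and the helper load_api_key are hypothetical and are not part of this project.

```python
import os

# Hypothetical variable name; the project itself does not define one.
API_KEY_ENV_VAR = "LIBRARIES_IO_API_KEY"


def load_api_key(environ=os.environ):
    """Return the API key from the environment, or raise a clear error."""
    key = environ.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(
            f"Set {API_KEY_ENV_VAR} before running the fetch script."
        )
    return key
```

Keeping the key out of the source file makes it less likely to be committed by accident.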

How to Use

1. Download and Process Data

Run the data-fetching script to download the package data:

python fetch_data.py

The script will:

  • Fetch all public packages associated with Statistics Norway from Libraries.io.
  • Save the results as results.csv in the src/ directory.
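
The two steps above can be sketched roughly as follows. This is not the contents of fetch_data.py: the endpoint URL, the "statisticsnorway" account name, the page size, and the function names are all assumptions made for illustration, and the sketch uses only the standard library rather than requests and pandas.

```python
import csv
import json
import urllib.parse
import urllib.request

# Assumptions: the Libraries.io endpoint that lists packages for a GitHub
# account, and "statisticsnorway" as that account's name.
BASE_URL = "https://libraries.io/api/github/statisticsnorway/projects"
PER_PAGE = 100  # assumed page size


def fetch_page(api_key, page):
    """Request one page of results and return it as a list of dicts."""
    query = urllib.parse.urlencode(
        {"api_key": api_key, "page": page, "per_page": PER_PAGE}
    )
    with urllib.request.urlopen(f"{BASE_URL}?{query}") as response:
        return json.load(response)


def fetch_all(get_page):
    """Call get_page(1), get_page(2), ... until a page comes back short."""
    rows, page = [], 1
    while True:
        batch = get_page(page)
        rows.extend(batch)
        if len(batch) < PER_PAGE:
            return rows
        page += 1


def write_csv(rows, path, columns):
    """Write the selected columns of each row to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=columns, extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(rows)
```

With a valid key, a call such as fetch_all(lambda page: fetch_page(api_key, page)) would collect every page, and write_csv(rows, "src/results.csv", ["name", "platform"]) would save two illustrative columns. Passing the page getter in as a function keeps the pagination loop testable without network access.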

2. Open the Results Viewer

  • Open index.html in your browser to view the interactive table.

3. Explore the Results

  • Use the "Download CSV" button to save the data for offline use.
  • Use the "Open in DuckDB Web Shell" button to query the dataset directly in the DuckDB Web Shell.

Preprocessed Results

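If you don't want to fetch and process the data yourself, you can access the processed results directly through the GitHub Pages deployment. The dataset itself is published at https://trygu.github.io/ssb-pypi-statistics/results.csv, the same URL the DuckDB query below reads from.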

DuckDB Query Example

The DuckDB Web Shell button includes a query to:

  1. Load the dataset into a table called ssb_packages.
  2. Sample 10 random rows from the table.

The SQL query used:

-- Load CSV file and create a table
CREATE TABLE ssb_packages AS
SELECT *
FROM read_csv_auto('https://trygu.github.io/ssb-pypi-statistics/results.csv');

-- Sample 10 rows from the table
FROM ssb_packages USING SAMPLE 10;
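
For readers working offline with the downloaded CSV, a comparable random sample can be drawn in plain Python. The helper name sample_rows is hypothetical, and this sketch uses the standard library's random module, so it is an analogue of the DuckDB query rather than a reproduction of DuckDB's sampling method.

```python
import csv
import io
import random


def sample_rows(csv_text, n=10, seed=None):
    """Return up to n rows drawn at random, without replacement."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rng = random.Random(seed)  # pass a seed for a repeatable sample
    return rng.sample(rows, min(n, len(rows)))
```

Reading a local results.csv into a string and passing it to sample_rows returns a list of dictionaries keyed by the CSV header.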

Development Notes

  • Ensure the API key is correctly configured in the fetch_data.py script before running it.
  • The data viewer (index.html) is designed to use a preprocessed results.csv. Modify the DuckDB query URL in the HTML if hosting the dataset elsewhere.
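
When hosting the dataset elsewhere, the query text has to be regenerated with the new URL. A minimal sketch of that substitution, assuming a hypothetical helper build_duckdb_query (how index.html actually embeds the query is not shown in this README):

```python
def build_duckdb_query(csv_url, table="ssb_packages", sample_size=10):
    """Return the two-statement query with csv_url substituted in."""
    # Double any single quotes so the URL stays a valid SQL string literal.
    escaped_url = csv_url.replace("'", "''")
    # The table name is inserted as-is, so it must come from a trusted source.
    return (
        "-- Load CSV file and create a table\n"
        f"CREATE TABLE {table} AS\n"
        "SELECT *\n"
        f"FROM read_csv_auto('{escaped_url}');\n"
        "\n"
        "-- Sample rows from the table\n"
        f"FROM {table} USING SAMPLE {int(sample_size)};\n"
    )
```

Calling build_duckdb_query with the new CSV location yields text that can be pasted into the viewer's HTML in place of the existing query.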

Credits

License

This project is licensed under the MIT License.