-
I have recently built a dashboard using Quarto with Observable JS and DuckDB. All the data is stored in Parquet files that are queried by DuckDB-Wasm in response to Observable Inputs, and these update Observable plots and tables. It's working beautifully ... for Mac users on several types of browsers.

I've just discovered during internal testing that Windows users encounter errors on both Chrome and Edge. The errors go away in private browsing mode. After some digging, I believe that they are due to an issue with DuckDB-Wasm discussed here. The gist of it is that on Chrome and Edge on Windows, there's a caching issue that doesn't have a robust solution yet. The result is that after one or a few successful queries against a Parquet file, an error is thrown that prevents further successful queries. Private browsing disables caching, hence no problems.

The dashboard I'm working on will be publicly released to provide access to the data produced by the non-profit I work at, so I need to be confident it will function for people on a wide variety of machines and browsers.

In response to a question I asked about a more minor problem I'm having, @mbostock suggested that switching from Quarto to Observable Framework might be a better experience. I'm willing to try porting my dashboard over (though it will further stretch my limited JS skills), but I want to determine whether it's likely to solve my Windows users' caching problems, and if so, why. If I understand data loaders correctly, they provide a way to generate data files to support a dashboard, but the data is still accessed via file attachment, which is what I'm doing now. I'm using DuckDB because the data is rather large to load all at once, so querying only what I need from the Parquet files in response to user inputs seemed like a good option.

Is there anything about using Observable Framework that would fundamentally change my problem and allow me to provide access to the same amount of data interactively without running into the errors we're currently experiencing?
-
This unfortunately appears to be a DuckDB-Wasm issue (as you linked, and as described in #1470), so I’m not sure there’s anything we can do in Framework to resolve this issue automatically. But there are a variety of ways you could work around it.

Perhaps the most direct workaround is to avoid caching on Windows. To ensure that the browser has not cached the Parquet file, you could add a query string with the current time to the file name. The SQL front matter only supports statically-registered sources, but you can use `DuckDBClient.sql` to register SQL sources dynamically in JavaScript:

```js
const sql = DuckDBClient.sql({
  // On Windows, append the current time so the browser treats each load as a new URL.
  gaia: FileAttachment("./lib/gaia-sample.parquet").href + (navigator.userAgent.includes("Windows") ? `?t=${Date.now()}` : "")
});
```

Another way to avoid caching would be to register a service worker that intercepts requests to load Parquet files to add the `Cache-Control: no-cache` header. To register the service worker:

```js
navigator.serviceWorker.register("./service-worker.js", {type: "module"});
```

Then in `service-worker.js`:

```js
addEventListener("fetch", (event) => {
  // Only intercept requests for Parquet files.
  const {pathname} = new URL(event.request.url);
  if (!pathname.endsWith(".parquet")) return;
  // Only bypass the cache on Windows. This checks the worker's own
  // navigator.userAgent, since the request's User-Agent header may not be
  // exposed to the fetch handler.
  if (!navigator.userAgent.includes("Windows")) return;
  // Copy the original headers (including any Range header) and disable caching.
  const headers = new Headers(event.request.headers);
  headers.set("Cache-Control", "no-cache");
  const request = new Request(event.request.url, {headers});
  event.respondWith(fetch(request));
});
```

You’ll also need to register the …

Another possibility would be to avoid the Parquet format. DuckDB has a native file format you could consider instead of Parquet, as well as support for many other file formats, including CSV. You could have your data loader output a different format, or use a data loader to convert a Parquet file to another format.

Finally, you could consider avoiding DuckDB-Wasm entirely and instead use DuckDB at build time within a data loader. Maybe you could elaborate a bit more on your use case: how big your data is, how you’re currently using DuckDB-Wasm, what sort of user interaction you want to support, etc.
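
For the query-string workaround, here is a minimal sketch of a small helper that could wrap the `href` before it is passed to `DuckDBClient.sql`. The name `cacheBustedHref` and its parameters are hypothetical (not part of Framework or DuckDB-Wasm). It assumes the `href` has no `#fragment`, and it uses `&` instead of `?` in case the `href` already carries a query string.

```javascript
// Hypothetical helper, not part of Observable Framework or DuckDB-Wasm.
// Appends a timestamp query parameter on Windows user agents only, so the
// browser sees a fresh URL each time and does not reuse a cached response.
// Assumes href contains no "#fragment".
function cacheBustedHref(href, userAgent = navigator.userAgent, now = Date.now()) {
  // Leave the URL untouched on platforms where caching works as expected.
  if (!userAgent.includes("Windows")) return href;
  // Use "&" if the href already has a query string, otherwise start one.
  const separator = href.includes("?") ? "&" : "?";
  return `${href}${separator}t=${now}`;
}
```

In a page, this might be used as `DuckDBClient.sql({gaia: cacheBustedHref(FileAttachment("./lib/gaia-sample.parquet").href)})`. The trade-off is that Windows users download the file again on every page load.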