HTTP 500 errors during indexing #328
Comments
Hello, I can't reproduce this. Does this happen when you click on an identifier? Or on a source page?
Wow, it’s working now... Presumably a transient issue, then. It happened whenever I tried accessing ...
The logs have traces:
This is somewhat known, but we have (1) no tracking issue and (2) no fix currently. I haven't dug deeper, but it is surprising that it can resolve itself over time, i.e. we seem to get an error from data read from the database, yet the error stops appearing over time.
I did some research on this and it's not clear to me if Elixir is handling database concurrency correctly. Berkeley DB has this concept of "products" with solutions for different concurrency/availability requirements. See: https://docs.oracle.com/cd/E17276_01/html/programmer_reference/intro_products.html
It seems to me that Elixir currently does not use the Concurrent Data Store. But also, from the products document:
So maybe these are just issues with how databases are shared between threads in the update script. If neither of those is the problem, then perhaps the values in the database are not updated "atomically". What I mean is that it is possible that the update script sometimes writes a temporary value that does not have the correct format into the defs/refs databases. Maybe the key is initially empty and that's why split with unpack fails. @tleb You probably know more about the update script than I do. tl;dr, three things to try/investigate (sketched below):
1. whether switching to Berkeley DB's Concurrent Data Store product would help,
2. how database handles are shared between threads in the update script,
3. whether values are written atomically, i.e. whether a reader can ever observe an empty or partially written value.
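As a hedged illustration of point 1: a minimal sketch of what opening a database inside a Berkeley DB Concurrent Data Store environment could look like, assuming the update script uses the Python berkeleydb (formerly bsddb3) bindings. The directory and file names are made up.

from berkeleydb import db

def open_cds_database(env_dir, db_file):
    # A Concurrent Data Store (CDS) environment lets one writer and many
    # readers share a database without full transactional locking.
    env = db.DBEnv()
    env.open(env_dir, db.DB_CREATE | db.DB_INIT_CDB | db.DB_INIT_MPOOL)

    # Open the database inside that environment (positional arguments:
    # filename, database name, access method, flags).
    database = db.DB(env)
    database.open(db_file, None, db.DB_BTREE, db.DB_CREATE)
    return env, database

For the CDS locking to apply, both the update script and the HTTP workers would have to open their handles through the same environment.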
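And for point 3, a small defensive-read sketch: if a reader can ever observe an empty or half-written value, validating before unpacking at least turns the HTTP 500 into a log line that says which key was affected. The value layout and names below are hypothetical, not Elixir's actual format.

import logging

def lookup_entries(database, ident):
    raw = database.get(ident.encode())
    if not raw:
        # Key absent, or written with an empty placeholder value.
        logging.warning("no value stored for identifier %r", ident)
        return []
    entries = []
    for chunk in raw.split(b'\n'):
        fields = chunk.split(b',')
        if len(fields) != 3:
            # Truncated or malformed entry: report it instead of letting
            # tuple unpacking raise and bubble up as an HTTP 500.
            logging.warning("malformed entry for %r: %r", ident, chunk)
            continue
        path, line, kind = fields
        entries.append((path.decode(), int(line), kind.decode()))
    return entries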
Hm, a somewhat simpler approach would be to avoid any kind of concurrent access. The current production server can hold a duplicate of the data of any project: the biggest project's data is Linux at 31G, and the prod server has 39G left. We could also scale up its disk, that is an option. That way we avoid any potential concurrency issue. That has a few consequences:
Question:
$ du -hc linux/data/* | sort -hr
31G total
15G linux/data/references.db
12G linux/data/versions.db
4.0G linux/data/definitions.db
196M linux/data/hashes.db
149M linux/data/blobs.db
113M linux/data/doccomments.db
102M linux/data/filenames.db
34M linux/data/compatibledts.db
5.2M linux/data/compatibledts_docs.db
8.0K linux/data/variables.db
The same sort of HTTP 500 trace still appears on Oct 28, Oct 29, Nov 1, Nov 2 and Nov 4. All those dates are after the deploy of 0b8d735 (Oct 11), so the shared flag is not enough. I maintain that the simpler approach is to do indexing on the side and update symlinks. Prod server storage size is even less of an issue now that I ran some ...
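For reference, the "index on the side, then update symlinks" idea could be as small as the following sketch: build the new databases in a fresh directory, then atomically repoint a data symlink that the web frontend reads. The paths are hypothetical, and this assumes data is already a symlink rather than a real directory.

import os

def publish_new_index(project_root, new_data_dir):
    live_link = os.path.join(project_root, "data")
    tmp_link = live_link + ".new"

    # Create the replacement symlink under a temporary name first.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(new_data_dir, tmp_link)

    # rename()/replace() is atomic on POSIX, so HTTP workers always see
    # either the old set of databases or the new one, never a mix.
    os.replace(tmp_link, live_link)

The old data directory can be deleted once no worker holds it open, which is what costs the extra disk space discussed above.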
See https://elixir.bootlin.com/linux/v6.10.9/source/block/blk-core.c:
This happens on all versions of the kernel I’ve tried, back to https://elixir.bootlin.com/linux/v2.6.39.4/source/block/blk-core.c at least.