rpmdb --initdb Database problem on Fedora because of a problem in libdb. Can’t install or remove any packages. #2852

Closed
ytrezq opened this issue Jan 18, 2018 · 21 comments

Comments

@ytrezq

ytrezq commented Jan 18, 2018

I freshly installed Fedora 28 on build 17063.1000, and I can’t install any packages. rpmdb --rebuilddb exits but fails to perform any action, even though I removed all __db.00X files in /var/lib/rpm.

rpm --initdb segfaults when launched as root, but the syscall that crashes it also seems to hang gdb, so I don’t know which system call triggers the problem. Before segfaulting, it creates /var/lib/rpm/__db.001, after which running rpm --initdb again hangs in a way that makes it unresponsive to signals, and attaching gdb to its PID makes gdb hang in the same way.

It really seems to have to do with libdb 5.3.28, because running chdir /var/lib/rpm;db_dump /var/lib/__db.001 as root triggers the same problem (I know its contents are invalid, but it should at least respond to SIGTERM). You can download __db.001.

Here’s the ltrace:

[root@WINDOWS81 rpm]# ltrace db_dump /var/lib/rpm/__db.001
__db_rpath(0x7fffe7d9f4b7, 0x7fffe7d9f268, 0x7fffe7d9f280, 0x7f304b216718)                   = 0
db_version(0x7fffe7d9f00c, 0x7fffe7d9f010, 0x7fffe7d9f014, 0x7f304b216718)                   = 0x7f304b5c7158
getopt(2, 0x7fffe7d9f268, "d:D:f:F:h:klL:m:NpP:rRs:V")                                       = -1
sigemptyset(<>)                                                                              = 0
sigaction(SIGHUP, { 0x7f304bc02010, <>, 0, nil }, { nil, <>, 0x7, 0x7f304b80b790 })          = 0
sigemptyset(<>)                                                                              = 0
sigaction(SIGINT, { 0x7f304bc02010, <>, 0, nil }, { nil, <1>, 0x7, 0x7f304b80b790 })         = 0
sigemptyset(<>)                                                                              = 0
sigaction(SIGPIPE, { 0x7f304bc02010, <>, 0, nil }, { nil, <>, 0x7, 0x7f304b80b790 })         = 0
sigemptyset(<>)                                                                              = 0
sigaction(SIGTERM, { 0x7f304bc02010, <>, 0, nil }, { nil, <14>, 0, 0x7f304b80bc69 })         = 0
db_env_create(0x7fffe7d9f0b8, 0, 0x7f05060451f0, 0)                                          = 0

I can’t provide an strace because strace hangs like gdb before printing any system calls made after process start-up.

@ytrezq ytrezq changed the title rpmdb ‑‑initdb Database problem on Fedora. rpmdb ‑‑initdb Database problem on Fedora because of a problem in libdb. Can’t install or remove any packgages. Jan 18, 2018
@ytrezq ytrezq changed the title rpmdb ‑‑initdb Database problem on Fedora because of a problem in libdb. Can’t install or remove any packgages. rpmdb ‑‑initdb Database problem on Fedora because of a problem in libdb. Can’t install or remove any packages. Jan 18, 2018
@WSLUser

WSLUser commented Jan 18, 2018

@ytrezq are you using the WSL-Distribution-Switcher?

@ytrezq
Author

ytrezq commented Jan 18, 2018

@DarthSpock I tried with and without.

Though I ended up handling things manually since Fedora 28 isn’t an available version. But I don’t have this problem if I import the rootfs on a regular Linux partition.
rpm -qa is working normally if I launch it as a normal user.

@WSLUser

WSLUser commented Jan 18, 2018

I would follow this issue for Fedora. I was only curious to know if Fedora magically showed up in the Store randomly without warning. Now I know that's not the case. I recommend sticking with Ubuntu and OpenSuse for now. @therealkenc may be able to help you with this issue.

@ytrezq
Author

ytrezq commented Jan 18, 2018

@DarthSpock I reported it to Fedora bugzilla and they closed the issue explaining it was a Microsoft problem because it works fine with a native Linux kernel.

It definitely seems to be a system call implementation problem. I need the latest packages, which are only available on Fedora, such as the NaCl toolchains.

@WSLUser

WSLUser commented Jan 18, 2018

Yes, it's an MS problem, but MS isn't likely to troubleshoot unimplemented syscalls for a distro that isn't officially supported yet. For all we know those syscalls will be available when Fedora is officially available from the Store (doubtful but hopeful at the same time). As I said, follow the issue I linked above to track Fedora's availability. Until then, you're in an as-is status. If you can't wait for availability in the Store, run it in a Docker container or a VM (though apparently there are issues in this department on Insider builds as well).

@ytrezq
Author

ytrezq commented Jan 18, 2018

@DarthSpock but if you download [__db.001](https://user-images.githubusercontent.com/3824869/35076460-dc4b81be-fbf8-11e7-846f-584340872afa.png), and run db_dump __db.001 from Fedora 28 you’ll reproduce the bug immediately.

@WSLUser

WSLUser commented Jan 18, 2018

I don't have Fedora installed and that won't change until it's available in the Store. I also don't have native Fedora as an available OS, just CentOS and RHEL, and I'm not able to download that file from work. Also, I'm not a member of the WSL team, just another user like you. While I might be able to repro some things using Ubuntu, this isn't something I can repro.

@therealkenc
Collaborator

but if you download __db.001, and run db_dump __db.001 from Fedora 28 you’ll reproduce the bug immediately.

That's a link to a tiny .png. This probably has nothing to do with Fedora, and that file is a rpm Berkeley DB. Seriously, just a repro is all we ask for (on a good day). Something like:

sudo apt install rpm db-util
cd ~ && mkdir dbfail && cd dbfail
wget https://somewhere/__db.001
db_dump __db.001

@ytrezq
Author

ytrezq commented Jan 18, 2018

@therealkenc except that it’s not a PNG file but a libdb 5.3 database for rpm. The problem is that GitHub (unlike SourceForge) only serves attachments on issues if they have an image MIME type.

@therealkenc
Collaborator

therealkenc commented Jan 18, 2018

Then your absent repro steps are also missing a mv.

sudo apt install rpm db-util
cd ~ && mkdir dbfail && cd dbfail
wget https://user-images.githubusercontent.com/3824869/35076460-dc4b81be-fbf8-11e7-846f-584340872afa.png
mv 35076460-dc4b81be-fbf8-11e7-846f-584340872afa.png __db.001
strace -f -o db_dump.strace db_dump __db.001

It just seeks off into space here:

2236  open("__db.001", O_RDWR)          = 3
2236  fcntl(3, F_GETFD)                 = 0
2236  fcntl(3, F_SETFD, FD_CLOEXEC)     = 0
2236  fstat(3, {st_mode=S_IFREG|0666, st_size=352256, ...}) = 0
2236  read(3, "\227\10\22\0\0\0\0\0\5\0\0\0\3\0\0\0\34\0\0\0\356\230\231\343\21\372B@\0\0\0\0"..., 160) = 160
2236  lseek(3, 25769803780, SEEK_SET)   = 25769803780

This is with libdb 5.3.28-11ubuntu0.1.
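
For scale, the lseek target in the trace above can be compared with the file size that fstat reported. The sketch below only does arithmetic on the two numbers copied from the strace output; splitting the offset into 32-bit halves is just an illustrative way of looking at it, not a claim about how libdb computes the value.

```shell
# Numbers copied from the strace output above.
offset=25769803780   # target of the lseek call
size=352256          # st_size reported by fstat

# Print the offset in hex and split it into 32-bit halves.
hex=$(printf '0x%x' "$offset")
high=$(( offset >> 32 ))
low=$(( offset & 0xffffffff ))
echo "offset = $hex (high word $high, low word $low)"

# Check whether the seek lands past the end of the file.
if [ "$offset" -gt "$size" ]; then
    past_eof=yes
else
    past_eof=no
fi
echo "past end of file: $past_eof"
```

The offset works out to 0x600000004, roughly 24 GiB into a file that is 344 KiB long.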

@ytrezq
Author

ytrezq commented Feb 3, 2018

@therealkenc: but aren’t the database files I uploaded a reproducer?

@therealkenc
Collaborator

Yep, your database file is corrupt. All of the WSL system calls execute flawlessly and confirm it is corrupt. The steps to create the corrupt database file are the repro.

Actual reproduction steps here aren't going to be straightforward because you are not on a supported platform. Posting a set of repro steps from clean Ubuntu that reproduces a corrupt database file because of a WSL syscall misbehaviour would be possible in theory. But unless you are feeling highly motivated, you are better off waiting for #2584 to flip. Then the first repro step will be 1. Install Fedora from the Store.

@ytrezq
Author

ytrezq commented Feb 4, 2018

@therealkenc : On Linux,

sudo apt install rpm db-util
cd ~ && mkdir dbfail && cd dbfail
wget https://user-images.githubusercontent.com/3824869/35076460-dc4b81be-fbf8-11e7-846f-584340872afa.png
mv 35076460-dc4b81be-fbf8-11e7-846f-584340872afa.png __db.001
strace -f -o db_dump.strace db_dump __db.001

doesn’t hang the program. The system call that hangs libdb (so that it only responds to SIGKILL) is the same one that creates the corrupt database.

@therealkenc
Collaborator

On Ubuntu:

$ sudo apt install rpm db-util
$ sudo rpm --initdb --dbpath /tmp/foo
$ sudo db_dump /tmp/foo/__db.001

I assume those are the steps you are talking about. I have to assume this, of course, because you still have not posted your repro steps. This behaves the same on Real Linux™ and WSL for me. db_dump prints a bunch of "db_dump: BDB0196 Encrypted checksum: no encryption key specified" errors, on both. It does not hang. This is with rpm 4.12.0.1 and db-util 5.3.21.
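
Since the reported symptom is a hang rather than an error, steps like these can be wrapped in a timeout so that a stall is recorded instead of blocking the shell. This is only a sketch, assuming GNU coreutils timeout is installed; the probe function name and the 30-second limit are made up for illustration, and SIGKILL is used because the report says the stuck process only responds to that signal.

```shell
# probe SECONDS COMMAND...: run COMMAND, killing it with SIGKILL after SECONDS.
# Hypothetical helper, not part of rpm or db-util.
probe() {
    secs=$1
    shift
    timeout -s KILL "$secs" "$@" >/dev/null 2>&1
    status=$?
    case $status in
        # GNU timeout reports 124 on a timeout, or 137 (128+9) when KILL was sent.
        # A command that itself exits with one of these codes is also counted as hung.
        124|137) echo "hung" ;;
        *)       echo "exited $status" ;;
    esac
}

# Example, from a root shell (30 seconds is an arbitrary limit):
# probe 30 db_dump /tmp/foo/__db.001
```

Running the same probe once as root and once as a normal user would show whether the stall depends on the user, without leaving a stuck process behind.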

@ytrezq
Author

ytrezq commented Feb 4, 2018

@therealkenc : Ok, please retry db_dump with libdb5.3.28 and Windows® build 17074.1002.

Nothing is printed in that case on the screen.

@therealkenc
Collaborator

Nothing is printed in that case on the screen doing what??? Those three steps I posted?

My three repro steps were performed with libdb 5.3.28-11ubuntu0.1 and db-util 1:5.3.21~exp1ubuntu2. Neither of which has to do with my Windows build, which is 17083.1000. Those packages are shipped by Canonical.

Speculating, please try running the three steps again. Cut and paste this time. But the time-sink value on this one is depreciating rapidly.

@ytrezq
Author

ytrezq commented Feb 10, 2018

@therealkenc: I forgot something important: the stall only occurs if db_dump or rpm is run as root.

@ytrezq
Author

ytrezq commented Feb 15, 2018

@therealkenc: did you try running the repro again as root?

@therealkenc
Collaborator

therealkenc commented Feb 15, 2018

run the repro as root ?

What repro.

On Feb 4 I gave the following repro steps:

$ sudo apt install rpm db-util
$ sudo rpm --initdb --dbpath /tmp/foo
$ sudo db_dump /tmp/foo/__db.001

So, yes.

@therealkenc
Collaborator

There was never a repro here so I am putting this issue out of its misery and duping #902.

@ytrezq
Author

ytrezq commented Mar 30, 2018

@therealkenc : Sorry, I forgot to reply. Here’s the complete database to be used with the rpm command : /var/lib/rpm.tar.xz.

Please re-open: this seems to have nothing to do with the duplicate!
