Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash because podcast_episodes published_date column has null values #1572

Closed
1 task done
mchowning opened this issue Nov 29, 2023 · 4 comments
Closed
1 task done
Labels
[Type] Bug Not functioning as intended.

Comments

@mchowning
Copy link
Contributor

mchowning commented Nov 29, 2023

Description

Starting on November 27, we've seen a significant increase in crashes caused by the app finding a null value is being found in the publishedAt column of the podcast_episodes database table. This shouldn't be possible because that column is not allowed to be null. and it's not clear what the cause is.

This almost entirely seems to occur after users download the episode json for "The Daily" podcast. In addition, there is a sharp spike in these crashes early in the morning in the US (shortly after "The Daily" releases a new episode), and then very quickly falls off.

image

The number of affected user as of today (Nov. 29) is around 3.5k, so this does not seem to be a widespread issue. In addition, the issue is mostly occuring on the 7.51 release. This is likely because the 7.52 release has just started rolling out so most of our users are on 7.51. It is interesting that the issue started on Nov. 27 though because that was well after the Nov. 27 release, which indicates that there were no code changes in the app that would have caused the issue to start.

Step-by-step reproduction instructions

I don't know how to reproduce this.

Screenshots or screen recording

No response

Did you search for existing bug reports?

  • I have searched for existing bug reports.

Device, Operating system, and Pocket Casts app version

No response

@mchowning mchowning added the [Type] Bug Not functioning as intended. label Nov 29, 2023
@mchowning
Copy link
Contributor Author

mchowning commented Nov 29, 2023

I was able to sort of reproduce this issue by updating the DateTypeConverter to always return null. That did create a crash with the same error message that our users are seeing, but when I sent the crash to Sentry from a release build, my crash appeared to originate from a different part of the generated EpisodeDao_Impl class than from where the crashes our users are seeing are coming from.

In particular, I saw the in my build the crash originated from a null publishedAt column in the generated EpisodeDao_Impl:findPodcastOrderPublishedDateDesc, but our users seem to all be seeing the crash only in the rx version: EpisodeDao_Impl::observeByPodcastOrderPublishedDateDesc. 🤔

My guess is that forcing the DateTypeConverter to always return null is overly broad and causes the crash to occur more quickly than it is happening for our users on regular builds, but this does leave me without full confidence that this way of recreating couldn't be missing something.

@mchowning
Copy link
Contributor Author

mchowning commented Dec 1, 2023

Turns out that the type converter change did not fix this error. It did change the exception a bit though, and I think that gave us a bit more information. In particular, it seems to confirm that there actually is a null value in the published_date column of the podcast_episodes table even though that column has a non-null constraint.

When we had the type converter capable of returning null, this was the relevant part of the generated code:

if (_cursor.isNull(_cursorIndexOfPublishedDate)) {
  _tmp = null;
} else {
  _tmp = _cursor.getLong(_cursorIndexOfPublishedDate);
}
final Date _tmp_1 = __dateTypeConverter.toDate(_tmp);
if(_tmp_1 == null) {
  throw new IllegalStateException("Expected non-null java.util.Date, but it was null.");
} else {
  _tmpPublishedDate = _tmp_1;
}

The "Expected non null java.util.Date" exception was thrown with this code.

We tried to work around this by just having the type converter never return null in #1573, which resulted in this generated code:

final Date _tmpPublishedDate;
final long _tmp;
_tmp = _cursor.getLong(_cursorIndexOfPublishedDate);
final Long _tmp_1;
_tmp_1 = _tmp;      // No longer checking for null here because the type converter never returns null values
_tmpPublishedDate = __shouldNotBeNullDateTypeConverter.toDate(_tmp_1);

That didn't work though. It did, however, give a new exception. Now we get an "IllegalStateException: Couldn't read row , col 2 from CursorWindow. Make sure the Cursor is initialized correctly before accessing data from it." from the line calling _cursor.getLong(_cursorIndexOfPublishedDate).

That is informative because that line never got called with the code generated from the nullable type-converter because the _cursor.getLong(...) call was guarded by a check that _cursor.isNull(_cursorIndexOfPublishedDate) was false. Presumably, that means that _cursor.isNull(_cursorIndexOfPublishedDate) must have been returning true since if it returned false we would have gotten the "Couldn't read row" exception from the call to _cursor.getLong(_cursorIndexOfPublishedDate).

In other words, there is a null value in the published_date column even though that column has a non-null constraint:

data class PodcastEpisode(
  // ...
  @ColumnInfo(name = "published_date") override var publishedDate: Date // Date is not nullable here

To verify this I ran an SQL query to manually update the published_date column for one row in the table, and the query threw an error because it violated the column's non-null constraint.

More details about what Cursor::isNull is actually does

The `Cursor::isNull` method calls the `AbstractWindowedCursor` implementation:
public boolean isNull(int columnIndex) {
  checkPosition();
  return mWindow.getType(mPos, columnIndex) == Cursor.FIELD_TYPE_NULL;
}

and mWindow.getType(...) is just calling through to native code, which is returning the int value that represents a null value in the column:

public @Cursor.FieldType int getType(@IntRange(from = 0) int row, @IntRange(from = 0) int column) {
  acquireReference();
  try {
    return nativeGetType(mWindowPtr, row - mStartPos, column);
  } finally {
    releaseReference();
  }
}

@mchowning
Copy link
Contributor Author

mchowning commented Dec 1, 2023

I've looked at the episode json files the server has been returning for the daily from the last few days for a bit, and I don't see anything suspicious or different about them. Admittedly they're large files, so it's not like I'm going over every detail in them, and I could be missing something, but I did do a regex check on all of the published fields that seem to be causing the problem and they all matched the regex.

Also, if there was an issue about the server response that would cause this, I would think think that more than ~5k users would have been affected. It's possibly that the server could have returned a bad value only very briefly, if that were the case I wouldn't expect that we would be getting the same spike again each day at almost exactly the same time (which just happens to be shortly after "The Daily" releases a new episode).

@mchowning
Copy link
Contributor Author

Closing this as fixed by #1589

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Type] Bug Not functioning as intended.
Projects
None yet
Development

No branches or pull requests

1 participant