Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DateTools.DateToString throws AggregateException after upgrade from 4.8.0-beta00015 to 4.8.0-beta00016 #772

Closed
mpdunlop opened this issue Nov 24, 2022 · 8 comments · Fixed by #853
Labels
is:bug pri:high up-for-grabs This issue is open to be worked on by anyone

Comments

@mpdunlop
Copy link

When calling DateTools.DateToString for date "0001-01-01 00:00:00Z" in 4.8.0-beta00016 an AggregateException is thrown:

        [Fact]
        public void DateToString_MinDate_ShouldNotThrowAggregateException()
        {
            var date = new DateTime();
            var value = DateTools.DateToString(date, DateResolution.DAY); // Exception thrown here
            Assert.Equal("00010101", value);
        }

System.AggregateException : One or more errors occurred. (The UTC time represented when the offset is applied must be between year 0 and 10,000. (Parameter 'offset'))

In 4.8.0-beta00015 this works without issue and returns "00010101":

        [Fact]
        public void DateToString_MinDate_ShouldNotThrowAggregateException()
        {
            var date = new DateTime();
            var value = DateTools.DateToString(date, DateTools.Resolution.DAY);
            Assert.Equal("00010101", value);
        }

I'm happy to create a pull request and try fix if this wasn't intentional.

@mpdunlop
Copy link
Author

I suspect this is being caused by my local timezone being GMT+8. If this is now adjusted to UTC in the latest beta, this would fall below DateTime.MinValue.

@NightOwl888
Copy link
Contributor

Thanks for the report.

Yea, looks like you are right. I have confirmed this is a bug, although I am not sure what the solution is, yet.

@NightOwl888 NightOwl888 added up-for-grabs This issue is open to be worked on by anyone is:bug pri:high labels Nov 26, 2022
@mpdunlop
Copy link
Author

mpdunlop commented Nov 28, 2022

I'd like to change DateTools.cs to more closely match the Java source, but there's quite a few differences that I don't want to remove without confirming the project's approach to porting first.

Is it right to have additional public methods in Lucene.net that aren't present in the Java source? I ask because there's public methods in Lucene.net DateTools which accept TimeZone as a parameter, which aren't present in Java Lucene 4.8.

For example, in the Java source there is only:

public static String dateToString(Date date, Resolution resolution)

But in Lucene.net there are two:

public static string DateToString(DateTime date, DateResolution resolution)

and

public static string DateToString(DateTime date, TimeZoneInfo timeZone, DateResolution resolution)

Looking at https://github.com/apache/lucene/blob/releases/lucene-solr/4.8.0/lucene/core/src/java/org/apache/lucene/document/DateTools.java, it looks like DateTools.Resolution contains both the TimeZone and the resolution to return the time in, and the TimeZone is always assumed to be in GMT:

    Resolution(int formatLen) {
      this.formatLen = formatLen;
      // formatLen 10's place:                     11111111
      // formatLen  1's place:            12345678901234567
      this.format = new SimpleDateFormat("yyyyMMddHHmmssSSS".substring(0,formatLen),Locale.ROOT);
      this.format.setTimeZone(GMT);
    }

Additionally, in commit 46e2437, the DateTools.Resolution enum was denested and renamed DateResolution, but it remains as DateTools.Resolution in the Java source. I'm not sure that this is necessarily a problem, but it is a departure from the source so I wanted to confirm that this was intentional before making any modifications.

There's also some tests in Lucene.net's TestDateTools.cs which aren't in the Lucene 4.8 source (e.g. TestDateToolsUTC_Ticks). These tests fail when I restore the old logic. I don't think there's a problem with adding additional tests, but I need to either remove or adjust these so they no longer fail if we assume everything is in UTC.

To summarise:

  • I'd like to change DateTools.cs to remove any method TimeZone parameters, remove any DateTimeOffset adjustmentsand assume all input dates are UTC, as per the Java Lucene source.
  • I'd like to remove the tests present in Lucene.net's TestDateTools.cs that aren't present in TestDateTools.java that fail after making the above changes.
  • I'd like to add an additional test as per the original issue to ensure that DateToString does not throw AggregateException when the input DateTime is 01-01-0001, the DateTime.Kind is Local and and the local timezone is ahead of UTC.

@NightOwl888
Copy link
Contributor

The time zone parameter was recently added. This is to align with the Java DateFormat which allows for the time zone to be changed. We don't have a complete port of DateFormat (not very sensible in .NET), but the flexible QueryParser uses NumericDateFormat to convert the date to a number, which uses this format. In Lucene, this accepted a DateFormat as a parameter, but we made a hybrid implementation instead since .NET has no such animal. There is DateTimeFormatInfo, but it doesn't provide the expected functionality. I also attempted to make DateTime work, but DateTimeOffset seems to be the only thing in .NET that lets you specify a timezone other than Local or UTC and then converted into the expected string.

Refer to dotnet/runtime#62247 to understand the limitations of the .NET time zone feature.

The problem you are seeing is that in .NET the DateTime defaults to 0001-01-0001, but in Java the Date class defaults to 1970-01-01 (both midnight UTC). The default .NET date cannot be represented with a time zone because time zones hadn't been invented yet (they are political, not geographical). But in Java, the default works fine because it is within the expected range.

The answer to solving this lies in the .NET source code. I see there is an enum value that allows suppression of an invalid time zone so it doesn't throw. We need to use the same validation logic to ensure we don't call the time zone conversion functions when it is out of range. What to do instead isn't quite clear to me yet. But a guess would be to call the date.ToString() method like we did in beta 00015. However, that doesn't shift the time zone appropriately.

Perhaps we should do the time zone conversion on the current date to get a timespan (difference from UTC), then use that to create a new DateTimeOffset(date, offset).ToString(CultureInfo.InvariantCulture) on that. That is, only in the case where it is out of range to be considered a valid timezone. When it is in range, the current code works fine.

@NightOwl888
Copy link
Contributor

As for de-nesting the enum, see CA1034: Nested types should not be visible. While we aren't strictly following this rule everywhere, in this case it made sense to make the enum more discoverable.

@NightOwl888
Copy link
Contributor

The answer to solving this lies in the .NET source code. I see there is an enum value that allows suppression of an invalid time zone so it doesn't throw. We need to use the same validation logic to ensure we don't call the time zone conversion functions when it is out of range. What to do instead isn't quite clear to me yet. But a guess would be to call the date.ToString() method like we did in beta 00015. However, that doesn't shift the time zone appropriately.

Perhaps we should do the time zone conversion on the current date to get a timespan (difference from UTC), then use that to create a new DateTimeOffset(date, offset).ToString(CultureInfo.InvariantCulture) on that. That is, only in the case where it is out of range to be considered a valid timezone. When it is in range, the current code works fine.

On second thought, whatever they are doing in the .NET source code when they suppress the error is probably the right answer for what we need to do. Again, only in the case where we are out of range.

NightOwl888 added a commit to NightOwl888/lucenenet that referenced this issue May 13, 2023
… dates to UTC, otherwise they can produce ArgumentOutOfRangeException. Fixes apache#772.
NightOwl888 added a commit that referenced this issue May 13, 2023
… dates to UTC, otherwise they can produce ArgumentOutOfRangeException. Fixes #772.
@NightOwl888
Copy link
Contributor

@Dunnymeister

The fix will be in the next beta release. In the meantime, use new DateTime().ToUniversalTime() to work around.

@mpdunlop
Copy link
Author

Looks like a clean solution 🙂 Thank you very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
is:bug pri:high up-for-grabs This issue is open to be worked on by anyone
Projects
None yet
2 participants