Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing field value types using this method is unpredictable (and not technically supported?) #3115

Closed
JasonElkin opened this issue Apr 13, 2021 · 9 comments

Comments

@JasonElkin
Copy link
Contributor

JasonElkin commented Apr 13, 2021

This is the page with issues: https://github.com/umbraco/UmbracoDocs/edit/main/Reference/Searching/Examine/Indexing/index.md

I recently ran into an issue on a site that's using this method to define a custom field:

provider.FieldDefinitionCollection.AddOrUpdate(new FieldDefinition("eventDate_Long", FieldDefinitionTypes.Long));

After starting up I was getting the following error when trying to run a search.

image

I could fix this by changing my IUserComposer like so:

// before - throws error
composition.Components().Append<ExamineEventsComponent>();

// after - works OK
composition.Components().InsertAfter<ExamineComponent, ExamineEventsComponent>();

This didn't make much sense to me until I looked under the hood at how this works in examine. Examine uses a FieldValueTypeCollection for searching. This collection is immutable and lazily created from the FieldDefinitionCollection.

I.e. this only works if we happen to modify the FieldDefinitionCollection before the FieldValueTypeCollection is initialised and in this particular site it turns out that the FieldValueTypeCollection is being created before we've modified the FieldDefinitionCollection.

I think in this instance (I'm still diagnosing) this is because another package is accessing the index in such a way that's causing the FieldValueTypeCollection to be created before my component fires. However, to my mind a package should be allowed/able to do that. The fact it's creating a side-effect that's breaking this methodology implies that it's this process that's wrong.

Simply updating the docs to InsertAfter<ExamineComponent,... is likely to stop this being an issue but it strikes me that this is still not the proper way to do it and that there should be a way to modify the FieldDefinitionCollection that the ExternalIndex uses before it is created (I will raise that as a separate issue and link to it when I have).

@JasonElkin JasonElkin changed the title Changing field value types in this way is unpredictable (and not technically supported?) Changing field value types using this method is unpredictable (and not technically supported?) Apr 13, 2021
@JasonElkin JasonElkin reopened this Apr 13, 2021
@marcemarc
Copy link
Contributor

Hi Jason

Thanks for raising this issue!

What's the other package you have installed that's possibly also accessing the index?

Agree if it's as simple as warning people to compose after the initial ExamineComponent, to avoid the issue, then that's easy to add to the page as a warning, but what you're saying hints at possibly something that should be addressed in the core architecture, so we can see if HQ have any thoughts on this.

But knowing the additional package will give people the chance to try to replicate the problem.

regards

Marc

@JasonElkin
Copy link
Contributor Author

Thanks @marcemarc I still haven't got to the bottom of exactly which package yet.

There are a number of public methods/APIs that could cause the FieldValueTypeCollection to resolve though - e.g. https://github.com/umbraco/Umbraco-CMS/blob/d428a4543f33bb7094cf7db5f6b6fdc2d1de3063/src/Umbraco.Examine/ExamineExtensions.cs#L179

I'm working on an MVP repo to replicate.

@marcemarc
Copy link
Contributor

Thanks @JasonElkin let us know what you discover! and then we can decide if it can be solved by documentation alone!

@JasonElkin
Copy link
Contributor Author

The package in question looks to be Our.Umbraco.Nexu...
Installed I get the error, uninstalled I don't - but I don't see anything in the package that does anything with the ExternalIndex

@dawoe am I missing something? Is there anything in the package that touches the ExternalIndex that you're aware of? I guess there could be a method provided by Umbraco that's using the External Index somehow behind the scenes.

@Shazwazza
Copy link
Contributor

Within Examine, there's the FieldValueTypeCollection which is resolved lazily, however this isn't the lazy resolution that matters. When this is resolved, the instance of FieldValueTypeCollection is created with the mutable FieldDefinitionCollection. This FieldDefinitionCollection can be changed at any point up until the FieldValueTypeCollection._resolvedValueTypes is resolved. This must be locked and immutable once indexing starts because you cannot change an indexes underlying data type once they are there without rebuilding the index. The FieldValueTypeCollection._resolvedValueTypes is only resolved at the last possible time: When an indexing operation is executed (i.e.. when AddDocument is called) or when a Search is actually executed.

I understand there are public APIs that can cause the lazy collection to be resolved but it was never intended for any indexing or search operations to ever be performed during bootup and I would still question why that is occurring since during bootup typically things shouldn't execute (i realize there are edge cases). Else possibly there's an underlying issue in Umbraco that in some circumstances could cause this to be resolved before user components.

It is possible to inject explicit field definitions for indexes via their ctor. This was the original intended way to work but the dictionary modification approach was added later so that it could be a little friendlier to adjust without taking control over the construction of the indexes. In Umbraco that would be possible by replacing the IUmbracoIndexesCreator which is a heavier operation than most would want to do.

Re: your comment here https://twitter.com/jasonelkin86/status/1381973612656128001

Could we not pop a custom collection of Field Definitions in DI and inject them here alongside the UmbracoFieldDefinitionCollection?

That is an approach that can be taken but in order to determine which field sets get injected into specific indexes I would imagine that would need to be part of another service. I suppose ideally, the field definitions per index could be returned as a method as part of IUmbracoIndexConfig which is already injected into the UmbracoIndexCreator. But due to breaking changes, we'd need to create a IUmbracoIndexConfig2 and then type cast 😭 but these are the problems we constantly face with breaking API changes. Doing something like this is certainly the safest way to achieve the desired outcome and would still be a pretty friendly approach (I'll ensure something like this is the case for netcore for sure).

As for the Examine API, I don't see how the lazy resolution of the value types can be any lazier. They are only ever resolved when absolutely necessary so if something is forcing them to resolve to early - like executing an index or search operation during the boot phase, then I think we need to find out why that code is doing such a thing. Perhaps the Examine APIs can be adjust to throw an exception if this collection is attempted to be modified after the inner _resolvedValueTypes is resolved to make it clear that it's too late to modify. At least that would help mitigate hidden issues.

This test shows what is occurring:

[Test]
public void Cannot_Customize_Index_Field_Type_After_Index_Initialized()
{
    using (var luceneDir = new RandomIdRAMDirectory())
    using (var indexer = new TestIndex(
        new FieldDefinitionCollection(),
        luceneDir,
        new StandardAnalyzer(Version.LUCENE_30)))
    {
        // Acquiring a searcher resolves the FieldValueTypeCollection
        using (var s = (LuceneSearcher)indexer.GetSearcher())
        {
            // AddDocument resolves the FieldValueTypeCollection._resolvedValueTypes
            indexer.IndexItem(new ValueSet(1.ToString(), "content",
                new Dictionary<string, IEnumerable<object>>
                {
                    {"counter", new List<object>(new object[] {123456})}
                }));

            // Modifying the index value types now is too late
            // But this does not throw
            // TODO: It probably should
            indexer.FieldDefinitionCollection.AddOrUpdate(new FieldDefinition("counter", FieldDefinitionTypes.Integer));

            var rangeQuery = s.CreateQuery().RangeQuery<int>(new[] { "counter" }, 123455, 123457);                    
            var queryResult = rangeQuery.Execute();
            // Throws!
            // System.InvalidOperationException :
            // Could not perform a range query on the field counter, it's value type is Examine.LuceneEngine.Indexing.FullTextType
            Assert.Throws<InvalidOperationException>(() =>
            {
                // resolving the TotalItemCount or iterating the results is what fundamentally executes the query
                var itemCount = queryResult.TotalItemCount;
            });
        }
    }
}

Let me know your thoughts.

Side note 1 - if you want to index and use Dates, don't use long, just use DateTime, see https://shazwazza.github.io/Examine/configuration.html#default-value-types

Side note 2 - an easy way to make your composer and therefore your component execute after another one is to use ComposeAfter https://our.umbraco.com/documentation/implementation/composing/#composebefore-and-composeafter

@dawoe
Copy link
Contributor

dawoe commented Apr 14, 2021

Hi @JasonElkin @Shazwazza

My nexu package does not touch any examine index or related code for that matter. It only hooks into the content saved event.

I also just checked a site where I have nexu installed and where I add custom fields to a index. I am not experiencing the issue you are facing.

Dave

@JasonElkin
Copy link
Contributor Author

Thanks @dawoe for taking a look. I can't replicate it with the package elsewhere either. So in this case I suspect it's a side effect of component order or startup timings when the package happens to be present in this particular site.

Unfortunately that doesn't get me any closer to working out what is causing the index to init.

@JasonElkin
Copy link
Contributor Author

JasonElkin commented Apr 14, 2021

Thanks @Shazwazza that all makes a lot of sense.

I think defining any extra fields at the point the FieldDefinitionCollection is provided to the ctor is the simplest way to guarantee that this issue won't be encountered. I appreciate that such a change might not be worth it for v8.

Using ComposeAfter combined with composition.Components().InsertAfter<ExamineComponent,... should minimise the chances of this happening in the wild enough (and I will make a PR to the docs as such).

We can't guarantee that another composer/component won't get executed first (i.e. the scenario below, which will force the error) but the chances should be pretty slim 🙏

    [ComposeAfter(typeof(ExamineComposer))]
    public class MyExamineComposer : IUserComposer
    {
        public void Compose(Composition composition)
        {
            composition.Components().InsertAfter<ExamineComponent, MyComponentThatAddsFields>();
        }
    }

    [ComposeAfter(typeof(ExamineComposer))]
    public class EvilComposer : IUserComposer
    {
        public void Compose(Composition composition)
        {
            composition.Components().InsertAfter<ExamineComponent, EvilComponentThatBreaksThings>();
        }
    }

Re side note 1, yes, that's what I'd normally do (this is inherited) though there are other custom fields in this solution too this is just the first one that causes an error.

@umbrabot
Copy link

umbrabot commented Mar 7, 2022

Hiya @JasonElkin,

Just wanted to let you know that we noticed that this issue got a bit stale and might not be relevant any more.

We will close this issue for now but we're happy to open it up again if you think it's still relevant (for example: it's a feature request that's not yet implemented, or it's a bug that's not yet been fixed).

To open it this issue up again, you can write @umbrabot still relevant in a new comment as the first line. It would be super helpful for us if on the next line you could let us know why you think it's still relevant.

For example:

@umbrabot still relevant
This bug can still be reproduced in version x.y.z

This will reopen the issue in the next few hours.

Thanks, from your friendly Umbraco GitHub bot 🤖 🙂

@umbrabot umbrabot closed this as completed Mar 7, 2022
@umbrabot umbrabot removed the state/needs-investigation This requires input from HQ or community to proceed label Mar 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants