Unnecessary string distance calculation when using @Field [DATAMONGO-1991] #2862

spring-projects-issues · 2018-06-01T14:15:13Z

Ludek Novotny opened DATAMONGO-1991 and commented

When the @Field annotation is used to give a Mongo document field a different name than the bean field, the string distance (org.springframework.beans.PropertyMatches.calculateStringDistance) is calculated for all combinations of fields. If the distance is too big, the name from the annotation is used as a fallback.
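
To give a rough idea of the work involved, here is a minimal sketch of a Levenshtein edit distance and a nearest-name search over a list of candidates. It assumes the string distance is Levenshtein-style; the names StringDistanceSketch, distance and closest are hypothetical and this is not the PropertyMatches source.

```java
import java.util.List;

/** Hypothetical illustration only; not taken from spring-beans. */
final class StringDistanceSketch {

    /** Levenshtein edit distance; cost grows with a.length() * b.length(). */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix of a into b[0..j) takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into an empty string takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the candidate closest to name, or null if there are no candidates. */
    static String closest(String name, List<String> candidates) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : candidates) {
            int d = distance(name, candidate); // one full distance computation per candidate
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }
}
```

With an entity of around 70 properties, every unresolved name means roughly 70 such distance computations, which adds up quickly when it is repeated for each loaded document.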

This causes a big performance hit in our application. Our solution was to implement a cache in PropertyMatches, but a more permanent solution would be appreciated, as we don't really want to maintain our own version of spring-beans. We also believe the cache isn't the best solution. Is there a reason why the field name from @Field isn't used with the highest priority, with string distance as the fallback?
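
The general idea of such a cache can be sketched as follows. This is only an illustration of memoizing lookup results per type and path, assuming a resolver function that is expensive when the lookup fails; PathResolutionCacheSketch and its members are hypothetical names, not the actual patch and not a Spring API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical memoizing wrapper around an expensive path lookup. */
final class PathResolutionCacheSketch {

    // Optional.empty() marks a path that is known not to resolve,
    // so failed lookups are cached as well as successful ones.
    private final Map<String, Optional<String>> cache = new ConcurrentHashMap<>();
    private final Function<String, Optional<String>> resolver;

    PathResolutionCacheSketch(Function<String, Optional<String>> resolver) {
        this.resolver = resolver;
    }

    /** Runs the resolver at most once per (type, path) combination. */
    Optional<String> resolve(Class<?> type, String path) {
        String key = type.getName() + "#" + path;
        return cache.computeIfAbsent(key, k -> resolver.apply(path));
    }
}
```

The cache is unbounded in this sketch; a real implementation would need to consider how many distinct paths can show up.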

This issue is somewhat related to BATCH-1876, but our use case is with Mongo. Our application is running on Spring Boot 2.0.0-RELEASE.

Reference URL: https://jira.spring.io/browse/BATCH-1876

spring-projects-issues · 2018-06-01T15:08:12Z

Oliver Drotbohm commented

Can you please clarify how this affects Spring Data MongoDB? Given the information provided so far, I don't see any connection here.

I briefly checked and we only use PropertyMatches in case we cannot resolve a PropertyPath and have to prepare a helpful exception message. The use of @Field alone doesn't actually trigger that calculation.

spring-projects-issues · 2018-06-05T14:38:06Z

Ludek Novotny commented

Yes, it's not caused by @Field alone. The string distance is calculated when a PropertyReferenceException is being created. But the exception is caught in QueryMapper.getPath and null is returned. Yet the bean field is still mapped correctly, probably using the name provided in @Field. So the distance in this case was calculated unnecessarily.
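
The pattern described here, an exception whose message is expensive to build but which is then swallowed, can be illustrated with a small sketch. Everything in it is hypothetical (LazyHintSketch, UnresolvedPathException, resolveOrNull); it does not reproduce QueryMapper or PropertyReferenceException. It only shows that deferring the hint to getMessage() avoids the cost when the exception is caught and ignored.

```java
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical illustration of deferring an expensive exception message. */
final class LazyHintSketch {

    /** Exception that builds its message only when it is actually requested. */
    static final class UnresolvedPathException extends RuntimeException {
        private final transient Supplier<String> hint;

        UnresolvedPathException(Supplier<String> hint) {
            this.hint = hint; // nothing expensive happens at construction time
        }

        @Override
        public String getMessage() {
            return hint.get(); // the expensive part runs here, on demand
        }
    }

    /** Returns the path if it is a known property name, otherwise null. */
    static String resolveOrNull(Set<String> knownProperties, String path, Supplier<String> hint) {
        try {
            if (!knownProperties.contains(path)) {
                throw new UnresolvedPathException(hint);
            }
            return path;
        } catch (UnresolvedPathException e) {
            return null; // exception is swallowed, so the hint is never computed
        }
    }
}
```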

spring-projects-issues · 2018-06-05T14:59:17Z

Oliver Drotbohm commented

Can you please provide more information on what code you're executing? It feels unusual that you have code that's triggering that exception repeatedly.

spring-projects-issues · 2018-06-06T15:40:27Z

Ludek Novotny commented

This is a simplified example. We use an account entity which has around 70 fields, most of them annotated with @Field. Some fields are separate entities which can have 5-10 other annotated fields. We use @Field quite a lot.

import lombok.Data;
import org.springframework.data.mongodb.core.mapping.Document;
import org.springframework.data.mongodb.core.mapping.Field;

@Document(collection = "accounts")
@Data
public class AccountTest {    
   @Field("field1")
   private String fieldOne;    
}

Each entity is loaded from the db and processed. We have to process millions of unique entities several times, and in the process we generate new ones. Let's say that in total we have to load 100.000.000 entities from the db. This is the test we used to debug it. The first query doesn't trigger the exception because it uses the Java field name. The second query triggers the string distance calculation.

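import java.util.List;

import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;
import org.springframework.test.context.junit4.SpringRunner;

@RunWith(SpringRunner.class)
@SpringBootTest
public class FieldTest {

   @Autowired
   private MongoTemplate template;

   @Before
   public void setup() {
       template.dropCollection("accounts");
   }

   @Test
   public void test() {
       AccountTest account = new AccountTest();
       account.setFieldOne("123");
       template.save(account);

       // Query by the Java property name: no exception, no string distance calculation.
       List<AccountTest> list = template.find(Query.query(Criteria.where("fieldOne").is("123")), AccountTest.class);
       // Query by the Mongo field name from @Field: triggers the string distance calculation.
       List<AccountTest> list2 = template.find(Query.query(Criteria.where("field1").is("123")), AccountTest.class);
   }
}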

spring-projects-issues · 2018-06-06T17:17:01Z

Oliver Drotbohm commented

Thanks for the detailed writeup, Ludek. I have a couple of follow-up questions:

1. Why is anyone actually issuing the second query in the first place? Nobody should. If you use the Criteria API, always refer to property names.
2. Are you triggering said query 100.000.000 times? If so, why?
3. If you have a use case that's reading that amount of data, have you considered skipping the object-to-document layer completely and using CollectionCallback etc. instead?

spring-projects-issues · 2018-06-07T09:04:07Z

Ludek Novotny commented

Oh, this test doesn't represent our backend; it's just something we put together to debug and identify the sequence of calls which leads to the string distance calculation. Maybe I should have posted the actual backend here in the first place. Sorry about that. So this is how it actually works:

We get a Stream<Document> of all documents to be processed. A single matching criterion, batchId, is used to identify a set of documents.

MongoTemplate template;
.......
MongoDatabase db = template.getDb();
MongoCollection<Document> collection = db.getCollection("account");
FindIterable<Document> cursor = collection.find(new BasicDBObject("batchId", batchId));
return StreamSupport.stream(cursor.spliterator(), false);

Each document from the stream is then converted to an Account. We don't have a custom Bson2Account converter; the conversion relies only on Spring and the driver.

MongoTemplate template;
.....
public Account ConvertBsonDocument2Account(Document accountObj) {
    return template.getConverter().read(Account.class, accountObj);
}

When the conversion happens, the distance is calculated.

spring-projects-issues · 2021-01-06T15:50:22Z

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

mp911de · 2023-07-12T13:05:10Z

Superseded by spring-projects/spring-data-commons#2837

spring-projects-issues added the status: waiting-for-feedback and type: enhancement labels Dec 30, 2020

spring-projects-issues assigned odrotbohm Dec 30, 2020

spring-projects-issues added the status: feedback-reminder label Jan 6, 2021

mp911de mentioned this issue Jul 16, 2021

MetadataBackedField#getPath seems to be very inefficient for field names that are not "simple" spring-projects/spring-data-r2dbc#619

Closed

christophstrobl mentioned this issue Jun 1, 2023

Introduce lightweight invalid property path resolution cache spring-projects/spring-data-commons#2837

Closed

mp911de added the status: superseded label and removed the status: waiting-for-triage and type: enhancement labels Jul 12, 2023

mp911de closed this as not planned Jul 12, 2023
