Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement reference by class names #409

Merged
merged 7 commits into from
Nov 29, 2022
Merged

Implement reference by class names #409

merged 7 commits into from
Nov 29, 2022

Conversation

VincentLanglet
Copy link
Contributor

@VincentLanglet VincentLanglet commented Oct 12, 2022

Hi @greg0ire

Closes #408

I introduce a $class param in a BC way:

  • If you use it, phpstan/psalm understand the return type
  • If you don't use it, it still works.

In next major, the idea would be to make the class param mandatory.
This way, having two fixtures with the same name but different class would be allowed.

@VincentLanglet VincentLanglet marked this pull request as ready for review October 12, 2022 14:18
@greg0ire
Copy link
Member

You should retarget to 1.6.x

@VincentLanglet VincentLanglet changed the base branch from 1.5.x to 1.6.x October 13, 2022 08:37
@VincentLanglet VincentLanglet changed the base branch from 1.6.x to 1.5.x October 13, 2022 08:37
@VincentLanglet VincentLanglet changed the base branch from 1.5.x to 1.6.x October 19, 2022 09:49
@VincentLanglet
Copy link
Contributor Author

VincentLanglet commented Oct 19, 2022

You should retarget to 1.6.x

It should be better now @greg0ire

Deprecation::trigger(
'doctrine/data-fixtures',
'https://github.com/doctrine/data-fixtures/pull/409',
'Argument $class of %s() will be mandatory in DoctrineDataFixtures 2.0.',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This DoctrineDataFixtures case is typically used for the bundle. I'd remove that name anyway:

Suggested change
'Argument $class of %s() will be mandatory in DoctrineDataFixtures 2.0.',
'Argument $class of %s() will be mandatory in 2.0.',

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I also fixed the tests

@greg0ire
Copy link
Member

greg0ire commented Oct 22, 2022

This should help with some of the failures: #412 (and means you need to rebase)

@VincentLanglet
Copy link
Contributor Author

Rebased done

@greg0ire
Copy link
Member

vendor/bin/phpcbf should help

@VincentLanglet
Copy link
Contributor Author

vendor/bin/phpcbf

Done, also updated the baseline

@greg0ire
Copy link
Member

There is an example call of getReference in the docs, you need to update it to show the correct usage.

@VincentLanglet
Copy link
Contributor Author

There is an example call of getReference in the docs, you need to update it to show the correct usage.

Nice catch. I updated the doc.

I didn't added use App/.../Role since there was none for User.

@VincentLanglet VincentLanglet changed the title Try to implement reference by class names Implement reference by class names Nov 6, 2022
@VincentLanglet
Copy link
Contributor Author

Friendly ping @stof If you have time to take a new look :)

@VincentLanglet
Copy link
Contributor Author

VincentLanglet commented Nov 29, 2022

I get no news from stof, Do you think it can be merged @greg0ire ?

@greg0ire greg0ire dismissed stof’s stale review November 29, 2022 19:35

The requested changes have been addressed

@greg0ire greg0ire merged commit fb3fa64 into doctrine:1.6.x Nov 29, 2022
@greg0ire greg0ire added this to the 1.6.0 milestone Nov 29, 2022
@greg0ire
Copy link
Member

Sure! Thanks @VincentLanglet !

@VincentLanglet
Copy link
Contributor Author

VincentLanglet commented Dec 22, 2022

You're welcome. Do you mind releasing the 1.6.0 version, or another PR/issue is needed first ? @greg0ire

Also should I provide a PR on 2.x branch to solve the deprecation (into BC-break) I introduced ?

@VincentLanglet VincentLanglet deleted the references branch December 22, 2022 13:33
@greg0ire
Copy link
Member

Please do provide a 2.x PR. I'm AFK, but will try releasing from my phone.

@janklan
Copy link

janklan commented Feb 7, 2023

I'd like to report a shortfall in a scenario where class inheritance is at play. This is a problem when using Doctrine's inheritance mapping.

Consider the example inheritance mentioned on Doctrine's page:

class Person {}
class Employee extends Person {}
class Boss extends Person {}

When you generate a lot of Employees and Boss entities, you're later unable to fetch those entities without also knowing which exact class are you looking for. In other words, you can't do (pseudo-code!):

$this->setReference('person1', new Employee());
$this->getReference('person1', Person::class);

This is because the fixtures are stored in an array indexed by the object's direct class name in ReferenceRepository::$referencesByClass. I think this approach is wrong.

I created my own solution to achieve the intended objective some time ago. A snippet of an abstract class that other Fixture-generating classes extend is below:

<?php

use Doctrine\Bundle\FixturesBundle\Fixture;

abstract class AbstractFixture extends Fixture
{
  /**
     * @template T of object
     * @param string $handle
     * @param class-string<T> $expectedType
     * @return T
     */
    protected function getTypedReference(string $handle, string $expectedType): mixed
    {
        $object = $this->getReference($handle, $expectedType);
        if (!$object instanceof $expectedType) {
            throw new \InvalidArgumentException(sprintf('Handle %s is %s, not %s', $handle, get_debug_type($object), $expectedType));
        }

        return $object;
    }
}

Unless I'm missing something, using if (!$object instanceof $expectedType) { yields a comparable outcome with greater flexibility compared to the method proposed in this PR. It doesn't allow duplicate key usage, but I don't see that as a problem comparing to the negative outcome described above.

I would like to suggest the proposed change might need to be reconsidered.

@VincentLanglet
Copy link
Contributor Author

VincentLanglet commented Feb 7, 2023

When you generate a lot of Employees and Boss entities, you're later unable to fetch those entities without also knowing which exact class are you looking for. In other words, you can't do (pseudo-code!):

$this->setReference('person1', new Employee());
$this->getReference('person1', Person::class);

Unless I'm missing something, using if (!$object instanceof $expectedType) { yields a comparable outcome with greater flexibility compared to the method proposed in this PR. It doesn't allow duplicate key usage, but I don't see that as a problem comparing to the negative outcome described above.

Duplicate key usage is one useful feature of this change. I see no reason to lost this when you can easily do

$this->getReference('person1', Employee::class);

Which seems to be the right way to fetch the fixture since Employee and Boss won't have the same API.

But maybe some others solutions exists which allow both to cohabit, like allowing to call

$this->setReference('person1', new Employee(), Person::class);

@janklan
Copy link

janklan commented Feb 7, 2023

Duplicate key usage is one useful feature of this change. I see no reason to lost this when you can easily do

That's a bold assumption probably specific to your particular use case

I see it exactly the other way:

  1. Make your keys unique so you always know what are you getting exactly what you set, without having to create a composite primary key made from the key and its class name
  2. Don't rely on a class name to fish out the correct entity - this problem only exists because duplicate keys are allowed, and introduces an artificial constraint I'm talking about.

Which seems to be the right way to fetch the fixture since Employee and Boss won't have the same API.

Not true. While there would be differences, part of the API can overlap. That's the whole point of class inheritance. An example would be a comment system that accepts a Person class to record who authored a Comment entity. The comment can be authored by both Boss and Employee or any other child of the Person entity. The overlap in the API could be, for example, a Person::getName(). Every person has a name and the comment system wants to show it next to the comment.

Your fixtures may hold 100 references to random objects of Person child classes, indexed by person1..100. You may randomly select one of those objects to generate a ton of Comment fixtures. When doing $this->getReference('person'.random_int(1, 100)), you can't know what class name to ask for, and frankly, at that time you don't care, because the Comment entity doesn't care.

The above is a simplified illustrative scenario explaining my point, but I have a comparable real-world use case. That's how I discovered the behaviour in the first place. I didn't make it up to be difficult in a random PR.

Come to think of it, I could think of a use case where you are asking for an interface, not even a specific class. The instanceof check just works way better than just recording the ultimate class name and calling it a name.

But maybe some others solutions exist which allow both to cohabit.

Yes: I'd suggest we could walk back the decision of making the $class attribute mandatory. If not provided, no class checks take place and things go back to the way they were.

@VincentLanglet
Copy link
Contributor Author

Yes: I'd suggest we could walk back the decision of making the $class attribute mandatory. If not provided, no class checks take place and things go back to the way they were.

You're still not developing how to keep the duplicate key feature.

One way could be to change the storage from

$data[$class][$name]

to

$data[$name][$class]

If you call getReference('foo', Foo::class) it still works,
and if you call getReference('foo'), it could have the following behavior:

  • If there is only one fixture with the foo name, it returns this one
  • If there is multiple fixtures with the foo name, it throws an exception because the class is required

WDYT ?

@janklan
Copy link

janklan commented Feb 7, 2023

You're still not developing how to keep the duplicate key feature.

That's because I truly believe it's not a good idea to allow duplicate keys.

and if you call getReference('foo'), it could have the following behavior:

  • If there is only one fixture with the foo name, it returns this one
  • If there is multiple fixtures with the foo name, it throws an exception because the class is required

WDYT ?

I'd say that could work, depending on the implementation.

My thoughts are:

  1. I'd say public function setReferenceIdentity($name, $identity, ?string $class = null) should adopt the outcome of $identity::class. Failing to do so, the developer could use any arbitrary string as the $class value and the repository would accept it. Without any further type checking, the combination of $name and $class truly forms a composite primary key without adding any additional type safety checks - the real reason why we ask for a class name in the first place, isn't that right? That's why I did it anyway: to be able to fetch stuff from the repository in a way allowing PHPStorm, phpstan and others to correctly determine the returned object's type, while also actually asserting it in the getter's code.
  2. If $class was set when setting the reference, setReferenceIdentity should ensure the correctness of the user's input by testing either $identity instanceof $class or is_a($identity, $class, true) - this is to mitigate the problem described in the point above. I also think the manual $class attribute should be only used as a signal for the setter to check the type. I believe the $identity::class should be used as the key, as described above.
  3. If only one element exists under $data[$name][$class], the $class should be ignored for the purposes of looking up the element in $data[$name] and the value should be retrieved using $data[$name][array_key_first($data[$name])] or similar.
  4. If only one element exists $data[$name][$class], getReference('foo', Bar::class) should again assert the $data['foo'] instanceof Bar::class. Also, providing Bar::class allows for using the psalm templates to indicate what reference is being returned. That way the return type of the object is guaranteed and known to the IDE.
  5. If the class name is not provided and there is only one element in $data[$name][$class], getReference() wouldn't do any type checking.

This would allow one of the two scenarios:

  1. fetching the only element in $data[$name] using getReference('person1', Person::class) even if the 'person1' was an Employee
  2. allow using duplicate keys as long as the $identity::class would be different.

Please note none of the variants discussed allow using duplicate keys if the class doesn't differ:

$john = new Person();
$doe = new Person();
$registry->setReference('person', $john); // sets john
$registry->setReference('person', $doe); // overwrites john

I repeat, I believe allowing duplicate keys lacks merit.

@VincentLanglet
Copy link
Contributor Author

You're still not developing how to keep the duplicate key feature.

That's because I truly believe it's not a good idea to allow duplicate keys.

That's your point of view and I disagree with it.
I can say the same, it's not a good idea to allow fetching a reference without knowing the class you're fetching.
When you're doing a database request, I wonder if you're writing

$this->entityManager->find(42);

or

$this->entityManager->find(Foo:class, 42);
$this->entityManager->getRepository(Foo::class)->find(42);

While I proposed solutions to allow both, you're just looking for solving your debatable use-case and remove another feature. This mindset doesn't encourage me to keep the discussion.

I'd say that could work, depending on the implementation.

Feel free to provide one.

  1. I'd say public function setReferenceIdentity($name, $identity, ?string $class = null) should adopt the outcome of $identity::class. Failing to do so, the developer could use any arbitrary string as the $class value and the repository would accept it. Without any further type checking, the combination of $name and $class truly forms a composite primary key without adding any additional type safety checks - the real reason why we ask for a class name in the first place, isn't that right? That's why I did it anyway: to be able to fetch stuff from the repository in a way allowing PHPStorm, phpstan and others to correctly determine the returned object's type, while also actually asserting it in the getter's code.
  2. If $class was set when setting the reference, setReferenceIdentity should ensure the correctness of the user's input by testing either $identity instanceof $class or is_a($identity, $class, true) - this is to mitigate the problem described in the point above. I also think the manual $class attribute should be only used as a signal for the setter to check the type. I believe the $identity::class should be used as the key, as described above.

Not sure to understand.

  1. If only one element exists under $data[$name][$class], the $class should be ignored for the purposes of looking up the element in $data[$name] and the value should be retrieved using $data[$name][array_key_first($data[$name])] or similar.

Yes, that's I proposed when I talked about getReference('foo').

  1. If only one element exists $data[$name][$class], getReference('foo', Bar::class) should again assert the $data['foo'] instanceof Bar::class. Also, providing Bar::class allows for using the psalm templates to indicate what reference is being returned. That way the return type of the object is guaranteed and known to the IDE.

There is no need for assertion if you're accessing

$data['foo'][Bar::class];

with the call getReference('foo', Bar::class).

  1. If the class name is not provided and there is only one element in $data[$name][$class], getReference() wouldn't do any type checking.

This would allow one of the two scenarios:

  1. fetching the only element in $data[$name] using getReference('person1', Person::class) even if the 'person1' was an Employee

This would be the weirdest behavior to have:

$john = new Employee();
$registry->setReference('john', $john)
$registry->getReference('john') // work
$registry->getReference('john', Person::class) // work
$registry->getReference('john', Employee::class) // work
$registry->getReference('john', Dog::class) // work since (5) you don't do type checking, this seems wrong
$john = new Dog();
$registry->setReference('john', $john)
$registry->getReference('john') // doesn't work
$registry->getReference('john', Person::class) // doesn't work either, this one doesn't seems natural since it worked earlier
$registry->getReference('john', Employee::class) // work
$registry->getReference('john', Dog::class) // work

Please note none of the variants discussed allow using duplicate keys if the class doesn't differ:

$john = new Person();
$doe = new Person();
$registry->setReference('person', $john); // sets john
$registry->setReference('person', $doe); // overwrites john

I repeat, I believe allowing duplicate keys lacks merit.

This behavior is already the existing one if you look at the method

* Set the reference entry identified by $name
* and referenced to $reference. If $name
* already is set, it overrides it

And if you're using the addReference method

public function addReference($name, $object)

You'll get

$john = new Person();
$doe = new Person();
$dog = new Dog();
$registry->addReference('foo', $john); // works
$registry->addReference('foo', $doe); // fail in 1.x and will fail in 2.x
$registry->addReference('foo', $dog); // fail in 1.x and won't fail in 2.x

@janklan
Copy link

janklan commented Feb 7, 2023

That's your point of view and I disagree with it. I can say the same, it's not a good idea to allow fetching a reference without knowing the class you're fetching.

Scalar primary key made of unique string value seems a simpler identifier than a composite key made of a non-unique string value and a non-unique class name, but hey, we both probably have good reasons for our views. Let's agree to disagree, not a big deal!

When you're doing a database request, I wonder if you're writing

$this->entityManager->find(42);

or

$this->entityManager->find(Foo:class, 42);
$this->entityManager->getRepository(Foo::class)->find(42);

While the first option clearly isn't really an option, the other depends on the use case. Sometimes I know what instance I'm trying to get, sometimes I don't. I regularly let Symfony & Doctrine work out what object should be constructed when I use Doctrine's entity inheritance. $entityManagerInterface->getRepository(Person::class)->find(1) will return whichever child of Person matches ID 1, be it Employee, Boss your Dog, or anything else.

The more frequent use case is when converting route parameters:

#[Route(path: '/{person}')]
public function read(Person $person): Response
{
  // $person can again be Employee, Boss or whatever else, I honestly don't care in some cases.
}

While I proposed solutions to allow both, you're just looking for solving your debatable use-case and remove another feature.

My use case is a documented fairly standard use of Doctrine's entity inheritance.

I'd say that could work, depending on the implementation.

Feel free to provide one.

  1. I'd say public function setReferenceIdentity($name, $identity, ?string $class = null) should adopt the outcome of $identity::class ... [quote shortened]

Not sure to understand.

If I'm reading the code correctly, setReference('foo', $person, 'randomstring') will store $person under $data['person']['randomstring'], where the randomstring has got nothing to do with the actual class name of $person?

  1. If only one element exists under $data[$name][$class], the $class should be ignored for the purposes of looking up the element in $data[$name] and the value should be retrieved using $data[$name][array_key_first($data[$name])] or similar.

Yes, that's I proposed when I talked about getReference('foo').

Great. From your wording was not clear whether the $class value mattered or not. My point is to make sure that getReference('foo', Person::class) would yield the only object even if it was stored under $data['foo'][Employee::class]

  1. If only one element exists $data[$name][$class], getReference('foo', Bar::class) should again assert the $data['foo'] instanceof Bar::class. Also, providing Bar::class allows for using the psalm templates to indicate what reference is being returned. That way the return type of the object is guaranteed and known to the IDE.

There is no need for assertion if you're accessing $data['foo'][Bar::class]; with the call getReference('foo', Bar::class).

See my point about setReference('foo', $person, 'randomstring'). Unless you assert the type somewhere, it can't be guaranteed. Asserting it in setReference makes more sense, but the code isn't currently doing it.

This behavior is already the existing one if you look at the method

I missed that, apologies.

@VincentLanglet
Copy link
Contributor Author

Not sure to understand.

If I'm reading the code correctly, setReference('foo', $person, 'randomstring') will store $person under $data['person']['randomstring'], where the randomstring has got nothing to do with the actual class name of $person?

Allowing to pass a third parameter to the setReference was one of my first solution, but then I proposed another one which doesn't need this change. The fact we don't need a third parameter for setReference would solve all the issues related to the fact we could pass anything.

  1. If only one element exists under $data[$name][$class], the $class should be ignored for the purposes of looking up the element in $data[$name] and the value should be retrieved using $data[$name][array_key_first($data[$name])] or similar.

Yes, that's I proposed when I talked about getReference('foo').

Great. From your wording was not clear whether the $class value mattered or not. My point is to make sure that getReference('foo', Person::class) would yield the only object even if it was stored under $data['foo'][Employee::class]

That's a debatable point.

The first thing I have in mind was to allow call like getReference('foo') but getReference('foo', Person::class) is more tricky. See the following example:

setReference('foo', new Employe()); // Works
getReference('foo'); // Works
getReference('foo', Employe::class); // Works
getReference('foo', Person::class); // Doesn't work with my implementation but you want this to work

setReference('foo', new Dog()); // Works
getReference('foo'); // Wont work
getReference('foo', Employe::class); // Works
getReference('foo', Person::class); // Is it supposed to work ?

setReference('foo', new Boss()); // Works
getReference('foo'); // Wont work
getReference('foo', Employe::class); // Works
getReference('foo', Person::class); // Won't work

setReference('foo', new Person()); // Works
getReference('foo'); // Wont work
getReference('foo', Employe::class); // Works
getReference('foo', Person::class); // Works with my implementation but is it supposed to works with yours ?

Maybe the solution would be to play with is_a method:

  • First look at $data[$name][$class]
  • Then, if nothing is found,
$values = [];
foreach ($data[$name] as $value) {
     if (is_a($value, $class)) {
          $values[] = $value;
     }
}
if (count($values) > 1) {
     throw NotUniqueResultException();
}
if (count($values) === 0) {
     throw NoResultException();
}

return $values[0];

@janklan
Copy link

janklan commented Feb 7, 2023

$values = [];
foreach ($data[$name] as $value) {
     if (is_a($value, $class)) {
          $values[] = $value;
     }
}

I think iterating the whole array all the time could be slow with a large number of fixtures.

Overnight I realised another downside: if you rely on a class name, you make the implicit assumption a fixture is an object. I can't say I do it myself, but the current system lets you store anything, including scalars. I think we should not throw that away.

I came up with a slightly different approach: storing fixtures in a single array twice: under $key to maintain the current behaviour where things can get overwritten unless using unique keys, and then also under a "namespaced" $key.get_debug_tyope($value), where you can use duplicate $key value as long as the type of $value differs between identical keys.

Have a look at a working demo, https://3v4l.org/HuZD6#v8.2.2 - I added inline comments to explain what I thought needs it. The end of the code, labeled as Part 3, shows some interesting improvements that I don't think would work with what this PR proposes.

In the example, if you use duplicate $key on multiple $value objects of the same type, the key is overwritten. We could easily add more checks to prevent it. I just wanted to keep the PoC simple.

What do you think?

We could also take this chat to a more real-time medium. We're clearly both interested in finding a good solution.

@VincentLanglet
Copy link
Contributor Author

$values = [];
foreach ($data[$name] as $value) {
     if (is_a($value, $class)) {
          $values[] = $value;
     }
}

I think iterating the whole array all the time could be slow with a large number of fixtures.

I was worried about this too, but since we're only iterating on the fixture with the name $name, I thought this could be ok.

Overnight I realised another downside: if you rely on a class name, you make the implicit assumption a fixture is an object. I can't say I do it myself, but the current system lets you store anything, including scalars. I think we should not throw that away.

The fixture is always an object according to the phpdoc
https://github.com/doctrine/data-fixtures/blob/1.6.x/src/ReferenceRepository.php#L109 no ?

@janklan
Copy link

janklan commented Feb 7, 2023

The fixture is always an object according to the phpdoc
https://github.com/doctrine/data-fixtures/blob/1.6.x/src/ReferenceRepository.php#L109 no ?

Hmm, OK, it looks like that's the case. So scrap that "benefit" of being able to store scalars, it's probably a use case nobody needs. It just crossed my mind before I fell asleep.

The rest of the PoC remains valid, though.

@Pixelshaped
Copy link

Pixelshaped commented Feb 28, 2023

We've got another issue with this PR.

When we're calling $this->addReference($referenceName, $entity); it now calls $class = $this->getRealClass(get_class($object)); which is problematic for us.

Indeed, we have two EntityManagers configured. And it seems there is no way to indicate which one to create the reference on.

Beforehand, it did not refer to the ObjectManager hence we could add references from different EntityManagers with no issue (maybe that was undesired but now we're missing a method to add a reference to a specific EntityManager).

Our load method:

public function load(ObjectManager $manager): void
{
        $this->entityManager = $this->managerRegistry->getManager($this->getTargetEntityManagerName());

        $this->loadData(); // Creates entities and persists them and calls addReference() 

        $this->entityManager->flush();
}

getTargetEntityManagerName() is a custom method that our fixtures can override and it returns default by default and another EntityManager name when needed.

@VincentLanglet
Copy link
Contributor Author

I don't fully understand the issue. This PR is 5 months old, that's not the right place to discuss about issues.
Please open an issue instead to discuss about it (and ping me @Pixelshaped)

@derrabus
Copy link
Member

@VincentLanglet The 2.0.x branch is now open. Do you want to continue your work there?

@VincentLanglet
Copy link
Contributor Author

Sure Ill take a look

@VincentLanglet
Copy link
Contributor Author

@VincentLanglet The 2.0.x branch is now open. Do you want to continue your work there?

I made #464

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow to pass the class to getReference.
6 participants