Skip to content

Conversation

rthideaway
Copy link
Contributor

Splitting up new data producers as agreed in #859
This PR introduces Entity query data producer

@joaogarin joaogarin requested a review from fubhy October 10, 2019 14:41
public function resolve(string $type, int $limit = 10, ?int $offset = NULL, ?array $conditions = NULL, ?string $language = NULL, ?array $bundles = NULL, RefinableCacheableDependencyInterface $metadata): array {

// Make sure that max limit is not crossed.
if ($limit > static::SIZE_MAX) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove that limit. It's up to the implementation to make sure it's not above a certain threshold.

}

// Make sure offset is zero or positive.
if (!isset($offset) || $offset < 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use max() for this: https://www.php.net/manual/de/function.max.php

$offset = max($offset ?: 0, 0);

* id = "entity_query",
* name = @Translation("Load entities"),
* description = @Translation("Loads entities."),
* produces = @ContextDefinition("entities",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"entities" is not a typed data type. It actuall produces a list of string. So the context definition here should be a "multiple" (multiple => true) of "string".

$query->condition('langcode', $language);
}

foreach ($conditions as $condition) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using assoc arrays for the conditions is not ideal but I don't have a better solution atm.

@rthideaway
Copy link
Contributor Author

Addressed, please re-check

@rthideaway rthideaway requested a review from fubhy October 13, 2019 20:20
@codecov
Copy link

codecov bot commented Oct 14, 2019

Codecov Report

❗ No coverage uploaded for pull request base (8.x-4.x@794ce3c). Click here to learn what that means.
The diff coverage is 42.77%.

❗ Current head 73d02ee differs from pull request most recent head 37237ac. Consider uploading reports for the commit 37237ac to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             8.x-4.x     #911   +/-   ##
==========================================
  Coverage           ?   52.79%           
  Complexity         ?      648           
==========================================
  Files              ?      114           
  Lines              ?     1627           
  Branches           ?        0           
==========================================
  Hits               ?      859           
  Misses             ?      768           
  Partials           ?        0           
Impacted Files Coverage Δ Complexity Δ
src/Access/ExplorerAccessCheck.php 0.00% <0.00%> (ø) 2.00 <2.00> (?)
src/Access/VoyagerAccessCheck.php 0.00% <0.00%> (ø) 2.00 <2.00> (?)
src/Cache/Context/StaticCacheContext.php 0.00% <0.00%> (ø) 4.00 <4.00> (?)
src/Controller/SubrequestExtractionController.php 0.00% <0.00%> (ø) 1.00 <0.00> (?)
src/GraphQL/Buffers/EntityBuffer.php 80.76% <0.00%> (ø) 7.00 <0.00> (?)
src/GraphQL/Buffers/EntityRevisionBuffer.php 0.00% <0.00%> (ø) 7.00 <7.00> (?)
src/GraphQL/Resolver/Condition.php 0.00% <0.00%> (ø) 8.00 <8.00> (?)
src/GraphQL/Resolver/Context.php 0.00% <0.00%> (ø) 5.00 <5.00> (?)
src/GraphQL/Resolver/Map.php 0.00% <0.00%> (ø) 4.00 <4.00> (?)
src/GraphQL/Resolver/Path.php 0.00% <0.00%> (ø) 4.00 <4.00> (?)
... and 75 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 794ce3c...37237ac. Read the comment docs.

@joaogarin joaogarin added the 4.x label Oct 15, 2019
Copy link
Contributor

@fubhy fubhy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this data producer is just too generic. If you need the results on an entity query, it sounds like you want to write a custom data producer that does exactly what you need. Modeling that with a data producer and abstract inputs just feels like the wrong approach and might lead to unreadable code and bad DX. Do you have a different opinion?

@fubhy
Copy link
Contributor

fubhy commented Oct 15, 2019

Instead, we could maybe come up with an abstract base class that can be used to create entity query data producers with ease. Also with functionality for pagination, etc. through helper methods on the base class. Thoughts?

@rthideaway
Copy link
Contributor Author

rthideaway commented Oct 15, 2019

The DX is actually not that bad, it's still quite simple and easy to write. Look what we use now:

// Job entity query.
$registry->addFieldResolver('Query', 'jobQuery',
  $builder->compose(
    $builder->produce('entity_query', [
      'type' => $builder->fromValue('node'),
      'limit' => $builder->fromArgument('limit'),
      'offset' => $builder->fromArgument('offset'),
      'language' => $builder->fromArgument('language'),
      'conditions' => $builder->fromArgument('filters'),
      'bundles' => $builder->fromValue(Job::JOB_TYPES),
    ]),
    $builder->produce('entity_load_multiple', [
      'type' => $builder->fromValue('node'),
      'ids' => $builder->fromParent(),
      'language' => $builder->fromArgument('language'),
    ])
  )
);

It's pretty clean code and also pretty clear what it does. The biggest downside here is the compose and duplicating arguments in the follow up data producer (type, language).

So what would be the benefit of having abstract class then? What's your vision here? Because I see it actually more complicated, but it's also possible I missed some point :) So I'd create my specific entity query data producer by extending the abstract one. I call the resolve method of abstract one to execute entity query and get ids. But as I want to have an entities I need to instantiate entity_load_multiple resolver and resolve ids to entities. This seems to be more complicated that what we have now. Have I missed something?

@fubhy
Copy link
Contributor

fubhy commented Oct 15, 2019

That validates my point actually. The code you just linked is not secure. You are opening of anyone to provide custom filters while also letting them choose the operator. We had a security issue for just that a while ago.

By the way, it would be advisable to use ENUMs for your filters all the way. Not sure if you do that atm.?

With this PR, you are basically just shadowing the normal Entity Query API by abstracting the nice method calls into harder to debug array structures. I'd really prefer an abstract class that easily allows people to roll their own custom entity query producers with custom input & output.

Such a custom producer could then also decide to directly load the entities instead of just returning the ids.

@klausi
Copy link
Contributor

klausi commented Oct 15, 2019

Agreed that this dataproducer is not secure by default, which is not good. My idea would be to add a context allowed_filters like this:

$registry->addFieldResolver('Query', 'jobQuery',
      $builder->compose(
        $builder->produce('entity_query', [
          'type' => $builder->fromValue('node'),
          'limit' => $builder->fromArgument('limit'),
          'offset' => $builder->fromArgument('offset'),
          'language' => $builder->fromArgument('language'),
          'conditions' => $builder->fromArgument('filters'),
          'bundles' => $builder->fromValue(Job::JOB_TYPES),
          'allowed_filters' => $builder->fromValue('title', 'field_job_region'),
        ]),
        $builder->produce('entity_load_multiple', [
          'type' => $builder->fromValue('node'),
          'ids' => $builder->fromParent(),
          'language' => $builder->fromArgument('language'),
        ])
      )
    );

That way an attacker can only filter by title and field_job_region on our jobs. Otherwise we can throw an exception. I don't think we need to restrict the operators. The operators are only a security issue if they are used on access restricted fields. But here we have an allow list of fields that can be filtered, so any operator is fine.

@fubhy
Copy link
Contributor

fubhy commented Oct 15, 2019

Agreed that this dataproducer is not secure by default, which is not good. My idea would be to add a context allowed_filters like this:

$registry->addFieldResolver('Query', 'jobQuery',
      $builder->compose(
        $builder->produce('entity_query', [
          'type' => $builder->fromValue('node'),
          'limit' => $builder->fromArgument('limit'),
          'offset' => $builder->fromArgument('offset'),
          'language' => $builder->fromArgument('language'),
          'conditions' => $builder->fromArgument('filters'),
          'bundles' => $builder->fromValue(Job::JOB_TYPES),
          'allowed_filters' => $builder->fromValue('title', 'field_job_region'),
        ]),
        $builder->produce('entity_load_multiple', [
          'type' => $builder->fromValue('node'),
          'ids' => $builder->fromParent(),
          'language' => $builder->fromArgument('language'),
        ])
      )
    );

That way an attacker can only filter by title and field_job_region on our jobs. Otherwise we can throw an exception. I don't think we need to restrict the operators. The operators are only a security issue if they are used on access restricted fields. But here we have an allow list of fields that can be filtered, so any operator is fine.

My point is that there's no good argument to shadow-wrap the entity query API using an abstract assoc-array configuration. Instead, let's just provide a convenient base class or trait that can be used to easily roll custom data producers with use-case-specific inputs.

@klausi
Copy link
Contributor

klausi commented Oct 15, 2019

The problem with an abstract base class is that you always need to write code when you want to use entity_query. IMO that is not necessary - if we can restrict the allowed filters then that seems like a good enough generic solution.

It is not perfect because it does not map out the condition filters as you say. Currently we are using our entity_query only in 2 places, so that might also be an indicator that we don't actually need it that often.

So I'm a bit divided here. I think we can do the 'allowed_filters' addition in this pull request, even if we don't merge it we can keep it as patch for jobiqo. Then leave the abstract base class for another PR if somebody wants to work on that.

@rthideaway rthideaway force-pushed the new-data-producers/entity-query branch from af0fbac to 502ae12 Compare May 19, 2020 10:02
@Kingdutch
Copy link
Contributor

I was actually looking for this (or an example to try and recreate it). I think better documentation on how to make specific classes for this would be better. I can see how opening up the entire Entity Query API like this without requiring the creation of specific code makes it easy to shoot yourself in the foot.

My biggest question is actually somewhat tangential (so not sure if this PR would be the best place to discuss). It's about how to handle the limits, since access controls could remove results which means that if 10 entities are requested you could end up only being able to load 8.

I think the pattern suggested in this PR (query first -> then load) may also suffer from this since we could have more query results than we're allowed to view. Having a single class that does both the querying and loading could take care of this since it could detect that it has loaded fewer entities than the limit and possibly load more.

@Kingdutch
Copy link
Contributor

Another thought: I believe it's common practice in GraphQL to use where as container for filter (e.g. users(where: { status: "active" })). So in this PR (and for anyone implementing similar producers manually) we may want to change the argument from filter to where :)

@Kingdutch
Copy link
Contributor

I implemented generic pagination support according to the Relay Connection specification in https://github.com/goalgorilla/open_social/blob/feature/graphql-base/modules/custom/social_graphql/src/Plugin/GraphQL/DataProducer/QueryEntityBase.php

This is used https://github.com/goalgorilla/open_social/blob/feature/graphql-base/modules/social_features/social_user/src/Plugin/GraphQL/DataProducer/QueryUser.php and can easily be re-used in other entity query based data producers. This leaves the logic of the extending class mostly free for needed filtering and possibly makes this PR unnecessary :)

*
* @DataProducer(
* id = "entity_query",
* name = @Translation("Load entities"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This data producer doesn't actually load entities but only finds their IDs

Comment on lines +22 to +31
* "limit" = @ContextDefinition("integer",
* label = @Translation("Limit"),
* required = FALSE,
* default_value = 10
* ),
* "offset" = @ContextDefinition("integer",
* label = @Translation("Offset"),
* required = FALSE,
* default_value = 0
* ),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Offset based queries in Drupal are somewhat difficult in Drupal because Drupal allows access handlers that can not be put into a database query. This means that access filtering will be done after the fact which could mean that 10 entities are selected but none are actually sent back to the client.

For this same reason, generic "count" queries are not often seen in GraphQL schemas. On one hand because GraphQL is created for data sets where this isn't pratcial, on the other hand because access a single access check may make a count inaccurate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this can happen in some cases. I think we should add that to the docs as well.

@klausi
Copy link
Contributor

klausi commented Sep 20, 2023

Since we have been using these 2 dataproducers successfully in jobiqo for years I think we should merge an improved version of this. I think this should be done in a new pull request to preserve the state of this one for historic reasons.

TODO:

  1. Add doc examples on the class comments how they can be used.
  2. Fix remarks from Kingdutch

@klausi
Copy link
Contributor

klausi commented Sep 21, 2023

Added some docs and fixed minor coding standard issues in #1360 , then merged it.

As I wanted to have this change in a single commit for better using it as a patch I merged the other pull request. Sorry for the loss on attribution @rthideaway !

Closing this pull request as well, please open follow-ups if there are still issues.

@klausi klausi closed this Sep 21, 2023
@rthideaway
Copy link
Contributor Author

Sorry for the loss on attribution @rthideaway !

Absolutely no worries! :) I fully understand and agree with the procedure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants