Subgraph best practices 1-4 (#682) (#687)

idalithb · marcusrein · benface · web-flow · commit cfda851a8ea6 · 2024-05-24T14:23:02.000-07:00
* Best practices 1-4 init commit.

* Best practices 1-4 added.

* Improved cookbook formatting and re-ordered cookbooks

* Fixed linting errors

* Reverted _meta.js to earlier

* Reverted _meta.js to earlier

Removed .gitignore and pnpm_lock commit changes

* Update .gitignore

* Fixed pnpm_lock commit change

* Fixing pnpm_lock issues

* Changed file naming to be SEO optimized

* Update website/pages/en/cookbook/derivedFrom.mdx



* Update website/pages/en/cookbook/immutable-entities-bytes-as-ids.mdx



* Update website/pages/en/cookbook/immutable-entities-bytes-as-ids.mdx



* Update website/pages/en/cookbook/pruning.mdx



* Update website/pages/en/cookbook/derivedFrom.mdx



* Update website/pages/en/cookbook/pruning.mdx



* Update website/pages/en/cookbook/pruning.mdx



* Update website/pages/en/cookbook/pruning.mdx



* Updated naming of derivedFrom file to derivedfrom

* Rename derivedFrom.mdx to derivedfrom.mdx

---------

Co-authored-by: Marcus Rein <64141593+marcusrein@users.noreply.github.com>
Co-authored-by: Benoît Rouleau <benoit.rouleau@icloud.com>
diff --git a/website/pages/en/cookbook/_meta.js b/website/pages/en/cookbook/_meta.js
@@ -8,4 +8,8 @@ export default {
   grafting: '',
   'subgraph-uncrashable': '',
   'substreams-powered-subgraphs': '',
+  pruning: 'Subgraph Best Practice 1: Pruning with indexerHints',
+  derivedfrom: 'Subgraph Best Practice 2: Manage Arrays with @derivedFrom',
+  'immutable-entities-bytes-as-ids': 'Subgraph Best Practice 3: Using Immutable Entities and Bytes as IDs',
+  'avoid-eth-calls': 'Subgraph Best Practice 4: Avoid eth_calls',
 }
diff --git a/website/pages/en/cookbook/avoid-eth-calls.mdx b/website/pages/en/cookbook/avoid-eth-calls.mdx
@@ -0,0 +1,102 @@
+---
+title: Subgraph Best Practice 4 - Improve Indexing Speed by Avoiding eth_calls
+---
+
+## TLDR
+
+`eth_calls` are calls that can be made from a subgraph to an Ethereum node. These calls take a significant amount of time to return data, slowing down indexing. If possible, design smart contracts to emit all the data you need so you don’t need to use `eth_calls`.
+
+## Why Avoiding `eth_calls` Is a Best Practice
+
+Subgraphs are optimized to index event data emitted from smart contracts. A subgraph can also index the data coming from an `eth_call`; however, this can significantly slow down indexing because `eth_calls` require making external calls to smart contracts. The responsiveness of these calls depends not on the subgraph but on the connectivity and responsiveness of the Ethereum node being queried. By minimizing or eliminating `eth_calls` in our subgraphs, we can significantly improve our indexing speed.
+
+### What Does an eth_call Look Like?
+
+`eth_calls` are often necessary when the data required for a subgraph is not available through emitted events. For example, consider a scenario where a subgraph needs to identify whether ERC20 tokens are part of a specific pool, but the contract only emits a basic `Transfer` event and does not emit an event that contains the data that we need:
+
+```solidity
+event Transfer(address indexed from, address indexed to, uint256 value);
+```
+
+Suppose the tokens' pool membership is determined by a state variable that is read through a getter named `getPoolInfo`. In this case, we would need to use an `eth_call` to query this data:
+
+```typescript
+import { Address } from '@graphprotocol/graph-ts'
+import { ERC20, Transfer } from '../generated/ERC20/ERC20'
+import { TokenTransaction } from '../generated/schema'
+
+export function handleTransfer(event: Transfer): void {
+  let transaction = new TokenTransaction(event.transaction.hash.toHex())
+
+  // Bind the ERC20 contract instance to the given address:
+  let instance = ERC20.bind(event.address)
+
+  // Retrieve pool information via eth_call
+  let poolInfo = instance.getPoolInfo(event.params.to)
+
+  transaction.pool = poolInfo.toHexString()
+  transaction.from = event.params.from.toHexString()
+  transaction.to = event.params.to.toHexString()
+  transaction.value = event.params.value
+
+  transaction.save()
+}
+```
+
+This works, but it is not ideal because it slows down our subgraph’s indexing.
+
+## How to Eliminate `eth_calls`
+
+Ideally, the smart contract should be updated to emit all necessary data within events. For instance, modifying the smart contract to include pool information in the event could eliminate the need for `eth_calls`:
+
+```solidity
+event TransferWithPool(address indexed from, address indexed to, uint256 value, bytes32 indexed poolInfo);
+```
+
+With this update, the subgraph can directly index the required data without external calls:
+
+```typescript
+import { Address } from '@graphprotocol/graph-ts'
+import { ERC20, TransferWithPool } from '../generated/ERC20/ERC20'
+import { TokenTransaction } from '../generated/schema'
+
+export function handleTransferWithPool(event: TransferWithPool): void {
+  let transaction = new TokenTransaction(event.transaction.hash.toHex())
+
+  transaction.pool = event.params.poolInfo.toHexString()
+  transaction.from = event.params.from.toHexString()
+  transaction.to = event.params.to.toHexString()
+  transaction.value = event.params.value
+
+  transaction.save()
+}
+```
+
+This is much more performant as it has eliminated the need for `eth_calls`.
+
+## How to Optimize `eth_calls`
+
+If modifying the smart contract is not possible and `eth_calls` are required, read “[Improve Subgraph Indexing Performance Easily: Reduce eth_calls](https://thegraph.com/blog/improve-subgraph-performance-reduce-eth-calls/)” by Simon Emanuel Schmid to learn various strategies on how to optimize `eth_calls`.
+
+## Reducing the Runtime Overhead of `eth_calls`
+
+For the `eth_calls` that cannot be eliminated, the runtime overhead they introduce can be minimized by declaring them in the manifest. When `graph-node` processes a block, it performs all declared `eth_calls` in parallel before handlers are run. Calls that are not declared are executed sequentially when handlers run. The runtime improvement comes from performing calls in parallel rather than sequentially, which reduces the total time spent in calls but does not eliminate it completely.
+
+Currently, `eth_calls` can only be declared for event handlers. In the manifest, write:
+
+```yaml
+event: TransferWithPool(address indexed, address indexed, uint256, bytes32 indexed)
+handler: handleTransferWithPool
+calls:
+  ERC20.poolInfo: ERC20[event.address].getPoolInfo(event.params.to)
+```
+
+The `calls` entry is the call declaration. The part before the colon is simply a text label that is only used for error messages. The part after the colon has the form `Contract[address].function(params)`. Permissible values for `address` and `params` are `event.address` and `event.params.<name>`.
+
+The handler itself accesses the result of this `eth_call` exactly as in the previous section, by binding to the contract and making the call. `graph-node` caches the results of declared `eth_calls` in memory, and the call from the handler retrieves the result from this in-memory cache instead of making an actual RPC call.
+
+Note: Declared `eth_calls` can only be made in subgraphs with `specVersion` >= 1.2.0.
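+
The effect of declaring calls can be pictured with a small, self-contained TypeScript model. This is only a sketch under simplifying assumptions, not `graph-node`'s implementation: the `DeclaredCall` type, the latency figures, the cache keys, and the helper names are all hypothetical, and parallel execution is idealized as costing only as much as the slowest call.

```typescript
// Hypothetical model of an eth_call: a cache key, a simulated latency, and a result.
interface DeclaredCall {
  key: string
  latencyMs: number
  result: string
}

// Undeclared calls run one after another inside the handler, so latencies add up.
function sequentialCostMs(calls: DeclaredCall[]): number {
  return calls.reduce((total, call) => total + call.latencyMs, 0)
}

// Declared calls are issued together before handlers run; in this idealized
// model the slowest call determines the total wait.
function parallelCostMs(calls: DeclaredCall[]): number {
  return calls.reduce((slowest, call) => Math.max(slowest, call.latencyMs), 0)
}

// Results of declared calls are stored in memory so the handler can read them
// without another round trip to the node.
function prefetch(calls: DeclaredCall[]): Record<string, string> {
  const cache: Record<string, string> = {}
  for (let i = 0; i < calls.length; i++) {
    cache[calls[i].key] = calls[i].result
  }
  return cache
}

// Illustrative data only: three calls with made-up latencies.
const calls: DeclaredCall[] = [
  { key: 'getPoolInfo(0xaaa)', latencyMs: 120, result: 'pool-1' },
  { key: 'getPoolInfo(0xbbb)', latencyMs: 80, result: 'pool-2' },
  { key: 'getPoolInfo(0xccc)', latencyMs: 150, result: 'pool-1' },
]

const cache = prefetch(calls)
console.log(sequentialCostMs(calls)) // 350
console.log(parallelCostMs(calls)) // 150
console.log(cache['getPoolInfo(0xbbb)']) // pool-2
```

In this model the time spent waiting drops from the sum of the latencies to the largest single latency, which matches the point above that declaring calls reduces, but does not eliminate, the time spent in calls.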
+
+## Conclusion
+
+We can significantly improve indexing performance by minimizing or eliminating `eth_calls` in our subgraphs.
diff --git a/website/pages/en/cookbook/derivedfrom.mdx b/website/pages/en/cookbook/derivedfrom.mdx
@@ -0,0 +1,73 @@
+---
+title: Subgraph Best Practice 2 - Improve Indexing and Query Responsiveness By Using @derivedFrom
+---
+
+## TLDR
+
+Arrays in your schema can really slow down a subgraph's performance as they grow beyond thousands of entries. If possible, the `@derivedFrom` directive should be used when using arrays as it prevents large arrays from forming, simplifies handlers, and reduces the size of individual entities, improving indexing speed and query performance significantly.
+
+## How to Use the `@derivedFrom` Directive
+
+To use it, add a `@derivedFrom` directive after the array field in your schema:
+
+```graphql
+comments: [Comment!]! @derivedFrom(field: "post")
+```
+
+`@derivedFrom` creates efficient one-to-many relationships, enabling an entity to dynamically associate with multiple related entities based on a field in the related entity. This approach removes the need for both sides of the relationship to store duplicate data, making the subgraph more efficient.
+
+### Example Use Case for `@derivedFrom`
+
+An example of a dynamically growing array is a blogging platform where a “Post” can have many “Comments”.
+
+Let’s start with our two entities, `Post` and `Comment`.
+
+Without optimization, you could implement it like this with an array:
+
+```graphql
+type Post @entity {
+  id: Bytes!
+  title: String!
+  content: String!
+  comments: [Comment!]!
+}
+
+type Comment @entity {
+  id: Bytes!
+  content: String!
+}
+```
+
+Arrays like these will effectively store extra Comments data on the Post side of the relationship.
+
+Here’s what an optimized version looks like using `@derivedFrom`:
+
+```graphql
+type Post @entity {
+  id: Bytes!
+  title: String!
+  content: String!
+  comments: [Comment!]! @derivedFrom(field: "post")
+}
+
+type Comment @entity {
+  id: Bytes!
+  content: String!
+  post: Post!
+}
+```
+
+Just by adding the `@derivedFrom` directive, this schema will only store the “Comments” on the “Comments” side of the relationship and not on the “Post” side of the relationship. Arrays are stored across individual rows, which allows them to expand significantly. This can lead to particularly large sizes if their growth is unbounded.
+
+This will not only make our subgraph more efficient, but it will also unlock three features:
+
+1. We can query the `Post` and see all of its comments.
+2. We can do a reverse lookup and query any `Comment` and see which post it comes from.
+3. We can use [Derived Field Loaders](/developing/graph-ts/api/#looking-up-derived-entities) to unlock the ability to directly access and manipulate data from virtual relationships in our subgraph mappings.
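+
Conceptually, a derived field behaves like a reverse lookup rather than stored data. The TypeScript sketch below illustrates that idea with plain in-memory arrays. `PostRow`, `CommentRow`, and `commentsOf` are hypothetical names chosen for this example; they are not part of `graph-ts`, and this is not how `graph-node` stores entities.

```typescript
// Hypothetical in-memory stand-ins for the Post and Comment entities.
interface PostRow {
  id: string
  title: string
}

interface CommentRow {
  id: string
  content: string
  post: string // ID of the post this comment belongs to
}

const posts: PostRow[] = [{ id: '0x01', title: 'Hello' }]

const comments: CommentRow[] = [
  { id: '0xa1', content: 'First!', post: '0x01' },
  { id: '0xa2', content: 'Nice post', post: '0x01' },
  { id: '0xa3', content: 'Unrelated', post: '0x02' },
]

// The post stores no array of comments; its comments are derived on demand
// by looking at the `post` field on the comment side of the relationship.
function commentsOf(postId: string, all: CommentRow[]): CommentRow[] {
  return all.filter((comment) => comment.post === postId)
}

const derived = commentsOf(posts[0].id, comments)
console.log(derived.map((comment) => comment.id).join(',')) // 0xa1,0xa2

// The reverse lookup is a direct field read on the comment.
console.log(comments[2].post) // 0x02
```

Adding a comment here only appends one `CommentRow`; the post itself is never rewritten, which is the property that keeps entities small as the number of comments grows.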
+
+## Conclusion
+
+Adopting the `@derivedFrom` directive in subgraphs effectively handles dynamically growing arrays, enhancing indexing efficiency and data retrieval.
+
+To learn more detailed strategies to avoid large arrays, read this blog from Kevin Jones: [Best Practices in Subgraph Development: Avoiding Large Arrays](https://thegraph.com/blog/improve-subgraph-performance-avoiding-large-arrays/).
diff --git a/website/pages/en/cookbook/immutable-entities-bytes-as-ids.mdx b/website/pages/en/cookbook/immutable-entities-bytes-as-ids.mdx
@@ -0,0 +1,176 @@
+---
+title: Subgraph Best Practice 3 - Improve Indexing and Query Performance by Using Immutable Entities and Bytes as IDs
+---
+
+## TLDR
+
+Using Immutable Entities and Bytes for IDs in our `schema.graphql` file [significantly improves](https://thegraph.com/blog/two-simple-subgraph-performance-improvements/) indexing speed and query performance.
+
+## Immutable Entities
+
+To make an entity immutable, we simply add `(immutable: true)` to the entity's `@entity` directive.
+
+```graphql
+type Transfer @entity(immutable: true) {
+  id: Bytes!
+  from: Bytes!
+  to: Bytes!
+  value: BigInt!
+}
+```
+
+By making the `Transfer` entity immutable, graph-node is able to process the entity more efficiently, improving indexing speeds and query responsiveness.
+
+Immutable entities cannot be changed after they are created. An ideal candidate is an entity that directly logs on-chain event data, such as a `Transfer` event being logged as a `Transfer` entity.
+
+### Under the hood
+
+Mutable entities have a 'block range' indicating their validity. Updating these entities requires the graph node to adjust the block range of previous versions, increasing database workload. Queries also need filtering to find only live entities. Immutable entities are faster because they are all live and since they won't change, no checks or updates are required while writing, and no filtering is required during queries.
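+
The block-range bookkeeping described above can be sketched in a few lines of TypeScript. This is a simplified illustration, not `graph-node`'s actual storage code: the `VersionRow` layout and the `writeMutable`, `writeImmutable`, and `liveRows` helpers are hypothetical.

```typescript
// Hypothetical row layout: one row per entity version, valid from startBlock
// (inclusive) until endBlock (exclusive); null means the version is still live.
interface VersionRow {
  id: string
  status: string
  startBlock: number
  endBlock: number | null
}

// Writing a mutable entity: close the live version's block range, then add a new row.
function writeMutable(table: VersionRow[], id: string, status: string, block: number): void {
  for (let i = 0; i < table.length; i++) {
    if (table[i].id === id && table[i].endBlock === null) {
      table[i].endBlock = block
    }
  }
  table.push({ id: id, status: status, startBlock: block, endBlock: null })
}

// Writing an immutable entity: a plain append, with no earlier version to look up.
function writeImmutable(table: VersionRow[], id: string, status: string, block: number): void {
  table.push({ id: id, status: status, startBlock: block, endBlock: null })
}

// Querying a mutable table must filter out versions that are no longer live.
function liveRows(table: VersionRow[]): VersionRow[] {
  return table.filter((row) => row.endBlock === null)
}

const mutableTable: VersionRow[] = []
writeMutable(mutableTable, '0x01', 'open', 100)
writeMutable(mutableTable, '0x01', 'closed', 105)

const immutableTable: VersionRow[] = []
writeImmutable(immutableTable, '0x02', 'logged', 100)

console.log(mutableTable.length) // 2
console.log(liveRows(mutableTable).length) // 1
console.log(mutableTable[0].endBlock) // 105
console.log(immutableTable.length) // 1
```

The mutable path needs a search and an update before every insert, plus a filter on every read. The immutable path needs neither, which is where the savings come from in this model.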
+
+### When not to use Immutable Entities
+
+If you have a field like `status` that needs to be modified over time, then you should not make the entity immutable. Otherwise, you should use immutable entities whenever possible.
+
+## Bytes as IDs
+
+Every entity requires an ID. In the previous example, we can see that the ID is already of the Bytes type.
+
+```graphql
+type Transfer @entity(immutable: true) {
+  id: Bytes!
+  from: Bytes!
+  to: Bytes!
+  value: BigInt!
+}
+```
+
+While other types for IDs are possible, such as String and Int8, it is recommended to use the Bytes type for all IDs. Character strings take twice as much space as byte strings to store binary data, and comparisons of UTF-8 character strings must take the locale into account, which is much more expensive than the bytewise comparison used to compare byte strings.
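+
The size difference is easy to check with a short TypeScript sketch. It assumes a 32-byte value, such as a transaction hash, and a hex representation with a `0x` prefix; `bytesToHex` is a helper written for this example rather than a `graph-ts` function.

```typescript
// Render bytes as a 0x-prefixed hex string, two characters per byte.
function bytesToHex(bytes: Uint8Array): string {
  let hex = '0x'
  for (let i = 0; i < bytes.length; i++) {
    hex += (bytes[i] < 16 ? '0' : '') + bytes[i].toString(16)
  }
  return hex
}

// A placeholder 32-byte value standing in for a transaction hash.
const hashBytes = new Uint8Array(32)
const hashString = bytesToHex(hashBytes)

console.log(hashBytes.length) // 32
console.log(hashString.length) // 66
```

Each byte becomes two hex characters, so the string form holds 64 characters of payload plus the two-character prefix, compared with 32 bytes for the raw value.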
+
+### Reasons to Not Use Bytes as IDs
+
+1. If entity IDs must be human-readable such as auto-incremented numerical IDs or readable strings, Bytes for IDs should not be used.
+2. If integrating a subgraph’s data with another data model that does not use Bytes as IDs, Bytes as IDs should not be used.
+3. Indexing and querying performance improvements are not desired.
+
+### Concatenating With Bytes as IDs
+
+It is a common practice in many subgraphs to use string concatenation to combine two properties of an event into a single ID, such as using `event.transaction.hash.toHex() + "-" + event.logIndex.toString()`. However, as this returns a string, this significantly impedes subgraph indexing and querying performance.
+
+Instead, we should use the `concatI32()` method to concatenate event properties. This strategy results in a `Bytes` ID that is much more performant.
+
+```typescript
+export function handleTransfer(event: TransferEvent): void {
+  let entity = new Transfer(event.transaction.hash.concatI32(event.logIndex.toI32()))
+  entity.from = event.params.from
+  entity.to = event.params.to
+  entity.value = event.params.value
+
+  entity.blockNumber = event.block.number
+  entity.blockTimestamp = event.block.timestamp
+  entity.transactionHash = event.transaction.hash
+
+  entity.save()
+}
+```
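+
To show what such an ID looks like at the byte level, here is a plain TypeScript sketch that runs outside a subgraph. It assumes the integer is appended as four little-endian bytes, which is our understanding of `concatI32()` but should be checked against the `graph-ts` source. The `i32ToBytes` and `concatI32` functions below are stand-ins written for this example, not the library's code.

```typescript
// Encode a 32-bit integer as four little-endian bytes (assumed encoding).
function i32ToBytes(value: number): Uint8Array {
  const out = new Uint8Array(4)
  out[0] = value & 0xff
  out[1] = (value >> 8) & 0xff
  out[2] = (value >> 16) & 0xff
  out[3] = (value >> 24) & 0xff
  return out
}

// Stand-in for Bytes.concatI32(): the original bytes followed by the encoded integer.
function concatI32(bytes: Uint8Array, value: number): Uint8Array {
  const out = new Uint8Array(bytes.length + 4)
  out.set(bytes, 0)
  out.set(i32ToBytes(value), bytes.length)
  return out
}

// A placeholder 32-byte transaction hash and a log index of 7.
const txHash = new Uint8Array(32)
txHash[0] = 0xab
const entityId = concatI32(txHash, 7)

console.log(entityId.length) // 36
console.log(entityId[0]) // 171
console.log(entityId[32]) // 7
```

The resulting ID stays fixed-width binary data (36 bytes here) instead of a variable-length string such as a hex hash followed by `-7`.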
+
+### Sorting With Bytes as IDs
+
+Sorting by a Bytes ID is not optimal because results are ordered bytewise rather than numerically, as seen in this example query and response.
+
+Query:
+
+```graphql
+{
+  transfers(first: 3, orderBy: id) {
+    id
+    from
+    to
+    value
+  }
+}
+```
+
+Query response:
+
+```json
+{
+  "data": {
+    "transfers": [
+      {
+        "id": "0x00010000",
+        "from": "0xabcd...",
+        "to": "0x1234...",
+        "value": "256"
+      },
+      {
+        "id": "0x00020000",
+        "from": "0xefgh...",
+        "to": "0x5678...",
+        "value": "512"
+      },
+      {
+        "id": "0x01000000",
+        "from": "0xijkl...",
+        "to": "0x9abc...",
+        "value": "1"
+      }
+    ]
+  }
+}
+```
+
+The IDs are returned as hex.
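+
The ordering in this response can be reproduced with a short TypeScript sketch. It assumes, for illustration only, that each ID is the little-endian 4-byte encoding of the transfer's `value`, which happens to match the sample IDs above. `i32ToHexLE` is a hypothetical helper.

```typescript
// Hex string of a 32-bit integer encoded as four little-endian bytes.
function i32ToHexLE(value: number): string {
  let hex = '0x'
  for (let shift = 0; shift < 32; shift += 8) {
    const byte = (value >> shift) & 0xff
    hex += (byte < 16 ? '0' : '') + byte.toString(16)
  }
  return hex
}

const values = [1, 256, 512]

// Order by ID: fixed-width lowercase hex strings compare the same way the raw bytes do.
const sortedById = values.slice().sort((a, b) => {
  const x = i32ToHexLE(a)
  const y = i32ToHexLE(b)
  return x < y ? -1 : x > y ? 1 : 0
})

// Order by the numeric value itself.
const sortedByValue = values.slice().sort((a, b) => a - b)

console.log(i32ToHexLE(256)) // 0x00010000
console.log(sortedById.join(',')) // 256,512,1
console.log(sortedByValue.join(',')) // 1,256,512
```

Ordering by ID places 256 and 512 ahead of 1, the same order as in the response above, whereas ordering by a numeric field gives the sequence a reader would expect.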
+
+To improve sorting, we should create another field on the entity that is a BigInt.
+
+```graphql
+type Transfer @entity {
+  id: Bytes!
+  from: Bytes! # address
+  to: Bytes! # address
+  value: BigInt! # uint256
+  tokenId: BigInt! # uint256
+}
+```
+
+This allows results to be sorted sequentially by `tokenId`.
+
+Query:
+
+```graphql
+{
+  transfers(first: 3, orderBy: tokenId) {
+    id
+    tokenId
+  }
+}
+```
+
+Query Response:
+
+```json
+{
+  "data": {
+    "transfers": [
+      {
+        "id": "0x…",
+        "tokenId": "1"
+      },
+      {
+        "id": "0x…",
+        "tokenId": "2"
+      },
+      {
+        "id": "0x…",
+        "tokenId": "3"
+      }
+    ]
+  }
+}
+```
+
+## Conclusion
+
+Using both Immutable Entities and Bytes as IDs has been shown to markedly improve subgraph efficiency. Specifically, tests have highlighted up to a 28% increase in query performance and up to a 48% acceleration in indexing speeds.
+
+Read more about using Immutable Entities and Bytes as IDs in this blog post by David Lutterkort, a Software Engineer at Edge & Node: [Two Simple Subgraph Performance Improvements](https://thegraph.com/blog/two-simple-subgraph-performance-improvements/).
diff --git a/website/pages/en/cookbook/pruning.mdx b/website/pages/en/cookbook/pruning.mdx
