
Commit cfda851

idalithb, marcusrein, and benface authored
Subgraph best practices 1-4 (#682) (#687)
* Best practices 1-4 init commit.
* Best practices 1-4 added.
* Improved cookbook formatting and re-ordered cookbooks
* Fixed linting errors
* Reverted _meta.js to earlier
* Reverted _meta.js to earlier
  Removed .gitgnore and pnpm_lock commit changes
* Update .gitignore
* Fixed pnpm_lock commit change
* Fixing pnpm_lock issues
* Changed file naming to be SEO optimized
* Update website/pages/en/cookbook/derivedFrom.mdx
* Update website/pages/en/cookbook/immutable-entities-bytes-as-ids.mdx
* Update website/pages/en/cookbook/immutable-entities-bytes-as-ids.mdx
* Update website/pages/en/cookbook/pruning.mdx
* Update website/pages/en/cookbook/derivedFrom.mdx
* Update website/pages/en/cookbook/pruning.mdx
* Update website/pages/en/cookbook/pruning.mdx
* Update website/pages/en/cookbook/pruning.mdx
* Updated naming of derivedFrom file to derivedfrom
* Rename derivedFrom.mdx to derivedfrom.mdx

---------

Co-authored-by: Marcus Rein <[email protected]>
Co-authored-by: Benoît Rouleau <[email protected]>
1 parent f36982e commit cfda851


5 files changed (+396, -0 lines changed)


website/pages/en/cookbook/_meta.js

Lines changed: 4 additions & 0 deletions
@@ -8,4 +8,8 @@ export default {
   grafting: '',
   'subgraph-uncrashable': '',
   'substreams-powered-subgraphs': '',
+  pruning: 'Subgraph Best Practice 1: Pruning with indexerHints',
+  derivedfrom: 'Subgraph Best Practice 2: Manage Arrays with @derivedFrom',
+  'immutable-entities-bytes-as-ids': 'Subgraph Best Practice 3: Using Immutable Entities and Bytes as IDs',
+  'avoid-eth-calls': 'Subgraph Best Practice 4: Avoid eth_calls',
 }
Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
---
title: Subgraph Best Practice 4 - Improve Indexing Speed by Avoiding eth_calls
---

## TLDR

`eth_calls` are calls that can be made from a subgraph to an Ethereum node. These calls take a significant amount of time to return data, slowing down indexing. If possible, design smart contracts to emit all the data you need so you don’t need to use `eth_calls`.

## Why Avoiding `eth_calls` Is a Best Practice

Subgraphs are optimized to index event data emitted from smart contracts. A subgraph can also index data coming from an `eth_call`; however, this can significantly slow down indexing, because `eth_calls` require making external calls to smart contracts. The responsiveness of these calls depends not on the subgraph but on the connectivity and responsiveness of the Ethereum node being queried. By minimizing or eliminating `eth_calls` in our subgraphs, we can significantly improve our indexing speed.

### What Does an `eth_call` Look Like?

`eth_calls` are often necessary when the data required for a subgraph is not available through emitted events. For example, consider a scenario where a subgraph needs to identify whether ERC20 tokens are part of a specific pool, but the contract only emits a basic `Transfer` event and does not emit an event that contains the data that we need:

```solidity
event Transfer(address indexed from, address indexed to, uint256 value);
```

Suppose the tokens' pool membership is determined by a state variable named `getPoolInfo`. In this case, we would need to use an `eth_call` to query this data:

```typescript
import { ERC20, Transfer } from '../generated/ERC20/ERC20'
import { TokenTransaction } from '../generated/schema'

export function handleTransfer(event: Transfer): void {
  let transaction = new TokenTransaction(event.transaction.hash.toHex())

  // Bind the ERC20 contract instance to the given address:
  let instance = ERC20.bind(event.address)

  // Retrieve pool information via eth_call
  let poolInfo = instance.getPoolInfo(event.params.to)

  transaction.pool = poolInfo.toHexString()
  transaction.from = event.params.from.toHexString()
  transaction.to = event.params.to.toHexString()
  transaction.value = event.params.value

  transaction.save()
}
```

This works, but it is not ideal, as it slows down our subgraph’s indexing.

## How to Eliminate `eth_calls`

Ideally, the smart contract should be updated to emit all necessary data within events. For instance, modifying the smart contract to include pool information in the event could eliminate the need for `eth_calls`:

```solidity
event TransferWithPool(address indexed from, address indexed to, uint256 value, bytes32 indexed poolInfo);
```

With this update, the subgraph can directly index the required data without external calls:

```typescript
import { TransferWithPool } from '../generated/ERC20/ERC20'
import { TokenTransaction } from '../generated/schema'

export function handleTransferWithPool(event: TransferWithPool): void {
  let transaction = new TokenTransaction(event.transaction.hash.toHex())

  // Pool information now comes directly from the event parameters, so no eth_call is needed
  transaction.pool = event.params.poolInfo.toHexString()
  transaction.from = event.params.from.toHexString()
  transaction.to = event.params.to.toHexString()
  transaction.value = event.params.value

  transaction.save()
}
```

This is much more performant, as it eliminates the need for `eth_calls`.

## How to Optimize `eth_calls`

If modifying the smart contract is not possible and `eth_calls` are required, read “[Improve Subgraph Indexing Performance Easily: Reduce eth_calls](https://thegraph.com/blog/improve-subgraph-performance-reduce-eth-calls/)” by Simon Emanuel Schmid to learn various strategies on how to optimize `eth_calls`.

## Reducing the Runtime Overhead of `eth_calls`

For the `eth_calls` that cannot be eliminated, the runtime overhead they introduce can be minimized by declaring them in the manifest. When `graph-node` processes a block, it performs all declared `eth_calls` in parallel before handlers are run. Calls that are not declared are executed sequentially when handlers run. The runtime improvement comes from performing calls in parallel rather than sequentially; this reduces the total time spent in calls but does not eliminate it completely.
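The difference between sequential and parallel calls can be sketched with a simple latency model. This is a hypothetical illustration in plain TypeScript, not `graph-node` code: it assumes each call has a fixed latency in milliseconds, so undeclared calls made one after another cost the sum of their latencies, while declared calls issued in parallel cost roughly as much as the slowest single call.

```typescript
// Hypothetical latency model (not graph-node code): each entry is the
// assumed round-trip time, in milliseconds, of one eth_call.
type CallLatencies = number[]

// Undeclared calls run one after another inside handlers,
// so their latencies add up.
function sequentialTime(latencies: CallLatencies): number {
  return latencies.reduce((total, ms) => total + ms, 0)
}

// Declared calls are issued together before handlers run, so the
// wait is bounded by the slowest call (ignoring node-side concurrency limits).
function parallelTime(latencies: CallLatencies): number {
  return latencies.length === 0 ? 0 : Math.max(...latencies)
}

const latencies: CallLatencies = [120, 80, 100]
console.log(`sequential: ${sequentialTime(latencies)} ms`) // sequential: 300 ms
console.log(`parallel:   ${parallelTime(latencies)} ms`) // parallel:   120 ms
```

Real RPC nodes limit concurrency and latencies vary, so actual savings will be smaller than this idealized model suggests, which is consistent with the point that parallelism reduces, but does not eliminate, the time spent in calls.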

Currently, `eth_calls` can only be declared for event handlers. In the manifest, write:

```yaml
eventHandlers:
  - event: TransferWithPool(address indexed, address indexed, uint256, bytes32 indexed)
    handler: handleTransferWithPool
    calls:
      ERC20.poolInfo: ERC20[event.address].getPoolInfo(event.params.to)
```

The entry under `calls` is the call declaration. The part before the colon is simply a text label that is only used in error messages. The part after the colon has the form `Contract[address].function(params)`. Permissible values for `address` and `params` are `event.address` and `event.params.<name>`.
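To make the shape of a declaration concrete, here is a hypothetical validator in plain TypeScript that splits a declaration into its label, contract, address, function, and params, and checks the address and params against the permitted forms described above. It is a sketch for illustration only; `graph-node` does its own parsing, and the regular expressions here are assumptions that cover just the simple form shown in this section.

```typescript
// Hypothetical parser for a declared call such as
//   "ERC20.poolInfo: ERC20[event.address].getPoolInfo(event.params.to)"
// Not graph-node's implementation; it only handles the simple form in this guide.
interface CallDeclaration {
  label: string
  contract: string
  address: string
  fn: string
  params: string[]
}

// Permitted expressions: event.address or event.params.<name>
const ALLOWED_EXPR = /^event\.(address|params\.[A-Za-z_][A-Za-z0-9_]*)$/

function parseCallDeclaration(line: string): CallDeclaration {
  // The label is everything before the first colon
  const colon = line.indexOf(':')
  if (colon < 0) throw new Error('missing label separator ":"')
  const label = line.slice(0, colon).trim()
  const call = line.slice(colon + 1).trim()

  // Contract[address].function(params)
  const match = /^([A-Za-z_][A-Za-z0-9_]*)\[([^\]]+)\]\.([A-Za-z_][A-Za-z0-9_]*)\(([^)]*)\)$/.exec(call)
  if (match === null) throw new Error(`not of the form Contract[address].function(params): ${call}`)

  const [, contract, address, fn, rawParams] = match
  const params = rawParams.trim() === '' ? [] : rawParams.split(',').map((p) => p.trim())

  // Reject anything other than event.address / event.params.<name>
  for (const expr of [address, ...params]) {
    if (!ALLOWED_EXPR.test(expr)) throw new Error(`unsupported expression: ${expr}`)
  }
  return { label, contract, address, fn, params }
}

const decl = parseCallDeclaration('ERC20.poolInfo: ERC20[event.address].getPoolInfo(event.params.to)')
console.log(decl.contract, decl.fn, decl.params) // ERC20 getPoolInfo [ 'event.params.to' ]
```

A literal address such as `ERC20[0x1234]` would be rejected by this sketch, mirroring the restriction that only `event.address` and `event.params.<name>` are permitted.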

The handler itself accesses the result of this `eth_call` exactly as in the previous section, by binding to the contract and making the call. `graph-node` caches the results of declared `eth_calls` in memory, and the call from the handler retrieves the result from this in-memory cache instead of making an actual RPC call.
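The caching behaviour can be modelled in a few lines. In this hypothetical sketch (plain TypeScript, not `graph-node` internals), declared calls are prefetched into a map keyed by contract, address, function, and arguments; a later call from the handler with the same key is served from the map, and only undeclared calls reach the simulated RPC. The addresses used below are placeholders.

```typescript
// Hypothetical model of a per-block cache for declared eth_calls.
// `rpc` stands in for a real JSON-RPC eth_call; the class counts how often it is used.
type Rpc = (key: string) => string

class DeclaredCallCache {
  private cache = new Map<string, string>()
  rpcCalls = 0

  constructor(private readonly rpc: Rpc) {}

  private static key(contract: string, address: string, fn: string, args: string[]): string {
    return `${contract}@${address}.${fn}(${args.join(',')})`
  }

  private callRpc(key: string): string {
    this.rpcCalls++
    return this.rpc(key)
  }

  // Before handlers run: execute every declared call and store the result.
  // (graph-node issues these in parallel; this sketch does them one by one for simplicity.)
  prefetch(contract: string, address: string, fn: string, args: string[]): void {
    const key = DeclaredCallCache.key(contract, address, fn, args)
    this.cache.set(key, this.callRpc(key))
  }

  // From a handler: use the cached result if the call was declared, otherwise hit the RPC.
  call(contract: string, address: string, fn: string, args: string[]): string {
    const key = DeclaredCallCache.key(contract, address, fn, args)
    const hit = this.cache.get(key)
    return hit !== undefined ? hit : this.callRpc(key)
  }
}

const calls = new DeclaredCallCache((key) => `result of ${key}`)
calls.prefetch('ERC20', '0xToken', 'getPoolInfo', ['0xAlice'])
calls.call('ERC20', '0xToken', 'getPoolInfo', ['0xAlice']) // served from the cache
calls.call('ERC20', '0xToken', 'getPoolInfo', ['0xBob']) // not declared: goes to the RPC
console.log(calls.rpcCalls) // 2
```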

Note: Declared `eth_calls` can only be made in subgraphs with `specVersion` >= 1.2.0.
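A build or lint script could guard against using declared calls with an older manifest. The helper below is a hypothetical sketch that compares a `specVersion` string numerically, component by component, against 1.2.0; it assumes plain `major.minor.patch` versions without pre-release tags.

```typescript
// Hypothetical helper: does a manifest specVersion support declared eth_calls?
// Assumes plain "major.minor.patch" strings (no pre-release suffixes).
const MIN_DECLARED_CALLS_VERSION = [1, 2, 0]

function supportsDeclaredCalls(specVersion: string): boolean {
  const parts = specVersion.split('.').map((p) => Number.parseInt(p, 10))
  for (let i = 0; i < MIN_DECLARED_CALLS_VERSION.length; i++) {
    const have = parts[i] ?? 0 // missing components count as 0, so "1.2" equals "1.2.0"
    const need = MIN_DECLARED_CALLS_VERSION[i]
    if (have !== need) return have > need // NaN (unparseable) compares false
  }
  return true // exactly 1.2.0
}

console.log(supportsDeclaredCalls('1.0.0')) // false
console.log(supportsDeclaredCalls('1.2.0')) // true
console.log(supportsDeclaredCalls('1.10.0')) // true (numeric, not string, comparison)
```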

## Conclusion

We can significantly improve indexing performance by minimizing or eliminating `eth_calls` in our subgraphs.
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
---
title: Subgraph Best Practice 2 - Improve Indexing and Query Responsiveness By Using @derivedFrom
---

## TLDR

Arrays in your schema can really slow down a subgraph's performance as they grow beyond thousands of entries. If possible, use the `@derivedFrom` directive when using arrays: it prevents large arrays from forming, simplifies handlers, and reduces the size of individual entities, significantly improving indexing speed and query performance.

## How to Use the `@derivedFrom` Directive

Add a `@derivedFrom` directive after the array field in your schema, like this:

```graphql
comments: [Comment!]! @derivedFrom(field: "post")
```

`@derivedFrom` creates efficient one-to-many relationships, enabling an entity to dynamically associate with multiple related entities based on a field in the related entity. This approach removes the need for both sides of the relationship to store duplicate data, making the subgraph more efficient.
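Conceptually, a derived field is not stored; it is computed by looking up the entities whose field points back. The sketch below models this in plain TypeScript with in-memory maps standing in for entity tables (an assumption for illustration, not how `graph-node` stores data): each comment stores only its `post` ID, and a post's comments are found by filtering on that field.

```typescript
// Hypothetical in-memory model of @derivedFrom(field: "post").
// Only the comment stores the relationship; the post has no stored array.
interface PostEntity { id: string; title: string }
interface CommentEntity { id: string; content: string; post: string } // post = Post ID

const posts = new Map<string, PostEntity>()
const comments = new Map<string, CommentEntity>()

function addComment(c: CommentEntity): void {
  // Writing a comment never touches the post row.
  comments.set(c.id, c)
}

// Equivalent of resolving Post.comments: derived at read time from Comment.post.
function commentsOf(postId: string): CommentEntity[] {
  return Array.from(comments.values()).filter((c) => c.post === postId)
}

// Reverse lookup: which post does a comment belong to?
function postOf(commentId: string): PostEntity | undefined {
  const c = comments.get(commentId)
  return c === undefined ? undefined : posts.get(c.post)
}

posts.set('p1', { id: 'p1', title: 'Hello' })
addComment({ id: 'c1', content: 'First!', post: 'p1' })
addComment({ id: 'c2', content: 'Nice post', post: 'p1' })
console.log(commentsOf('p1').map((c) => c.id)) // [ 'c1', 'c2' ]
console.log(postOf('c2')?.title) // Hello
```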

### Example Use Case for `@derivedFrom`

An example of a dynamically growing array is a blogging platform where a `Post` can have many `Comment`s.

Let's start with our two entities, `Post` and `Comment`.

Without optimization, you could implement it like this with an array:

```graphql
type Post @entity {
  id: Bytes!
  title: String!
  content: String!
  comments: [Comment!]!
}

type Comment @entity {
  id: Bytes!
  content: String!
}
```

Arrays like these effectively store extra `Comment` data on the `Post` side of the relationship.

Here's what an optimized version looks like using `@derivedFrom`:

```graphql
type Post @entity {
  id: Bytes!
  title: String!
  content: String!
  comments: [Comment!]! @derivedFrom(field: "post")
}

type Comment @entity {
  id: Bytes!
  content: String!
  post: Post!
}
```

Just by adding the `@derivedFrom` directive, this schema stores the relationship only on the `Comment` side (each `Comment` references its `Post`) and not on the `Post` side. Without it, the array lives on the `Post` entity and every update to that entity is stored as a new row containing the full array, so an array whose growth is unbounded can lead to particularly large storage sizes.
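A back-of-the-envelope model shows why unbounded arrays are costly. It assumes (an assumption for illustration, not a measured result) that every new comment arrives in a different block and that each update to a mutable `Post` stores a new row version containing the full `comments` array. Under that assumption the stored array elements grow quadratically, while with `@derivedFrom` each comment is written once.

```typescript
// Back-of-the-envelope model. Assumptions: each new comment lands in a
// different block, and every update to a mutable Post stores a new row
// version containing the full comments array.
function storedIdsWithArray(commentCount: number): number {
  let total = 0
  for (let n = 1; n <= commentCount; n++) {
    total += n // version n of the Post holds n comment IDs
  }
  return total // equals commentCount * (commentCount + 1) / 2
}

// With @derivedFrom, each Comment row stores one post ID and the Post is never rewritten.
function storedIdsWithDerivedFrom(commentCount: number): number {
  return commentCount
}

for (const n of [10, 1_000, 10_000]) {
  console.log(n, storedIdsWithArray(n), storedIdsWithDerivedFrom(n))
}
// 10 55 10
// 1000 500500 1000
// 10000 50005000 10000
```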

This will not only make our subgraph more efficient, but it will also unlock three features:

1. We can query the `Post` and see all of its comments.
2. We can do a reverse lookup and query any `Comment` and see which post it comes from.
3. We can use [Derived Field Loaders](/developing/graph-ts/api/#looking-up-derived-entities) to unlock the ability to directly access and manipulate data from virtual relationships in our subgraph mappings.

## Conclusion

Adopting the `@derivedFrom` directive in subgraphs effectively handles dynamically growing arrays, enhancing indexing efficiency and data retrieval.

To learn more detailed strategies to avoid large arrays, read this blog post by Kevin Jones: [Best Practices in Subgraph Development: Avoiding Large Arrays](https://thegraph.com/blog/improve-subgraph-performance-avoiding-large-arrays/).
Lines changed: 176 additions & 0 deletions
@@ -0,0 +1,176 @@
---
title: Subgraph Best Practice 3 - Improve Indexing and Query Performance by Using Immutable Entities and Bytes as IDs
---

## TLDR

Using Immutable Entities and Bytes for IDs in our `schema.graphql` file [significantly improves](https://thegraph.com/blog/two-simple-subgraph-performance-improvements/) indexing speed and query performance.

## Immutable Entities

To make an entity immutable, we simply add `(immutable: true)` to its `@entity` directive.

```graphql
type Transfer @entity(immutable: true) {
  id: Bytes!
  from: Bytes!
  to: Bytes!
  value: BigInt!
}
```

By making the `Transfer` entity immutable, `graph-node` is able to process the entity more efficiently, improving indexing speeds and query responsiveness.

Immutable entities never change after they are created. An ideal candidate for an immutable entity is one that directly logs on-chain event data, such as a `Transfer` event being logged as a `Transfer` entity.

### Under the hood

Mutable entities have a "block range" indicating their validity. Updating these entities requires `graph-node` to adjust the block range of previous versions, increasing database workload. Queries also need filtering to find only live entities. Immutable entities are faster because they are all live and, since they won't change, no checks or updates are required while writing, and no filtering is required during queries.
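The block-range bookkeeping can be illustrated with a small model. This is a hypothetical simplification in plain TypeScript (the real storage lives in the database managed by `graph-node`): each version of a mutable entity is valid from the block it was written until the next update closes its range, and a query at a given block must filter for the version whose range contains that block. An immutable entity has a single version that is never closed, so neither the extra write nor the filter is needed.

```typescript
// Hypothetical model of versioned (mutable) entity storage.
// Each version is valid for blocks in [from, to); to = null means "still live".
interface Version<T> { from: number; to: number | null; data: T }

class MutableEntity<T> {
  versions: Version<T>[] = []

  // An update must close the previous live version's range (extra write work).
  write(block: number, data: T): void {
    const live = this.versions.find((ver) => ver.to === null)
    if (live !== undefined) live.to = block
    this.versions.push({ from: block, to: null, data })
  }

  // A query must filter versions by block range to find the one valid at `block`.
  at(block: number): T | undefined {
    const match = this.versions.find((ver) => ver.from <= block && (ver.to === null || block < ver.to))
    return match?.data
  }
}

const account = new MutableEntity<{ status: string }>()
account.write(100, { status: 'pending' })
account.write(105, { status: 'settled' })
console.log(account.at(102)?.status) // pending
console.log(account.at(110)?.status) // settled
console.log(account.versions.length) // 2 (an immutable entity would only ever have 1)
```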

### When not to use Immutable Entities

If you have a field like `status` that needs to be modified over time, you should not make the entity immutable. Otherwise, you should use immutable entities whenever possible.

## Bytes as IDs

Every entity requires an ID. In the previous example, we can see that the ID is already of the `Bytes` type.

```graphql
type Transfer @entity(immutable: true) {
  id: Bytes!
  from: Bytes!
  to: Bytes!
  value: BigInt!
}
```

While other ID types such as `String` and `Int8` are possible, it is recommended to use the `Bytes` type for all IDs: character strings take twice as much space as byte strings to store binary data, and comparisons of UTF-8 character strings must take the locale into account, which is much more expensive than the bytewise comparison used to compare byte strings.
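The "twice as much space" figure follows from hex encoding: each byte becomes two hex characters, so a 32-byte transaction hash stored as a hex string takes 64 characters plus the `0x` prefix, versus 32 bytes as `Bytes`. A minimal illustration in plain TypeScript, assuming one byte per ASCII character in the string representation:

```typescript
// Illustration: size of a 32-byte hash as raw bytes vs. as a hex string.
// Assumes the string form is "0x" followed by two ASCII hex characters per byte.
function toHexString(bytes: Uint8Array): string {
  let hex = '0x'
  for (const b of Array.from(bytes)) {
    hex += b.toString(16).padStart(2, '0')
  }
  return hex
}

const hash = new Uint8Array(32).fill(0xab) // stand-in for a transaction hash
const hex = toHexString(hash)

console.log(hash.length) // 32 (bytes, stored as Bytes)
console.log(hex.length) // 66 (characters, stored as a String: "0x" + 64)
```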

### Reasons to Not Use Bytes as IDs

1. If entity IDs must be human-readable, such as auto-incremented numerical IDs or readable strings, Bytes IDs should not be used.
2. If a subgraph's data is integrated with another data model that does not use Bytes as IDs, Bytes IDs should not be used.
3. Indexing and querying performance improvements are not desired.

### Concatenating With Bytes as IDs

It is a common practice in many subgraphs to use string concatenation to combine two properties of an event into a single ID, such as `event.transaction.hash.toHex() + "-" + event.logIndex.toString()`. However, because this returns a string, it significantly impedes subgraph indexing and querying performance.

Instead, we should use the `concatI32()` method to concatenate event properties. This strategy results in a `Bytes` ID that is much more performant.

```typescript
// Generated import paths depend on your data source name; these are illustrative.
import { Transfer as TransferEvent } from '../generated/ERC20/ERC20'
import { Transfer } from '../generated/schema'

export function handleTransfer(event: TransferEvent): void {
  // Bytes ID: the transaction hash with the log index appended via concatI32
  let entity = new Transfer(event.transaction.hash.concatI32(event.logIndex.toI32()))
  entity.from = event.params.from
  entity.to = event.params.to
  entity.value = event.params.value

  entity.blockNumber = event.block.number
  entity.blockTimestamp = event.block.timestamp
  entity.transactionHash = event.transaction.hash

  entity.save()
}
```
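To see what kind of ID this produces, the sketch below mimics the concatenation in plain TypeScript. It assumes `concatI32` appends the 4-byte little-endian encoding of the integer to the existing bytes; treat it as an illustration rather than the `graph-ts` implementation. The 4-byte hash is a shortened stand-in for a real 32-byte hash.

```typescript
// Hypothetical re-implementation for illustration (not the graph-ts source):
// append a 32-bit integer to a byte array in little-endian order.
function concatI32(bytes: Uint8Array, value: number): Uint8Array {
  const out = new Uint8Array(bytes.length + 4)
  out.set(bytes, 0)
  new DataView(out.buffer).setInt32(bytes.length, value, true) // true = little-endian
  return out
}

function hex(bytes: Uint8Array): string {
  return '0x' + Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('')
}

const txHash = Uint8Array.from([0xde, 0xad, 0xbe, 0xef]) // shortened stand-in for a 32-byte hash
console.log(hex(concatI32(txHash, 1))) // 0xdeadbeef01000000
console.log(hex(concatI32(txHash, 256))) // 0xdeadbeef00010000

// For comparison, the string-based ID is text that must be stored and compared as a String:
console.log(hex(txHash) + '-' + (1).toString()) // 0xdeadbeef-1
```

With a real 32-byte hash, the resulting `Bytes` ID is 36 bytes, while the equivalent hex string ID is more than 66 characters.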

### Sorting With Bytes as IDs

Sorting using Bytes as IDs is not optimal, as seen in this example query and response.

Query:

```graphql
{
  transfers(first: 3, orderBy: id) {
    id
    from
    to
    value
  }
}
```

Query response:

```json
{
  "data": {
    "transfers": [
      {
        "id": "0x00010000",
        "from": "0xabcd...",
        "to": "0x1234...",
        "value": "256"
      },
      {
        "id": "0x00020000",
        "from": "0xefgh...",
        "to": "0x5678...",
        "value": "512"
      },
      {
        "id": "0x01000000",
        "from": "0xijkl...",
        "to": "0x9abc...",
        "value": "1"
      }
    ]
  }
}
```

The IDs are returned as hex strings and are ordered by their raw bytes rather than numerically, so `0x01000000` sorts after `0x00020000`.
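The ordering in the response above can be reproduced by sorting byte arrays lexicographically, which is how `Bytes` IDs are compared. In this plain TypeScript sketch, the three IDs are assumed to be the little-endian encodings of 256, 512, and 1 (an assumption matching the example's `value` column): a bytewise sort puts 1 last, while a numeric sort gives 1, 256, 512.

```typescript
// Sketch: bytewise (lexicographic) ordering of Bytes IDs vs. numeric ordering.
function compareBytes(a: Uint8Array, b: Uint8Array): number {
  const n = Math.min(a.length, b.length)
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return a[i] - b[i]
  }
  return a.length - b.length // a shorter prefix sorts first
}

// Assumption: each ID is the 4-byte little-endian encoding of a number.
function leBytes(value: number): Uint8Array {
  const out = new Uint8Array(4)
  new DataView(out.buffer).setUint32(0, value, true)
  return out
}

const values = [1, 256, 512]
const byId = values.slice().sort((x, y) => compareBytes(leBytes(x), leBytes(y)))
const byNumber = values.slice().sort((x, y) => x - y)

console.log(byId) // [ 256, 512, 1 ] (the order of the response above)
console.log(byNumber) // [ 1, 256, 512 ] (numeric order, as a BigInt field would sort)
```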

To improve sorting, we should create another field on the entity that is a `BigInt`.

```graphql
type Transfer @entity {
  id: Bytes!
  from: Bytes! # address
  to: Bytes! # address
  value: BigInt! # uint256
  tokenId: BigInt! # uint256
}
```

This allows results to be sorted in sequential numeric order.

Query:

```graphql
{
  transfers(first: 3, orderBy: tokenId) {
    id
    tokenId
  }
}
```

Query response:

```json
{
  "data": {
    "transfers": [
      {
        "id": "0x…",
        "tokenId": "1"
      },
      {
        "id": "0x…",
        "tokenId": "2"
      },
      {
        "id": "0x…",
        "tokenId": "3"
      }
    ]
  }
}
```

## Conclusion

Using both Immutable Entities and Bytes as IDs has been shown to markedly improve subgraph efficiency. Specifically, tests have highlighted up to a 28% increase in query performance and up to a 48% acceleration in indexing speeds.

Read more about using Immutable Entities and Bytes as IDs in this blog post by David Lutterkort, a Software Engineer at Edge & Node: [Two Simple Subgraph Performance Improvements](https://thegraph.com/blog/two-simple-subgraph-performance-improvements/).
