Commit 05eb248

Regenerate MLIR Bindings (#1671)
Co-authored-by: enzyme-ci-bot[bot] <78882869+enzyme-ci-bot[bot]@users.noreply.github.com>
1 parent 330c6b2 commit 05eb248

6 files changed: 193 additions and 82 deletions

src/mlir/Dialects/Arith.jl

Lines changed: 68 additions & 68 deletions
@@ -67,14 +67,14 @@ end
 """
 `addi`

-Performs N-bit addition on the operands. The operands are interpreted as
-unsigned bitvectors. The result is represented by a bitvector containing the
-mathematical value of the addition modulo 2^n, where `n` is the bitwidth.
-Because `arith` integers use a two\'s complement representation, this operation
+Performs N-bit addition on the operands. The operands are interpreted as
+unsigned bitvectors. The result is represented by a bitvector containing the
+mathematical value of the addition modulo 2^n, where `n` is the bitwidth.
+Because `arith` integers use a two\'s complement representation, this operation
 is applicable on both signed and unsigned integer operands.

 The `addi` operation takes two operands and returns one result, each of
-these is required to be the same type. This type may be an integer scalar type,
+these is required to be the same type. This type may be an integer scalar type,
 a vector whose element type is integer, or a tensor of integers.

 This op supports `nuw`/`nsw` overflow flags which stands for
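
The modulo 2^n behavior described in the `addi` docstring can be mimicked in plain Julia, whose fixed-width integers also wrap around. This is only an illustration: `UInt8` and `Int8` stand in for an `i8` operand, and nothing here calls the generated `addi` binding.

```julia
# Illustration only: Julia's fixed-width integers wrap modulo 2^n,
# like the result of `arith.addi` without overflow flags.
a = 0xc8   # 200 as UInt8, bit pattern 0b11001000
b = 0x64   # 100 as UInt8, bit pattern 0b01100100

unsigned_sum = a + b                                       # (200 + 100) mod 2^8
signed_sum = reinterpret(Int8, a) + reinterpret(Int8, b)   # -56 + 100

# The resulting bit pattern is the same under both interpretations.
same_bits = reinterpret(UInt8, signed_sum) == unsigned_sum

# Exhaustive check over all pairs of 8-bit patterns.
all_match = all(
    reinterpret(UInt8, reinterpret(Int8, x) + reinterpret(Int8, y)) == x + y for
    x in 0x00:0xff, y in 0x00:0xff
)
```

Because the bits agree for every pair of inputs, a single addition covers both signed and unsigned operands, which is the point the docstring makes.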
@@ -253,9 +253,9 @@ end

 Signed integer division. Rounds towards positive infinity, i.e. `7 / -2 = -3`.

-Divison by zero, or signed division overflow (minimum value divided by -1)
-is undefined behavior. When applied to `vector` and `tensor` values, the
-behavior is undefined if _any_ of its elements are divided by zero or has a
+Divison by zero, or signed division overflow (minimum value divided by -1)
+is undefined behavior. When applied to `vector` and `tensor` values, the
+behavior is undefined if _any_ of its elements are divided by zero or has a
 signed division overflow.

 # Example
@@ -292,10 +292,10 @@ end

 Unsigned integer division. Rounds towards positive infinity. Treats the
 leading bit as the most significant, i.e. for `i16` given two\'s complement
-representation, `6 / -2 = 6 / (2^16 - 2) = 1`.
+representation, `6 / -2 = 6 / (2^16 - 2) = 1`.

-Division by zero is undefined behavior. When applied to `vector` and
-`tensor` values, the behavior is undefined if _any_ elements are divided by
+Division by zero is undefined behavior. When applied to `vector` and
+`tensor` values, the behavior is undefined if _any_ elements are divided by
 zero.

 # Example
@@ -543,9 +543,9 @@ end
 Signed integer division. Rounds towards zero. Treats the leading bit as
 sign, i.e. `6 / -2 = -3`.

-Divison by zero, or signed division overflow (minimum value divided by -1)
-is undefined behavior. When applied to `vector` and `tensor` values, the
-behavior is undefined if _any_ of its elements are divided by zero or has a
+Divison by zero, or signed division overflow (minimum value divided by -1)
+is undefined behavior. When applied to `vector` and `tensor` values, the
+behavior is undefined if _any_ of its elements are divided by zero or has a
 signed division overflow.

 # Example
@@ -590,8 +590,8 @@ Unsigned integer division. Rounds towards zero. Treats the leading bit as
 the most significant, i.e. for `i16` given two\'s complement representation,
 `6 / -2 = 6 / (2^16 - 2) = 0`.

-Division by zero is undefined behavior. When applied to `vector` and
-`tensor` values, the behavior is undefined if _any_ elements are divided by
+Division by zero is undefined behavior. When applied to `vector` and
+`tensor` values, the behavior is undefined if _any_ elements are divided by
 zero.

 # Example
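
The `6 / -2` arithmetic quoted in the unsigned division docstrings can be reproduced in plain Julia. This is a sketch only: `UInt16` stands in for an `i16` read as unsigned, and `cld` and `div` are Julia's own functions rather than the generated bindings.

```julia
# Illustration only: `UInt16` stands in for `i16` viewed as unsigned.
lhs = UInt16(6)
rhs = reinterpret(UInt16, Int16(-2))   # bit pattern of -2, i.e. 2^16 - 2

ceil_quotient = cld(lhs, rhs)    # rounds up, as described for the ceiling variant
trunc_quotient = div(lhs, rhs)   # rounds towards zero, as described for plain division
```

Note that Julia raises a `DivideError` for a zero divisor, whereas the docstrings above leave that case undefined, so the sketch does not model it.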
@@ -790,9 +790,9 @@ end

 Signed integer division. Rounds towards negative infinity, i.e. `5 / -2 = -3`.

-Divison by zero, or signed division overflow (minimum value divided by -1)
-is undefined behavior. When applied to `vector` and `tensor` values, the
-behavior is undefined if _any_ of its elements are divided by zero or has a
+Divison by zero, or signed division overflow (minimum value divided by -1)
+is undefined behavior. When applied to `vector` and `tensor` values, the
+behavior is undefined if _any_ of its elements are divided by zero or has a
 signed division overflow.

 # Example
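
The three signed rounding modes that appear in the hunks above (towards positive infinity, towards zero, towards negative infinity) have direct counterparts in Julia's `cld`, `div`, and `fld`. The sketch below uses those Julia functions on ordinary `Int` values purely as an illustration; it does not exercise the MLIR ops.

```julia
# Julia counterparts of the three rounding modes (illustration only).
ceil_q = cld(7, -2)    # towards positive infinity, the `7 / -2` case above
trunc_q = div(6, -2)   # towards zero, the `6 / -2` case above
floor_q = fld(5, -2)   # towards negative infinity, the `5 / -2` case above

# The modes only differ when the exact quotient is not an integer:
# here the exact quotient is -3.5.
modes_for_7_by_minus_2 = (cld(7, -2), div(7, -2), fld(7, -2))
```

All three docstring examples happen to yield `-3`, so the last line is the more telling one: the same operands give different quotients depending on the rounding mode.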
@@ -1467,8 +1467,8 @@ end
 Signed integer division remainder. Treats the leading bit as sign, i.e. `6 %
 -2 = 0`.

-Division by zero is undefined behavior. When applied to `vector` and
-`tensor` values, the behavior is undefined if _any_ elements are divided by
+Division by zero is undefined behavior. When applied to `vector` and
+`tensor` values, the behavior is undefined if _any_ elements are divided by
 zero.

 # Example
@@ -1512,8 +1512,8 @@ end
 Unsigned integer division remainder. Treats the leading bit as the most
 significant, i.e. for `i16`, `6 % -2 = 6 % (2^16 - 2) = 6`.

-Division by zero is undefined behavior. When applied to `vector` and
-`tensor` values, the behavior is undefined if _any_ elements are divided by
+Division by zero is undefined behavior. When applied to `vector` and
+`tensor` values, the behavior is undefined if _any_ elements are divided by
 zero.

 # Example
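
The two remainder examples quoted above differ only in how the bits of `-2` are read. A minimal Julia sketch, with `Int16` and `UInt16` standing in for `i16` and Julia's `rem` standing in for the ops:

```julia
# Illustration only: the same 16-bit pattern read as signed and as unsigned.
divisor_signed = Int16(-2)
divisor_unsigned = reinterpret(UInt16, divisor_signed)   # 2^16 - 2

signed_remainder = rem(Int16(6), divisor_signed)         # 6 % -2
unsigned_remainder = rem(UInt16(6), divisor_unsigned)    # 6 % (2^16 - 2)
```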
@@ -1581,36 +1581,36 @@ end
 """
 `scaling_extf`

-This operation upcasts input floating-point values using provided scale
-values. It expects both scales and the input operand to be of the same shape,
-making the operation elementwise. Scales are usually calculated per block
+This operation upcasts input floating-point values using provided scale
+values. It expects both scales and the input operand to be of the same shape,
+making the operation elementwise. Scales are usually calculated per block
 following the OCP MXFP spec as described in https://arxiv.org/abs/2310.10537.

-If scales are calculated per block where blockSize != 1, then scales may
-require broadcasting to make this operation elementwise. For example, let\'s
-say the input is of shape `<dim1 x dim2 x ... dimN>`. Given blockSize != 1 and
-assuming quantization happens on the last axis, the input can be reshaped to
-`<dim1 x dim2 x ... (dimN/blockSize) x blockSize>`. Scales will be calculated
-per block on the last axis. Therefore, scales will be of shape
-`<dim1 x dim2 x ... (dimN/blockSize) x 1>`. Scales could also be of some other
-shape as long as it is broadcast compatible with the input, e.g.,
+If scales are calculated per block where blockSize != 1, then scales may
+require broadcasting to make this operation elementwise. For example, let\'s
+say the input is of shape `<dim1 x dim2 x ... dimN>`. Given blockSize != 1 and
+assuming quantization happens on the last axis, the input can be reshaped to
+`<dim1 x dim2 x ... (dimN/blockSize) x blockSize>`. Scales will be calculated
+per block on the last axis. Therefore, scales will be of shape
+`<dim1 x dim2 x ... (dimN/blockSize) x 1>`. Scales could also be of some other
+shape as long as it is broadcast compatible with the input, e.g.,
 `<1 x 1 x ... (dimN/blockSize) x 1>`.

-In this example, before calling into `arith.scaling_extf`, scales must be
-broadcasted to `<dim1 x dim2 x dim3 ... (dimN/blockSize) x blockSize>`. Note
-that there could be multiple quantization axes. Internally,
+In this example, before calling into `arith.scaling_extf`, scales must be
+broadcasted to `<dim1 x dim2 x dim3 ... (dimN/blockSize) x blockSize>`. Note
+that there could be multiple quantization axes. Internally,
 `arith.scaling_extf` would perform the following:
-
+
 ```
-resultTy = get_type(result)
+resultTy = get_type(result)
 scaleTy = get_type(scale)
 inputTy = get_type(input)
 scale.exponent = arith.truncf(scale) : scaleTy to f8E8M0
 scale.extf = arith.extf(scale.exponent) : f8E8M0 to resultTy
 input.extf = arith.extf(input) : inputTy to resultTy
 result = arith.mulf(scale.extf, input.extf)
 ```
-It propagates NaN values. Therefore, if either scale or the input element
+It propagates NaN values. Therefore, if either scale or the input element
 contains NaN, then the output element value will also be a NaN.
 """
 function scaling_extf(
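
The broadcasting requirement and the extend-then-multiply steps in the `scaling_extf` docstring can be sketched with ordinary Julia arrays. Everything concrete here is an assumption made for illustration: `Float16` stands in for the low-precision input type, `Float32` for the result type, the shapes and values are made up, and the scales are chosen as exact powers of two so that the `f8E8M0` round trip in the pseudocode would leave them unchanged. It does not call the generated binding.

```julia
# Sketch under assumptions: Float16 models the input type, Float32 the result
# type, and the scales are already exact powers of two.
block_size = 4
input = Float16.(reshape(1:16, 2, 8))   # shape <2 x 8>, blocks along the last axis
scales = Float32[1 2; 4 8]              # one scale per block: shape <2 x (8/4)>

# Broadcast the scales to the input shape by repeating each one
# `block_size` times along the last axis.
scales_full = repeat(scales; inner=(1, block_size))   # shape <2 x 8>

# Elementwise: extend the input to the result type, then multiply by the scale.
result = scales_full .* Float32.(input)

# A NaN in either operand propagates to the corresponding output element.
nan_result = NaN32 * 2.0f0
```

Repeating each scale along the blocked axis is one way to obtain the broadcast shape; reshaping the input to `<2 x 2 x 4>` and broadcasting scales of shape `<2 x 2 x 1>` would describe the same computation.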
@@ -1638,28 +1638,28 @@ end
 """
 `scaling_truncf`

-This operation downcasts input using the provided scale values. It expects
-both scales and the input operand to be of the same shape and, therefore,
-makes the operation elementwise. Scales are usually calculated per block
+This operation downcasts input using the provided scale values. It expects
+both scales and the input operand to be of the same shape and, therefore,
+makes the operation elementwise. Scales are usually calculated per block
 following the OCP MXFP spec as described in https://arxiv.org/abs/2310.10537.
 Users are required to normalize and clamp the scales as necessary before calling
 passing them to this operation. OCP MXFP spec also does the flushing of denorms
-on the input operand, which should be handled during lowering by passing appropriate
-fastMath flag to this operation.
-
-If scales are calculated per block where blockSize != 1, scales may require
-broadcasting to make this operation elementwise. For example, let\'s say the
-input is of shape `<dim1 x dim2 x ... dimN>`. Given blockSize != 1 and
-assuming quantization happens on the last axis, the input can be reshaped to
-`<dim1 x dim2 x ... (dimN/blockSize) x blockSize>`. Scales will be calculated
-per block on the last axis. Therefore, scales will be of shape
-`<dim1 x dim2 x ... (dimN/blockSize) x 1>`. Scales could also be of some other
-shape as long as it is broadcast compatible with the input, e.g.,
+on the input operand, which should be handled during lowering by passing appropriate
+fastMath flag to this operation.
+
+If scales are calculated per block where blockSize != 1, scales may require
+broadcasting to make this operation elementwise. For example, let\'s say the
+input is of shape `<dim1 x dim2 x ... dimN>`. Given blockSize != 1 and
+assuming quantization happens on the last axis, the input can be reshaped to
+`<dim1 x dim2 x ... (dimN/blockSize) x blockSize>`. Scales will be calculated
+per block on the last axis. Therefore, scales will be of shape
+`<dim1 x dim2 x ... (dimN/blockSize) x 1>`. Scales could also be of some other
+shape as long as it is broadcast compatible with the input, e.g.,
 `<1 x 1 x ... (dimN/blockSize) x 1>`.

-In this example, before calling into `arith.scaling_truncf`, scales must be
-broadcasted to `<dim1 x dim2 x dim3 ... (dimN/blockSize) x blockSize>`. Note
-that there could be multiple quantization axes. Internally,
+In this example, before calling into `arith.scaling_truncf`, scales must be
+broadcasted to `<dim1 x dim2 x dim3 ... (dimN/blockSize) x blockSize>`. Note
+that there could be multiple quantization axes. Internally,
 `arith.scaling_truncf` would perform the following:

 ```
@@ -1704,9 +1704,9 @@ end
 """
 `shli`

-The `shli` operation shifts the integer value of the first operand to the left
-by the integer value of the second operand. The second operand is interpreted as
-unsigned. The low order bits are filled with zeros. If the value of the second
+The `shli` operation shifts the integer value of the first operand to the left
+by the integer value of the second operand. The second operand is interpreted as
+unsigned. The low order bits are filled with zeros. If the value of the second
 operand is greater or equal than the bitwidth of the first operand, then the
 operation returns poison.

@@ -1721,7 +1721,7 @@ This op supports `nuw`/`nsw` overflow flags which stands for
 %1 = arith.constant 5 : i8 // %1 is 0b00000101
 %2 = arith.constant 3 : i8
 %3 = arith.shli %1, %2 : i8 // %3 is 0b00101000
-%4 = arith.shli %1, %2 overflow<nsw, nuw> : i8
+%4 = arith.shli %1, %2 overflow<nsw, nuw> : i8
 ```
 """
 function shli(
@@ -1755,11 +1755,11 @@ end
 """
 `shrsi`

-The `shrsi` operation shifts an integer value of the first operand to the right
-by the value of the second operand. The first operand is interpreted as signed,
-and the second operand is interpreter as unsigned. The high order bits in the
-output are filled with copies of the most-significant bit of the shifted value
-(which means that the sign of the value is preserved). If the value of the second
+The `shrsi` operation shifts an integer value of the first operand to the right
+by the value of the second operand. The first operand is interpreted as signed,
+and the second operand is interpreter as unsigned. The high order bits in the
+output are filled with copies of the most-significant bit of the shifted value
+(which means that the sign of the value is preserved). If the value of the second
 operand is greater or equal than bitwidth of the first operand, then the operation
 returns poison.

@@ -1798,9 +1798,9 @@ end
 """
 `shrui`

-The `shrui` operation shifts an integer value of the first operand to the right
+The `shrui` operation shifts an integer value of the first operand to the right
 by the value of the second operand. The first operand is interpreted as unsigned,
-and the second operand is interpreted as unsigned. The high order bits are always
+and the second operand is interpreted as unsigned. The high order bits are always
 filled with zeros. If the value of the second operand is greater or equal than the
 bitwidth of the first operand, then the operation returns poison.

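
The difference between the three shift docstrings above comes down to which bits are shifted in. A short Julia sketch, with `Int8` and `UInt8` standing in for `i8` and Julia's `<<`, `>>`, and `>>>` operators standing in for the ops:

```julia
# Illustration only: Int8/UInt8 stand in for `i8`.
value = Int8(5)       # 0b00000101
left = value << 3     # zeros fill the low bits, as in the `shli` example

negative = Int8(-8)                            # 0b11111000
arithmetic = negative >> 1                     # sign bit copied in, like `shrsi`
logical = reinterpret(UInt8, negative) >>> 1   # zeros shifted in, like `shrui`
```

Julia gives a defined result for shift amounts at or beyond the bit width, whereas the ops above return poison in that case, so the sketch deliberately stays within range.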

src/mlir/Dialects/MemRef.jl

Lines changed: 2 additions & 2 deletions
@@ -218,7 +218,7 @@ element type of the memref.

 A set `nontemporal` attribute indicates that this load is not expected to
 be reused in the cache. For details, refer to the
-[https://llvm.org/docs/LangRef.html#load-instruction](LLVM load instruction).
+[LLVM load instruction](https://llvm.org/docs/LangRef.html#load-instruction).

 An optional `alignment` attribute allows to specify the byte alignment of the
 load operation. It must be a positive power of 2. The operation must access
@@ -1527,7 +1527,7 @@ memref would cause signed overflow of the `index` type.

 A set `nontemporal` attribute indicates that this store is not expected to
 be reused in the cache. For details, refer to the
-[https://llvm.org/docs/LangRef.html#store-instruction](LLVM store instruction).
+[LLVM store instruction](https://llvm.org/docs/LangRef.html#store-instruction).

 An optional `alignment` attribute allows to specify the byte alignment of the
 store operation. It must be a positive power of 2. The operation must access

src/mlir/Dialects/MosaicGPU.jl

Lines changed: 52 additions & 0 deletions
@@ -123,6 +123,58 @@ function async_load_tmem(
     )
 end

+"""
+`async_prefetch`
+
+Schedules an async prefetch of the contents of the `source` MemRef in GMEM
+to the L2 cache, making subsequent loads of the same data from GMEM faster.
+
+The `indices` and `slice_lengths` inputs define what slice of the GMEM
+`source` is going to be prefetched. Both `indices` and `slice_lengths` must
+have a length equal to the rank of the `source`. The values in `indices` are
+the starting indices of each dimension and the values in `slice_lengths` are
+the lengths. Providing -1 in `slice_lengths` indicates that the slice length
+is 1.
+
+The `collective` attribute can be provided to partition the prefetch over
+multiple blocks in a cluster.
+
+The `predicate` allows scheduling the prefetch conditionally.
+"""
+function async_prefetch(
+    source::Value,
+    indices::Vector{Value},
+    predicate=nothing::Union{Nothing,Value};
+    slice_lengths,
+    collective,
+    location=Location(),
+)
+    op_ty_results = IR.Type[]
+    operands = Value[source, indices...]
+    owned_regions = Region[]
+    successors = Block[]
+    attributes = NamedAttribute[
+        namedattribute("slice_lengths", slice_lengths),
+        namedattribute("collective", collective),
+    ]
+    !isnothing(predicate) && push!(operands, predicate)
+    push!(
+        attributes,
+        operandsegmentsizes([1, length(indices), (predicate == nothing) ? 0 : 1]),
+    )
+
+    return create_operation(
+        "mosaic_gpu.async_prefetch",
+        location;
+        operands,
+        owned_regions,
+        successors,
+        attributes,
+        results=op_ty_results,
+        result_inference=false,
+    )
+end
+
 """
 `async_store`

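
The slice semantics in the `async_prefetch` docstring, and the operand segment sizes the binding computes, can be modelled with plain integers. The sketch below is not part of the generated bindings: `prefetch_ranges` and `segment_sizes` are hypothetical helper names, the shapes are invented, and treating indices as zero-based offsets is an assumption of this sketch.

```julia
# Hypothetical helper, not part of the generated bindings: it models which
# elements a prefetch would cover, using plain integers instead of `Value`s.
# Indices are treated as zero-based offsets (an assumption of this sketch).
function prefetch_ranges(source_shape, indices, slice_lengths)
    rank = length(source_shape)
    length(indices) == rank ||
        throw(ArgumentError("indices must match the source rank"))
    length(slice_lengths) == rank ||
        throw(ArgumentError("slice_lengths must match the source rank"))
    return map(indices, slice_lengths) do start, len
        effective = len == -1 ? 1 : len   # -1 means a slice of length 1
        start:(start + effective - 1)
    end
end

ranges = prefetch_ranges((128, 64), [32, 0], [-1, 64])

# Operand segment sizes as computed in the binding: one source,
# one entry per index, and zero or one predicate.
segment_sizes(indices, predicate) = [1, length(indices), predicate === nothing ? 0 : 1]
```

With these inputs the first dimension is pinned to a single row at offset 32 and the second covers all 64 columns. Calling `segment_sizes` with and without a predicate shows how the optional operand changes only the last entry.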
