# 'acc' Dialect

The `acc` dialect is an MLIR dialect for representing the OpenACC
programming model. OpenACC is a standardized directive-based model which
is used with C, C++, and Fortran to enable programmers to expose
parallelism in their code. The descriptive approach used by OpenACC
allows targeting of parallel multicore and accelerator targets like GPUs
by giving the compiler the freedom to decide how to parallelize for
specific architectures. OpenACC also provides the ability to optimize the
parallelism through increasingly prescriptive clauses.

This dialect models the constructs from the
[OpenACC 3.3 specification](https://www.openacc.org/sites/default/files/inline-images/Specification/OpenACC-3.3-final.pdf).

This document describes the design of the OpenACC dialect in MLIR. It
lists and explains design goals and design choices along with their
rationale. It also describes specifics with regard to `acc` dialect
operations, types, and attributes.

[TOC]

## Dialect Design Goals

* Needs to have a complete representation of the OpenACC language.
  - A frontend requires this in order to properly generate a
    representation of possible `acc` pragmas in MLIR. Additionally,
    this dialect is expected to be further lowered when materializing
    its semantics. Without a complete representation, a frontend might
    choose a lower abstraction (such as direct runtime calls), but this
    would impact the ability to do analysis and optimizations on the
    dialect.
* Allow representation at the same semantic level as the OpenACC
  language while being able to represent nuances of the source
  language semantics (such as Fortran descriptors) in an agnostic manner.
  - Using abstractions that closely model the OpenACC language
    simplifies frontend implementation. It also allows for easier
    debugging of the IR. However, sometimes source language specific
    behavior is needed when materializing OpenACC. In these cases, such
    as privatization of C++ objects with a default constructor, the
    frontend fills in the `recipe` along with the `private` operation,
    which can be packaged neatly with the `acc` dialect operations.
* Be able to regenerate the semantic equivalent of the user pragmas from
  the dialect (including bounds, names, clauses, modifiers, etc.).
  - This is a strong measure of making sure that the dialect is not
    lossy in semantics. It also makes it possible to generate
    appropriate and useful debug information outside of the frontend.
* Be dialect agnostic so that it can be used and coexist with other
  dialects including but not limited to `hlfir`, `fir`, `llvm`, `cir`.
  - Directive-based models such as OpenACC are always used with a
    source language, so the `acc` dialect coexisting with other
    dialect(s) is necessary by construction. Through proper
    abstractions, neither the `acc` dialect nor the source language
    dialect should depend on the other; where needed,
    interfaces should be used to ensure the `acc` dialect can verify
    expected properties.
* The dialect must allow dataflow to be modeled accurately and
  performantly using MLIR's existing facilities.
  - Appropriate dataflow modeling is important for analyses and IR
    reasoning, even for something as simple as walking the uses. Therefore
    operations, like data operations, are expected to generate results
    which can be used in modeling behavior. For example, consider an
    `acc copyin` clause. After the `acc.copyin` operation, a pointer
    which lives on the device should be distinguishable from one that
    lives in host memory.
* Be friendly to MLIR optimization passes by implementing common
  interfaces.
  - Interfaces, such as `MemoryEffects`, are the key way MLIR
    transformations and analyses are designed to interact with the IR.
    In order for the operations in the `acc` dialect to be optimizable
    (either directly or indirectly, by not blocking optimizations
    of nested IR), they need to implement the relevant common interfaces.

The design philosophy of the `acc` dialect is to adhere to these design
goals: all current and planned operations, attributes, and types must
satisfy them.

## Operation Categories

The OpenACC dialect includes high-level operations (which retain
the same semantic meaning as their OpenACC language equivalent),
intermediate-level operations (which are used to decompose clauses
from constructs), and low-level operations (which encode specifics
associated with the source language in a generic way).

The high-level operations list contains the following OpenACC language
constructs and their corresponding operations:
* `acc parallel` → `acc.parallel`
* `acc kernels` → `acc.kernels`
* `acc serial` → `acc.serial`
* `acc data` → `acc.data`
* `acc loop` → `acc.loop`
* `acc enter data` → `acc.enter_data`
* `acc exit data` → `acc.exit_data`
* `acc host_data` → `acc.host_data`
* `acc init` → `acc.init`
* `acc shutdown` → `acc.shutdown`
* `acc update` → `acc.update`
* `acc set` → `acc.set`
* `acc wait` → `acc.wait`
* `acc atomic read` → `acc.atomic.read`
* `acc atomic write` → `acc.atomic.write`
* `acc atomic update` → `acc.atomic.update`
* `acc atomic capture` → `acc.atomic.capture`

This second group contains operations which are used to represent
either decomposed constructs or clauses for more accurate modeling:
* `acc routine` → `acc.routine` + `acc.routine_info` attribute
* `acc declare` → `acc.declare_enter` + `acc.declare_exit` or
  `acc.declare`
* `acc {construct} copyin` → `acc.copyin` (before region) +
  `acc.delete` (after region)
* `acc {construct} copy` → `acc.copyin` (before region) +
  `acc.copyout` (after region)
* `acc {construct} copyout` → `acc.create` (before region) +
  `acc.copyout` (after region)
* `acc {construct} attach` → `acc.attach` (before region) +
  `acc.detach` (after region)
* `acc {construct} create` → `acc.create` (before region) +
  `acc.delete` (after region)
* `acc {construct} present` → `acc.present` (before region) +
  `acc.delete` (after region)
* `acc {construct} no_create` → `acc.nocreate` (before region) +
  `acc.delete` (after region)
* `acc {construct} deviceptr` → `acc.deviceptr`
* `acc {construct} private` → `acc.private`
* `acc {construct} firstprivate` → `acc.firstprivate`
* `acc {construct} reduction` → `acc.reduction`
* `acc cache` → `acc.cache`
* `acc update device` → `acc.update_device`
* `acc update host` → `acc.update_host`
* `acc host_data use_device` → `acc.use_device`
* `acc declare device_resident` → `acc.declare_device_resident`
* `acc declare link` → `acc.declare_link`
* `acc exit data delete` → `acc.delete` (with `structured` flag as
  false)
* `acc exit data detach` → `acc.detach` (with `structured` flag as
  false)
* `acc {construct} {data_clause}(var[lb:ub])` → `acc.bounds`

The low-level operations are:
* `acc.private.recipe`
* `acc.reduction.recipe`
* `acc.firstprivate.recipe`
* `acc.global_ctor`
* `acc.global_dtor`
* `acc.yield`
* `acc.terminator`

The semantics of and reasoning behind the low-level operations are
explained further in the sections below.

### Data Operations

#### Data Clause Decomposition

The data clauses are decomposed from their constructs for better
dataflow modeling in MLIR. There are multiple reasons for this which
are consistent with the dialect goals:
* Correctly represents dataflow. Data clauses have different effects
  at entry to a region and at exit from a region.
* Makes it easier to add attributes such as `MemoryEffects` to a single
  operation. This can better reflect semantics (like the fact that an
  `acc.copyin` operation only reads host memory).
* Operations can be moved or optimized individually (e.g., by `CSE`).
* Easier to keep track of debug information. The line location can point
  to the text representing the data clause instead of the construct.
  Additionally, attributes can be used to keep track of variable names in
  clauses without having to walk the IR tree in an attempt to recover the
  information (this makes the `acc` dialect more agnostic with regard to
  the other dialect it is used with).
* Clear operation ordering, since all data operations are on the same
  list.

Each of the `acc` dialect data operations represents either the
entry or the exit portion of the data action specification. Thus,
`acc.copyin` represents the semantics defined in section
`2.7.7 copyin clause` whose wording starts with
`At entry to a region`. The decomposed exit operation `acc.delete`
represents the second part of that section, whose wording starts with
`At exit from the region`. The `delete` action may be performed
after checking and updating the relevant reference counters noted there.

The `acc` data operations, even when decomposed, retain their original
data clause in a `dataClause` field so that this information can be
recovered during debugging. For example, `acc copy`
does not translate to an `acc.copy` operation, but instead to `acc.copyin`
for entry and `acc.copyout` for exit. Both of the decomposed operations
hold a `dataClause` field that specifies this was an `acc copy`.

The link between the decomposed entry and exit operations is the SSA
value produced by the entry operation. Namely, it is the `accPtr` result
which is used both in the `dataOperands` of the operation used for the
construct and in the `accPtr` operand of the exit operation.

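The entry/exit pairing described above can be summarized as a small lookup table. The following Python sketch is illustrative only: `DECOMPOSITION` and `decompose` are hypothetical names, not MLIR APIs, and the table simply restates the structured-construct mappings listed under Operation Categories. The dictionaries stand in for operations, and the string `%acc_<var>` stands in for the `accPtr` SSA value.

```python
# Illustrative model only: DECOMPOSITION and decompose() are hypothetical
# names, not MLIR APIs. The table restates the clause-to-operation mapping
# that this document lists for clauses on structured constructs.
DECOMPOSITION = {
    # clause:    (entry operation, exit operation)
    "copyin":    ("acc.copyin",   "acc.delete"),
    "copy":      ("acc.copyin",   "acc.copyout"),
    "copyout":   ("acc.create",   "acc.copyout"),
    "create":    ("acc.create",   "acc.delete"),
    "present":   ("acc.present",  "acc.delete"),
    "no_create": ("acc.nocreate", "acc.delete"),
    "attach":    ("acc.attach",   "acc.detach"),
}


def decompose(clause, var):
    """Return (entry, exit) records for one data clause on one variable.

    Both records keep the original clause in a `dataClause` field, and the
    exit record refers to the entry record's result, mirroring how the
    `accPtr` value links the two operations.
    """
    entry_op, exit_op = DECOMPOSITION[clause]
    acc_ptr = f"%acc_{var}"  # stand-in for the entry operation's result
    entry = {"op": entry_op, "varPtr": f"%{var}", "result": acc_ptr,
             "dataClause": clause}
    exit_ = {"op": exit_op, "accPtr": acc_ptr, "dataClause": clause}
    return entry, exit_


entry, exit_ = decompose("copy", "a")
print(entry["op"], "+", exit_["op"], "from clause", entry["dataClause"])
```

Keeping `dataClause` on both halves is what lets a later pass or debugger tell an `acc.copyin` that came from `acc copy` apart from one that came from `acc copyin`.
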
#### Bounds

OpenACC data clauses allow the use of bounds specifiers as per
`2.7.1 Data Specification in Data Clauses`. However, array dimensions
for the data are not always required in the clause if the source
language's type system captures this information; in that case the user
can just specify the variable name in the data clause. So the `acc.bounds`
operation is an important piece to ensure uniform representation of both
explicit user-set dimensions and implicit type-based dimensions. It
contains several key features that allow sizes to be encoded properly in
a manner that is flexible and agnostic to the source language's dialect:
* Multi-dimensional arrays can be represented by using multiple ordered
  `acc.bounds` operations.
* Bounds are required to be zero-normalized. This works well with the
  `PointerLikeType` requirement in data clauses, since a lowerbound of 0
  means looking at data at the zero offset from the pointer. This
  requirement also helps ensure the `acc` dialect is agnostic to the source
  language dialect, since it prevents ambiguity such as the case of Fortran
  arrays where the lower bound is not a fixed value.
* If the source dialect does not encode the dimensions in the type (e.g.,
  `!fir.array<?x?xi32>`) but instead encodes them in some other way (such
  as through descriptors), then the frontend must fill in the `acc.bounds`
  operands with appropriate information (such as loads from the descriptor).
  The `acc.bounds` operation also permits a lossy source dialect, such
  as when the frontend uses aggressive pointer decay and cannot represent
  the dimensions in the type system (e.g., using `!llvm.ptr` for arrays).
  Both of these aspects show the flexibility of the `acc.bounds` operation
  in keeping the representation agnostic, since the `acc` dialect is not
  expected to understand how to extract dimension information
  from the types of the source dialect.
* The OpenACC specification allows either extent or upperbound in the
  data clause depending on whether it is Fortran or C and C++. The
  `acc.bounds` operation is rich enough to accept either or both, for
  convenience in lowering to the dialect and for the ability to precisely
  capture the meaning from the clause.
* The stride, either in units or bytes, can also be captured in the
  `acc.bounds` operation. This is also important for being able to
  accept a source language's arrays without forcing the frontend to
  normalize them in some way. For example, consider a case where a whole
  array is mapped to the device in a parent function, and then only a
  view with a non-1 stride is passed to a child function (e.g., a Fortran
  array slice with non-1 stride). A `copy` operation on this data in the
  child should be able to avoid remapping the array. If the operation
  instead required normalizing the array (such as making it contiguous),
  the resulting disjoint mapping of the same host data would be
  error-prone, since it would create multiple mappings to the device.

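As a worked example of zero-normalization, the following Python sketch converts a C-style section (`start:length`) and a Fortran-style section (`lb:ub`) into the same zero-based form. It is a minimal model under stated assumptions: the `Bounds` class and both helper functions are hypothetical names rather than MLIR APIs, and strides are left out to keep the arithmetic small.

```python
from dataclasses import dataclass


# Illustrative model only: Bounds and the two helpers are hypothetical names,
# not MLIR APIs. Strides are left out to keep the arithmetic minimal.
@dataclass(frozen=True)
class Bounds:
    lowerbound: int  # zero-based offset of the first element
    upperbound: int  # zero-based offset of the last element (inclusive)
    extent: int      # number of elements covered


def from_c_section(start, length):
    """C/C++ clause `a[start:length]`: C arrays are already zero-based."""
    return Bounds(start, start + length - 1, length)


def from_fortran_section(lb, ub, declared_lb=1):
    """Fortran clause `a(lb:ub)` on an array whose declared lower bound is
    `declared_lb`; subtracting it yields zero-normalized bounds."""
    return Bounds(lb - declared_lb, ub - declared_lb, ub - lb + 1)


# `int a[100]` with `copy(a[10:20])` and `integer :: a(11:110)` with
# `copy(a(21:40))` describe the same 20 elements relative to the base pointer.
c_bounds = from_c_section(10, 20)
f_bounds = from_fortran_section(21, 40, declared_lb=11)
print(c_bounds)
print(c_bounds == f_bounds)
```

Because both sections normalize to the same offsets, later passes do not need to know which source language, or which declared lower bound, produced them.
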
#### Counters

The data operations also maintain the semantics described in the OpenACC
specification related to runtime counters. More specifically, consider
the specification of the entry portion of `acc copyin` in section 2.7.7:
```
At entry to a region, the structured reference counter is used. On an
enter data directive, the dynamic reference counter is used.
- If var is present and is not a null pointer, a present increment
action with the appropriate reference counter is performed.
- If var is not present, a copyin action with the appropriate reference
counter is performed.
- If var is a pointer reference, an attach action is performed.
```
The `acc.copyin` operation includes these semantics, including those
related to attach, which is specified through the `varPtrPtr` operand.
The `structured` flag on the operation is important since the
`structured reference counter` should be used when the flag is true, and
the `dynamic reference counter` should be used when it is false.

At exit from structured regions (such as `acc data` or `acc kernels`),
the `acc copyin` clause is decomposed to `acc.delete` (with the
`structured` flag as true). The semantics of `acc.delete` are
also consistent with the OpenACC specification noted for the exit
portion of the `acc copyin` clause:
```
At exit from the region:
- If the structured reference counter for var is zero, no action is
taken.
- Otherwise, a detach action is performed if var is a pointer reference,
and a present decrement action with the structured reference counter is
performed if var is not a null pointer. If both structured and dynamic
reference counters are zero, a delete action is performed.
```

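The counter rules quoted above can be modeled with a small table of per-variable counters. The sketch below is a simplification for illustration, not a description of any actual OpenACC runtime: `PresentTable` is a hypothetical name, and attach/detach actions and null pointers are deliberately not modeled.

```python
# Illustrative model only: PresentTable is a hypothetical name, not an MLIR
# or OpenACC runtime API. Attach/detach and null pointers are not modeled.
class PresentTable:
    def __init__(self):
        # var -> {"structured": int, "dynamic": int}; a missing key means the
        # variable is not present on the device.
        self.counters = {}

    def copyin(self, var, structured):
        """Entry semantics: present increment, or copyin action if absent."""
        kind = "structured" if structured else "dynamic"
        if var not in self.counters:
            # copyin action: allocate device memory and copy the data;
            # both counters start at zero before the increment below.
            self.counters[var] = {"structured": 0, "dynamic": 0}
        self.counters[var][kind] += 1

    def delete(self, var, structured):
        """Exit semantics: decrement, then delete once both counters are 0."""
        kind = "structured" if structured else "dynamic"
        entry = self.counters.get(var)
        if entry is None or entry[kind] == 0:
            return  # the selected counter is zero: no action is taken
        entry[kind] -= 1
        if entry["structured"] == 0 and entry["dynamic"] == 0:
            del self.counters[var]  # delete action: release the device copy


table = PresentTable()
table.copyin("a", structured=True)   # outer data region entry
table.copyin("a", structured=True)   # nested compute region entry
table.delete("a", structured=True)   # nested region exit
print("a" in table.counters)         # True: outer region still holds it
table.delete("a", structured=True)   # outer region exit
print("a" in table.counters)         # False: both counters reached zero
```

The `structured` argument plays the role of the `structured` flag on `acc.copyin` and `acc.delete`: it only selects which counter is touched.
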
### Types

There are a few `acc` dialect type categories to describe:
* type of acc data clause operation input `varPtr`
  - The type of `varPtr` must be pointer-like. This is done by
    attaching the `PointerLikeType` interface to the appropriate MLIR
    type. Although the memory/storage concept is a lower-level
    abstraction, it is useful because the OpenACC model distinguishes
    between host and device memory explicitly, and the mapping between
    the two is done through pointers. Thus, by explicitly requiring it in
    the dialect, the appropriate language frontend must create storage or
    use a type that satisfies the mapping constraint.
* type of result of acc data clause operations
  - The result type of an acc data clause operation is exactly the same
    as the type of `varPtr`. This was done intentionally instead of
    introducing an `acc.ref/ptr` type so that IR compatibility and the
    dialect's existing strong type checking can be maintained. This is
    needed since the `acc` dialect must live within another dialect whose
    type system is unknown to it. The only constraint is that the
    appropriate dialect type must use the `PointerLikeType` interface.
* type of decomposed clauses
  - Decomposed clauses, such as `acc.bounds` and `acc.declare_enter`,
    produce types that allow their results to be used only in specific
    operations.

### Recipes

Recipes are a generic way to express source language specific semantics.

There are currently two categories of recipes, but the recipe concept
can be extended for any additional low-level information that needs
to be captured for successful lowering of OpenACC. The two categories
are:
* recipes used in the context of privatization associated with a
  construct
* recipes used in the context of additional specification of data
  semantics

The intention of the recipes is to specify how the materialization of an
action, such as privatization, should be done when the semantics
of the action need to be interpreted and lowered, such as before
generating the LLVM dialect.

The recipes used for privatization provide a source-language independent
way of specifying the creation of a local variable of that type. This
means using the appropriate `alloca` instruction and being able to
specify default initialization or a default constructor.

### Routine

The routine directive is used to note that a procedure should be made
available for the accelerator in a way that is consistent with its
modifiers, such as those that describe the parallelism. In the `acc`
dialect, an acc routine is represented through two joint pieces, an
attribute and an operation:
* The `acc.routine` operation is simply a specifier which notes which
  symbol (or string) the acc routine is needed for, along with the
  associated parallelism. This defines a symbol that can be referenced in
  the attribute.
* The `acc.routine_info` attribute is used on the source dialect specific
  operation and specifies one or multiple `acc.routine` symbols.
  Typically, this is attached to a `func.func` which either
  provides the declaration (in the case of externals) or provides the
  actual body of the acc routine in the dialect that the source language
  was translated to.

### Declare

OpenACC `declare` is a mechanism which declares a definition of a global
or a local to be accessible to the accelerator, with an implicit lifetime
matching that of the scope in which it was declared. Thus, `declare`
semantics are represented through multiple operations and attributes:
* `acc.declare` - This is a structured operation which contains an
  MLIR region and can be used in a similar manner to `acc.data` to specify
  an implicit data region with a specific procedure lifetime. This is
  typically used inside `func.func` after variable declarations.
* `acc.declare_enter` - This is an unstructured operation which is
  used as a decomposed form of `acc declare`. It effectively allows the
  entry operation to exist in a scope different from that of the exit
  operation. It can also be used along with `acc.declare_exit`, which
  consumes its token, to define a scoped region without using an MLIR
  region. This operation is also used in `acc.global_ctor`.
* `acc.declare_exit` - The matching equivalent of `acc.declare_enter`
  except that it specifies exit semantics. This operation is typically
  used inside a `func.func` at the exit points or with `acc.global_dtor`.
* `acc.global_ctor` - Lives at the same level as source dialect globals
  and is used to specify data actions to be done at program entry. This
  is used in conjunction with source dialect globals whose lifetime is
  not just a single procedure.
* `acc.global_dtor` - Defines the exit data actions that should be done
  at program exit. Typically used to revert the actions of
  `acc.global_ctor`.

The attributes:
* `acc.declare` - This is a facility for easier determination of
  variables which are `acc declare`'d. This attribute is used on
  operations producing globals and on operations producing locals such as
  dialect specific `alloca`'s. A variable must have this attribute in
  order to appear in a data mapping operation associated with any of the
  `acc.declare*` operations.
* `acc.declare_action` - Since the OpenACC specification allows
  declaration of variables that have yet to be allocated, this attribute
  is used at the allocation and deallocation points. More specifically,
  this attribute captures symbols of functions to be called to perform
  an action either pre-allocate, post-allocate, pre-deallocate, or
  post-deallocate. Calls to these functions should be materialized when
  lowering OpenACC semantics to ensure the proper data actions are done
  at the appropriate point relative to the allocation or deallocation.

## OpenACC Transforms and Analyses

The design goal for the `acc` dialect is to be friendly to MLIR
optimization passes including CSE and LICM. Additionally, since it is
designed to recover original clauses, it makes late verification and
analysis possible in the MLIR framework outside of the frontend.

This section describes a few MLIR-level passes that the `acc`
dialect design should be friendly to. It currently only outlines the
possibilities intended by the design, not necessarily existing passes.

### Verification

Since the OpenACC dialect is not lossy with regard to its
representation, it is possible to do OpenACC language semantic checking
at the MLIR level. What follows is a list of the various semantic checks
needed.

This first list is required to be done in the frontend because the `acc`
dialect operations must be valid when constructed:
* Ensure that only listed clauses are allowed for each directive.
* Ensure that only listed modifiers are allowed for each clause.

However, the following are semantic checks that can be done at the
MLIR level (either in a separate pass or as part of the operation
verifier):
* Check the validity requirements of each modifier (e.g., `num_gangs`
  may need a positive integer).
* Ensure valid clause nesting.
* Validate restrictions on clauses which cannot appear with others.
* Validate that no conflicting clauses are used on variables.

Note that some of these checks can be even more precise when done at the
MLIR level because optimizations like inlining and constant propagation
expose detail that wouldn't have been visible in the frontend.

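Two of the MLIR-level checks above can be sketched over a toy representation of a construct. This is not the `acc` dialect verifier: the dictionary layout and `check_construct` are hypothetical, and the rule that a variable may not appear in two different data clauses is assumed here only for the sake of the example.

```python
# Illustrative model only: the dictionary layout and check_construct() are
# hypothetical, not the acc dialect verifier.
def check_construct(construct):
    """Return a list of diagnostics for one modeled construct."""
    errors = []

    # Modifier validity: only checkable once the value is a known constant,
    # for example after constant propagation.
    num_gangs = construct.get("num_gangs")
    if isinstance(num_gangs, int) and num_gangs <= 0:
        errors.append(f"num_gangs must be positive, got {num_gangs}")

    # Conflicting clauses (assumed rule): the same variable named in two
    # different data clauses on one construct.
    seen = {}
    for clause, var in construct.get("data_clauses", []):
        if var in seen and seen[var] != clause:
            errors.append(f"'{var}' appears in both {seen[var]} and {clause}")
        seen.setdefault(var, clause)
    return errors


diagnostics = check_construct({
    "num_gangs": 0,
    "data_clauses": [("copyin", "a"), ("copyout", "a"), ("create", "b")],
})
for message in diagnostics:
    print(message)
```

A non-constant `num_gangs` (modeled here as any non-integer value) is skipped rather than rejected, which is why such a check becomes more precise after constant propagation.
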
### Implicit Data Attributes

The OpenACC specification includes a section on `2.6.2 Variables with
Implicitly Determined Data Attributes`. This section describes
the data actions that should be applied to a variable for which the
user did not specify a data action. The action depends on the
construct being used and also on the default clause. However, the point
to note here is that variables which are live-in to the acc region
must employ some data mapping so the data can be passed to the
accelerator.

One possible optimization that affects the data attributes needed is
`Scalar Replacement of Aggregates (SROA)`. The `acc` dialect should
not prevent this from happening on the source dialect.

Because it is intended to be possible to apply optimizations across an
`acc` region, the analysis/transformation pass that applies the implicit
data attributes should be run as late as possible, ideally right before
any outlining process which uses the `acc` region body to create an
accelerator procedure. It is expected that existing MLIR facilities,
such as `mlir::Liveness`, will work for the `acc` region and thus can be
used to perform this analysis.

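The live-in computation that drives implicit data attributes can be illustrated on a toy straight-line region. The sketch below is a simplified stand-in for a real liveness analysis such as `mlir::Liveness`: `live_ins` and `implicit_candidates` are hypothetical helpers, values are plain strings, and control flow inside the region is not modeled.

```python
# Illustrative model only: live_ins() and implicit_candidates() are
# hypothetical helpers over a toy straight-line region, not `mlir::Liveness`.
def live_ins(region_ops):
    """Values used inside the region before any definition inside it.

    `region_ops` is a list of (defined_values, used_values) pairs in
    program order.
    """
    defined, live = set(), set()
    for defs, uses in region_ops:
        live |= set(uses) - defined
        defined |= set(defs)
    return live


def implicit_candidates(region_ops, explicitly_mapped):
    """Live-in values the user did not cover with an explicit data clause."""
    return live_ins(region_ops) - set(explicitly_mapped)


# Region body: %t = load %a ; %u = add %t, %n ; store %u, %b
region = [(["%t"], ["%a"]), (["%u"], ["%t", "%n"]), ([], ["%u", "%b"])]
print(sorted(implicit_candidates(region, explicitly_mapped=["%a"])))
```

Here `%b` and `%n` are the values that would still need an implicitly determined data attribute. Running this late matters because earlier optimizations can change which values are live-in at all.
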
### Redundant Clause Elimination

The data operations are modeled in a way where data entry operations
look like loads and data exit operations look like stores. Thus these
operations are intended to be optimized in the following ways:
* Be able to eliminate redundant operations such as when an `acc.copyin`
  dominates another.
* Be able to hoist/sink such operations out of loops.

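The first optimization can be sketched for the simplest case, a single block in which one `acc.copyin` lifetime is nested inside another for the same variable. This is a toy model rather than an MLIR pass: operations are `(name, variable)` tuples, "dominates" reduces to "appears earlier", and it assumes the inner pair would only have adjusted a reference counter.

```python
# Illustrative model only: operations are (name, variable) tuples in a single
# block, so "dominates" reduces to "appears earlier". Not an MLIR pass.
def eliminate_nested_copyin(ops):
    """Drop copyin/delete pairs nested inside a pair for the same variable."""
    depth = {}  # variable -> number of currently open copyin lifetimes
    kept = []
    for name, var in ops:
        if name == "acc.copyin":
            depth[var] = depth.get(var, 0) + 1
            if depth[var] > 1:
                continue  # dominated by an active copyin of the same data
        elif name == "acc.delete" and depth.get(var, 0) > 0:
            depth[var] -= 1
            if depth[var] >= 1:
                continue  # matching exit of an eliminated copyin
        kept.append((name, var))
    return kept


ops = [
    ("acc.copyin", "a"),    # outer data region entry
    ("acc.copyin", "a"),    # inner compute construct entry
    ("acc.parallel", "a"),
    ("acc.delete", "a"),    # inner exit
    ("acc.delete", "a"),    # outer exit
]
for op in eliminate_nested_copyin(ops):
    print(op)
```

Only the outer entry and exit survive around the compute operation. A real transformation would additionally have to rewire uses of the removed operation's `accPtr` result to the dominating one.
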
## Operations TOC

[include "Dialects/OpenACCDialectOps.md"]