Some concepts and comments
HloModule
// HLO instructions are the atomic unit of the high-level compiler's IR.
//
// HloInstructions live inside of an HloComputation, which is analogous to a
// function in other programming languages. Nodes have no total order within
// their computation. Instead, they have a partial ordering determined by their
// data and control dependencies.
//
// HLO does not have basic blocks or explicit "branch" instructions. Instead,
// certain HloInstructions -- namely, kWhile, kConditional, and kCall -- encode
// control flow. For example, the kConditional HLO executes one of two possible
// computations, depending on the runtime value of a predicate.
//
// HLO is pure (mostly). It has no concept of mutable state. Instead, data
// values are produced by one HLO and flow into consumers across dependency
// edges.
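The partial ordering described above can be illustrated with a minimal sketch in plain C++. This is not XLA code: `Node`, `Visit`, and `TopologicalOrder` are hypothetical names, and the sketch assumes the dependency graph is acyclic. It produces one valid execution order; any order in which every node comes after its operands would be equally valid.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an HLO instruction: it records only which other
// nodes (by index) it depends on, through data or control edges.
struct Node {
  std::vector<std::size_t> operands;
};

// Post-order DFS: a node is appended only after all of its operands.
void Visit(const std::vector<Node>& nodes, std::size_t i,
           std::vector<bool>& visited, std::vector<std::size_t>& order) {
  if (visited[i]) return;
  visited[i] = true;
  for (std::size_t operand : nodes[i].operands) {
    assert(operand < nodes.size());
    Visit(nodes, operand, visited, order);
  }
  order.push_back(i);
}

// Returns one total order that is consistent with the partial order.
std::vector<std::size_t> TopologicalOrder(const std::vector<Node>& nodes) {
  std::vector<bool> visited(nodes.size(), false);
  std::vector<std::size_t> order;
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    Visit(nodes, i, visited, order);
  }
  return order;
}
```

Nodes with no dependency path between them may appear in either order, which is exactly the freedom a partial order leaves to a scheduler.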
HloComputation
// Describes a computation at the HLO level.
//
// You can think of an HloComputation like a function. It has some inputs
// (parameters) and returns exactly one value (the value of its root node). If
// you want to return multiple values, you can return a tuple.
//
// The instructions inside of a computation do not have an explicit total order.
// Instead, they have a partial order determined by their data and control
// dependencies.
//
// An HloModule contains one "entry computation" -- this is like main() in a C
// program. Every other computation inside of a module is attached to one or
// more HloInstructions, as a "nested computation". For example, the kMap
// instruction has a nested computation and "applies" it to every element of its
// input, elementwise. (That is, the input [x, y, z] is transformed to [f(x),
// f(y), f(z)].)
class HloComputation {
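As a rough analogy for the kMap example in the comment above, the following sketch treats a nested computation as an ordinary C++ callable and applies it elementwise. `ScalarComputation` and `Map` are hypothetical names chosen for illustration and are not part of the XLA API.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical analogue of a nested computation: one scalar parameter and
// one return value (the value of its "root").
using ScalarComputation = std::function<float(float)>;

// Analogue of kMap: transforms [x, y, z] into [f(x), f(y), f(z)].
std::vector<float> Map(const std::vector<float>& input,
                       const ScalarComputation& f) {
  assert(f);  // the nested computation must be callable
  std::vector<float> output;
  output.reserve(input.size());
  for (float x : input) {
    output.push_back(f(x));
  }
  return output;
}
```

Calling `Map` on the input 1, 2, 3 with a doubling function yields 2, 4, 6.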
IrEmitter
// Abstract base class for translating HLO graphs to LLVM IR for a GPU.
//
// There are two concrete subclasses of IrEmitter: IrEmitterNested and
// IrEmitterUnnested. In the unnested variety, each HLO gets its own kernel
// function, whereas in the nested version the whole computation is emitted as
// one *non-kernel* function.
//
// In XLA, kernel functions never call other kernel functions. This means that
// if we have a kernel -- e.g. implementing a kReduce HLO -- that wants to use
// an HLO computation as a "subroutine" -- e.g. the HLO computation that
// specifies how to reduce two elements -- then the subroutine computation must
// be emitted using IrEmitterNested.
//
// Fusion nodes are a special case. A fusion node is emitted using
// IrEmitterUnnested, but the code is generated using FusedIrEmitter, which is
// not a subclass of gpu::IrEmitter, and in fact is better understood as an IR
// generator generator. See comments on that class.
class IrEmitter : public DfsHloVisitorWithDefault,
                  public IrBuilderMixin<IrEmitter>
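The kernel versus subroutine split can be pictured with a host-side analogy in plain C++. This is only a sketch under stated assumptions: `AddReducer` plays the role of the non-kernel function that IrEmitterNested would emit, `ReduceKernel` plays the role of the kernel for a kReduce HLO, and neither name exists in XLA.

```cpp
#include <cassert>
#include <vector>

// Analogue of a non-kernel function: the computation that specifies how to
// reduce two elements. It is only ever called from inside a kernel.
float AddReducer(float lhs, float rhs) { return lhs + rhs; }

// Analogue of a kernel implementing kReduce. It uses the reducer as a
// subroutine and never calls another kernel.
float ReduceKernel(const std::vector<float>& input, float init,
                   float (*reducer)(float, float)) {
  assert(reducer != nullptr);
  float accumulator = init;
  for (float x : input) {
    accumulator = reducer(accumulator, x);
  }
  return accumulator;
}
```

On a real GPU the kernel body would run across many threads; the sequential loop here only shows the call relationship.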
IrEmitterUnnested
// Emits LLVM IR for an "unnested computation".
//
// An unnested computation is an HloComputation which you run by executing one
// or more kernels for each HloInstruction it contains. Examples of unnested
// computations:
//
// - An HloModule's root computation,
// - The body of an HLO while loop,
// - The true/false computation of an HLO conditional.
//
// Note the opportunity for confusion -- the while loop's computation is nested
// within the root computation, but it's emitted using IrEmitterUnnested! Don't
// think about it too hard.
//
// Examples of things that are not unnested computations:
//
// - The reducer of a kReduce HLO. This is emitted using IrEmitterNested.
// - The body of a fusion node. IrEmitterUnnested emits the relevant code
//   within a kernel function using FusedIrEmitter. (FusedIrEmitter is not
//   really an IrEmitter, but is more an "IR generator generator".)
//
class IrEmitterUnnested : public IrEmitter {
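To make "one or more kernels for each HloInstruction" concrete, here is a small host-side sketch. It assumes a single shared buffer and models each kernel as a callable; `Kernel`, `UnnestedInstruction`, and `RunUnnested` are hypothetical names, not XLA types.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a launched GPU kernel that updates a buffer.
using Kernel = std::function<void(std::vector<float>&)>;

// Hypothetical stand-in for one instruction of an unnested computation.
struct UnnestedInstruction {
  std::vector<Kernel> kernels;  // one or more kernels per instruction
};

// Runs the computation by launching every kernel of every instruction, in
// an order assumed to already respect the instructions' dependencies.
void RunUnnested(const std::vector<UnnestedInstruction>& computation,
                 std::vector<float>& buffer) {
  for (const UnnestedInstruction& instruction : computation) {
    assert(!instruction.kernels.empty());
    for (const Kernel& kernel : instruction.kernels) {
      kernel(buffer);
    }
  }
}
```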
IrEmitterNested
// Emits LLVM IR for a "nested computation" into a non-kernel device function.
//
// This is used to emit code for HloComputations that don't require a separate
// kernel call. For example, IrEmitterNested is used to emit code for a kReduce
// HLO's elementwise reduction computation. Notably, IrEmitterNested is *not*
// used to emit code for fusion nodes -- fusion nodes use FusedIrEmitter, which
// is a different beast altogether.
//
// IrEmitterNested generates a non-kernel function with the following
// parameters:
//
// - N pointers to the buffers of each of the N parameters to the computation,
// - a pointer to the output buffer of the computation, and
// - a pointer to the top-level temp buffer.
//
class IrEmitterNested : public IrEmitter
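The parameter layout listed above can be sketched as a C++ function signature for a computation with N = 2 parameters. This is an illustration only: `MaxReducer` is a hypothetical name, and the typed `float*` pointers are an assumption made for readability; the pointer types in the emitted LLVM IR may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical nested function for a two-parameter computation.
void MaxReducer(const float* lhs,             // buffer of parameter 0
                const float* rhs,             // buffer of parameter 1
                float* output,                // output buffer of the computation
                std::uint8_t* temp_buffer) {  // top-level temp buffer
  assert(lhs != nullptr && rhs != nullptr && output != nullptr);
  (void)temp_buffer;  // this small computation needs no scratch space
  *output = (*lhs > *rhs) ? *lhs : *rhs;
}
```

Because the result is written through the output pointer instead of being returned, the caller decides where the value lives.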