LLVM Bug Cost Computation Skipped For Hoisted Vector Code Discussion
- Introduction
- The Problem: Invalid SVE Cost and Loop Vectorization
- Illustrative Code Examples
- The Issue: LV Doesn't Cost Hoisted Instructions
- Consequences and a Crash Example
- Technical Details and Code Analysis
- Impact on Loop Vectorization
- Related Discussions and Fixes
- Conclusion
Introduction
In compiler optimization, loop vectorization is a crucial technique for improving performance: it transforms scalar operations inside a loop into vector operations, enabling parallel execution and significant speedups. The process is not without pitfalls, though, and one of them involves hoisted vector code and LLVM's cost model. In this article, we dive into a bug in LLVM's loop vectorizer (LV) that skips cost computation for hoisted instructions. This oversight can lead to suboptimal code generation and, in some cases, even crashes. We'll walk through code examples, discuss the consequences, and look at the technical details, aiming to keep the explanation easy to grasp even if you're not an LLVM expert.
The Problem: Invalid SVE Cost and Loop Vectorization
The core issue revolves around the interaction between invalid Scalable Vector Extension (SVE) costs and LLVM's loop vectorization process. When an intrinsic such as llvm.minimumnum has an Invalid SVE cost (LLVM's way of signaling that an operation cannot be costed on the target), the loop vectorizer may make a bad decision: it can emit the scalable vector version of the intrinsic (<vscale x 4 x float> @llvm.minimumnum) when a scalar version would be more efficient, because the cost model that guides its decisions does not reflect the true cost of the vector operation. Consider a loop that computes a simple minimum. If the vector version of the minimum function is treated as cheap — or its cost is never accounted for at all — the vectorizer may vectorize the loop with that function even though computing the minimum element by element would be faster. An inaccurate cost model steers the vectorizer down the wrong path, and performance degrades.
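To make the decision concrete, here is a minimal sketch — hypothetical values and function names, not LLVM's real API — of how an underestimated (or defaulted-to-zero) vector cost flips a cost-driven vectorization choice:

```python
# Hypothetical sketch of a cost-model-driven decision. The costs and the
# vectorization factor below are made-up illustrative numbers.

SCALAR_COST = 2      # assumed per-element cost of a scalar minimum
VF = 4               # vectorization factor

def choose_plan(vector_cost):
    """Pick 'vector' only if one vector op beats VF scalar ops."""
    scalar_total = SCALAR_COST * VF
    return "vector" if vector_cost < scalar_total else "scalar"

# With an honest (expensive) vector cost, the scalar plan wins.
print(choose_plan(vector_cost=20))  # -> scalar

# If the cost model wrongly reports the SVE intrinsic as cheap (or its
# cost is never computed and is effectively zero), the vector plan wins.
print(choose_plan(vector_cost=0))   # -> vector
```

The same comparison, fed a cost that was never actually computed, is what steers the vectorizer wrong in the bug described here.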
Illustrative Code Examples
To better illustrate the issue, let's examine a couple of code examples. These examples, written in LLVM intermediate representation (IR), demonstrate how the invalid SVE cost can impact loop vectorization. Understanding these examples will give you a clearer picture of the bug and its consequences.
Example 1: Invalid SVE Cost Intrinsic
Consider the following LLVM IR code snippet:
; RUN: opt -mtriple aarch64 -mattr=+sve -passes="print<cost-model>"
define <4 x float> @minimumnum.fixed(<4 x float> %a, <4 x float> %b) {
%c = call <4 x float> @llvm.minimumnum(<4 x float> %a, <4 x float> %b)
ret <4 x float> %c
}
define <vscale x 4 x float> @minimumnum(<vscale x 4 x float> %a, <vscale x 4 x float> %b) {
%c = call <vscale x 4 x float> @llvm.minimumnum(<vscale x 4 x float> %a, <vscale x 4 x float> %b)
ret <vscale x 4 x float> %c
}
This code defines two versions of the minimumnum function: one operating on fixed-size vectors (<4 x float>) and one on scalable vectors (<vscale x 4 x float>). The problem arises when the scalable vector version has an Invalid cost: Invalid means the operation cannot be costed at all, so any vectorization plan containing it should be rejected rather than treated as cheap. To reproduce, run the opt tool with the RUN line shown above; it prints the cost-model information, including the cost associated with the llvm.minimumnum intrinsic. The key takeaway is that an Invalid (or otherwise wrong) cost for the scalable vector version can mislead the loop vectorizer.
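The notion of an "Invalid" cost can be modeled in a few lines. The class below is an assumption-laden toy, not LLVM's actual InstructionCost implementation, but it captures the key property: invalidity is sticky, so a sum that includes an Invalid cost must itself be Invalid rather than silently cheap.

```python
# Toy model of LLVM's Invalid-cost concept (not the real InstructionCost
# class): None stands in for the Invalid state.

class InstructionCost:
    def __init__(self, value=None):
        self.value = value          # None models Invalid

    def is_valid(self):
        return self.value is not None

    def __add__(self, other):
        # Invalid is "sticky": any sum involving Invalid stays Invalid.
        if not (self.is_valid() and other.is_valid()):
            return InstructionCost(None)
        return InstructionCost(self.value + other.value)

total = InstructionCost(3) + InstructionCost(None) + InstructionCost(5)
print(total.is_valid())  # -> False: one Invalid operand poisons the sum
```

A cost model that ever drops an Invalid term from such a sum loses exactly this safety property — which is what happens when a hoisted instruction is skipped.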
Example 2: Loop Vectorization with Hoisted Instructions
Now, let's look at a more complex example that involves loop vectorization and hoisted instructions:
; RUN: opt -passes=loop-vectorize,simplifycfg -mtriple=aarch64 -mattr=+sve -S %s
define void @vectorized_hoisted(ptr %p) {
entry:
br label %loop
loop: ; preds = %loop, %entry
%iv = phi i64 [ 1, %entry ], [ %iv.next, %loop ]
%idx = phi i64 [ 0, %entry ], [ %idx.next, %loop ]
%res = tail call float @llvm.minimumnum.f32(float 0.0, float 0.0)
%gep.p.red = getelementptr float, ptr %p, i64 %idx
store float %res, ptr %gep.p.red, align 4
%idx.next = add i64 %idx, 1
%iv.next = add i64 %iv, 1
%exit.cond = icmp eq i64 %iv.next, 0
br i1 %exit.cond, label %exit, label %loop
exit: ; preds = %loop
ret void
}
declare float @llvm.minimumnum.f32(float, float)
This code defines a function @vectorized_hoisted containing a loop. Inside the loop, llvm.minimumnum.f32 is called with constant operands, and its result is stored to memory. Because the call is loop-invariant, the vectorizer can hoist it out of the loop. However, if the cost of the vector version of llvm.minimumnum is Invalid or otherwise incorrect, the vectorizer may still generate suboptimal code. Running this input through the loop vectorizer with the RUN line shown above (opt -passes=loop-vectorize,simplifycfg -mtriple=aarch64 -mattr=+sve -S %s) demonstrates how the issue manifests: the resulting IR shows the vectorizer using the scalable vector version of llvm.minimumnum even though it may not be the most efficient choice. This example highlights the impact of the bug on a real loop-vectorization scenario.
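Hoisting itself is simple to picture. The sketch below uses a hypothetical, IR-like instruction list (tuples, not real LLVM data structures) to show how a call with loop-invariant operands migrates from the loop body to the preheader — after which it is no longer "inside the loop" as far as a naive per-instruction walk is concerned:

```python
# Illustrative hoisting sketch over hypothetical IR-like tuples.
# The call's operands are constants, so it is loop-invariant.

loop_body = [
    ("call", "%res", "llvm.minimumnum.f32", ("0.0", "0.0")),  # invariant
    ("store", "%res", "%gep.p.red"),
    ("add", "%idx.next", "%idx", "1"),
]

def hoist_invariants(body, is_invariant):
    """Split instructions into (preheader, remaining loop body)."""
    preheader, new_body = [], []
    for inst in body:
        (preheader if is_invariant(inst) else new_body).append(inst)
    return preheader, new_body

# Here only the constant-operand call qualifies as invariant.
pre, body = hoist_invariants(loop_body, lambda i: i[0] == "call")
print([i[0] for i in pre])   # -> ['call']
print([i[0] for i in body])  # -> ['store', 'add']
```

Any later analysis that iterates only over `body` will never see the hoisted call — the crux of the bug discussed next.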
The Issue: LV Doesn't Cost Hoisted Instructions
The root cause is that LLVM's loop vectorizer (LV) does not properly account for the cost of hoisted instructions, i.e. instructions moved outside the loop during vectorization. The llvm.minimumnum intrinsic in the example above is exactly such an instruction: its operands are loop-invariant, so the call is hoisted out of the vector loop. The vectorizer's decisions are driven by the cost model, which estimates the performance impact of each candidate transformation. But if a hoisted instruction's cost is never added to the plan's total, the vectorizer can select a plan containing an operation whose cost is Invalid, an operation the target cannot even cost. This oversight is particularly problematic with SVE intrinsics, whose costs can be complex and, as in this case, Invalid. The missing cost computation for hoisted instructions is what triggers the bug.
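The bug in miniature: if the plan's cost walk only visits instructions still inside the loop, a hoisted instruction's Invalid cost is never summed, and a plan that should be rejected looks cheap instead. This is a hypothetical model of the logic, not LLVM's actual VPlan code; the instruction names and costs are illustrative.

```python
# Minimal sketch of the costing bug. None models an Invalid cost.
INVALID = None

in_loop_costs = {"store": 1, "add": 1}
hoisted_costs = {"llvm.minimumnum (scalable)": INVALID}

def plan_cost(include_hoisted):
    """Sum instruction costs; an Invalid term invalidates the plan."""
    total = 0
    insts = dict(in_loop_costs)
    if include_hoisted:
        insts.update(hoisted_costs)
    for cost in insts.values():
        if cost is INVALID:
            return INVALID        # the whole plan should be rejected
        total += cost
    return total

print(plan_cost(include_hoisted=False))  # -> 2 (bug: looks profitable)
print(plan_cost(include_hoisted=True))   # -> None (correct: Invalid)
```

The fix, conceptually, is to make the second behavior the actual one: the hoisted instruction's cost must participate in the plan's cost, so its Invalid state can veto vectorization.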
Consequences and a Crash Example
The consequences range from minor performance degradation to more severe issues, including crashes. When the vectorizer generates suboptimal code because hoisted instructions were never costed, the program may simply run slower than expected; in more extreme cases, the generated code can be invalid, leading to crashes or incorrect results. A concrete crash example is discussed in https://github.com/llvm/llvm-project/pull/145545. While the examples above do not crash directly, the potential for crashes is what makes this a critical issue rather than a mere performance bug.
Technical Details and Code Analysis
To gain a deeper understanding of the bug, let's delve into the technical details and analyze the code snippets provided. We'll examine the LLVM IR code and discuss how the loop vectorizer interacts with the cost model.
Analyzing the First Code Snippet
The first snippet defines the fixed-size and scalable versions of minimumnum. The key part is the -passes="print<cost-model>" option, which tells opt to print cost-model information for each instruction. Examining this output shows the cost associated with the llvm.minimumnum intrinsic; if the scalable vector version is reported as Invalid, or with a cost far lower than it should be, that points directly at the root cause of the issue.
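When scanning that output by hand gets tedious, a tiny filter helps. Note the sample text below is an assumption about the wording printed by -passes="print<cost-model>" (the exact phrasing may differ across LLVM versions); the helper itself just matches a prefix.

```python
# Flag cost-model printer lines that report an Invalid cost.
# The sample output text is assumed/abbreviated, not captured verbatim.

sample_output = """\
Cost Model: Found an estimated cost of 1 for instruction: ret <4 x float> %c
Cost Model: Invalid cost for instruction: %c = call <vscale x 4 x float> @llvm.minimumnum
"""

def invalid_cost_lines(text):
    return [line for line in text.splitlines()
            if line.startswith("Cost Model: Invalid cost")]

for line in invalid_cost_lines(sample_output):
    print(line)  # flags the scalable-vector minimumnum call
```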
Analyzing the Second Code Snippet
The second snippet contains a loop that calls llvm.minimumnum.f32 with constant operands, so the call can be hoisted out of the loop during vectorization. Running the code through the loop vectorizer and inspecting the output reveals whether the vectorizer made a suboptimal choice. In this case, the output shows the scalable vector version of llvm.minimumnum being used, even though it may not be the most efficient option — and, crucially, even though its cost was never checked, because the instruction had been hoisted.
Impact on Loop Vectorization
The bug has a significant impact on loop vectorization, particularly for code that uses SVE intrinsics. By not costing hoisted instructions, the vectorizer can select a plan whose true cost is higher than estimated — or not computable at all — leading to performance degradation or crashes and undermining the cost model that loop vectorization depends on. Addressing this bug is crucial for ensuring that LV generates consistently correct and efficient code.
Related Discussions and Fixes
A related discussion and a potential fix can be found in https://github.com/llvm/llvm-project/pull/145545. That pull request likely addresses the crash issue mentioned earlier and may include changes to the cost model or to the loop vectorizer itself. Tracking such discussions is the best way to follow how the bug is ultimately resolved.
Conclusion
In conclusion, the loop vectorizer's failure to cost hoisted instructions is a significant issue that can lead to suboptimal code generation and even crashes. It is particularly relevant for SVE intrinsics, whose vector costs may be Invalid and therefore must veto vectorization rather than be silently skipped. Understanding the root cause — the cost model never sees instructions that are hoisted out of the vector loop — is the first step toward a fix that keeps loop vectorization both fast and sound.