AliasAnalysis.html gives a basic introduction to alias analysis:

Alias Analysis (also known as Pointer Analysis) is used to determine whether two pointers refer to the same object in memory. There are many different alias-analysis algorithms, which can be classified as flow-sensitive vs. flow-insensitive, context-sensitive vs. context-insensitive, field-sensitive vs. field-insensitive, and unification-based vs. subset-based.

Traditional alias analysis is used to answer Must, May, and No:

  • Must means the two pointers always point to the same object;
  • May means they may point to the same object;
  • No means they never point to the same object.

The article Compiler Optimization: What Is Alias Analysis - Zhihu (zhihu.com) introduces several types of alias-analysis algorithms.

The LLVM AliasAnalysis class is the base class for implementing alias analysis. It can provide simple alias-analysis information as well as Mod / Ref information, which is useful for more advanced analyses. The two alias-analysis optimizations enabled in SkyEye are derived from the AliasAnalysis class. (Mod: memory access modifies; Ref: references memory)

The simplest application of alias analysis:

For example, the following C code:

1int foo (int __attribute__((address_space(0))) *a,
2             int __attribute__((address_space(1))) *b) {
3    *a = 42;
4    *b = 20;
5    return *a;
6}

is translated into LLVM IR as follows:

1define i32 @foo(i32 addrespace(0)* %a, i32 addrspace(1)* %b) #0 {
2entry:
3    store i32 42, i32 addrspace(0)* %a, align 4
4    store i32 42, i32 addrspace(1)* %b, align 4
5    %0 = load i32, i32* %a, align 4
6    ret i32 %0
7}

Now we need to optimize foo and remove the unnecessary load:

1define i32 @foo(i32 addrespace(0)* %a, i32 addrspace(1)* %b) #0 {
2entry:
3    store i32 42, i32 addrspace(0)* %a, align 4
4    store i32 42, i32 addrspace(1)* %b, align 4
5    ret i32 42
6}

However, this optimization only works if a and b do not alias; otherwise it produces the wrong result, as shown below:

1    int i = 0;
2    int result = foo(&i, &i);

You can see that this call makes a and b alias. The function should return 20, but because of the optimization it returns 42, which is wrong. So the compiler can only perform this optimization when it can determine that the two pointers do not alias.

In my view, alias analysis mainly serves as a prerequisite pass for other optimizations. Its main role is to help other optimization passes decide which variables can be optimized and which cannot, so that redundant IR can be removed accurately.

So I searched the LLVM 3.0 source tree:

In the backend compilation pipeline, there are indeed several places that call alias analysis as a prerequisite pass.

image

I then selected the source code for the TypeBasedAliasAnalysis class for the main analysis:

TypeBasedAliasAnalysis inherits from ImmutablePass and AliasAnalysis, and provides a metadata-based implementation of type-based alias analysis by determining whether two types may alias through the relationships between nodes.

Because it inherits from ImmutablePass, TypeBasedAliasAnalysis is a pass that cannot be changed once created. Such pass objects are typically used to share state across the entire compilation process. Here, we create one for each JIT function. That also explains why it is invoked right at the beginning.

Another key term is metadata. LLVM uses metadata to carry additional information. This information does not directly affect program execution during compilation, but it is very useful for optimization and analysis. In this case, metadata is used to indicate type information for pointers and memory objects so that alias analysis can be more precise. The following figure shows an introduction to the metadata format (LLVM Metadata introduction - CSDN Blog):

 1// This file defines the TypeBasedAliasAnalysis pass, which implements
 2// metadata-based TBAA.
 3//
 4// In LLVM IR, memory does not have types, so LLVM's own type system is not
 5// suitable for doing TBAA. Instead, metadata is added to the IR to describe
 6// a type system of a higher level language. This can be used to implement
 7// typical C/C++ TBAA, but it can also be used to implement custom alias
 8// analysis behavior for other languages.
 9//
10// The current metadata format is very simple. TBAA MDNodes have up to
11// three fields, e.g.:
12//   !0 = metadata !{ metadata !"an example type tree" }
13//   !1 = metadata !{ metadata !"int", metadata !0 }
14//   !2 = metadata !{ metadata !"float", metadata !0 }
15//   !3 = metadata !{ metadata !"const float", metadata !2, i64 1 }
16//
17// The first field is an identity field. It can be any value, usually
18// an MDString, which uniquely identifies the type. The most important
19// name in the tree is the name of the root node. Two trees with
20// different root node names are entirely disjoint, even if they
21// have leaves with common names.
22//
23// The second field identifies the type's parent node in the tree, or
24// is null or omitted for a root node. A type is considered to alias
25// all of its descendants and all of its ancestors in the tree. Also,
26// a type is considered to alias all types in other trees, so that
27// bitcode produced from multiple front-ends is handled conservatively.
28//
29// If the third field is present, it's an integer which if equal to 1
30// indicates that the type is "constant" (meaning pointsToConstantMemory
31// should return true; see
32// http://llvm.org/docs/AliasAnalysis.html#OtherItfs).
33//
34// TODO: The current metadata format doesn't support struct
35// fields. For example:
36//   struct X {
37//     double d;
38//     int i;
39//   };
40//   void foo(struct X *x, struct X *y, double *p) {
41//     *x = *y;
42//     *p = 0.0;
43//   }
44// Struct X has a double member, so the store to *x can alias the store to *p.
45// Currently it's not possible to precisely describe all the things struct X
46// aliases, so struct assignments must use conservative TBAA nodes. There's
47// no scheme for attaching metadata to @llvm.memcpy yet either.

I also looked into alias analysis’s memory consumption and whether calling it is expensive, but I couldn’t find relevant material. I’ll study that next time.

Main reference: llvm-3.0.src/docs/AliasAnalysis.html

Focused source code: lib\Analysis\TypeBasedAliasAnalysis.cpp