HHVM includes an experimental LLVM code generator that currently
requires a custom version of LLVM.

We introduced a number of extensions to LLVM that are highly
unstable. Each of them could be modified or dropped in the future.
Once we finalize our requirements for LLVM, we plan to upstream the
changes. For the moment we will keep the LLVM patch in the tools/llvm
directory to allow everyone to build and use the LLVM backend with HHVM.

The included patch has been applied and verified to work with
trunk@215415 (git 23761603fe609770cc6fd3e42edf96b273265b7d).

In case you need to build clang with this LLVM branch, the corresponding
clang commit is trunk@215290.

To use the patch with git:

  $ git clone https://github.com/llvm-mirror/llvm
  $ cd llvm
  $ git checkout -b hhvm 2376160
  $ patch -p0 < ${path_to_hhvm}/tools/llvm/llvm.patch
  ...

Our current list of extensions to LLVM includes the smashable attribute,
locrecs metadata, HHVM-specific calling conventions, and HHVM-specific
optimizations.


I. Location records.

The HHVM runtime needs to patch tail calls generated by the backend and
thus needs to identify the code locations generated for these
instructions. We considered using LLVM's patchpoint interface for this,
but given its limitations (mainly the lack of tail call support and its
negative effect on optimizations) we decided to use a special kind
of metadata that gets propagated to the MC level (the current
implementation piggybacks on debug info).

As of LLVM 3.5 the syntax for the metadata is as follows:

  musttail call void @foo(i64 %val), !locrec !{i32 42}

Note that there's no direct relationship between LLVM instructions
and instructions at machine level. After optimizations, instructions
can be eliminated, combined, cloned, etc. Similarly, LLVM function
code can be inlined, cloned, or outlined, so we do not include
function context for location records and expect locrec IDs
to be unique within a module.

Location records are written into the .llvm_locrecs section with the
following format:

Header {
  uint8   : Major Version (1)
  uint8   : Minor Version (0)
  uint16  : <reserved>
  uint32  : NumRecords
}

LocationRecord[NumRecords] {
  uint64  : Address           ; absolute address of the instruction
  uint32  : LocRec ID         ; ID of !locrec
  uint8   : Size              ; size of the instruction
  uint8   : <reserved>
  uint16  : <reserved>
}

For the reasons mentioned above, there can be multiple records for
a single instruction marked with a unique locrec ID in LLVM IR, or no
records at all.
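
As an illustration of the layout above, here is a minimal Python sketch
of a reader for a raw .llvm_locrecs blob. It is not part of HHVM or of
the patch. It assumes little-endian fields with no padding between them
(the natural layout on x86-64), and the names parse_locrecs, HEADER and
RECORD are hypothetical. Records are grouped by locrec ID because a
single ID may yield several records, or none.

```python
import struct
from collections import defaultdict

# Assumption: little-endian, unpadded fields (not stated in the format).
HEADER = struct.Struct("<BBHI")   # major, minor, <reserved>, NumRecords
RECORD = struct.Struct("<QIBBH")  # address, locrec ID, size, 2x <reserved>


def parse_locrecs(section: bytes) -> dict:
    """Return {locrec_id: [(address, size), ...]} for a raw section blob."""
    major, minor, _reserved, num_records = HEADER.unpack_from(section, 0)
    if (major, minor) != (1, 0):
        raise ValueError(f"unsupported locrecs version {major}.{minor}")

    records = defaultdict(list)
    offset = HEADER.size
    for _ in range(num_records):
        address, locrec_id, size, _r8, _r16 = RECORD.unpack_from(section, offset)
        records[locrec_id].append((address, size))
        offset += RECORD.size
    return dict(records)


# Hypothetical data: the call tagged !locrec !{i32 42} was cloned, so
# two machine instructions carry the same ID.
blob = HEADER.pack(1, 0, 0, 2)
blob += RECORD.pack(0x1000, 42, 5, 0, 0)
blob += RECORD.pack(0x2040, 42, 5, 0, 0)
locrecs = parse_locrecs(blob)
```

A consumer would then look up an ID and be prepared to find zero, one,
or several (address, size) pairs for it.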


II. Smashable attribute.

The 'smashable' attribute can be attached to LLVM 'call' and 'invoke'
instructions. This guarantees that the resulting machine-level
instruction(s) (if any) will not straddle a cache-line boundary and
thus can be safely smashed in a multi-threaded environment. E.g.

  musttail call void @foo(i64 %val) readonly smashable
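
The property that 'smashable' guarantees can be stated as a simple
check on an instruction's address and size. The Python sketch below is
only an illustration: the 64-byte line size is an assumption (typical
for x86-64, not stated above), and straddles_cache_line is a
hypothetical helper, not an HHVM or LLVM function.

```python
CACHE_LINE_SIZE = 64  # assumption: typical x86-64 cache-line size in bytes


def straddles_cache_line(address: int, size: int,
                         line_size: int = CACHE_LINE_SIZE) -> bool:
    """True if the bytes [address, address + size) span two cache lines."""
    first_line = address // line_size
    last_line = (address + size - 1) // line_size
    return first_line != last_line


# A 5-byte instruction ending exactly at a line boundary is safe to smash;
# one starting a byte later crosses into the next line.
fits = straddles_cache_line(0x103B, 5)
crosses = straddles_cache_line(0x103C, 5)
```

Combined with a location record (address and size), a check like this
could be used to assert that a smashable call landed where expected.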


III. Calling conventions.

Four new calling conventions have been added to cover different calling
scenarios in HHVM.


IV. Optimizations.

HHVM-specific optimizations include conditional tail call optimizations
and a prototype for hot-cold code splitting based on block frequency.


V. Misc.

We disable generation of MCJIT stubs to avoid gaps in generated code.

1-byte alignment (i.e. no alignment) can be specified for functions
that don't have the OptimizeForSize attribute.

To keep functions from different modules placed as tightly as
possible and yet to satisfy internal alignment requirements, we use
module flags to tell LLVM that the code we are about to output will
be skewed with respect to a requested section alignment.
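
To make the idea of skew concrete, the Python sketch below models only
the arithmetic. It assumes that "skew" means the number of bytes by
which the code's final start address is offset from an aligned
boundary. The module flag names and the actual LLVM implementation are
not described here, and padding_for_skew is a hypothetical helper.

```python
def padding_for_skew(offset: int, alignment: int, skew: int) -> int:
    """Padding needed at `offset` so that the final address is aligned.

    The final address is assumed to be aligned_base + skew + offset, so
    we need (skew + offset + padding) to be a multiple of `alignment`.
    """
    return -(skew + offset) % alignment


# With a skew of 8, offset 24 is already at a 16-byte boundary in the
# final placement (8 + 24 = 32), while offset 0 needs 8 bytes of padding.
no_padding = padding_for_skew(offset=24, alignment=16, skew=8)
some_padding = padding_for_skew(offset=0, alignment=16, skew=8)
```

With a skew of 0 this reduces to ordinary alignment padding.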

When emitting code for HHVM tracelets identified by the HHVM_TC calling
convention, the generated stack prologue and epilogue are modified to
take into account the fact that the stack pointer is aligned
differently on entry to the tracelet. The standard x86-64 ABI puts
the return address at 16-byte alignment, while for HHVM tracelets this
alignment is offset by 8 bytes.
