"Yes, you can build AOT compilers that beat JITs produced by meta-tracing an interpreter" - not for all languages. As mentioned below, there is a certain tradeoff associated with JITs - warmup time, memory consumption, etc. But for a certain class of problems (say, compiling Python) and a certain class of use cases (running at top speed), I dare you to compete with PyPy. The biggest contender so far is the ZipPy project, which is indeed built on Truffle - also a meta-JIT, albeit a method-based one.
Not sure, don't have much experience with that. This is a "classic" Futamura projection - you write an interpreter and the "magic" turns it into a compiler. I'm not aware of any consumer-grade compiler built that way, but there is a huge swath of research on the topic.
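To make the first Futamura projection concrete, here is a toy sketch (all names are made up for illustration): specializing an interpreter with respect to a fixed program yields a "compiled" version of that program. The "partial evaluator" here is deliberately crude - it just unrolls the interpreter's dispatch loop at specialization time and emits straight-line Python.

```python
def interpret(program, x):
    """A tiny interpreter for a toy accumulator language."""
    acc = x
    for op, arg in program:          # dispatch happens at run time
        if op == "add":
            acc += arg
        elif op == "mul":
            acc *= arg
    return acc

def specialize(program):
    """A crude partial evaluator: unroll the dispatch loop at 'compile'
    time, emitting straight-line Python with the opcodes baked in."""
    lines = ["def compiled(x):", "    acc = x"]
    for op, arg in program:          # dispatch happens once, here
        if op == "add":
            lines.append(f"    acc += {arg}")
        elif op == "mul":
            lines.append(f"    acc *= {arg}")
    lines.append("    return acc")
    ns = {}
    exec("\n".join(lines), ns)
    return ns["compiled"]

prog = [("add", 3), ("mul", 2)]
compiled = specialize(prog)
assert compiled(5) == interpret(prog, 5) == 16
```

The residual `compiled` function contains no interpreter at all - that is the first projection. The second and third projections (specializing the specializer) are where the real research effort went.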
You can very easily create a dumb one - you essentially copy-paste the interpreter loop (which is what e.g. Cython does if not presented with annotations) - however, the results just aren't very good.
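A minimal sketch of why the dumb approach disappoints (hypothetical names, not Cython's actual machinery): the "compiler" just replaces the dispatch loop with a fixed sequence of direct calls to the interpreter's own opcode handlers. Dispatch overhead goes away, but each handler still does all the generic dynamic-typing work, so the speedup is modest.

```python
def op_add(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)   # still fully generic: could be int, str, list...

def op_mul(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)

HANDLERS = {"ADD": op_add, "MUL": op_mul}

def interpret(code, stack):
    for op in code:            # dict lookup + dispatch on every instruction
        HANDLERS[op](stack)
    return stack

def dumb_compile(code):
    """'Copy-paste' the loop body once per instruction: the residual
    program is a fixed sequence of direct handler calls, nothing more."""
    handlers = [HANDLERS[op] for op in code]   # dispatch resolved once
    def compiled(stack):
        for h in handlers:
            h(stack)
        return stack
    return compiled

# [2, 3, 4] -> MUL pops 4,3 giving [2, 12] -> ADD pops 12,2 giving [14]
compiled = dumb_compile(["MUL", "ADD"])
assert compiled([2, 3, 4]) == interpret(["MUL", "ADD"], [2, 3, 4]) == [14]
```

All the expensive parts - boxed values, generic `+`, stack traffic - survive compilation untouched, which is exactly what type annotations (or a tracing JIT's specialization) are needed to eliminate.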
Research on partial evaluation (PE) was fashionable in the 1990s, but largely fizzled out. I was told that was because they could never really get the results to run fast, and I'm trying to understand why. Clearly meta-tracing and PE have a lot of overlap. Truffle is based on some variant of dynamic PE, if I understand what they do correctly, whereas most of the 1990s work on PE was about static PE, I think. The paper [1] touches on some of these issues, but I have not studied it closely yet.