December 18, 2016

UE4 - Quick Tips, Tricks, and Optimizations

I have been learning a lot about Unreal Engine over the past few years. My training in the engine officially began in the fall of 2012, and since then I have put all of my effort into learning everything I can about the new engine and how it works. Whether you're a seasoned veteran or just getting started, there are plenty of tricks you can use to improve your material building:

General Material Building
  • UE4 has a lighting, reflection, and shadow environment built-in. Using the standard system will always give you the best results. Purely custom code will not blend as well in the environment and won't be as optimized.
  • Reuse code often. When you reuse a branch of the graph for another purpose in the shader, its result is computed once and shared. If you build a separate, cheaper calculation for the second use instead, you end up recalculating something the shader has already done, so the "cheaper" method can cost more overall.
    • If you can reuse code for vertex displacement in the normals, you can save a LOT of instructions for very complex materials!
  • Use textures instead of procedural methods. While this may sound like a bad idea, a texture is a simple array lookup, while procedural methods require tons of calculations. Textures can also hold complex, artist-driven detail, so it is much better and easier to use a large 2k texture for wind displacement than some complex procedural code.
    • Keep in mind transforms only work in world position offset, not tessellation. Limit your tessellation usage to only what's necessary.
  • For objects with a high vertex instruction cost, limit the vertex count as much as possible. The vertex shader cost is paid once per vertex, so it scales with the vertex count. Grass meshes only need 4 vertices. Water planes, unless you need translucent fog, only need 4 as well. When translucent vertex fog is necessary, try to keep the vertex count under 4,000.
  • Exponents of 0.5 and 2 are cheaper than 3 or 4, which are cheaper than 5, which in turn is cheaper than a fractional exponent such as 5.75. The reason is how the Power node's exponent gets compiled. A power of 2 is typically a single multiply and a power of 0.5 a square root, which makes them much cheaper than anything else. Other whole-number exponents can typically be expanded into a short chain of multiplies, and powers of two (4, 8, 16, etc.) need the fewest because each one is just another squaring. Fractional exponents need the general pow() function in HLSL and are the least efficient. You get all of this from the Power node with a constant exponent, so there is no need to multiply the value by itself by hand.
  • Don't just slap a texture, normals, and roughness on an object and call it done! Overlay two textures on top of each other and get some more variation out of it! These layered texturing techniques were pioneered in Banjo-Kazooie and Ocarina of Time. They are cheap to render, and the end result is well worth the effort.
  • Fresnel shading in a PBR environment is welcome! Custom reflections, however, are not. Use reflection captures and stationary skylights for areas that need a reflection. If you use a custom reflection anyway, the object may not match the physical environment it's placed in.
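
To make the Power and Fresnel tips above a bit more concrete, here is a rough sketch of a Schlick-style Fresnel term, written as plain C++ so it can be run outside the engine. It is not the engine's Fresnel node, and the names Pow5, FresnelSchlick, FresnelGeneral, and baseReflect are made up for illustration. The point is only that a whole-number exponent like 5 can be built from a few multiplies, while a fractional exponent has to go through the general pow() path.

```cpp
#include <cmath>

// x^5 built from three multiplies instead of a general pow() call.
inline float Pow5(float x)
{
    const float x2 = x * x;   // x^2
    return x2 * x2 * x;       // x^4 * x = x^5
}

// Schlick-style Fresnel term (illustrative, not the engine's node).
// baseReflect: reflectance when looking straight at the surface (hypothetical name).
// NoV: dot product of surface normal and view vector, already clamped to [0, 1].
inline float FresnelSchlick(float baseReflect, float NoV)
{
    return baseReflect + (1.0f - baseReflect) * Pow5(1.0f - NoV);
}

// Same term with an arbitrary exponent, which needs the general pow() path.
inline float FresnelGeneral(float baseReflect, float NoV, float exponent)
{
    return baseReflect + (1.0f - baseReflect) * std::pow(1.0f - NoV, exponent);
}
```

With an exponent of exactly 5, both functions should agree to within floating point error. The second one only earns its extra cost when you really need something like 5.75.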
Translucency
  • Translucent rendering is cheapest when unlit. However, you lose the benefit of shadowing and local lighting. Non-directional per-vertex (on lower polycounts) is the second cheapest. Per-vertex directional is third cheapest. Unless you need localized reflections on translucency, those lighting models can save you a hundred or more instructions over Surface Translucency Volume and Forward Shading models.
  • No form of translucent rendering features GGX specular rendering. You can bring it back by adding a blueprint to your level that supplies the light vector and using the engine's HLSL code for GGX specularity.

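    Custom Node:

    // GGX distribution term. Inputs: Roughness and NoH (normal dot half vector).
    float a = Roughness * Roughness;
    float a2 = a * a;
    // Denominator term, written to need only two multiply-adds.
    float d = ( NoH * a2 - NoH ) * NoH + 1;
    return a2 / ( PI*d*d );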
  • Traditionally, translucency does not fare well with cheap depth of field methods like Gaussian blur. With Separate Translucency enabled, the material is unaffected by DoF. With it disabled, the material is always affected by DoF. There is no in-between. However, you can fade out the translucent object as it recedes into the distance, and you can use MipValueMode > MipBias to blur a textured translucent surface into the distance. The cheapest and easiest way is to use the pixel depth to increase the bias and blur into the distance. Use a power of 0.5 to get the blending just right. Keep in mind, this is all done with Separate Translucency enabled, so your translucent material stays nice and crisp up front but gets blurrier in the back.
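
The depth-driven mip bias from the last tip comes down to a few lines of arithmetic. Here is a minimal sketch in plain C++, assuming you feed it the pixel depth yourself. DepthMipBias, fadeDistance, and maxBias are hypothetical names and tuning values, not engine parameters.

```cpp
#include <algorithm>
#include <cmath>

// Returns a mip bias that grows with distance from the camera.
// pixelDepth:   distance of the shaded pixel from the camera.
// fadeDistance: depth at which the blur reaches full strength (hypothetical, must be > 0).
// maxBias:      largest mip bias to apply (hypothetical).
inline float DepthMipBias(float pixelDepth, float fadeDistance, float maxBias)
{
    // Normalize depth to [0, 1], the way a Saturate node would.
    const float t = std::min(std::max(pixelDepth / fadeDistance, 0.0f), 1.0f);
    // A power of 0.5 is a square root: the blur ramps up quickly, then levels off.
    return maxBias * std::sqrt(t);
}
```

For example, with fadeDistance = 1000 and maxBias = 4, a pixel at depth 250 would get a bias of 2, and anything at depth 1000 or beyond would get the full 4.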
Lighting
  • Use a skylight! Skylights account for all the light that does not come directly from the sun. This is what allows your environment to be lit all around.
  • To save on the cost of rendering dynamic GI in an outdoor environment, use a skylight and change the bottom color to a greenish hue. This will emulate the bounce light provided by green grass. The results are not perfect, but very similar. You can change the color to whatever you want, and the result is easy to render and looks pleasing.
  • Keep your use of dynamic lights to a minimum. While the engine was designed to handle multiple lights overlapping each other, the pixel cost spikes in areas where lights overlap. If you do have overlapping lights and are using the precomputed Lightmass process for GI, do not let more than 4 lights overlap at once. Otherwise, Lightmass will need to bake more than 4 channels for light GI to be rendered. The fewer stationary lights that overlap, the better. If you need more, switch the extra lights to dynamic and use soft static lights for the indirect lighting.
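
If you want a quick sanity check on the four-light limit while planning a level, something like the sketch below can help. It is plain C++ and only a rough planning aid: it treats each light as a sphere and counts how many reach a sample point, which is a simplification and not how the engine itself decides overlap. LightSphere, CountLightsAtPoint, and WithinOverlapBudget are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a light: a position and an attenuation radius.
struct LightSphere
{
    float x, y, z;
    float radius;
};

// Counts how many lights reach the given point.
inline std::size_t CountLightsAtPoint(const std::vector<LightSphere>& lights,
                                      float px, float py, float pz)
{
    std::size_t count = 0;
    for (const LightSphere& light : lights)
    {
        const float dx = px - light.x;
        const float dy = py - light.y;
        const float dz = pz - light.z;
        // Compare squared distances so no square root is needed.
        if (dx * dx + dy * dy + dz * dz <= light.radius * light.radius)
        {
            ++count;
        }
    }
    return count;
}

// True when the point stays within the four-light budget from the tip above.
inline bool WithinOverlapBudget(const std::vector<LightSphere>& lights,
                                float px, float py, float pz)
{
    return CountLightsAtPoint(lights, px, py, pz) <= 4;
}
```

Sampling a handful of points in the busiest rooms of a level is usually enough to spot where a fifth light sneaks in.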
Particles
  • Small particles like sparks are best handled by the GPU. They can collide cheaply and spawn in the millions. Large particles like dust clouds are best handled by the CPU. You get the power of particle cutouts and subimage UVs to limit overdraw while providing variation. Unless you have more than 1000 CPU particles onscreen at once, you do not need to worry too much.
  • Very small particles can get wiped out by various AA methods. Temporal AA (TAA) is the most aggressive. Switch to FXAA if you need small particles, or increase your particle size.
Landscape
  • Draw calls are cheap nowadays. Maximize efficiency by using more components with fewer quads per section. This allows unused components to be culled out.
  • In some cases, tessellation is preferable to parallax occlusion mapping (POM). Tessellation is already implemented on Landscape, so for exceptionally steep but not-too-sharp detail in low-vertex environments you can handle the triangle explosion without killing the card. But for crisp, sharp detail that doesn't need to be tessellated, POM is much better.
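
To see the component trade-off from the first Landscape tip in numbers, here is a small sketch in plain C++. It assumes the usual relationship where the landscape resolution along one edge is quads per section, times sections per component edge, times components per edge, plus one. LandscapeLayout and its fields are hypothetical names for illustration, not engine types.

```cpp
// Hypothetical description of a square landscape layout.
struct LandscapeLayout
{
    int quadsPerSection;           // e.g. 63 or 127
    int sectionsPerComponentEdge;  // 1 for 1x1 sections, 2 for 2x2 sections
    int componentsPerEdge;         // components along one edge
};

// Vertices along one edge of the whole landscape.
inline int OverallResolution(const LandscapeLayout& layout)
{
    return layout.quadsPerSection * layout.sectionsPerComponentEdge
         * layout.componentsPerEdge + 1;
}

// Total number of components in the landscape.
inline int ComponentCount(const LandscapeLayout& layout)
{
    return layout.componentsPerEdge * layout.componentsPerEdge;
}
```

Under that assumption, a layout of {63, 1, 16} and a layout of {63, 2, 8} both give a 1009 x 1009 landscape, but the first is split into 256 small components and the second into 64 larger ones. The first layout gives the engine more, smaller pieces to cull.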