The following guest blog post was submitted by Kevin Normann, President and CTO of Midnight Studios.
Earlier this year, we at Midnight Studios were asked to recreate the Origami Sky demo project, originally built with MaxPlay's Game Development Suite, in Unity and Unreal so that the CPU performance of MaxCore could be compared against the other two game engines. I would like to share some interesting takeaways from the project that are valuable to game and VR developers interested in optimizing CPU performance.
First, a little about the Origami Sky demo itself. The demo is set in a Japanese Zen garden filled with thousands of flying origami cranes (boids), lit floating lanterns, and a few predatory paper dragons. Each lantern contains a dynamic point light source, and all moving objects are full game objects rather than simple particles. This is important to note: with so many boid objects flying in procedural motion, it is easy to mistake the scene for a particle system running on the graphics card. These are actually full game objects whose behavior is processed by the main CPU rather than by a shader. This approach intentionally pressure-tests n-core CPU processing with scalable content.
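The post does not include the flocking code itself, so as a rough, engine-agnostic illustration (all names, radii, and weights here are hypothetical, not MaxPlay's actual code), a classic Reynolds-style boid update steers each crane with three CPU-side rules: separation, cohesion, and alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal 2D boid; the real demo is 3D and engine-specific.
struct Vec2 { float x = 0.0f, y = 0.0f; };
inline Vec2 operator+(Vec2 a, Vec2 b) { return {a.x + b.x, a.y + b.y}; }
inline Vec2 operator-(Vec2 a, Vec2 b) { return {a.x - b.x, a.y - b.y}; }
inline Vec2 operator*(Vec2 a, float s) { return {a.x * s, a.y * s}; }

struct Boid { Vec2 pos, vel; };

// One Reynolds-style steering step for boid i against its neighbors.
Vec2 Steer(const std::vector<Boid>& boids, std::size_t i) {
    Vec2 sep, avgPos, avgVel;
    std::size_t n = 0;
    for (std::size_t j = 0; j < boids.size(); ++j) {
        if (j == i) continue;
        Vec2 d = boids[i].pos - boids[j].pos;
        float dist2 = d.x * d.x + d.y * d.y;
        if (dist2 > 25.0f) continue;        // neighbor radius of 5 units (hypothetical)
        sep = sep + d;                      // separation: push away from close neighbors
        avgPos = avgPos + boids[j].pos;     // cohesion: accumulate neighbor positions
        avgVel = avgVel + boids[j].vel;     // alignment: accumulate neighbor velocities
        ++n;
    }
    if (n == 0) return {};
    float inv = 1.0f / static_cast<float>(n);
    Vec2 cohesion = avgPos * inv - boids[i].pos;   // steer toward local center
    Vec2 alignment = avgVel * inv - boids[i].vel;  // steer toward average heading
    return sep * 1.5f + cohesion * 0.1f + alignment * 0.25f;  // illustrative weights
}
```

Because every crane runs a step like this each frame against its neighbors, the cost scales with object count, which is exactly what the demo stresses on the main CPU.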
Our objective was to recreate this demo in Unity and Unreal so that the results from all engines could be compared without bias toward one engine over another. This meant keeping the flocking origami “boids” behavior processed on the CPU using the threading models provided by each engine, and keeping all other processing off the main CPU as much as possible. It also meant that all of the objects and art assets would need to be nearly identical. Our secondary objective was to ensure that the demos were as visually comparable as possible, despite the differences between the engines.
For both Unity and Unreal, the shared principal work involved:
- Importing assets into the project
- Building the scene
- Populating the scene using a shared definition file to define boid object counts
- Tweaking material properties, lighting, and colors for visual similarity
Unique work for Unity included:
- Porting the code to C#
- Porting the flocking code to use the Mono supplied (.net style) threading model
Unique work for Unreal included:
- Porting the C++ code to build within the Unreal project
- Porting the flocking code to run within the custom Unreal threading model
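Both ports wrap the same idea in different APIs: Mono's .NET-style threads for Unity and Unreal's custom worker threads for Unreal. Stripped of those engine specifics, the partitioning shared by both ports looks roughly like the following `std::thread` sketch; `UpdateBoid` is a stand-in for the real flocking step, not code from either port.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in for the per-boid flocking update; the real step reads neighbor
// state, so production code would double-buffer positions to avoid races.
void UpdateBoid(std::vector<float>& positions, std::size_t i) {
    positions[i] += 1.0f;
}

// Split the boid array into contiguous slices, one worker thread per slice.
void ParallelUpdate(std::vector<float>& positions, unsigned workers) {
    std::vector<std::thread> pool;
    const std::size_t count = positions.size();
    const std::size_t chunk = (count + workers - 1) / workers;  // ceiling division
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = w * chunk;
        const std::size_t end = std::min(begin + chunk, count);
        if (begin >= end) break;
        // Each worker writes only its own disjoint slice, so the writes
        // themselves need no locking.
        pool.emplace_back([&positions, begin, end] {
            for (std::size_t i = begin; i < end; ++i) UpdateBoid(positions, i);
        });
    }
    for (std::thread& t : pool) t.join();
}
```

How well an engine schedules slices like these across cores, and how much per-object overhead it adds around each update, is what the comparison below measures.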
The work was straightforward with one exception: getting the rendered results on all three engines to look as similar as possible took considerable effort, because there are many subtle and not-so-subtle differences in how the engines render a final scene. Compensating for these differences is not something developers typically have to do, since most games are developed on only one engine. The goal was not only to make the final result look as similar as possible, but also to ensure that the changes didn’t create any added burden on the engines involved. This means that, where applicable, we used built-in shaders that were similar in both visual look and underlying implementation. The final resulting look is very comparable given the uniqueness of each rendering solution.
Our assumption was that there would be some level of performance gain due to MaxPlay’s unique multi-threading approach, but we were not at all expecting what we found; the results were shockingly positive. We expected that Unity would be slower simply because of its architectural challenges and the overhead involved in using C#. Comparing the C++ code between Unreal and MaxPlay, it appears at first glance that the processing incurred per object would be heavier within MaxCore; however, the results show that any perceived “weight” in MaxCore code clearly has drastic benefits at scale!
At 3,800 origami cranes, 3 dragons, and 1,900 lanterns (5,703 total objects), MaxPlay maintained an FPS consistently above 90, while Unreal was at 15 and Unity at 5:
The tests for all 3 engines were performed on an Intel Core i7-5960X 8-core CPU with an Nvidia GTX980 Ti graphics card.
For the second part of the comparison, we scaled down the number of objects rendered within Unreal and Unity until each engine reached and maintained 90 FPS. We found that Unreal could maintain 90 FPS with a total object count of 1,189 and Unity with a total object count of 280, far fewer than the 5,703 total objects MaxPlay handled at 90 FPS.
Again, the tests for all 3 engines were performed on an Intel Core i7-5960X 8-core CPU with an Nvidia GTX980 Ti graphics card.
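To put those 90 FPS counts in perspective, the ratios implied by the numbers above work out to roughly 4.8x Unreal's and 20x Unity's sustainable object count:

```cpp
#include <cassert>

// Object counts each engine sustained at 90 FPS, from the results above.
const double kMaxPlayObjects = 5703.0;
const double kUnrealObjects = 1189.0;
const double kUnityObjects = 280.0;

// MaxPlay's sustained count relative to the other two engines.
const double kVsUnreal = kMaxPlayObjects / kUnrealObjects;  // ~4.8x
const double kVsUnity = kMaxPlayObjects / kUnityObjects;    // ~20.4x
```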
"The performance boosts will be especially noteworthy for projects which rely heavily on main CPU processing"
When interpreting the results above, keep in mind that the demo is focused on each engine's ability to process large quantities of game objects and provide them with simulated flocking behavior, ALL on the main CPU. These results are based on the “out-of-the-box” threading models of all three engines, as well as their internal, per-object processing overhead. All tests, including MaxPlay's, were done without optimization specific to the flocking algorithm. Clearly, MaxPlay outperforms the other two engines phenomenally with the help of its MaxCore system. Some credit should also go to its LID and hierarchical light storage solution. These two advantages are the primary reason for the improved performance of the MaxPlay engine. Given this drastic performance difference, it is worth considering the context to determine whether there are any other contributing factors.
While the results are drastic and positive for MaxPlay, I do want to point out two things that bear consideration:
- MaxPlay, at the time this work was done, was not yet a fully featured engine. It is reasonable to assume that as features come online, its object processing will incur some additional overhead, which would affect the results to some degree.
- This demo is simpler than a robust video game. A full game involves more custom processing and conditional behavior per object, which would likely shift the results across all three engines a bit.
These are both minor points, stated for full disclosure and accuracy. While they may have some favorable impact toward MaxPlay in this test, it is certainly not enough to discredit the results.
Let's face it: even in the unlikely case that MaxPlay were running 50% slower, it would still perform much better than the others.
So, what conclusions should we draw from this test? First and foremost, MaxPlay’s MaxCore technology is exciting. The performance boosts will be especially noteworthy for projects which rely heavily on main CPU processing. This includes games with large numbers of active objects or objects with advanced AI and/or complex animation. In addition, projects that are heavily GPU bound may actually benefit from moving some of the work optimized for the GPU back onto the CPU. Developers of such projects should seriously consider the MaxPlay engine as a powerful solution to meet their particular development needs. The boost in performance may not be quite as extreme as the statistics provided in this demo, but will be compelling nevertheless. In this era, when games are looking to operate at a consistent 90+ frames per second for VR/AR, developers need every advantage to continue to provide players with compelling experiences.
In conclusion, let me add that Midnight was happy to do this work for MaxPlay, and we are now even more excited about the direction that the MaxPlay engine is taking. We look forward to compiling more statistics and examples of the strength of the MaxPlay architecture as features come online. We have even started our own internal project that makes use of the specific scalable advantages provided by the MaxPlay game development system.
If you have any questions about this project or other game development topics, please contact me.
Thanks for reading!
Kevin Normann
President and CTO
Midnight Studios, Inc.