Tuesday, February 26, 2013

Optimization Magic - Part 2

(link to Part 1 of Optimization Magic)
(a slightly circular reference to forum discussion)

First: barring any unforeseen issues, we are launching 1.10.2 globally in a few hours (with the usual caveat that anything could be rolled back). We expect 1.10.2 to deliver a 15% server-side performance improvement across the board on the existing hardware. While that does NOT completely address the current performance spikes, it should generally improve the experience and allow the game to run more stably at its current 25 FPS server-side tick rate. 1.10.2 is also the first of a series of optimization-focused updates.

SPCT and trashing the new hardware.

Second: a big thanks to the SPCT (San Paro City Testers), as well as some invited guests, who jumped in on Saturday and tried trashing the new software and new server hardware. (For those who don't know, you can join the SPCT program and participate in public tests of new builds by applying through our forums. Keep in mind that part of your entrance exam is figuring out HOW to apply...)

It was a pretty fun event. We basically put up the new hardware, upped the CCU to 100 per server core, unlocked the FPS lock on the new software, and asked everyone to show up in the same spot, "shoot and holler," and perform as many close-encounter actions as they could. (This in turn trashed client FPS for most players: rendering 100 other players within 90m actively shooting at you and setting off explosions turns out to have its own client-side FPS issues, though those are separate from the server effects.)

In short we tried creating the worst possible scenario for the game (server wise) and we think we pulled that off pretty well.

The good news is that on the new hardware running 1.10.2, we saw a stable server-side tick rate of around 40 FPS (frame ticks stayed around 25ms), with the biggest spikes at 28ms (a server-side tick rate of 35 FPS).
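For the curious, frame time (ms per tick) and tick rate (FPS) are just two views of the same number; a quick illustrative sketch of the conversion used above:

```python
# Server frame time (ms per tick) and tick rate (FPS) are two views of
# the same measurement: fps = 1000 / frame_ms.
def frame_ms_to_fps(frame_ms: float) -> float:
    return 1000.0 / frame_ms

print(int(frame_ms_to_fps(25.0)))  # 40  (typical frame in the test)
print(int(frame_ms_to_fps(28.0)))  # 35  (the biggest spike observed)
```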

This means we should be able to comfortably return to 100 CCU and set the server tick rate to 35 FPS using the new hardware combined with the upcoming 1.10.2 software.

You can see the video of the carnage here:

Here is the resulting performance graph from the test above:

In fact, as an added bonus, during the 1.10.2 maintenance we are going to put SOME of the new hardware into production for SOME (not all) of the Financial Districts on US-West.

Due to some quirks in the configuration system, we will not be able to change the 25 FPS limit until ALL the districts have new hardware, but at minimum US-West should start seeing more and more stable performance in Financial as a result. After all of the US-West Financial districts have been upgraded, we will move on to Waterfront.

Next steps

Next step is to evaluate 1.10.2 after the software patch, find other obvious performance issues, and then decide which optimizations to focus on next. After that we will monitor and measure the effects of the OverKill hardware being put into production in US-West, and hope it doesn't physically blow up or catch fire in our datacenter, given how hard it's being pushed.

Presuming all goes well, we will expand the roll-out of the new hardware and progressively replace Financial and Waterfront in each of the worlds, while at the same time continuing software optimization.

One of the challenges on the client side remains the giant particle-effect slow-downs. Unreal has a known issue where unlit particles cause large FPS hits. Epic addressed some of that in the 2009 build of the engine with the introduction of the Lightmass system, but unfortunately APB Reloaded pre-dates that particular engine build and does not have access to that lighting system.

We have various ideas for how to upgrade or replace parts of the engine over the next few months, some more radical than others (maybe just creating a whole new APB game using the new Unreal engine instead?.... :) ), but the basic direction is to continue optimizing in any way, shape or form we can think of, and to keep you informed via the blog and forums as we roll out these changes.

See you in San Paro!


Sunday, February 17, 2013

Optimization Magic

(a slightly circular reference: link to forum discussion)

One of the hardest things for a small team like ours to fit into the product cycle on a regular basis is optimization.

Optimization is a little bit like making sure we are eating our vegetables every week. Every time we add or change a feature in the game, in theory we should also refactor everything around the code that just changed. Since that's usually not practical, large scale optimization really tends to happen in chunks when the team is finally given some time to focus on the task.

Now that we are about to release newly optimized code, I wanted to take some time to lift the hood on the work being done, and to share some insight into issues we have to contend with - from the software engine to server hardware.

We are really proud that APB Reloaded has remained a consistent top-5 title (out of 100+) in Steam's Free2Play category since its December 2011 launch. (As an aside, the game actually gets the vast majority of its traffic directly rather than through Steam, but Steam provides a convenient benchmark to compare against other games.) So we are planning for this game to provide many more years of entertainment for all its fans. But this tension between short-term and long-term goals (survive today, but plan for a 5+ year lifespan) means that every day we struggle with what we should focus on next: features, maps (Asylum anyone?), game content, security or optimization? All of these are pressing needs for one reason or another.

Given that turning off the game and entering an optimization-only cycle is not possible, we instead attempt the next best thing: we optimize while running full speed. It's a little bit like changing the oil in your car while travelling at 65 MPH down the highway. What could possibly go wrong?

APB - a server resource hog?

Few Unreal-engine-based games have attempted to throw 100 fully customized players into a twitch FPS/TPS setting in a single area (how many FPS games ever give you 50 v 50 in a single map?). Each player can have a fully customized character, car and skinned weapon, which means at least 200 fully custom player items (cars and characters), plus up to 850 autonomous NPC pedestrians and 350 autonomous NPC cars driving around a district, in addition to everything that's movable and destructible (traffic lights, dumpsters, billboards etc.).

This means roughly 18,000 dynamic actors for the server to track while running in a single shard, and in fact on a single server core (more on the 'single-core' issue in the hardware section below).

Granted, newer games using other engines, like Planetside 2 (Forgelight), have used very different 'continent', 'distance' and 'mission' optimization systems to allow much larger factions on a continent (though not necessarily in the same firefight), and even Fallen Earth uses a system of dynamic shards to allow 10,000 players in an area. But neither is an Unreal game.

Technically speaking, customization has some impact on server-side performance (mostly due to large amounts of asset streaming), but it has a larger negative impact on the game client, and tends to drive client-side frame rates lower than a given gaming rig would otherwise achieve had customization not been such a central part of the game.

While there are clearly other FPS/TPS games that perform amazing graphical feats on older hardware (the CoD-MW, Crysis, Gears of War and Far Cry series, etc.), they rarely allow this many complex human and AI actors in a single battle area. When they do, the participants are streamlined and unified and do not permit anything near APB's level of insane customization (or city-wide destruction), or they behave more like RPG or RTS games and generally have much lower requirements for hit registration, server tick rate and movement prediction. The amazing feat in APB is that this game still actually works on a lot of pre-2009-era hardware, given the extreme computational complexity involved.

Server FPS vs Client FPS

As a general rule, we want the server to perform a full pass of computations for all 100 players and 18,000+ district actors 30 times per second (giving each CPU core at most 33ms to complete all computations in that one frame). If we achieve 30 FPS on the server, then connected game clients can easily run at 2X-3X the server tick rate (60 FPS - 90 FPS) without any noticeable loss in accuracy. At a 1:2 or 1:3 server-to-client ratio, movement prediction and frame interpolation provide a very smooth game experience.

Unfortunately during the last few updates we have had to temporarily lower the server tick rate to 25 FPS and reduce the max CCU per core, so it's high time to perform another full optimization pass.

Software Optimizations and Server-Side Computation Times

Below is a graph of what version 1.10.1 server-side computations look like under ideal test circumstances AND using our new test hardware (more details on this new hardware at the bottom of this post).

In the current 1.10.1 build the server completes 1 full frame (moving those thousands of actors around) on 1 core in 1 full district using the new hardware type in 19.2ms. In 'theory' this means the server on the new hardware is capable of running at 52FPS tick rate (!).

This is to be compared with the 'current gen' hardware, where we have only been able to run a 'safe' server tick rate of 25FPS in the current 1.10.1 build.

The lower part of the graph shows version 1.10.2 with the new software optimizations.

From the synthetic test it appears the team has been able to squeeze out a 16% performance improvement in software alone (which amounts to about a 10 FPS improvement on the server). This improvement drops the per-frame processing time to 16.1ms, which means a theoretical 62 FPS server tick rate (again on the new hardware).

This 'should' mean that software optimization alone (the 16% improvement) will let us go from 25 FPS back to the original 30 FPS server-side tick rate on the current hardware as part of the 1.10.2 update (to be determined after the game is live).
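As a sanity check on those numbers, here is the arithmetic behind the quoted figures (using the 19.2ms and 16.1ms frame times from the graphs):

```python
# Sanity-check the quoted improvement: 1.10.1 vs 1.10.2 per-frame
# processing time on the new test hardware.
old_ms, new_ms = 19.2, 16.1

improvement = (old_ms - new_ms) / old_ms
print(f"{improvement:.0%}")            # 16% less CPU time per frame

old_fps, new_fps = 1000 / old_ms, 1000 / new_ms
print(round(old_fps), round(new_fps))  # 52 62 (theoretical tick rates)
```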

You can read these graphs from the bottom up, starting with receiving network packets from all connected actors, then updating game elements and physics, updating cameras and streaming, and ending with sending data back to all clients. What's rather surprising is that almost 50% of the entire server processing time consists of receiving/parsing and serializing/sending network traffic. The actual game updates (players, objects, physics etc.) take the other 50% of available CPU time.

From the above graphs you can see that the team has been able to really squeeze and optimize the "Receive Network Traffic" and "Update Game Objects" steps. We expect to continue optimizing all the steps in the system, and presuming QA signs off on the upcoming patch, we will measure the real-world impact of these improvements in the coming week.

The Single-Core Engine Conundrum

First, a disclaimer. The Unreal Engine has served us (and thousands of other games and companies) incredibly well. It's a great engine and a fantastic rendering system. That said, the engine has certain design choices that create certain hard-to-overcome limits (as all engines do).

The biggest one for large-scale games is Unreal's monolithic and (almost) single-threaded server-client-response system. The philosophy behind Epic making that design choice back in the era of Unreal Tournament / Gears of War makes perfect sense, given the engine's focus on small-scale lobby-based FPS/TPS games, or even single-player or co-op games. Some Unreal-based RPGs (for example Blade and Soul) have clearly adopted the engine as a renderer, and then created an entirely proprietary server system to handle RPG-style updates and connection loads (which usually require 2,000-3,000 players per shard, but only a server tick rate of 10 FPS or less in RPG mode).

APB Reloaded uses a hybrid of standard Unreal server code (we originally used a 2008 version of Unreal, so the engine is getting a little aged at this point) and its own proprietary TCP message stack coordinating the communications between worlds and districts, as well as a very proprietary customization system. But the general actor-to-actor interaction relies on a system that's very close to the original Unreal design, mostly handled in a single game update loop.

This means all the processing in a single district happens on a single core and in a single thread.

One way to think of this is that the engine fundamentally works like a turn-based game where each actor has 33ms to move per turn. Within the scope of a single server core/thread, the process gives each actor one chance to make a move (or combination of moves). When all actors have signaled their moves, everyone is told of everyone else's updated moves, and the game proceeds to the next turn (though from the chart above you can see that we actually only spend about 7.5ms moving stuff around; the rest of the time is spent sharing that information).
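For readers who like code, the structure described above is essentially the classic single-threaded, fixed-timestep server loop. Here is a heavily simplified sketch in Python (NOT actual APB or Unreal code; every name here is illustrative):

```python
import time

TICK_MS = 33  # ~30 FPS server tick budget

def server_loop(actors, clients, running):
    """Heavily simplified single-threaded game loop: every actor gets one
    'turn' per tick, then the updated world state is sent to every client."""
    while running():
        start = time.monotonic()
        inputs = [c.receive_pending() for c in clients]  # receive/parse network traffic
        for actor in actors:                             # one move per actor per turn
            actor.update(TICK_MS / 1000.0, inputs)
        snapshot = [a.state() for a in actors]           # serialize world state
        for c in clients:                                # tell everyone about everyone
            c.send(snapshot)
        # Sleep off whatever remains of the 33ms budget; a frame that runs
        # long (elapsed > budget) is exactly the kind of spike being optimized.
        elapsed = time.monotonic() - start
        time.sleep(max(0.0, TICK_MS / 1000.0 - elapsed))
```

Note that everything (networking, game updates, physics) happens sequentially on one thread, which is why extra CPU cores don't help a single district.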

Human reaction time (or, as it's formally known, mental chronometry) is around 160ms, so processing everything in 33ms on the server, plus the packet round-trip time (ideally less than 40-80ms), for a total of under roughly 120ms of processing delay, should give us sufficient headroom to provide a good player experience.
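The budget arithmetic works out with room to spare (numbers are the ones quoted above):

```python
# Rough latency budget versus the ~160ms human reaction threshold.
server_frame_ms = 33   # one full server tick at ~30 FPS
worst_rtt_ms = 80      # upper end of the ideal packet round-trip range

total_delay_ms = server_frame_ms + worst_rtt_ms
print(total_delay_ms)              # 113 -> comfortably under 160ms
assert total_delay_ms < 160
```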

However, even just a slight improvement in server side processing will actually enhance the fluidity of the game. We humans are very good at processing sequential frames of information and can easily spot the visual difference between film at 24fps and video at 30fps (or as the case may be "the Hobbit" at 48fps for those of you who now hate Peter Jackson). This means we will notice visual processing hiccups long before we react to new on-screen events.

Why does all this single-threaded-ness matter to us? Well, it turns out that most of the performance gains in recent years in server processors from Intel and AMD have NOT come from performing more computations on a single core, but rather from having many parallel cores performing parallel tasks.

Sadly for APB Reloaded, that type of parallel task division does not improve individual district performance... But... there is hope...

New OTW Hardware Test World going live: OverKill

In the near future we are about to release a new OTW (Open Test World) called OverKill. OverKill is actually an apt name, and is the result of a lot of hardware experimentation by our IT team (the computation tests above were run on this hardware as well).

The current generation of APB servers consists of Intel Xeon X5570 "Nehalem"-based processors (operating in 3.2GHz Turbo Mode), with 2 processors of 4 cores each. We use Dell M610 blades like these (just recently pulled from our datacenter).

The benefits of blade servers are that we can increase the density of the hosting operation, since we can fit 16 servers in 10 "rack units." The drawbacks - the types of processors supported by blade servers and the inability to overclock those processors - have caused us some serious problems in optimizing the hardware for the game.

For quite some time we have been looking for a new processor solution specifically to handle Financial and Waterfront (and eventually Asylum) districts. Something that can live in our three datacenters, but at the same time give us a cost effective solution to run at much higher single-core clock speeds, while also taking advantage of the newer "Sandy Bridge" and "Ivy Bridge" Intel processor architectures.

After much playing around with various combinations of server chips, it turns out that server boards and server chips really don't like (or even permit) overclocking, and they are almost never engineered to optimize single-threaded performance (other than the incidental improvements that come from larger L2 and L3 cache systems). We also need at least 6 cores to perform these calculations in a cost-effective manner (which lets us run 3 fully loaded districts on a single server). That left us with a conundrum.

After experimentation we have settled on running a public test using a custom solution: a high-end desktop board (ASUS Rampage IV Extreme) combined with an unlocked Intel i7-3930K 6-core processor that, in a datacenter setting (with lots of cold air), easily runs stable at 4.25GHz (technically we can push it to 5GHz, but we are starting small).

Will it work once we throw real APB district computations at these systems? The synthetic test indicates it will indeed work. Will I/O performance hold up (given the strangle-hold that network I/O has on server CPU)? That's much harder to test, so we will find out as soon as we start running the OTW tests.

In a synthetic benchmark the i7-3930K OC (compared to the stock X5570) shows raw gains of nearly 70% in single-threaded performance (!). We do lose two cores per server, but the extra expense (more servers) seems worth the vast performance gain.

CPU Bench – Single Threaded:
[ORIGINAL X5570] – 1349
[EXPERIMENTAL 3930K] – 2284
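From those two scores, the single-threaded gain works out as follows:

```python
# Single-threaded CPU bench scores quoted above.
x5570_score, i7_3930k_score = 1349, 2284

gain = (i7_3930k_score - x5570_score) / x5570_score
print(f"{gain:.0%}")   # 69% -> the "nearly 70%" single-threaded gain
```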

If we can capture some of these performance gains in the real world, and translate them into improved Action District performance, then our long-term goal is not only to ensure a stable 30 FPS server tick rate, but gradually to raise the CCU in each district as well.

From the graphs on software optimization, you can see that the new hardware with the new software 'COULD' run a theoretical server-side tick rate of 62 FPS, which is more than double the 30 FPS tick rate we actually require.

Our plan is to use the extra performance (again, once we have run the real-world tests) to increase the CCU in a single district. Since CCU taxes the server in a non-linear fashion, we expect to only be able to increase CCU 25%-50% before the load drags the server back down to a 30 FPS tick. Of course this is still speculation, and is still to be determined during live testing.

Higher district CCU would mean better matchmaking (but THAT is a whole other blog entry). Needless to say, 80 people in a district means 20 teams with potentially 10 ongoing matchups, whereas 120 people in a district means 30 teams with 15 ongoing matchups, resulting in a 50% improvement in match availability. Of course it's not quite that simple, but you get the gist: more players = better matchmaking.
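That back-of-the-envelope math can be sketched as follows (assuming 4-player teams and one matchup per pair of teams, per the example above):

```python
# Back-of-the-envelope matchmaking math: assume 4-player teams and
# one matchup per pair of teams (a deliberate simplification).
def concurrent_matchups(players, team_size=4):
    teams = players // team_size
    return teams // 2

before, after = concurrent_matchups(80), concurrent_matchups(120)
print(before, after)                        # 10 15
print(f"{(after - before) / before:.0%}")   # 50% more match availability
```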

OUR hardware, software and network vs. YOUR hardware, software and network.

In this post we have only talked about server-side processing and optimization, and have not touched the OTHER things that also affect performance. First and foremost, you need a good gaming rig to play APB. We always recommend having 8GB of RAM and using 64-bit Windows 7; anything less is asking for trouble. In particular, using 64-bit Windows is critical. Also, client-side FPS in most Unreal games tends to drop dramatically during very large semi-transparent VFX events (i.e. very big explosions where the player does NOT die - something APB of course has a lot of), so only higher-end graphics cards tend to perform OK during those big VFX events (and optimizing that part of the engine code is a whole other ball of wax, far beyond the scope of this post).

Of course network connectivity and your latency to our core datacenters (Los Angeles, Washington DC and Frankfurt), or to the datacenters managed by our Russian (Moscow) and Brazilian (São Paulo) publishing partners, are critical as well.

I hope this article has shed some light on the optimization work currently being done. If you are one of our OTW testers, then expect to see the "OverKill" world come online in the next two weeks. And for everyone else we expect to release 1.10.2 very soon, which should have some immediate performance improvements.

Til Next Time!

Friday, February 1, 2013

From Reloaded, With Love

As I mentioned on Twitter, here is the blog we've been working on to let you know in advance all of the plans we have for Valentine's day this year.

Pink and heart-patterned weapons are back! Use them to participate in our Massacre Event or participate in one of our community contests.

We have a fairly new member of the team who will share more details on the festivities on all of our behalf. Please meet Simon, one of our Producers.


Written By, Simon
Reloaded Producer

Romance is in the air here at the Reloaded love-nest as Valentine’s Day is fast-approaching. Our Valentine’s update includes new role levels, contests and the all-new Fallen Angel costume pack!

'Valentine's Week Massacre' 2013

To keep things interesting for new players and long-term fans alike, we’re taking last year’s Roles and adding in more levels and rewards. Please note that any kills made during last year’s event will be carried over to this year, so you don’t have to start from zero again (if you already hit over 750 kills last year, you’ll instantly get the top-level reward).

The levels for the ranks of the 'Valentine's Week Massacre' role are:

Rank 1 - Requirements: 50 kills. Reward: Death Theme 1.
Rank 2 - Requirements: 150 Kills. Reward: Love Heart Glasses.
Rank 3 - Requirements: 300 Kills. Reward: Death Theme 2.
Rank 4 - Requirements: 500 Kills. Reward: A leased M1922 (Tommy Gun) + Pink Preset Police Hat.
Rank 5 (NEW) - Requirements: 750 Kills. Reward: 'Casanova' Title + Valentine's weapon skin.

The 'Snubby Love' role tracks kills with the Colby SNR 850 'Cherub' secondary weapon, which is gifted to you when you log in to the game.

The levels for the ranks of the 'Snubby Love' role are:

Rank 1 - Requirements: 25 kills. Reward: Ophelia - Blonde Decal Unlock.
Rank 2 - Requirements: 75 Kills. Reward: Prentiss (Enforcers) / Bloodroses (Criminals) symbol.
Rank 3 - Requirements: 150 Kills. Reward: Ophelia - Red Head Decal Unlock.
Rank 4 - Requirements: 250 Kills. Reward: Colby SNR 850 'Cherub' unlock, so it can be purchased from Contacts at any time (including outside the event).
Rank 5 (NEW) - Requirements: 375 Kills. Reward: 'Cherub Chaser' Title.

New “Angel Wings” Backpack Item

We’ve had a lot of positive feedback about the skateboard and radio backpacks we recently released, and you’ve been asking us to make more available on ARMAS. Our art team came up with plenty of cool ideas, and the first one released is Angel Wings for Valentine’s!

To get the detail required, one of our artists, Tim, used a piece of software called ZBrush to digitally sculpt the feathers. Digital Sculpting is exactly what it sounds like and can be compared to carving clay or marble, but it’s all done virtually using a PC & graphics tablet instead of a workbench & chisel.

The sculpted model was then exported to a design program called 3D Studio Max where it was tweaked and tested with our APB characters. The final stage meant adding the wings to our test version of APB and checking that they work with other customization items, running around with the item deployed, getting in and out of vehicles, and more.

The final customizable item will be available as part of the “Fallen Angel” pack on ARMAS. As a special bonus we’re adding in a customizable Halo and a “Fallen Angel” title.

In summary, the “Fallen Angel” Pack includes:

Angel Wings (customizable).
Halo (customizable).
“Fallen Angel” title.

Valentine’s Community Contests

In addition to all this new content, we’re going to be running a couple of Valentine’s-themed contests from February 1st to 14th (US Pacific time).

“Rev My Engine”

Keeping the Valentine’s Day theme in mind, players must trick out their ride by designing their own love machines!

  •  14 Entries will be chosen by a unanimous decision.
  •  Each winner will receive a Vehicle Selector - choice of one Account Lifetime Vehicle!
  •  Starts Feb. 1st and ends Feb. 14th

“Be My Valentine”

Love is in the air and it’s time to capture it! Players can design two matching or complementing outfits, one outfit per person.

Players may submit their outfits and characters for only ONE of the following categories:

1) Most Fugly
2) Most Sexy
3) Celebrity Look-A-Like
4) Fictional Icons
5) Star Crossed Lovers

  • Each character must take one in-game editor screenshot and submit it in one entry.
  • Each entry must also have at least one in-game group photo with their matching / complementing outfit worn.
  • Two characters max per couple.
  • One couple per category will be chosen by a unanimous vote. (10 player winners total)
  • Each winning couple (2 different users per couple) will win 300 G1C + a 90-day Valentine Weapon selector!
  • Starts Feb. 1st and ends Feb. 14th.