
Author Topic: The website is srs bsns  (Read 3046 times)


  • Admin
  • Posts: 1419
  • Reputation: +10/-0
The website is srs bsns
« on: Oct 10, 2011, 04:13 PM »
At some point, it was mentioned on the official boards that if you wanted to see that we were serious, you should check out the website. As it turns out, we are serious, and so is the website, but that probably isn't apparent at first glance. So I'm going to talk about what's going on behind the scenes and the decisions that led to the current webserver setup.

The goals for the webserver setup were to:
  • Scale well
  • Handle burst traffic without slowing to a crawl
  • Be reliable
  • Perform optimizations automatically
  • Provide the power to do whatever we feel like (hosting a full item database, etc.)
  • Provide high bandwidth for members to collaborate on projects with large chunks of data, like videos

The first thing I did was imagine the monster the server would need to be if everyone playing SW:TOR decided they wanted to visit the website a hundred times per day. This is intentional overkill, but it allowed me to scale back to a basic configuration that could very easily grow to that size if it is ever needed. The monster needs a real content delivery network, a massive and redundant relational database, a large cluster of webservers, squids, load balancers, and a bunch of ancillary servers to handle backups, media files, compression, image scaling, mail, etc. I also had to consider which technologies I deem mandatory for the site, which really boils down to just PHP.

Distilling the monster down to something that can be deployed for a reasonable cost led to the following decisions. There would be three domains. Two of the domains would handle static content and could evolve into the content delivery network. The remaining domain would handle the heavy lifting -- the PHP. There would be a database server and a file server. The file server would be accessible from the outside for uploading and downloading big files without going through the website. That made a minimum of five logical servers, and the next thing to consider was which technologies I wanted to use.

There were three choices for the primary webserver: Apache HTTP Server, nginx (pron. Engine-X), and lighttpd (pron. Lighty). I was never really happy with Lighty and the whole 'memory issue', so that left Apache and nginx. I don't like using FastCGI for PHP (which is another discussion altogether), so that left Apache for the heavy lifting. That's not a bad thing, as Apache is highly configurable and robust with a long history. Choosing Apache also gives access to mod_<whatever> for any issues that might crop up. The problem with Apache is a scaling issue--it is a resource hog when handling many concurrent requests, and during bursts of traffic it can get mired down quickly. A reverse proxy goes a long way toward lightening the load, so that was the next decision to make. Adding in the reverse proxy increases the number of logical servers to six.

The reverse proxies I considered were Squid, Varnish and nginx. I benchmarked all three in a test environment, but that was really for future reference--I already knew I was going to choose nginx for the basic config because the primary cache was going to be disk-based rather than memory-based for cost reasons. Varnish and/or Squid have their place in the monster, but for the baby, nginx was an easy choice. The benchmarks used (ApacheBench and Load Impact) showed that nginx outperformed Squid and Varnish in the basic configuration... Which was a bit of a surprise, but certainly a pleasant one (even if the benchmarks are completely synthetic).

The choice of nginx for the reverse proxy also made it the server of choice for the domains that were going to serve static content. lighttpd would also perform well for this task, but limiting the technologies in use keeps maintenance woes to a minimum.

The database server? MySQL. An easy choice for various reasons, especially its scalability and the fact that it's what I personally have the most experience with. It can also be switched over easily to Amazon RDS, which can meet the needs of the monster in a Multi-AZ deployment. And speaking of Amazon...

The choice for the file server is Amazon S3. This decision is all about cost, scalability and accessibility. An S3 bucket can be mounted as a local file system on almost any platform, making it easy for members to collaborate on videos. More significantly, it can be mounted on the webservers and used with Amazon CloudFront if I eventually want to use it as part of a CDN. It certainly isn't fast when used as a local file system, but it is fast enough for the intended usage, especially when fronted by nginx servers caching the content.

Next: Deciding what handles what, configuration and general optimization.
« Last Edit: Oct 13, 2011, 08:29 AM by Ratio »

Happiness is when you play a game so well that people call you a hacker.

There are two types of people in the world, those who can extrapolate from incomplete information.


Configuration and Optimization
« Reply #1 on: Oct 10, 2011, 04:15 PM »
First off, a logical diagram of the website... Because pictures make everything better:

There are a total of six logical servers. I didn't draw connection lines because it turned into a spiderweb in the back every time I tried; in this case, words are better. When a client connects to the website, all it sees is the front end, which is the nginx(rp) server. This is our traffic director, which serves content back to the connected client either from its own cache or by requesting the content from the appropriate server: apache(www), nginx(img), nginx(static) or S3. The three webservers (apache(www), nginx(img) and nginx(static)) are all connected to the fileserver (S3) as well. Only apache(www) is connected to the database server (MySQL), as it is the only webserver that needs to touch the database. S3 and MySQL don't talk to each other at all.
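Since words stood in for connection lines, here is a rough ASCII sketch of those same connections:

```
client ---> nginx(rp) ---+---> apache(www) ---> MySQL
                         +---> nginx(img)
                         +---> nginx(static)
                         +---> S3

(www, img and static also mount S3; S3 and MySQL never talk to each other)
```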


The server configuration is overkill for the amount of traffic that we can expect in the beginning. I just wanted to get that out of the way.

The nginx(rp) server has several duties beyond proxying: it caches content, compresses content, and sets expiry headers when they are missing. It can also act as an incomplete failover if any of the webservers decide it's time for a vacation.
  • Caching content: Content is cached from all three of the webservers (www, img and static), but its primary cache function is to cache content from www for non-logged-in visitors. It doesn't cache content (or serve cached content) for logged-in accounts or for any client connecting from a mobile device.
  • Compressing content: Content that reaches nginx(rp) uncompressed is compressed when it is appropriate. Appropriate currently means that the uncompressed item is over 4 KB and isn't something with native compression like an image file or a PDF. If someone is looking at the site with IE6 (why?!!), nothing is compressed.
  • Setting missing expiry headers: Anything that is logically cacheable by a browser that doesn't have an existing expiry header has one added with a value appropriate to the type of content. This is usually max expiry, though there is variance for certain items.
  • Pseudo failover: If the nginx(rp) tries to request something from a server that's dead, it will still serve its last cached copy -- even if the cached copy is expired.
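A minimal nginx sketch of those four duties might look like the following. The real config isn't in this post, so the paths, port, domain, cache zone name and cookie name below are all assumptions:

```nginx
# Hypothetical sketch of nginx(rp); every name, path and port here is assumed.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=rpcache:50m
                 max_size=2g inactive=7d;

server {
    listen 80;
    server_name example.com;                # placeholder domain

    # Compression duty: only items over 4 KB, never for old IE.
    gzip on;
    gzip_min_length 4096;
    gzip_types text/css application/x-javascript text/plain;
    gzip_disable "MSIE [1-6]\.";

    location / {
        proxy_pass http://127.0.0.1:8080;   # apache(www); port is a guess
        proxy_cache rpcache;
        proxy_cache_valid 200 10m;

        # Don't cache (or serve cached pages) for logged-in users;
        # "session" stands in for the real cookie name.
        proxy_cache_bypass $cookie_session;
        proxy_no_cache     $cookie_session;

        # Pseudo-failover: serve a stale cached copy if the backend is dead.
        proxy_cache_use_stale error timeout http_500 http_502 http_503 http_504;

        # Expiry duty: the blunt version; the real config varies by content type.
        expires max;
    }
}
```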

The img server serves natively compressed content. Currently this is mostly images, which is where the clever name comes from.

The static server serves both uncompressed and pre-compressed content. Currently this is mostly CSS and JavaScript.

The www server serves PHP requests and all of the attachments. The serving of attachments is going to be moved to nginx(rp) in the near future. This server uses mod_rpaf to get the appropriate information from nginx(rp) -- most importantly, the client's real IP address rather than the proxy's.
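For reference, a typical mod_rpaf stanza looks something like this (the module path and proxy IP are assumptions -- 127.0.0.1 assumes nginx(rp) lives on the same box, which the post doesn't actually say):

```apache
# Hypothetical mod_rpaf config for apache(www).
LoadModule rpaf_module modules/mod_rpaf-2.0.so
RPAFenable      On
RPAFsethostname On
RPAFproxy_ips   127.0.0.1
RPAFheader      X-Forwarded-For
```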

The S3 file server is mounted using fuse and s3fs.
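A sketch of what that mount looks like with s3fs. The bucket name and mount point below are placeholders, not the site's real ones, and the cache option is just one sensible choice:

```shell
#!/bin/sh
# Sketch: mounting an S3 bucket as a local filesystem with s3fs over FUSE.
# "ld50-files" and /mnt/s3 are hypothetical placeholders.
BUCKET=ld50-files
MNT=/mnt/s3

# Credentials go in ~/.passwd-s3fs as ACCESS_KEY:SECRET_KEY (chmod 600).
MOUNT_CMD="s3fs $BUCKET $MNT -o use_cache=/tmp/s3fs-cache"

if command -v s3fs >/dev/null 2>&1 && [ -d "$MNT" ]; then
    $MOUNT_CMD
else
    # Dry run: just show the command that would be used.
    echo "$MOUNT_CMD"
fi
```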


The single biggest performance gain so far is using nginx as a cache. The nginx(rp) in front of apache(www) can push out 3.14x(!) as much traffic as Apache by itself. More numbers: a 214% increase in requests per second handled, a 218% increase in bandwidth, and the time to serve a page cut by more than half. Max RAM usage with 100 concurrent connections: 60 MB.

There are a few scripts that crawl around optimizing content in the file systems that the webservers use. (How they work is beyond the scope of this already overly long post, but I can talk about them in detail if anyone is interested... As well as anything else, for that matter.)

The first script is an Apache Ant script that crawls around looking for CSS and JavaScript to compress and combine. It has a list of files that it should compress using YUI Compressor and, when it should, combines them into the main CSS and JS files that the site uses. Combining the files means fewer requests and therefore less overhead. It then gzips a copy so that nginx(rp) doesn't have to.
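The gzip step of that Ant script can be sketched in shell like this (the minify/combine step with YUI Compressor is omitted; this only shows the pre-compression pass that spares nginx(rp) from compressing on the fly):

```shell
#!/bin/sh
# Sketch: pre-gzip CSS/JS so the reverse proxy can serve .gz copies directly.
precompress() {
    find "$1" -type f \( -name '*.css' -o -name '*.js' \) | while read -r f; do
        # Keep the original; write a .gz sibling next to it.
        gzip -9 -c "$f" > "$f.gz"
    done
}

# Tiny demo on a scratch directory:
demo=$(mktemp -d)
printf 'body { color: red; }\n' > "$demo/style.css"
precompress "$demo"
ls "$demo"
```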

The second script crawls around looking for .png files and uses pngcrush to reduce their size.
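Roughly, that pass amounts to something like this (a sketch, not the actual script; the directory argument is hypothetical, and it skips gracefully where pngcrush isn't installed):

```shell
#!/bin/sh
# Sketch: crush every PNG under a directory in place.
crush_pngs() {
    find "$1" -type f -name '*.png' | while read -r f; do
        # Crush to a temp file, then swap it in over the original.
        pngcrush -rem alla -reduce "$f" "$f.tmp" && mv "$f.tmp" "$f"
    done
}

if command -v pngcrush >/dev/null 2>&1; then
    # Default to an empty scratch dir for the demo; pass a real dir in use.
    crush_pngs "${1:-$(mktemp -d)}"
else
    echo "pngcrush not found; nothing to do"
fi
rc=$?
```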

The third script is also an Ant script that crawls around and sets appropriate headers on anything on its list, including Cache-Control, Content-Type and Expires.

Cookie-free content: Only www sets and uses cookies, so img and static are cookie free. This saves a bit of bandwidth as cookies are not sent when requesting static content.

Parallel downloads: The three domains also allow parallel downloads... Web browsers limit the number of connections they will make to one host. Having three hosts means more connections are available for downloading, which decreases page load times.

Compressed content: Content is compressed where appropriate.

Keep-Alive: Keep-alive is enabled, with its settings tuned to the average number of concurrent users.

You can always go to GTmetrix and see the current state of optimization.

[more to be added... probably]
« Last Edit: Oct 13, 2011, 08:37 AM by Ratio »


« Reply #2 on: Oct 10, 2011, 04:25 PM »
[For what have you.]
« Last Edit: Oct 10, 2011, 05:39 PM by Ratio »


Re: The website is srs bsns
« Reply #3 on: Oct 12, 2011, 03:11 PM »
How to make an "empty" GIF (44 bytes).
For those times when you manage to compile nginx without the empty gif module. :-[

Code:
echo -e "\x47\x49\x46\x38\x39\x61\x01\x00\x01\x00\xf0\x01\x00\xff\xff\xff\x00\x00\x00\x21\xf9\x04\x01\x0a\x00\x00\x00\x2c\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02\x44\x01\x00\x3b" > empty.gif
Or with ImageMagick (43 bytes):
Code:
convert -size 1x1 xc:transparent empty.gif
« Last Edit: Oct 12, 2011, 03:16 PM by Ratio »

