Off of streetsy's Best of 2008.
Moving time! ;-)
In case anyone wants to know, I've set up a blog of my own:
This month we invite you to a talk by Max Horvath on Testing_SeleniumDSL. Attending is free!
The details:
Topic: Testing_SeleniumDSL (Max Horvath, StudiVZ)
When: December 3, 2008, 8:30 PM
Where: Z-Bar, Bergstr. 2, Berlin-Mitte (Google Maps)
Cost: free
(Apologies, I still don't have another blog. :-))
Just a brief follow-up to my earlier article.
A disclaimer I should have added to my last article: most of my pseudo-benchmarks are very subjective and also far too basic. For example, our server setup is pretty complex, and to provide a real benchmark we would have to take everything into account. By everything I mean CPU (cores), RAM, motherboard, HDD and so on, maybe even the throughput of the network card if it differs.
Having said this, here are a couple of follow-ups.
1) require/include(_once) and __autoload, or "Why is __autoload() 'better'?"
A lot of people asked me whether __autoload() isn't slower than a straight include, and of course they are correct.
They are correct because invoking __autoload() is extra overhead before anything else happens; inside __autoload() there is just an include anyway. When coding a simple application, the developer should be on top of the code and all its dependencies, but sometimes they (or we) are not. ;-)
To illustrate my point from the above, look at the following example:
Faster:
<?php
include 'Foobar.php';
$foobar = new Foobar();
echo $foobar->hello('World');
?>
Slower:
<?php
function __autoload($class) {
    include "{$class}.php";
}
$foobar = new Foobar();
echo $foobar->hello('World');
?>
Even faster (though not really maintainable):
<?php
class Foobar {
    public function __construct() {}
    public function hello($var) { return $var; }
}
$foobar = new Foobar();
echo $foobar->hello('World');
?>
Alas, it's not always that easy. Applications are often bigger than a single page, and we trade manually keeping track of our dependencies for calls to require/include_once or, better, __autoload(). :-) And because the Zend Framework is full of require_once calls, which (as I explained) I stripped, I chose what is "in this case" the less expensive route: __autoload().
The bottom line is that it's all about convenience.
One reason for not being on top of things is that maintaining dependencies inside a framework such as the Zend Framework is a nightmare. This is not about its 6 or so MB (mini release). It's also not about 20 MB; we can all agree that disk space is cheap!
It's just that even though not all the files are loaded when we use only certain components of the framework, that does not make it any easier for the developer to figure out which classes (and essentially files) are used. Not in a very straightforward way, at least.
We could use Xdebug to figure out what is loaded and where, but that requires us to walk through everything our application does and to repeat the process whenever we add features. Overall, it adds to whatever we are already working on and takes away the RAD benefits, which are the reasons we use a framework to begin with.
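If a full Xdebug walk-through is too much effort, PHP's built-in get_included_files() gives a cruder answer to "which files did this request load?". The sketch below assumes you call it at the end of a request (for example from a shutdown function); the helper name list_loaded_files() and the path-fragment filter are hypothetical and not part of Zend Framework.

```php
<?php
// Hypothetical helper: return the files included so far in this request,
// optionally only those whose path contains $pathFragment (e.g. '/Zend/').
function list_loaded_files($pathFragment = null)
{
    $files = get_included_files(); // built-in; also lists the main script
    if ($pathFragment === null) {
        return $files;
    }
    $matches = array();
    foreach ($files as $file) {
        if (strpos($file, $pathFragment) !== false) {
            $matches[] = $file;
        }
    }
    return $matches;
}

// Example: log the Zend files used by a request once it finishes.
// register_shutdown_function(function () {
//     error_log(implode(PHP_EOL, list_loaded_files('/Zend/')));
// });
```

Like the Xdebug approach, this only covers the code path a given request actually took, so it still has to be repeated per page.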
An alternative would be to break the framework up into components (as Zend advertises it) and to offer a PEAR-style installation. While this may not be convenient for the unzip-and-go type of people, I'd offer it as an alternative installation method for the more knowledgeable user.
2) Zend_Loader ERRATA
The loader in my last blog entry looked like this:
function __autoload($className) {
include_once str_replace('_', '/', $className) . '.php';
}
But since __autoload() is only invoked when the class is not defined yet, it makes no sense to use include_once, so let's use this instead (thanks to everyone who commented and pointed that out):
function __autoload($className) {
include str_replace('_', '/', $className) . '.php';
}
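To see why a plain include is enough, here is a small sketch that records how often the loader runs. It uses spl_autoload_register() rather than __autoload(), because __autoload() was deprecated in PHP 7.2 and removed in PHP 8.0; for this purpose the behaviour is the same. AutoloadLog and pear_style_autoload() are hypothetical names for illustration.

```php
<?php
// Records every class name the autoloader is asked to load (demo only).
class AutoloadLog
{
    public static $calls = array();
}

// PEAR-style mapping: Foo_Bar -> Foo/Bar.php, resolved via include_path.
function pear_style_autoload($className)
{
    AutoloadLog::$calls[] = $className;
    include str_replace('_', '/', $className) . '.php';
}

// PHP only calls registered loaders for classes that are not defined yet,
// so after the first `new Foo_Bar()` the loader is never asked again.
spl_autoload_register('pear_style_autoload');
```

Instantiating the same class twice leaves exactly one entry in AutoloadLog::$calls, which is why include_once adds nothing here.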
3) Caching database results
In my first article, I talked about avoiding queries and caching whatever is possible. People rave about caches, but all the awesomeness aside, caching also has major drawbacks, which are almost always left out or overlooked.
I'm sure many people know Cache_Lite, and have seen one or two examples - it's super nice and simple.
In a nutshell:
- set a TTL
- pull from the cache if it exists
- populate the cache if it doesn't
- Cache_Lite also takes care of deleting the cache when the TTL runs out
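The steps above fit in a few lines. This is not Cache_Lite's actual API; TinyFileCache and its methods are hypothetical names for a minimal file-based sketch of the same get-or-populate idea with a TTL.

```php
<?php
// Minimal file-based cache with a TTL (hypothetical class, not Cache_Lite).
class TinyFileCache
{
    private $dir;
    private $ttl;

    public function __construct($dir, $ttl)
    {
        $this->dir = rtrim($dir, '/');
        $this->ttl = (int) $ttl; // lifetime in seconds
    }

    private function path($key)
    {
        return $this->dir . '/cache_' . md5($key);
    }

    // Return the cached value, or null if it is missing or expired.
    public function get($key)
    {
        $file = $this->path($key);
        clearstatcache(true, $file); // don't trust PHP's stat cache for mtime
        if (!is_file($file)) {
            return null;
        }
        if (filemtime($file) + $this->ttl < time()) {
            unlink($file); // TTL ran out: delete the entry
            return null;
        }
        return unserialize(file_get_contents($file));
    }

    public function set($key, $value)
    {
        file_put_contents($this->path($key), serialize($value), LOCK_EX);
    }

    // Pull from the cache if it exists, populate it if it doesn't.
    public function remember($key, $compute)
    {
        $value = $this->get($key);
        if ($value === null) {
            $value = call_user_func($compute);
            $this->set($key, $value);
        }
        return $value;
    }
}
```

Because null doubles as "not cached", a computed null is recomputed on every call; a real implementation would need a separate miss marker.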
What none of those examples talk about is the issue we run into when a user adds something and the application does not invalidate the cache. For instance, if I decided to cache our users' data today, it would double or triple our support volume in an instant, simply because our customers don't know or care about the resources a database query consumes. What they care about is their data, and they want to see it now.
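One common answer to this is to invalidate on write: whenever the application changes a record, it drops the cached copy right away instead of waiting for the TTL. In the sketch below, two plain arrays stand in for the database and the cache backend (an assumption to keep it self-contained), and CachedUserStore is a hypothetical name.

```php
<?php
// Sketch of invalidate-on-write; arrays stand in for a real DB and cache.
class CachedUserStore
{
    private $db = array();    // stand-in for the users table
    private $cache = array(); // stand-in for Cache_Lite, memcached, APC, ...
    public $dbReads = 0;      // counts simulated queries

    public function find($id)
    {
        if (array_key_exists($id, $this->cache)) {
            return $this->cache[$id]; // cache hit, no query
        }
        $this->dbReads++;
        $row = isset($this->db[$id]) ? $this->db[$id] : null;
        $this->cache[$id] = $row;
        return $row;
    }

    public function save($id, array $row)
    {
        $this->db[$id] = $row;
        // Drop the stale entry so the next read shows the user's change
        // immediately instead of after the TTL expires.
        unset($this->cache[$id]);
    }
}
```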
In this case, Web 2.0 is not too helpful either: applications are online, available at all times, instant and very responsive, and the data is live. ;-)
So far so good -- no! No?
Well, we need to keep in mind that a file-based cache might not be exactly suitable for a high-traffic application anyway ("How slow is the disk?"). So we may need to set up a RAM disk (if you've got plenty of RAM) and put the cache there, or look for a possibly more robust solution such as memcached.
To back up my claim, and in an effort to balance my blog post, head over to Tillate (a pretty awesome name, if I may add). They just posted a blog entry going into detail on the obstacles you run into when you cache content, and on why you should consider outsourcing some of your caching to the client side.
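One common way to push caching to the client side is HTTP validation with an ETag: the browser sends back the tag it already has, and if the content hasn't changed, the server answers 304 Not Modified without a body. The sketch assumes an md5 of the rendered body as the tag and a short private max-age; conditional_response() is a hypothetical helper that returns its decision instead of sending headers, so it can be tested. It only compares a single tag and ignores the list and weak-tag forms of If-None-Match.

```php
<?php
// Decide between 200 and 304 based on the client's If-None-Match value.
function conditional_response($body, $ifNoneMatch, $maxAge = 60)
{
    $etag = '"' . md5($body) . '"';
    $headers = array(
        'ETag: ' . $etag,
        'Cache-Control: private, max-age=' . (int) $maxAge,
    );
    if ($ifNoneMatch !== null && trim($ifNoneMatch) === $etag) {
        // Client already has this version: headers only, no body.
        return array('status' => 304, 'headers' => $headers, 'body' => '');
    }
    return array('status' => 200, 'headers' => $headers, 'body' => $body);
}

// In a request handler (not run here):
// $inm = isset($_SERVER['HTTP_IF_NONE_MATCH']) ? $_SERVER['HTTP_IF_NONE_MATCH'] : null;
// $r = conditional_response($html, $inm);
// http_response_code($r['status']);
// foreach ($r['headers'] as $h) { header($h); }
// echo $r['body'];
```

Hashing the body saves transfer and client time, not the work of rendering the page; to skip rendering as well, the tag would have to come from something cheaper, such as a last-modified timestamp.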
Last but not least -- two things.
- To reiterate my disclaimer, make sure to actually put a meter on things so you can measure an improvement instead of guessing ("I think it's faster."), because guessing is the worst. For an example of what I mean, please check out Sander van de Graaf's blog.
- Keep in mind that the added complexity has to be accounted for, both during development (for example, replicating the setup for development and staging) and in maintenance (more services, more sorrow).
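A minimal way to put a meter on things without extra tools is to time the code path in question several times with microtime(true) and compare medians before and after a change. measure_ms() is a hypothetical helper; the median is used because single runs are noisy.

```php
<?php
// Hypothetical helper: run $callable $runs times and return the median
// wall-clock time in milliseconds, measured with microtime(true).
function measure_ms($callable, $runs = 20)
{
    $runs = max(1, (int) $runs);
    $samples = array();
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        call_user_func($callable);
        $samples[] = (microtime(true) - $start) * 1000.0;
    }
    sort($samples);
    $mid = (int) floor($runs / 2);
    if ($runs % 2 === 1) {
        return $samples[$mid]; // odd count: middle sample
    }
    return ($samples[$mid - 1] + $samples[$mid]) / 2; // even: mean of middle two
}

// Usage: measure the same code path before and after a change, e.g.
// printf("%.2f ms\n", measure_ms('render_front_page')); // hypothetical function
```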
4) Zend_Db
I'm not trying to pick a fight with Zend_Db, but I can't really avoid talking about it. Aside from general shortcomings in the implementation, my number one issue currently is that the Mysqli driver prepares all queries that are sent to the MySQL server.
Leaving aside the pseudo-security (I believe each value is quoted anyway before prepare is used), prepared statements are slow and also suck because MySQL's query cache currently cannot deal with them. Even though Bill Karwin pointed out that MySQL 5.1.x will eventually solve this problem, we are still stuck in the here and now.
The quick fix would be to short-circuit all queries and send them directly to (I think) exec(), which would avoid the internal prepared statement. But that amounts to getting rid of it altogether, which means throwing overboard all the convenience offered by insert(), update(), delete(), fetch*() and so on.
I have a patch that basically adds a fake Zend_Db_Adapter_Mysqli_Statement class (Did I get the name right?), which would mock mysqli_stmt, but this is not a solution that would go into the framework. I'm attempting to find a more general solution, but don't hold your breath.
5) Zend Framework
Zend Framework is both amazing and frightening at the same time. Given all the feedback, I should add that it's semi-open-source (open source backed by a company), so you can always report bugs, (sign a CLA to) supply patches and maybe even get access to commit code to the official repository.
With (of course) no offense meant, there are a few things that are currently severely lacking:
- Good coding guidelines vs. overengineering - I wonder who's gonna optimize all the weirdness for 2.0 that is apparently required by CS/design. While engineering in general is a good thing (TM), various people have already pointed out that some of the components (or helpers, validators, etc.) tend to be overengineered - for example, when you load a new class (and essentially file) to apply strtolower() on a variable.
- Code review - I feel like components are sometimes promoted to trunk before enough people have looked at them. For a great example of peer code review, let me pimp PEAR again: while we are sort of aggressive about our coding style (and also tend to drive people crazy, or at least provoke flamewars), all the feedback you get during the proposal phase is worth so much. You don't learn all that in school; open source gives it away for free.
- Even though the Zend Framework is advertised as a glue framework (you just use what you need), many components have hardcoded dependencies; an example is my favorite, Zend_Loader. Especially in the case of the Zend_Loader, there are two things I'd like to see: one is allowing the user to override the loader used in any component; two is replacing all the require_once calls with the loader, since by default it tends to be the less expensive operation anyway.
- Maintainers - some components seem to be rather unmaintained and it seems there is little/no communication that is visible for everyone on the outside. Not very open-sourcy. :-P
Also, various people who contributed to the framework initially have left Zend's payroll, and their work is sort of orphaned (or so it looks).
There just aren't enough people who have time to actively contribute to core components currently. And in a way, despite a company backing the code, the Zend Framework suffers from the same problems as any open source project.
Last but not least -- I don't hate the Zend Framework, or frameworks in general. ;-)
(Unlike some other people, whose blogs I like to read and find very entertaining.)
(Sorry, the talk is in German. But if you are in the area (Berlin, Germany) this week (2008/11/05), feel free to drop by. We can always arrange talks in English or translate. :-) Attending is free!)
In November, the Berlin PHP Usergroup invites you to a talk about CouchDB.
The details:
Topic: CouchDB (Jan Lehnardt)
When: 8:30 PM, November 5, 2008
Where: Z-Bar, Bergstr. 2, Berlin-Mitte (Google Maps)
Cost: free
RSVP: Facebook, Qype, mailing list
[ EDIT: I've posted a follow-up to this article (part dos)! ]
Let me first start off by saying that the Zend Framework has been very good to us.
It enabled us to build a kick-ass application in a relatively short amount of time. On top of that, we followed the conventions from Zend and PEAR and essentially have a very maintainable piece of software which I don't hate looking at every day (which, as one can imagine, is a huge plus).
The other day our servers were overwhelmed with the rising traffic and I started profiling my application through Xdebug. Initially I tried to use Zend Studio and the Zend_Debugger but Zend doesn't like my (awesome) operating system (FreeBSD) and only provides Linux and Windows extensions. Xdebug, while being free and awesome in general, doesn't know this prejudice. :-)
On this project we currently average 100,000 visitors per day; our peak is Sunday night, when we get a ton more traffic than usual. We run the latest PHP (5.2.6 at this time), etc. The software comes from FreeBSD ports; there are no magic secret patches. I'm picky about the modules I compile and load, but the list is far from optimized.
In our defense, we just relaunched over the summer, and we are a team of four in total, of whom only two write code. Since we started off slowly with 60,000-80,000 visitors per day over the summer, we never really had a chance or a need to optimize, and we tried to avoid all premature optimization.
We currently use 50-some Zend classes. I wish I could provide a better number, but as you may know, the Zend Framework is only bundled as a whole, and figuring out which classes are all in the mix is tricky. So the 50 is an estimate based mostly on grepping through our own class code.
On the servers we run Apache 1.3. We currently have four web servers in total: two older ones (dual core, 6 GB RAM, slow 7.5k rpm disks) and two newer models (eight cores, 6 GB RAM, faster 15k rpm disks). The backend consists of two powerful workhorses with eight cores, more RAM than the frontends and a lot of disk (12k rpm each).
Before we started our quest for performance, one of our older servers was able to handle ten (10) requests/second at peak times; now we are at 42 requests/second (give or take a few). As for page loading time, a few optimizations took us from 340 ms to 76 ms in no time (all figures according to Xdebug). So I feel like we are right on track to Getting rich with PHP. (Where's my Lexus at? :-))
We benchmarked using Apache Bench (at a moderate ab -n 1000 -c 100 http://url) and Siege, which are both really awesome tools and provide you with an instant DoS attack on your servers. I might add that you are better off running those tools from "localhost" rather than remotely, as you might otherwise trigger your provider's IDS/snort/DoS protection.
Here are a few things that helped us. Suggestions are in no particular order, and I should add that whatever applies to my situation doesn't have to work for you. Also, my numbers could be off; if you have suggestions on how to improve, please comment or drop me a line.
1) APC
It cannot be stressed enough: please run APC. Make sure to adjust the default settings, and also check apc.slam_defense, apc.write_lock and apc.stat.
We had APC before, but I felt like I needed to mention it on the list of things.
Also, apc_fetch, apc_store etc. are great ways to add little caches throughout your application.
And they require almost zero time to implement. I suggest using apc_fetch/apc_store directly rather than wrapping the Zend_Cache layer around them, which (IMHO) adds little value and just puts more class code around the obvious.
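A sketch of such a little cache, for example around a status message that would otherwise be queried on every request, calling apc_fetch()/apc_store() directly. cached_call() is a hypothetical helper; the fallback to a per-request static array when the APC functions are missing (for example with APCu, which names them apcu_fetch()/apcu_store()) is an assumption added so the snippet runs anywhere.

```php
<?php
// Hypothetical helper: cache the result of $compute under $key for $ttl
// seconds in APC's user cache; fall back to a per-request array otherwise.
function cached_call($key, $ttl, $compute)
{
    static $local = array(); // fallback store, lives for this request only

    if (function_exists('apc_fetch') && function_exists('apc_store')) {
        $success = false;
        $value = apc_fetch($key, $success); // $success is set by reference
        if ($success) {
            return $value;
        }
        $value = call_user_func($compute);
        apc_store($key, $value, $ttl);
        return $value;
    }

    if (!array_key_exists($key, $local)) {
        $local[$key] = call_user_func($compute);
    }
    return $local[$key];
}
```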
2) Adjust PHP's realpath_cache_size and realpath_cache_ttl settings
This helps, somewhat.
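Whether the cache is large enough can be checked at runtime. Since PHP 5.3.2 (so newer than the 5.2.6 mentioned above), realpath_cache_size() and realpath_cache_get() report how much of the cache is actually in use. realpath_cache_report() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: compare the configured realpath cache with its usage.
function realpath_cache_report()
{
    return array(
        'configured_size' => ini_get('realpath_cache_size'),      // php.ini value, e.g. "16K"
        'configured_ttl'  => (int) ini_get('realpath_cache_ttl'), // seconds
        'bytes_used'      => realpath_cache_size(),               // current usage
        'entries'         => count(realpath_cache_get()),         // cached paths
    );
}

print_r(realpath_cache_report());
```

If bytes_used sits close to the configured size on a busy server, raising realpath_cache_size in php.ini is the obvious next step.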
3) Get rid of require_once, use __autoload (and the Zend_Loader)
This might be a hassle during development, because require_once evaluates each include and lets you know if it finds a parse error, and also where.
With include_once (which Zend_Loader essentially uses), it's a bit tricky at times. A good idea here would be a phing task (or some other script) which strips out or replaces require_once when you deploy your application to production.
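Such a deploy-time step could look roughly like this in PHP. It mirrors the shell one-liner in this section: comment out require_once and skip anything with Loader or .svn in its path. Both function names are hypothetical, and only require_once statements at the start of a line are touched, so treat it as a sketch to run on a deployment copy, not on your working tree.

```php
<?php
// Comment out require_once statements that start a line (after optional
// indentation), like the perl one-liner's s/require_once/#require_once/.
function strip_require_once($source)
{
    return preg_replace('/^([ \t]*)require_once\b/m', '$1#require_once', $source);
}

// Apply it to every .php file below $dir, skipping paths that contain
// "Loader" or ".svn" (the grep -v filters in the one-liner).
function strip_require_once_in_tree($dir)
{
    $files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($dir));
    foreach ($files as $file) {
        $path = $file->getPathname();
        if (substr($path, -4) !== '.php'
            || strpos($path, 'Loader') !== false
            || strpos($path, '.svn') !== false) {
            continue;
        }
        file_put_contents($path, strip_require_once(file_get_contents($path)));
    }
}
```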
Removing require_once in favour of __autoload was one of the biggest performance improvements in my entire application: I shaved off roughly 220 milliseconds by removing 15 or so calls to require_once in my bootstrap.php file. And that's with APC enabled and a decent-sized realpath cache (and TTL).
Beyond weird coding conventions (I shall bitch about those in another blog post), require_once is also the number one performance killer in the entire Zend_* code base. The before/after is amazing: without any of the other enhancements from this list, just by stripping require_once out of our Zend Framework "install", we went from 9-10 requests/second to 27 requests/second.
Use the following shell script to strip them:
grep -rl require_once . | grep -v svn | grep -v Loader | xargs perl -pi~ -e 's/require_once/#require_once/'
3) Zend_Loader
I know, I just recommended using the Zend_Loader, but with no offense meant, the Zend_Loader is not so great when it comes to general performance. Obviously I did not write it, and really, no offense meant, but it does some really weird stuff internally whose use case I am not sure about. But I am sure there is one. ;-)
In order to preserve the API, I extended Zend_Loader and started overriding methods such as Zend_Loader::_securityCheck(), which runs a regular expression on the name of each file you feed to __autoload/Zend_Loader.
On top of that, I switched to using the Zend_Loader only for models and controllers, but not for Zend_ and Company_ classes. Since Zend (and we) essentially follow the great PEAR coding standard with regard to one class per file and a very explicit naming scheme, all you have to do in your __autoload is the following:
function __autoload($className) {
include_once $className = str_replace('_', '/', $className) . '.php';
}
That is the bare minimum; our loader looks slightly more complicated. But I haven't stopped there, and we are still in the process of "dumbing" it down even further. So far it has saved us between five and 15 ms per page.
4) Cache DB results, avoid queries!
Those tricky, tricky DB queries.
Even though our DB backends are mostly idle even when we get hammered with traffic, there are a few things to keep in mind.
One of them is that DB queries are really expensive. And by queries I don't mean the "SELECT * FROM foo" part, but rather opening a connection to another server, sending the query, receiving the result and so on. By caching just one of those, we gained roughly another 20 ms on the front page. And it's not a very complex query either.
I remember looking puzzled when, ahem... I was presented with the code that pulls a status message on each request to the homepage. But I had already forgotten about it and only noticed it again when it popped up in Xdebug with a notable amount of milliseconds.
5) Zend_Db_Table
Zend_Db_Table is very easy to use; in fact, most of our models wrap around a couple of tables, which is why we have a bunch of them. What I did not realize (but now do, thanks to JamesG@#zftalk) is that the metadata the class uses to provide all those nifty interfaces is generated on each request. That's a DESCRIBE TABLE in the background, which is pure overhead.
Zend_Db_Table_Abstract::setDefaultMetadataCache() to the rescue.
5) Apache
5a)
I sometimes hate Apache, but I also can't live without it.
Over the past years, I have tried all sorts of things in the webserver market - Lighttpd and nginx with php-cgi (fastcgi) seem to be no fun. A commercial solution such as Resin or Zeus has never been an option either.
I've always come back to Apache (1.3) for the simple fact that Apache and PHP are so tightly integrated that nothing will ever go wrong.
Remember that guy Nik who claimed that Apache/PHP sometimes fail and deliver the source code to the browser (because Facebook obviously failed to configure Apache)? Well, that doesn't happen, ever. The only problem with Apache is that the communication between Apache and the client (browser) is a bitch.
Nginx to the rescue! Fast install, easy to configure (don't let the Russian FAQ scare you, nginx.conf-dist will teach you all you need!): just bind your Apache to localhost:8080, let nginx proxy all requests to it, and your Apache processes move from "lockf" status to "run" and "accept".
Whenever Apache receives a request from a slow client, it has to wait until that client is done reading the entire response. While waiting, your 30 MB Apache process sits there unable to do anything else. With nginx in the mix, Apache sends the response to nginx as fast as it can and so has more time to take care of what it's supposed to do for you: PHP.
Judging from my poor benchmarks, nginx increases the number of requests by a factor of six or seven (6 or 7). It's amazing, and I never expected it to have such a great impact. It also doesn't eat up resources, so beware of the Russians! :-)
Taking all "optimizations" into account, Apache 1.3 proxied by nginx can now handle over 3000 requests/second (ab -n 10000 -c 1000 http://url).
5b)
Then there are the obvious tweaks: check out your default Apache install and unload all the modules and extensions you never use anyway.
For example, we never have any of those HTTP authentication boxes, so why would we need the *_auth_* modules? We don't use UserDir either, so why load mod_userdir? Our Apache does not log, so why load mod_log_config? Or my favorite: mod_status.
Make sure mod_status is really disabled, because otherwise that's one very, very expensive operation you've got right there, on each request.
A good idea is to check top, unload, and look again:
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
42601 root 1 96 0 7092K 2324K CPU1 1 0:00 1.00% top
36242 www 1 20 0 114M 33304K lockf 1 0:22 0.29% httpd
38251 www 1 20 0 114M 32184K lockf 0 0:08 0.29% httpd
42579 www 1 20 0 114M 28016K lockf 1 0:02 0.29% httpd
37975 www 1 20 0 115M 34688K lockf 1 0:18 0.24% httpd
36344 www 1 20 0 115M 34036K lockf 1 0:18 0.24% httpd
Keep in mind (thanks, Jan) that the SIZE column is not the real size. It's more like the theoretical size, including whatever Apache could use if it had to; you want to look at RES (resident) instead, because that's what is actually in memory right now.
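To get a number out of top instead of eyeballing it, the RES column can be summed per command. The sketch assumes FreeBSD top's column layout exactly as shown above (12 fields, RES in the 7th, COMMAND last) and K/M/G suffixes; resident_kb_for() and to_kilobytes() are hypothetical helpers.

```php
<?php
// Convert a top size value such as "33304K" or "114M" to kilobytes.
function to_kilobytes($value)
{
    $units = array('K' => 1, 'M' => 1024, 'G' => 1024 * 1024);
    $suffix = strtoupper(substr($value, -1));
    if (isset($units[$suffix])) {
        return (int) substr($value, 0, -1) * $units[$suffix];
    }
    return (int) round((int) $value / 1024); // no suffix: assume bytes
}

// Sum the RES column of all processes whose COMMAND equals $command.
function resident_kb_for($topOutput, $command)
{
    $total = 0;
    foreach (preg_split('/\R/', trim($topOutput)) as $line) {
        $fields = preg_split('/\s+/', trim($line));
        if (count($fields) < 12 || !ctype_digit($fields[0])) {
            continue; // header line or something we don't understand
        }
        if (end($fields) === $command) {
            $total += to_kilobytes($fields[6]); // RES, not SIZE
        }
    }
    return $total;
}
```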
Another smart move is to put all the rules from .htaccess into your server configuration, because otherwise Apache searches for various (!) .htaccess files on each request and tries to evaluate the rules you have in there.
Imagine this request:
/htdocs/foo/bar/index.html
For this request, Apache will look for .htaccess in the following directories:
/htdocs
/htdocs/foo
/htdocs/foo/bar
Turn it off instead (AllowOverride None) and move on, because when you move all your directives into httpd.conf (or similar), they get evaluated only once, when the server process starts up.
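The lookups in the example follow a simple pattern: one .htaccess check per directory level between the document root and the requested file. htaccess_lookups() is a hypothetical helper that lists them for a given path; it starts at the document root as in the example, whereas real Apache behaviour depends on which <Directory> sections allow overrides.

```php
<?php
// List the .htaccess files checked for $requestPath below $docRoot.
function htaccess_lookups($docRoot, $requestPath)
{
    $current = rtrim($docRoot, '/');
    $lookups = array($current . '/.htaccess');
    $parts = explode('/', trim($requestPath, '/'));
    array_pop($parts); // the last part is the requested file itself
    foreach ($parts as $part) {
        if ($part === '') {
            continue; // tolerate doubled slashes
        }
        $current .= '/' . $part;
        $lookups[] = $current . '/.htaccess';
    }
    return $lookups;
}
```

Every entry is a filesystem check on every request, which is exactly the work AllowOverride None removes.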
From a deployment perspective it's nicer to have .htaccess because all you need to do is re-deploy one file vs. editing a server config and restarting a server, but this really pays off. With certain APC settings, you will need to restart the server anyway, also, "No pain, no gain!".
(Summary: The October, November and December meetings of the Berlin PHP Usergroup are all planned. If you are in the city, pay us a visit! The Google Maps link and dates are below; the meetings start at 8:30 PM.)
The next (and last) meetings of the Berlin PHP Usergroup in 2008 are planned. We would be happy to see attendees and new faces.
October 1, 2008
Topic: Web development with the Zend Framework (Thomas Lohner, SysEleven GmbH)
When: 8:30 PM
Where: Z-Bar, Bergstr. 2, Berlin-Mitte (Google Maps)
RSVP: Facebook, Qype
November 5, 2008
Topic: CouchDB (Jan Lehnardt)
When: 8:30 PM
Where: Z-Bar, Bergstr. 2, Berlin-Mitte (Google Maps)
December 3, 2008
Topic: Testing_SeleniumDSL (Max Horvath, StudiVZ)
When: 8:30 PM
Where: Z-Bar, Bergstr. 2, Berlin-Mitte (Google Maps)
Mailing list: https://mail.einsnull.com/mailman/listinfo/bephpug (low traffic)
Wiki/Website: http://bephpug.de/
(Disclaimer: I'm sorry, this is sort of lame, but I had to blog about it anyway. Because I need to write stuff down in order to remember it. Because I've had to look this up at least 10 times now. And because I always need to debug/search for the same info (which brings me back to my first 'because'). And because I still don't understand why OpenX/Ads doesn't have a simple user table. :-))
OpenAds 2.3.x-beta, 2.4
Table: oa_preference
Columns: admin, admin_pw, ...
Depending on your install, the table name might have another prefix than "oa".
The password is a simple md5 hash, which means you don't need anything funky to replace it in the database.
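Since the password is a plain md5 hash, resetting it is a single UPDATE. The sketch builds that statement with placeholders from the table and columns listed above; openads_password_reset() is a hypothetical helper, the oa_ default mirrors the prefix mentioned above, and identifying the row by the admin column is an assumption, so check your schema first.

```php
<?php
// Build "UPDATE <prefix>preference SET admin_pw = md5(...)" as a prepared
// statement plus parameters (hypothetical helper).
function openads_password_reset($newPassword, $adminName, $prefix = 'oa_')
{
    if (!preg_match('/^[A-Za-z0-9_]*$/', $prefix)) {
        throw new InvalidArgumentException('Unexpected table prefix');
    }
    $sql = 'UPDATE ' . $prefix . 'preference SET admin_pw = ? WHERE admin = ?';
    return array($sql, array(md5($newPassword), $adminName));
}

// Usage with PDO (connection details are placeholders):
// list($sql, $params) = openads_password_reset('s3cret', 'admin');
// $pdo->prepare($sql)->execute($params);
```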
Walked the bridge a bunch of times. Had to leave my summer sublet last week. But I'll be back sooner or later.
More bridge pictures are in my NYC BRIDGES set on Flickr.

