DPS Computing Uncovers Solution to OS X ‘Superbug’
Following many hours of investigation into the OS X ‘superbug’, which causes repeated kernel panics with no apparent common cause, DPS Computing has uncovered a solution that, in our testing, corrected this critical bug in the OS X operating system, one which Apple have been attempting to deal with for the past 18 months.
Now firstly, there’s a point that we need to make. It’s doubtful that this is the only solution, or necessarily the best. However, our tests suggest that it should work in a lot of circumstances, if not the majority.
As some of you will remember, we reported two days ago on the plight of MacBook Pro users experiencing persistent kernel panics out of the blue. Following in-depth research into this issue, we discovered that for the first six months Apple denied that the problem was caused by Apple hardware or software, instead indicating that it must be down to third party software and hardware that users were adding to their machines. Around 12 months ago, Apple appear to have finally conceded that the aptly named ‘Black Screen of Death’ (silent kernel panics) was due to a low level hardware and OS X bug.
It was initially indicated that a fix for the bug would be available in the 10.7.2 update to the OS X operating system. Users eagerly anticipated this update to free them from this catastrophic ailment of their once mighty machines. However, disappointment was to come. 10.7.2 was released, everybody updated and… nothing changed.
Many users vented further frustration on Apple’s community support forums, and Apple made another announcement there: the problem was likely being caused by a firmware issue, and a firmware update would follow in the near future. This time, firmware update 2.6 was supposed to be the saviour of the MacBook Pros all over the country experiencing this weird and disabling bug in Apple’s flagship operating system. But… you guessed it. Firmware update 2.6 came, and it went. And still, things were no better.
As mentioned in our previous article, one solution that seemed to work, albeit temporarily, was the freeware program gfxCardStatus, used to stop the dynamic switching of the graphics card. The main issue with this fix was that, to have a chance of temporarily fixing the issue, you had to restrict the Mac’s capabilities to using only the integrated Intel graphics card, which is the less powerful of the two graphics cards shipped with the MacBook Pro (the other being the discrete NVIDIA card). Many users were, however, happy to take a performance hit as they were absolutely desperate to stop the excessive kernel panics, which rendered their £2,500 Macs little more use than a £2 paperweight.
Of the many temporary solutions shared in Apple’s community support forums, this is without doubt the one with the highest success rate.
We tested out the gfxCardStatus fix on one of our machines and, lo and behold, it worked. OK, so it wasn’t perfect, but we knew that we’d all be able to wait a little while for Apple to sort the problem out.
However, when we continued testing this solution over the following two days, a flaw appeared which would render it useless, at least in some cases. It is not clear whether what we experienced would happen in all other cases, but it is a safe bet that it is not an isolated phenomenon.
After 24 hours of uptime with the gfxCardStatus solution implemented, we rebooted the MacBook Pro. OK, so nothing drastic about that. The Mac booted up fine and we logged in as per usual. gfxCardStatus had reverted to dynamic switching, so we changed it back to the ‘Integrated only’ option, as it had been set prior to the reboot. And then the display went ‘crazy’. Remember the slider puzzles you used to play with as a kid? Well, that’s exactly what OS X did with our screen. It was cut up into tiny pieces, mixed up, and put back together. And as if that wasn’t enough, the OS decided to throw in some ‘interference’ for good measure, the kind that you used to see on analogue TV when you weren’t quite tuned into the channel correctly.
So, a couple of reboots and lots of tinkering later, we finally managed to get back on to the ‘Discrete only’ option using the NVIDIA card, and the display returned to normal. Great, except we were back where we started, only now the problems were much worse. The integrated graphics card played havoc with the display, rendering it useless. And the discrete graphics card worked perfectly for 5 minutes at a time before promptly causing a kernel panic and the Black Screen of Death (similar to the Blue Screen of Death in Windows, except that this one doesn’t have any text!).
Hmmm, so we were a bit unhappy at this stage, as you can imagine. Rather than settling for this temporary solution, which although good was by no means good enough for any machine that would be used in a production environment, we decided to set about finding a more permanent solution, seeing as Apple are showing few signs that they are any closer to solving the problem than they were 18 months ago.
One recommendation from Apple was to run the AHT – Apple Hardware Test. This would indeed be useful as it would allow us to identify whether the panics are being caused by hardware or software – and in this case, software referring to OS X. Great, we started to follow the recommendation. So, firstly we are informed that we need to shut down (and it must be a shut down) followed by a power up (not, I repeat not a reboot!). Before the Apple logo appears we were told to hold down D. This would start the AHT for us.
So, I followed these instructions. While waiting for the magic to happen, I did question what holding the ‘D’ key would do. Normally there is another key used to create an interrupt signal or similar to tell the OS we don’t want a normal boot. So I was sat there holding the ‘D’ key with one finger while using my other hand to read more on the issue on an iPad.
5 minutes passed and we’d gone from a black screen to an off-white screen. I was still holding the ‘D’ key. And I was still reading about the problem on the iPad. A couple more minutes passed, by which stage my finger had turned a shade of bright red and my arm was crying out for a rest.
Another couple of minutes, and we’d got a login screen. The same old login screen as normal. Not the special 16-bit lookalike computer icon that was indicated in the instructions. So we tried a couple more times, changing how and when the key was pressed each time. Same result.
Convinced that this advice seemed a little bit woeful, we tried Cmd-D upon boot. Same effect, nothing. We do a little more (soul) searching on the iPad and I discover an excellent piece of information, which works for any users of Lion (10.7.x). Excellent, so this isn’t the AHT test via your own hard drive or DVD, this time you hold Option (Alt) + D which boots the online AHT from Apple.
We follow through the same steps using the new key combination and ta-da!! It starts to load the AHT. Just before I can crack a smile…… we gather around the MacBook and inspect the stop error that has just been displayed saying ‘AHT cannot run on this system’.
Great, normal reboot is initiated and I plan to use my 5 minutes of uptime to read a bit more advice from Apple. The reasons for this error are a) out of date version of OS X – i.e. you need to upgrade to 10.7.4 and b) out of date firmware – i.e. you need to upgrade to 2.6. The only problem with this advice and reasoning for the error is that the machine was running both OS X 10.7.4 and the 2.6 firmware update. Two pieces of advice from Apple, and two dead ends.
So with a bit more digging on AHT we discover its convoluted and apparently unnecessary complexity. AHT has been included in the past few major versions of OS X. There is also a version on all new MacBooks over the past few years. However, if at any time during your ownership of your MacBook Pro you have upgraded to a new major version of OS X, then AHT won’t be there any more. But if you’ve got your copy of OS X on DVDs, AHT will be present on Disc 1, that is unless you have Snow Leopard, in which case it’ll be on Disc 2.
Keeping up so far? We’re not done yet. If, say, you upgraded online (via the App Store) to Lion and you’re using your original Snow Leopard discs, it’ll probably still work, but some Snow Leopard files will be copied across during the AHT process, which is completely unnecessary and may make Lion go wonky. And finally, the logical solution to that problem would be to use the AHT included on the Lion disc. Well, no. As you can see from this entire situation, absolutely nothing is following logic. If you have the Lion DVD, great, but you don’t have AHT on it. And if you downloaded it from the App Store, you definitely don’t have AHT on it. One suggested solution, to ‘burn the image to a disc’, was evidently never going to work, as the App Store version of OS X isn’t provided as an image, it’s an application.
Next, the reason why Lion doesn’t include AHT is that from Lion onwards it has all changed to exclusively the online AHT. So, if you haven’t got an Internet connection, you would probably start crying at this point. But don’t worry, many more of us will be crying along with you, as we then discover the revelation that the online AHT only works if your MacBook Pro originally shipped with Lion. Upgrades don’t count.
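The AHT maze above is easier to follow as a small decision function. The Python below is only a sketch of the rules as we understood them from our reading; the function name, its parameters and the wording of the returned strings are our own invention for illustration and are not taken from any Apple documentation.

```python
def aht_options(shipped_os, current_os, has_original_discs, has_internet):
    """List the ways of starting AHT, following the rules described above.

    Hypothetical helper: OS names are plain strings such as "Snow Leopard"
    or "Lion". The rules encode this article's findings, not an Apple spec.
    """
    options = []
    if shipped_os == "Lion":
        # From Lion onwards AHT is online only, and only for Macs that
        # originally shipped with Lion (upgrades don't count).
        if has_internet:
            options.append("Internet AHT: hold Option (Alt) + D at power-on")
    else:
        # The built-in copy disappears after a major OS X upgrade.
        if current_os == shipped_os:
            options.append("Built-in AHT: hold D at power-on")
        # Original discs carry AHT on Disc 1, or Disc 2 for Snow Leopard.
        # After an upgrade this may copy old files across, so use with care.
        if has_original_discs:
            disc = "Disc 2" if shipped_os == "Snow Leopard" else "Disc 1"
            options.append(f"Disc AHT: boot from the original {disc} holding D")
    return options
```

For a machine like ours, which shipped with Snow Leopard, was upgraded to Lion and had no discs to hand, `aht_options("Snow Leopard", "Lion", False, True)` returns an empty list, which matches the dead end we hit.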
So after a couple of hours of a fruitless AHT search we give up on that. It might not even necessarily give us any answers, so we move on.
Now we start investigating tools…… no, not that kind of tools (although you could be forgiven for thinking that with the runaround Apple is giving the community), software tools.
As experienced Mac users will know, there isn’t the plethora of free maintenance and utility software that is available for Windows. The apparent reasoning behind this is that Macs don’t break, well, at least not until now. There are a few paid-for tools, some of which tease you with a ‘free download’, perform a scan, declare your MacBook a write-off and offer to fix all your problems in 10 minutes with the click of a button, after you pay them £50-£75.
So there’s ToolKit, ToolTip, Tool… something anyway. I can’t quite remember the exact name off the top of my head now, but it’s the $99 Pro version we were looking at (the cheaper sister product, the Deluxe version, is given to AppleCare customers free of charge, apparently because it’s ‘that good’). One of the Pro version’s selling points is that it goes ‘much further than Apple’s Hardware Test’. Great, no need to worry about the AHT saga now! But the price tag… well, we’ll keep this in reserve in case we get even more desperate than we already are.
Free system utilities, well, there aren’t many. The only one that we found to be of any potential use in this situation was OnyX, which has many positive reviews and endorsements across the Internet.
In my complete and utter desperation at this point, I concede that I’ll just have to go through the 5 minute kernel panics for the next 25 years (or at least until we replace the affected MacBook Pro) and decide the system could do with a bit of a general major cleanup anyway. After all, what more harm can it do?
I thought I’d make that stand out for the non techie users who just want it fixed and don’t care why, how or the amazing adventure we embarked upon to get to this point.
So, reboot, hold down the shift key on the keyboard, and we get it into ‘Safe Boot’ (similar to Windows Safe Mode). And yes, most of your things will be disabled, including all non Apple start up items. But on the plus side, the system runs a lot faster.
So, we then start OnyX. You’ll be greeted by a dialog asking you to check the S.M.A.R.T. status, which it highly recommends. And so do we. It should only take a few seconds, and then we’re onto the next stage, assuming there are no problems. If you do encounter problems at this point, follow the repair instructions given by OnyX.
All being well and good, you’ll be greeted by a second dialog box. This time it’s to verify the integrity of the start up volume. Again, this is highly recommended by OnyX and by us. This is the same process that ‘Disk Utility’ follows to check the start up volume integrity. If any errors are shown, it’s best to reboot into the recovery area, run Disk Utility and repair the start up volume and permissions. Not doing so when errors are shown, and then using the tools in OnyX, can, for want of a better term, ‘brick’ your system, and believe me, that is not good. Assuming everything is fine, we can then carry on.
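If you’d like a second opinion on the S.M.A.R.T. status outside of OnyX, the same information can be read from the command line. The Python below is a rough sketch, not something we ran as part of our testing: it assumes that `diskutil info` prints a line beginning ‘SMART Status:’ for the disk, and that the internal drive is `disk0`. Both are assumptions, and the function names are our own.

```python
import subprocess


def parse_smart_status(diskutil_info_text):
    """Return the value of the 'SMART Status' line, or None if it is absent."""
    for line in diskutil_info_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "SMART Status":
            return value.strip()
    return None


def smart_status(disk="disk0"):
    """Run `diskutil info <disk>` (OS X only) and parse its output.

    The default disk identifier is an assumption; check yours first.
    """
    result = subprocess.run(
        ["diskutil", "info", disk],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_smart_status(result.stdout)
```

Anything other than a ‘Verified’ style result would be a reason to stop and look at the drive before cleaning anything.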
We’re then greeted by an Administration authorisation dialog box. This allows us to give OnyX permission to do its magic. Enter your username and password.
Then we arrive at the main menu:
Click on the cleaning option, at which point you should be presented with a screen similar to the following:
On this first tab, the system tab (shown above) ensure that all the options are ticked in this ‘Delete the cache’ section. By default some are left unticked. Tick them all. Don’t worry, a cache is just temporary files that can speed up common tasks and applications after you’ve used them a few times. Their removal will not harm your system. Click execute and wait for the function to complete.
Then move onto the User tab:
Select everything again. As above, nothing bad is going to happen (let’s be honest, things can’t really get any worse, can they?). Click execute and wait for the function to complete.
Move onto the Internet tab:
Now some people are attached to their cookies, browser history etc. If you are, then don’t worry, you don’t have to do anything on this tab. But equally, you could also take this opportunity to sort out the thousands of temporary files for your different browsers, a good many of which won’t be used any more. Either way, if you decided to delete them, your system is going to be fine. Click execute and wait for the function to complete or move onto the next tab.
Move onto the Fonts tab:
Again, tick all the options and delete all the font caches. Ignore the doomsday warning about Apps taking unusually long to load the first time after clearing all these caches. Yes, Apps will take a bit longer to load the first few times… however, if we don’t follow these steps, the chances of us ever using any of our Apps productively again are minimal. Click execute and wait for the function to complete.
Close OnyX and empty the trash. Don’t worry about the number of files that are getting junked now. We had over 100,000, which is normal if you’ve been using your MacBook a long time and / or you haven’t cleared your caches recently (or ever).
Reboot out of ‘Safe Boot’.
The boot and log on will take a little longer than usual, as will the loading of your different apps for the first few times. But… the good old persistent kernel panics should now be resolved. No need to set the gfxCardStatus settings; feel free to have dynamic switching on. You shouldn’t have any more problems, and this setting does give the best balance of performance and power consumption for your system.
Basically, corruption can occur in caches (especially system and kernel caches, although it can be others), which can cause persistent kernel panics. Cleaning your caches every once in a while is a good idea to maintain performance anyway. Delete the cache, and you delete the corruption. Our test machine has stopped having 5 minute kernel panics and, touch wood, there haven’t been any since.
This problem seems to particularly affect mid 2010 MacBook Pro 15″ models (manufactured between April 2010 and February 2011) running OS X Lion (10.7.x). It isn’t clear why this very specific model (MacBookPro6,2) would be affected; maybe it is just coincidence.
Either way, this fix should work if you are experiencing the same problem, regardless of model.
The only downside, which is very minor and temporary, is that with the caches cleared, the first boot and the first few launches of each application are going to be a bit slower. But it’s a price worth paying for turning your MacBook Pro back from a paperweight into the excellent productive machine that it should be! This is by far the best and most permanent solution to the problem out there currently.
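For the curious, here is a small read-only Python sketch for seeing how much is sitting in your caches before (or after) you clean them. It deletes nothing. The three directories listed are simply common OS X cache locations that we are assuming for illustration; they are not necessarily the exact set that OnyX clears, and the function name is our own.

```python
import os
from pathlib import Path

# Assumed common OS X cache locations, for illustration only.
# This is not claimed to be the list of caches that OnyX cleans.
DEFAULT_CACHE_DIRS = [
    Path.home() / "Library" / "Caches",
    Path("/Library/Caches"),
    Path("/System/Library/Caches"),
]


def summarise_cache(directory):
    """Return (file_count, total_bytes) for everything under directory.

    Read-only: nothing is modified or deleted. Unreadable folders and
    files are skipped, and a missing directory gives (0, 0).
    """
    file_count = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                # lstat so that symlinks are measured, not followed
                total_bytes += os.lstat(path).st_size
            except OSError:
                continue
            file_count += 1
    return file_count, total_bytes
```

Calling `summarise_cache(d)` for each entry in `DEFAULT_CACHE_DIRS` gives a rough idea of whether you are in the same 100,000-file territory that we were.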
At one point, David, our Managing Director, was so disheartened he was contemplating replacing all the office Macs with PCs (perish the thought!). But crisis averted, with next to no help from Apple ;). Reading the community forums you can see that Apple support and the ‘Genius’ bar weren’t much help to users who accessed their services.
Faith in Macs restored, let’s continue with the productivity! Hopefully it’ll be at least another 20 years before another critical bug occurs in Mac OS!