Re: Hacker FAQ (please comment and help fix)
Article: 7609 of alt.hackers
From: buhr@stat.wisc.edu (Kevin Buhr)
Newsgroups: alt.hackers
Subject: Re: Hacker FAQ (please comment and help fix)
Date: 07 Apr 1995 02:28:13 GMT
Organization: Statistics Department, University of Wisconsin---Madison
Lines: 127
Approved: buhr@stat.wisc.edu
Distribution: world
Message-ID: BUHR.95Apr6212814@mozart.stat.wisc.edu <3m1g2g$bbj@news.Ieunet.ie>
NNTP-Posting-Host: mozart.stat.wisc.edu
In-reply-to: nh@Ieunet.ie's message of 6 Apr 1995 20:40:00 +0100
X-Followup-to: the grammatic school of death...
X-Troll: and if you correct *my* troll, you're really in trouble
Status: RO

In article <3m1g2g$bbj@news.Ieunet.ie> nh@Ieunet.ie (Nick Hilliard) writes:
|
| Are you mixing tenses?  If you use 'showed', you should use 'found'.
|
| : I also must install the SCSI controller and drivers, and me without my
| : screwdriver.
|
| 'me without my screwdriver' is not a valid sentence.  And if it were, it
| would be 'and I without my screwdriver'.
|
| Rule one of nitpicking: make sure that *you're* correct, too :-)

You and your two English-major friends were trolled.  And you
misspelled "your".

Hope this helps!

Kevin <buhr@stat.wisc.edu>

                        *       *       *

Okay, this is why I REALLY posted...

The following hack is unnecessarily long and boastful, but indulge me
because I'm proud of it.  Thanks.

ObHack:

There was a bug in Linux-AFS that had been around for several
versions.  Periodically and fairly unpredictably, a random doubleword
in the kernel stack would be decremented, with wild and crazy results.
Unfortunately, the actual decrement site (as opposed to the "crash
site") would usually be narrowed down to a call to the scheduler,
where just about anything could have happened.

Well, I finally stumbled across a directory of files in AFS space
that, when copied en masse to another directory would almost
inevitably cause a fatal stack decrement.  Finally, a reproducible
manifestation!

Now, the thing is, we don't have a source license for AFS here, so
drastic measures were called for.  When I finally got my hands on a
version of "objdump" that supported the "--disassemble" option, I
created a dump of the entire loadable module (4 megs worth) and set
about tracking down the exact decrement instruction that was causing
me all this grief.

I ended up taking the disassemblies of various AFS functions and
building a larger and larger loadable kernel module that would
hotpatch AFS into calling my (modifiable) versions of the functions.
At each point, I would put little snippets of code around all the
calls in each function to figure out which call (or which chunk of
code between calls) was decrementing my poor, helpless stack.

In the end, I had moved over about a dozen nontrivial functions into
my loadable module, and had the following fatal calling sequence:

    afs_linux_notify_change    (pushes %ebp at very start of routine)
    afs_setattr
    afs_WriteVCache
    RXAFS_StoreStatus
    xdr_AFSFetchStatus
    xdr_u_long
    xdrrx_getlong
    rx_ReadProc
    osi_Sleep
    interruptible_sleep_on     (decrements the saved %ebp value)

Linux kernel aficionados may already be laughing, since
"interruptible_sleep_on" calls the scheduler.  So, I was back where I
had started.  But not quite...

I hadn't gotten much sleep, and unlike my cool hacker buddies, I'm
severely affected by more than a couple 24 hour periods with less than
2 or 3 hours sleep each.  I also had an "Assignment from Hell" due
that Monday, so I decided to put things on the backburner.

But, of course, if you're like me, you can't put anything on the
backburner either.  I ended up, on a whim, taking a break from my
assignment and spending what amounted to less than an hour patching my
kernel to allow a kernel module to turn on the debugging registers on
a "system-wide" rather than "process-wide" basis and then trap and die
on a resulting SIGTRACE instead of trying to continue.  By turning
debugging on just before the dastardly call and right off afterwards,
I could generate a trap at the very instruction doing the decrement.

What do you know?  After a few false starts, it worked.  Please share
with me the triumphant message I immediately mailed which, in my
excitement, I misaddressed:

> Date: Mon, 6 Mar 95 16:53:12 -0600
> To: wardlord@mit.edu
> CC: linux-afs-bugs@mit.edu, jtk@atria.com
> Subject: I GOT IT!!!!
> From: buhr@stat.wisc.edu (Kevin Buhr)
>
> Okay...
>
> My kernel is hacked to bits, but---as promised---I have the exact
> kernel code instruction (called by the process "afsd") that decrements
> the "ebp" value!
>
> A dump follows:
> [ dump and call trace deleted ]
>
> The relevant code (pardon the crudeness of the objdump output):
> [ dull instructions deleted ]
>
> 00016280 <_crfree+4> movl 0x8(%ebp),%ebx
> 00016283 <_crfree+7> decl 0x8(%ebx)        ; GOTCHA!!!
> 00016286 <_crfree+a> cmpl $0x0,0x8(%ebx)
> [ dull instructions deleted ]
>
> Kevin <buhr@stat.wisc.edu>

I was very, very, very proud of myself, and Derek <warlord@mit.edu>
found the problem in no time and had a brand spanking new module ready
in a day and a half.

And, that was the day my Linux box stopped trapping!  It didn't stop
*hanging* until I hacked the WD80x3 network driver so it'd force my
83905-based network card into shared memory mode, but that's a hack
for another day...

Kevin <buhr@stat.wisc.edu>
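
For anyone who wants to play with the first half of the trick (little
snippets of checking code around every call), here is a rough
user-space sketch in C. It is not the module code from the post, which
lived in the kernel and patched the real AFS entry points. Everything
in it is a hypothetical stand-in: `watched_word` plays the saved %ebp
slot, the three `step_*` functions play the AFS calls, and
`find_clobbering_step` and `demo_find_culprit` are names made up for
the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the stack slot that kept getting decremented. */
static volatile unsigned long watched_word = 0x1000UL;

/* One entry per call we want to bracket with a before/after check. */
struct step {
    const char *name;
    void (*fn)(void);
};

/* Hypothetical stand-ins for the real calls; only one misbehaves. */
static void step_lookup(void)  { /* well behaved */ }
static void step_sleep(void)   { watched_word--; /* the stray decrement */ }
static void step_cleanup(void) { /* well behaved */ }

/*
 * Run each step in order, snapshotting *word before the call and
 * comparing after it. Returns the index of the first step that changed
 * the word, or -1 if none did. Steps after the culprit are not run.
 */
static int find_clobbering_step(const struct step *steps, size_t n,
                                const volatile unsigned long *word)
{
    assert(steps != NULL && word != NULL);

    for (size_t i = 0; i < n; i++) {
        unsigned long before = *word;

        steps[i].fn();

        unsigned long after = *word;
        if (after != before) {
            fprintf(stderr, "%s changed the watched word: %#lx -> %#lx\n",
                    steps[i].name, before, after);
            return (int)i;
        }
    }
    return -1;
}

/* Build the (made-up) call table and report which entry is guilty. */
int demo_find_culprit(void)
{
    static const struct step steps[] = {
        { "step_lookup",  step_lookup  },
        { "step_sleep",   step_sleep   },
        { "step_cleanup", step_cleanup },
    };

    return find_clobbering_step(steps, sizeof steps / sizeof steps[0],
                                &watched_word);
}
```

Calling `demo_find_culprit()` returns 1, the index of `step_sleep`. As
in the story, bracketing only narrows the damage down to a call. If
that call ends up in the scheduler, you still need something like the
debugging registers to catch the exact instruction.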