This article was posted to the Usenet group alt.hackers in 1995; any technical information is probably outdated.

Re: Hacker FAQ (please comment and help fix)


Article: 7609 of alt.hackers
From: buhr@stat.wisc.edu (Kevin Buhr)
Newsgroups: alt.hackers
Subject: Re: Hacker FAQ (please comment and help fix)
Date: 07 Apr 1995 02:28:13 GMT
Organization: Statistics Department, University of Wisconsin---Madison
Lines: 127
Approved: buhr@stat.wisc.edu
Distribution: world
Message-ID: BUHR.95Apr6212814@mozart.stat.wisc.edu
	<3m1g2g$bbj@news.Ieunet.ie>
NNTP-Posting-Host: mozart.stat.wisc.edu
In-reply-to: nh@Ieunet.ie's message of 6 Apr 1995 20:40:00 +0100
X-Followup-to: the grammatic school of death...
X-Troll: and if you correct *my* troll, you're really in trouble
Status: RO

In article <3m1g2g$bbj@news.Ieunet.ie> nh@Ieunet.ie (Nick Hilliard)
writes:
|
| Are you mixing tenses?  If you use 'showed', you should use 'found'.
|
| : I also must install the SCSI controller and drivers, and me without my
| : screwdriver.
|
| 'me without my screwdriver' is not a valid sentence.  And if it were, it
| would be 'and I without my screwdriver'.
|
| Rule one of nitpicking:  make sure that *you're* correct, too :-)

You and your two English-major friends were trolled.  And you
misspelled "your".  Hope this helps!

Kevin <buhr@stat.wisc.edu>

			*	*	*

Okay, this is why I REALLY posted...  The following hack is
unnecessarily long and boastful, but indulge me because I'm proud of
it.  Thanks.

ObHack:

There was a bug in Linux-AFS that had been around for several
versions.  Periodically and fairly unpredictably, a random doubleword
in the kernel stack would be decremented, with wild and crazy results.
Unfortunately, the actual decrement site (as opposed to the "crash
site") would usually be narrowed down to a call to the scheduler,
where just about anything could have happened.

Well, I finally stumbled across a directory of files in AFS space
that, when copied en masse to another directory, would almost
inevitably cause a fatal stack decrement.  Finally, a reproducible
manifestation!

Now, the thing is, we don't have a source license for AFS here, so
drastic measures were called for.  When I finally got my hands on a
version of "objdump" that supported the "--disassemble" option, I
created a dump of the entire loadable module (4 megs' worth) and set
about tracking down the exact decrement instruction that was causing
me all this grief.

I ended up taking the disassemblies of various AFS functions and
building a larger and larger loadable kernel module that would
hotpatch AFS into calling my (modifiable) versions of the functions.
At each point, I would put little snippets of code around all the
calls in each function to figure out which call (or which chunk of
code between calls) was decrementing my poor, helpless stack.
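
The "little snippets" themselves are not shown in the post.  As a rough
userspace sketch of the idea only (not the actual kernel code), each call
in a suspect function can be bracketed with a snapshot-and-compare of the
word being watched.  Every name here (watched_word, CHECKED_CALL,
trace_suspect and the two callees) is hypothetical and made up for
illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the stack doubleword that gets corrupted. */
static volatile uint32_t watched_word = 1000;

/* Index of the first call site after which the word changed, or -1. */
static int first_bad_site = -1;

/* Bracket one call: snapshot the watched word, make the call, compare. */
#define CHECKED_CALL(site, call)                                    \
    do {                                                            \
        uint32_t before_ = watched_word;                            \
        assert((site) >= 0);                                        \
        call;                                                       \
        if (watched_word != before_ && first_bad_site < 0) {        \
            first_bad_site = (site);                                \
            fprintf(stderr, "site %d: %u -> %u\n", (site),          \
                    (unsigned)before_, (unsigned)watched_word);     \
        }                                                           \
    } while (0)

/* Hypothetical callees; only the second one misbehaves. */
static void well_behaved(void) { }
static void decrements_word(void) { watched_word--; }

/* A function under suspicion, with every call bracketed.
 * Returns the index of the first call that changed the word. */
static int trace_suspect(void)
{
    first_bad_site = -1;
    CHECKED_CALL(0, well_behaved());
    CHECKED_CALL(1, decrements_word());
    CHECKED_CALL(2, well_behaved());
    return first_bad_site;
}
```

Once a call is flagged, the same bracketing is applied inside the callee,
which is how a chain of suspects gets narrowed one level at a time.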

In the end, I had moved over about a dozen nontrivial functions into
my loadable module, and had the following fatal calling sequence:

afs_linux_notify_change (pushes %ebp at very start of routine)
afs_setattr
afs_WriteVCache
RXAFS_StoreStatus
xdr_AFSFetchStatus
xdr_u_long
xdrrx_getlong
rx_ReadProc
osi_Sleep
interruptible_sleep_on (decrements the saved %ebp value)

Linux kernel aficionados may already be laughing, since
"interruptible_sleep_on" calls the scheduler.  So, I was back where I
had started.  But not quite...

I hadn't gotten much sleep, and unlike my cool hacker buddies, I'm
severely affected by more than a couple of 24-hour periods with less
than 2 or 3 hours' sleep each.  I also had an "Assignment from Hell" due
that Monday, so I decided to put things on the backburner.

But, of course, if you're like me, you can't put anything on the
backburner either.  I ended up, on a whim, taking a break from my
assignment and spending what amounted to less than an hour patching my
kernel to allow a kernel module to turn on the debugging registers on
a "system-wide" rather than "process-wide" basis and then trap and die
on the resulting SIGTRAP instead of trying to continue.  By turning
debugging on just before the dastardly call and off again right
afterwards, I could generate a trap at the very instruction doing the
decrement.
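
The kernel patch is not included in the post.  For readers unfamiliar
with the x86 debugging registers, here is a small sketch of how the DR7
control word for a data watchpoint is laid out, following the published
386-and-later register format: two enable bits per slot in the low byte,
and a 4-bit type/length field per slot starting at bit 16.  The function
and constant names are hypothetical, and actually loading DR0-DR7 needs
ring-0 code that is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Breakpoint condition (R/W field). */
enum { DR7_RW_EXEC = 0, DR7_RW_WRITE = 1, DR7_RW_READWRITE = 3 };

/* Watched length (LEN field): 1, 2 or 4 bytes. */
enum { DR7_LEN_1 = 0, DR7_LEN_2 = 1, DR7_LEN_4 = 3 };

/*
 * Build the DR7 bits for one of the four breakpoint slots.
 *   slot   : 0..3, selects which of DR0..DR3 holds the address
 *   rw     : one of the DR7_RW_* values
 *   len    : one of the DR7_LEN_* values
 *   global : nonzero selects the "global" enable bit (G0..G3),
 *            zero the "local" one (L0..L3)
 */
static uint32_t dr7_watch_bits(unsigned slot, unsigned rw,
                               unsigned len, int global)
{
    assert(slot < 4 && rw < 4 && len < 4);

    /* Enable bits: L0=bit 0, G0=bit 1, L1=bit 2, G1=bit 3, ... */
    uint32_t enable = 1u << (slot * 2 + (global ? 1 : 0));

    /* Control nibble: R/W in the low two bits, LEN in the high two,
     * placed at bit 16 + 4*slot. */
    uint32_t ctrl = ((uint32_t)rw | ((uint32_t)len << 2))
                    << (16 + slot * 4);

    return enable | ctrl;
}
```

With the address of the saved %ebp slot in DR0, a 4-byte write watchpoint
with the global enable bit would use
dr7_watch_bits(0, DR7_RW_WRITE, DR7_LEN_4, 1), which works out to
0x000D0002.  Whether the patch described above used the global bits or
simply stopped the kernel from swapping the registers per process is not
stated, so treat that detail as an assumption.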

What do you know?  After a few false starts, it worked.  Please share
with me the triumphant message I immediately mailed which, in my
excitement, I misaddressed:

> Date: Mon, 6 Mar 95 16:53:12 -0600
> To: wardlord@mit.edu
> CC: linux-afs-bugs@mit.edu, jtk@atria.com
> Subject: I GOT IT!!!!
> From: buhr@stat.wisc.edu (Kevin Buhr)
>
> Okay...
>
> My kernel is hacked to bits, but---as promised---I have the exact
> kernel code instruction (called by the process "afsd") that decrements
> the "ebp" value!
>
> A dump follows:
>
[ dump and call trace deleted ]
>
> The relevant code (pardon the crudeness of the objdump output):
>
[ dull instructions deleted ]
>
> 00016280 <_crfree+4> movl   0x8(%ebp),%ebx
> 00016283 <_crfree+7> decl   0x8(%ebx)		   ; GOTCHA!!!
> 00016286 <_crfree+a> cmpl   $0x0,0x8(%ebx)
>
[ dull instructions deleted ]
>
> Kevin <buhr@stat.wisc.edu>

I was very, very, very proud of myself, and Derek <warlord@mit.edu>
found the problem in no time and had a brand spanking new module ready
in a day and a half.
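
For anyone who doesn't read AT&T-syntax disassembly, the three quoted
instructions can be read roughly as the C below: load the first argument
from 0x8(%ebp), decrement the 32-bit field at offset 8 of whatever it
points to, and compare that field with zero.  This is only a reading of
the quoted instructions, not AFS source; the struct layout and the names
cred_like and crfree_like are invented, and treating the field as a
reference count is a guess.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: all that matters is a 32-bit field at offset 8. */
struct cred_like {
    uint32_t field0;   /* offset 0, contents unknown */
    uint32_t field4;   /* offset 4, contents unknown */
    int32_t  ref;      /* offset 8, the field hit by "decl 0x8(%ebx)" */
};

/* Returns 1 when the count has dropped to zero, else 0. */
static int crfree_like(struct cred_like *cr)   /* movl 0x8(%ebp),%ebx */
{
    assert(offsetof(struct cred_like, ref) == 8);
    cr->ref--;                                 /* decl 0x8(%ebx)      */
    return cr->ref == 0;                       /* cmpl $0x0,0x8(%ebx) */
}
```

Read this way, a call made with a pointer that happens to land 8 bytes
below a saved %ebp on the kernel stack would quietly decrement that saved
value, which would match the symptom.  Why the pointer was bad is not
explained in the post.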

And, that was the day my Linux box stopped trapping!

It didn't stop *hanging* until I hacked the WD80x3 network driver so
it'd force my 83905-based network card into shared memory mode, but
that's a hack for another day...

Kevin <buhr@stat.wisc.edu>
