Perhaps it’s a bit Machiavellian, but I just love code that in some way tricks another piece of code. For example, in college I wrote some code that trolled through the address space of my favorite game to afford me certain advantages. Most recently, I’ve been working on some code that tricks other code into believing a complete fiction[1] about what operating system it’s executing on. While working on that, I discovered an interesting problem with the pid provider — code that’s all about deception and sleight of hand. Before you read further, be warned: I’ve already written two completely numbing accounts of the details of the pid provider here and here, and this is going to follow much in that pattern. If you skip this one for fear of being bored to death[2], I won’t be offended.
The problem arose because the traced process tried to execute an x86 instruction like this:
call *0x10(%gs)
This instruction is supposed to perform a call to the address loaded from 0x10 bytes beyond the base of the segment described by the %gs selector. The neat thing about the pid provider (in case you’ve skipped those other posts) is that most instructions are executed natively, but some — and call is one of them — have to be emulated in the kernel. This instruction’s somewhat unusual behavior needed to be emulated precisely; the pid provider, however, didn’t know from selector prefixes and blithely tried to load from the absolute virtual address 0x10. Whoops.
To correct this, I needed to add some additional logic to parse the instruction and then augment the emulation code to know how to deal with these selectors. The first part was trivial, but the second half involved some digging into the x86 architecture manual. There are two kinds of descriptor tables, the LDT (local) and GDT (global). The value of %gs, in this case, tells us which table to look in, the index into that table, and the permissions associated with that selector.
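To make that selector layout concrete, here is a minimal sketch of pulling the three fields out of a 16-bit selector. The struct and function names are hypothetical and for illustration only; the kernel code below uses its own macros (SELTOIDX, SELISLDT, SELISUPL) for the same job.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the three fields of an x86 segment selector. */
struct sel_fields {
	uint16_t index;		/* bits 15..3: entry number in the table */
	int is_ldt;		/* bit 2 (TI): 1 = LDT, 0 = GDT */
	unsigned rpl;		/* bits 1..0: requested privilege level */
};

/* Split a raw selector value (e.g. the contents of %gs) into its fields. */
static void
sel_decode(uint16_t sel, struct sel_fields *out)
{
	assert(out != NULL);

	out->index = sel >> 3;
	out->is_ldt = (sel >> 2) & 1;
	out->rpl = sel & 3;
}
```

For example, a selector value of 0x004f decodes to index 9 in the LDT with a requested privilege level of 3 (user).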
Below is the code I added to usr/src/uts/intel/dtrace/fasttrap_isa.c to handle this case. You can find the context here.
    if (tp->ftt_code == 1) {

        /*
         * If there's a segment prefix for this
         * instruction, first grab the appropriate
         * segment selector, then pull the base value
         * out of the appropriate descriptor table
         * and add it to the computed address.
         */
        if (tp->ftt_segment != FASTTRAP_SEG_NONE) {
            uint16_t sel, ndx;
            user_desc_t *desc;

            switch (tp->ftt_segment) {
            case FASTTRAP_SEG_CS:
                sel = rp->r_cs;
                break;
            case FASTTRAP_SEG_DS:
                sel = rp->r_ds;
                break;
            case FASTTRAP_SEG_ES:
                sel = rp->r_es;
                break;
            case FASTTRAP_SEG_FS:
                sel = rp->r_fs;
                break;
            case FASTTRAP_SEG_GS:
                sel = rp->r_gs;
                break;
            case FASTTRAP_SEG_SS:
                sel = rp->r_ss;
                break;
            }

            /*
             * Make sure the given segment register
             * specifies a user priority selector
             * rather than a kernel selector.
             */
            if (!SELISUPL(sel)) {
                fasttrap_sigsegv(p, curthread,
                    addr);
                new_pc = pc;
                break;
            }

            ndx = SELTOIDX(sel);

            if (SELISLDT(sel)) {
                if (ndx > p->p_ldtlimit) {
                    fasttrap_sigsegv(p,
                        curthread, addr);
                    new_pc = pc;
                    break;
                }

                desc = p->p_ldt + ndx;

            } else {
                if (ndx >= NGDT) {
                    fasttrap_sigsegv(p,
                        curthread, addr);
                    new_pc = pc;
                    break;
                }

                desc = cpu_get_gdt() + ndx;
            }

            addr += USEGD_GETBASE(desc);
        }
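The last step above, adding the descriptor's base to the computed address, hides the fact that the base is scattered across the 8-byte descriptor. As a rough sketch of what a macro like USEGD_GETBASE has to do, here is the same extraction on a raw 32-bit descriptor. The function names are hypothetical, and this works on plain bytes rather than on the kernel's user_desc_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Reassemble the 32-bit base from a raw 8-byte segment descriptor:
 * base bits 15..0 live in bytes 2-3, bits 23..16 in byte 4, and
 * bits 31..24 in byte 7.
 */
static uint32_t
desc_base(const uint8_t d[8])
{
	assert(d != NULL);

	return ((uint32_t)d[2] | ((uint32_t)d[3] << 8) |
	    ((uint32_t)d[4] << 16) | ((uint32_t)d[7] << 24));
}

/* Address actually referenced by an operand such as 0x10(%gs). */
static uint32_t
seg_effective_addr(const uint8_t d[8], uint32_t offset)
{
	return (desc_base(d) + offset);
}
```

With a descriptor whose base is 0x12345678, seg_effective_addr(d, 0x10) yields 0x12345688, which is the address the emulated call should load its target from.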
What I learned from writing this is how to find the base address for a segment selector, something I’ve been meaning to figure out for a while. We (and most other operating systems) get to the thread pointer through a segment selector, so when debugging in mdb(1) I’ve often wondered how to map the value of %gs to the thread pointer that I care about. I haven’t put that code back yet, so feel free to point out any problems you see. Anyway, if you made it here, congratulations and thanks.
[1]Such is my love of the elaborate ruse that I once took months setting up a friend of mine for a very minor gag. Lucas and I were playing Scrabble and he was disappointed to hear that the putative word “fearslut” wasn’t good. Later I conspired with a friend at his company to have a third party send mail to an etymology mailing list claiming that he had found the word “fearslut” in an old manuscript of an obscure Shakespeare play. Three months later Lucas triumphantly announced to me that, lo and behold, “fearslut” was a word. I think I passed out I was laughing so hard.
[2]My parents are fond of recounting my response when they asked what I was doing in my operating systems class during college: “If I told you, you wouldn’t understand, and if I explained it, you’d be bored.”
5 Responses
Your code does not check the segment limits and access rights.
Kostik,
It doesn’t? I thought that it was sufficient to confirm that it was a user selector and check the bounds on the given descriptor table — is there something else you think I need to do? Thanks.
Adam,
What Kostik is getting at isn’t just bounds-checking within the GDT/LDT; the descriptor itself (i.e., the entry in the g/ldt) goes on to describe what the segment can and cannot be used for: from which CPL, for what purpose (code/data/”special”), and which virtual addresses cause a GP exception. None of this stuff is getting checked in the above.
In the case of the GDT, you might get away with playing a little fast and loose, since, after all, Solaris set up the GDT itself, and perhaps it knows that there’s no fancy x86 jiggery-pokey going on. In the case of the LDT, though, it’s conceivable that it was set up by the user-level process. The user code might even be expecting sigsegv’s according to its use of those segments (e.g., maybe it’s an execution environment for old Windows 3.11 code).
Even in the GDT case, the lack of a CPL check could be a hole. Does the kernel GS in 32-bit Solaris differ from the user GS? If so, a malicious user-level program that’s waiting for some silly root user to dtrace its gs:call instruction could fab up a gs selector that refers to a kernel segment (access rights indicate a DPL of 0), but with RPL of 3. This would, in the best case, leak kernel data (the contents of a given offset from the kernel GS) in the ucontext given a SIGSEGV handler. It could do some more colorful damage if you ever did similar emulation for a load or store (and a lot of x86 opcodes end up being essentially loads and stores, including call).
I’d also wager a guess that this code would fall down on a far call to a task state segment embedded in the LDT. You probably care much less, though.
(Not trying to be a jerk to my homey Adam. One love. I’m not brave enough to post real code in my blog, and this descriptor table stuff is the hairiest part of a dizzyingly hairy architecture).
Keith! Just the man I was hoping would weigh in. Now I see there are several more fields in the segment descriptor that I’m ignoring — including but not limited to the limit. Perhaps that’s what Kostik was getting at.
I somewhat naively thought that checking the selector’s attributes would be sufficient, but — of course — that falls well short, and exposes a rather obvious hole (obvious, that is, to the x86 super-nerd). To answer your question, the Solaris kernel does indeed have a different %gs than user-land, and — as you identify — all it would take is a nimbly constructed signal handler to extract data from the kernel (that is, if I had put this back, which I haven’t yet — calm down, ZDNet). You’re also right about not caring about far calls and other similar instructions — the pid provider emulates a tiny subset of the x86 instruction set, and call and jmp are the only two instructions in that subset that can perform generic loads.
And to address your comment, it’s working under an open source license, not any bravado, that lets me post code on my blog. How else could I have elicited a code review from you on this harrowing strait of code?
Upon further consideration, the worst users can do with the current code is find the location of the thread pointer, which is not a huge security hole. The fasttrap_fulword() call will not load from kernel space, so we’re safe there. The other issue, of course, is that this will successfully emulate an instruction that should fail; again, worth fixing but not earth-shattering.
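For what it’s worth, the extra checks discussed in this thread (present bit, descriptor type, DPL, and limit) might look roughly like the sketch below. This is not the fix that will go into fasttrap_isa.c. The function and macro names are hypothetical, it works on a raw 8-byte descriptor, and it conservatively rejects expand-down data segments rather than handling their inverted limit semantics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	DESC_TYPE_CODE	0x08	/* type bit 3: code (1) or data (0) */
#define	DESC_TYPE_EXPDN	0x04	/* data segments: expand-down */
#define	DESC_TYPE_READ	0x02	/* code segments: readable */

/*
 * Return 1 if user code may read len bytes at offset off through the
 * segment described by d, and 0 if the access should fault instead.
 */
static int
desc_read_ok(const uint8_t d[8], uint32_t off, uint32_t len)
{
	uint8_t access, type;
	uint32_t limit;

	assert(d != NULL);
	access = d[5];
	type = access & 0x0f;

	if (!(access & 0x80))		/* P: segment not present */
		return (0);
	if (!(access & 0x10))		/* S: system descriptor (TSS, gate) */
		return (0);
	if (((access >> 5) & 3) != 3)	/* DPL: not a user-level segment */
		return (0);

	if (type & DESC_TYPE_CODE) {
		if (!(type & DESC_TYPE_READ))
			return (0);	/* execute-only code segment */
	} else if (type & DESC_TYPE_EXPDN) {
		return (0);		/* not handled by this sketch */
	}

	/* Limit bits 15..0 are in bytes 0-1, bits 19..16 in byte 6. */
	limit = (uint32_t)d[0] | ((uint32_t)d[1] << 8) |
	    ((uint32_t)(d[6] & 0x0f) << 16);
	if (d[6] & 0x80)		/* G: limit counts 4K pages */
		limit = (limit << 12) | 0xfff;

	/* The last byte touched, off + len - 1, must not pass the limit. */
	if (len == 0 || len - 1 > limit)
		return (0);
	return (off <= limit - (len - 1));
}
```

A flat user data segment passes for a 4-byte load at offset 0x10, while the same descriptor with a DPL of 0 is rejected, which closes the forged-selector case Keith describes.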