mirror of
				https://github.com/upx/upx
				synced 2025-10-19 23:42:44 +08:00 
			
		
		
		
	
		
			
				
	
	
		
			149 lines
		
	
	
		
			6.2 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			149 lines
		
	
	
		
			6.2 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
| This document explains the concept of "filtering" in UPX. Basically
 | |
| filtering is a data preprocessing method which could improve the
 | |
| compression ratio of the files UPX processes.
 | |
| 
 | |
| Currently the filters UPX uses are all based on one very special
 | |
| algorithm which is working well on ix86 executable files.
 | |
| This is what upx calls the "naive" implementation. There is also a
 | |
| "clever" method which works only with 32-bit executable file formats
 | |
| and was first implemented in UPX.
 | |
| 
 | |
| Let's start with an example (from this point I assume a 32-bit file
 | |
| format). Consider this code fragment:
 | |
| 
 | |
| 00025970: E877410600                     calln     FatalError
 | |
| 00025975: 8B414C                         mov       eax,[ecx+4C]
 | |
| 00025978: 85C0                           test      eax,eax
 | |
| 0002597A: 7419                           je        file:00025995
 | |
| 0002597C: 85F6                           test      esi,esi
 | |
| 0002597E: 7504                           jne       file:00025984
 | |
| 00025980: 89C6                           mov       esi,eax
 | |
| 00025982: EB11                           jmps      file:00025995
 | |
| 00025984: 39C6                           cmp       esi,eax
 | |
| 00025986: 740D                           je        file:00025995
 | |
| 00025988: 83C4F4                         add (d)   esp,F4
 | |
| 0002598B: 68A0A91608                     push      0816A9A0
 | |
| 00025990: E857410600                     calln     FatalError
 | |
| 00025995: FF45F4                         inc       [ebp-0C]
 | |
| 
 | |
| Here you can find two calls to a function called "FatalError". As you
 | |
| probably know the compression ratio is better if the compressor engine
 | |
| finds longer sequences of repeated strings. In this case the engine
 | |
| sees the following two byte sequences:
 | |
| 
 | |
| E877 410600 8B   and
 | |
| E857 410600 FF.
 | |
| 
 | |
| So it can find a 3-byte-long match.
 | |
| 
 | |
| Now comes the trick. On ix86 near calls are encoded as 0xE8 then a 32
 | |
| bit relative offset to the destination address. Let's see what
 | |
| happens if the position of the call is added to that offset:
 | |
| 
 | |
| 0x64177 + 0x25970 = 0x89AE7
 | |
| 0x64157 + 0x25990 = 0x89AE7
 | |
| 
 | |
| E8 E79A0800 8B
 | |
| E8 E79A0800 FF
 | |
| 
 | |
| As you can see now the compressor engine finds a 5-byte-long match.
 | |
| Which means, that we've just saved 2 bytes of compressed data. Not bad.
 | |
| 
 | |
| So this is the basic idea (the "naive" implementation). All we have to
 | |
| do is to "filter" the uncompressed data using this method before
 | |
| compression, and "unfilter" it after decompression. Simply go over the
 | |
| memory, find 0xE8 bytes and process the next 4 bytes as specified
 | |
| above.
 | |
| 
 | |
| Of course there are several possibilities where this scheme could be
 | |
| improved. First, not only calls could be handled this way - near jumps
 | |
| (0xE9 + 32-bit offset) could work similarly.
 | |
| 
 | |
| A second improvement could be if we limit this filtering only for the
 | |
| area occupied by real code - there is no point in messing with general
 | |
| data.
 | |
| 
 | |
| Another improvement comes if the byte order of the 32-bit offset is
 | |
| reversed. Why? Here is another call which follows the above fragment:
 | |
| 
 | |
| 000261FA: E8C9390600                     calln     ErrorF
 | |
| 
 | |
| 0x639C9 + 0x261FA = 0x89BC3
 | |
| 
 | |
| E8 C39B 0800     compare this with
 | |
| 
 | |
| E8 E79A 0800
 | |
| 
 | |
| As you can see these two functions are quite close together, but the
 | |
| compressor is not able to utilize this information (2-byte-long matches
 | |
| are usually not useful) unless the byte order of the offsets are
 | |
| reversed. In this case:
 | |
| 
 | |
| E8 0008 9AE7
 | |
| 
 | |
| E8 0008 9BC3
 | |
| 
 | |
| So, the compressor engine finds a 3-byte-long match here. This is a
 | |
| nice improvement - now the engine utilizes the similarity of nearby
 | |
| destinations too.
 | |
| 
 | |
| This is nice, but what happens when we find a "fake" call - ie. an 0xE8
 | |
| which is part of another instruction? Like this:
 | |
| 
 | |
| 0002A3B1: C745 E8 00000000               mov       [ebp-18],00000000
 | |
| 
 | |
| In this case those nice 0x00 bytes are overwritten with some less
 | |
| compressible data. This is the disadvantage of the "naive"
 | |
| implementation.
 | |
| 
 | |
| So let's be clever and try to detect and process only "real" calls. In
 | |
| UPX a simple method is used to find these calls. We simply check that
 | |
| the destinations of these calls are inside the same area as the calls
 | |
| themselves (so the above code is still a false positive, but it helps
 | |
| generally). A better method would be to actually disassemble the code -
 | |
| contributions are welcome :-)
 | |
| 
 | |
| But this is only half of the job. We can not simply process one call
 | |
| then skip another one - the unfiltering process needs some information
 | |
| to be able to reverse the filtering.
 | |
| 
 | |
| UPX uses the following idea, which works nicely. First we assume that
 | |
| the size of the area that should be filtered is less than 16 MiB. Then
 | |
| UPX scans over this area and keeps a record of the bytes that are
 | |
| following the 0xE8 bytes. If we are lucky, there will be bytes that
 | |
| were not found following 0xE8. These bytes are our candidates to be
 | |
| used as markers.
 | |
| 
 | |
| Do you still remember that we assumed that the size of scanned area is
 | |
| less than 16 MiB? Well, this means that when we process a real call, the
 | |
| resulting offset will be less than 0x00FFFFFF too. So the MSB is always
 | |
| 0x00. Which is a nice place to store our marker. Of course we should
 | |
| reverse the byte order in the resulting offset - so this marker will
 | |
| appear just after the 0xE8 byte and not 4 bytes after it.
 | |
| 
 | |
| That's all. Just go over the memory area, identify the "real" calls,
 | |
| and use this method to mark them. Then the job of the unfilter is very
 | |
| easy - it just searches for a 0xE8 + marker sequence and does the
 | |
| unfiltering if it finds one. It's clever, isn't it? :)
 | |
| 
 | |
| To tell you the truth it's not this simple in UPX. It can use an
 | |
| additional parameter ("add_value") which makes things a little bit more
 | |
| complicated (for example it can happen that a found marker is proven to
 | |
| be unusable because of some overflow during an addition).
 | |
| 
 | |
| And the whole algorithm is optimized for simplicity on the unfiltering
 | |
| side (as short and as fast assembly as possible - see stub/macros.ash),
 | |
| which makes the filtering process a little more difficult (fcto_ml.ch,
 | |
| fcto_ml2.ch, filteri.cpp).
 | |
| 
 | |
| As it can be seen in filteri.cpp, there are lots of variants of this
 | |
| filtering implemented - native/clever, calls/jumps/calls&jumps,
 | |
| reversed/unreversed offsets - a sum of 18 slightly different filters
 | |
| (and another 9 variants for 16-bit programs).
 | |
| 
 | |
| You can select one of them using the command line parameter "--filter="
 | |
| or try most of them with "--all-filters". Or just let upx use the one
 | |
| we defined as the default for that executable format.
 | |
| 
 | |
| EOF
 | 
