How do you find the code of .exe

Discussion in 'Computer Science & Culture' started by DNA100, Oct 28, 2011.

Thread Status:
Not open for further replies.
  1. DNA100 Registered Senior Member

    Messages:
    259
    Thanks.
    OK, I more or less understand what you are saying.
    One question: what is a "parameterized address"?

    Thanks.
    So that interpreter of all languages to English is the JRE?

    OK, that explains the OS independent part.
    But what is the JVM, then? Is Java also hardware independent?
     
  3. Emil Valued Senior Member

    Messages:
    2,801
    The program is written to begin at 00 00, and it contains a jump instruction to the address 0A 80.
    What happens if the program is loaded starting at address 0F 00? The jump instruction (to the address 0A 80) will be wrong!
    Therefore you have to use a parameter: jump to ADR1+0A 80, where
    ADR1 is the beginning of the program, in our case ADR1=0F 00.
    In a high-level language that is easy, but in machine code you have to take care of everything.
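
The parameterized address described above is just "load address plus offset". A minimal sketch in Python, using the numbers from this post; the function name relocate is made up for illustration and is not part of any real loader:

```python
# Hypothetical sketch of a parameterized (relocatable) jump target.
# The names here are illustrative only.

def relocate(base_address: int, offset: int) -> int:
    """Return the absolute address of an offset measured from the program start."""
    return base_address + offset

# The program was written as if it starts at 00 00, jumping to 0A 80.
JUMP_OFFSET = 0x0A80

# Loaded at 00 00, the jump target is simply the offset.
target_at_zero = relocate(0x0000, JUMP_OFFSET)

# Loaded at ADR1 = 0F 00, the target has to shift by the same amount.
target_at_adr1 = relocate(0x0F00, JUMP_OFFSET)

print(f"{target_at_zero:04X}")
print(f"{target_at_adr1:04X}")
```

With ADR1 = 0F 00 the jump has to go to 19 80 instead of 0A 80.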
     
  5. Crunchy Cat F-in' *meow* baby!!! Valued Senior Member

    Messages:
    8,423
    Technically, you already have all the code. An .exe file is loaded with individual instructions for your computer's processor. Each instruction and any associated data are represented by pure numbers (which are not very human readable, but reading them is certainly not impossible if you know your processor's instruction set well... which only certain types of engineers would).

    What you are after is to turn those processor instructions (that are encoded as numbers in the .exe file) into something more human readable. The standard way is to turn those numbers into assembly language. There are many tools that will perform that conversion and they are called "Disassemblers". The only caveat is that understanding assembly language is not easy (even computer scientists often struggle with it). Once you have disassembled your .exe file and modified the assembly language to your liking, you will want to assemble your code back into machine code. Some disassemblers have an assemble feature, but most often you will have to turn to a separate tool for that (there are plenty of them and they are called "Assemblers").
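
To make the disassembler idea concrete, here is a rough sketch in Python. It is a toy that recognizes only four 32-bit x86 opcode patterns (nop, ret, inc r32, mov r32, imm32). The function name disassemble and the choice to print unknown bytes as db are assumptions made for illustration; real disassemblers handle far more than this.

```python
# Toy disassembler for a few single-opcode 32-bit x86 instructions.
# Real tools also handle prefixes, ModRM bytes, and hundreds of opcodes.

REGISTERS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def disassemble(code: bytes) -> list:
    """Decode machine code bytes into a list of assembly text lines."""
    lines = []
    i = 0
    while i < len(code):
        opcode = code[i]
        if opcode == 0x90:
            lines.append("nop")
            i += 1
        elif opcode == 0xC3:
            lines.append("ret")
            i += 1
        elif 0x40 <= opcode <= 0x47:
            # inc r32 (32-bit mode only): the register is encoded in the opcode
            lines.append(f"inc {REGISTERS[opcode - 0x40]}")
            i += 1
        elif 0xB8 <= opcode <= 0xBF and i + 5 <= len(code):
            # mov r32, imm32: opcode byte plus a 4-byte little-endian constant
            imm = int.from_bytes(code[i + 1:i + 5], "little")
            lines.append(f"mov {REGISTERS[opcode - 0xB8]}, {imm}")
            i += 5
        else:
            # Anything this toy does not know is shown as a raw data byte
            lines.append(f"db 0x{opcode:02X}")
            i += 1
    return lines

machine_code = bytes.fromhex("B8 05 00 00 00 40 90 C3")
for line in disassemble(machine_code):
    print(line)
```

The same eight numbers come out as mov eax, 5 / inc eax / nop / ret, which is all a disassembler does, only on a much larger scale.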
     
  7. Aqueous Id flat Earth skeptic Valued Senior Member

    Messages:
    6,152
    The question seems to be asking how to reverse engineer an executable program. Why? Do you want to know how to program, or how to fix or alter something, or are you trying to understand how computers work? You also ask how the OS finds, or knows how to find the executable. There's a lot of ground to cover if you really want to get into this. If you just want to find the source code, the explanation for that is much simpler but the task of actually finding the source code could be impossible. Source code, if it's a commodity, is usually tightly held to protect the invention, and may have a copyright or patent attached.

    Anyway, if you really want to delve into machine code, you will need an education in the microprocessor architecture, a map of the memory, ports and devices, device drivers, and the kernel or OS functions that every application uses to avoid reinventing the world.

    If you want to play with programming and a debugger, try this. Get on a PC with Windows and any of the Office products like Word or Excel. You can enter the Visual Basic Integrated Development Environment (IDE) directly from those products. You can try a few simple programs and learn how to debug them by following the help system. This is no small task as Visual Basic is huge in syntax, functions and capabilities. You won't get close to .EXE study very quickly, but at least it's a start.

    If you are serious about this, I would suggest you go find a dinosaur computer that someone is using as a boat anchor. Get on with MS-DOS and nothing more. Dig up the MS-DOS manual, the DOS system calls manual, and the reverse assembler. You will also need a diagram of the Intel 8086 processor and the Intel assembler reference manual. This will get you started. Every time you crash the box, just reboot and start over. Slowly but surely you will unravel the mysteries of Bill Gates and friends.

    I suggest this approach because chips made in the last 15 years or so have protections against invasion like this. So do their OSes.
     
  8. Emil Valued Senior Member

    Messages:
    2,801
  9. Rhaedas Valued Senior Member

    Messages:
    1,516
    Given the popularity, there might be more archived help out there in amateur assembly programming on the C-64, using the 6502 chip instructions. Combine resources with a good C-64 emulator and free software, and you can play with a relatively easy 8 bit coding to get your feet wet.

    It also demonstrates something mentioned earlier about specific OSes and how they differ. Many simple C-64 assembly programs would run on Apple IIs as well, since they ran the same processor. You'd only get into trouble if your code called on machine-dependent functions.
     
  10. DNA100 Registered Senior Member

    Messages:
    259
    OK, I understand. Thanks!

    Thanks for the reply.
    "An .exe file is loaded with individual instructions for your computers processor"
    How does the .exe file know which processor I use?

    "which only certain types of engineer would"
    Well, if the processor belongs to a commonly used family, shouldn't there be a readily accessible instruction set manual for general reference?

    Well, I guess the assembly language will be easier to learn than the op code for a specific machine.
    So, some disassemblers can compile back to original machine code?



    Well, you are correct, I don't want the source code. It's not that hard to get if it's open source, right?
    Rather, I want to understand the basic programming of a .exe file by simply looking at the raw binary data and tinker with the .exe.

    When I compile a program I have always been curious about how exactly the machine translates/executes the high-level code in machine level and where does the OS interfere.

    And yes, you are right, I too figured that I will have to learn machine language to go into full depth. That's a lot of work, so I was wondering whether there is a shortcut. I don't want full details right away, but perhaps a reference manual of instruction sets. Or even better, could I SEE what memory changes are made by a section of the code, sort of carrying out the operations virtually to examine their effects without actually executing the code? Can an emulator do it? That way I don't have to learn all the gibberish machine code. I could play with the code on my own.

    And I don't think I will ever get access to a real dinosaur computer. But are there video game like simulations of dinosaur computers which can be easily played with?
     
    Last edited: Nov 5, 2011
  11. DNA100 Registered Senior Member

    Messages:
    259
    Thanks a lot for that.

    OK. I am getting the idea. Thanks!
     
  12. Crunchy Cat F-in' *meow* baby!!! Valued Senior Member

    Messages:
    8,423
    It doesn't. PCs use x86 and x64 compatible processors, and the .exe format is exclusive to the Windows operating system (which targets PCs). The person who made the .exe knew ahead of time what processor type he/she was targeting.

    Processor vendors often have op code listings available for software developers who make compilers. The rest of the software development world would be using assembly language and higher level languages.

    Correct. It is no trivial task though. For example, with a single line of code in C#, I can show a dialog that says "Hello". To do the exact same thing in assembly would take hundreds if not thousands of lines of code.

    Correct. An example would be Visual Studio Professional. It has both disassembler and assembler features.
     
  13. DNA100 Registered Senior Member

    Messages:
    259
    Thanks. Sorry, I was out and hence the late reply.
    What's the equivalent of .exe in other OSs?
    I mean, I have seen installation files for the many different OSs.
    Many Linux systems run on x86/x64, so shouldn't a .exe work on Linux?
    Also, noting that .exe works on both x86 and x64 - what's the basic difference between the two?

    The code that I find with the hex editor is the machine code, right?
    What disassemblers do is act on that hex code and change it to assembly language?


    Thousands! That much even in assembly languages?!
    I guess there is a lot of repeating of chunks that goes on by assembly programmers?

    Anyway, what I really want to know is whether there is any way I can simulate a line of instruction from a .exe without necessarily knowing the machine language.
    What I mean is: suppose I have the architecture of the machine.
    Now can't I see the changes in processor registers and RAM by simulating the actions of the code, VIRTUALLY running the instructions line by line?
    Surely, one can simulate a x86 processor virtually?
     
  14. Crunchy Cat F-in' *meow* baby!!! Valued Senior Member

    Messages:
    8,423
    There are too many to list. For Apple it is .app files. For Linux it is files marked with an executable attribute.

    Nope. An .exe is Microsoft's portable executable format. It's something that only the Windows Operating System can work with.

    x86 code is 32-bit: it works with 32-bit registers and memory addresses.
    x64 code is 64-bit: it works with 64-bit registers and memory addresses.

    Some of it is. With a HEX editor it would be difficult to distinguish between code and data.

    Hex isn't code. It is a base 16 representation of numbers (ex. 1 2 3 4 5 6 7 8 9 A B C D E F 10 11 12 13 14 15 16 17 18 19 1A 1B 1C... etc.). What disassemblers do is convert base 2 binary numbers (ex. 0 1 10 11 100 101 110 111 1000 1001 1010 1011 1100... etc.) into a text-based language that is more human readable. For example, on a 32-bit x86 processor the five bytes 10111000 00000101 00000000 00000000 00000000 (B8 05 00 00 00 in hex) would be converted into "mov eax, 5".
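
A quick way to convince yourself that binary, hex and decimal are just different spellings of the same number is to let Python do the conversions. This is only a sketch; the helper name three_notations is made up for the example.

```python
def three_notations(n: int) -> tuple:
    """Return the same number written in binary, hexadecimal, and decimal."""
    return format(n, "b"), format(n, "X"), str(n)

# One byte, written three ways
print(three_notations(0b11001100))

# Parsing goes the other way: both strings name the same value
same = int("11001100", 2) == int("CC", 16)
print(same)

# Counting in hex: after 19 comes 1A, not 2A
counting = [format(n, "X") for n in range(0x18, 0x1D)]
print(counting)
```

Each hex digit stands for exactly four binary digits, which is why hex editors and disassemblers prefer it over decimal.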

    Especially assembly languages.

    In some cases there might be. The reason it takes so much code in assembly to do something is because each instruction you write can only perform a very tiny operation.

    For example, in a higher level language such as C, I could write a loop to do something 10 times in a row like this:

    for (int i = 0; i < 10; ++i) <Do Something>

    In assembly it would look like this:

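    mov cx, 0          ; i = 0
    loop_start:
    cmp cx, 10         ; compare i with 10
    jz loop_end        ; leave the loop once i == 10
    push cx            ; save the counter in case the body changes cx
    loop_body:
    <Do Something>
    pop cx             ; restore the counter
    inc cx             ; ++i
    jmp loop_start     ; go back and test again
    loop_end: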

    That's called emulation and at the very least it requires you to know the machine code of the processor that you are emulating.

    Yep, this is definitely possible and it takes a lot of work. Vendors such as VMware have been providing PC / OS emulators for years.

    Generally knowing a register's content is not useful. Most people who are doing what you are describing want to discover how to cheat video games by altering memory locations that control various aspects of the game. There are many tools such as Quick Memory Editor that will already do this.
     
  15. DNA100 Registered Senior Member

    Messages:
    259
    Thanks.
    How do you make a machine code OS specific?

    If there is a difference in the number of bits, then why does the same .exe code get executed by both of them? Can they both execute the same instructions because of some kind of buffering?



    I know what hexadecimal is. It should be easy to convert any binary code instantaneously to an equivalent hexadecimal code.
    For example, instead of remembering 11001100 it should be much easier to remember CC.

    What I meant to ask was - Is the hex editor just showing the hexadecimal version of the binary machine code? Or is it something else?


    Machine language is much shorter? Then shouldn't it be easier?



    I see.

    But that's still only 10 lines! Not 1000!


    Why do I have to know the entire machine language? Shouldn't it be enough to know which is the beginning of an instruction, and which is the end? I can experiment with the instructions.



    I have used VMware's OS emulators. Never knew it also had PC emulator.
    Well, rather than cheat a video game I want to reverse engineer the whole thing, or at least understand the basic programming structure to understand what it is doing to my computer.

    Well, if possible, I want to visualize the processing of an instruction. Can an emulator give visual details?

    What is a quick memory editor? Something comparable to a hex editor?
     
  16. Crunchy Cat F-in' *meow* baby!!! Valued Senior Member

    Messages:
    8,423
    Two ways. One is to organize the code and data in a specific format so that only the OS knows how to get everything running and monitor it properly:

    http://msdn.microsoft.com/en-us/windows/hardware/gg463119.aspx

    The other is to ensure the program relies on code exclusively provided by the OS (ex. the Win32 API).

    64-bit Windows has a 32-bit emulation layer so you can run 32-bit programs on it. The reverse is not true. 64-bit programs cannot run on 32-bit Windows.

    I prefer base 10 myself.

    It's showing a hex representation of both binary machine code and binary data. Which is which is anyone's guess.
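
What a hex editor does can be sketched in a few lines of Python. The function hex_dump below is a hypothetical illustration, not the code of any particular editor: it prints an offset, the bytes in hex, and the same bytes as text where they happen to be printable.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes the way a hex editor shows them: offset, hex, text."""
    pad = width * 3 - 1  # room for a full row of "XX XX XX ..." columns
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part.ljust(pad)}  {text_part}")
    return "\n".join(lines)

# Six bytes of x86 machine code followed by five bytes of text data
sample = bytes.fromhex("B8 05 00 00 00 C3") + b"Hello"
print(hex_dump(sample))
```

Nothing in the output marks where the code ends and the data begins; the editor treats every byte the same way.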

    I wouldn't call it *much shorter*. It's certainly not easier because you would be looking at numbers without a textual language context.

    It's a very simple example of the difference (and it's not just 10 lines, it's 10 times as much code). A slightly more complex example would be:

    printf("Hello World");

    and it would resolve to hundreds of assembly statements. An even more complex example:

    ::TextOut(hdc, 0, 0, _T("Hello World"), 11);

    would resolve to thousands of assembly statements.

    You would have to know all the machine language involved because the context of our discussion in this part of the post is regarding simulating an instruction arbitrarily. Simulating means taking an instruction and then performing an equivalent action. Without knowing what the instruction does, you have no way to take an equivalent action. If you really meant "Can I execute code arbitrarily" then the answer is yes. You move the instruction pointer to the memory location of where your code is stored. But that isn't simulation... that is just executing code (albeit dynamically).
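
To show what "taking an instruction and then performing an equivalent action" means, here is a toy simulator in Python. The instruction set is invented for this example (tuples such as ("mov", "cx", 0)) and is not real x86 machine code; the point is only that every instruction needs its own handler, and anything without one cannot be simulated.

```python
def run(program, max_steps=1000):
    """Simulate a list of instruction tuples; return (registers, trace)."""
    regs = {"ax": 0, "cx": 0}
    zero_flag = False
    ip = 0  # instruction pointer: index of the next instruction
    trace = []
    for _ in range(max_steps):
        if ip >= len(program):
            break
        op, *args = program[ip]
        ip += 1
        if op == "mov":      # mov reg, constant
            regs[args[0]] = args[1]
        elif op == "add":    # add reg, reg
            regs[args[0]] += regs[args[1]]
        elif op == "inc":    # inc reg
            regs[args[0]] += 1
        elif op == "cmp":    # cmp reg, constant: sets the zero flag on equality
            zero_flag = regs[args[0]] == args[1]
        elif op == "jz":     # jump if the last comparison was equal
            if zero_flag:
                ip = args[0]
        elif op == "jmp":    # unconditional jump
            ip = args[0]
        else:
            # Without knowing what an instruction does, there is nothing to simulate
            raise ValueError(f"unknown instruction: {op!r}")
        trace.append((op, dict(regs)))  # snapshot of the registers after each step
    return regs, trace

# A loop that runs 10 times and sums the counter values 0..9 into ax
PROGRAM = [
    ("mov", "cx", 0),     # 0
    ("mov", "ax", 0),     # 1
    ("cmp", "cx", 10),    # 2: loop_start
    ("jz", 7),            # 3: jump past the end when cx == 10
    ("add", "ax", "cx"),  # 4: loop body
    ("inc", "cx"),        # 5
    ("jmp", 2),           # 6
]

final_regs, steps = run(PROGRAM)
print(final_regs)
print(len(steps), "instructions executed")
```

The trace list holds the register contents after every step, which is the "see the memory changes line by line" view being asked about. A real x86 emulator does the same thing with hundreds of handlers instead of six.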

    You want to reverse engineer a video game? If the video game is a native .exe and any more complex than something like a DOS text-based Tic-Tac-Toe game then you don't stand a chance in hell of understanding it with assembly language (unless you happen to be some kind of savant with x86 assembly). If the video game is in Java or .NET and is not obfuscated then those particular languages can be decompiled very easily into Java and any .NET compatible language respectively; however, I don't know of any PC games written in Java or .NET.

    If you are just trying to figure out how to write video games, there are quite a few books on the market that could teach you.

    In theory it could; however, I don't know of any emulators that have such a visualization feature. You *might* be able to step through its assembly code line by line using Visual Studio:

    http://msdn.microsoft.com/en-us/library/0bxe8ytt.aspx

    Not "A quick memory editor"... it's "Quick Memory Editor" (i.e. an actual product name). It's "...a powerful game cheating tool that can modify game data in memory easily. With Quick Memory Editor, you can search for any value in your game and change or lock it to the value you want to..."
     
  17. DNA100 Registered Senior Member

    Messages:
    259
    Thanks a lot again.
    Sorry, I have been bugging you with a lot of questions, so this will be the last bunch for the next one or two weeks.




    Can you explain the "specific format" thing? It's all 1s and 0s, so what kind of format are you talking about? Are you talking about a binary code that only the OS can translate into the actual op code?

    OK, I understand the API part. But how exactly does control get transferred to the OS at the bit level/machine level? Can you explain with a small example (if there is one)?

    What I am trying to do is to get a relatively concrete picture in terms of machine parts, rather than just getting the basic notions/overviews.




    What exactly makes a processor 32/64 bit?
    Size of the address bus?
    size of the registers?
    Size of the EDB?
    And based on that, what exactly makes an OS 32 bit?

    I understand that.
    But it is much easier to convert binary to hexadecimal and vice versa compared to 'binary to decimal'.




    I am having a hard time understanding why it takes thousands of lines. I can understand a for loop taking 10 lines, but 1000 lines for a print line!
    My guess is that most of those lines are used in communicating with the monitor and setting it up so that it can show a command line interface and print on a specific location. But that is only troublesome if you are doing it the first time. The next time you can copy and paste.
    Is my guess correct?



    I don't understand why I need to know ALL of the language.
    (My basic intentions are written after the next quote. Read it and come back here.)
    Isn't the very purpose of simulation to produce the effects of execution without actually executing for real? If so, I can start simulating the instructions one by one and see the effects. Once I get to the part that I am really interested in (I can see the effects after each clock pulse), I can trace back the memory used and the relevant instructions for that part. Then, if I don't know the meaning of a particular instruction, I can look it up. After that, I can modify the instruction. But I primarily have to know the basic instructions. Am I wrong?


    Isn't a basic processor instruction only about 16 bits in length? (That's only 4 hex chars.) So an instruction like "the next instruction is an integer value, put that into register X" is 16/32 bits. Then the next instruction is "the value is 6". Another 16/32 bits. That way, fairly complex operations should be understandable, if you can follow them from the start.

    The primary instructions like "put it into that register" or "add the value of those two registers and put it into that register" or "fetch the next instruction from that memory address" should be very limited in number. So if I can classify those into groups, it should become a lot easier to get the basic program overview. (I don't need to know full details.)
    For example, if the program is demanding input from the keyboard and then using that input to run some kind of checking operation, I can easily figure out some kind of hidden password. If the program is trying to communicate with my network cards I can figure that it has something to do with the internet. All I have to know are the basic instructions that ask for keyboard input or network card access.

    Also, if I can keep track of the program counter memory pointers, then I can gain a fairly good idea about the flow of control over a brief section of the code.


    Not exactly reverse engineer the entire video game from scratch. I don't want to understand the ENTIRE DETAILS.
    OK, let me state clearly what I am trying to do.
    I want to be able to take that video game installer (and not just a video game, any software) and be able to divide it into modules (like this part of the code is for graphics and display, this part is for saving and storing scores, this part is for updates, this part is for input from the mouse, this part is for verifying username/password etc.) and then zoom in to make selective changes to the part that I desire. For example:
    1>Change a random red object to a blue one.
    2>Change the background music.
    3>Bypass the demo time limit by eliminating the serial-key/time verification part.
    4>Stop it from accessing the internet for updates and stuff.
    5>Change the scoring system.
    6>Stop it from creating unwanted modifications on my hard drive.
    etc

    I am sure such selective modifications can be done. For example, people here commented how crack makers can figure out the code associated with key verification.

    What is "obfuscated"?
    Decompiled back to Java because it uses bytecodes?
    Can't an assembly language program be decompiled into any language of choice? I don't need it "nicely optimized and organized". Just decompile it back to rough and funky looking code (with maybe lots of goto statements) in any language of my choice. Can't I do that?

    I see.
    I already have Visual Studio installed in my computer.
    I installed that while trying to learn VB.
    Now what do I do? There are lots of features I see. But all I know is to do some basic programming.
    Which part of it should I go to in order to disassemble a .exe? And which part do I use as an emulator on the .exe?

    Are there other good FREE emulators? I want to try other free stuff.



    You said a hex editor can't distinguish between data and code.
    Quick Memory Editor can? How?
    What does a memory editor do differently from a hex editor?

    Are there other Free memory editors?
     
  18. Crunchy Cat F-in' *meow* baby!!! Valued Senior Member

    Messages:
    8,423
    It's like a book. Books have a preface, table of contents, chapters, etc. The link will outline the exact format of .exe files.
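
To give a flavor of that "table of contents", here is a rough Python sketch that reads the first few fields of the format: the MZ signature at the start of the file, the offset stored at byte 0x3C that points to the PE signature, and the two-byte machine field right after it, which records the processor type the file was built for. The helper names pe_machine and fake_exe are invented for this example, and fake_exe builds only a synthetic header for the demonstration, not a runnable program.

```python
import struct

# Two well-known values of the machine field in the PE/COFF header
MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}

def pe_machine(data: bytes) -> str:
    """Report which processor type the bytes of an .exe claim to target."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # The 4 bytes at offset 0x3C hold the file offset of the PE signature
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if len(data) < pe_offset + 6 or data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The machine field is the first 2 bytes after the signature
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_NAMES.get(machine, f"unknown (0x{machine:04X})")

def fake_exe(machine: int) -> bytes:
    """Build a synthetic header for demonstration (not a runnable program)."""
    header = bytearray(0x40)
    header[0:2] = b"MZ"
    struct.pack_into("<I", header, 0x3C, 0x40)  # PE signature follows directly
    return bytes(header) + b"PE\0\0" + struct.pack("<H", machine)

print(pe_machine(fake_exe(0x014C)))
print(pe_machine(fake_exe(0x8664)))
```

To try it on a real file, read the file in binary mode and pass the bytes to pe_machine. This field is also how the loader can tell a 32-bit program from a 64-bit one.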

    The most common way is to load a DLL that the operating system provides and invoke a function within the DLL. Example:

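    ; Data: the DLL name and two variables for the returned handles
    Sh32Name db 'shell32.dll',0
    hSh32dll dd ?
    hIconImg dd ?

    ; Ask the OS for a handle to shell32.dll; the result comes back in eax
    INVOKE GetModuleHandle,
    OFFSET Sh32Name
    mov hSh32dll,eax
    ; Call the OS function LoadImage for a 96x96 icon; the handle comes back in eax
    INVOKE LoadImage,
    hSh32dll,
    4,
    IMAGE_ICON,
    96,
    96,
    LR_LOADFROMFILE or LR_SHARED
    mov hIconImg,eax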

    Studying computer science in a college environment will get you there.

    http://en.wikipedia.org/wiki/64-bit

    The operating system is built to make use of a 64-bit processor.

    *easier* is subjective.

    You are having a hard time understanding it because you don't know the Windows architecture internals for blitting text to a window. It's a huge process. After you have some code you like, you would typically make a function out of it so it can be called multiple times (rather than just pasting code).

    You are absolutely wrong if you are the creator of your own simulator. The reason is that you cannot simulate the effects of executing something if you don't know what that something is supposed to do.

    Correct.

    The only item that would really be realistic is hacking the serial/demo verification. The rest will likely be ridiculously difficult.


    Yep they can be done.

    Obfuscated code is source or machine code that has intentionally been made difficult for humans to understand. A native binary program can only be reliably disassembled into assembly (not decompiled into any language of your choosing).

    The instructions at the link amount to just opening the binary and stepping through it line by line. Try it.

     