How do you find the code of an .exe?

Discussion in 'Computer Science & Culture' started by DNA100, Oct 28, 2011.

Thread Status:
Not open for further replies.
  1. DNA100 Registered Senior Member

    Messages:
    259
    Let's say there is an .exe file. How does one find the code of that software?

    I mean, there must be some way; otherwise how does the OS know how to execute it, and how do so many people create cracks and (illegal) copies of protected files?
     
  3. MacGyver1968 Fixin' Shit that Ain't Broke Valued Senior Member

    Messages:
    7,028
    That would be the source code... most people don't release that. The source code is written in whatever programming language and then run through a compiler, which turns it into the executable file, which is basically just machine language.
     
  5. river-wind Valued Senior Member

    Messages:
    2,671
    How to make a computer program and then retrieve the code again

    Step 1: write logic in a high level language like C++, C#, Java, etc.

    Step 2: compile that code from the pseudo-English it was programmed in into something the computer can execute
    Step 2a: machine code - actual instructions for moving data values from one place in memory to another, adding or multiplying them, etc. C++ compiles into this. Programs need to be compiled for each CPU architecture (x86, x86-64, POWER, ARM, etc.), as the machine code for each CPU is different.
    Step 2b: bytecode, something in between a high-level language and machine code. This is then translated into machine code when you run the program (C# and Java do it this way). Execution is a bit slower, but you don't have to compile the code for each CPU platform; any system with an interpreter for that bytecode can run your compiled program. So you write Java, compile it into bytecode, and run it on any CPU with a Java runtime environment.

    Step 3: to view the code from a compiled .exe or other executable application, you either need to get access to the original high-level code from the developer, or you need to try to decompile it, undoing step 2. This would, in theory, take the machine code or bytecode and turn it back into human-readable C++, C#, Java or whatever else was used in step 1.

    However, when step 2 turns English-like stuff into machine-readable stuff, it also does lots of optimization: cleaning up unneeded logic duplication, speeding up certain areas of code for certain hardware designs, etc. These optimizations are not always reversible, so you often can't effectively decompile the machine code or bytecode back into your original program.
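    Python is used here only as a convenient stand-in: like Java and C#, it compiles source code to bytecode (step 2b). A minimal sketch with the standard `dis` module, assuming CPython 3.8 or later, shows both the compiled bytecode and one small optimization that cannot be undone (constant folding). The exact opcodes printed vary between Python versions.

```python
import dis

source = "seconds_per_day = 24 * 60 * 60"
code = compile(source, "<example>", "exec")  # step 2b: source -> bytecode

# The bytecode itself is just a string of bytes...
print(code.co_code.hex(" "))

# ...which dis decodes into named instructions, much like a disassembler.
for ins in dis.get_instructions(code):
    print(f"{ins.offset:3} {ins.opname:<14} {ins.argrepr}")

# The integers the bytecode actually loads: the compiler folded 24 * 60 * 60
# into a single constant, so the original factors are no longer there.
loaded = [ins.argval for ins in dis.get_instructions(code)
          if ins.opname.startswith("LOAD_") and isinstance(ins.argval, int)]
print(loaded)
```

    A decompiler working from this bytecode can recover `seconds_per_day = 86400`, but the fact that it was written as hours times minutes times seconds is gone.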

    So, the best way to get access to the code of an .exe is to ask the programmer for it. That's what open source software is: software whose code is available for you to look at and modify.
     
  7. Stryder Keeper of "good" ideas. Valued Senior Member

    Messages:
    13,105
    In successful programming projects, documentation is a key factor. This means that when people write the human-readable program code, they also leave a lot of comments ("hashed out" lines that the compiler ignores), which are meant as notes to the programmer and to any other programmers who work on it.

    Those notes are stripped away when the program is compiled, so the documentation that helped during development is gone from the executable.

    Reverse engineering a program from its compiled state is made much harder by the absence of such notes.
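    The same stripping is easy to see with Python bytecode standing in for machine code. In this sketch, `check` and its comment are made up; `marshal.dumps` serializes the compiled code object, roughly what gets written to a `.pyc` file. The comment disappears, while the docstring and the function name survive as data, which is one reason names sometimes leak into compiled programs.

```python
import marshal

source = '''
# NOTE: 42 is the magic key the license server expects
def check(key):
    """Return True when the key is valid."""
    return key == 42
'''

# optimize=0 keeps docstrings even if Python was started with -OO.
compiled = marshal.dumps(compile(source, "<example>", "exec", optimize=0))

print(b"magic key" in compiled)         # the comment never reaches the bytecode
print(b"Return True when" in compiled)  # the docstring is stored as a constant
print(b"check" in compiled)             # so is the function name
```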

    Incidentally, .exe files can be hacked about using a hex editor, but reverse engineering the actual operations requires capturing those details while the program runs normally; this is often done with kernel debuggers or emulators.
     
  8. CptBork Valued Senior Member

    Messages:
    6,460
    A) A lot of the required info gets released by disgruntled/indifferent industry insiders... i.e. Adobe fires some guy who knows how Adobe verifies that your software is legit, well guess what that can potentially lead to?

    B) The .exe can be read and executed, but it comes with no explanations as to why it does all the calculations it does... The reverse-engineer has to come along and figure out "Aha! These calculations are typically performed by a program that's used to paint stuff, and this code here probably refers to a specific brush type." Terribly difficult to do without access to the original programmers and their source code, if not altogether impossible because of the complexity involved. Nonetheless, sometimes it can be done, like if you've already isolated the portion of a program responsible for checking copy protection after running it on a test machine and watching what it does.

    C) Crappy copy protection is often used, including recycling existing methods that have already been cracked to death elsewhere.
     
  9. DNA100 Registered Senior Member

    Messages:
    259
    (Thanks for the replies)
    Wow, that's a lot of complex stuff.
    So, let's see if I've got this:
    To reverse engineer a piece of software, you need to get the compiled form of the executable file. From there, you will need to decompile it, which is hard work. To decompile properly, it is helpful to get snapshots of the compilation steps during runtime. This is done using kernel debuggers and emulators.

    OK, so
    1> How do you retrieve the compiled form of the code? Where do you find it?

    2> A hex editor can hack .exe files? Which means it can be used to bypass authorization?

    3> But why can't I access the original code of the .exe? I mean, the compiler obviously has access to it, or else it would not be able to compile it!

    4> Kernel debuggers and emulators: do they show compilation steps at runtime, or do they decompile?
     
  10. CptBork Valued Senior Member

    Messages:
    6,460
    Without that, there's nothing to reverse-engineer.

    No, it's just a mess of machine-language instructions telling the computer to perform various operations and calculations in its memory banks. What you've got to do is figure out what the operations and calculations are for, why and how they're being done.

    No, you use debuggers and emulators to watch what the program does or tries to do, and this can give you clues to what various aspects of the machine code are for.

    Compiled code is what you get after making your purchase. What you're thinking of is source code, which is written in a much simpler and readable language along with comments on what it does, and the original programmers then run it through a compiler to produce machine code for runtime. To get the original source code, you normally get it from the company that made the program, assuming it's available.

    Hex editors can make arbitrary modifications to the .exe, but first you gotta figure out what the relevant part of the machine code is and what it's meant to do. Not at all simple, often practically impossible to do it just via this approach.

    It's compiled on another computer. When you buy the software, you normally get something which has already been compiled (translated, optimized) into machine code, and doesn't come with any of the original source code.

    Neither, they just track what the program is doing to the blocks of memory to which it's given access as it runs, since it would usually take far too long to figure all this stuff out just by looking directly at the machine code.
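    Native debuggers such as GDB or WinDbg watch machine code and memory, which is hard to show in a few lines. As a rough high-level analogue, this sketch uses Python's `sys.settrace` hook to record every Python function call while a program runs; `main` and `check_serial` are hypothetical stand-ins for a program and its copy-protection routine.

```python
import sys

calls = []

def tracer(frame, event, arg):
    # The interpreter calls this whenever a new Python frame starts.
    if event == "call":
        calls.append(frame.f_code.co_name)
    return None  # no line-by-line tracing inside each function

def check_serial(serial):  # hypothetical copy-protection routine
    return serial.endswith("-OK")

def main():  # hypothetical program
    return "unlocked" if check_serial("ABCD-OK") else "trial mode"

previous = sys.gettrace()
sys.settrace(tracer)
try:
    result = main()
finally:
    sys.settrace(previous)  # put back whatever tracer was active before

print(result, calls)
```

    The trace shows `main()` handing the serial to `check_serial()`, and the result deciding between "unlocked" and "trial mode": exactly the kind of clue that points a reverse engineer at the part of the program worth studying.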
     
  11. MacGyver1968 Fixin' Shit that Ain't Broke Valued Senior Member

    Messages:
    7,028
    I don't think crackers use hex editors to get past copy protection or such. I've used a hex editor to modify save files. Say...my character had 1439 gold pieces...I would just search the file for the hex equivalent of 1439 (059F) then change that value to something higher.
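    A minimal sketch of that save-file edit, assuming a made-up save format that stores gold as a 16-bit little-endian integer (the usual byte order on x86 PCs, so 0x059F appears in the file as the bytes `9F 05`). It refuses to patch when the value is missing or appears more than once, since then you can't tell which bytes are the gold.

```python
import struct

def patch_value(data: bytes, old: int, new: int) -> bytes:
    """Replace the single 16-bit little-endian occurrence of `old` with `new`."""
    old_bytes = struct.pack("<H", old)  # 1439 -> b'\x9f\x05'
    new_bytes = struct.pack("<H", new)
    hits = data.count(old_bytes)
    if hits != 1:
        # Zero or several matches: ambiguous, so don't guess.
        raise ValueError(f"expected exactly one match, found {hits}")
    return data.replace(old_bytes, new_bytes)

# Made-up save file: a 4-byte "SAVE" header, gold as 16 bits, 2 padding bytes.
save = b"SAVE" + struct.pack("<H", 1439) + b"\x00\x00"
patched = patch_value(save, 1439, 9999)
print(save.hex(" "), "->", patched.hex(" "))
```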
     
  12. Emil Valued Senior Member

    Messages:
    2,801
    It's complicated.

    I don't know much about reverse engineering; in any case, you need to know which language the program was compiled from.
    Yes, but it is very hard to write directly in hex.
    A "compiler" is used even for machine language: assembly languages.
    If I were making the program, after compilation I would put some instructions directly in hex in several places, so that it could not be reverse-engineered.

    I don't know.
     
  13. DNA100 Registered Senior Member

    Messages:
    259

    Good idea! Then I will use the hex editor to set new world record high scores in games.

    Except I don't know how to use a hex editor or which file to modify.
    (I understand hexadecimal, but that is not enough)

    OK, I understand this part. Thanks!
    ______________________________________

    No, I think you misunderstand my position.
    I understand that you need to write a code in a higher level language.(source code)
    Then you need to compile it to machine readable language before it is executable by the machine.

    I thought .exe files needed further compiling. But I understand now that they are already in fully compiled form, written in machine language of 1s and 0s.

    What I meant was how do you find this already compiled machine level code?
    I mean you only find an icon for the .exe file on the computer. You can't see the code.
    Do you use the hex editor for that? To see this code?

    Or forget about .exe for now: say I need the binary form, or the raw data, of an .mp3 file so that I can modify the sounds and create a different .mp3 from it. How do I get access to the raw data included in the .mp3 file?



    OK. Understood!

    How do you "watch" this action? Can you give an idea of what it is like to use debuggers and emulators?

    So basically, to decompile means to get an idea of what the machine code is doing and then write the program from scratch. There is no decompiling software as such. And if so, I suspect it could even be done in a completely different programming language from the original source code?
     
    Last edited: Oct 30, 2011
  14. Rhaedas Valued Senior Member

    Messages:
    1,516
    Unless they know what they're doing and encrypt the save file. Then you'd have to go back to reverse engineering the main program to figure out what encryption method was used.

    I remember way back in the day, only a few days after it came out, the complete map of Ultima Online had been pulled off the CD and posted everywhere.
     
  15. Chipz Banned Banned

    Messages:
    838
    Not entirely true. Vtable and function names are often stored in the binary output, albeit mangled. Debuggers can show the function calls at runtime; GDB can even give me the function names demangled. Heck, you can open up most ELF files (and I assume other formats as well) in Vim and read the constants out of the data section word for word. There are also several basic Unix tools you can use (like objdump) to get a description of the file. If it's a shared library, you might be able to recreate the appropriate headers and call into it.
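    Pulling readable text out of a binary is simple enough to sketch. This is a rough Python version of what the Unix `strings` tool does; the byte blob is made up, imitating an ELF file that contains a mangled C++ name (`_ZN7License5checkEv`, which `c++filt` would demangle to `License::check()`) and an error message.

```python
import re

def printable_strings(data: bytes, min_len: int = 4):
    """Yield runs of at least min_len printable ASCII bytes, like Unix `strings`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")

# Made-up bytes imitating part of an ELF file.
blob = b"\x7fELF\x02\x01\x00\x00_ZN7License5checkEv\x00\x00Invalid serial number\x00\xff\x10"
print(list(printable_strings(blob)))
```

    Note that "ELF" itself is skipped: it is only three printable bytes, below the default minimum of four, which is also why real `strings` output stays readable.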
     
  16. Fuse26 011 Banned

    Messages:
    54
    You could try opening it in notepad.

     
  17. Stryder Keeper of "good" ideas. Valued Senior Member

    Messages:
    13,105
    There are ways to detect debuggers, and they can be programmed in; however, in the old days the main problem was that you were dealing with limited resources on the computer, so all the extra code would cause overhead. Nowadays (mostly thanks to Moore's Law) resources have increased exponentially, so that extra code can easily be accommodated.
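    For native Windows programs the usual starting point is the Win32 call `IsDebuggerPresent`; the same idea is easiest to show in Python, where classic debuggers such as `pdb` work by installing a trace function. A deliberately crude sketch: profilers and coverage tools can trigger it too, and a determined reverse engineer can simply patch the check out.

```python
import sys

def debugger_attached() -> bool:
    """Crude check: classic Python debuggers install a trace function."""
    return sys.gettrace() is not None

if debugger_attached():
    print("debugger detected")  # a protected program might now change its behaviour
else:
    print("running normally")
```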

    This is why companies started using various "third-party" add-ons for copy protection, for instance the infamous StarForce copy protection.

    Third-party inclusions work like this: any software company that uses them "licenses" the security vendor's software for inclusion in its own product.

    The company that produces that security add-on can concentrate on just dealing with that particular form of security, allowing software companies to not get bogged down in trying to cover too much generalisation when they only want to produce one particular piece of software.

    The problem is with "all the Eggs in one basket", namely if you have 30 companies all using the same third-party company, you only need one successful exploit to undermine the protection of all those companies (If of course the core modules are following the exact same cryptology practices)

    So from a project management viewpoint, it's a toss-up between including third-party security for a negligible fee, knowing that one exploit could affect a lot of companies, and the expense of running your own in-house team to develop the security software (who might not have as much experience in the field).

    One viewpoint held by some programmers was actually doing something similar to how viruses can malform code, in the sense that if someone attempts to reverse engineer the software, the software detects that and then starts randomly disabling features of the program until it's completely dysfunctional.

    The only argument against that form of protection was "What if a virus is made to give a false positive of a reverse engineer attempt?", the result would be that a program wouldn't just suffer a viral infection but also the damage inflicted by the anti-reverse engineering code. So for the most part programmers have steered clear of this method.

    There has been an increase in the use of "phone home" software that can identify alterations. Companies that use these methods usually also include "online support", in the sense that some software or functions can only be used when online, which partially forces you into going through their checks.
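    One common building block for identifying alterations is a cryptographic hash of the program's bytes: change a single byte and the hash changes completely. A minimal sketch, assuming the vendor's server already knows the hash of the genuine file; the "program" bytes are made up (a tiny x86 function returning 42), and a real scheme also has to stop a cracker from simply patching out the check.

```python
import hashlib
import hmac

def fingerprint(data: bytes) -> str:
    """SHA-256 of the program bytes; any patched byte changes it completely."""
    return hashlib.sha256(data).hexdigest()

def looks_untampered(data: bytes, expected_hex: str) -> bool:
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(fingerprint(data), expected_hex)

# push ebp / mov ebp, esp / mov eax, 42 / pop ebp / ret
original = bytes.fromhex("5589e5b82a0000005dc3")
expected = fingerprint(original)                # what the server would store
cracked = original.replace(b"\x2a", b"\x01")    # one patched byte: 42 -> 1

print(looks_untampered(original, expected), looks_untampered(cracked, expected))
```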
     
  18. DNA100 Registered Senior Member

    Messages:
    259
    Hey Stryder, since you seem to know these things, and since CptBork seems to have gone somewhere, can you please answer some of the questions I was asking?

    1> First of all, an .exe file is already-compiled machine-level code, right?
    Now if it is machine level, then why do we need different .exe files for different OSes to do the same thing? Shouldn't machine-level code be independent of any other kind of software, including the OS?

    2> How do we actually read or get access to that machine-level code? Do we do that using a hex editor?

    3> Also, about the .mp3 file question: just like .exe files, there must be binary data associated with it, right? And there must also be a way to access and manipulate it; that must be how MP3 converters, editors, splitters, joiners, etc. work. So, how do you gain access to the binary data?

    Do you also do that using a hex editor? (Yes/No?)

    4> If the answer to both 2 and 3 is a hex editor, can you give a brief idea of how to use one? (If it's complicated, give a link.)

    5> I am also very curious about the emulator and debugger things that you described. Are those two the same thing?
    I want to be able to visualize the process of emulating/debugging an .exe and understand in action how it can be used to reverse engineer or crack the software.
    Perhaps you can provide a link to a good educational video about using them, or a website/ebook that describes this with examples and images?
     
  19. przyk squishy Valued Senior Member

    Messages:
    3,203
    Mostly. It contains the compiled machine code, but it also contains some technical information in a header intended for the OS. I don't know about the Windows EXE format, but other executable formats (such as ELF on Linux) can also optionally store some symbolic information such as function/subroutine names.
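    The header is what lets the OS (and you) tell executable formats apart. A minimal sketch that checks only the well-known magic numbers: ELF files begin with the bytes `7F 45 4C 46` ("\x7fELF"), while Windows .exe files begin with an old DOS "MZ" header whose 4-byte field at offset 0x3C gives the position of the real "PE\0\0" header.

```python
import struct
import sys

def executable_format(data: bytes) -> str:
    """Guess an executable's format from its first bytes (magic numbers only)."""
    if data[:4] == b"\x7fELF":
        return "ELF"
    if data[:2] == b"MZ" and len(data) >= 0x40:
        # The field at 0x3C holds the file offset of the PE signature.
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            return "PE (Windows)"
        return "MZ (plain DOS)"
    return "unknown"

# Look at the Python interpreter itself: ELF on Linux, PE on Windows,
# "unknown" on macOS, whose Mach-O format this sketch doesn't check.
if sys.executable:
    with open(sys.executable, "rb") as f:
        print(executable_format(f.read(4096)))
```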

    No. The reason for this is that modern operating systems run applications in a protected environment which limits what they can do on their own. Pretty much all an application can do on its own is do calculations and read/write memory specifically allocated to it. If a program needs access to other resources on the computer (eg. more memory, read/write access to a file, networking, information from an input device, or access to an output device), it needs to do it through the OS, and different OSs have different interfaces. Also most programs won't do everything by themselves but will leverage available libraries, and the libraries available on different OSs aren't all the same or implemented in the same way.

    You can also disassemble it.
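    A disassembler turns machine-code bytes back into assembly-language mnemonics, one instruction at a time. Real tools such as `objdump -d` know the whole instruction set; this toy sketch decodes only three 32-bit x86 instructions (`nop`, `ret`, and `mov` of a 32-bit constant into a register) to show the idea.

```python
import struct

REGISTERS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def disassemble(code: bytes) -> list[str]:
    """Decode a tiny subset of 32-bit x86 machine code into mnemonics."""
    out, i = [], 0
    while i < len(code):
        opcode = code[i]
        if opcode == 0x90:
            out.append("nop")
            i += 1
        elif opcode == 0xC3:
            out.append("ret")
            i += 1
        elif 0xB8 <= opcode <= 0xBF:
            # B8+r: mov r32, imm32 (the constant follows, little-endian)
            (imm,) = struct.unpack_from("<I", code, i + 1)
            out.append(f"mov {REGISTERS[opcode - 0xB8]}, {imm:#x}")
            i += 5
        else:
            out.append(f"db {opcode:#04x}")  # a byte this toy doesn't understand
            i += 1
    return out

print(disassemble(bytes.fromhex("b82a00000090c3")))
```

    The bytes `b8 2a 00 00 00 90 c3` come out as `mov eax, 0x2a`, `nop`, `ret`: correct instructions, but nothing says *why* the program wants 42 in `eax`. Working out that "why" is the hard part.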

    You can edit *anything* with a hex editor. But for standardised data formats (MP3, JPEG, etc.) there are typically both many applications that can already edit/manipulate these formats in various ways, and program libraries for handling them if you're a programmer. So if you want to manipulate one of these formats in a way that an existing application doesn't cover, you almost certainly want to look up a library for manipulating that file type and learn how to program with it, rather than opening the thing up in a hex editor. I don't know about MP3 specifically since the format is patent-encumbered, but at least with free file formats, open source libraries for manipulating them will usually be readily available.
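    As an example of reading a standardised format directly, many MP3 files start with an ID3v2 tag (title, artist, etc.) in front of the audio frames. Its 10-byte header is "ID3", a major-version byte, a revision byte, a flags byte, and a 4-byte "syncsafe" size that uses only the low 7 bits of each byte. A minimal sketch that reads just that header; the sample bytes are made up, and the compressed audio itself really does call for a library.

```python
def parse_id3v2_header(data: bytes):
    """Read the 10-byte ID3v2 tag header that many MP3 files start with."""
    if len(data) < 10 or data[:3] != b"ID3":
        return None  # no ID3v2 tag; the file may start directly with audio frames
    major, revision, flags = data[3], data[4], data[5]
    size = 0
    for b in data[6:10]:  # syncsafe integer: 7 useful bits per byte
        size = (size << 7) | (b & 0x7F)
    return {"version": f"2.{major}.{revision}", "flags": flags, "tag_size": size}

header = b"ID3\x03\x00\x00\x00\x00\x02\x01"
print(parse_id3v2_header(header))
```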

    Using a hex editor in principle is easy: you just open the file in the editor and change what you want to change. The hard part is learning and understanding the binary format you want to modify, not how to use the editor itself.
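    To make the "open the file in the editor" part concrete, here is a sketch of the offset / hex / ASCII view that most hex editors show. The sample bytes are made up: the start of a DOS/Windows "MZ" header followed by some text.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes like a hex editor: offset, hex bytes, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text}")
    return "\n".join(lines)

sample = b"MZ\x90\x00\x03\x00\x00\x00Hello, hex editor!"
print(hexdump(sample))
```

    Editing is then just overwriting bytes at a given offset; as said above, knowing *which* offset means something is the real work.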
     
  20. river-wind Valued Senior Member

    Messages:
    2,671
    Also, different OS's include pre-written functions for standard things like drawing windows, open and save dialog boxes, handling font display, how to access external USB drives, etc. (they are called APIs - Application Programming Interfaces) Instead of the programmer having to write all this code themselves over and over again for each program, they call these OS functions whenever possible. Since these functions are different for each OS, any program calling one of these functions will only work on that OS.
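    A small Python sketch of one task done through two different OS APIs: getting the current process ID means calling `GetCurrentProcessId` from kernel32.dll on Windows, but `getpid` from the C library on Linux or macOS. Python's own `os.getpid()` hides that difference, much as the Java APIs described next do.

```python
import ctypes
import os
import sys

def pid_from_native_api() -> int:
    """Ask the OS for our process ID through that OS's own native API."""
    if sys.platform == "win32":
        # Windows: GetCurrentProcessId() from the Win32 API in kernel32.dll
        return ctypes.windll.kernel32.GetCurrentProcessId()
    # Linux/macOS: the POSIX getpid() function from the C library,
    # reached through the symbols already loaded into this process
    return ctypes.CDLL(None).getpid()

# os.getpid() is Python's portable wrapper over the same OS calls.
print(pid_from_native_api(), os.getpid())
```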

    Systems like Java avoid this problem by not letting programmers call the OS functions directly; instead, java coders call the Java "Save local file" API, which in turn either calls the Windows, OSX, Linux or other "Save File Dialog Box" API. That's why Java can be written once, then run on 'any' machine (any machine with a Java Runtime Environment).

    Microsoft's C# is in theory similar to Java, with its use of bytecode and a runtime environment. But since MS didn't make any runtime environments for C# bytecode (the CLR) for any OS other than Windows, C# is less freely movable. The Mono project is a third-party effort to build a CLR runtime environment that lets C# bytecode run on systems other than Windows, including Linux, but it doesn't have support from MS.
     
  21. DNA100 Registered Senior Member

    Messages:
    259
    OK, THANKS A LOT.
    You have removed a lot of my confusions.

    Although you left out question 5 - the one about emulators/debuggers.
    Suppose I want to reverse engineer a .exe file and I want to see what memory is called and/or changed by which part of the code - so that I can get a feel of the overall design/structure of the programming done and also understand which part to edit. How do I start? Any good educational video or learning material with examples?

    Can't I figure that out simply by looking at the hex code? Do I even need an emulator/debugger?


    Thanks.
    But I never really understood how Java accomplishes this machine independence.
    It always confuses me.
    There is no way a piece of software can figure out the hardware architecture of the machine from scratch, can it? There is no way it can figure out which OS to communicate with, either.

    So basically, is Java platform independent because it knows some magic, or is it independent because most OSes and hardware are designed specifically for Java?


    And is the JRE OS independent? It is not, right? You have to install different JRE installation files for different OSes.



    But that still proves that Java is OS independent. What about hardware Independence? How do you accomplish that? What kind of thing is bytecode anyway? What about the JVM, how does it do the work?
    Explain please!

     
  22. Emil Valued Senior Member

    Messages:
    2,801
    Like I said, it's complicated.

    A program in directly executable machine code can run under any operating system.
    The requirement is that the microprocessor be compatible with the machine code.
    You can even run it without an operating system.

    If, at startup or on reset, the program counter starts at address 0000 (or some fixed address XXXX), you can put a ROM or EPROM chip at that address containing the program you wrote, and the microprocessor will execute it.
    You can manually pull the memory chip out of its socket, put in another one with a different program, and the microprocessor will run the new program.

    Basically, an operating system does the same thing.
    But remember that you then have only the program you wrote.
    You have no support for the keyboard, mouse, monitor, etc.
    You either have to write that code into your program yourself or call it as subroutines from the operating system, but then you cannot change the OS.
    Your program also has to decide how, and under what conditions, to give control back to the OS.

    Problems that can arise when launching a directly executable program come from jump-type instructions, such as a jump, a subroutine call, or the end of a loop.
    If their target addresses are not parameterized, the program will work only if it is loaded starting at one specific address.
    You can handle this in your program itself, or launch it from an operating system that takes care of it.
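    Parameterizing jump addresses is usually done in one of two ways: position-independent code (jumps relative to the current instruction), or relocation, where the file lists every place that holds an absolute address and the loader patches those places for the actual load address. A sketch of the relocation approach, for a made-up machine with 16-bit little-endian addresses; the opcodes 0x01 (JMP), 0x00 (NOP) and 0xFF (HALT) are hypothetical.

```python
import struct

def relocate(image: bytes, relocations: list[int], old_base: int, new_base: int) -> bytes:
    """Shift every absolute 16-bit address listed in `relocations` by (new_base - old_base)."""
    patched = bytearray(image)
    delta = new_base - old_base
    for offset in relocations:
        (address,) = struct.unpack_from("<H", patched, offset)
        struct.pack_into("<H", patched, offset, (address + delta) & 0xFFFF)
    return bytes(patched)

# Assembled to run at 0x0000: JMP 0x0005, NOP, NOP, HALT (hypothetical opcodes).
program = bytes([0x01, 0x05, 0x00, 0x00, 0x00, 0xFF])
RELOCATIONS = [1]  # byte offset of the JMP target

# Loaded at 0x8000 instead, the jump must now go to 0x8005.
print(relocate(program, RELOCATIONS, 0x0000, 0x8000).hex(" "))
```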

    So if you call subroutines from the OS, you cannot change the OS.
    Otherwise you can change the OS, provided the microprocessor accepts the instructions.
     
  23. Chipz Banned Banned

    Messages:
    838
    Let me explain Java:
    Java source is the same on every computer, and it's compiled into something called byte code. That byte code is interpreted differently on each operating system; that's the job of the Java Runtime. One function's byte code will translate to different machine code for different operating systems.

    That is...Byte code is platform independent. Runtime environments are translators of that byte code, and they must be developed for a specific platform.

    Java Syntax [platform independent] --> Java Byte Code [platform independent] --> Java Runtime [platform dependent]

    If people haven't developed a runtime environment to translate your byte code... Java will not work.
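    To make "byte code" less abstract: it is a compact list of simple instructions for an imaginary machine, and the runtime is a program that carries them out on the real one. This sketch invents a four-instruction stack machine. The real JVM is also stack-based, with instructions such as `iadd` and `imul`, but its bytecode is binary and far richer, and modern JVMs also compile frequently used code to native machine code.

```python
def run(bytecode):
    """Toy interpreter: the 'runtime' for a hypothetical stack-machine bytecode."""
    stack, pc, output = [], 0, []
    while pc < len(bytecode):
        op = bytecode[pc]
        if op == "PUSH":
            stack.append(bytecode[pc + 1])  # the operand follows the opcode
            pc += 2
            continue
        if op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op == "PRINT":
            output.append(stack.pop())
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
    return output

# (2 + 3) * 7, "compiled" once into platform-independent bytecode.
program = ["PUSH", 2, "PUSH", 3, "ADD", "PUSH", 7, "MUL", "PRINT"]
print(run(program))
```

    The `program` list is the platform-independent part; `run` is the part that has to exist on each platform. A real JRE plays the role of `run`, written and compiled separately for Windows, Linux, Mac OS and so on.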

    It's like this:
    There's a man speaking Spanish, and he needs to be understood in Japanese, Mandarin, and Russian. All of those countries have an English interpreter. So someone translates the Spanish into English... then each interpreter translates that into their own language.
     