Perfect Compression


kmguru

Staff member
WASHINGTON -- Physicists do not question the laws of thermodynamics. Chemistry researchers unwaveringly cite Boyle's Law to describe the relationship between gas pressure and volume.

Computer scientists also have their own fundamental laws, perhaps not as well known, but arguably even more solid. One of those laws says a perfect compression mechanism is impossible.

A slightly expanded version of that law says it is mathematically impossible to write a computer program that can compress all files by at least one bit. Sure, it's possible to write a program to compress typical data by far more than one bit -- that assignment is commonly handed to computer science sophomores, and the technique is used in .jpg and .zip files.

But those general techniques, while useful, don't work on all files; otherwise, you could repeatedly compress a .zip, .gzip or .sit file to nothingness. Put another way, compression techniques can't work with random data that follow no known patterns.
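As a rough illustration of both points (a sketch using Python's standard zlib module as a stand-in for .zip-style compressors; the numbers are only ballpark):

    import zlib

    text = b"the quick brown fox jumps over the lazy dog " * 200
    once = zlib.compress(text, 9)    # first pass: lots of redundancy to remove
    twice = zlib.compress(once, 9)   # second pass: almost nothing left to remove

    print(len(text), "->", len(once), "->", len(twice))
    # Typically the first pass shrinks the text to a small fraction of its size,
    # while the second pass leaves it about the same size or slightly larger.

That second number refusing to shrink is exactly why a .zip file can't be zipped again and again down to nothingness.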

So when a little-known company named ZeoSync announced last week it had achieved perfect compression -- a breakthrough that would be a bombshell roughly as big as E=mc² -- it was greeted with derision. Its press release was roundly mocked for having more trademarks than a Walt Disney store, not to mention the more serious sin of being devoid of any technical content or evidence of peer review.

A Reuters article was far more credulous, saying in the lead paragraph that "a Florida research startup working with a team of renowned mathematicians said on Monday it had achieved a breakthrough that overcomes the previously known limits of compression used to store and transmit data."

For compression buffs, responding to such assertions ranks somewhere between a teeth-gnashing migraine and a full-contact sport.

The comp.compression FAQ has an entire section devoted to debunking: "From time to time some people claim to have invented a new algorithm for (perfect compression). Such algorithms are claimed to compress random data and to be applicable recursively; that is, applying the compressor to the compressed output of the previous run, possibly multiple times."

Several comp.compression fans have even offered rewards up to $5,000 for independently verifiable proof of perfect compression. They've never been claimed.

Perfect compression, or even compression of a few hundred times -- what ZeoSync claims -- would revolutionize the storage, broadband and digital entertainment industries. It would mean that modems would be as fast as DSL, and DSL speeds would be blinding. A 40-GB hard drive would hold a terabyte, and so on.

Link: http://www.wired.com/news/print/0,1294,49599,00.html
 
Is this technique based on Shannon–Fano or maybe Huffman coding?

By the way, don't you think that with the arrival of streaming technology, compression is losing its value?

bye!
 
Originally posted by zion
Is this technique based on Shannon–Fano or maybe Huffman coding?

By the way, don't you think that with the arrival of streaming technology, compression is losing its value?

bye!

NO

NO, we still need places to store and back up data. A company like Sears keeps 30 TB online. If we can reduce that to, say, 100 GB - talk about money saved!

You could buy an 80 GB drive and fill it up (1000:1) with everything you ever wanted, including audio, video and whatever. We have about 1,000 video tapes full of stuff recorded from television. If I could compress all of that onto one hard drive - talk about cheap storage and quick scanning.

A place like the IRS or the NSA stores petabytes of data. What a nightmare to manage on tape...
 
It's one thing to be able to compress data 1000 to 1 or better.

It is a completely different thing to be able to do that type of compression with low latency. The latency any compression technique introduces is the limiting factor in real-time communications.
 
It is a logical impossibility for any program to compress every file by at least one bit.

Imagine we had a program that could do that. Suppose we start off with a file 10 bits long. We run it through the compressor, and out pops a 9 bit file. Again, and we have an 8 bit file. Keep going and we end up with a 1 bit file. The contents of that file can only be a single 0 or 1.

Now we want to decompress the file back to the original 10 bits. Obviously, it is impossible, since we have only 1 bit to work with. There's simply no way to store 10 bits of information in 1 bit.
 
Originally posted by James R
It is a logical impossibility for any program to compress every file by at least one bit.

Imagine we had a program that could do that. Suppose we start off with a file 10 bits long. We run it through the compressor, and out pops a 9 bit file. Again, and we have an 8 bit file. Keep going and we end up with a 1 bit file. The contents of that file can only be a single 0 or 1.

Now we want to decompress the file back to the original 10 bits. Obviously, it is impossible, since we have only 1 bit to work with. There's simply no way to store 10 bits of information in 1 bit.

This is a great example. More to the point, consider that there are 1024 possible different 10-bit files. Compress all of them down to one bit, as you say. There are only two possible one-bit files. When I decompress one of these back to 10 bits, which 10-bit file will I get? I can't get all of them. Thus it is not possible to compress every file by one bit: two files cannot uniquely map back to 1024.
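The counting can be checked by brute force; a minimal sketch in Python (plain arithmetic, nothing assumed beyond the argument above):

    # 2**10 = 1024 distinct 10-bit files, but only 2**1 + 2**2 + ... + 2**9 = 1022
    # distinct non-empty files shorter than 10 bits, so any compressor that shortens
    # every 10-bit file must send two different inputs to the same output.
    n_ten_bit_files = 2 ** 10
    n_shorter_files = sum(2 ** k for k in range(1, 10))
    print(n_ten_bit_files, "ten-bit files vs", n_shorter_files, "shorter non-empty files")
    assert n_shorter_files < n_ten_bit_files   # the pigeonhole principle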
 
Originally posted by gnormhurst


This is a great example. More to the point, consider that there are 1024 possible different 10-bit files. Compress all of them down to one bit, as you say. There are only two possible one-bit files. When I decompress one of these back to 10 bits, which 10-bit file will I get? I can't get all of them. Thus it is not possible to compress every file by one bit: two files cannot uniquely map back to 1024.

Perhaps one just needs to stop 'thinking digital' to understand new approaches like this...
 
Hi KM, a little explicit pointer;)
Computer scientists also have their own fundamental laws, perhaps not as well known, but arguably even more solid. One of those laws says a perfect compression mechanism is impossible.

If you"re talking about lossless compression,then this has been done,as in from Huffman and Shannon Fano Algorithms of Bell labs.

bye!
 
I'm reserving judgement on this until someone either proves or disproves it.

My take on an efficient compression system is something that I saw on "Tomorrow's World" about 8-10 years ago. At the time they were talking about using fractal encoding.
Take an image, for example. As far as a computer is concerned, it is a series of pixels, each changing its RGB value relative to the previous pixel. This fractal encoding procedure would find a fractal equation that describes those pixel-to-pixel changes. Once you had that equation, you could apply "standard" compression techniques to it to achieve a very small file in comparison to the original image. Decompressing the file and then solving the equation would give you a close (though generally not bit-exact) reproduction of the original image.
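Fractal image compression (Barnsley/Jacquin-style PIFS) is a real, if niche, technique; it is lossy, and the decoder works by iterating the stored transforms until they converge. Here is a deliberately tiny 1-D toy sketch of the idea (Python with numpy; the block sizes and test signal are made up for illustration, and real codecs work on 2-D image blocks with rotations and flips):

    import numpy as np

    RANGE = 4    # length of each small "range" block to be approximated
    DOMAIN = 8   # length of each larger "domain" block (shrunk 2:1 to match RANGE)

    def encode(signal):
        """For every range block, store (domain index, scale, offset)."""
        n = len(signal)
        domains = []
        for d in range(0, n - DOMAIN + 1, DOMAIN):
            shrunk = signal[d:d + DOMAIN].reshape(-1, 2).mean(axis=1)  # 2:1 average
            domains.append((d, shrunk))
        code = []
        for r in range(0, n - RANGE + 1, RANGE):
            target = signal[r:r + RANGE]
            best = None
            for d, shrunk in domains:
                A = np.vstack([shrunk, np.ones(RANGE)]).T      # fit target ~ s*shrunk + o
                (s, o), *_ = np.linalg.lstsq(A, target, rcond=None)
                s = float(np.clip(s, -0.9, 0.9))               # keep the map contractive
                err = float(np.sum((s * shrunk + o - target) ** 2))
                if best is None or err < best[0]:
                    best = (err, d, s, o)
            code.append(best[1:])
        return code, n

    def decode(code, n, iterations=12):
        """Start from anything and iterate the stored transforms; they converge."""
        signal = np.zeros(n)
        for _ in range(iterations):
            out = np.empty(n)
            for i, (d, s, o) in enumerate(code):
                shrunk = signal[d:d + DOMAIN].reshape(-1, 2).mean(axis=1)
                out[i * RANGE:(i + 1) * RANGE] = s * shrunk + o
            signal = out
        return signal

    x = np.sin(np.linspace(0, 6, 64)) + 0.1 * np.cos(np.linspace(0, 40, 64))
    code, n = encode(x)
    print("max reconstruction error:", np.abs(x - decode(code, n)).max())

The stored code (a handful of numbers per block) is what would be written to disk, and it can be decoded at other resolutions, which is part of what made fractal encoding attractive.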

I've not really kept an eye on research in recent years so it might be that it is in general use already.
In fact while doing a bit of background reading, I found this helpful FAQ... http://www.faqs.org/faqs/compression-faq/part1/section-15.html

A list of links can be found at http://links.uwaterloo.ca/other.sites.html
 
Why is this topic in the scifi page? There is nothing Science Fiction about this.
This is Science Fact. Also what do people mean when they say "Random Data"?

My program doesn't care about the file type or what data the file contains. Numbers are numbers. People are so silly.

I programmed an application that creates a 1-megabyte file using randomly generated numbers, and still my compression program works just fine.

I like to think that TRULY non-redundant data can be appended to more non-redundant data and still be considered non-redundant altogether. It's similar to prime numbers in a way, as well as stable atoms. Think abstractly, people. :) Remember that there's always another way to solve a problem. When your instructions to compress a file are larger than the file itself, then you are definitely close to the file's approximate minimum size.

As far as "perfect" compression, I'll never say that it's impossible.
Because it is possible. The word Perfect means so many things to so many
people. Just keep an open mind. Its possible to remove or even minimize
all known patterns within a file and call it compressed. The file's size would
be tiny, and that compression would be great yet still possible.

When it comes to digital data, many things are possible that aren't in the real world. As the great Yoda says, "You must unlearn what you have learned." :) Have a good one, y'all...
 
// Why is this topic in the scifi page? There is nothing Science Fiction about this. This is Science Fact. //

I have no idea. I don't think I've posted to this forum in years. You must be replying to something I posted years ago. (I'm writing this message offline, based on an email notification I received.)

// Also what do people mean when they say "Random Data"? //

Random data has no predictable pattern.

// My program doesn't care about the file type or what data the file contains. Numbers are numbers. People are so silly. I programmed an application that creates a 1-megabyte file using randomly generated numbers, and still my compression program works just fine. //

So you wrote a compression program? If you're getting good compression, the data isn't random.
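That's easy to sanity-check; a quick sketch using Python's zlib and os.urandom as the random source (any DEFLATE-based tool behaves the same way):

    import os
    import zlib

    random_bytes = os.urandom(1_000_000)        # one megabyte of random data
    compressed = zlib.compress(random_bytes, 9)
    print(len(random_bytes), "->", len(compressed))
    # The "compressed" output typically comes out a few hundred bytes LARGER,
    # because there is no redundancy for the compressor to exploit.

If a program reports shrinking such a file substantially, either the data wasn't really random or the program isn't storing everything needed to get it back.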

// I like to think that TRULY non-redundant data can be appended to more non-redundant data and still be considered non-redundant altogether. //

That's true, but not if the appended data is redundant with respect to the data it's appended to. For example, if you took 1K of non-redundant data and appended to it a copy of itself, the resulting 2K would be redundant.

// It's similar to prime numbers in a way, as well as stable atoms. Think abstractly, people. //

No, it's not like the prime numbers, because they are extremely non-random, or redundant.

// :) Remember that there's always another way to solve a problem. When your instructions to compress a file are larger than the file itself, then you are definitely close to the file's approximate minimum size. //

No, that's not true. For example, PKZIP is about 50K, but it can compress a 20K file of English text to about 5K.

// As far as "perfect" compression, I'll never say that it's impossible. Because it is possible. The word Perfect means so many things to so many people. Just keep an open mind. Its possible to remove or even minimize all known patterns within a file and call it compressed. The file's size would be tiny, and that compression would be great yet still possible. //

"Perfect compression" means compressing every possible file by at least 1 bit. It's impossible, for a very simple reason: The number of all possible files of length N bits is greater than the number of all possible files of length shorter than N bits, so at least two different files would be the same after compression.
 
Thanks for responding to my information.
Your reply is very clear and concise, thank you.
I didn't mean every little detail to be set in stone; it was just an example to get some viewers to question the idea for themselves instead of listening to regurgitated information from other people.

It's much better to figure it out with creative ideas.
Conformity destroys creativity. - I had to slip that quote in there. :)

// "Perfect compression" means compressing every possible file by at least 1 bit.
// it's impossible.

You're right. Mathematically it is impossible. I'm in complete agreement with you.

If your definition of "perfect compression" existed, that would mean that if you compressed all files down to 1 byte, then when you went to decompress them, there would be nothing to distinguish between two different files that had been compressed. Again, I agree with you.

I've been working on my compression program for about a year now. It's much better than 7-Zip and better than CAB and all the other compression methods I've come across. I wrote it in DirectX. Pretty fun, actually.

My program isn't "perfect" per se, but it removes all redundant data throughout a file over several passes. It takes an extremely long time. The idea isn't new, but I've taken a slightly different approach.

I'm still not clear about what people mean by random data. Like I said, I set up a separate program to create a big file by generating and appending random bytes. And my compression program compresses the file just fine.

Even though there's no such thing as "truly random," I do my best. :)

Any replies to this message are always welcome. It's great to finally find some smart people to chat with.
 
// [Bagman] "Perfect compression" means compressing every possible file by at least 1 bit. It's impossible. //

// [hodonkain] You're right. Mathematically it is impossible. I'm in complete agreement with you. If your definition of "perfect compression" existed, that would mean that if you compressed all files down to 1 byte, then when you went to decompress them, there would be nothing to distinguish between two different files that had been compressed. Again, I agree with you. //

Talking about compressing every possible file to 1 byte just makes the point clear, of course. You couldn't even compress every possible file to 1 bit shorter than the originals.

// I've been working on my compression program for about a year now. It's much better than 7-Zip and better than CAB and all the other compression methods I've come across. I wrote it in DirectX. Pretty fun, actually. //

Why or how DirectX? DirectX is for improved performance etc. with games vis-a-vis video, sound, etc. Isn't it just an API for that? Then how would you write a compression program "in" DirectX?

I don't know what "7Zip" is. I'm out of date on all this, but advanced programs I've seen don't outperform PKZip or WinZip by much. It's very hard to do, because there are diminishing returns. The so-called arithmetic encoding beats them, probably, but it's very CPU-intensive. Do you know about LZW (Lev-Zimpel-Welch) compression? PKZIP and others use variations of that.
// My program isn't "perfect" per se, but it removes all redundant data throughout a file over several passes. It takes an extremely long time. The idea isn't new, but I've taken a slightly different approach that all the other compression programs don't seem to take. //

Maybe they don't do it because it takes such a long time.

// I'm still not clear about what people mean by random data. Like I said, I set up a separate program to create a big file by generating and appending random bytes. And my compression program compresses the file just fine. //

If you merely add some random bytes to a file that has redundancy, you don't necessarily reduce the redundancy by much. It depends on how many you add and where you add them. If you merely append them, you just get some non-redundant stuff at the end; the rest of it still compresses as before. Even if you embed them, you can retain a lot of redundancy, depending on how many you embed, and where they're embedded. Based on what little you've said, I don't even see the point of adding or embedding random bytes.

Randomness is actually a difficult idea. *Every* string is random in the sense that it could have been produced by chance. I suppose a string is random if all codes are equiprobable at every position in the string.

You can't compress random data, or only very slightly.
 
I was writing it in Delphi, using C++ and DirectX as a platform. It's a newbie language, but it's still effective and fun to work with. Yes, I'm sure the amount of time my program takes is probably the reason why it's not commonly used today. Many people simply don't want to wait. For me, I don't mind. The decompression stage is normal, though; it's just the compression that takes a long time.
 
Why don't you go ahead and describe your algorithm? At the very least, we can try to tell you what its asymptotic running time would be: Big-O, Big-Omega, etc.

-AntonK
 
When you are talking about the theoretical limits of compression, "random" data is data that is at or near maximum entropy: no discernible patterns. Such a random sequence could also be used as a source of random numbers in the other sense (a series of numbers with no discernible correlation between one number and the next).
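One way to make "maximum entropy" concrete is to measure the empirical Shannon entropy of the byte stream; a small sketch in Python (this only measures byte-level statistics, so it misses longer-range patterns):

    import math
    import os
    from collections import Counter

    def byte_entropy(data: bytes) -> float:
        """Empirical Shannon entropy in bits per byte (8.0 = no byte-level redundancy)."""
        n = len(data)
        return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

    print(byte_entropy(os.urandom(1 << 20)))   # ~8.0: essentially incompressible
    print(byte_entropy(b"ababab" * 10_000))    # 1.0: only two equally frequent symbols

Anything sitting well below 8 bits per byte still has room for a general-purpose compressor to squeeze.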
 
I think people mistake non-redundant data for random data. They aren't the same thing. Even if something is random, it can still create patterns unintentionally.

Assuming I have an algorithm, I don't know what it would be. I tend to do things differently than other people. If anything new is going to come out of all this, I have to do things in a way that hasn't been done before.

To create something new, I have to think outside the box. Sorry, I'm drinking a little bit while I'm typing. lol
 