Eflex tha Vybe Scientist

I'm new (obviously)

I know these boards can be a little cliquish, so I'll try to keep my posts relevant and interesting.

Question for all the geneticists...

I'm just beginning my journey in bioinformatics. Can anyone recommend a good book to read on the subject?
Study the RDBMS first and then study the object databases well. Then learn the query and visualization tools out there; there are not very many good ones. You may have to design for proteomics and related data warehouses. Bioinformatics is useless without good storage and retrieval techniques. Then you have to learn good data mining skills. That is more of an art than a science. Innovation is like art: some are good at it, others are not...

You have a tough road ahead...or you can get an MBA and get a cushy job.... :D

Any questions, please feel free to post...
Welcome Eflex the one with a very long name,

What kind of bioinformatics do you want to do and what's your background?

-> Sequence analysis, especially developing new algorithms - I recommend:
"Biological Sequence Analysis" by Durbin, Eddy, Krogh, and Mitchison

-> Databases - along the lines of what kmguru is saying, I think it makes sense to learn common database stuff first. If you have a Linux machine you can always download and install PostgreSQL or MySQL for free and play with them. Writing some basic CGIs in Perl that serve as an interface for the db is an important and popular skill (though trying to do complex things in a procedural format in the database itself is a good thing to think about when doing this). There are some basic tools out there to play with too, i.e. bioperl/biopython/biojava ... I can't think of any books I would recommend to start.

-> If you're interested in micro-array analysis, I can recommend some papers if you want, but I don't know any books.
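
On the database point above, here is a minimal sketch of the kind of thing you can play with. It uses Python's built-in sqlite3 module as a stand-in for postgres or mysql (my assumption, just so it runs with nothing installed). The table layout, protein names and sequences are all made up for illustration.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per protein sequence.
conn.execute(
    """
    CREATE TABLE protein (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        organism TEXT NOT NULL,
        sequence TEXT NOT NULL
    )
    """
)

# Toy rows; these are not real database entries.
toy_rows = [
    ("toy_a", "yeast", "MKTAYIAKQR"),
    ("toy_b", "human", "MSTNPKPQRK"),
    ("toy_c", "yeast", "MAGWSC"),
]
conn.executemany(
    "INSERT INTO protein (name, organism, sequence) VALUES (?, ?, ?)",
    toy_rows,
)

# Parameterized query: name and sequence length for one organism.
yeast_rows = conn.execute(
    "SELECT name, LENGTH(sequence) FROM protein "
    "WHERE organism = ? ORDER BY name",
    ("yeast",),
).fetchall()

for name, length in yeast_rows:
    print(name, length)
```

The same SQL should carry over to a real server with little change; mostly the connection call and the placeholder style differ between drivers.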

Personally I distinguish computational biology from bioinformatics, and I think everything other than the databases and arguably some of the sequence analysis (since searching databases of data is pivotal) falls under computational biology. If you're interested in modelling biological processes I recommend "Mathematical Biology" by JD Murray. It's a bit dense mathematically though.

If you're interested in anything else more specific I may or may not be able to give you some good starting material.
Thanks for the replies.

I've studied Biochemistry and Genetics in college.

Right now, I'm interested in the proteome and how to design better ways to store three-dimensional information.

Most databases are all about the Array/Hash Map/key/value data store.

I'm interested in something a bit more...
Databases in and of themselves aren't too interesting (well building more efficient query engines might be to some), but if you have the database set up right you can do some interesting analyses. Proteins are definitely the future, but the techniques are just beginning to catch up with those for DNA and they are less understood and still not as powerful.

So you don't like PDB format, huh? I know there are other newer formats, but it seems pretty well entrenched. A lot of apps use it. Out of curiosity, what's wrong with it? Not compact enough since it uses text? Doesn't have a way of storing measurement errors or electron densities? No good for NMR/non-static structures?
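
For anyone following along, a rough sketch of what "it uses text" means in practice: each atom is one fixed-column line, and the x/y/z coordinates sit in columns 31-54. The parser below only reads a few fields, the sample record is made up rather than taken from a real entry, and the names `Atom` and `parse_atom_line` are just mine for illustration.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse a few fields from one fixed-column ATOM/HETATM record.

    The format counts columns from 1; Python slices count from 0.
    """
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain_id=line[21],             # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


# Made-up record, assembled field by field so the column widths are explicit.
SAMPLE_LINE = (
    "ATOM  "      # record name, columns 1-6
    + "    1"     # atom serial number, columns 7-11
    + " "         # column 12 (blank)
    + " N  "      # atom name, columns 13-16
    + " "         # alternate location, column 17
    + "MET"       # residue name, columns 18-20
    + " "         # column 21 (blank)
    + "A"         # chain identifier, column 22
    + "   1"      # residue sequence number, columns 23-26
    + " "         # insertion code, column 27
    + "   "       # columns 28-30 (blank)
    + "  27.340"  # x, columns 31-38
    + "  24.430"  # y, columns 39-46
    + "   2.614"  # z, columns 47-54
    + "  1.00"    # occupancy, columns 55-60
    + "  9.67"    # temperature factor, columns 61-66
)

atom = parse_atom_line(SAMPLE_LINE)
print(atom)
```

So every coordinate costs 8 characters of text at 3 decimal places, which is the compactness and precision trade-off being asked about.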

Proteins aren't exactly my thing, though I can ask a couple of friends. Are you only interested in storage aspects, or also in protein modeling/folding, theoretical structure-function analysis, etc?
Originally posted by scilosopher

Are you only interested in protein modeling/folding, theoretical structure function analysis, etc?

The "unpredictable" nature of protein folding is what interests me.
The fact that some proteins will behave differently in different solutes and pH levels is also very interesting.

As cells modify existing proteins by adding sugars or lipids, this would be difficult to catalogue in a conventional database.

Construction of a dynamic environment that would predict and detail the folding of new proteins is my ultimate goal.

I just don't see the current set of databases supporting that.
That's a tough goal. Ab initio folding of anything >~70 aa requires supercomputers and is still not spectacular (you might want to look up the results of CASP for the last few years).

I think most people haven't worried about glycosylation and other post-translational modifications because there aren't great experimental procedures for determining them. Even mapping phosphorylation sites is a fair amount of work. I'm also not aware of anyone who has been able to use the known structure of a protein to predict the change that it undergoes after phosphorylation - like activation of a kinase or whatever.

Not to discourage you, I think there is a lot of interesting work to do in that area. And you did say it was an eventual goal. For the short term though, picking a more tractable problem is at the heart of scientific progress. The longest journey begins with a single step ...

You can store 3-dimensional data efficiently through an intermediate algorithm, such as fractal calculations. But prediction of folding is another matter. If I were the database engineer, I would try a neural net first to see if that got me anywhere. There are other pattern algorithms that may work too. My last project in bioinformatics was to rearchitect a pattern query using Paracel System. That is simple compared to what you are trying to do. I presented the above ideas to a major genomics company, but they were more interested in patenting genes by the thousands than in working on the other side.

Since I am an information technologist rather than a biochemist, let me/us know as you work on specific problems. If I find a vendor or solution, I will post it.
I know this probably isn't 100% useful, but holograms have been around for years with the ability to store quantities of data. Of course, for a coin hologram to look three-dimensional it would have to be spun.

(kind of reminds me of a Victorian optical illusion involving a piece of card and two images, one on each side. A bird and a cage.
Once spun the bird appears in the cage.)

I just think of this as a way of using 2 dimensions to express 3 dimensions. (at least a meta understanding)
Originally posted by kmguru
I presented the above ideas to a major Genomic company, but they were more interested to patent Genes by the thousands than work on the other side.

I too have encountered companies in the private sector that are only interested in profits.

This has proven very frustrating in terms of my pursuit of knowledge.

But I will do my best to keep you guys informed of my progress concerning proteomics...