Ben's company, Biomind, attempts to address the current disconnect between three different scientific communities that really need to work together in order to find answers: biologists, bioinformaticists and the artificial intelligence community. As a result of this disconnect, biologists are not making the most out of their research data.
After I had talked with Ben for an hour about the Novamente Cognition Engine, which will soon be used with Second Life avatars and an HTTP proxy created by the Electric Sheep Company to train dogs in Second Life, he casually mentioned a couple of examples of the technology from his other company, Biomind. Learning what its software has been discovering "blew my little (bio)mind."
We'll get to training dogs in Second Life soon, I promise. But this, arguably, is a little more time sensitive, so I wanted to tell you about it first.
Ben: Well here's my vision - the vision I had in 2001 and 2002 when I got the idea to start Biomind. I could see that experimental biology technology was advancing like crazy, but biologists' ability to understand the data produced by their machines just wasn't keeping up. To put it simply, "What if we could feed all this data into an AI and have the AI really understand it, and produce the biological and medical answers we need?"
By this time, biologists have discovered an incredible amount of data that's barely understood, and a decent fraction of it is online. Talking just about the data that's already there online, right now - my feeling is that if you were to feed it all into a big database, and let the AI analyze it, and see what discoveries it comes up with, you could discover all sorts of cures to diseases. - Ben Goertzel, Biomind/Novamente
Biologists are great at running experiments, but typically they'll gather a huge mass of data from an experiment and then analyze it in a fairly simplistic way, one that only mines 1% of the information available within the data they've gathered. Then they'll take this 1% of information and use it to design another experiment. What they're doing works, but not as fast as would be possible with more intelligence applied to interpreting the data the machines spit out.
There are also some specific shortcomings in the standard data analysis procedures they use - especially in regard to the understanding of biological systems as whole systems. The standard data analysis methods biologists use are biased toward zooming in on one gene, one protein, one mutation that makes a difference. AI methods have a lot more capability to understand the interactions between different parts of a biological system - and it is these interactions that really make life LIFE!
But these interactions are complicated to understand - if I look at the spreadsheet of data coming out of some modern biological equipment (say a microarrayer) I can't see the biological system dynamics in all those numbers ... and conventional data analysis methods can't usually see them either. But AI methods can look at the reams of messy, noisy data and pick out some really important glimmers of the holistic biological system underneath.
Lisa: Isn't this the same problem that bioinformatics is trying to solve?
Ben: Yeah, bioinformatics is the discipline that deals with crunching bio data. And it has obviously made huge advances in the last decade. But when you really get into it, almost everyone trained in bioinformatics recently is really trained in very specific stuff - for instance, gene sequence analysis. However, bioinformatics training doesn't include much about data mining, mathematics, or advanced statistics. So the bioinformaticists' training isn't much more advanced than that of the biologists, by and large, in terms of mining the complex interactions and dynamics out of reams of experimental data.
I mean, at Biomind we've been doing specific work along these lines for a while. In a recent experiment, we took three data sets off the web, which are about mice under calorie restriction diets, fed those three data sets into our AI system, and then analyzed them to find out what genes are most important for distinguishing calorie restriction from control. We were able to pinpoint a couple of genes that no one has ever thought were important for calorie restriction before, but I'm quite certain they are. And this is just from three data sets! I mean, if you could feed in tens of thousands of data sets, even just those related to various aspects of aging and aging-related diseases...
Now, if Biomind was richer, we'd follow this analytic work up with some wet lab work. Mutate those genes in mice. Make some mutant mice. Give some of them a calorie restriction diet, and see what happens. As it is, Biomind is not rich. So we'll just publish a paper and hope that somebody else picks up on it, or maybe partners with us in the future.
On another note, when we worked with the CDC a couple of years ago, our AI crunched some of their mutation data gathered from people with Chronic Fatigue Syndrome, and what came out of it was the first-ever evidence that there is some sort of genetic basis for CFS. That paper made a pretty big hit in the CFS community.
And we've done some great work with Davis Parker from the University of Virginia, on understanding Parkinson's and Alzheimer's disease based on heteroplasmic mutations in mitochondrial DNA.
Lisa: When did you do that work?
Ben: The Parkinson's work was in 2004. The Alzheimer's work is current research, but the preliminary results are pretty exciting, and the preliminary indications are that Alzheimer's works basically the same way as Parkinson's, but with different genes and mutations involved.
Part One of Two
This list of references goes with this post, so far...
Biomind ArrayGenius and GeneGenius: Web Services Offering Microarray and SNP Data Analysis via Novel Machine Learning Methods
By Ben Goertzel, Cassio Pennachin, Lucio Coelho, Leonardo Shikida, Murilo Queiroz
IAAI-07 conference, July 2007
BioMind Whitepapers (registration required):
Machine Learning Algorithms for Clinical and Research Microarray Data Analysis
Mining Microarray Data to Discover: Disease Biomarkers & Complex Genetic Relationships
Enhancement of Gene Ontologies using Analytic Algorithms which Combine Microarray and DNA Sequence Similarity Data
Mining Microarray and Sequence Data to Enhance Gene Ontologies
Video: Artificial General Intelligence: Now Is the Time
More information about video here
Artificial General Intelligence Research Institute Workshop
HTML version of AGIRI powerpoint:
The Novamente Project