With entire genomes available for study, finding specific genes of interest is challenging. University of Nebraska–Lincoln’s Yanbin Yin, a bioinformatics specialist, is creating advanced computational tools to quickly identify a class of enzymes found in all living organisms.
Yin’s tools are aiding research, including his own, into human gut health, biofuel production, crop diseases and our evolutionary past.
“If you sequence a plant or bacterial genome, there are probably tens of thousands of genes. But just 5% of those genes are these enzymes,” said Yin, associate professor of food science and technology. “If you do experiments, it could take 20 or 30 years to figure it out. With this software system, you can do it in five minutes.”
Yin and his team focus on carbohydrate-active enzymes, or CAZymes, the enzymes that produce, modify and break down all carbohydrates. He’s building on his earlier work, a tool that identifies CAZymes within genetic code that researchers upload to a website. The site has proven popular, receiving 50 to 60 uploads a day, Yin said.
Now, he’s advancing his software to analyze and classify CAZymes at a more detailed level. The software looks for key features within genetic code to distinguish among different CAZyme groups and predict how the enzymes function.
Yin is creating computer algorithms that learn and improve as data is added. His starting point is a CAZyme database compiled from the scientific literature and maintained by other researchers. He’s using the existing database to train his identification software and will package the software into a free, user-friendly website for CAZyme researchers.
“Our contribution is to have a software system that can learn from those training datasets and make predictions,” Yin said. The software will give researchers the ability to better understand the CAZymes they’re investigating.
Because CAZymes provide critical functions in nature, the tools will speed research across a wide variety of disciplines. Bioenergy researchers are investigating microbial CAZymes that break down complex carbohydrates into simple sugars, which can be converted into biofuels. Harnessing this ability would allow biofuel production from agricultural waste.
Plant pathologists are interested in the CAZymes that pathogens use to break through plant cell walls, causing disease.
Yin’s team is using the software to identify and investigate the CAZymes of beneficial bacteria living in the gut. The bacterial enzymes break down indigestible fibrous food into sugars the host can use. His research could lead to improved human and animal health.
The team is also looking deep into the past to better understand how plants migrated out of the water in a critical, early evolutionary process. Yin is studying the CAZymes of an alga, Zygnema circumcarinatum, to investigate how early algae-like plants altered the carbohydrate chemistry of their cell walls to protect against the harsher conditions of living on land. He leads an international Zygnema circumcarinatum genome sequencing consortium with collaborators from UNL and institutions in Austria and Germany.
Yin’s projects are funded with $911,000 from the National Science Foundation’s Faculty Early Career Development Program, a prestigious award given to outstanding pre-tenure faculty. Yin received his CAREER award as a faculty member at Northern Illinois University and is continuing this work as a researcher in the Nebraska Food for Health Center at the University of Nebraska.
The CAREER award allowed Yin to develop bioinformatics workshops for preservice high school teachers and to recruit undergraduates, including underrepresented students, to work in his lab in Illinois and explore bioinformatics research. He plans to continue his outreach and recruiting efforts at Nebraska through opportunities with the Undergraduate Creative Activities and Research Experience program, known as UCARE, and the Center for Science, Mathematics and Computer Education.