A Nebraska project to develop software to examine centuries-old newspapers as easily as the human eye has received a funding boost.
The three-year, $462,000 grant from the Institute of Museum and Library Services will enable the refinement and expansion of the Image Analysis for Archival Discovery project led by University of Nebraska-Lincoln researchers Elizabeth Lorang and Leen-Kiat Soh.
In 2013, Lorang, associate professor in University Libraries and a fellow in the Center for Digital Research in the Humanities, joined with Soh, a professor of engineering and computer science, to develop software to quickly scan and locate poetry among microfilmed newspapers. Similar to text-mining applications in which specific words and phrases are mined from digital sources, the program was designed to find specific images or outlines of images. The research earned a startup grant from the National Endowment for the Humanities.
Lorang and Soh developed an image-based classifier that could detect poetry content and published their findings in D-Lib Magazine. The software, however, was limited in its usability. A program manager at the Institute of Museum and Library Services urged Lorang and Soh to apply for funding.
“It’s an immensely more complicated project than we initially thought it was going to be,” Lorang said. “Extraneous information gets introduced on the pages that we need to be able to account for. We could have taken our code and deployed it, but we wouldn’t have gotten as accurate of information as we wanted.”
“Modeling and duplicating how human vision accurately identifies poems is challenging,” Soh said, and designing the software to mimic that function is made more difficult by the varying conditions of archived newspapers.
With the grant, Lorang said she hopes to make the software work as efficiently as the human eye – to instantaneously look at pages to see if it has poetry printed on it. Lorang said that prior to this research, she took for granted what her own eyes could process among varying types of information.
“It seems so straightforward, so we should be able to have a computer do that as well,” she said. “But that didn’t take into account how humans are able to naturally look past things like if the page is slightly turned or there is damage or bleed-through text.”
Lorang said those issues are filtered quickly by the human brain because they are not relevant. That needs to be built into the software, she said.
Lorang and Soh will also work with John O’Brien of the University of Virginia, who will use the software to examine 18th century British newspapers and provide feedback to improve the software. Soh said this partnership will provide an opportunity to gather different types of data and employ more comprehensive tests of the algorithms.
“We want to really understand this system and understand the complications so that we come out with a better piece of software and a better understanding of the range of issues involved,” Lorang said.
Lorang and Soh plan to make the related software and algorithms freely available and licensed for reuse and continued development. Lorang said she was invited to the Library of Congress in September to present at the Symposium on Understanding Collections as Data about the project.
“It allows researchers to think about different ways to access the data in these large collections and recast their ideas about what’s possible,” Lorang said.