Lehigh University logo

Keeping pace with the data explosion

With a torrent of new content unleashed on the Internet every hour, how do you find the news articles, status updates and videos you want to view? How do websites like Yahoo and Facebook feed you enough interesting content to make you want to click on the ads?

“We process terabytes of data every hour,” says Liangjie Hong ’13 Ph.D., a scientist at Yahoo! Labs. “You cannot consume it all.”

“If you’re really engaged, you have too many people to keep up with,” agrees Brian Davison, associate professor of computer science and engineering and head of Lehigh’s Web Understanding, Modeling and Evaluation (WUME) laboratory. Davison himself follows hundreds of people on sites like Facebook. He is spending the 2013-14 academic year on sabbatical in Facebook’s data science group.

Davison and Hong have collaborated on an innovative project that attempts to discern users’ behavior from a small sample of online activity and then to predict the types of content users would like to see.

“If we can better understand what you are interested in,” says Davison, “we can decide what to filter, rank higher or flag for your attention.”

The two researchers analyzed a burst of Twitter activity, including millions of tweets posted by thousands of individual users. Then they trained an algorithm to predict with high accuracy how often the recipients of tweets would “retweet,” or rebroadcast, the messages to their own followers. The work received a best poster paper award at the 2011 World Wide Web Conference, and the two then decided they could obtain more relevant information by modeling how individual users respond to new content.

“If we could record a user’s activities for 24 hours,” says Hong, “we would know exactly what they are looking for.”

Read the full story in the Lehigh University News Center.
