A database of millions of books aims to make it possible to track cultural trends through the frequency of individual words in print over the last few centuries. Cynthia Graber reports.
They call it culturomics: the obvious play on the word “genomics” looks at trends in human thought and culture. But scientists say culturomics has been hampered by a lack of quantitative data. So researchers at Harvard, along with Google, Encyclopedia Britannica, and the American Heritage Dictionary, have come up with a new tool.
It’s a database of 5.2 million books, published since the year 1500. That’s four percent of all the books ever published, with a total of 500 billion words. The focus is on English-language culture, so three quarters of the books are in English.
Among the first findings of the research, published in the journal Science [Jean-Baptiste Michel et al., "Quantitative Analysis of Culture Using Millions of Digitized Books"]: about 8,500 new words enter the English language annually, but many of them don’t end up in dictionaries. And about fame: actors become famous around age 30, writers around 40, and politicians around 50. But the fame of politicians can eventually exceed that of actors.
A Google tool called the Books Ngram Viewer is built on this data. Users can track the usage and frequency of a word or phrase over the past few centuries. Thus, we can watch the fall and rise of Melville. And soon the rise and fall of Snooki.
—Cynthia Graber