How big a problem is plagiarism? We might suspect that a fair amount of dubious publishing practice goes on, such as publishing the same results more than once, submitting a paper to more than one journal, or reproducing others' work without acknowledgement. Such practices seem to have become more prevalent as pressure to publish increases.

But how do we begin to identify instances of plagiarism? It is surely not enough to go by anecdotal reports, or to rely on researchers highlighting cases they come across by chance in the literature. Fortunately, it is probably not that difficult. Computer-based tools that search documents for similarities have been around for a while and are widely used to identify cheating in school and college exams. Now Harold Garner and Mounir Errami at the University of Texas Southwestern Medical Center have developed a program that checks multiple documents for duplication of key words and compares word order and proximity. When they applied their eTBLAST program to more than 7 million abstracts in Medline, some 70 000 papers came back as highly similar [Nature (2008) 451, 397].
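The general idea behind such similarity searches can be illustrated with a toy sketch. This is not eTBLAST itself (whose internals are not described here); it simply combines the two ingredients mentioned above, shared key words and word order, by scoring the overlap of each pair of abstracts on their word sets and on their consecutive word pairs:

```python
# Toy near-duplicate detector (illustrative only, not eTBLAST):
# blends shared-keyword overlap with word-order overlap.

import re

def tokenize(text):
    # Lowercase and keep only alphabetic words.
    return re.findall(r"[a-z]+", text.lower())

def jaccard(a, b):
    # Set overlap: |A ∩ B| / |A ∪ B|, or 0.0 when both sets are empty.
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity(text1, text2):
    # Average of keyword overlap and bigram (word-order) overlap.
    w1, w2 = tokenize(text1), tokenize(text2)
    keyword_score = jaccard(set(w1), set(w2))
    bigrams = lambda ws: set(zip(ws, ws[1:]))
    order_score = jaccard(bigrams(w1), bigrams(w2))
    return 0.5 * keyword_score + 0.5 * order_score

def flag_duplicates(abstracts, threshold=0.6):
    # Return index pairs of abstracts scoring above the threshold.
    pairs = []
    for i in range(len(abstracts)):
        for j in range(i + 1, len(abstracts)):
            if similarity(abstracts[i], abstracts[j]) > threshold:
                pairs.append((i, j))
    return pairs
```

The threshold of 0.6 here is arbitrary; a real system would tune it, and would need far more efficient indexing than this pairwise loop to handle millions of abstracts.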

Now there are a number of circumstances where significant duplication is to be expected and is wholly ethical. These include updates to clinical trials, conference papers, and corrections to papers. But Garner and Errami are determined in their pursuit of unethical practices. They have placed the 70 000 potential duplicate papers in a publicly available database, Déjà vu, and have begun to check them manually (quite a task!). Already, one suspected case of plagiarism has reportedly resulted in an investigation by a journal. “We can identify near-duplicate publications using our search engine,” says Garner. “But neither the computer nor we can make judgment calls as to whether an article is plagiarized or otherwise unethical. That task must be left to human reviewers, such as university ethics committees and journal editors.”

That does point to one problem: whose responsibility is it to pursue cases of unethical practice, journals or universities? The ease of deploying such computational tools probably makes journals the natural first line for identifying suspect papers (and many publishers are jointly investigating the best way to proceed), but the subsequent steps are more open to debate.

I am convinced that screening of this kind would be a very worthwhile step. Suspect papers need to be treated with caution, and common standards on how much new work justifies a new paper must be agreed upon, but clear-cut cases should be uncovered this way. It also leads me to wonder what else is possible. Can figures be checked for manipulation or duplication? That is where scientific fraud and misrepresentation of data most often occur.


DOI: 10.1016/S1369-7021(08)70001-4