December 02, 2004

BitTorrent/p2p for scientific journal articles

Slashdot | Decentralizing Bittorrent. This article got me thinking again. Follow my stream of consciousness below, if you can:

1. A user on a p2p network for sharing papers can be either a leech (no papers to share) or an author (allowed to share if he/she wants).
2. An author must get a unique user ID -- perhaps from CrossRef or COS -- that's the only way the p2p network knows that the author is "authentic"/"certified." This is important because only authors can share out their papers. I don't yet know what the method of validation would be. Any ideas?
3. p2p software is married to each specific author or user; sharing is only allowed if the author "activates" the software for sharing -- insert method of validation here.
4. A decentralized p2p like Exeem would be nice, because each paper (i.e. each torrent) can be found without a centralized tracker. The only thing required is that the author creates the "torrent" file for the paper and keeps his/her p2p software running on his/her PC on the net at all times. Thus, whenever a leech wants to get the paper, it's available.
5. This model of paper sharing makes it easier for authors to place their papers on the net for download without needing to set up a web server, know HTML, etc. The only thing required is persistent network access.
6. As for the paper/article itself, it can be stored on the author's computer with a specific filename -- preferably the PubMed ID or something like a DOI number. That way, the paper can be referenced quickly from, say, PubMed (or HubMed).
7. As you can see, this network will only be successful if the author undertakes an extraordinary amount of work filing and indexing his/her past papers and sharing them on the p2p -- this could be made easier with drag and drop tools and automatic parsing tools to determine the authenticity of each paper. Perhaps some sort of hashing function could verify the paper, like an md5 sum on the published PDF (this means that somewhere there has to be a "master" list of all PDFs and their hashes), but this is not likely to succeed since some journals now generate PDFs "on the fly" with downloader identifiers embedded in the PDF, so no two copies would hash the same. The network also needs a threshold level of users and sharers to succeed -- a threshold that will be tough to reach without good marketing. For some reason I doubt a grassroots level of marketing will garner much attention.
8. Just like the BitTorrent client software, this could rapidly be developed using Python and wxPython (GUI) -- any takers?
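
To make items 6 and 7 a bit more concrete, here is a rough Python sketch of naming a paper by its DOI and checking its md5 sum against a "master" list. Everything named here is my own assumption for illustration: the function names, the master_list dictionary (a stand-in for wherever the master list would really live), and the choice to replace the "/" in a DOI with "_" so it can be used as a filename. It is not an existing tool, just one way it might look.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex md5 digest of a file, read in chunks to spare memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def filename_for_doi(doi):
    """Hypothetical naming rule: a DOI contains '/', so swap it for '_'."""
    return doi.replace("/", "_") + ".pdf"


def is_authentic(path, paper_id, master_list):
    """Check a local PDF against the (assumed) master list of known hashes.

    master_list maps a paper ID (DOI or PubMed ID) to the md5 hex digest
    of the published PDF. Unknown IDs are treated as not authentic.
    """
    expected = master_list.get(paper_id)
    return expected is not None and md5_of_file(path) == expected
```

A client could call is_authentic(path, doi, master_list) before it agrees to share a file. As noted in item 7, this check would reject any PDF that a journal stamps with a downloader identifier, since even a one-byte difference changes the md5 sum.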

Posted by johnvu at December 2, 2004 06:02 PM