Saturday, March 20, 2010

SXSWi discussion: Mendeley

During South by Southwest Interactive 2010, I was fortunate enough to sit down with Victor Henning and Jan Reichelt of Mendeley.com to talk about their project and their plans for it. Both are exciting - if you're an academic, anyway.

First, some background. Mendeley has been described as a sort of iTunes for research papers: its desktop client organizes your citations and accompanying files (PDFs, DOC files, etc.), while its web interface lets you share these citations and files. You can organize citations by project and use them to create bibliographies in a number of common formats. Think of it as EndNote meets iTunes meets Last.fm. (The Last.FM connection reaches deeper, since Last.FM's first investor is also Mendeley's.)

I've been using Mendeley for a while, and switched over to it in earnest last summer, when I realized that I needed a more flexible cloud-based solution. (I've been using a cloud-based system, Google Docs, for drafting papers.) It's been great for organizing citations, but I've never used the sharing function - partially because so few humanities scholars seem to use Mendeley.

When I sat down with Victor and Jan, I was very interested in their model, their plans, and their long-term stability. Here are some of the points they made.

The model. Yes, they want Mendeley to be the Last.FM or iTunes of research. Originally, they created Mendeley because as graduate students, they found that they had to spend a lot of time looking up and organizing citations. Why not organize and also share their citations and PDFs as a small community? So a free account will let users share their citations with everyone, and their files among 10 users.

But they were surprised to find out that institutions, both enterprises and academic departments, were also very interested. In particular, pharma, biotech and petrochemical companies, which are research-intensive, wanted to limit the amount of redundant information by sharing citations and files across their organizations. So they're working on a premium account that will allow institutional subscriptions, larger storage space, and more sharing. They estimate that they'll launch this premium account option in under six months.

Their plans. They plan to expand the sorts of files that users, particularly institutional users, can share. Specifically, they'd like users to be able to share raw research data: data tables, audio, video, and so forth. Eventually, they would also like to add version tracking. These additions would particularly cater to institutional users, who must share data in private groups.

However, Mendeley also wants to push in the other direction, enabling more social sharing. Currently, users can create their own RSS feeds for publications, allowing them to feed their public collections into FriendFeed, Cliqset, or the like. And in Mendeley Desktop, when you add a citation, you can add shared notes around it. They anticipate more sharing features very soon, though. For instance, they expect to launch a Facebook Connect integration in the next two months so that scholars can share recent citations and comments in their Facebook activity stream. (For scholars, it's preferable to FarmVille.) They're also working on an API (to be released in perhaps two months). And they want to introduce more granularity to collaborative PDF annotations, allowing users to choose case by case which groups to share their sticky notes with. Their next major release will also allow a browser view in the desktop software for comments and newsfeed.

In the medium term (which means 3-6 months in this business), they want to add other features. One is a recommendation system similar to Last.FM's, something that recommends citations to you based on what your friends are reading. Another is the ability to pull the conversation into the desktop. They'll connect all of the silos on the website. They also want to facilitate communication among relevant others and to display readership of citations.

They also see tremendous potential for enriching the metadata in the database. For instance, a department at Stanford has geocoded research - associating publications with the geospatial coordinates they describe - so Mendeley will incorporate that information. They also see potential for crowdsourcing some metadata: for instance, they are already training Mendeley to recognize PDFs by comparing them against crowdsourced metadata attached to previously uploaded PDFs. (In this way, Mendeley is more like the Shazam of research.)

They're eager to get more citations in Mendeley. Right now, they can pull in information from Zotero and EndNote, but they can't sync two-way with those formats. (Zotero syncing would have to run as a Firefox instance, and EndNote's format is proprietary.) They hope that by developing an API, they can give others the tools to sync these formats.

Long-term stability. Mendeley's backers include founders of Last.FM and Skype as well as a former Warner Music executive. They have expertise in user interface development, they have academics in the driver's seat, and they have investor expertise - three critical ingredients for long-term stability. And they just finished raising second-round funding, funding that well exceeds the first round. Importantly, they also received two research grants from the EU and employ four people for R&D.

Beyond that, they have a plan for revenue, built on the premium institutional accounts described above.

Summary. I have been impressed by Mendeley as a desktop client, but I walked away even more impressed by the potential of the plans. After all, I'm very curious about what others in my field are reading and what they think of those readings - and I'd love to see these thoughts show up in their activity streams. Further, I can see all sorts of potential for sharing tagged citations or collections within and across professional organizations. The drawback I see for me is that the humanities are not well represented in the Mendeley community - at least not yet.

18 comments:

Unknown said...

It would have been interesting to learn more about how Mendeley is positioning itself against Zotero (which seems to serve a large community of humanists). Nearly every existing Mendeley feature and most of their proposed future functionality are already offered by Zotero, e.g.

* syncing of any file format
* large storage accounts
* no limit on sharing

Did Victor and Jan indicate how they expect to compete with an established open-source alternative that doesn't need to support dozens of employees and return a profit to investors?

Clay Spinuzzi said...

Hi, Buckwheat. Perhaps Victor and Jan will comment here, but I'll see if I can answer these questions.

In the interview, Victor and Jan didn't seem too concerned about competing with Zotero, beyond the fact that synching with Zotero libraries is troublesome due to having to run a Firefox instance. They did mention that they thought it was a big plus that Mendeley has both a standalone client and a fully featured web interface, rather than being anchored to a single browser. For those of us using Chrome (or Safari, Opera, or even IE), that's been a big plus.

My sense is that Mendeley limits file sharing to 10 people due to copyright restrictions, but the citations themselves are of course shareable to anyone. Does Zotero allow you to share files, including copyrighted article PDFs, without limit? If so, how are copyright restrictions handled?

As far as how they expect to compete with Zotero, they didn't tackle that question directly. But it's telling that Mendeley and Zotero both compare themselves to iTunes rather than an open source alternative ("The MPlayer of research"?). Open source and closed source models tend to excel at different things. In this case, I'd expect that Mendeley will use its second-round funding to establish the institution-level licensing model, allowing institutions to efficiently share copyrighted materials, then use the proceeds to subsidize the free aspect of its service as well as underwrite further development. Whether that will ultimately prevail against an open source model is anyone's guess, but I think there's enough room for two players in this space.

Mr. Gunn said...

I think Mendeley and Zotero are suited to different groups of users. I mean, not everyone can or will want to run their reference manager as a Firefox extension.

The main difference is the last.fm angle. While both Zotero and Mendeley work great as paper and research material organizers, Mendeley goes a step further with exposing the data on what papers are being read the most, bookmarked the most, and potentially cited the most. They also highlight not only the open access papers they make freely available for download, but the self-archived papers, which may have been published in a toll-access journal, but which you can get as a reprint directly from the author's profile on Mendeley Web.

So while it's true that Mendeley isn't open source, I think that's kinda missing the point. The objects that need to be open for many academics are the research papers and other data objects, and Mendeley is simply the service that helps make those objects available.

To me, it's more about open access than open source, and Mendeley is certainly taking the lead in that area. They can earn a profit from their data services to institutions (in a similar way to Nielsen Media) while the users continue to enjoy free access to the content.

Unknown said...

Mr. Gunn: it's not clear to me how Mendeley is more open access than Zotero (or any other software), since it's a proprietary solution, with restrictive terms of service (it seems users are not even allowed to open data files), and no API at all. Will the API be free to everyone? Or will it cost money? Your Blogger profile says you work for Mendeley, so I think I'll take your insistence that Mendeley is "taking the lead" with a grain of salt.

Clay: As far as I know, Zotero doesn't put any restrictions on group sizes or file sharing. For Mendeley to say that limiting the group size avoids copyright problems is either naive or completely misleading. Sharing a file with one unauthorized person is still unauthorized. It seems a lot more likely that they have artificially limited groups to 10 people in order to be able to sell larger groups. That's perfectly reasonable as a business model, but they're not being up front about it if they're telling you that it's for copyright reasons.

On the open source front, what many people are finding troubling is Mendeley's free rider behavior. It would be like if OpenOffice came first, and then Microsoft repackaged that code for sale. This space doesn't really need another EndNote.

Clay Spinuzzi said...

Let me be clear that when I said that "My sense is that Mendeley limits file sharing ..." it was my inference rather than what Mendeley told me. So any naivete is mine; I'm certainly not trying to be misleading.

But I'm not clear about your accusation of free rider behavior. Mendeley and Zotero seem like distinct services operating in the same general space to me, more like Last.FM vs. Pandora than MS Office vs. OpenOffice or Twitter vs. identi.ca.

Clay Spinuzzi said...

Wait a minute. I was bothered by the "free rider" accusation, so I went back to look at the documented chronology for Zotero and Mendeley. And I have a question.

Zotero's changelog says that support for group libraries, including creating and joining public and private groups, was added on May 14, 2009:

http://www.zotero.org/support/changelog#changes_in_2.0rc1_january_26_2010

But Mendeley had online resource sharing - which is a central part of its service - from the beginning. That was the reason cited for its winning the Plugg startup rally - March 12, 2009.

http://plugg.eu/media/blog/p/detail/winners-for-plugg-start-ups-rally-2009-announced

Buckwheat, doesn't this suggest that although Zotero was around first, it added the robust community-sharing features -- the ones that I highlighted, the ones that make both services interesting, the ones that distinguish both from EndNote -- *after* Mendeley had won an award for those features? If so, doesn't that impact the "free rider" accusation you make? If not, could you provide a documented chronology?

Mr. Gunn said...

Buckwheat's quite right that this space doesn't need another Endnote, and it's quite clear the founders of Mendeley agree. Endnote, Mendeley, and Zotero are all fairly differentiated in their offerings and revenue plans.

I can briefly address the API questions:
Yes, there will be an API, through which all of a user's own data will always be freely available to them. In addition, Mendeley has supported from the very beginning export in a variety of formats. Of course, you're welcome to open and look at your own data files. It's just an SQLite database. Have at it!

We're always going to be criticized because we accepted venture capital in order to grow faster, but I think the important thing for researchers is that research materials and data objects are made freely available. That's what open access and open data are about and Mendeley wouldn't serve researchers well if it didn't realize this. We've said so from the very start.

Victor said...

Hi Buckwheat,

Victor (one of the Mendeley founders Clay interviewed) here. I'd like to answer some of your questions and clarify others.

First, as Clay and Mr Gunn have written, we will have an API in the near future, and access to your individual and shared data will of course be free. Access to much of the aggregate data and statistics will also be free, but not without limits - both because we need to limit queries to our servers for performance reasons, but also because some of it will be part of future premium packages.

Second, I also don't understand your "freerider" claim. We used some of Zotero's Word/OpenOffice plugin code, which is licensed under the ECL license, in our Word/OO plugins, which are also open-sourced under the ECL license. This re-use (which to me seems to be in the spirit that open source intended) is acknowledged in the license, in our software's "About" dialog, and on our website. Moreover, we are developing an open-source (ECL-licensed) WYSIWYG citation style editor: http://bitbucket.org/csledit/csl-wysiwyg-editor/. This editor will produce Citation Style Language (CSL) files which will benefit both Zotero and Mendeley.

Third, file sharing. Contrary to what you are saying, the number of people you are sharing with does indeed matter. There are no hard rules, of course, but when arguing that scholarly sharing falls under the "fair use" exemption, it makes a difference whether the sharing is limited versus unlimited.

Best,
Victor

Dan Cohen said...

FWIW, as Buckwheat notes, most of the functionality in Mendeley has been part of the vision of Zotero since 2006, when we were funded to add cloud syncing, social features like groups, and recommendations. I don't much care about precedence, but while I'm at it I would also note that I've been using "iTunes for your research" to describe Zotero since 2006. As they say, imitation is the sincerest form of flattery, and so I guess we're feeling very flattered over here at Zotero HQ.

Anyway, we're delighted that the Zotero project sparked innovation in a software field that was stagnant, and look forward to continuing to build our open platform for research with new functionality such as the features that will result from our alliance with the Internet Archive.

Victor said...

Dan - so I guess the team that must be blushing up to their ears right now for all the iTunes flattery is at the Apple HQ in Cupertino :-)

Sean said...

I'm also not sure what exactly Buckwheat had in mind when s/he called Mendeley a "free rider," but the company has sought to profit from open-source software in ways that have at times flouted the law and continue to run counter to the spirit of free and open-source software. For example, Mendeley for months distributed without any attribution hundreds of style files that it downloaded from zotero.org. Even today, there is no signal on Mendeley's style page of the provenance or authorship of these styles. Some were produced by Zotero developers, and the rest came from Zotero users. To my knowledge not a single Mendeley user or developer has ever produced one of these (now well over a thousand) style files, which are absolutely essential for the generation of properly formatted references. Moreover, the code behind the actual insertion of these references into word-processing documents was also borrowed from Zotero, again initially without any attribution. Only when confronted by multiple parties did Mendeley finally begin to admit such use. Needless to say, this core functionality is at the very heart of research management software. It is true that Mendeley has undertaken to develop a graphical interface to allow users to edit these files, but this project has not yet born any fruit.

Mendeley's devil-may-care attitude toward open-source software would be insignificant if the company did not also lay claim to supporting open access. As a scholar I find this attempt to bear the mantle of open access deeply troubling. For example, Mendeley frequently trumpets its "synchronization" with Zotero and CiteULike. Yet this sync is, perhaps unsurprisingly, only one-way, which cannot but suggest the desire for vendor lock-in. One might want to review the company's terms of use should one ever want to get all of one's data out of Mendeley. For example, when using Mendeley you agree not to do any tinkering with the software at all. Mendeley might reply that they support exporting data via such formats as BibTeX, but these formats are anything but lossless and do not really represent all of a user's "data," which is not just bibliographic information but more importantly includes things like the way that a user has structured that information into collections, annotated it, or any of the myriad other ways that "research" is not just a list of citations.

Even more disturbing are Mendeley's provisions regarding "acceptable use" and "content standards." It would seem, for example, that even using Zotero or other software to access mendeley.com is prohibited to Mendeley users:

"You may not use any system, software or other device or program (whether automated or otherwise and including any spider, bot, scraping or data extraction tools) to extract content, data, information or other material from the Site for commercial or any other purposes without our express prior written permission."

The weirdness only continues with the requirement that content stored at mendeley.com must "be accurate," whatever that means.

Should Mendeley decide that you have violated any of these provisions, they will proceed with "legal action against you including proceedings for reimbursement of all costs on an indemnity basis" and disclose your identity to law enforcement authorities and any third party making claims against you. Mendeley even goes so far as to make wild claims about who can link to its website and how. I wonder if even this blog comment qualifies as actionable in their minds?

It's certainly possible that Mendeley will ultimately make the right decisions, but for now all concrete evidence points to the contrary.

Mr. Gunn said...

Sean -

It's clear that there's some ill-will regarding the use of the site styles, but I don't think the past lack of attribution (they do include attribution now) is evidence of malice or any kind of "devil may care" attitude on Mendeley's behalf. In fact, it's my understanding that they reached out to Zotero multiple times and were either ignored or rebuffed regarding their involvement.

I can address the availability of user data. As Mendeley's founders have said, and as I mentioned above, all a user's data will always be available to them. They don't include the caveat of only what comes out in the lossy .ris format, so one has to take them on their word that it includes everything, including annotations and other added information.

I shouldn't be commenting here, but it's the claims that Mendeley is somehow against open access (supported by rather selective quoting from the terms of use) that are the most personally risible. Mendeley highlights open access and self-archived papers directly on the site itself, and has plans for the research stats to correct the long-standing undervaluing of publications in open access vehicles.

I've heard these same arguments before (and if this goes true to form the next step would be to suggest I'm only saying this because I work for Mendeley) so rather than re-post my bona fides, I'll just link to this tidy summary at the Chronicle forums.

I think the best way forward now is to have an open and honest discussion about what the next steps should be.

Victor said...

Hi Sean,

good to hear from you again. As William has pointed out, I've been in touch with you directly in the past about a collaboration on many of these issues (API, two-way sync, citation styles). You didn't seem interested in a dialogue then, and from your continuous referral to Mendeley in the third person (as if William and I weren't here), I gather that this unfortunately hasn't changed.

I'd prefer that any issues that you have with Mendeley would be resolved directly instead of via Clay's blog, so I'll just say a few things for the record:

As required by your ECL license, the use of your code had been acknowledged in the license in Mendeley's installation folder from the start. After you requested a link from the "About" dialogue and the website, we added those as well.

Also, I have to admit to finding it slightly ironic that you would criticise Mendeley for using "Zotero's" style files, when Thomson Reuters sued Zotero over it's conversion/use of "EndNote's" style files. Isn't the point of CSL that citation styles shouldn't belong to anyone, and shouldn't lock you in with a particular vendor? That's also why we are developing the open-source citation style editor, which we invited you to collaborate on, but you declined, only to say now that it has "not yet born any fruit". I beg to differ - you can clone the repository here: http://bitbucket.org/csledit/csl-wysiwyg-editor/ or play with the work-in-progress here: http://csleditor.quist.de/csleditor/show/1/example-citation-style.

Regarding the "data lock-in" you're suggesting: Besides BibTeX, RIS and the upcoming open API, Mendeley also uses a simple SQLite database for storing data locally, not a proprietary format. As I have mentioned to you before, the one-way sync with CiteULike was their proposal to get the collaboration off the ground quickly while they prepare their API, and we're still jointly aiming for a two-way sync. Also, working on a two-way sync with Zotero was what I proposed to you several months ago, because we didn't have a Firefox plugin to access your API. We are also involved in several open access initiatives right now, but you might say that they have "not yet born any fruit" - please bear with us, we're working on it.

Lastly, the legal terms you are referring to. These are, of course, for protecting Mendeley from malicious uses of our software and website, not for suing people randomly.

To underline what William said - we don't wish to pick a fight with anyone, and we'd still love to meet and talk about a common way forward if you're interested.

Best wishes,
Victor

Sean said...

I didn't use anyone's names because I didn't want to get into ad hominem attacks. And why bother? It's abundantly clear that Mendeley's words speak for themselves, whether the onerous terms of use or the as-yet-unfounded claims to support open access.

They don't include the caveat of only what comes out in the lossy .ris format, so one has to take them on their word that it includes everything, including annotations and other added information.

Exactly.

Also, I have to admit to finding it slightly ironic that you would criticise Mendeley for using "Zotero's" style files, when Thomson Reuters sued Zotero over it's [sic] conversion/use of "EndNote's" style files.

The above remark once again betrays a complete lack of understanding of FOSS. No one from the Zotero project has ever claimed ownership of CSLs, and why would we? And how perfectly tone deaf to raise the specter of a lawsuit based on the purported violation of a proprietary software license.

Lastly, the legal terms you are referring to. These are, of course, for protecting Mendeley from malicious uses of our software and website, not for suing people randomly.

Translation: "Trust me."

Frank Bennett said...

Hi, my name is Frank Bennett, I'm the author of the recently released citeproc-js processor that implements CSL 1.0. I wasn't part of the community when the events discussed here took place, and I don't have any views on who said what to whom at this or that point in time. I'm excited, though, about the possibilities of the CSL standard -- excited enough to invest several hundred hours of my time over the past year in design discussions and work on the new processor. I'm not alone, and that's what makes the development process so rewarding.

Personally, I'm glad to see that CSL is finding service in multiple contexts, but it's early days for the standard, and promotion is certainly an issue.

On that count, and although one might take issue with the strident tone of some of Sean's comments above, I must say that I see a lot more support for CSL as a standard coming from Zotero than I do from Mendeley. Apart from the fact that Zotero developers got the ball rolling by building the original 0.8 processor and exposing the source (which led directly to my own recent coding efforts), Zotero users are generally aware that citation formatting is driven by CSL, and that it is an active project in its own right. Mendeley, not so much, unfortunately. In fact, I was a bit surprised at the result of the following Google searches:

Zotero
"csl style" site:zotero.org: 444 hits
"csl styles" site:zotero.org: 229 hits
"citation style language" site:zotero.org: 1030 hits

Mendeley
"csl style" site:mendeley.com: 4 hits
"csl styles" site:mendeley.com: 0 hits
"citation style language" site:mendeley.com: 7 hits

One of the hits is from a 2008 release notice that indicates the processor was not touched in an upgrade. Another, from this year, mentions CSL in a bugfix listing (there was apparently a problem with non-latin characters). All other mentions, sparsely scattered though they be, are from the user forums. There is plenty of interest and pressure for the modification and extension of "styles", but judging from forum postings, there seems little awareness in the Mendeley community that these files are based on CSL, a general standard, with its own existence independent of project Mendeley.

I was a little disappointed to see these results, I'm afraid. If a community of users is unaware of our very existence, that's a community of potential contributors to development and evangelism that we're losing out on.

It would be nice to see a little more balance in the numbers above one day. But for the present, I'd have to side with Sean, rather, and say that if Mendeley is committed to open systems, that message doesn't seem to be communicating itself to the user base. I think you need to push it a bit more for it to reach them.

It's probably not necessary to mention, but citeproc-js carries an attribution license that is meant to help give the dissemination and publicity side of CSL a little boost. Basically, anyone who uses the processor code in production, in a client or on a server, needs to provide a visible reference to CSL, and a link to the project home at http://citationstyles.org/. It's an obvious thing that one might choose to do anyway (Thomson Reuters Scientific lawsuit aside, CSL is an intelligent choice, and it's in your interest to tell your users and investors that you've opted for it). It's also good for everyone, because we get network effects as the standard spreads.

But more immediately, support for CSL 0.8 styles will dry up pretty rapidly when Zotero moves to 1.0, there's no reverse style converter, implementing CSL 1.0 is non-trivial (been there, done that), and encapsulating the citeproc-js processor would save developer time that might be better placed elsewhere. So there are multiple incentives for attribution, some of them purely pragmatic.

Will certainly be looking forward to future developments!

Mr. Gunn said...

Just to briefly update this comment thread: Mendeley has now implemented support for CSL 1.0 and is actively working with the developers to allow people to author their own citation styles and submit them to the CSL repository. This should serve as an effective way to promote the standard.

Also worth mentioning: The API is now live at http://dev.mendeley.com and the research catalog is CC-licensed. I would think that sufficiently addresses any concerns about data portability.

Frank Bennett said...

Yes indeed! Carles Pina has provided invaluable feedback on the citeproc-js processor in recent months.

Clay Spinuzzi said...

Thanks for the update!