2010 release of the EVEX dataset

In 2010, the Turku Event Extraction System, together with BANNER and the McClosky-Charniak parser, was applied to all abstracts in the 2009 distribution of PubMed. The resulting dataset consists of 19.2 million event occurrences. The data is available in text format following the BioNLP'09 Shared Task conventions. In addition to the full Shared Task format data, we provided a normalized version of the events in an XML format. This XML version has been superseded for all practical purposes by the MySQL-format release in 2011. The original files remain available for download for archival purposes but should not be used for new work.
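
For readers who still need to inspect the archived text-format files, the following is a minimal sketch of reading Shared Task style standoff annotations in Python. It assumes the common layout of that format: tab-separated lines in which `T` identifiers mark text-bound annotations and `E` identifiers mark events. The function name `parse_standoff`, the two record classes, and the sample lines are hypothetical and are not taken from the EVEX files; the exact file layout of the 2010 release should be checked against the downloaded data.

```python
from dataclasses import dataclass, field


@dataclass
class TextBound:
    """A text-bound annotation (entity or event trigger)."""
    id: str
    type: str
    start: int
    end: int
    text: str


@dataclass
class Event:
    """An event: a typed trigger plus (role, target id) arguments."""
    id: str
    type: str
    trigger: str
    args: list = field(default_factory=list)


def parse_standoff(lines):
    """Parse standoff lines into dicts of text-bound annotations and events.

    Assumed line shapes (hypothetical helper, not part of any EVEX tooling):
      T1<TAB>Type start end<TAB>text
      E1<TAB>Type:TriggerId Role:TargetId ...
    Other line types (for example modifiers) are skipped.
    """
    text_bounds, events = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        ident, _, rest = line.partition("\t")
        if ident.startswith("T"):
            span, _, text = rest.partition("\t")
            type_, start, end = span.split(" ")
            text_bounds[ident] = TextBound(ident, type_, int(start), int(end), text)
        elif ident.startswith("E"):
            parts = rest.split()
            type_, trigger = parts[0].split(":")
            args = [tuple(p.split(":")) for p in parts[1:]]
            events[ident] = Event(ident, type_, trigger, args)
    return text_bounds, events


# Invented sample for the sentence "STAT3 phosphorylation"; not real EVEX data.
sample = [
    "T1\tProtein 0 5\tSTAT3",
    "T2\tPhosphorylation 6 21\tphosphorylation",
    "E1\tPhosphorylation:T2 Theme:T1",
]
text_bounds, events = parse_standoff(sample)
```

Here `events["E1"]` refers to its trigger and its Theme argument by identifier, so the trigger word and the protein name can be looked up in `text_bounds`. For new work, the MySQL-format release remains the recommended source.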

Download

We strongly recommend that you subscribe to our low-traffic, announcement-only mailing list, where we announce major releases and updates of the EVEX dataset. To subscribe, simply send an empty email to evexdb+subscribe@googlegroups.com.

Proceed to the download page of the 2010 release of the data.

License

The extracted events are licensed under the Creative Commons Attribution-ShareAlike license, which allows free use of the data.
PubMed information contained in the event output, such as the abstract texts, is covered by the National Library of Medicine's terms.

Publications

Björne J, Ginter F, Pyysalo S, Tsujii J, Salakoski T (2010)
Scaling up Biomedical Event Extraction to the Entire PubMed.
Proceedings of BioNLP 2010. [PDF]