Towards building a scholarly big data platform: Challenges, lessons and opportunities

Zhaohui Wu, Jian Wu, Madian Khabsa, Kyle Williams, Hung Hsuan Chen, Wenyi Huang, Suppawong Tuarob, Sagnik Ray Choudhury, Alexander Ororbia, Prasenjit Mitra, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

45 Scopus citations

Abstract

We introduce a big data platform that provides various services for harvesting scholarly information and enabling efficient scholarly applications. The core architecture of the platform is built on a secured private cloud. It crawls data using a scholarly focused crawler that leverages a dynamic scheduler, processes the data with a MapReduce-based crawl-extraction-ingestion (CEI) workflow, and stores the results in distributed repositories and databases. Services such as scholarly data harvesting, information extraction, and user information and log data analytics are integrated into the platform and provided through an OAI and RESTful API. We also introduce a set of scholarly applications built on top of this platform, including citation recommendation and collaborator discovery.
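
As a rough illustration of the crawl-extraction-ingestion (CEI) idea named in the abstract, the sketch below chains a crawl step, a map-style extraction step, and a reduce-style ingestion step over a tiny in-memory corpus. It is not the authors' implementation: the abstract gives no code, so every name here (FAKE_WEB, crawl, extract_map, ingest_reduce, run_cei), the URLs, and the "Title/Cites" document layout are assumptions made purely for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical stand-in for the web: URL -> document text.
# The real platform's crawler, scheduler and storage are not shown in the abstract.
FAKE_WEB: Dict[str, str] = {
    "http://example.org/a.pdf": "Title: Paper A\nCites: Paper B",
    "http://example.org/b.pdf": "Title: Paper B\nCites: Paper A",
    "http://example.org/c.pdf": "Title: Paper C\nCites: Paper A",
}


def crawl(urls: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Crawl step: 'fetch' each URL (here, a dictionary lookup); skip misses."""
    for url in urls:
        if url in FAKE_WEB:
            yield url, FAKE_WEB[url]


def extract_map(text: str) -> List[Tuple[str, str]]:
    """Map step: emit (cited_title, citing_title) pairs from one document."""
    fields = dict(line.split(": ", 1) for line in text.splitlines())
    return [(fields["Cites"], fields["Title"])]


def ingest_reduce(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Reduce step: group citing titles by cited title into a toy repository."""
    index: Dict[str, List[str]] = defaultdict(list)
    for cited, citing in pairs:
        index[cited].append(citing)
    return {cited: sorted(citing) for cited, citing in index.items()}


def run_cei(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Run crawl -> extract (map) -> ingest (reduce) sequentially."""
    pairs: List[Tuple[str, str]] = []
    for _url, text in crawl(urls):
        pairs.extend(extract_map(text))
    return ingest_reduce(pairs)


repository = run_cei(sorted(FAKE_WEB))
```

In a real deployment the map and reduce steps would presumably run in parallel across a cluster; the sequential loop here only shows how the three stages hand data to one another.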

Original language: English
Title of host publication: 2014 IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 117-126
Number of pages: 10
ISBN (Electronic): 9781479955695
State: Published - 1 Dec 2014
Event: 2014 14th IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014 - London, United Kingdom
Duration: 8 Sep 2014 to 12 Sep 2014

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
ISSN (Print): 1552-5996

Conference

Conference: 2014 14th IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014
Country/Territory: United Kingdom
City: London
Period: 8/09/14 to 12/09/14

Keywords

  • Big Data
  • Information Extraction
  • Scholarly Big Data
