TY - GEN
T1 - CiteSeerX
T2 - 28th AAAI Conference on Artificial Intelligence, AAAI 2014, 26th Innovative Applications of Artificial Intelligence Conference, IAAI 2014 and the 5th Symposium on Educational Advances in Artificial Intelligence, EAAI 2014
AU - Wu, Jian
AU - Williams, Kyle
AU - Chen, Hung Hsuan
AU - Khabsa, Madian
AU - Caragea, Cornelia
AU - Ororbia, Alexander
AU - Jordan, Douglas
AU - Giles, C. Lee
N1 - Publisher Copyright: Copyright © 2014, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2014
Y1 - 2014
N2 - CiteSeerX is a digital library search engine that provides access to more than 4 million academic documents with nearly a million users and millions of hits per day. Artificial intelligence (AI) technologies are used in many components of CiteSeerX, e.g., to accurately extract metadata, intelligently crawl the web, and ingest documents. We present key AI technologies used in the following components: document classification and deduplication, document and citation clustering, automatic metadata extraction and indexing, and author disambiguation. These AI technologies have been developed by CiteSeerX group members over the past 5-6 years. We also show the usage status, payoff, development challenges, main design concepts, and deployment and maintenance requirements. While it is challenging to rebuild a system like CiteSeerX from scratch, many of these AI technologies are transferable to other digital libraries and/or search engines.
AB - CiteSeerX is a digital library search engine that provides access to more than 4 million academic documents with nearly a million users and millions of hits per day. Artificial intelligence (AI) technologies are used in many components of CiteSeerX, e.g., to accurately extract metadata, intelligently crawl the web, and ingest documents. We present key AI technologies used in the following components: document classification and deduplication, document and citation clustering, automatic metadata extraction and indexing, and author disambiguation. These AI technologies have been developed by CiteSeerX group members over the past 5-6 years. We also show the usage status, payoff, development challenges, main design concepts, and deployment and maintenance requirements. While it is challenging to rebuild a system like CiteSeerX from scratch, many of these AI technologies are transferable to other digital libraries and/or search engines.
UR - http://www.scopus.com/inward/record.url?scp=84908200442&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84908200442
T3 - Proceedings of the National Conference on Artificial Intelligence
SP - 2930
EP - 2937
BT - Proceedings of the National Conference on Artificial Intelligence
PB - AI Access Foundation
Y2 - 27 July 2014 through 31 July 2014
ER -