We propose and implement an alternative source of contextual features for word similarity detection, based on the notion of lexico-grammatical construction. Assuming that selectional restrictions indicate the semantic similarity of words attested in selected positions, we extend the notion of selection beyond single selecting heads to multiword constructions that exert selectional preferences. Our model of 92 million cross-indexed hybrid n-grams (our machine-tractable proxy for constructions), extracted from the British National Corpus (BNC), provides the source of contextual features. We compare the results with those of a grammatical dependency approach (Lin 1998), testing both against WordNet-based similarity rankings (Lin 1998; Resnik 1995). Averaged over the entire set of target nouns and their 10-best candidate similar words, Lin's approach gives overall similarity results closer to the WordNet rankings than the constructional approach does, whereas the constructional approach overtakes Lin's in approximating WordNet similarity for target nouns with a frequency over 3000. Although this suggests that constructional features suffer from a sparseness that resolves for higher-frequency nouns, constructions used as shared contextual features render a much higher yield than grammatical relations in approximating WordNet similarity. We examine some cases in detail, showing the kinds of similarity detected by the constructional approach that go undetected by the grammatical relations approach, by WordNet, or by both, and are thus overlooked in benchmark evaluations.
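The abstract does not spell out how constructional features feed a similarity computation, so the following Python sketch shows one possible pipeline under stated assumptions. The hybrid n-gram format (each non-target position rendered as either its word or its bracketed POS tag, with the target slot marked `_`), the positive-PMI weighting, the toy counts, and the helper names `hybrid_contexts`, `pmi_weights`, `lin_similarity` and `top_k` are all hypothetical and not taken from the paper. The scoring function follows the general form of Lin's (1998) information-based measure (information carried by shared features divided by the total information in both words' features), here applied to construction-like features rather than dependency triples.

```python
import math
from collections import Counter, defaultdict
from itertools import product


def hybrid_contexts(tagged, i, n=3):
    """Return hypothetical hybrid n-gram features for the token at index i.

    `tagged` is a list of (word, POS) pairs. Every n-token window that
    covers position i yields all variants in which each other position is
    either its word or its bracketed POS tag; the target slot becomes "_".
    """
    feats = set()
    for start in range(i - n + 1, i + 1):
        end = start + n
        if start < 0 or end > len(tagged):
            continue  # window would run off the sentence
        options = []
        for j in range(start, end):
            word, tag = tagged[j]
            options.append(["_"] if j == i else [word, f"[{tag}]"])
        for combo in product(*options):
            feats.add(" ".join(combo))
    return feats


def pmi_weights(pairs):
    """Map each noun to {feature: PMI}, keeping only positive PMI values."""
    pair_counts = Counter(pairs)
    noun_counts = Counter(noun for noun, _ in pairs)
    feat_counts = Counter(feat for _, feat in pairs)
    total = len(pairs)
    weights = defaultdict(dict)
    for (noun, feat), count in pair_counts.items():
        pmi = math.log(count * total / (noun_counts[noun] * feat_counts[feat]))
        if pmi > 0:
            weights[noun][feat] = pmi
    return dict(weights)


def lin_similarity(w1, w2):
    """Lin (1998)-style score: information in shared features over the
    total information in both words' feature sets (range 0 to 1)."""
    shared = w1.keys() & w2.keys()
    num = sum(w1[f] + w2[f] for f in shared)
    den = sum(w1.values()) + sum(w2.values())
    return num / den if den else 0.0


def top_k(target, weights, k=10):
    """Rank the other nouns by similarity to `target` (10-best by default)."""
    vec = weights.get(target, {})
    scores = [(other, lin_similarity(vec, other_vec))
              for other, other_vec in weights.items() if other != target]
    scores.sort(key=lambda s: (-s[1], s[0]))
    return scores[:k]


if __name__ == "__main__":
    # Invented (noun, feature) observations, one tuple per occurrence.
    pairs = ([("tea", "a cup of _")] * 3 + [("coffee", "a cup of _")] * 3
             + [("car", "drive the _")] * 3)
    w = pmi_weights(pairs)
    print(top_k("tea", w, k=2))  # [('coffee', 1.0), ('car', 0.0)]
```

Keeping `pmi_weights` and `lin_similarity` fixed while swapping `hybrid_contexts` for a dependency-triple extractor would give a rough analogue of the grammatical-relations baseline, which is the kind of like-for-like comparison the abstract describes.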
| Field | Value |
|---|---|
| Number of pages | 9 |
| State | Published - 2013 |
| Event | 2013 Joint Symposium on Semantic Processing. Textual Inference and Structures in Corpora, JSSP 2013 - Trento, Italy |
| Duration | 20 Nov 2013 → 22 Nov 2013 |