Linguistically Conditioned Semantic Textual Similarity

Jingxuan Tu; Keer Xu; Liulu Yue; Bingyang Ye; Kyeongmin Rim; James Pustejovsky

doi:10.48550/arxiv.2406.03673

Preprint

06/05/2024

Abstract

Subjects: Computer Science - Artificial Intelligence; Computer Science - Computation and Language

Semantic textual similarity (STS) is a fundamental NLP task that measures the semantic similarity between a pair of sentences. To reduce the inherent ambiguity of the sentences, recent work has proposed Conditional STS (C-STS), which measures the similarity of two sentences conditioned on a given aspect. Despite the popularity of C-STS, we find that the current C-STS dataset suffers from various issues that could impede proper evaluation of this task. In this paper, we reannotate the C-STS validation set and observe annotator discrepancy on 55% of the instances, resulting from annotation errors in the original labels, ill-defined conditions, and a lack of clarity in the task definition. After a thorough dataset analysis, we improve the C-STS task by leveraging models' ability to understand the conditions in a QA task setting. Using the generated answers, we present an automatic error identification pipeline that identifies annotation errors in the C-STS data with an F1 score of over 80%. We also propose a new method that substantially improves performance over baselines on the C-STS data by training models with the answers. Finally, we discuss conditionality annotation based on the typed feature structure (TFS) of entity types, and show through examples that the TFS can provide a linguistic foundation for constructing C-STS data with new conditions.

Details

Title: Linguistically Conditioned Semantic Textual Similarity
Creators: Jingxuan Tu; Keer Xu; Liulu Yue; Bingyang Ye; Kyeongmin Rim; James Pustejovsky
Identifiers: 9924368787501921
Academic Unit: Michtom School of Computer Science; Benjamin and Mae Volen National Center for Complex Systems; Interdepartmental Program in Linguistics and Computational Linguistics
Language: English
Resource Type: Preprint