Stability of Sentiment Classification in Holocaust Oral Histories

arXiv cs.CL, 2026-04-01

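Using a corpus of Holocaust oral histories comprising 107,305 utterances and 579,013 sentences, the study evaluates three pretrained Transformer polarity classifiers, proposes an agreement-based ABC taxonomy to stratify output stability, and uses a T5 emotion classifier for auxiliary analysis. The results show low to moderate inter-model agreement, with disagreement concentrated at the boundary between neutral and polar labels. The work offers an operational framework for the cautious use of multi-model triangulation in sensitive historical narratives.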

Original text

arXiv:2603.28913v1 Announce Type: new
Abstract: Polarity detection becomes substantially more challenging under domain shift, particularly in heterogeneous, long-form narratives with complex discourse structure, such as Holocaust oral histories. This paper presents a corpus-scale diagnostic study of off-the-shelf sentiment classifiers on long-form Holocaust oral histories, using three pretrained transformer-based polarity classifiers on a corpus of 107,305 utterances and 579,013 sentences. After assembling model outputs, we introduce an agreement-based stability taxonomy (ABC) to stratify inter-model output stability. We report pairwise percent agreement, Cohen kappa, Fleiss kappa, and row-normalized confusion matrices to localize systematic disagreement. As an auxiliary descriptive signal, a T5-based emotion classifier is applied to stratified samples from each agreement stratum to compare emotion distributions across strata. The combination of multi-model label triangulation and the ABC taxonomy provides a cautious, operational framework for characterizing where and how sentiment models diverge in sensitive historical narratives. Inter-model agreement is low to moderate overall and is driven primarily by boundary decisions around neutrality.
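
The agreement statistics named in the abstract can be illustrated with a minimal sketch. It is not the authors' code. The three-way label set, the toy predictions, and all function names are assumptions made for illustration. In particular, the abstract does not define the ABC strata, so the `stratum` helper uses a hypothetical reading (A: all three models agree, B: exactly two agree, C: all three differ).

```python
from collections import Counter

# Assumed label set; the abstract only says "polarity" with a neutral class.
LABELS = ("negative", "neutral", "positive")

def percent_agreement(a, b):
    """Share of items on which two models output the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohen_kappa(a, b):
    """Chance-corrected agreement between two models.
    Undefined (division by zero) if expected agreement equals 1."""
    n = len(a)
    p_o = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[l] * cb[l] for l in LABELS) / (n * n)
    return (p_o - p_e) / (1 - p_e)

def fleiss_kappa(ratings):
    """Chance-corrected agreement for a fixed number of models per item.
    `ratings` is a list of tuples, one label per model."""
    n_items, n_raters = len(ratings), len(ratings[0])
    counts = [Counter(r) for r in ratings]
    # Per-item agreement, then its mean over items.
    p_i = [(sum(c[l] ** 2 for l in LABELS) - n_raters)
           / (n_raters * (n_raters - 1)) for c in counts]
    p_bar = sum(p_i) / n_items
    # Expected agreement from the pooled label proportions.
    p_j = [sum(c[l] for c in counts) / (n_items * n_raters) for l in LABELS]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

def row_normalized_confusion(a, b):
    """For each label of model a, the distribution of model b's labels."""
    rows = {l: Counter() for l in LABELS}
    for x, y in zip(a, b):
        rows[x][y] += 1
    return {l: {m: (rows[l][m] / sum(rows[l].values()) if rows[l] else 0.0)
                for m in LABELS} for l in LABELS}

def stratum(labels):
    """Hypothetical ABC rule: A = unanimous, B = two agree, C = all differ."""
    return {1: "A", 2: "B"}.get(len(set(labels)), "C")

# Toy predictions from three hypothetical models on six sentences.
m1 = ["negative", "neutral", "positive", "neutral", "negative", "positive"]
m2 = ["negative", "neutral", "positive", "positive", "neutral", "positive"]
m3 = ["negative", "negative", "positive", "neutral", "positive", "positive"]

items = list(zip(m1, m2, m3))
strata = [stratum(t) for t in items]
print(percent_agreement(m1, m2), cohen_kappa(m1, m2), fleiss_kappa(items))
print(strata)
```

Row-normalizing the confusion matrix makes it easy to read off, for each label one model assigns, how often a second model moves it to another class, which is where boundary disagreement around neutrality would show up.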