Stability of Sentiment Classification in Holocaust Oral Histories

arXiv cs.CL, 2026-04-01
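Using a corpus of Holocaust oral histories comprising 107,305 utterances and 579,013 sentences, the study evaluates three pretrained Transformer polarity classifiers, proposes an agreement-based ABC taxonomy for stratifying output stability, and uses a T5 emotion classifier for auxiliary analysis. Results show low-to-moderate inter-model agreement, with disagreement concentrated at the boundary between neutral and polar labels. The work offers an operational framework for cautious multi-model triangulation in sensitive historical narratives.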
Original content
arXiv:2603.28913v1 Announce Type: new
Abstract: Polarity detection becomes substantially more challenging under domain shift, particularly in heterogeneous, long-form narratives with complex discourse structure, such as Holocaust oral histories. This paper presents a corpus-scale diagnostic study of off-the-shelf sentiment classifiers on long-form Holocaust oral histories, using three pretrained transformer-based polarity classifiers on a corpus of 107,305 utterances and 579,013 sentences. After assembling model outputs, we introduce an agreement-based stability taxonomy (ABC) to stratify inter-model output stability. We report pairwise percent agreement, Cohen kappa, Fleiss kappa, and row-normalized confusion matrices to localize systematic disagreement. As an auxiliary descriptive signal, a T5-based emotion classifier is applied to stratified samples from each agreement stratum to compare emotion distributions across strata. The combination of multi-model label triangulation and the ABC taxonomy provides a cautious, operational framework for characterizing where and how sentiment models diverge in sensitive historical narratives. Inter-model agreement is low to moderate overall and is driven primarily by boundary decisions around neutrality.
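The agreement statistics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The label set (`negative`, `neutral`, `positive`) and all function names are assumptions made for illustration. The abstract does not define the ABC strata, so the mapping used here (A = all three models agree, B = exactly two agree, C = all three differ) is a hypothetical reading, not the paper's definition.

```python
from collections import Counter

# Assumed three-way polarity label set; the abstract does not list the labels.
LABELS = ("negative", "neutral", "positive")


def percent_agreement(a, b):
    """Fraction of items on which two models emit the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa for two label sequences (undefined if chance agreement is 1)."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the product of the two marginal distributions.
    p_e = sum(count_a[l] * count_b[l] for l in LABELS) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` holds one tuple of labels (one per model) per item."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()
    p_bar = 0.0
    for item in ratings:
        counts = Counter(item)
        totals.update(counts)
        # Per-item agreement: share of rater pairs that agree.
        p_bar += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    p_bar /= n_items
    p_e = sum((totals[l] / (n_items * n_raters)) ** 2 for l in LABELS)
    return (p_bar - p_e) / (1 - p_e)


def row_normalized_confusion(a, b):
    """Confusion matrix of model a (rows) vs model b (columns); rows sum to 1."""
    counts = {row: Counter() for row in LABELS}
    for x, y in zip(a, b):
        counts[x][y] += 1
    matrix = {}
    for row in LABELS:
        total = sum(counts[row].values())
        matrix[row] = {
            col: (counts[row][col] / total if total else 0.0) for col in LABELS
        }
    return matrix


def abc_stratum(item):
    """HYPOTHETICAL ABC mapping: A = unanimous, B = two agree, C = all differ."""
    return {1: "A", 2: "B", 3: "C"}[len(set(item))]


# Toy outputs from three hypothetical models on six sentences (not real data).
m1 = ["neutral", "negative", "positive", "neutral", "negative", "neutral"]
m2 = ["neutral", "negative", "neutral", "positive", "negative", "neutral"]
m3 = ["neutral", "neutral", "positive", "negative", "negative", "positive"]
triples = list(zip(m1, m2, m3))

print("pairwise agreement m1/m2:", percent_agreement(m1, m2))
print("Cohen's kappa m1/m2:", cohen_kappa(m1, m2))
print("Fleiss' kappa:", fleiss_kappa(triples))
print("strata:", Counter(abc_stratum(t) for t in triples))
```

Reading a row of `row_normalized_confusion(m1, m2)` shows where one model's label is redistributed by the other. Applied to real outputs, this is the kind of view that could localize the disagreement around neutrality that the abstract reports.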