High-Dimensional Representation of Human Values in LLMs

Samuel Cahyawijaya*, Delong Chen*, Yejin Bang*, Leila Khalatbari,
Bryan Wilie*, Ziwei Ji, Etsuko Ishii, Pascale Fung*
HKUST
NAACL 2025

*Indicates Equal Contribution

An interactive LLM human value visualizer built on our proposed high-dimensional human value embedding. The red marker shows the tested LLM's value position. You can test your own model with our visualizer.

Abstract

We introduce a high-dimensional neural representation capable of encapsulating symbolic human value distributions within Large Language Models (LLMs). It is a continuous, scalable representation learned in a self-supervised manner from value-relevant responses of 8 LLMs and evaluated across 15 open-source and commercial LLMs. By visualizing the human value embedding in LLMs, we shed light on the differences and similarities in value systems within LLMs, offering insights into their underlying value systems and their prioritization across different languages. This ultimately drives forward transparency and accountability in the design and deployment of LLMs.

Exploring the Human Value Embedding Map in LLMs


The human value embedding is trained through a surrogate task, called value embedding learning, which learns a compact representation that maximizes mutual information with the value-relevant aspects of an LLM while discarding confounding factors as much as possible. Using value-eliciting QAs, the embedding is learned via multi-view self-supervised learning: maximizing mutual information across views captures the value-relevant aspects shared by the two views while excluding non-shared factors.
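As a rough illustration of this objective, the following sketch uses a symmetric InfoNCE loss, a standard lower bound on mutual information commonly used for multi-view self-supervised learning. This is an assumption about the general training setup, not the paper's exact loss; the function name `info_nce` and all dimensions are hypothetical.

```python
import torch
import torch.nn.functional as F

def info_nce(z_a, z_b, temperature=0.07):
    """Symmetric InfoNCE loss between two views of the same LLM's responses.

    z_a, z_b: (batch, dim) embeddings of paired value-eliciting QA views.
    Row i of each tensor comes from the same LLM, so matching rows are
    positives and all other rows in the batch serve as negatives.
    """
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature      # (batch, batch) cosine similarities
    targets = torch.arange(z_a.size(0))       # positive pairs lie on the diagonal
    # symmetrize over the two views so neither direction dominates
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

# toy usage: two random "views" for 4 LLMs, 16-dim embeddings
loss = info_nce(torch.randn(4, 16), torch.randn(4, 16))
```

Minimizing this loss pulls the two views of the same LLM together in embedding space while pushing apart views from different LLMs, which is one concrete way to retain the shared value-relevant signal and discard view-specific factors.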

It is not a Sentence Embedding


To minimize the sharing of linguistic aspects across views, we translate all value-eliciting QAs into English and paraphrase them to remove language-specific markers and increase linguistic diversity. The embedding surpasses all baselines by ~15% k-NN accuracy and 10-15% linear-probing accuracy@10 on the LLM value identification task. In contrast, word and sentence embedding representations perform poorly, indicating a significant difference between our value representation and existing embedding representations.
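The k-NN and linear-probing protocol mentioned above can be sketched as follows with scikit-learn. The data here is a synthetic stand-in (8 "LLMs", each with clustered 32-d response embeddings), not the paper's embeddings or evaluation split; all array shapes and noise scales are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# synthetic stand-in: 8 "LLMs", each with 20 embedded responses in 32-d,
# clustered tightly around a per-LLM center so identification is learnable
rng = np.random.default_rng(0)
centers = rng.normal(size=(8, 32))
X = np.vstack([c + 0.1 * rng.normal(size=(20, 32)) for c in centers])
y = np.repeat(np.arange(8), 20)

# alternate samples into train/test halves
train = np.arange(len(y)) % 2 == 0

# k-NN evaluation: classify each held-out embedding by its nearest neighbors
knn = KNeighborsClassifier(n_neighbors=5).fit(X[train], y[train])
knn_acc = knn.score(X[~train], y[~train])

# linear probing: a linear classifier on frozen embeddings
probe = LogisticRegression(max_iter=1000).fit(X[train], y[train])
probe_acc = probe.score(X[~train], y[~train])
```

Both probes measure how linearly/locally separable the LLM identity is in the embedding space: high accuracy indicates the representation preserves model-specific value signal.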

Poster

BibTeX

@misc{cahyawijaya2024highdimension,
  title={High-Dimension Human Value Representation in Large Language Models},
  author={Samuel Cahyawijaya and Delong Chen and Yejin Bang and Leila Khalatbari and Bryan Wilie and Ziwei Ji and Etsuko Ishii and Pascale Fung},
  year={2024},
  eprint={2404.07900},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}