MSc Thesis Project: Attribution Study of Large Language Models
Scientific writing requires a combination of composing novel statements and linking these with published facts. Modern natural language processing (NLP) systems tend to focus on one or the other of these aspects, but not both. Language models such as GPT-2 are good at producing abstractive statements, but they do not indicate which source should be cited as evidence. When applying a large language model like GPT-2 or T5 to generate a scientific statement, we are interested in identifying the samples in the training dataset that contribute most to the generation of this statement.
Finding relevant papers for statements that are missing proper citations has been previously addressed as a citation recommendation task. Using the statement text as a query, we can find relevant papers from a paper database using techniques such as embedding-based nearest neighbor search or neural network-based scoring systems. Although this text retrieval scheme is effective, it does not help to answer our previous question, because the retrieved papers are not guaranteed to be the ones most responsible for the generation of the statement.
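As a rough illustration of the retrieval scheme described above, the following minimal sketch ranks papers by cosine similarity to the query statement. It is a toy example under stated assumptions: the bag-of-words `embed` function stands in for a learned text encoder, and the function names (`embed`, `cosine`, `recommend`) and the three-entry paper database are hypothetical, not part of any existing system.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (token -> count).

    Assumption: a real system would use a learned encoder instead.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(statement, papers, k=2):
    """Return the titles of the k papers most similar to the statement."""
    query = embed(statement)
    scored = [(cosine(query, embed(abstract)), title)
              for title, abstract in papers.items()]
    scored.sort(reverse=True)
    return [title for _, title in scored[:k]]


# Hypothetical paper database: title -> abstract text.
papers = {
    "paper_a": "influence functions trace model predictions back to training data",
    "paper_b": "convolutional networks for image classification",
    "paper_c": "citation recommendation with neural text retrieval",
}

top = recommend("which training data shaped the model predictions", papers, k=1)
print(top)
```

Note that the ranking depends only on the similarity between the statement and each paper. Nothing in it refers to the language model that generated the statement, which is why such a scheme cannot tell us which training samples were responsible for the generation.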
In this project, we want to address this problem by quantitatively analyzing which training samples are most responsible for the generation of a specific text. Our idea is inspired in part by the work of Koh and Liang [1], who proposed an influence function that models the impact of a small perturbation of a given training example on prediction performance, typically in classification tasks. Here we plan to apply this technique to text generation tasks, use the influence function as a measure to score and rank training examples, and finally evaluate the effectiveness of this approach for finding reasonable supporting documents for a generated text.
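To make the scoring idea concrete, here is a minimal sketch of the influence of upweighting a training example z on the loss at a test example, following the form given in [1]: minus the test-loss gradient, times the inverse Hessian of the training loss, times the gradient of the loss at z. The setting is an assumption chosen for readability, not the project's method: a one-parameter least-squares model y = theta * x, for which the Hessian is a scalar. All function names and the data are hypothetical. A text generation model would instead require approximate inverse-Hessian-vector products over millions of parameters.

```python
def fit_theta(train):
    """Closed-form least-squares fit of y = theta * x."""
    return sum(x * y for x, y in train) / sum(x * x for x, _ in train)


def grad(theta, point):
    """Gradient of the loss 0.5 * (theta * x - y)**2 with respect to theta."""
    x, y = point
    return (theta * x - y) * x


def hessian(train):
    """Second derivative of the mean training loss (here: the mean of x**2)."""
    return sum(x * x for x, _ in train) / len(train)


def influence_on_test_loss(train, test_point):
    """Influence of upweighting each training point on the test loss.

    Scalar version of: -grad(test)^T * H^{-1} * grad(z).
    A negative score means upweighting z would lower the test loss.
    """
    theta = fit_theta(train)
    h = hessian(train)
    g_test = grad(theta, test_point)
    return [-g_test * grad(theta, z) / h for z in train]


# Toy data (hypothetical): (x, y) pairs.
train = [(1.0, 1.0), (2.0, 2.0), (3.0, 6.0)]
test_point = (3.0, 5.0)

scores = influence_on_test_loss(train, test_point)
# Rank training examples from most helpful (most negative score) upward.
ranking = sorted(range(len(train)), key=lambda i: scores[i])
print(scores, ranking)
```

In this toy setting the third training point ranks first: it pulls the fitted slope upward, toward the test target. Whether to rank by signed score or by absolute value is a design choice that depends on whether harmful examples should also count as "responsible".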
MSc students will be primarily involved in learning about influence functions, exploring ways to apply influence function techniques to text generation tasks, and finally identifying the training examples most responsible for a generated text, such as a generated summary or citation sentence. The ideal candidate for this position has a solid background in machine learning and Python programming, and a strong desire to learn new things. Knowledge of machine learning frameworks such as PyTorch or TensorFlow is a plus.
Keywords: natural language processing, citation recommendation, explainable AI, adversarial attacks, information retrieval.
Contact: Jessica Lam Jia Hong (lamjessica (at) ini.ethz.ch), Nianlong Gu (nianlong (at) ini.ethz.ch), Richard Hahnloser (rich (at) ini.ethz.ch).
References
[1] Koh, Pang Wei, and Percy Liang. "Understanding black-box predictions via influence functions." International Conference on Machine Learning. PMLR, 2017.