Analysis of the relationship between headings and texts

Scientific writing conveys complex content that is carefully organized for readability. For example, scientific papers are typically divided into sections whose headings hint at the content of each section. Well-formulated section headings not only highlight the main information of the section text but also make the paper easier to read. It is therefore worthwhile to study the relationship between section headings and section texts.

However, how section headings are produced remains understudied. Section headings can differ in length and purpose: they may briefly indicate the role of the section text (e.g., introduction, results, conclusions) or give an informative summary of it (e.g., transporting plasmids using lentiviruses). Different types of scientific papers (e.g., original articles, case reports, technical notes, pictorial essays, reviews) may also favor different types of section headings.

In this project, students will (i) investigate the underlying mechanism of generating section headings from section texts, and (ii) analyze how salient individual sentences in the texts are to the generated headings. In a similar vein, students will explore generating paper titles from diverse paper texts and analyze which sentences were most salient to the generated titles. Students are also expected to conduct a brief literature review of relevant state-of-the-art work and to propose novel exploratory approaches.

The ideal candidate for this project has a solid background in machine learning and Python programming, as well as a strong aptitude for learning new things. Familiarity with machine learning frameworks such as PyTorch is a plus.

For more background, please see references [1, 2, 3].

Keywords: natural language processing and generation, extractive/abstractive summarization, explainable AI modeling

Contact: Yingqiang Gao (yingqiang.gao@ini.ethz.ch), Jessica Lam (lamjessica@ini.ethz.ch), Richard Hahnloser (rich@ini.ethz.ch)

References:

[1] Koh PW, Liang P. Understanding Black-Box Predictions via Influence Functions. In: ICML'17. Sydney, NSW, Australia: JMLR.org, 2017, pp. 1885–1894.

[2] Li H, Einolghozati A, Iyer S, et al. EASE: Extractive-Abstractive Summarization with Explanations. arXiv preprint arXiv:2105.06982, 2021.

[3] Zhang R, Guo J, Fan Y, et al. Outline Generation: Understanding the Inherent Content Structure of Documents. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019, pp. 745–754.