Semantic information in deep transformer models
Postdoctoral Researcher at SISSA, Trieste
amascare [at] sissa [dot] it
I am a postdoctoral researcher at SISSA in Trieste, Italy. I obtained my Ph.D. in Statistical Sciences from the University of Padua and hold an M.Sc. in Mathematical Engineering from Politecnico di Milano. During my Ph.D., I spent six months at Rice University in Houston.
Outside of work, I enjoy swing dancing, creative writing, and reading.
My primary research focus is Bayesian models and algorithms for structured data. The common thread is that explicitly modelling the dependence between observations buys efficiency: narrower credible intervals, better imputation, and inference that remains reliable in high dimensions. My Ph.D. research developed this idea in two directions. For envelope models, I studied mixtures of envelopes, posterior inference on the envelope dimension, and computationally cheaper relaxations of their geometric constraints. For trend filtering on graphs, I built shrinkage priors from the differential operators of the graph.
At SISSA, in Alessandro Laio's group, I study how to compare representations of the same inputs: two layers of a network, two models, two languages. Using the information imbalance, a measure of how well the neighbourhood structure of one representation predicts that of another, we have shown that the semantic information of large language models is spread across many tokens, concentrates in a set of central layers, and is systematically asymmetric across languages, modalities, and model scales. I am currently working on a probabilistic formulation of this problem, casting the alignment of metrics across representations as Bayesian inference.
Much of this work grows out of applied collaborations on input–output tables (the subject of my M.Sc. thesis), neurological signals, clinical prediction, and GPS mobility data. I am always interested in new ones.