Researchers are investigating ways to make large language models forget specific kinds of data; Salvatore Raieli explores this question from both a theoretical and a pragmatic perspective, with implications for data privacy and bias mitigation. Separately, new research proposes an architecture based on advective diffusion that combines the computational structure of message-passing neural networks (MPNNs) and Transformers.
Sequence Length Independent Norm-Based Generalization Bounds for Transformers. (arXiv:2310.13088v1 [stat.ML]) https://t.co/M3y1cHYYkn
To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets. (arXiv:2310.13061v1 [cs.LG]) https://t.co/gPibY9RD7o
Mean Estimation Under Heterogeneous Privacy Demands. (arXiv:2310.13137v1) https://t.co/qHgi2KF4Fr
Interaction Screening and Pseudolikelihood Approaches for Tensor Learning in Ising Models. (arXiv:2310.13232v1) https://t.co/8bezNr1NWd
Meta-learning of Physics-informed Neural Networks for Efficiently Solving Newly Given PDEs. (arXiv:2310.13270v1 [stat.ML]) https://t.co/wJbh43eUvg
Non-Negative Spherical Relaxations for Universe-Free Multi-Matching and Clustering. (arXiv:2310.13311v1 [stat.ML]) https://t.co/RzK91NkSM6
DeepFDR: A Deep Learning-based False Discovery Rate Control Method for Neuroimaging Data. (arXiv:2310.13349v1 [stat.ML]) https://t.co/HwXfx5a6iE
Optimal Best Arm Identification with Fixed Confidence in Restless Bandits. (arXiv:2310.13393v1 [stat.ML]) https://t.co/yZ0N8c99UQ
Calibrating Neural Simulation-Based Inference with Differentiable Coverage Probability. (arXiv:2310.13402v1 [stat.ML]) https://t.co/M4xzPMdWMz
Random Matrix Analysis to Balance between Supervised and Unsupervised Learning under the Low Density Separation Assumption. (arXiv:2310.13434v1 [cs.LG]) https://t.co/mnwUEGAVjr
Y-Diagonal Couplings: Approximating Posteriors with Conditional Wasserstein Distances. (arXiv:2310.13433v1 [cs.LG]) https://t.co/DLpcfvMS4D
Variational measurement-based quantum computation for generative modeling. (arXiv:2310.13524v1 [quant-ph]) https://t.co/sixo2KnFqM
Towards Understanding Sycophancy in Language Models. (arXiv:2310.13548v1 [cs.CL]) https://t.co/c4cWgCfoIV
Provable Benefits of Multi-task RL under Non-Markovian Decision Making Processes. (arXiv:2310.13550v1 [cs.LG]) https://t.co/ik2ZgRkXdS
Optimal Transport for Measures with Noisy Tree Metric. (arXiv:2310.13653v1 [stat.ML]) https://t.co/OjTwdGGloA
Deep neural networks can stably solve high-dimensional, noisy, non-linear inverse problems. (arXiv:2206.00934v5 UPDATED) https://t.co/pmKtZ72TWB
Interpretable Sequence Classification Via Prototype Trajectory. (arXiv:2007.01777v3 [cs.LG] UPDATED) https://t.co/BV3wtdb2wM
Event-Triggered Time-Varying Bayesian Optimization. (arXiv:2208.10790v4 [cs.LG] UPDATED) https://t.co/eseSL1chPD
Trade-off Between Efficiency and Consistency for Removal-based Explanations. (arXiv:2210.17426v3 [cs.LG] UPDATED) https://t.co/GTuVPVX6De
On the Overlooked Structure of Stochastic Gradients. (arXiv:2212.02083v3 [cs.LG] UPDATED) https://t.co/65UInPGogS
Kernel Ridge Regression Inference. (arXiv:2302.06578v2 UPDATED) https://t.co/TswNII1luj
Adaptive Selective Sampling for Online Prediction with Experts. (arXiv:2302.08397v2 [stat.ML] UPDATED) https://t.co/jo35kitJHh
A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning. (arXiv:2306.07818v2 [cs.LG] UPDATED) https://t.co/d8Khl3g1Ng
Verifiable Learning for Robust Tree Ensembles. (arXiv:2305.03626v3 [cs.LG] UPDATED) https://t.co/QjXj05jtn2
Trained Transformers Learn Linear Models In-Context. (arXiv:2306.09927v3 [stat.ML] UPDATED) https://t.co/Gb6dbOJNRj
Predicting Battery Lifetime Under Varying Usage Conditions from Early Aging Data. (arXiv:2307.08382v2 [cs.LG] UPDATED) https://t.co/F8voH3qEXN
On the quality of randomized approximations of Tukey's depth. (arXiv:2309.05657v2 [stat.ML] UPDATED) https://t.co/p6CXAoua0y
Modeling Supply and Demand in Public Transportation Systems. (arXiv:2309.06299v2 [cs.LG] UPDATED) https://t.co/5EzORfTw8b
On Double Descent in Reinforcement Learning with LSTD and Random Features. (arXiv:2310.05518v2 [cs.LG] UPDATED) https://t.co/KThYdhm9rK
Label Differential Privacy via Aggregation. (arXiv:2310.10092v2 [cs.LG] UPDATED) https://t.co/9dqd9fobO2
"The Effect of Spoken Language on Speech Enhancement using Self-Supervised Speech Representation Loss Functions. (arXiv:2307.14502v2 UPDATED)," George Close, Thomas Hain, Stefan Goetze, https://t.co/Rcmas47Hph
"Low-latency Speech Enhancement via Speech Token Generation. (arXiv:2310.08981v2 UPDATED)," Huaying Xue, Xiulian Peng, Yan Lu, https://t.co/ELyLU7m3Mb
In their new research, @mmbronstein, @qitianwu_, and @Chenxia58917359 explore "a new architecture based on advective diffusion that combines the computational structure of message-passing neural networks (MPNNs) and Transformers [...]." https://t.co/8pkWPEZA15
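The general idea of combining MPNN-style local computation with Transformer-style global computation can be sketched as a single update step on node features: a graph-Laplacian diffusion term smooths features along edges, while a dense attention term transports them between arbitrary node pairs. This is a minimal illustrative sketch of that generic combination, not the authors' actual architecture; the function name, the explicit Euler discretization, and the toy weights are all assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    # numerically stable row-wise softmax
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def advective_diffusion_step(X, A, Wq, Wk, dt=0.1):
    """One Euler step of a graph PDE with a local diffusion term
    (graph Laplacian, as in MPNNs) and a global attention-based
    transport term (as in Transformers). Illustrative only."""
    deg = A.sum(axis=1)
    L = np.diag(deg) - A                    # combinatorial graph Laplacian
    attn = softmax((X @ Wq) @ (X @ Wk).T)   # dense pairwise attention weights
    diffusion = -L @ X                      # smooths features along edges
    advection = attn @ X - X                # moves features between any node pair
    return X + dt * (diffusion + advection)

# toy example: 4-node path graph with 3-dimensional node features
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.standard_normal((4, 3))
d = X.shape[1]
Wq, Wk = rng.standard_normal((d, d)), rng.standard_normal((d, d))
X1 = advective_diffusion_step(X, A, Wq, Wk)
print(X1.shape)  # (4, 3)
```

The Laplacian term only lets information flow across existing edges (the MPNN-like part), while the attention matrix is dense and can couple any two nodes regardless of graph structure (the Transformer-like part); the step size `dt` controls how far one update moves the features.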
[CL] Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective M Zhong, C An, W Chen, J Han, P He [University of Illinois Urbana-Champaign & The University of Hong Kong & Microsoft Azure AI] (2023) https://t.co/r9Rg3XA2wO - The paper… https://t.co/wTRZlLSj84 https://t.co/31T4tkDia1
[LG] Approximating Two-Layer Feedforward Networks for Efficient Transformers R Csordás, K Irie, J Schmidhuber [The Swiss AI Lab IDSIA & Harvard University] (2023) https://t.co/AdUz3ppmaS - The paper presents a unified framework to understand methods for approximating two-layer… https://t.co/birWBw276Z https://t.co/zYJ8hi422C
How can we make large language models forget specific kinds of data? Salvatore Raieli explores a complex question from both a theoretical and pragmatic perspective. https://t.co/FLcgFwRM96