Posts
-
Experimental Findings on LLM Training Efficiency
Practical experiments to speed up LLM training: ISL, VSL+DD, cross‑vocab embeddings (~30% speedup), multi‑token, MoE, with math on EOT retention and transfer.
-
Stealing a Part of Production LM: Improving the Algorithm
Improving the algorithm for extracting logit distributions from closed, proprietary language models by leveraging the bias map feature and incorporating a normal-distribution prior for more efficient extraction.