Probabilistic Language Models With Model Efficiency And Data Efficiency