Microsoft scales Transformer sequence length to 1 billion tokens

July 8, 2023 / admin

LongNet, a Transformer variant introduced in recent research from Microsoft, scales sequence length to more than 1 billion tokens without sacrificing performance on shorter sequences. Its key innovation, dilated attention, expands the attentive field exponentially as the distance between tokens grows. The model has computational complexity that is linear in sequence length and a logarithmic dependency between any two tokens in a sequence, and it shows strong performance on both long-sequence modeling and general language tasks.
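To make the linear-cost claim concrete, here is a minimal sketch of the index pattern behind dilated attention. It is not the LongNet implementation. It assumes the sequence is cut into fixed-length segments and that every r-th position is kept inside each segment. It then counts how many query-key pairs would be scored. The function names and the `CONFIGS` schedule are hypothetical, chosen only for illustration. The sketch leaves out parts of the real method, such as how the outputs of the different configurations are combined.

```python
def dilated_segments(seq_len, segment_len, dilation):
    """Split positions 0..seq_len-1 into segments of `segment_len` and
    keep every `dilation`-th position inside each segment."""
    segments = []
    for start in range(0, seq_len, segment_len):
        end = min(start + segment_len, seq_len)
        segments.append(list(range(start, end, dilation)))
    return segments


def attention_pairs(seq_len, configs):
    """Count the query-key pairs scored when attention is computed
    independently inside each sparsified segment, summed over configs."""
    total = 0
    for segment_len, dilation in configs:
        for segment in dilated_segments(seq_len, segment_len, dilation):
            total += len(segment) ** 2
    return total


# Hypothetical schedule (an assumption, not taken from the article):
# segment length and dilation both grow geometrically, so distant tokens
# are covered by wide but sparse segments.
CONFIGS = [(16, 1), (64, 4), (256, 16)]

for n in (256, 512, 1024):
    # Print sequence length, dilated pair count, and dense n^2 pair count.
    print(n, attention_pairs(n, CONFIGS), n * n)
```

Under these assumptions, doubling the sequence length doubles the dilated pair count, while the dense count quadruples. That is the intuition behind the linear complexity described above.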

