StarCoder2 is a newly released family of open-access code-generation models, developed by the BigCode project in collaboration with ServiceNow, Hugging Face, and NVIDIA. The models are built on The Stack v2, described in the announcements as the largest code dataset, with over 900 billion tokens. They were trained with a 16k-token context window and repository-level information on more than 4 trillion tokens, covering over 600 programming languages (619 according to one report). StarCoder2 comes in three sizes, 3B, 7B, and 15B, with the 15B model trained on 4.3 trillion total tokens over 4.5 epochs. According to the announcements, the new iteration outperforms its predecessor, StarCoder1, and is currently the state of the art in code completion. StarCoder2 is also reported to run on most GPUs, which makes it accessible to a broader range of developers. The partners position the open-access models as a way for developers to use GenAI to build enterprise applications, promising strong performance and cost optimization. The 15B model is said to match CodeLlama 33B on code completion benchmarks at twice the speed and half the cost to train and run in production, and one announcement claims it beats CodeLlama 34B.
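As a rough consistency check on the figures above, the following sketch divides the reported total training tokens for the 15B model by the reported number of epochs. It assumes that one epoch means a single pass over the training set, which the announcements do not state explicitly; the function name is hypothetical.

```python
# Figures quoted in the announcements for the 15B model.
TOTAL_TOKENS = 4.3e12  # 4.3 trillion total training tokens
EPOCHS = 4.5           # reported number of epochs


def tokens_per_epoch(total_tokens: float, epochs: float) -> float:
    """Return the implied tokens seen per epoch.

    Assumption: an epoch is one full pass over the training set.
    """
    if epochs <= 0:
        raise ValueError("epochs must be positive")
    return total_tokens / epochs


per_epoch = tokens_per_epoch(TOTAL_TOKENS, EPOCHS)
print(f"{per_epoch / 1e9:.0f}B tokens per epoch")  # roughly 956B
```

Under that assumption, the implied training set is in line with the "900B+ tokens" figure quoted for The Stack v2.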
StarCoder 2 Is a Code-Generating AI That Runs On Most GPUs https://t.co/OisO4sJ6DQ
ServiceNow, Hugging Face, and Nvidia expand StarCoder2 coding LLM https://t.co/dNqUPVTHdN
Nvidia, Hugging Face and ServiceNow release new StarCoder2 LLMs for code generation https://t.co/iMWD9sT9XV https://t.co/6Phn9AY854
Congratulations to the @BigCodeProject and @SWHeritage communities on developing and releasing the #StarCoder2 code LLM foundation models and The Stack v2 dataset, and to our research partners @huggingface and @nvidia for training the models. #OpenScience #OpenSource #FTW https://t.co/NWjJ5CbIex
StarCoder2 AI code generator released with support for 619 programming languages https://t.co/WeFSWozYMs
Excited to have been a part of StarCoder2! We use Data Portraits to document the training data. One cool thing is the dataset contains lots of text about code - like pull requests, documentation, and papers⬇️ https://t.co/O0CW1XQsxI
Accelerate your coding tasks, from code completion to code summarization with StarCoder2, the latest state-of-the-art, open code #LLM built by @HuggingFace, @ServiceNow, and NVIDIA. Learn more 👉 https://t.co/48MClod9PP https://t.co/O1PUWKNSQN
Thrilled to share the release of StarCoder2! @ServiceNow , @huggingface, and @nvidia have partnered to deliver a family of open-access code LLMs to help developers everywhere tap the power of GenAI to build software better. Check out model checkpoints on the Hugging Face Hub! https://t.co/UbjPyceW45
StarCoder2 highlights ✨ - Trained on 16k token context window on 4T+ tokens 🦖 on The Stack dataset, on 900B+ tokens 🦕 - Comes in different sizes: 3B, 7B, 15B - Currently the state-of-the-art in code completion Use with 🤗transformers: https://t.co/DscgdoXj8E
StarCoder2 15B is trained on 4.3 trillion total tokens via 4.5 epochs!💫 Great work by @BigCodeProject ❤️ https://t.co/YWHG550CDR
BigCode presents StarCoder2 StarCoder2 is trained with a 16k token context and repo-level information for 4T+ tokens. All built on The Stack v2 - the largest code dataset with 900B+ tokens https://t.co/8yGuNGfGPH
StarCoder2 is here!💫 A family of open LLMs enabling users with powerful performance and cost optimization. StarCoder 15B matches CodeLlama 33B in code completion benchmarks at 2x speed and 2x as cheap to train and use in production.🤯 https://t.co/zGUBDJi4IB https://t.co/DMb4itWxPD
StarCoder 2 is a code-generating AI that runs on most GPUs https://t.co/C4AXmj0Dar
StarCoder2 is the new SOTA code completion model! https://t.co/CQIbF2PHMz https://t.co/5x4PzAV9yr
Introducing StarCoder2 15B 🌟 > Beats CodeLlama 34B. > 16,384 context window. > Trained in 600+ programming languages from The Stack v2. > Trained on Fill-in-the-middle objective on 4 trillion + tokens. Along with that, we release smol-StarCoder2 3B & 7B ⭐ > 16K context… https://t.co/Sgi6eCK4Rv
Introducing StarCoder 2 ⭐️ The most complete open Code-LLM 🤖 StarCoder 2 is the next iteration for StarCoder and comes in 3 sizes, trained 600+ programming languages on over 4 Trillion tokens on Stack v2. It outperforms StarCoder 1 by margin and has the best overall performance… https://t.co/LVclRcq5ZM
We're live with StarCoder2! https://t.co/nyS7YPszYc
StarCoder 2 is a code-generating AI that runs on most GPUs: https://t.co/c4JLvwpyJE by TechCrunch #infosec #cybersecurity #technology #news
Introducing: StarCoder2 and The Stack v2 ⭐️ StarCoder2 is trained with a 16k token context and repo-level information for 4T+ tokens. All built on The Stack v2 - the largest code dataset with 900B+ tokens. All code, data and models are fully open! https://t.co/fM7GinxJBd https://t.co/NUeRjHEa05
.@ServiceNow + @huggingface + @nvidia = 🚀 Together, we’ve teamed up to launch StarCoder2: a family of open-access LLMs to help developers use GenAI to build enterprise applications. https://t.co/JzmKBrC42e #PutAIToWork https://t.co/a3GnyJA8PP
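Several of the posts above mention using the checkpoints from the Hugging Face Hub with the transformers library. A minimal sketch of what that could look like follows. The checkpoint ID `bigcode/starcoder2-3b` and the helper name `complete_code` are assumptions not taken from the posts, and a recent transformers release with StarCoder2 support plus PyTorch is assumed to be installed.

```python
# Assumed Hub checkpoint ID for the smallest model; not stated in the posts above.
CHECKPOINT = "bigcode/starcoder2-3b"


def complete_code(prompt: str, max_new_tokens: int = 32) -> str:
    """Generate a completion for `prompt` (hypothetical helper).

    Downloads the model weights on first use, so it needs network
    access and enough memory for a 3B-parameter model.
    """
    # Imported lazily so the module can be loaded without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
    model = AutoModelForCausalLM.from_pretrained(CHECKPOINT)

    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


# Example call (not executed here because it downloads the model):
# print(complete_code("def print_hello_world():"))
```

The larger 7B and 15B checkpoints would presumably be loaded the same way by changing the checkpoint ID, at the cost of more memory.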