A volume of data used to train an AI model. Many bodies of data are combined to train both small and large language models. For example, the RedPajama AI small-model dataset comprises 300 billion data items from books, GitHub, Wikipedia and other sources. The Dolma dataset's three trillion tokens came from sources such as Reddit, Project Gutenberg, Wikipedia and Wikibooks (Dolma stands for "Data to feed OLMo's Appetite"). See OLMo, large language model and Hugging Face.
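
As an illustration, the Python sketch below streams a few records from a dataset of this kind using the Hugging Face datasets library. The repository id "allenai/dolma" and the "text" field name are assumptions here; the actual names should be checked against the dataset card before use.

    # A minimal sketch: stream a few records from a large training
    # corpus hosted on Hugging Face. The repo id "allenai/dolma" and
    # the "text" field are assumptions; consult the dataset card.
    from datasets import load_dataset

    # Streaming avoids downloading the full multi-terabyte corpus.
    dataset = load_dataset("allenai/dolma", split="train", streaming=True)

    # Inspect the first three documents.
    for i, record in enumerate(dataset):
        if i >= 3:
            break
        # Print the first 200 characters of each document's text.
        print(record.get("text", "")[:200])

Streaming mode returns an iterable rather than a local copy, which matters for corpora of this scale: a three-trillion-token dataset is far too large to download for casual inspection.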