Hugging Face
Datasets 6,227
Sort: Trending
Active filters: 10M<n<100M
open-index/hacker-news • Viewer • Updated 2 minutes ago • 47.3M • 1.43k • 80
nvidia/Nemotron-Pretraining-Specialized-v1.1 • Viewer • Updated 8 days ago • 19.8M • 2.2k • 22
wikimedia/wikipedia • Viewer • Updated Jan 9, 2024 • 61.6M • 89.3k • 1.15k
mvp-lab/LLaVA-OneVision-1.5-Mid-Training-85M • Viewer • Updated Nov 24, 2025 • 91.5M • 205k • 63
amphion/Emilia-Dataset • Viewer • Updated Feb 28, 2025 • 54.8M • 55.6k • 439
HuggingFaceM4/FineVision • Viewer • Updated Oct 21, 2025 • 24.2M • 126k • 474
Brianferrell787/financial-news-multisource • Viewer • Updated Nov 29, 2025 • 57.1M • 2.36k • 66
sarulab-speech/commonvoice22_sidon • Viewer • Updated Oct 8, 2025 • 15.1M • 615 • 23
Shrijanagain/ShORT-Hinglish_Dataset-10m • Viewer • Updated 11 days ago • 10M • 34 • 4
himalaya-ai/nepali-corpus-compile • Viewer • Updated 2 days ago • 16.8M • 287 • 3
PleIAs/French-Science-Commons • Viewer • Updated about 5 hours ago • 42.6M • 11 • 3
ncbi/pubmed • Updated Jan 26, 2024 • 950 • 160
kz-transformers/multidomain-kazakh-dataset • Viewer • Updated Oct 2, 2025 • 80M • 1.74k • 27
TigerResearch/pretrain_zh • Viewer • Updated Jun 14, 2023 • 16.9M • 1.03k • 122
common-canvas/commoncatalog-cc-by • Viewer • Updated May 16, 2024 • 14.6M • 9.1k • 34
paperswithbacktest/Stocks-Daily-Price • Viewer • Updated 11 days ago • 24.9M • 10.4k • 47
UCSC-VLAA/MedTrinity-25M • Viewer • Updated Oct 11, 2024 • 24.9M • 4.01k • 197
agibot-world/AgiBotWorld-Alpha • Viewer • Updated Sep 29, 2025 • 49.8M • 15.9k • 214
OmniAICreator/ASMR-Archive-Processed • Viewer • Updated about 24 hours ago • 18.9M • 30k • 85
ibragim-bad/github-repos-metadata-40M • Viewer • Updated Jan 18 • 41.1M • 125 • 20
mvp-lab/LLaVA-OneVision-1.5-Instruct-Data • Viewer • Updated Nov 21, 2025 • 21.9M • 58.3k • 67
Open-Bee/Honey-Data-15M • Viewer • Updated 9 days ago • 14.8M • 38.3k • 113
PleIAs/SYNTH • Viewer • Updated Nov 11, 2025 • 68M • 92.7k • 254
ginkgo-datapoints/GDPx4 • Viewer • Updated 14 days ago • 29.9M • 72 • 4
MIL-UT/Japanese-Medical-VQA-12m • Viewer • Updated 5 days ago • 12.1M • 1.2k • 4
Helsinki-NLP/opus-100 • Viewer • Updated Feb 28, 2024 • 55.1M • 17.5k • 224
google/wiki40b • Viewer • Updated Mar 11, 2024 • 18.1M • 12k • 34
wmt/wmt14 • Viewer • Updated Apr 3, 2024 • 47.8M • 8.21k • 25
MLCommons/ml_spoken_words • Updated Dec 6, 2022 • 1.91k • 36
speechcolab/gigaspeech • Viewer • Updated Feb 7 • 11.9M • 8.53k • 149