
Huh, I wonder, can I download Reddit? Like, all the text posts, ignoring images. I wonder how big a DB that is and how hard it would be to crawl it myself. It can't be more than a few GB of data. I mean, at this point there is a lot of information there that is just begging to be leveraged.


Pushshift publishes monthly comment[1] and submission data dumps that you can download. The June 2021 comment dump alone was 20+ GB compressed with Zstandard (.zst), so the full archive is far more than a few GB.

[1] https://files.pushshift.io/reddit/comments/
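
If you only want the text, here is a rough sketch of how you might stream one of those dumps without loading it all into memory. It assumes the decompressed file has one JSON object per line and that comment records carry "subreddit" and "body" fields; both are assumptions about the dump layout, so check a few lines of the actual file first. The function name and the placeholder set are made up for illustration.

```python
import json
from typing import Iterable, Iterator, Tuple

# Bodies that stand in for deleted content (assumed placeholder values).
PLACEHOLDERS = {"[deleted]", "[removed]"}


def iter_comment_text(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (subreddit, body) for each usable comment record.

    `lines` is any iterable of text lines, e.g. a file object opened on
    the decompressed dump. Assumes one JSON object per line.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than abort a long run
        if not isinstance(record, dict):
            continue
        body = record.get("body")
        # Drop missing, empty, or placeholder bodies.
        if not isinstance(body, str) or not body.strip() or body in PLACEHOLDERS:
            continue
        yield record.get("subreddit", ""), body


if __name__ == "__main__":
    # Tiny in-memory sample standing in for a real dump file.
    sample = [
        '{"subreddit": "python", "body": "Use a generator here."}',
        '{"subreddit": "python", "body": "[deleted]"}',
        "not json",
    ]
    for subreddit, body in iter_comment_text(sample):
        print(subreddit, body)
```

Since it takes any iterable of lines, you can decompress however you like (the zstd command-line tool piped into the script, for instance) and feed it sys.stdin or an open file.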



