Are there any large(ish) public data sets available that I can use in my data mining experiments?
Sure, at least two sources I can think of:

1. Amazon hosts a group of data resources called "Public Data Sets" (link #1), including Freebase (Wikipedia et al.), the Human Genome Project, and US census data.
2. The Enron email dataset, released after the company collapsed (link #2), which is particularly useful for testing applications like Sphinx and Apache Lucene for raw speed on large-volume text indexing and searching.