> having a text copy of all articles in their database was some legal risk
the risk should've been the same with google's index, and yet they're dandy!
I think it's more easily explained by incompetence, esp. when stop words like 'of' and 'the' are somehow included in the index. These are almost trivial to remove prior to indexing: any decent indexing library, such as Lucene, ships with a ready-made stop-word filter, so you barely need to do any work to get it.
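To show how little work it is, here's a rough sketch of dropping stop words before building an inverted index. This is plain Python rather than Lucene, and the `STOP_WORDS` list, `tokenize`, and `build_index` are made up for illustration; a real library's list is much longer.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = frozenset({"a", "an", "and", "in", "of", "the", "to"})


def tokenize(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_index(docs):
    """Map each remaining term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)
```

With something like this in front of the indexer, 'of' and 'the' never reach the index in the first place.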