How can I quickly match one of 150 million names to a code?
Assuming that you want the results pretty quickly, siskin’s approach should work just fine. And the Bloom filter doesn’t have to be that big; a 200 MB table should give you a false positive rate of about 1%, which is a mere 1000 extra hits in your case. This gives you 800 MB for the (in-process) database cache, which is way more than you need. And yes, if you can spend the whole day on generating the results, you can skip the Bloom filter even if you have a steam-powered 1.5 QPS database engine. 🙂 (There are fancy database engines that can do Bloom filtering all on their own, but I’m not up to date on the Java universe, so I’m not sure how common that is.)
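To put rough numbers on the filter size, here is a minimal Java sketch using the standard Bloom filter sizing formulas (m = -n ln p / (ln 2)^2 bits, k = (m/n) ln 2 hash functions). For 150 million names at 1% it comes out at roughly 180 MB and 7 hash functions, which is where the 200 MB ballpark comes from. Everything named here (`NameLookup`, `optimalBits`, `optimalHashes`, `lookup`) is made up for illustration, the `Map` is only a stand-in for the real database, and the hashing is a quick double-hashing scheme rather than anything tuned. In practice you would more likely reach for an existing implementation such as Guava's `BloomFilter`.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.Map;

// Hypothetical sketch: a Bloom filter sitting in front of a slow lookup.
class NameLookup {
    // Bits needed for n items at false positive rate p.
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // Number of hash functions that minimises false positives for m bits.
    static int optimalHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    private final BitSet bits;
    private final int size;    // a single BitSet caps out at about 2^31 bits
    private final int hashes;
    private final Map<String, String> database; // assumption: stands in for the real store

    NameLookup(Map<String, String> database, double falsePositiveRate) {
        this.database = database;
        int n = Math.max(1, database.size());
        this.size = (int) optimalBits(n, falsePositiveRate);
        this.hashes = optimalHashes(n, size);
        this.bits = new BitSet(size);
        for (String name : database.keySet()) {
            for (int i = 0; i < hashes; i++) {
                bits.set(index(name, i));
            }
        }
    }

    // Double hashing: the i-th bit position is derived from two base hashes.
    private int index(String name, int i) {
        int h1 = 0x811c9dc5; // FNV-1a over the UTF-8 bytes
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            h1 ^= (b & 0xff);
            h1 *= 0x01000193;
        }
        int h2 = name.hashCode() | 1; // force odd so the step is never zero
        return Math.floorMod(h1 + i * h2, size);
    }

    // False means definitely absent; true means "maybe", so ask the database.
    boolean mightContain(String name) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(index(name, i))) {
                return false;
            }
        }
        return true;
    }

    // Returns the code for a name, or null if it is unknown.
    String lookup(String name) {
        if (!mightContain(name)) {
            return null; // skipped the expensive query entirely
        }
        return database.get(name); // may still be null on a false positive
    }
}
```

Calling `NameLookup.optimalBits(150_000_000L, 0.01) / 8` gives the size in bytes. Only names that pass `mightContain` ever reach the database, so the false positives cost you a few wasted queries and never a wrong answer.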