> So the code is made up entirely from the description? Yuck, it would be much
> better to replace the description with a lookup into another table. Or is
> this what you're trying to do?
Kind of. The items come in with a detailed description (quantity, etc.), and the program should map them to codes that carry a generic description (only what the item is). The codes already exist and are known; in other words, I don't have to generate the code, just match it.
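To make the matching idea concrete, here is a rough sketch in Python (not the PHP/MySQL I'd probably prototype in). Everything in it is made up for illustration: the `CODES` table, the sample descriptions, and the helper names `tokens` and `best_code`. It just throws away quantities and picks the code whose generic description shares the most words with the incoming one; the real thing may well need something smarter.

```python
import re

# Hypothetical lookup table: code -> generic description (invented sample data).
CODES = {
    "123401": "steel bolt",
    "123402": "steel nut",
    "567801": "copper wire",
}

def tokens(text):
    """Lowercase the text and keep only words of two or more letters,
    so quantities such as '50', 'x' or the 'M8' size marker drop out."""
    return set(re.findall(r"[a-z]{2,}", text.lower()))

def best_code(description, codes=CODES):
    """Return (code, score) for the generic description with the highest
    Jaccard word overlap, or (None, 0.0) if nothing overlaps at all."""
    words = tokens(description)
    best, best_score = None, 0.0
    for code, generic in codes.items():
        generic_words = tokens(generic)
        union = words | generic_words
        score = len(words & generic_words) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = code, score
    return best, best_score

print(best_code("50 x Steel bolt, M8"))
```

Items that come back as `(None, 0.0)` or with a low score would be the natural candidates for the exceptions list.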
I would indeed put all the codes in a separate table for easy lookup anyway. There are a LOT of codes though. I don't know exactly how many, but at least 100,000! Fortunately, they have some kind of hierarchy. For example, every sub-category under "1234" belongs to the same category. Thus, in theory, I should be able to make a first pass sorting things into generic categories, and then sort them into sub-categories. If I can already sort these into generic categories, I'll consider it a success :)
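The two-pass idea could look something like the Python sketch below. It assumes the category is simply the first four characters of the code (I'm guessing that from the "1234" example; the real scheme may differ), and the table contents and function names are invented for the example.

```python
import re
from collections import defaultdict

# Hypothetical lookup table: code -> generic description (invented sample data).
CODES = {
    "123401": "steel bolt",
    "123402": "steel nut",
    "567801": "copper wire",
}

PREFIX_LEN = 4  # assumption: the first four characters identify the category

def tokens(text):
    """Lowercase the text and keep only words of two or more letters."""
    return set(re.findall(r"[a-z]{2,}", text.lower()))

def build_categories(codes, prefix_len=PREFIX_LEN):
    """Pool the words of every code sharing a prefix into one vocabulary."""
    categories = defaultdict(set)
    for code, generic in codes.items():
        categories[code[:prefix_len]] |= tokens(generic)
    return dict(categories)

def first_pass(description, categories):
    """Pick the category whose pooled vocabulary overlaps the description
    most; return None when no category shares a single word."""
    if not categories:
        return None
    words = tokens(description)
    scored = {prefix: len(words & vocab) for prefix, vocab in categories.items()}
    best = max(scored, key=scored.get)
    return best if scored[best] > 0 else None

def second_pass(description, prefix, codes):
    """Within one category, pick the code sharing the most words."""
    words = tokens(description)
    candidates = {c: g for c, g in codes.items() if c.startswith(prefix)}
    return max(candidates,
               key=lambda c: len(words & tokens(candidates[c])),
               default=None)

categories = build_categories(CODES)
prefix = first_pass("10 boxes steel nut", categories)
print(prefix, second_pass("10 boxes steel nut", prefix, CODES))
```

The nice part is that the second pass only has to look at the codes under one prefix instead of all 100,000+.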
Handling a list of exceptions is a great idea. I'm thinking that I could test algorithms quickly by writing them in PHP/MySQL and measuring the time it takes to process a fixed number of entries, then randomly checking a meaningful number of results to measure accuracy (let's say processing 1000 items and checking 100 results to get a % of success). Then go with whatever's most accurate.
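The test harness I have in mind is roughly this (again a Python sketch rather than PHP/MySQL, with made-up names). `classify` is whatever algorithm is being tried, and `is_correct` is a stand-in for the manual check of each sampled result; both are placeholders, not real code from anywhere.

```python
import random
import time

def timed_run(classify, items):
    """Classify every item and return (results, elapsed seconds),
    where results is a list of (item, guess) pairs."""
    start = time.perf_counter()
    results = [(item, classify(item)) for item in items]
    return results, time.perf_counter() - start

def sampled_accuracy(results, is_correct, sample_size=100, seed=None):
    """Estimate accuracy from a random sample of the results.

    is_correct(item, guess) stands in for a person checking the guess.
    Returns the fraction of sampled results judged correct.
    """
    rng = random.Random(seed)
    sample = rng.sample(results, min(sample_size, len(results)))
    hits = sum(1 for item, guess in sample if is_correct(item, guess))
    return hits / len(sample) if sample else 0.0

# Toy run: 1000 fake items and a dummy classifier, just to show the flow.
items = list(range(1000))
results, elapsed = timed_run(lambda n: n % 2, items)
accuracy = sampled_accuracy(results, lambda item, guess: guess == item % 2)
print(f"{len(results)} items in {elapsed:.4f}s, sampled accuracy {accuracy:.0%}")
```

One thing to keep in mind: with only 100 checked results the estimate is fairly coarse (the standard error is at most 5 percentage points), so two algorithms that score within a few points of each other are probably too close to call.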
All this is because I saw a friend working with a system like that, in which they do things via manual entry (ugh). When I heard that, I immediately thought to myself "a program could help here", but on second thought I realized that I really didn't know what it would take. Then I remembered the fuzzy logic MAME uses to guess the game to run, and I thought there might be a correlation :)
[download a life]