Database search: problems with too simple pattern matching

An RSS/Atom newsreader with features comparable to commercial newsreaders.

Postby Macsico » Sun Apr 23, 2006 12:28 pm

I switched to Vienna from another newsreader a few months ago, mainly for its database feature. I like the idea of having all my RSS feeds in one single database that is available even without an internet connection, which is good for notebook use. Everything was fine until today.

Today I created a new smart folder and was surprised that it gave me too many results for a very specific search string. My smart folder had only one parameter set: "Text includes INSM", where INSM is the abbreviation of "Initiative Neue Soziale Marktwirtschaft". But the smart folder found articles not only about the INSM but also about the German soccer star Jürgen Klinsmann, whose last name contains my search string.

I ran a few additional tests, and it became clear that Vienna fails the sand test for search engines as defined by the German meta search engine MetaGer (http://meta.rrzn.uni-hannover.de/sand.html, only available in German). To run the sand test, you simply type in a string that is both a word on its own and part of other words. The string "sand" was chosen because it appears frequently inside German words.

The sand test found articles containing the following words in my Vienna database (translations in parentheses):

- Badesandalen (sandal or flip-flop)
- Versand (shipment)
- Versandtasche (shipment bag)
- versandt (shipped)
- Apfel-Sanddorn-Fruchtsaftgetränk (apple and sea buckthorn fruit juice drink)

None of the above words has anything to do with sand (which is the same word in German and English). It seems that Vienna simply matches the string anywhere in the text, regardless of word boundaries. This behavior makes the search and smart folder functions rather useless because of the sheer number of unwanted results that can show up.
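To make the difference concrete, here is a minimal Python illustration (not Vienna's actual code) comparing plain substring matching with whole-word matching via the regular-expression word boundary `\b`. The sample texts are the sand-test hits listed above, plus one hypothetical sentence that really does contain "sand" as a standalone word.

```python
import re

# Sand-test hits from the list above, plus one hypothetical text that
# contains "sand" as a standalone word.
texts = [
    "Badesandalen",
    "Versand",
    "Versandtasche",
    "versandt",
    "Apfel-Sanddorn-Fruchtsaftgetränk",
    "Kinder spielen im Sand am Strand",
]

term = "sand"

# Naive substring matching (case-insensitive): finds "sand" anywhere,
# including in the middle of longer words.
substring_hits = [t for t in texts if term.lower() in t.lower()]

# Whole-word matching: \b anchors the term at a word boundary on both sides,
# so "Versand" or "Sanddorn" no longer match.
word_re = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
word_hits = [t for t in texts if word_re.search(t)]

print(substring_hits)  # all six texts
print(word_hits)       # only "Kinder spielen im Sand am Strand"
```

Only the last text survives the word-boundary check, which is the behavior the sand test expects from a search engine.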

Below are links to search systems that let the user control the truncation of search strings:

http://www.dialogweb.com/help/CmdWhatTruncation.html
http://www.lib.uct.ac.za/infolit/truncation.htm
http://www.nlm.nih.gov/pubs/techbull/jf ... llkit.html

I understand that Vienna is not a fully fledged database system. But if it incorporates a database system, it should behave like common systems and avoid producing its search results by simple pattern matching.

Best regards from Frankfurt/Germany
Macsico
Harmless
 
Posts: 3
Joined: Sun Apr 23, 2006 11:28 am

Postby jeff_johnson_dev » Sun Apr 23, 2006 1:06 pm

Macsico,

Have you tried setting the condition to "Text is INSM" rather than "Text contains INSM"? (I think in German it would be "Text ist INSM".)
jeff_johnson_dev
Vienna Team
 
Posts: 1365
Joined: Wed Mar 01, 2006 4:12 pm

Postby jeff_johnson_dev » Sun Apr 23, 2006 1:30 pm

Never mind. That doesn't seem to work. Improved text searching in smart folders is on our To Do list for Vienna 2.1.

Postby stevepa » Sun Apr 23, 2006 9:20 pm

SQLite supports the REGEXP operator, which is backed by a user-defined regexp() function, so I think this probably won't be too hard to do in 2.1. I'll have a play with this tonight.
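As a hedged illustration of the SQLite mechanism mentioned above (not Vienna's code): SQLite defines no regexp() function by default, and when an application registers one, "X REGEXP Y" is evaluated as regexp(Y, X), so the pattern arrives first. The sketch below uses Python's `sqlite3.Connection.create_function`, a wrapper around SQLite's `sqlite3_create_function` C API. The `messages` table and its rows are hypothetical stand-ins for a newsreader's article store.

```python
import re
import sqlite3


def regexp(pattern, value):
    """User-defined function backing SQLite's REGEXP operator.

    SQLite rewrites "X REGEXP Y" as regexp(Y, X), so the pattern comes first.
    """
    if value is None:
        return False
    return re.search(pattern, value, re.IGNORECASE) is not None


conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)

# Hypothetical table standing in for a newsreader's article store.
conn.execute("CREATE TABLE messages (title TEXT)")
conn.executemany(
    "INSERT INTO messages (title) VALUES (?)",
    [("INSM startet neue Kampagne",), ("Klinsmann nominiert den Kader",)],
)

# LIKE does substring matching (case-insensitive for ASCII) and finds both rows.
like_rows = conn.execute(
    "SELECT title FROM messages WHERE title LIKE ?", ("%insm%",)
).fetchall()

# REGEXP with \b word boundaries finds only the row where INSM is a whole word.
regexp_rows = conn.execute(
    "SELECT title FROM messages WHERE title REGEXP ?", (r"\binsm\b",)
).fetchall()

print(like_rows)
print(regexp_rows)
```

Registering the function once per connection is enough; every later query on that connection can then use REGEXP.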
stevepa
Vienna Team
 
Posts: 447
Joined: Fri Jan 13, 2006 10:13 am

Postby stevepa » Sun Apr 23, 2006 10:58 pm

OK, I just added a simple phrase-matching operator to the list of available string search operators. So:

text - has phrase - <phrase>

will match <phrase> when it appears as a word in any title or article.

This needs a bit more testing but it looks pretty solid and should be in 2.1.
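The post does not say how "has phrase" is implemented, so the following is only a sketch of one possible approach. The helpers `phrase_pattern` and `has_phrase` are hypothetical names. Each word of the phrase is escaped, any run of whitespace between words is accepted, and the lookarounds `(?<!\w)` and `(?!\w)` prevent the phrase from matching inside a longer word. Lookarounds are used instead of `\b` so that phrases starting or ending with punctuation still behave sensibly.

```python
import re


def phrase_pattern(phrase: str) -> re.Pattern:
    """Build a case-insensitive pattern matching `phrase` as whole words.

    Hypothetical helper: words are escaped, whitespace between them is
    flexible, and lookarounds reject matches inside longer words.
    """
    words = [re.escape(w) for w in phrase.split()]
    if not words:
        raise ValueError("phrase must contain at least one word")
    body = r"\s+".join(words)
    return re.compile(r"(?<!\w)" + body + r"(?!\w)", re.IGNORECASE)


def has_phrase(text: str, phrase: str) -> bool:
    """Return True if `phrase` occurs in `text` as whole words."""
    return phrase_pattern(phrase).search(text) is not None


title = "Die Initiative Neue Soziale Marktwirtschaft (INSM) kritisiert Klinsmann"
print(has_phrase(title, "INSM"))           # True
print(has_phrase(title, "neue soziale"))   # True
print(has_phrase(title, "Soziale Markt"))  # False
```

Because Python's `\w` is Unicode-aware for `str` patterns, letters such as "ä" or "ü" count as word characters, which matters for German text.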

