Vienna 2.4 Database Suggestion

An RSS/Atom newsreader with features comparable to commercial newsreaders.

Vienna 2.4 Database Suggestion

Postby wesley.spikes » Wed Nov 12, 2008 11:16 pm

Yes, I've titled this as Vienna 2.4. While I recognize that this release is a ways out there, I do not suggest immediate implementation of this database alteration, due to the time it may take to process.

Currently, the message body is stored alongside the basic information of the message, in the Messages table. This design can cause severe delays and excessive memory usage when loading messages from a large table. The proposed change is to create a new table with two columns (message_id, text) and move all text values into that new table, with their coresponding message_id, of course.

I also suggest a revert on the database schema, of sorts, to move enclosure information into it's own table, for the same reason as the message text change, as well as to support futher enhancements in the way of enclosures in the future builds.

The suggested SQL for these text change is as follows:
CREATE TABLE Messages_bodies AS SELECT message_id, text FROM Messages;
CREATE TABLE Messages_new AS SELECT [[comma-seperated list of all columns except the removed columns]] FROM Messages;
[[Optionally, make a verification procedure here, to ensure that all data transferred correctly.]]
DROP TABLE Messages;
ALTER TABLE Messages_new RENAME TO Messages;

I'm still working on trying to redesign enclosure support, as well as learning Objective-C to program in support for it.

Input appreciated.
Last edited by wesley.spikes on Fri Nov 14, 2008 4:27 pm, edited 1 time in total.
wesley.spikes
Harmless
 
Posts: 10
Joined: Fri Nov 07, 2008 7:11 pm

Re: Vienna 2.4 Database Suggestion

Postby wesley.spikes » Wed Nov 12, 2008 11:21 pm

wesley.spikes wrote:This design can cause severe delays and excessive memory usage when loading messages from a large table.


As a point of reference, a quick Google search already revealed a couple people who seem to complain of Vienna-RSS being slow at points. One such comment is quoted below.
Jon wrote:I use Vienna. And I hate it, mostly because it's slow with occasional periods of very slow.

Source: Wired Blog Entry Comment

Edit: Before anyone comments/reads, I know that the comment does not point to the activities that are causing the slow down, but I cited it as it illustrates exactly what sort of feelings an end user can have on such issues.
wesley.spikes
Harmless
 
Posts: 10
Joined: Fri Nov 07, 2008 7:11 pm

Re: Vienna 2.4 Database Suggestion

Postby jeff_johnson_dev » Thu Nov 13, 2008 5:36 pm

Hi Wesley.

The changes sound like a good idea. The Development build configuration of Vienna already logs database transaction times, I believe, so it should be fairly easy to test how it affects performance.

I think that the enclosure changes should be fairly easy, because as you noted before, the database structures aren't even fully utilized yet by Vienna. The messages changes will be more difficult, but we do already have some code that only lazily loads message bodies.

That blog comment, though, is almost two years old. ;)
jeff_johnson_dev
Cocoaforge Admin
 
Posts: 1371
Joined: Wed Mar 01, 2006 4:12 pm

Re: Vienna 2.4 Database Suggestion

Postby wesley.spikes » Fri Nov 14, 2008 12:11 am

Heh, yea, as a point of note, if this is implemented, we'll definately want to do a full status window for this upgrade... I just timed it out on a 4MB database file locally, and while the basic operation of the changes could still be improved upon (I did the timed tests in Python on Win32, XP, fairly high-end system). Assuming that in the upgrade process we did a VACUUM statement to clean out all the excess left behind by the reworking of the database, I'm looking at 2.6 seconds average for a 5MB file.
Steps: RENAME, CREATE, CREATE, DROP INDEX, DROP INDEX, DROP TABLE, CREATE INDEX, CREATE INDEX, REINDEX, VACUUM.

Raw data from my timed tests is available here:
http://www.tyrannobyte.com/vienna-rss/d ... t-data.txt

The data is outputted at semi-colon seperated data (not comma-seperated, due to SQL having commas in it)

Any ideas on how this conceptual change can be improved upon is appreciated.

Notes:
- REINDEX was executed as with the changes, I'd rather be safe than sorry. Being that this accounts for only approximately 5% of the run time, it isn't too bad of a time-consumer. Possibly may be omitted.
- VACUUM was executed due to necessity. At least in my SQLite3 database file, there was a fair chunk of un-unsed space just sitting in the file being used. This trimmed about 1 MB off my output file size. Results may vary. Perhaps offer the VACUUM service as an optional step that users may elect to have run.

Edit: Updated link.
wesley.spikes
Harmless
 
Posts: 10
Joined: Fri Nov 07, 2008 7:11 pm


Return to Vienna

Who is online

Users browsing this forum: josh64x2