Web Wiz

5000 newspaper articles

jellejacob (Newbie), posted 11 July 2003 at 2:53am
Hi all, I hope this is the right forum to post in.

I don't know where to start, so I'll start from the beginning.

My client asked me to convert 5000 printed newspaper articles to usable text with OCR (Optical Character Recognition) for use on their intranet, which runs on a Windows server. They would also like an option to search by keyword through the content of the whole range of articles.

I know how to convert these articles to usable text. But my question is: what is the most usable/fastest way of getting these articles into a database, and which database, MS Access or SQL Server? I also have an option to export directly from my OCR program to HTML or XML.

I hope someone can give me some advice.

the boss (Senior Member), posted 11 July 2003 at 5:31pm

Export them to HTML, then use the Web Wiz search application. The application can search the text of a file for given keywords.


Gullanian (Senior Member), posted 11 July 2003 at 7:33pm
Ah yes, but it uses the description meta tag to get the keywords, so you would need to put some HTML keywords into each of the 5000 articles. That could obviously take a while!
Tom
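If the keywords do have to live in a description meta tag, they would not necessarily have to be typed in by hand. The following is only a rough sketch of how a script could derive a tag from the OCR text of each article. It is written in Python rather than ASP, the stopword list and the function names `top_keywords` and `description_meta_tag` are made up for illustration, and whether the Web Wiz search application would accept tags in this form is an assumption that would need checking.

```python
import html
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be longer.
STOPWORDS = frozenset({
    "the", "and", "for", "was", "with", "that", "this", "from", "are", "has",
})


def top_keywords(text, limit=5):
    """Return the most frequent words of 3+ letters, ignoring stopwords."""
    words = re.findall(r"[a-z]{3,}", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    # Sort by frequency (highest first), then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]


def description_meta_tag(text, limit=5):
    """Build a description meta tag whose content is the top keywords."""
    content = ", ".join(top_keywords(text, limit))
    return '<meta name="description" content="{}">'.format(
        html.escape(content, quote=True)
    )
```

Running `description_meta_tag` over each exported article and pasting the result into the page head would replace the manual step, at the cost of keywords that are only as good as a simple word count.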

the boss (Senior Member), posted 11 July 2003 at 7:40pm
Export them to XML then. Use XSL for formatting, and for searching too, in combination with ASP, I guess.
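To give an idea of what searching an XML export involves, here is a minimal sketch. It uses Python's standard `xml.etree.ElementTree` module instead of XSL and ASP, and the element names (`articles`, `article`, `title`, `body`) are assumptions, since the actual layout of the OCR program's XML export is not known.

```python
import xml.etree.ElementTree as ET

# Hypothetical export layout; the real OCR output will look different.
SAMPLE_XML = """<articles>
  <article id="1">
    <title>Harbour reopens</title>
    <body>The harbour reopened on Monday.</body>
  </article>
  <article id="2">
    <title>Council vote</title>
    <body>The council voted on the budget.</body>
  </article>
</articles>"""


def search_xml(xml_text, keyword):
    """Return the titles of articles whose body contains the keyword."""
    root = ET.fromstring(xml_text)
    needle = keyword.lower()
    hits = []
    for article in root.iter("article"):
        body = article.findtext("body", default="")
        # Case-insensitive substring match on the article body.
        if needle in body.lower():
            hits.append(article.findtext("title", default=""))
    return hits
```

This reads and scans the whole document on every search, which is workable for a small file but is the kind of work a database index is meant to avoid.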

WebWiz-Bruce (Admin Group, Web Wiz Developer), posted 12 July 2003 at 1:24am
I would put all the articles in a SQL Server database. Databases are made for searching, so that should be simple enough.
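As a rough illustration of the database route, the sketch below stores articles in one table and searches the body text with `LIKE`. It uses Python's built-in `sqlite3` module purely as a stand-in so the example runs anywhere; it is not SQL Server, and the table name `articles`, its columns, and the helper names are all assumptions. On SQL Server the same idea would apply, but the connection code and some SQL details would differ.

```python
import sqlite3


def create_store(conn):
    """Create a hypothetical one-table schema for the articles."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, body TEXT NOT NULL)"
    )


def add_articles(conn, articles):
    """Insert (title, body) pairs in one batch."""
    conn.executemany(
        "INSERT INTO articles (title, body) VALUES (?, ?)", articles
    )
    conn.commit()


def search(conn, keyword):
    """Return titles of articles whose body contains the keyword."""
    # Escape LIKE wildcards so user input is matched literally.
    escaped = (
        keyword.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT title FROM articles WHERE body LIKE ? ESCAPE '\\' ORDER BY id",
        ("%" + escaped + "%",),
    )
    return [row[0] for row in rows]
```

A `LIKE '%word%'` search has to scan every row, which is probably acceptable for 5000 articles but does not scale well; passing the keyword as a parameter rather than building the SQL string by hand also avoids injection problems.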

jellejacob (Newbie), posted 12 July 2003 at 4:01am

Thanks, people, for your replies. I think I'll go for the SQL Server database option. XML is at this point one bridge too far for me.

Thanks again!

Bunce (Senior Member), posted 12 July 2003 at 7:29pm

When you say search by keyword, what do you mean?

Remember that if this were a subset of words from an article, then you'd need to specify exactly what these keywords are...

If, however, you just mean to search every word in an article, then it's a lot easier.

It might pay to look into the 'Full-Text-Index' feature of SQL Server:
http://www.microsoft.com/sql/evaluation/features/fulltext.asp
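For anyone wondering what a full-text index buys you over scanning every article, the sketch below builds a toy inverted index: a map from each word to the set of articles containing it. This is only a conceptual illustration in Python, not how SQL Server implements the feature, and the function names are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(articles):
    """Map each word to the set of article ids that contain it.

    `articles` is a dict of article id -> article text.
    """
    index = defaultdict(set)
    for article_id, text in articles.items():
        for word in tokenize(text):
            index[word].add(article_id)
    return index


def search_all(index, query):
    """Return ids of articles containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Start from the first word's articles, then narrow with each extra word.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

The index is built once, and each search then becomes a few dictionary lookups and set intersections instead of a pass over all the article text.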

Cheers,
Andrew

© Copyright 2001-2010 Web Wiz®. All rights reserved.