Poster: brewster Date: Mar 14, 2016 10:01am
Forum: texts Subject: Re: djvu files for new uploads


The workflow has changed, but the filenames have not.

There is a new way to create the _djvu.txt files that does not go through the .djvu files. We did a sweep and found 185,000 texts whose .txt files could be generated from the _djvu.xml (which is itself derived from the abbyy.xml), and then created them.
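
As a rough illustration of that kind of extraction (not the Archive's actual code), here is a minimal Python sketch. It assumes the _djvu.xml stores the OCR words in WORD elements nested inside LINE and PARAGRAPH elements; the function name `djvu_xml_to_text` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def djvu_xml_to_text(xml_string):
    """Return plain text from a DjVu-style XML string.

    Assumes (not confirmed by the post) a PARAGRAPH > LINE > WORD layout.
    Words are joined with spaces, lines with newlines, and paragraphs
    with a blank line.
    """
    root = ET.fromstring(xml_string)
    paragraphs = []
    for para in root.iter("PARAGRAPH"):
        lines = []
        for line in para.iter("LINE"):
            # Skip empty WORD elements so they do not add stray spaces.
            words = [w.text.strip() for w in line.iter("WORD")
                     if w.text and w.text.strip()]
            if words:
                lines.append(" ".join(words))
        if lines:
            paragraphs.append("\n".join(lines))
    return "\n\n".join(paragraphs)


# Hypothetical, hand-written sample standing in for a real _djvu.xml file.
SAMPLE = (
    "<DjVuXML><BODY><OBJECT><HIDDENTEXT><PAGECOLUMN><REGION>"
    "<PARAGRAPH>"
    "<LINE><WORD>Hello</WORD><WORD>world</WORD></LINE>"
    "<LINE><WORD>again</WORD></LINE>"
    "</PARAGRAPH>"
    "<PARAGRAPH><LINE><WORD>Next</WORD></LINE></PARAGRAPH>"
    "</REGION></PAGECOLUMN></HIDDENTEXT></OBJECT></BODY></DjVuXML>"
)

if __name__ == "__main__":
    print(djvu_xml_to_text(SAMPLE))
```

Reading the words straight from the XML means no .djvu file has to be opened at all, which matches the point of the new workflow.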

There are still some truncated _djvu.txt files, but it is not easy to identify which ones they are, so fixing them will have to wait for a wider sweep, which has not been scheduled.

Poster: tylerox Date: Jul 4, 2016 10:02am
Forum: texts Subject: Re: djvu files for new uploads

Could you be so kind as to reupload

I can't find this CD in FLAC anywhere. Thank you so much in advance!

Poster: Nemo_bis Date: Mar 15, 2016 11:17am
Forum: texts Subject: Re: 185,000 got .txt files

185,000 texts with a .txt file is great news! Thank you for improving so many texts; it makes a lot of sense to extract the OCR text "directly" when available. However, I'm still interested in my question above about statistics.