1,310
UPLOADS


Internet Archive Web Crawls
Collection: 1,619,426 items, 40.4B views
The Internet Archive discovers and captures web pages through many different web crawls. At any given time, several distinct crawls are running, some for months and some every day or longer. View the web archive through the Wayback Machine.
Topic: webwidecrawl
Worldwide Web Crawls
Collection: 634,919 items, 16.8B views
Wide crawls of the Internet conducted by Internet Archive. Please visit the Wayback Machine to explore archived web sites. Since September 10th, 2010, the Internet Archive has been running Worldwide Web Crawls of the global web, capturing web elements, pages, sites and parts of sites. Each Worldwide Web Crawl was initiated from one or more lists of URLs that are known as "Seed Lists". Descriptions of the Seed Lists associated with each crawl may be provided as part of the metadata for...
Alexa Crawls
Collection: 226,901 items, 15.2B views
Since 1996, Alexa Internet has been donating its crawl data to the Internet Archive. Flowing in every day, these data are added to the Wayback Machine after an embargo period.
Topics: web crawl, Alexa
Live Web Proxy Crawls
Collection: 108,940 items, 10B views
Content crawled via the Wayback Machine Live Proxy, mostly by the Save Page Now feature on web.archive.org. The liveweb proxy is a component of the Internet Archive’s Wayback Machine project. It captures the content of a web page in real time, archives it into an ARC or WARC file, and returns the ARC/WARC record to the Wayback Machine for processing. The recorded ARC/WARC file becomes part of the Wayback Machine in due course.
Cover Art Archive
Collection: 1,620,765 items, 9.9B views
To see or download images, please visit MusicBrainz. The Cover Art Archive is a joint project between the Internet Archive and MusicBrainz, whose goal is to make cover art images available to everyone on the Internet in an organised and convenient way. Images in the archive are curated by the MusicBrainz community and go through a peer review process to ensure that they are correct, free of spam and of the best quality. If you would like to contribute cover art, create a MusicBrainz account...
Survey Crawls
Collection: 100,903 items, 10.2B views
Survey crawls are run about twice a year, on average, and attempt to capture the content of the front page of every web host ever seen by the Internet Archive since 1996.
Topic: survey crawls
Archive Team
Collection: 3,256,063 items, 3.9B views
Formed in 2009, the Archive Team (not to be confused with the archive.org Archive-It Team) is a rogue archivist collective dedicated to saving copies of rapidly dying or deleted websites for the sake of history and digital heritage. The group is composed entirely of volunteers and interested parties, and has expanded into a large number of related projects for saving online and digital history. History is littered with hundreds of conflicts over the future of a community, group, location or...
Fix Broken Links Web Crawls
Collection: 148,470 items, 3.5B views
These crawls are part of an effort to archive pages as they are created and to archive the pages that they refer to. That way, as the referenced pages are changed or taken from the web, a link to the version that was live when the page was written will be preserved. The Internet Archive hopes that references to these archived pages will then be put in place of a link that would otherwise be broken, or added as a companion link to allow people to see what was originally intended by a page's...
Wikipedia Outlinks
Collection: 94,506 items, 2.1B views
Crawl of outlinks from wikipedia.org. These files are currently not publicly accessible. From Wikipedia: Wikipedia is a multilingual, web-based, free-content encyclopedia project operated by the Wikimedia Foundation and based on an openly editable model. The name "Wikipedia" is a portmanteau of the words wiki (a technology for creating collaborative websites, from the Hawaiian word wiki, meaning "quick") and encyclopedia. Wikipedia's articles provide links to guide the...
Common Crawl
Collection: 25,892 items, 541.4M views
Web crawl data from Common Crawl.
Wikipedia Near Real Time (from IRC)
Collection: 18,250 items, 1.6B views
This is a collection of web page captures from links added to, or changed on, Wikipedia pages. The idea is to bring reliability to Wikipedia outlinks so that if the pages referenced by Wikipedia articles are changed, or go away, a reader can permanently find what was originally referred to. This is part of the Internet Archive's attempt to rid the web of broken links.
Topics: Wikipedia, Wikimedia
Collection: 1.2B views
Web wide crawl number 16. The seed list for Wide00016 was made from the join of the top 1 million domains from CISCO and the top 1 million domains from Alexa.
Collection: 1.9B views
The seed for this crawl was a list of every host in the Wayback Machine. This crawl was run at level 1 (URLs including their embeds, plus the URLs of all outbound links including their embeds). The WARC files associated with this crawl are not currently available to the general public.
Collection: 1.4B views
Web wide crawl.
Collection: 1.2B views
The seeds for this crawl came from: 251 million domains that had at least one link from a different domain in the Wayback Machine, across all time; ~300 million domains that we had in the Wayback, across all time; and 55,945,067 domains from https://archive.org/details/wide00016. This crawl was run with a Heritrix setting of "maxHops=0" (URLs including their embeds). The WARC files associated with this crawl are not currently available to the general public.
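The difference between a "maxHops=0" crawl (seeds plus their embeds) and a "level 1" crawl (seeds, their outbound links, and the embeds of both) can be illustrated with a toy model. This is a minimal sketch, not Heritrix's actual scoping logic: the `TOY_WEB` graph, the `crawl` function and the example URLs are all hypothetical.

```python
from collections import deque

# Hypothetical toy web: page URL -> (embedded resources, outbound links).
TOY_WEB = {
    "a.example/": (["a.example/logo.png"], ["b.example/"]),
    "b.example/": (["b.example/style.css"], ["c.example/"]),
    "c.example/": ([], []),
}

def crawl(seeds, max_hops):
    """Return the set of URLs captured from `seeds` within `max_hops` link hops."""
    captured = set()
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        url, hops = queue.popleft()
        if url in captured:
            continue
        captured.add(url)
        embeds, links = TOY_WEB.get(url, ([], []))
        captured.update(embeds)  # embeds are always fetched with their page
        if hops < max_hops:      # outbound links are followed only within the hop budget
            queue.extend((link, hops + 1) for link in links)
    return captured

seed_only = crawl(["a.example/"], max_hops=0)  # seed page and its embeds
level_one = crawl(["a.example/"], max_hops=1)  # plus linked pages and their embeds
```

In this model, raising `max_hops` from 0 to 1 adds `b.example/` and its stylesheet but still stops before `c.example/`.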
GDELT
Collection: 57,657 items, 1.1B views
A daily crawl of more than 200,000 home pages of news sites, including the pages linked from those home pages. Site list provided by The GDELT Project.
Topics: GDELT, News
Wordpress Blogs and the Pages They Link To
Collection: 71,085 items, 751.7M views
This is a collection of pages and embedded objects from WordPress blogs and the external pages they link to. Captures of these pages are made on a continuous basis, seeded from a feed of new or changed pages hosted by WordPress.com or by WordPress sites running a properly configured Jetpack plugin.
Topics: Wordpress.com, blogs, jetpack
Wide Crawl Number 12 - started March 14th, 2015
Collection: 49,621 items, 1.3B views
Web wide crawl with initial seedlist and crawler configuration from January 2015.
Wide Crawl started April 2013
Collection: 25,035 items, 1.3B views
Web wide crawl with initial seedlist and crawler configuration from April 2013.
Wide Crawl Number 13
Collection: 46,050 items, 935.2M views
Web Wide Crawl Number 13
Wide Crawl started June 2014
Collection: 45,341 items, 1.2B views
Web wide crawl with initial seedlist and crawler configuration from June 2014.
Collection: 1.3B views
The seed for this crawl was a list of every host in the Wayback Machine. This crawl was run at level 1 (URLs including their embeds, plus the URLs of all outbound links including their embeds). The WARC files associated with this crawl are not currently available to the general public.
Audio Books & Poetry
Collection: 49,883 items, 1.8B views
Listen to free audio books and poetry recordings! This library of audio books and poetry features digital recordings and MP3s from the Naropa Poetics Audio Archive, LibriVox, Project Gutenberg, Maria Lectrix, and Internet Archive users.
Community Images
Collection: 345,843 items, 692.7M views
Images contributed by Internet Archive users and community members. These images are available for free download. Please select a Creative Commons License during upload so that others will know what they may (or may not) do with your images.
Topic: images
Collection: 1B views
The seed for this crawl was a list of every host in the Wayback Machine. This crawl was run at level 1 (URLs including their embeds, plus the URLs of all outbound links including their embeds). The WARC files associated with this crawl are not currently available to the general public.
Wayback Indexes
Collection: 554 items, 1.1B views
Wayback indexes. This data is currently not publicly accessible.
Wide Crawl started August 2013
Collection: 21,932 items, 856.4M views
Web wide crawl with initial seedlist and crawler configuration from August 2013.
Collection: 749.9M views
The seed for this crawl was a list of every host in the Wayback Machine. This crawl was run at level 1 (URLs including their embeds, plus the URLs of all outbound links including their embeds). The WARC files associated with this crawl are not currently available to the general public.
Wide Crawl started January 2012
Collection: 30,373 items, 749.9M views
Web wide crawl with initial seedlist and crawler configuration from January 2012 using HQ software.
Wide Crawl started April 2012
Collection: 39,279 items, 654M views
Web wide crawl with initial seedlist and crawler configuration from April 2012.
Wide Crawl started February 2014
Collection: 9,806 items, 552.1M views
Web wide crawl with initial seedlist and crawler configuration from February 2014.
Collection: 644.6M views
The seed for this crawl was a list of every host in the Wayback Machine. This crawl was run at level 1 (URLs including their embeds, plus the URLs of all outbound links including their embeds). The WARC files associated with this crawl are not currently available to the general public.
National Library of Australia Crawls
Collection: 46,989 items, 469.8M views
Crawls performed by Internet Archive on behalf of the National Library of Australia. This data is currently not publicly accessible.
Wide Crawl started September 2012
Collection: 22,423 items, 483.7M views
Web wide crawl with initial seedlist and crawler configuration from September 2012.
Host Screen Captures
Collection: 17,458 items, 172.7M views
Screen captures of hosts discovered during wide crawls. This data is currently not publicly accessible.
Wide Crawl Started January 2013
Collection: 15,157 items, 487.3M views
Wide crawls of the Internet conducted by Internet Archive. Access to content is restricted. Please visit the Wayback Machine to explore archived web sites.
Collection: 362.4M views
The seed for this crawl was a list of every host in the Wayback Machine. This crawl was run at level 1 (URLs including their embeds, plus the URLs of all outbound links including their embeds). The WARC files associated with this crawl are not currently available to the general public.
Wide Crawl started October 2010
Collection: 15,839 items, 500.9M views
Web wide crawl with initial seedlist and crawler configuration from October 2010.
.com survey started January 2011
Collection: 2,535 items, 493.3M views
Survey crawl of .com domains started January 2011.
Topic: webcrawl
Wide Crawl started October 2011
Collection: 12,648 items, 463.8M views
Web wide crawl with initial seedlist and crawler configuration from March 2011 using HQ software.
Non-English Audio
Collection: 147,390 items, 342.2M views
Non-English language collections contributed to the Open Source Audio collection are featured here.
Podcasts
Collection: 61,022,320 items, 158.8M views
A great resource for podcasters: the Creative Commons Podcasting Legal Guide.
Wide Crawl started March 2011
Collection: 8,528 items, 426.3M views
Web wide crawl with initial seedlist and crawler configuration from March 2011. This uses the new HQ software for distributed crawling by Kenji Nagahashi. What’s in the data set: crawl start date: 09 March, 2011; crawl end date: 23 December, 2011; number of captures: 2,713,676,341; number of unique URLs: 2,273,840,159; number of hosts: 29,032,069. The seed list for this crawl was a list of Alexa’s top 1 million web sites, retrieved close to the crawl start date. We used Heritrix (3.1.1-SNAPSHOT)...
Internet Memory Foundation
Collection: 1,918 items, 239M views
Data crawled on behalf of Internet Memory Foundation. This data is currently not publicly accessible. From Wikipedia: The Internet Memory Foundation (formerly the European Archive Foundation) is a non-profit foundation whose purpose is archiving web content. It supports projects and research which include the preservation and protection of multimedia content. Its archives form a digital library of cultural content.
Wikipedia Outlinks February 2012
Collection: 2,951 items, 337.5M views
Crawl of outlinks from wikipedia.org, started February 2012. These files are currently not publicly accessible.
International News Crawls
Collection: 8,716 items, 238.2M views
Crawls of International News Sites
web_wk
Collection: 9,973 items, 288.4M views
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Books for People with Print Disabilities
Collection: 6,676,802 items, 138.1M views
Free books for people with disabilities that impact reading. If you have a disability that interferes with reading printed text, then all of these books can be instantaneously available in your browser or via protected download. Want access? Individuals: If you would like to apply for access (it is free), make sure you have an Archive.org account and then fill in this form to contact the Vermont Mutual Aid Society. If you are affiliated with any of...
Topics: print disabled, print disability
Alexa Crawl EG
Collection: 1,678 items, 290M views
Crawl EG from Alexa Internet. This data is currently not publicly accessible.
National Library of Spain Crawls
Collection: 6,742 items, 255.6M views
Data collected by Internet Archive on behalf of the National Library of Spain. This data is currently not publicly accessible.
Spirituality & Religion
Collection: 388,816 items, 171.6M views
Listen to sermons and lectures concerning religion and spirituality here.
Books to Borrow
Collection: 5,199,721 items, 124.2M views
Books in this collection may be borrowed by logged-in patrons. You may read the books online in your browser or, in some cases, download them into Adobe Digital Editions, a free piece of software used for managing loans. Please note that works in this collection are protected by copyright law (Title 17 U.S. Code) and copying, redistribution or sale, whether or not for profit, by the recipient is not permitted unless authorized by the rightsholder or by law. See FAQs about...
Internet Archive Books
Collection: 3,339,829 items, 131.8M views
Books contributed by the Internet Archive.
Topic: internet archive books
web_iq
Collection: 2,637 items, 245.5M views
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Music, Arts & Culture
Collection: 714,405 items, 103M views
This collection features audio collections reflecting music, art and culture. Collections include the unique contemporary compositions and performances found in the Other Minds collection, the hundreds of popular songs from the early 20th Century found in the 78 RPM collection and oral history projects.
Alexa Crawl EI
Collection: 1,408 items, 198M views
Crawl EI from Alexa Internet. This data is currently not publicly accessible.
Serials in Microfilm
Collection: 3,983,257 items, 22.1M views
Digitized versions from the Serials in Microform collection, originally from NA Publishing. Record of the acquisition of the microfilm: https://archive.org/details/SerialsOnMicrofilmCollection
Wikipedia Outlinks May 2011
Collection: 1,638 items, 167M views
Crawl of outlinks from wikipedia.org, started May 2011. These files are currently not publicly accessible.
Periodicals
Collection: 4,021,130 items, 22.3M views
Periodical publications including magazines, trade magazines, and journals. Please peruse the growing list of publications.
Topics: periodicals, journals, serials, magazines
Movies
Collection: 83,810 items, 357.7M views
Watch full-length feature films, classic shorts, world culture documentaries, World War II propaganda, movie trailers, and films created in just ten hours: These options are all featured in this diverse library! Many of these videos are available for free download.
Shallow Crawls
Collection: 1,042 items, 163.9M views
Shallow crawls that collect content 1 level deep including embeds. This data is currently not publicly accessible.
Alexa Crawl EH
Collection: 1,218 items, 164.6M views
Crawl EH from Alexa Internet. This data is currently not publicly accessible.
Sermons & Religious Lectures
Collection: 323,230 items, 99.2M views
A number of religious and spiritual organizations regularly upload their sermons and lectures to the Archive through the Open Source Audio collection. You may easily locate them here.
Bibliotheque Nationale de France Domain Crawls
Collection: 1,653 items, 176.1M views
Crawls of the French domain space performed by Internet Archive on behalf of Bibliotheque Nationale de France. This data is currently not publicly accessible.
Youtube Videos
Collection: 674,730 items, 80.8M views
Captures of pages from YouTube. Currently these are discovered by searching for YouTube links on Twitter.
Topics: YouTube, Twitter, Video
Radio Show and Programs Archive
Collection: 31,092,581 items, 203.6M views
Alexa Crawl DX
Collection: 1,442 items, 164.8M views
Crawl DX from Alexa Internet. This data is currently not publicly accessible.
web_mon
Collection: 3,809 items, 137M views
Crawl performed by Internet Archive. This data is currently not publicly accessible.
National Archives and Records Administration
Collection: 12,089 items, 113M views
National Archives and Records Administration crawl performed by Internet Archive. This data is currently not publicly accessible.
Television Archive
Collection: 9,318,171 items, 218.2M views
Programs in the TV News Archive, for research and educational purposes. The programs allow users to search across a collection of television news programs dating back to 2009 for research and educational purposes such as fact checking. Users may view short clips, share links to customized short quotes, embed customized short quotes, or borrow a copy of the full program.
Wayback CDX Shards
Collection: 1,214 items, 141.9M views
CDX Index shards for the Wayback Machine. The Wayback Machine works by looking for historic URLs based on a query. This is done by searching an index of all the web objects (pages, images, etc.) that have been archived over the years. This collection holds the index used for this purpose, which is broken up into 300 pieces so they fit into items more naturally and distribute the lookup load. Each of these 300 pieces is stored in at least 2 items, and then those are also stored on the backup...
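How a lookup might be routed to one of the index pieces can be sketched as follows. This is only an illustration under stated assumptions: the description does not say how the 300 pieces are partitioned, so the sketch assumes a sorted index split by key range, uses 4 toy shards instead of 300, and invents the boundary keys, item identifiers and function names.

```python
import bisect

# Hypothetical first key of each shard, in sorted order (toy stand-in for 300 pieces).
SHARD_STARTS = ["", "com,g", "net,", "org,"]

# Hypothetical item identifiers: each shard is stored in at least 2 items.
REPLICAS = {
    i: [f"shard-{i:03d}-a", f"shard-{i:03d}-b"] for i in range(len(SHARD_STARTS))
}

def shard_for(key):
    """Return the index of the shard whose key range contains `key`."""
    return bisect.bisect_right(SHARD_STARTS, key) - 1

def items_for(key):
    """Return the items that hold the shard responsible for `key`."""
    return REPLICAS[shard_for(key)]

first = shard_for("com,example)/")   # sorts before "com,g"
second = shard_for("com,google)/")   # sorts at or after "com,g", before "net,"
holders = items_for("org,archive)/")
```

Because each shard has more than one holder, a caller could spread queries across the returned items or fall back to the second when the first is unavailable.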
Television Archive News Search Service
Collection: 2,168,501 items, 208.4M views
Items included in the Television News search service. Part of TV News Archive.
News & Public Affairs
Collection: 1,461,284 items, 192.1M views
An analysis of news and public affairs independent from traditional corporate media is available from this diverse video library. From Democracy Now's daily news program, to three days of TV news coverage following the 9/11 attacks, to Mosaic’s timely clips of Middle East newscasts, to UCSF's Tobacco Industry Videos: These collections offer an alternative way to view and interpret current news and public affairs. Many of these videos are available for free download.
Geocities Closing Crawl
Collection: 149 items, 91.9M views
Geocities crawl performed by Internet Archive. This data is currently not publicly accessible. From Wikipedia: Yahoo! GeoCities is a Web hosting service. GeoCities was originally founded by David Bohnett and John Rezner in late 1994 as Beverly Hills Internet (BHI), and by 1999 GeoCities was the third-most visited Web site on the World Wide Web. In its original form, site users selected a "city" in which to place their Web pages. The "cities" were metonymously named after...
web_tran
Collection: 4,192 items, 124.3M views
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Wikipedia Outlinks July 2011
Collection: 1,011 items, 116M views
Crawl of outlinks from wikipedia.org, started July 2011. These files are currently not publicly accessible.
Alexa Crawl EB
Collection: 653 items, 127.5M views
Crawl EB from Alexa Internet. This data is currently not publicly accessible.
Television
Collection: 248,170 items, 69M views
Collections of items recorded from television, including commercials, old television shows, government proceedings, and more.
Alexa Crawl DZ
Collection: 1,207 items, 140.4M views
Crawl DZ from Alexa Internet. This data is currently not publicly accessible.
78rpm Records Digitized by George Blood, L.P.
Collection: 303,271 items, 38.2M views
Newest uploads! Auto-78-twitter. Through the Great 78 Project the Internet Archive has begun to digitize 78rpm discs for preservation, research, and discovery with the help of George Blood, L.P. 78s were mostly made from shellac, i.e., beetle resin, and were the brittle predecessors to the LP (microgroove) era. @great78project for uploads as they happen. Turntable used for 78rpm digitization of four simultaneous recordings with different needles. The...
Topics: 78rpm, digitization
Source: 78
Accelovation Crawl
Collection: 1,324 items, 83.3M views
Web crawl snapshots generously donated from Accelovation. This data is currently not publicly accessible. From the site: Accelovation is pioneering the delivery of Insight Discovery™ software solutions that help companies move from innovation idea to product reality faster and with more success. Our solutions are used by leading firms in the Fortune 500 and beyond – companies from a diverse set of industries ranging from consumer packaged goods to high tech, foods to chemicals, and...
Alexa Crawl EF
Collection: 975 items, 90.1M views
Crawl EF from Alexa Internet. This data is currently not publicly accessible.
Arts & Music
Collection: 16,720 items, 567.8M views
This library of arts and music videos features This or That (a burlesque game show), the Coffee House TV arts program, punk bands from Punkcast and live performances from Groove TV. Many of these movies are available for free download.
University of Toronto - Robarts Library
Collection: 216,575 items, 309.9M views
The John P. Robarts Research Library, commonly referred to as Robarts Library, is the main humanities and social sciences library of the University of Toronto Libraries and the largest individual library in the university. Opened in 1973 and named for John Robarts, the 17th Premier of Ontario, the library contains more than 4.5 million bookform items, 4.1 million microform items and 740,000 other items. The library building is one of the most significant examples of brutalist architecture in...
Newspapers
Collection: 810,724 items, 49.5M views
The newspapers in this collection have been scanned as part of a pilot project using microfilm and microfiche. After using a microfilm/fiche scanner to create a digital image of each page, we process the resulting images so that each reel is contained in a single item with easily navigable files. For a few examples, please see: The New York times (Oct 16 31 1915) The New York times (1919 July 1-15) The New York times (May 1-15 1915)
Scanned in China
Collection: 819,729 items, 70.6M views
Books scanned in Shenzhen and Beijing, China.
Topic: books
Institut national de l’audiovisuel
Collection: 50 items, 81.5M views
Crawl data from Institut national de l’audiovisuel in France. This data is currently not publicly accessible. From Wikipedia: The Institut national de l'audiovisuel (or INA, French for National Audiovisual Institute) is a repository of all French radio and television audiovisual archives. Since 2006, it has allowed free online consultation on a website called ina.fr with a search tool indexing 100,000 archives of historical programs, for a total of 20,000 hours.
Alexa Crawl DL
Collection: 413 items, 93.2M views
Crawl DL from Alexa Internet. This data is currently not publicly accessible.
COM Survey Crawl 2009-2010
Collection: 729 items, 75.8M views
COM survey crawl data collected by Internet Archive in 2009-2010. This data is currently not publicly accessible.
Biodiversity Heritage Library
Collection: 236,838 items, 161.2M views
Inspiring discovery through free access to biodiversity knowledge. | The Biodiversity Heritage Library improves research methodology by collaboratively making biodiversity literature openly available to the world as part of a global biodiversity community. | Please read BHL's Acknowledgment of Harmful Content. About the Biodiversity Heritage Library: The Biodiversity Heritage Library (BHL) is the world's largest open access digital library for biodiversity literature and archives. BHL is...
web_ma
Collection: 1,085 items, 69.6M views
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa Crawl DJ
Collection: 341 items, 78.7M views
Crawl DJ from Alexa Internet. This data is currently not publicly accessible.
Shallow Crawl Started 2013
Collection: 252 items, 66.5M views
Shallow crawl started 2013 that collects content 1 level deep, including embeds. Access to content is restricted. Please visit the Wayback Machine to explore archived web sites.
Alexa Crawl DI
Collection: 250 items, 74.7M views
Crawl DI from Alexa Internet. This data is currently not publicly accessible.
Shallow Crawl Started 2013
Collection: 544 items, 65.7M views
Shallow crawl started 2013 that collects content 1 level deep, including embeds. Access to content is restricted. Please visit the Wayback Machine to explore archived web sites.
Alexa Crawl EE
Collection: 484 items, 69.1M views
Crawl EE from Alexa Internet. This data is currently not publicly accessible.
web_con
Collection: 1,507 items, 67.7M views
Crawl performed by Internet Archive. This data is currently not publicly accessible.
survey_net00000
Collection: 300 items, 60.5M views
Survey crawl of .net domains started December 2010.
Topic: webcrawl
Old Time Radio
Collection: 7,704 items, 114.3M views
web_el
Collection: 925 items, 62.8M views
Crawl performed by Internet Archive. This data is currently not publicly accessible.