Mar 31, 2020, by Archive-It
Feb 13, 2015, by Internet Archive
Internet Archive crawldata from YouTube Video archiving project 2011, captured by crawl443.us.archive.org:youtube from Fri Feb 13 15:24:57 PST 2015 to Fri Feb 13 08:07:48 PST 2015.
Topic: crawldata
Jun 19, 2015, by Internet Archive
Internet Archive crawldata from YouTube Video archiving project 2011, captured by crawl443.us.archive.org:youtube from Fri Jun 19 21:38:26 PDT 2015 to Fri Jun 19 15:46:57 PDT 2015.
Topic: crawldata
jobId=176685, recurrence=NONE, maxDuration=86400, maxDocumentCount=null, isTestCrawl=false, isPatchCrawl=false, oneTimeSubtype=CRAWL_SELECTED_SEEDS, seedCount=2, accountId=350, accountType=SUBSCRIBER, organizationName="National Library of Medicine", collectionId=6285, collectionName="NLM Specialized Information Services", collectionPublic=true
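The Archive-It job descriptions in this listing are flat `key=value` records separated by commas. A minimal sketch of how such a line could be read into typed fields follows. The function name `parse_job_metadata` and the type conversions are assumptions made for illustration, not part of any Archive-It tooling, and treating `maxDuration` as a number of seconds is likewise an assumption (86400 would then be 24 hours). Quoted values are matched first because some organization names in this listing contain commas.

```python
import re

# One key=value pair. A value is either a double-quoted string (which may
# contain commas) or a bare token that runs up to the next comma.
PAIR = re.compile(r'(\w+)=("[^"]*"|[^,]*)')


def parse_job_metadata(line):
    """Parse a 'key=value, key=value' crawl-job line into a dict (hypothetical helper)."""
    record = {}
    for key, raw in PAIR.findall(line):
        raw = raw.strip()
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            value = raw[1:-1]        # quoted string: drop the quotes
        elif raw == "null":
            value = None             # explicit null
        elif raw in ("true", "false"):
            value = raw == "true"    # boolean flag
        elif raw.isdigit():
            value = int(raw)         # numeric id, count, or duration
        else:
            value = raw              # bare token such as NONE or DAILY
        record[key] = value
    return record


line = (
    'jobId=176685, recurrence=NONE, maxDuration=86400, maxDocumentCount=null, '
    'isTestCrawl=false, isPatchCrawl=false, oneTimeSubtype=CRAWL_SELECTED_SEEDS, '
    'seedCount=2, accountId=350, accountType=SUBSCRIBER, '
    'organizationName="National Library of Medicine", collectionId=6285, '
    'collectionName="NLM Specialized Information Services", collectionPublic=true'
)

job = parse_job_metadata(line)
print(job["organizationName"], job["seedCount"])
# Assumption: maxDuration is expressed in seconds.
print(job["maxDuration"] / 3600, "hours")
```

With records in this form, the listing can be filtered by fields such as `isPatchCrawl` or `recurrence` instead of by string matching.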
Dec 23, 2015, by Archive-It
jobId=188659, recurrence=DAILY, maxDuration=82800, maxDocumentCount=null, isTestCrawl=false, isPatchCrawl=false, oneTimeSubtype=null, seedCount=1, accountId=980, accountType=SUBSCRIBER, organizationName="Massachusetts Department of Transportation", collectionId=6179, collectionName="Aeronautics", collectionPublic=true
Sep 30, 2015, by Internet Archive
Internet Archive crawldata from YouTube Video archiving project 2011, captured by crawl445.us.archive.org:youtube from Wed Sep 30 07:12:00 PDT 2015 to Wed Sep 30 01:08:10 PDT 2015.
Topic: crawldata
Incremental crawl of the Portuguese web performed between 13 August 2015 and 5 November 2015, mainly from the .PT domain. The AWP18 crawl is incremental because it was performed using DeDuplicator (http://landsbokasafn.github.io/DeDuplicator/), taking the content of AWP17 as its baseline. Thus, files that remained unchanged since the AWP17 complete crawl were not archived again (duplicated) in the AWP18 incremental crawl.
Topics: Incremental crawl of the Portuguese web, Portuguese Web Archive, Portuguese online publications,...
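The incremental idea described above, skipping content that is unchanged relative to a baseline crawl, can be illustrated with a small sketch. This is not DeDuplicator's API or implementation. It assumes, purely for illustration, that the baseline is available as a mapping from URL to content digest, and the names `digest` and `incremental_crawl` are hypothetical.

```python
import hashlib


def digest(payload: bytes) -> str:
    """Content digest used to decide whether a document changed (SHA-1 chosen for illustration)."""
    return hashlib.sha1(payload).hexdigest()


def incremental_crawl(baseline, fetched):
    """Split fetched documents into those to archive and those to skip.

    baseline: dict mapping URL -> digest recorded by the earlier crawl.
    fetched:  dict mapping URL -> payload bytes from the current crawl.
    """
    to_archive = {}
    skipped = []
    for url, payload in fetched.items():
        if baseline.get(url) == digest(payload):
            skipped.append(url)          # unchanged since the baseline: not stored again
        else:
            to_archive[url] = payload    # new or changed: store it
    return to_archive, skipped


# Hypothetical example data standing in for a baseline and a later crawl.
baseline = {"http://example.pt/a": digest(b"same")}
fetched = {"http://example.pt/a": b"same", "http://example.pt/b": b"new"}

to_archive, skipped = incremental_crawl(baseline, fetched)
print(sorted(to_archive), skipped)
```

Under these assumptions, an incremental crawl stores only new or changed documents, which is why it has to be read together with its baseline crawl.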
Apr 13, 2020, by Archive-It
Incremental crawl of the Portuguese web performed between 12 November 2015 and 5 January 2016, mainly from the .PT domain. The AWP19 crawl is incremental because it was performed using DeDuplicator (http://landsbokasafn.github.io/DeDuplicator/), taking the content of AWP18 as its baseline. Thus, files that remained unchanged since the AWP18 crawl were not archived again (duplicated) in the AWP19 incremental crawl.
Topics: Incremental crawl of the Portuguese web, Portuguese Web Archive, Portuguese online publications,...
Jun 24, 2015, by Internet Archive
Internet Archive crawldata from YouTube Video archiving project 2011, captured by crawl443.us.archive.org:youtube from Wed Jun 24 16:10:22 PDT 2015 to Wed Jun 24 09:36:13 PDT 2015.
Topic: crawldata
Incremental crawl of the Portuguese web performed between 13 August 2015 and 5 November 2015, mainly from the .PT domain. The AWP18 crawl is incremental because it was performed using DeDuplicator (http://landsbokasafn.github.io/DeDuplicator/), taking the content of AWP17 as its baseline. Thus, files that remained unchanged since the AWP17 complete crawl were not archived again (duplicated) in the AWP18 incremental crawl.
Topics: Incremental crawl of the Portuguese web, Portuguese Web Archive, Portuguese online publications,...
recurrence=NONE, maxDuration=86400, maxDocumentCount=null, isTestCrawl=false, isPatchCrawl=false, oneTimeSubtype=MISSING_URLS_PATCH_CRAWL, seedCount=3, accountId=643, accountType=SUBSCRIBER, organizationName="Bucknell University", collectionId=3239, collectionName="Bucknell University Website Archive", collectionPublic=true
jobId=171181, recurrence=NONE, maxDuration=604800, maxDocumentCount=null, isTestCrawl=false, isPatchCrawl=false, oneTimeSubtype=CRAWL_SELECTED_SEEDS, seedCount=1, accountId=75, accountType=SUBSCRIBER, organizationName="University of Toronto", collectionId=6209, collectionName="Federal Election Candidate Sites 2015", collectionPublic=true
Aug 18, 2015, by Archive-It
jobId=169391, recurrence=NONE, maxDuration=86400, maxDocumentCount=null, isTestCrawl=false, isPatchCrawl=true, oneTimeSubtype=MISSING_URLS_PATCH_CRAWL, seedCount=18, accountId=824, accountType=SUBSCRIBER, organizationName="Academy of Motion Picture Arts and Sciences, Margaret Herrick Library", collectionId=5182, collectionName="Film Websites 2015", collectionPublic=true
Nov 15, 2015, by Internet Archive
Internet Archive crawldata from YouTube Video archiving project 2011, captured by crawl443.us.archive.org:youtube from Sun Nov 15 03:17:21 PST 2015 to Sat Nov 14 19:51:13 PST 2015.
Topic: crawldata
Apr 14, 2020, by Archive-It
jobId=174613, recurrence=SEMIANNUAL, maxDuration=604800, maxDocumentCount=null, isTestCrawl=false, isPatchCrawl=false, oneTimeSubtype=null, seedCount=106, accountId=401, accountType=SUBSCRIBER, organizationName="University of Alberta", collectionId=2901, collectionName="Government Information Collection", collectionPublic=true
recurrence=NONE, maxDuration=86400, maxDocumentCount=null, isTestCrawl=false, isPatchCrawl=true, oneTimeSubtype=MISSING_URLS_PATCH_CRAWL, seedCount=3, accountId=309, accountType=SUBSCRIBER, organizationName="Z. Smith Reynolds Library", collectionId=1104, collectionName="Wake Forest University Archives", collectionPublic=true
Complete crawl of the Portuguese web performed between 10 April 2015 and 9 June 2015, mainly from the .PT domain. The AWP17 crawl did NOT use DeDuplicator (http://landsbokasafn.github.io/DeDuplicator/).
Topics: Complete crawl of the Portuguese web, Portuguese Web Archive, Portuguese online publications,...
Sep 28, 2015, by Archive-It
jobId=174918, recurrence=NONE, maxDuration=86400, maxDocumentCount=null, isTestCrawl=false, isPatchCrawl=true, oneTimeSubtype=MISSING_URLS_PATCH_CRAWL, seedCount=44, accountId=824, accountType=SUBSCRIBER, organizationName="Academy of Motion Picture Arts and Sciences, Margaret Herrick Library", collectionId=5182, collectionName="Film Websites 2015", collectionPublic=true
Mar 23, 2015, by Internet Archive
Internet Archive crawldata from YouTube Video archiving project 2011, captured by crawl440.us.archive.org:youtube from Mon Mar 23 00:00:48 PDT 2015 to Sun Mar 22 17:50:05 PDT 2015.
Topic: crawldata
Jun 15, 2015, by Archive-It
recurrence=SEMIANNUAL, maxDuration=432000, maxDocumentCount=null, isTestCrawl=false, isPatchCrawl=false, oneTimeSubtype=null, seedCount=11, accountId=687, accountType=SUBSCRIBER, organizationName="Gettysburg College", collectionId=3898, collectionName="Gettysburg Sports", collectionPublic=true
Apr 16, 2020, by Archive-It
Incremental crawl of the Portuguese web performed between 12 November 2015 and 5 January 2016, mainly from the .PT domain. The AWP19 crawl is incremental because it was performed using DeDuplicator (http://landsbokasafn.github.io/DeDuplicator/), taking the content of AWP18 as its baseline. Thus, files that remained unchanged since the AWP18 crawl were not archived again (duplicated) in the AWP19 incremental crawl.
Topics: Incremental crawl of the Portuguese web, Portuguese Web Archive, Portuguese online publications,...
Sep 23, 2015, by Archive-It
jobId=174336, recurrence=NONE, maxDuration=86400, maxDocumentCount=null, isTestCrawl=false, isPatchCrawl=true, oneTimeSubtype=MISSING_URLS_PATCH_CRAWL, seedCount=1, accountId=824, accountType=SUBSCRIBER, organizationName="Academy of Motion Picture Arts and Sciences, Margaret Herrick Library", collectionId=5182, collectionName="Film Websites 2015", collectionPublic=true
Complete crawl of the Portuguese web performed between 10 April 2015 and 9 June 2015, mainly from the .PT domain. The AWP17 crawl did NOT use DeDuplicator (http://landsbokasafn.github.io/DeDuplicator/).
Topics: Complete crawl of the Portuguese web, Portuguese Web Archive, Portuguese online publications,...
Oct 7, 2015, by Archive-It
jobId=177305, recurrence=DAILY, maxDuration=82800, maxDocumentCount=null, isTestCrawl=false, isPatchCrawl=false, oneTimeSubtype=null, seedCount=1, accountId=980, accountType=SUBSCRIBER, organizationName="Massachusetts Department of Transportation", collectionId=6179, collectionName="Aeronautics", collectionPublic=true
Oct 26, 2015, by Archive-It
jobId=180309, recurrence=NONE, maxDuration=86400, maxDocumentCount=null, isTestCrawl=false, isPatchCrawl=true, oneTimeSubtype=MISSING_URLS_PATCH_CRAWL, seedCount=71, accountId=388, accountType=SUBSCRIBER, organizationName="Tufts University", collectionId=3145, collectionName="Athletics", collectionPublic=true
Aug 13, 2015, by Internet Archive
Internet Archive crawldata from YouTube Video archiving project 2011, captured by crawl819.us.archive.org:youtube from Thu Aug 13 15:06:21 PDT 2015 to Thu Aug 13 09:43:34 PDT 2015.
Topic: crawldata
recurrence=NONE, maxDuration=86400, maxDocumentCount=null, isTestCrawl=false, isPatchCrawl=false, oneTimeSubtype=MISSING_URLS_PATCH_CRAWL, seedCount=25, accountId=811, accountType=SUBSCRIBER, organizationName="University of Mary Washington", collectionId=4772, collectionName="University of Mary Washington Website", collectionPublic=true
Apr 28, 2015, by Internet Archive
Internet Archive crawldata from YouTube Video archiving project 2011, captured by crawl444.us.archive.org:youtube from Tue Apr 28 13:55:41 PDT 2015 to Tue Apr 28 07:44:00 PDT 2015.
Topic: crawldata
Jun 19, 2015, by Archive-It
recurrence=QUARTERLY, maxDuration=604800, maxDocumentCount=null, isTestCrawl=false, isPatchCrawl=false, oneTimeSubtype=null, seedCount=55, accountId=550, accountType=SUBSCRIBER, organizationName="Innsbruck Newspaper Archive / University of Innsbruck", collectionId=2697, collectionName="DILIMAG", collectionPublic=true
Internet Archive crawldata from the NARA 113th Congressional Crawl, captured by wbgrp-crawl013.us.archive.org:congress113th from Fri Jan 9 06:21:58 PST 2015 to Thu Jan 8 22:38:29 PST 2015.
Topic: crawldata
Complete crawl of the Portuguese web performed between 10 April 2015 and 9 June 2015, mainly from the .PT domain. The AWP17 crawl did NOT use DeDuplicator (http://landsbokasafn.github.io/DeDuplicator/).
Topics: Complete crawl of the Portuguese web, Portuguese Web Archive, Portuguese online publications,...