Discussions
Trying Demo in Tester results in Error
I'm trying to debug an 80legs app using the tester at http://80apptester.80legs.com/ and am getting the error `TypeError: EightyAppBase is not a constructor`.
As a sanity check, I tried a demo app from the GitHub repo: https://raw.githubusercontent.com/datafiniti/EightyApps/master/apps/LinkCollector.js
With this demo I get the same error. Is the tester app out of date with the spider runner?
Scott
Posted by Scott Yewell over 2 years ago
Crawl returning timeout error
Hello. Crawl ID 3261670 is returning a timeout error on a relatively large web page.
A smaller page of the same type was crawled without errors.
Posted by Andy over 2 years ago
Demo 80Apps not working
Hi There,
I tried running the DomainCollector.js app and am seeing this error on the output:
```EightyAppBase is not a constructor```
Has there been a change to the API?
Scott
Posted by Scott Yewell over 2 years ago
non-circular object error
Hi There, the output from the crawl consistently shows
```expected result of processDocument to be a non-circular object or array of non-circularobjects```
Testing the app with the test framework returns the results we are looking for.
The processDocument script captures the domains of all external links and stores them in an array, then converts this array to JSON and adds it to the returned object.
```
app.processDocument = function (html, url, headers, status, $) {
  const $html = this.parseHtml(html, $);
  const links = [];
  const object = new Object();
  const r = /:\/\/(.[^/]+)/;
  const urlDomain = url.match(r)[1];
  const normalizedUrlDomain = urlDomain.toLowerCase();
  // gets all links in the html document
  $html.find('a').each(function (i, obj) {
    const link = app.makeLink(url, $(this).attr('href'));
    if (link) {
      const linkDomain = link.match(r)[1];
      if (linkDomain.toLowerCase() !== normalizedUrlDomain) {
        if (!links.includes(linkDomain)) {
          links.push(linkDomain);
        }
      }
    }
  });
  object['list'] = JSON.stringify(links);
  return object;
}
```
Any help on how to recode this so that we can get an array of external links as opposed to this error message would be great.
Scott
Posted by Scott Yewell over 2 years ago
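A possible starting point for the question above, offered as a minimal sketch and not a confirmed fix for the crawler error. It assumes the goal is a plain object holding an array of external host names. `collectExternalDomains` and `isSerializable` are hypothetical helper names, not part of the 80legs API, and the sketch uses the standard `URL` class in place of the regular expression from the post. The check relies on the fact that `JSON.stringify` throws on circular structures, so it can be run locally before returning a result.

```javascript
// Hypothetical helper, not part of the 80legs API: collects the distinct
// external host names from a list of href values found on a page.
function collectExternalDomains(pageUrl, hrefs) {
  const pageHost = new URL(pageUrl).hostname.toLowerCase();
  const domains = new Set();
  for (const href of hrefs) {
    if (!href) continue; // skip anchors without an href
    let host;
    try {
      // Resolve relative links against the page URL.
      host = new URL(href, pageUrl).hostname.toLowerCase();
    } catch (err) {
      continue; // skip hrefs that cannot be parsed as URLs
    }
    // Schemes such as mailto: have an empty hostname and are ignored.
    if (host && host !== pageHost) domains.add(host);
  }
  // Plain object with a plain array of strings.
  return { list: Array.from(domains) };
}

// Hypothetical helper: returns true when JSON.stringify can serialize the
// value. JSON.stringify throws a TypeError on circular structures.
function isSerializable(value) {
  try {
    JSON.stringify(value);
    return true;
  } catch (err) {
    return false;
  }
}
```

Inside processDocument, the href values could be gathered with `$html.find('a')` and `.attr('href')` as in the post and passed to the helper. Whether returning the array directly, without `JSON.stringify`, clears the error on the crawler side is an assumption that would need to be verified.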
Could not scrape from amazon.ca
https://amazon.ca
returned
[{"url":"https://amazon.ca","result":"\"{}\""}]
Posted by steve nevis over 2 years ago
Crawl Stuck 3230056
Hello,
This crawl is stuck. How can I troubleshoot this? I have a list of 10,000 URLs, and the 80legs tester only shows the backend response for 1 URL.
Thank you.
Posted by JPP over 2 years ago
How to scrape JS generated data
There is a website that has span tags with a data-id attribute. The text in these spans is generated while the page loads, so my 80legs app can't get it.
Is there a solution?
Posted by hqbar almost 3 years ago
Could not scrape data from amazon.co.jp
I could not get HTML data from amazon.co.jp when I tried yesterday.
TargetURL: https://www.amazon.co.jp/s?i=hobby&bbn=2189632051&rh=n%3A2277721051%2Cn%3A2277722051%2Cn%3A2189632051%2Cp_n_feature_fifteen_browse-bin%3A3307621051&s=date-desc-rank&page=155&pf_rd_i=2189632051&pf_rd_m=A3P5ROKL5A1OLE&pf_rd_p=cf2542d6-8f93-4f8b-8803-343c480de726&pf_rd_r=6RSZ5NDTY1HYWG4670MK&pf_rd_s=merchandised-search-6&pf_rd_t=101&qid=1563941970&ref=sr_pg_155
The result of the scrape is below:
```
<!DOCTYPE html>
<!--[if lt IE 7]> <html lang="jp" class="a-no-js a-lt-ie9 a-lt-ie8 a-lt-ie7"> <![endif]-->
<!--[if IE 7]> <html lang="jp" class="a-no-js a-lt-ie9 a-lt-ie8"> <![endif]-->
<!--[if IE 8]> <html lang="jp" class="a-no-js a-lt-ie9"> <![endif]-->
<!--[if gt IE 8]><!-->
<html class="a-no-js" lang="jp"><!--<![endif]--><head>
<meta http-equiv="content-type" content="text/html; charset=Shift_JIS">
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<title dir="ltr">Amazon CAPTCHA</title>
<meta name="viewport" content="width=device-width">
<link rel="stylesheet" href="https://images-na.ssl-images-amazon.com/images/G/01/AUIClients/AmazonUI-3c913031596ca78a3768f4e934b1cc02ce238101.secure.min._V1_.css">
<script>
if (true === true) {
var ue_t0 = (+ new Date()),
ue_csm = window,
ue = { t0: ue_t0, d: function() { return (+new Date() - ue_t0); } },
ue_furl = "fls-fe.amazon.co.jp",
ue_mid = "A1VC38T7YXB528",
ue_sid = (document.cookie.match(/session-id=([0-9-]+)/) || [])[1],
ue_sn = "opfcaptcha.amazon.co.jp",
ue_id = 'KKTM8F5RHSCN88RHYEX8';
}
</script>
</head>
<body>
<!--
To discuss automated access to Amazon data please contact [email protected]
For information about migrating to our APIs refer to our Marketplace APIs at https://developer.amazonservices.jp/ref=rm_c_sv, or our Product Advertising API at https://affiliate.amazon.co.jp/gp/advertising/api/detail/main.html/ref=rm_c_ac for advertising use cases.
-->
<!--
Correios.DoNotSend
-->
<div class="a-container a-padding-double-large" style="min-width:350px;padding:44px 0 !important">
<div class="a-row a-spacing-double-large" style="width: 350px; margin: 0 auto">
<div class="a-row a-spacing-medium a-text-center"><i class="a-icon a-logo"></i></div>
<div class="a-box a-alert a-alert-info a-spacing-base">
<div class="a-box-inner">
<i class="a-icon a-icon-alert"></i>
<h4>���ɕ\������Ă��镶������͂��Ă�������</h4>
<p class="a-last">�\����܂��A���q�l�����{�b�g�łȂ����Ƃ��m�F�����Ă��������K�v������܂��B�ŗǂ̂������ŃA�N�Z�X���Ă����������߂ɁA���g���̃u���E�U���N�b�L�[������Ă��邱�Ƃ����m�F���������B</p>
</div>
</div>
<div class="a-section">
<div class="a-box a-color-offset-background">
<div class="a-box-inner a-padding-extra-large">
<form method="get" action="/errors/validateCaptcha" name="">
<input type=hidden name="amzn" value="vMTrEHkdsJiaQr9x5UfAgA==" /><input type=hidden name="amzn-r" value="/s?i=hobby&bbn=2189632051&rh=n%3A2277721051%2Cn%3A2277722051%2Cn%3A2189632051%2Cp_n_feature_fifteen_browse-bin%3A3307621051&s=date-desc-rank&page=2&pf_rd_i=2189632051&pf_rd_m=A3P5ROKL5A1OLE&pf_rd_p=cf2542d6-8f93-4f8b-8803-343c480de726&pf_rd_r=6RSZ5NDTY1HYWG4670MK&pf_rd_s=merchandised-search-6&pf_rd_t=101&qid=1563942549&ref=sr_pg_2" />
<div class="a-row a-spacing-large">
<div class="a-box">
<div class="a-box-inner">
<h4>���̉摜�Ɍ����镶������͂��Ă�������:</h4>
<div class="a-row a-text-center">
<img src="https://images-na.ssl-images-amazon.com/captcha/qujzzelu/Captcha_lewcclnfpa.jpg">
</div>
<div class="a-row a-spacing-base">
<div class="a-row">
<div class="a-column a-span6">
</div>
<div class="a-column a-span7 a-span-last a-text-right">
<a onclick="window.location.reload()">�ʂ̉摜�ɂ��Ă�������</a>
</div>
</div>
<input autocomplete="off" spellcheck="false" placeholder="��������͂��Ă�������" id="captchacharacters" name="field-keywords" class="a-span12" autocapitalize="off" autocorrect="off" type="text">
</div>
</div>
</div>
</div>
<div class="a-section a-spacing-extra-large">
<div class="a-row">
<span class="a-button a-button-primary a-span12">
<span class="a-button-inner">
<button type="submit" class="a-button-text">�V���b�s���O�𑱂���</button>
</span>
</span>
</div>
</div>
</form>
</div>
</div>
</div>
</div>
<div class="a-divider a-divider-section"><div class="a-divider-inner"></div></div>
<div class="a-text-center a-spacing-small a-size-mini">
<a href="https://www.amazon.co.jp/gp/help/customer/display.html/ref=footer_cou/376-1267051-7966065?ie=UTF8&nodeId=643006">���p�K��</a>
<span class="a-letter-space"></span>
<span class="a-letter-space"></span>
<span class="a-letter-space"></span>
<span class="a-letter-space"></span>
<a href="https://www.amazon.co.jp/gp/help/customer/display.html/ref=footer_privacy/376-1267051-7966065?ie=UTF8&nodeId=643000">�v���C�o�V�[�K��</a>
</div>
<div class="a-text-center a-size-mini a-color-secondary">
© 1996-2013, Amazon.com, Inc. or its affiliates
<script>
if (true === true) {
document.write('<img src="https://fls-fe.amaz'+'on.co.jp/'+'1/oc-csi/1/OP/requestId=KKTM8F5RHSCN88RHYEX8&js=1" />');
};
</script>
<noscript>
<img src="https://fls-fe.amazon.co.jp/1/oc-csi/1/OP/requestId=KKTM8F5RHSCN88RHYEX8&js=0" />
</noscript>
</div>
</div>
<script>
if (true === true) {
var elem = document.createElement("script");
elem.src = "https://images-fe.ssl-images-amazon.com/images/G/01/csminstrumentation/csm-captcha-instrumentation.min._V" + (+ new Date()) + "_.js";
document.getElementsByTagName('head')[0].appendChild(elem);
}
</script>
</body></html>
```
Posted by Genki almost 3 years ago
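The response above is a CAPTCHA page, not the search results. As a minimal sketch, an app could flag such responses instead of treating them as scraped data. `looksLikeAmazonCaptcha` is a hypothetical helper name, and the two markers it checks (the `Amazon CAPTCHA` title and the `/errors/validateCaptcha` form action) are taken only from the HTML in the post, so they are an assumption about what such pages look like in general.

```javascript
// Hypothetical helper: reports whether an HTML string resembles the CAPTCHA
// page shown in the post. Both markers come from that single response.
function looksLikeAmazonCaptcha(html) {
  if (typeof html !== 'string') return false;
  // Marker 1: <title ...>Amazon CAPTCHA</title>
  const hasCaptchaTitle = /<title[^>]*>\s*Amazon CAPTCHA\s*<\/title>/i.test(html);
  // Marker 2: the form posts to the CAPTCHA validation path.
  const hasCaptchaForm = html.includes('/errors/validateCaptcha');
  return hasCaptchaTitle || hasCaptchaForm;
}
```

This only detects the block page; it does not get around it.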
Error loading Pages issue
I am trying to crawl the Yellow Pages for Malaysia. It can crawl 1-2 pages at a time. However, when I increase the number of pages, a few of them show "Error loading page" while the rest load. Can you help me understand the reason for this?
Posted by Rashi Goel almost 3 years ago
Results show completed with 0 files crawled
In some crawls, the results show as completed without any URL being crawled and without any JSON file. What causes this?
Posted by Rashi Goel almost 3 years ago
My crawls start but don't progress
For the last 12 hours my crawls start but don't progress (URLs Crawled: 0).
Posted by Koen Koeppen almost 3 years ago
Bug I am seeing
I am paying for 5 parallel crawls, but I have only been seeing 3 at a time for a few days now.
Posted by Courtney Rogers about 3 years ago
Crawl doesn't seem to progress
Dear team - I bought a Plus account yesterday to run a crawl with a 1000000 limit. I started a crawl 1 hour ago, but I do not see any progress on the dashboard; the number of URLs crawled is still 0 after 1 hour. Please look into the matter.
ID: 1985335
Posted by AChauhan about 3 years ago
Crawls not progressing
Dear team - I bought a Plus account today to run a crawl with a 1000000 limit. I started a crawl 30 minutes ago, but I do not see any progress on the dashboard; the number of URLs crawled is still 0 after 30 minutes. Please look into the matter.
Posted by Aditya about 3 years ago
Crawls not starting
I am working on an urgent crawl and tried many times but it always stays at 0 URLs no matter how many times I restart it. The crawl I care about now is 1958503. Any help?
Posted by Omar about 3 years ago
Not starting anymore
Hello, since Friday night none of my crawls have been working. They are shown as "started" but never actually start.
I tried creating another account in case mine was broken, but I got the same result.
Posted by Xavier Paulik about 3 years ago
Crawls are not starting since upgrading...
I upgraded to the 299 package and it froze my crawls to a dead stop... Nothing is progressing from STARTED and no urls are being crawled.
Posted by Courtney Rogers about 3 years ago
Error result
I'm getting this error message.
"url": "https://www.kmi-kriegsman.com/",
"result": "{}",
"errors": [
"parseLinks failed: text.replace is not a function",
"processDocument failed: text.replace is not a function"
]
even though I am using the default generated 80App.
Posted by prio bagus suwanto about 3 years ago
How many crawls per month is actually possible?
What are the top users actually able to crawl? Billions of urls? Is the structure of 80legs able to facilitate such feats?
Posted by Courtney Rogers about 3 years ago
Crawl not starting or progressing
Hi,
A crawl I'm performing (1690200) seems to be stuck. The number of crawled URLs stays at 0, status is "STARTED" but nothing happens. It is a custom app that works perfectly when using the app tester.
What can be done to fix this?
Thanks!
Posted by Martin Pelotas about 3 years ago