
Trying Demo in Tester results in Error

I'm trying to debug an 80legs app using the tester at http://80apptester.80legs.com/ and am getting the error `TypeError: EightyAppBase is not a constructor`. As a sanity check, I tried a demo app from the GitHub repo: https://raw.githubusercontent.com/datafiniti/EightyApps/master/apps/LinkCollector.js — and I receive the same error with it. Is the testing app out of date with the spider runner? Scott

Crawl returning timeout error

Hello. Crawl ID 3261670 is returning a timeout error on a relatively large page of a website. A smaller page of the same type on the same site was crawled without errors.

Demo 80Apps not working

Hi there, I tried running the DomainCollector.js app and am seeing this error in the output: ```EightyAppBase is not a constructor``` Has there been a change to the API? Scott

non-circular object error

Hi there, the output from the crawl consistently shows ```expected result of processDocument to be a non-circular object or array of non-circular objects```, yet testing the app in the test framework returns the results we are looking for. The processDocument script captures all external links, stores them in an array, then converts that array to a JSON string and adds it to the return object:

```javascript
app.processDocument = function (html, url, headers, status, $) {
  const $html = this.parseHtml(html, $);
  const links = [];
  const object = new Object();
  const r = /:\/\/(.[^/]+)/;
  const urlDomain = url.match(r)[1];
  const normalizedUrlDomain = urlDomain.toLowerCase();
  // gets all links in the html document
  $html.find('a').each(function (i, obj) {
    const link = app.makeLink(url, $(this).attr('href'));
    if (link) {
      const linkDomain = link.match(r)[1];
      if (linkDomain.toLowerCase() !== normalizedUrlDomain) {
        if (!links.includes(linkDomain)) {
          links.push(linkDomain);
        }
      }
    }
  });
  object['list'] = JSON.stringify(links);
  return object;
}
```

Any help on how to recode this so that we get an array of external links instead of this error message would be great. Scott
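One thing worth checking (my assumption, not confirmed by 80legs): "non-circular object" errors usually mean something non-serializable leaked into the return value. Pulling the domain-filtering logic into a pure helper that only touches strings makes it testable outside the crawler and guarantees the return value contains nothing circular. The name `externalDomains` is hypothetical:

```javascript
// Hypothetical standalone version of the domain-filtering logic. Given the
// page URL and a list of already-resolved link URLs, it returns a plain array
// of external domains — no jQuery/cheerio objects, which are a common source
// of circular references in a returned result.
function externalDomains(pageUrl, linkUrls) {
  const r = /:\/\/([^/]+)/;                       // captures the host part of a URL
  const pageDomain = pageUrl.match(r)[1].toLowerCase();
  const seen = new Set();
  for (const link of linkUrls) {
    const m = link.match(r);
    if (!m) continue;                             // skip relative or malformed links
    const domain = m[1].toLowerCase();
    if (domain !== pageDomain) seen.add(domain);  // keep external domains once
  }
  return Array.from(seen);
}
```

In processDocument you could then return the plain array (or `{ list: links }`) directly; stringifying it yourself with `JSON.stringify` may be unnecessary if the crawler serializes results itself.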

Could not scrape from amazon.ca

Crawling https://amazon.ca returned: ```[{"url":"https://amazon.ca","result":"\"{}\""}]```
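The `result` field here is a JSON string that itself contains a JSON string (`"\"{}\""`), which suggests the app's return value was stringified twice somewhere (that diagnosis is a guess on my part). A small hypothetical helper that keeps parsing until it reaches a non-string value makes such results usable downstream:

```javascript
// Repeatedly JSON.parse a value until it is no longer a string, so that
// single- and double-stringified results are handled the same way.
function deepParse(value) {
  while (typeof value === 'string') {
    try {
      value = JSON.parse(value);
    } catch (e) {
      break; // plain text, not JSON — return it as-is
    }
  }
  return value;
}
```

Note that an empty object as the final result still means the page yielded no data (here, likely because Amazon blocked the request), so this only normalizes the encoding, it does not fix the scrape.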

Crawl Stuck 3230056

Hello, this crawl is stuck. What would be the way to troubleshoot it? I have a list of 10,000 URLs, but the 80legs tester only shows the backend response for one URL. Thank you.

How to scrape JS generated data

There is a website that has span tags with a data-id attribute. The text in these spans is generated while the page loads, so my 80legs app can't capture it. Is there a solution?
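One common workaround (an assumption about this particular site, since I haven't seen it): pages that fill in spans client-side often ship the underlying data in an inline `<script>` as JSON, which the crawler does receive even though the rendered text is missing. A sketch that pulls such a blob out of the raw HTML — the variable name `window.__DATA__` is hypothetical; inspect the actual page source to find the real one:

```javascript
// Extract a JSON object literal assigned to a named variable inside the raw
// HTML, e.g. `window.__DATA__ = {...};`. Returns the parsed object or null.
function extractEmbeddedJson(html, varName) {
  // Escape regex-significant characters in the variable name, then match
  // `varName = { ... };` lazily up to the first `};`.
  const escaped = varName.replace(/[.$]/g, '\\$&');
  const re = new RegExp(escaped + '\\s*=\\s*(\\{[\\s\\S]*?\\});');
  const m = html.match(re);
  return m ? JSON.parse(m[1]) : null;
}
```

The lazy match is a simplification: it will truncate payloads whose JSON contains `};` internally, so for complex pages a real JSON scanner would be safer.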

Could not scrape data from amazon.co.jp

I could not get HTML data from amazon.co.jp when we tried yesterday. Target URL: https://www.amazon.co.jp/s?i=hobby&bbn=2189632051&rh=n%3A2277721051%2Cn%3A2277722051%2Cn%3A2189632051%2Cp_n_feature_fifteen_browse-bin%3A3307621051&s=date-desc-rank&page=155&pf_rd_i=2189632051&pf_rd_m=A3P5ROKL5A1OLE&pf_rd_p=cf2542d6-8f93-4f8b-8803-343c480de726&pf_rd_r=6RSZ5NDTY1HYWG4670MK&pf_rd_s=merchandised-search-6&pf_rd_t=101&qid=1563941970&ref=sr_pg_155

Instead of the search results, the scrape returned Amazon's CAPTCHA challenge page (excerpted below; the Japanese body text is mojibake because the page declares a Shift_JIS charset):

```
<title dir="ltr">Amazon CAPTCHA</title>
...
<!-- To discuss automated access to Amazon data please contact [email protected]
     For information about migrating to our APIs refer to our Marketplace APIs at
     https://developer.amazonservices.jp/ref=rm_c_sv, or our Product Advertising API at
     https://affiliate.amazon.co.jp/gp/advertising/api/detail/main.html/ref=rm_c_ac
     for advertising use cases. -->
...
<form method="get" action="/errors/validateCaptcha" name="">
  ...
  <img src="https://images-na.ssl-images-amazon.com/captcha/qujzzelu/Captcha_lewcclnfpa.jpg">
  ...
  <input autocomplete="off" spellcheck="false" id="captchacharacters"
         name="field-keywords" class="a-span12" type="text">
  ...
</form>
```

Error loading Pages issue

I am trying to crawl Yellow Pages for Malaysia. It is able to crawl 1-2 pages at once; however, when I increase the number of pages, a few return "Error loading page" while the rest load fine. Can you help me with the reason behind such scenarios?

Results show completed with 0 files crawled

In some crawls, the results show completed without any URLs having been crawled or any JSON file produced. What are the reasons for such scenarios?

My crawls start but don't progress

For the last 12 hours my crawls start but don't progress (URLs Crawled stays at 0).

Bug I am seeing

I am paying for 5 parallel crawls, but for a few days now I am only seeing 3 running at a time.

Crawl doesn't seem to progress

Dear team - I bought a Plus account yesterday to run a crawl with a 1,000,000-URL limit. I started a crawl an hour ago, but I do not see any progress on the dashboard; the # of URLs crawled is still 0 after one hour. Please look into the matter. ID: 1985335

Crawls not progressing

Dear team - I bought a Plus account today to run a crawl with a 1,000,000-URL limit. I started a crawl 30 minutes ago, but I do not see any progress on the dashboard; the # of URLs crawled is still 0. Please look into the matter.

Crawls not starting

I am working on an urgent crawl and have tried many times, but it always stays at 0 URLs no matter how many times I restart it. The crawl I care about right now is 1958503. Any help?

Not starting anymore

Hello, since Friday night none of my crawls have been working. They show as "started" but never actually start. I tried creating another account in case mine was broken, but I got the same result.

Crawls are not starting since upgrading...

I upgraded to the 299 package and it brought my crawls to a dead stop. Nothing progresses past STARTED and no URLs are being crawled.

Error result

I'm getting this error message even though I use the default generated 80App: ```"url": "https://www.kmi-kriegsman.com/", "result": "{}", "errors": [ "parseLinks failed: text.replace is not a function", "processDocument failed: text.replace is not a function" ]```
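A guess at what's going on (not confirmed by 80legs): `text.replace is not a function` usually means a helper that expects a string was handed something else (null, a buffer, or an already-parsed object). A defensive coercion at the top of processDocument — a sketch, not the official fix — would avoid the crash; `toText` is a hypothetical name:

```javascript
// Coerce whatever the crawler passes in to a string before calling
// string methods like .replace() on it.
function toText(value) {
  if (typeof value === 'string') return value;          // already fine
  if (value === null || value === undefined) return ''; // nothing fetched
  if (Buffer.isBuffer(value)) return value.toString('utf8'); // raw bytes
  return String(value);                                 // numbers, objects, etc.
}
```

Inside the app you would then call e.g. `toText(html).replace(...)` instead of `html.replace(...)`, though the underlying question of why the crawler hands a non-string to the default app would remain one for support.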

How many URLs per month is it actually possible to crawl?

What are the top users actually able to crawl — billions of URLs? Is 80legs' infrastructure able to facilitate such volumes?

Crawl not starting or progressing

Hi, a crawl I'm running (1690200) seems to be stuck. The number of crawled URLs stays at 0; the status is "STARTED" but nothing happens. It uses a custom app that works perfectly in the app tester. What can be done to fix this? Thanks!