Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

0 votes
186 views
in Technique[技术] by (71.8m points)

Web scraping a site that runs a long internet speed test

I am trying to scrape an internet speed test site but cannot get the data. I first tried a few sites that need a button click to start the test, but I could not locate the button in the HTML. So I switched to a site that starts the test without a click. Now I do not get the results back, even though I wait 60 seconds for the test to finish. When I inspect the HTML in the browser after the test is done, I can see the results, but they are not in the page source I download, and I do not understand why. Here is the Python code I used:

from bs4 import BeautifulSoup as bs
from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

URL = 'https://www.bezeq.co.il/internetandphone/internet/speedtest/'
chrome_options = Options()
#chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options)

driver.get(URL)
sleep(60)  # give the speed test time to finish
page0 = driver.page_source  # HTML of the top-level document only
driver.quit()  # close the browser and end the WebDriver session

When I inspect the HTML on the site, I can see the results (upload 9501 Kb/s, download 86.19 Mb/s), but they do not appear in the 'page0' I get from the driver:

<div class="bezeq-results">
   <div class="speed download">
      <div class="title">?????? ?????</div>
      <div class="value">86.19 Mb/s</div>
   </div>
   <div class="speed upload">
      <div class="title">?????? ?????</div>
      <div class="value">9501 Kb/s</div>
   </div>
   <div class="info"><span class="company"><span class="value">ITC NG ltd</span> <span class="hebrew">:???</span></span><span class="ping">Ping: <span class="value">30ms</span></span><span class="ip">IP: <span class="value">185.108.81.221</span></span></div>
   <button class="btn">???? ???</button>
</div>

What am I doing wrong? How should I get the data?



1 Answer

0 votes
by (71.8m points)

I found out that the results are inside an iframe, so they are not part of the outer page's source. I had to make a separate request for the iframe's content, and that returned the data.
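
The answer does not show its code, so the following is only a minimal sketch of the idea, not the code actually used. It uses the standard library to collect the iframe URLs from the outer page, fetch each one, and read the download/upload values. The helper names (IframeSrcParser, SpeedResultParser, find_iframe_urls, fetch, parse_results, scrape_speed_test) are made up for illustration. The class names it looks for ("speed", "download", "upload", "value") are taken from the markup in the question. Whether a plain request is enough depends on the site: if the iframe fills in its results with JavaScript, it may be necessary to stay in Selenium and call driver.switch_to.frame(...) on the iframe element before reading driver.page_source.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class IframeSrcParser(HTMLParser):
    """Collect the src attribute of every <iframe> in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


class SpeedResultParser(HTMLParser):
    """Read the download/upload values from markup like the question's."""

    def __init__(self):
        super().__init__()
        self.results = {}
        self._kind = None       # "download" or "upload" while inside that block
        self._in_value = False  # True right after its <div class="value"> opens

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "speed" in classes:
            self._kind = next(
                (c for c in classes if c in ("download", "upload")), None
            )
        elif self._kind and "value" in classes:
            self._in_value = True

    def handle_data(self, data):
        if self._in_value and data.strip():
            self.results[self._kind] = data.strip()
            self._in_value = False
            self._kind = None


def find_iframe_urls(page_html, base_url):
    """Return absolute URLs for all iframes found in page_html."""
    parser = IframeSrcParser()
    parser.feed(page_html)
    return [urljoin(base_url, src) for src in parser.sources]


def parse_results(frame_html):
    """Return e.g. {'download': ..., 'upload': ...} from the iframe's HTML."""
    parser = SpeedResultParser()
    parser.feed(frame_html)
    return parser.results


def fetch(url, timeout=30):
    """Download a URL and decode it as text (assumes UTF-8 if unspecified)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def scrape_speed_test(page_html, base_url):
    """Try each iframe in turn and return the first results found."""
    for frame_url in find_iframe_urls(page_html, base_url):
        results = parse_results(fetch(frame_url))
        if results:
            return results
    return {}
```

With the 'page0' string from the question, the call would look like scrape_speed_test(page0, URL). An empty dict means no iframe returned the expected markup from a plain request, which is the case where switching into the frame with Selenium is worth trying instead.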



...