Author Topic: get a particular element  (Read 730 times)

Offline pavansss91

get a particular element
« on: June 07, 2009, 11:49:36 AM »
Hi,

 I have a question. I don't think there is a solution for this, but I thought I had better ask here.

Can we get the innerHTML of an element in an HTML page using PHP?

What I mean is:

 I have a file userprofile.php which is used to view a user's profile.
The link to view the user "Mario Bhoola" with user id=2356 is
 mydomain.com/userprofile.php?id=2356

Now I need to get the name Mario Bhoola, which sits inside an element with some id in the HTML page produced.

So I want to write a wanted.php on a different server that loops while $i<3000, and run
myotherdomain.com/wanted.php
which writes the name of every user whose id is less than 3000 into user.txt.

I have explained as much as possible.
If you are still confused about what I am asking, I will try to go into more detail.
bbgFramework v0.1.3
Sun Database Class v0.3

Offline Nox

Re: get a particular element
« Reply #1 on: June 07, 2009, 01:56:29 PM »
(please don't skip any part, at least above ----, I think everything's important)

Hi,
there is a solution, you just have to think about it slightly differently: you just need to fetch a page and find a string in it.
So we need some means of fetching the page, and there are several of them: cURL, file_get_contents, maybe raw sockets...
I use cURL:
Code: [Select]
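<?php
  // $address holds the URL of the page to fetch
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $address);
  // Return the page as a string instead of printing it
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  // Leave the HTTP headers out of the result
  curl_setopt($ch, CURLOPT_HEADER, 0);
  $content = curl_exec($ch);
  curl_close($ch);
?>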

And then just use regular expressions to get what you need.
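A rough sketch of that regular-expression step, for what it's worth. The helper name extractById and the element id "username" are made up for illustration (the real id depends on the markup of the profile page), and the pattern assumes a double-quoted id attribute and no nested element with the same tag name. The type hints need PHP 7.1 or later.

```php
<?php
// Hypothetical helper: return the text inside the first element whose
// id attribute equals $id, or null if no such element is found.
// Assumes id="..." is double-quoted and the element does not contain
// another element with the same tag name.
function extractById(string $html, string $id): ?string
{
    $pattern = '~<(\w+)[^>]*\bid="' . preg_quote($id, '~') . '"[^>]*>(.*?)</\1>~is';
    if (preg_match($pattern, $html, $m)) {
        // Drop any inline markup and surrounding whitespace
        return trim(strip_tags($m[2]));
    }
    return null;
}

// Stand-in for the $content string returned by curl_exec()
$sample = '<html><body><h1 id="username"> Mario Bhoola </h1></body></html>';
echo extractById($sample, 'username'), "\n"; // prints: Mario Bhoola
```

A regex like this breaks as soon as the markup gets more complicated, so treat it as a starting point only.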

However, this is an expensive approach: in the worst case you retrieve 3000 pages and parse 3000 pages... but I see no other way.

IMHO the only way is not to run this very often, and you might need the privilege to adjust max_execution_time, as otherwise the script might exceed it. I believe it's impossible to do this on every page load; that would be insane.

Well... there actually might be a way. After parsing those pages you can save your results to a file, with a name stating what it contains (like /usernames/lt3000.txt for "less than id 3000"), so the procedure doesn't have to run every time... assuming the data does not change...
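The save-to-file idea could look roughly like this. It is only a sketch: buildNameCache and the tab-separated file layout are my own assumptions, and $fetchName stands for whatever callable does the cURL request plus the parsing for one id and returns the name or null.

```php
<?php
// Hypothetical sketch: collect the names for ids 1 .. $maxId-1 once and
// store them in $cacheFile, one "id<TAB>name" pair per line.
// Returns the number of names in the cache.
function buildNameCache(callable $fetchName, int $maxId, string $cacheFile): int
{
    if (is_file($cacheFile)) {
        // Cache already exists: skip the expensive scraping run entirely
        return count(file($cacheFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES));
    }

    $lines = [];
    for ($id = 1; $id < $maxId; $id++) {
        $name = $fetchName($id);      // one HTTP request + parse per id
        if ($name !== null) {         // ids without a profile are skipped
            $lines[] = $id . "\t" . $name;
        }
    }

    file_put_contents($cacheFile, implode("\n", $lines) . "\n");
    return count($lines);
}
```

For the real 3000-page run you would probably call set_time_limit() first (if the host allows it) and delete the cache file whenever you want fresh data.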
----


This is what databases are for; scanning pages is not really a good idea, but I guess the source pages are external and you can't do it the DB way.
Meet us at an IRC irc.freenode.net #bbg as well
https://vimeo.com/36579366 (a must-watch) | Join BOINC - no longer a hype, but you can help never the less

Offline JGadrow

Re: get a particular element
« Reply #2 on: June 07, 2009, 10:42:07 PM »
Do you run both of the servers in your example?

If you do, why don't you simply log into the server from both boxes? You may want to look into creating a web service to serve up this information if you're going to use it on multiple servers...

Not sure if I understood exactly what you're trying to do here. Just figured this made sense if you operate both servers. :)
Idiocy - Never underestimate the power of stupid people in large groups.


Offline Scion

Re: get a particular element
« Reply #3 on: June 08, 2009, 03:21:53 AM »
@Makari, no, I think what he wants to do is scrape information off an external site for display on his own...

In which case you're going to have to request each user's page and then retrieve the required information from it...

The HTML content returned is in the end just one big long string, and you can parse it by hand if you must, but my suggestion: learn the XPath language now. Don't bother trying to parse the resulting HTML page yourself. XPath should allow you to access the desired values directly. I've not personally had to use XPath in a PHP setting, but I am sure that you'll be able to find support for it somewhere.
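For reference, PHP's DOM extension does ship XPath support through the DOMDocument and DOMXPath classes. A minimal sketch, assuming the DOM extension is enabled; the function name nameByXPath and the id "username" are placeholders, since the real id depends on the source page:

```php
<?php
// Hypothetical helper: run an XPath query against an HTML string and
// return the trimmed text of the first matching node, or null.
function nameByXPath(string $html, string $query): ?string
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

$sample = '<html><body><div><span id="username">Mario Bhoola</span></div></body></html>';
echo nameByXPath($sample, '//*[@id="username"]'), "\n"; // prints: Mario Bhoola
```

The query '//*[@id="username"]' matches the element by id wherever it sits in the page, which is why it survives most layout changes.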

How I would approach this is to consider the source website as a semi-available data repository. Build a bunch of helpers that can retrieve information from it for you. However, because retrieving that information is very expensive and it may not always be available, you want to cache it locally: build up a local DB with the data that you've scraped from the source website. This is also important because scraping content like this is a very brittle process. It only takes the developers of the other site changing a few things and all of a sudden you're no longer able to scrape new data...





« Last Edit: June 08, 2009, 03:25:01 AM by Scion »

Offline pavansss91

Re: get a particular element
« Reply #4 on: June 08, 2009, 04:19:14 AM »
@Nox
Thanks for the starting idea about curl_init().
I found this function, which writes an entire HTML page into a text file.

Code: [Select]
// Downloads the page at $sHTMLpage into the file $sTxtfile
// and prints the cURL transfer statistics.
function vWritePageToFile( $sHTMLpage, $sTxtfile ) {
     $sh    = curl_init( $sHTMLpage );
     $hFile = fopen( $sTxtfile, 'w' );
     // Write the response body straight into the file
     curl_setopt( $sh, CURLOPT_FILE, $hFile );
     curl_setopt( $sh, CURLOPT_HEADER, 0 );
     curl_exec  ( $sh );
     $sAverageSpeedDownload = curl_getinfo( $sh, CURLINFO_SPEED_DOWNLOAD );
     $sAverageSpeedUpload   = curl_getinfo( $sh, CURLINFO_SPEED_UPLOAD );
     echo '<pre>';
     echo 'Average speed download == ' . $sAverageSpeedDownload . '<br>';
     echo 'Average speed upload   == ' . $sAverageSpeedUpload   . '<br>';
     echo '<br>';
     // Dump all remaining transfer details
     $aCURLinfo = curl_getinfo( $sh );
     print_r( $aCURLinfo );
     echo '</pre>';
     curl_close( $sh );
     fclose    ( $hFile );
     echo '(<b>See the file  "'.$sTxtfile.'"  in the same directory as'.
          ' this PHP script</b>).<br>';
    }

@Scion
 
How can we stop this if somebody wants to do it to us?

Now I need to learn about some string functions.
Is there a built-in string function that gives me the part of a string that lies between two given substrings of a larger string?
Example:
 
Code: [Select]
  $strmain="abcdefghij";
  $strp1="cde";
  $strp2="ij";
  $req=str_example($strmain,$strp1,$strp2);
  echo $req;
 
Output:
Code: [Select]
fgh

Offline Nox

Re: get a particular element
« Reply #5 on: June 08, 2009, 04:32:48 AM »
Either use what Scion suggested, XPath (if I'm not mistaken there are also some similar tools), or learn to use regular expressions, which are extremely handy to know, although pretty difficult to learn, at least they were for me...
Google some tutorials/guides (I don't know any English ones, so I'd google too)... other links I can think of (already posted somewhere here):
http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet/
This program is handy too:
http://www.weitz.de/regex-coach/

But don't write the page to a file; parse it in the script and you save one step and some system resources.
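As for the "string between two strings" question: as far as I know PHP has no single built-in for that, but strpos() and substr() can be combined. A small sketch (str_between is just a name I picked, it is not a PHP function):

```php
<?php
// Hypothetical helper: return the part of $haystack that lies between
// the first occurrence of $start and the next occurrence of $end,
// or null if either marker is missing.
function str_between(string $haystack, string $start, string $end): ?string
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);              // skip past the start marker

    $to = strpos($haystack, $end, $from); // search only after the start marker
    if ($to === false) {
        return null;
    }
    return substr($haystack, $from, $to - $from);
}

echo str_between('abcdefghij', 'cde', 'ij'), "\n"; // prints: fgh
```

On a fetched page the two markers would be the bits of HTML directly before and after the name.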

Offline Scion

Re: get a particular element
« Reply #6 on: June 08, 2009, 06:12:23 AM »
Hmm...

To isolate yourself from potential changes to the source web page, what I would do is create a function called... oh, say, getUserInfo(int userId). Now in your code you call getUserInfo any time you need to refresh your data from the source website. The function can return either a User object if you're doing OO, or an array/map of the user info fields you're interested in.

Your application can now happily call getUserInfo wherever you like... you can even write a test version that returns hard-coded values if you like.

All the information about where on the source web page the fields are is kept inside the getUserInfo function. If they change their layout or alter their markup, then you only need to write a new version that retrieves from the new location and you're away laughing again.
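Something along these lines, as a rough PHP sketch. The signature is my own guess at how it could look: $fetchPage is whatever callable returns the HTML for a user id (cURL in production, a hard-coded stub in tests), $cache is a plain array standing in for the local DB, and the id "username" is an assumed placeholder for the real markup.

```php
<?php
// Hypothetical sketch: the only place that knows where the fields live
// in the source markup. Returns an array of user fields, or null if the
// page could not be fetched or parsed.
function getUserInfo(int $userId, callable $fetchPage, array &$cache): ?array
{
    if (isset($cache[$userId])) {
        return $cache[$userId];           // already scraped: no request needed
    }

    $html = $fetchPage($userId);
    // Assumed markup: the name is the text of the element with id="username"
    if (!is_string($html) || !preg_match('~id="username"[^>]*>([^<]*)<~i', $html, $m)) {
        return null;
    }

    return $cache[$userId] = ['id' => $userId, 'name' => trim($m[1])];
}

// Test version of the fetcher that returns hard-coded values
$fakeFetch = function (int $id): string {
    return '<span id="username">Test User ' . $id . '</span>';
};

$cache = [];
$user = getUserInfo(2356, $fakeFetch, $cache);
echo $user['name'], "\n"; // prints: Test User 2356
```

If the other site changes its markup, only the pattern inside getUserInfo needs rewriting; the callers stay as they are.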

...Final note: XPath goes a long way toward protecting you from simple layout changes.

 

