Easily scrape and parse a table stored on a web page.
Author: Colin Tremblay
Date: Saturday, 16th November, 2013
This project is still in ALPHA, meaning it is not fully functional!
The current version is 0.7.
The project currently works on most HTML tables. Nested tables and tables containing malicious input are not yet handled.
Features similar to those in the JavaScript version (https://github.com/lightswitch05/table-to-json) by @lightswitch05 are being added incrementally.
To get the parser, download the four PHP files in 'src'.
To use it, include HTMLTable2JSON.php in your PHP file, create a new HTMLTable2JSON object, and call tableToJSON($url).
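The steps above can be sketched as follows. This is a minimal sketch, not the author's sample: the file location (next to the calling script) and the URL are placeholders chosen for illustration, and the guard around the include is only there so the script fails with a readable message when the file is missing.

```php
<?php
// Assumption: HTMLTable2JSON.php and the other files from 'src' were copied
// next to this script. Adjust the path to match your layout.
$parserPath = __DIR__ . '/HTMLTable2JSON.php';

if (!is_readable($parserPath)) {
    // Report the problem instead of triggering a fatal include error.
    echo "HTMLTable2JSON.php not found at {$parserPath}\n";
} else {
    require_once $parserPath;

    // Class and method names are the ones given in this README.
    $parser = new HTMLTable2JSON();

    // Placeholder URL: replace with a page that contains an HTML table.
    $parser->tableToJSON('http://example.com/page-with-table.html');
}
```

Only $url is passed here, so every option keeps its default. sample.php remains the reference for correct usage.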
Options:

firstColIsRowName
    Default: TRUE

tableID
    Default: ''

ignoreColumns
    array(0 => firstColToIgnore, 1 => secondColToIgnore) OR array(firstIndex, secondIndex)
    Default: NULL

headers
    array(colNum1 => header1, colNum2 => header2)
    Default: NULL

firstRowIsData
    TRUE treats the first row as data regardless of <th> tags. DO NOT choose this if there are headers in the first row that you want to override.
    Default: FALSE

onlyColumns
    array(0 => firstColToInclude, 1 => secondColToInclude) OR array(firstIndex, secondIndex)
    Default: NULL

arrangeByRows
    FALSE treats cells as discrete objects. Cells are arranged in arrays by column, where each cell has properties of name, column title, row title, span number, and URL (if applicable).
    TRUE treats each cell as a value for the attribute indicated in the column header. With this option, rows are arranged in an array, with column_title : cell_title pairs as attributes.
    Default: FALSE

ignoreHidden
    Whether rows with style="display: none;" should appear in output. TRUE will suppress hidden rows.
    Default: FALSE

printJSON
    FALSE leaves the output in the hands of the caller. TRUE creates a JSON file.
    Default: TRUE

testingTable
    Related to url.
    Default: NULL
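The two array forms listed for ignoreColumns and onlyColumns are the same array in PHP, which the sketch below checks. The filterRow function is a hypothetical helper written for this illustration, not part of HTMLTable2JSON, and it assumes zero-based column indexes. It only shows the general idea of an ignore list versus an only list.

```php
<?php
// Both literals build the same array, so either form can be passed.
$explicitKeys = array(0 => 1, 1 => 3);
$shortForm    = array(1, 3);
var_dump($explicitKeys === $shortForm); // bool(true)

// Hypothetical helper (not part of HTMLTable2JSON): applies an ignore list
// or an only list to one row of cells, assuming zero-based column indexes.
function filterRow(array $row, $ignoreColumns = NULL, $onlyColumns = NULL)
{
    $kept = array();
    foreach ($row as $index => $cell) {
        // Skip columns that appear in the ignore list.
        if ($ignoreColumns !== NULL && in_array($index, $ignoreColumns, true)) {
            continue;
        }
        // When an only list is given, skip columns that are not in it.
        if ($onlyColumns !== NULL && !in_array($index, $onlyColumns, true)) {
            continue;
        }
        $kept[$index] = $cell;
    }
    return $kept;
}

$row = array('Name', 'Age', 'City', 'Email');

$withoutIgnored = filterRow($row, $shortForm);       // keeps indexes 0 and 2
$onlySelected   = filterRow($row, NULL, $shortForm); // keeps indexes 1 and 3

print_r($withoutIgnored);
print_r($onlySelected);
```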
Note about PHP and optional arguments: if you want to set an argument lower on the list but not one higher up, you must still fill in the higher ones. To leave an option at its default, pass NULL as the argument for it.
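The note above can be illustrated with a small stand-alone sketch. The resolveOptions function is hypothetical and exists only for this example. It mimics a function with several positional optional arguments that treats NULL as "keep the default", using the first five option names and defaults from the list. It is an assumption about the general pattern, not a copy of how tableToJSON is implemented.

```php
<?php
// Hypothetical stand-in for a function with many positional optional
// arguments. NULL in a slot means "keep the default for this option".
function resolveOptions(
    $firstColIsRowName = NULL,
    $tableID = NULL,
    $ignoreColumns = NULL,
    $headers = NULL,
    $firstRowIsData = NULL
) {
    return array(
        'firstColIsRowName' => $firstColIsRowName === NULL ? TRUE : $firstColIsRowName,
        'tableID'           => $tableID === NULL ? '' : $tableID,
        'ignoreColumns'     => $ignoreColumns, // default is already NULL
        'headers'           => $headers,       // default is already NULL
        'firstRowIsData'    => $firstRowIsData === NULL ? FALSE : $firstRowIsData,
    );
}

// Only firstRowIsData is changed. The four slots before it must still be
// filled, so NULL is passed for each of them.
$options = resolveOptions(NULL, NULL, NULL, NULL, TRUE);

var_dump($options['firstColIsRowName']); // bool(true)
var_dump($options['firstRowIsData']);    // bool(true)
```

PHP has no way to skip a positional argument, which is why the earlier slots are filled with NULL instead of being left out.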
sample.php has examples of the correct usage.
For support, feedback, suggestions etc. please email [email protected]
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.