I'm going to test on 1.7.3 tomorrow, as I'm currently on 2.5 b1.
Actually, even though the text seems to be indexed, I can only search on title, subject, description and author. It's not working on keywords, and the name of the document itself is not searched... Even for the body, I need to fill it in myself to get content. If I let it be filled automatically, I see the text of the document but can't search within it... As I see encoded characters, might it be an encoding problem? Could the title, subject, description, author and keywords be populated automatically (from tags within the original document)? Thanks!
The administrator has disabled public write access.
|
|
Hi,
JiFile uses IFile, which in turn uses Lucene to index documents. JiFile defines the fields as follows (see the IFile documentation, available only in Italian for now: www.isapp.it/documentazione/IFile_Introduzione_1_1_2.pdf):
- root: UnIndexed
- name: Keyword
- key: Keyword
- path: UnIndexed
- filename: UnIndexed
- extensionfile: UnIndexed
- body: UnStored
- introtext: UnIndexed
- title: Text
- subject: Text
- description: Text
- creator: Text
- keywords: Keyword
- created: UnIndexed
- modified: UnIndexed
For more details on the types, see framework.zend.com/manual/en/zend.search...standing-field-types (section: Understanding Field Types).
The "Keyword" type does not tokenize its terms. If the string in the "keywords" field is "word1, word2, word3", you cannot search for the single term "word1"; you have to search for the whole string "word1, word2, word3".
Here is a script to run some tests:
<?php
/**
 * Searching for terms in documents
 *
 * Configuration:
 *   Analyzer: Utf8_CaseInsensitive (terms are indexed in lower case)
 *   Filter: none
 *
 * Fields of the file prova.pdf:
 *
 *   created: 20120106104410
 *   creator: nome autore
 *   extensionfile: pdf
 *   filename: C:\Documents and Settings\losito\Desktop\prova.pdf
 *   introtext: Testo all'interno del file
 *   key: 0ab2fe5e884511562b5f1005ed090aa8
 *   keywords: parola1, parola2, parola3
 *   name: prova.pdf
 *   path: C:\Documents and Settings\losito\Desktop\prova.pdf
 *   root: C:\xampp\htdocs\IFile
 *   subject: campo oggetto
 *   body: testo all interno del file
 */
error_reporting(E_ALL);

// Include IFileFactory.
// With Joomla and JiFile the path is:
// libraries/ifile/IFileFactory.php
require_once 'sf_ifile/IFileFactory.php';

// Path to the index directory
$index_path = 'ifile_index_PDF';

try {
    // Get the IFileFactory instance
    $IFileFactory = IFileFactory::getInstance();
    // Get the Lucene interface
    $ifile = $IFileFactory->getIFileIndexing('lucene', $index_path);

    // TEST queries
    // Each query below uses the same configuration as the JiFile plugin.

    // 1. Search "prova.pdf": file name with extension
    // This string is in the field "name" - Type: Keyword
    $ifileQueryRegistry = new IFileQueryRegistry();
    $ifileQueryRegistry->setQuery('prova.pdf', null, IFileQuery::MATCH_OPTIONAL);
    // "order by" - not configured in the JiFile plugin
    //$ifile->setSort('key', SORT_REGULAR, SORT_DESC);
    // Execute the query
    $result = $ifile->query($ifileQueryRegistry);
    // Expected result: one document
    printResult("1. Search in all Field:prova.pdf", $result);

    // 2. Search "prova": file name without extension
    // The field "name" (Type: Keyword) holds the name with its extension
    $ifileQueryRegistry = new IFileQueryRegistry();
    $ifileQueryRegistry->setQuery('prova', null, IFileQuery::MATCH_OPTIONAL);
    // "order by" - not configured in the JiFile plugin
    //$ifile->setSort('key', SORT_REGULAR, SORT_DESC);
    // Execute the query
    $result = $ifile->query($ifileQueryRegistry);
    // Expected result: empty
    printResult("2. Search in all Field:prova", $result);

    // 3. Search "parola1"
    // This is only a portion of the string in the field "keywords" - Type: Keyword
    $ifileQueryRegistry = new IFileQueryRegistry();
    $ifileQueryRegistry->setQuery('parola1', null, IFileQuery::MATCH_OPTIONAL);
    // "order by" - not configured in the JiFile plugin
    //$ifile->setSort('key', SORT_REGULAR, SORT_DESC);
    // Execute the query
    $result = $ifile->query($ifileQueryRegistry);
    // Expected result: empty
    printResult("3. Search in all Field:parola1", $result);

    // 4. Search the string "parola1, parola2, parola3"
    // This is the whole string in the field "keywords" - Type: Keyword
    $ifileQueryRegistry = new IFileQueryRegistry();
    $ifileQueryRegistry->setQuery('parola1, parola2, parola3', null, IFileQuery::MATCH_OPTIONAL);
    // "order by" - not configured in the JiFile plugin
    //$ifile->setSort('key', SORT_REGULAR, SORT_DESC);
    // Execute the query
    $result = $ifile->query($ifileQueryRegistry);
    // Expected result: one document
    printResult("4. Search in all Field:parola1, parola2, parola3", $result);

    // 5. Search the string "testo"
    // This string is in the field "introtext" - Type: UnIndexed
    // This string is in the field "body" - Type: UnStored
    $ifileQueryRegistry = new IFileQueryRegistry();
    $ifileQueryRegistry->setQuery('testo', null, IFileQuery::MATCH_OPTIONAL);
    // "order by" - not configured in the JiFile plugin
    //$ifile->setSort('key', SORT_REGULAR, SORT_DESC);
    // Execute the query
    $result = $ifile->query($ifileQueryRegistry);
    // Expected result: one document
    printResult("5. Search in all Field:testo", $result);
} catch (Exception $e) {
    echo "Error: ".$e->getMessage();
}

/**
 * Print the result of a search.
 *
 * @param string $type   label describing the search
 * @param array  $result hits returned by the query
 * @return void
 */
function printResult($type, $result) {
    echo "Search: ".$type;
    if (!empty($result) && is_array($result)) {
        echo "<br>Result Search:<br>";
        foreach ($result as $hit) {
            $doc = $hit->getDocument();
            echo "File: ".$doc->name." - Key: ".$doc->key." - Score: ".$hit->score."<br>";
        }
    } else {
        echo "<br>No results<br>";
    }
    echo "End print - ($type)<br><br>";
}
?>
If you want to change the field type used in the index, you must edit the file libraries\ifile\adapter\beans\LuceneDataIndexBean.php, in the function getLuceneDocument(). In the next version of IFile and JiFile, we plan to make the field types configurable from the configuration panel.
For the fields title, subject, description, author and keywords, JiFile works as follows:
- DOCX, XLSX, PPTX and PDF: the information is read automatically from the file, if the title, subject, description, author and keywords tags exist in the original document.
- HTML: only the "title" information is read.
- Other document types: this information is not read automatically.
As for your problem searching the content (field "body") with automatic indexing, there may be a problem with how the content is indexed. If you want, you can send us one of your files so we can test it.
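To make the "Keyword" behaviour in the test script easier to see, here is a minimal standalone sketch. It does not use IFile or Lucene: the functions tokenizeValue(), indexedTerms() and fieldMatches() are hypothetical names invented for this illustration, and the tokenizer is a simplified ASCII-only stand-in for a real analyzer. It only mimics the rule that a Keyword field is kept as one term while a Text field is split into terms.

```php
<?php
// Illustration only: mimics the matching rule, not the real Lucene/IFile code.

/** Split a value into lower-case terms, roughly as a tokenizing analyzer would (ASCII only). */
function tokenizeValue($value) {
    return preg_split('/[^a-z0-9]+/', strtolower($value), -1, PREG_SPLIT_NO_EMPTY);
}

/** Terms kept for a field: one whole-string term for "Keyword", several for a tokenized type. */
function indexedTerms($value, $type) {
    if ($type === 'Keyword') {
        return array($value);      // kept as a single, untokenized term
    }
    return tokenizeValue($value);  // e.g. "Text": split into terms
}

/** True if the query is exactly one of the terms kept for the field. */
function fieldMatches($query, $value, $type) {
    return in_array($query, indexedTerms($value, $type), true);
}

$keywords = 'parola1, parola2, parola3';

var_dump(fieldMatches('parola1', $keywords, 'Keyword'));                   // bool(false)
var_dump(fieldMatches('parola1, parola2, parola3', $keywords, 'Keyword')); // bool(true)
var_dump(fieldMatches('parola1', $keywords, 'Text'));                      // bool(true)
```

In Zend_Search_Lucene terms, the two cases correspond to fields built with Zend_Search_Lucene_Field::Keyword() and Zend_Search_Lucene_Field::Text(). Whether getLuceneDocument() is written so that switching between them is a one-line change is an assumption; check the file before editing, and rebuild the index afterwards.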
If you like, if it was useful, consider a donation, Thanks
Help us by voting for our extensions on Joomla.org: JiFile JoomPhoto Mobile Easy Language
Last Edit: 06 Jan 2012 17:33 by Giampaolo. Reason: PHP file attachment was missing
The following user(s) said Thank You: crony
|
|
Thanks for all this information!
I'll do some further testing next week and let you know... As far as I understand, the static methods used for the unindexed field types could be modified? Having the title not indexed (!!!) simply doesn't work for our needs... Searching by date would be interesting too...
|
|
Hi, I'm having exactly the same problems. I followed the steps in the thread, but I still have to comment out //chmod($path, $perm) (and I also had to download the 64-bit versions).
The problem I'm still having is that indexing does nothing:
Total files included: 0
Total files indexed: 0
Total deleted files: 0
Optimization: Not necessary
Could this have to do with the following message I receive?
popen KO Not defined This function not exist in PHP
|
|
Hi,
for the problem with 64-bit servers, we are working on a new release that solves it. For now, read this thread. As for "popen": this function is essential for using the XPDF binary (which transforms PDF into text). Some hosting providers disable the "popen" function. Contact your hosting provider to have it enabled. If you do not want to index PDF files, you do not need the "popen" function.
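If you want to check this yourself before contacting your host, a small script along these lines may help. It is only a sketch: isFunctionUsable() and runAndCapture() are hypothetical helpers written for this post, not part of IFile or JiFile. The first also looks at the disable_functions setting, because on older PHP versions function_exists() can still return true for a function the host has disabled. The second runs a command through popen and returns whatever it prints.

```php
<?php
/** Report whether a PHP function exists and is not listed in disable_functions. */
function isFunctionUsable($name) {
    if (!function_exists($name)) {
        return false;
    }
    $disabled = array_map('trim', explode(',', strtolower((string) ini_get('disable_functions'))));
    return !in_array(strtolower($name), $disabled, true);
}

/** Run a command through popen and return its standard output, or false on failure. */
function runAndCapture($command) {
    $handle = popen($command, 'r');
    if ($handle === false) {
        return false;
    }
    $output = '';
    while (!feof($handle)) {
        $chunk = fread($handle, 8192);
        if ($chunk === false) {
            break;
        }
        $output .= $chunk;
    }
    pclose($handle);
    return $output;
}

echo 'popen: ' . (isFunctionUsable('popen') ? 'OK' : 'KO') . "\n";

// Hypothetical usage: run the text extractor by hand and see whether it prints anything.
// The binary location and file name are placeholders; adjust them to your installation.
// $text = runAndCapture('/path/to/pdftotext ' . escapeshellarg('prova.pdf') . ' -');
// echo ($text === false || trim($text) === '') ? "Empty output\n" : "Text extracted\n";
```

If the check prints OK but the commented-out extraction step returns nothing, the problem is more likely in the binary (wrong architecture, missing execute permission) than in PHP itself.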
|
|
Hi, my hosting provider enabled the popen function, so that should work now (it's green + OK).
I now get the same error message as was reported earlier: Error recovery document: Empty body. I've attached one of the documents and I hope you can see what the problem is.
Last Edit: 16 Jan 2012 11:35 by Hit Man.
|
|
We tried your file and it works for us.
Can you give us some information about your server: operating system, version, 32-bit or 64-bit, and anything else you know? Thanks.
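If it is easier, a short script like the one below prints most of these details. It is a sketch using only standard PHP constants and functions; collectServerInfo() is a hypothetical helper name, and the bit width shown is that of the PHP build, which usually but not always matches the operating system.

```php
<?php
/** Collect basic details about the server and the PHP build. */
function collectServerInfo() {
    return array(
        'os'          => PHP_OS,            // operating system family PHP was built for
        'uname'       => php_uname(),       // full system description, including kernel version
        'machine'     => php_uname('m'),    // machine type, e.g. x86_64
        'php_bits'    => PHP_INT_SIZE * 8,  // integer width of the PHP build
        'php_version' => PHP_VERSION,
        'sapi'        => php_sapi_name(),   // web server to PHP interface
    );
}

foreach (collectServerInfo() as $label => $value) {
    echo $label . ': ' . $value . "\n";
}
```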
|
|
Sure (note that I installed the 64-bit versions of X-pdf)
System Information
PHP Built On: Linux <domain>.<ext> 2.6.18-238.9.1.el5 #1 SMP Tue Apr 12 18:10:13 EDT 2011 x86_64
Database Version: 5.5.19-cll
Database Collation: utf8_general_ci
PHP Version: 5.2.17
Web Server: Apache/2.2.0 (Fedora)
WebServer to PHP Interface: cgi-fcgi
Joomla! Version: Joomla! 1.7.3 Stable [ Ember ] 14-Nov-2011 14:00 GMT
Joomla! Platform Version: Joomla Platform 11.2.0 Stable+Modified [ Omar ] 27-Jul-2011 00:00 GMT
Gecko/20100101 Firefox/8.0