Decentralized Meta-Data Strategies: Effective Peer-to-Peer Search
Freenet forwards queries according to beliefs about the contents of other nodes, judging file similarity by closeness in a "key-space" generated by a cryptographic hash. Users must know a file's key in order to retrieve it from the network. Files are inserted at particular locations (as opposed to simply being shared in place, as in the Gnutella network); combined with aggressive caching, this means the arrangement of files across nodes ends up reflecting that of the key-space.
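A minimal sketch of forwarding by key-space closeness, to make the idea concrete. It is not Freenet's actual protocol: the names `key_of` and `closest_neighbour`, the choice of SHA-1, and the use of absolute numeric difference as the distance are all assumptions made for illustration.

```python
import hashlib


def key_of(data):
    """Map content to a point in the key-space (SHA-1 is an assumed choice)."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


def closest_neighbour(query_key, routing_table):
    """Return the neighbour whose best-known key is closest to the query key.

    routing_table maps a neighbour name to a key that neighbour is believed
    to hold; distance is plain absolute difference (an assumption).
    """
    return min(routing_table, key=lambda node: abs(routing_table[node] - query_key))


# Hypothetical beliefs about three neighbours, using small keys for readability.
beliefs = {"node-a": 10, "node-b": 100, "node-c": 1000}
next_hop = closest_neighbour(120, beliefs)
```

Here the query for key 120 would be forwarded to `node-b`, since 100 is the closest known key. A node that answers successfully would normally have its entry in the table reinforced, which is how beliefs about other nodes improve over time.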
Types of meta-data:
- Document Hash: id generated from the document contents via some hashing algorithm that ideally will be unique to each document;
- Document Id: id assigned arbitrarily to a document according to some scheme; it differs from a hash in that it must be generated by some authority;
- Statistical Representation: representation generated by performing a statistical operation on a document, which may involve statistics relating to a larger document collection, e.g. TFIDF;
- Human assigned: keywords or more complex statements such as RDF.
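The difference between a document hash and a document id can be sketched as follows. This is only an illustration: `document_hash`, `IdAuthority`, the SHA-256 choice, and the `doc-0001` numbering scheme are hypothetical and not taken from any particular system.

```python
import hashlib
import itertools


def document_hash(contents):
    """Derive an id from the contents alone; any peer can recompute it."""
    return hashlib.sha256(contents).hexdigest()


class IdAuthority:
    """Toy authority handing out arbitrary ids (the scheme is an assumption)."""

    def __init__(self, prefix="doc"):
        self._prefix = prefix
        self._counter = itertools.count(1)
        self._assigned = {}

    def assign(self, name):
        # The id depends on the authority's state, not on the document contents.
        if name not in self._assigned:
            self._assigned[name] = f"{self._prefix}-{next(self._counter):04d}"
        return self._assigned[name]


authority = IdAuthority()
first_id = authority.assign("report.txt")
second_id = authority.assign("notes.txt")
```

The hash needs no coordination between peers but changes whenever the contents change; the assigned id is stable across edits but requires everyone to trust and reach the authority.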
Approaches to Markup:
- TFIDF (Term Frequency Inverse Document Frequency), an Information Retrieval approach of Salton and Yang's that rates the degree to which words are representative of a document. Related models include VSM (Vector Space Model) and LSI (Latent Semantic Indexing);
- XML and RDF.
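A small sketch of TFIDF weighting with a vector-space comparison may help here. It assumes one common variant (term frequency normalised by document length, multiplied by log(N / document frequency)) and whitespace tokenisation; other weightings exist, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (one common TFIDF variant)."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors, as used in the VSM."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


docs = ["peer to peer search", "peer to peer routing", "vector space model"]
vecs = tfidf_vectors(docs)
```

In this toy collection "search" occurs in only one document, so it receives a higher weight in the first document than "to", which occurs in two. A term present in every document gets weight zero under this variant, which is the sense in which TFIDF rates how representative a word is.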
Strategies:
- Semantic Routing: a query is routed according to the meta-data contained in that query.
- DHT (Distributed Hash Table): specifies a relation between entities (files, documents, etc.) and a position in a distributed network.
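The relation a DHT defines between an entity and a network position can be sketched as a toy hash ring. This is an illustration under stated assumptions, not a specific DHT design: `ToyDHT`, the 16-bit ring, SHA-1, and the "first node at or after the key's position" rule are choices made for the example.

```python
import bisect
import hashlib

RING_BITS = 16  # small ring for readability (an assumption)


def ring_position(name):
    """Hash a node name or entity name onto the ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)


class ToyDHT:
    """Maps each entity to the first node at or after its ring position."""

    def __init__(self, node_names):
        self._ring = sorted((ring_position(n), n) for n in node_names)
        self._positions = [p for p, _ in self._ring]

    def node_for(self, entity):
        idx = bisect.bisect_left(self._positions, ring_position(entity))
        # Wrap around to the first node when the key lies past the last node.
        return self._ring[idx % len(self._ring)][1]


dht = ToyDHT(["node-a", "node-b", "node-c"])
owner = dht.node_for("report.pdf")
```

Any peer that knows the node list computes the same owner for the same entity, so a lookup needs no flooding. In contrast to semantic routing, the position depends only on the hash of the name, not on the meaning of the content.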
