Does anyone know of an #OpenAccess full-text #PDF #search engine/tool that I can use to search for relevant PDFs in a self-hosted #database?
Context: we have a curated database of #research articles but so far our search capability has been limited to tagged keywords or title and abstract field search only. We'd like to be able to search the entire PDF.
Side note: I know that PDFs are not a great way to store scientific information. I'd prefer not to use a proprietary #LLM if possible
#LexicalSearch #SemanticSearch #AskAcademia #academia #science #sciences #ScienceMastodon #AskFedi #OpenScience
=> More information about this toot | More toots from manisha@neuromatch.social
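Editorial sketch: for readers unsure what "searching the entire PDF" means on the lexical side, a minimal inverted index over already-extracted text might look like the following. It assumes the PDF text has been extracted by some separate tool; the file names and sample sentences are made up for illustration, and a real deployment would more likely rely on an existing search engine than on this toy.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document IDs whose full text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return sorted IDs of documents containing every query token (AND search)."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # .get() avoids creating empty entries in the defaultdict for unknown tokens.
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)


# Hypothetical extracted full text, keyed by made-up file names.
docs = {
    "paper_a.pdf": "Place cells in the hippocampus encode spatial location.",
    "paper_b.pdf": "Grid cells in entorhinal cortex support path integration.",
}
index = build_index(docs)
print(search(index, "hippocampus cells"))
```

Unlike a title/abstract field search, the index here is built from the whole body text, so a term that appears only deep inside a paper is still found.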
@manisha Not sure if that really fits what you want but #Zotero does full-text search of PDFs: https://www.zotero.org/support/searching
=> More information about this toot | More toots from elduvelle@neuromatch.social
@elduvelle thank you!! It looks like a good option. Someone else also suggested it. I've used it as a reference manager but never dived into its full-text indexing feature. Do you happen to know if Zotero libraries can be made public?
=> More information about this toot | More toots from manisha@neuromatch.social
@manisha I haven't done it myself but it looks like it's just a matter of setting the library's visibility to "public": https://forums.zotero.org/discussion/88809/keep-public-library
=> More information about this toot | More toots from elduvelle@neuromatch.social
@elduvelle
@manisha
Ya ya, Zotero would def be how I'd do it. Do you need it to be public so that it can be publicly full-text searchable from a website or something? That's a slightly different problem than being able to do it in your own Zotero client, but I'd still use Zotero as the basis of the collection.
=> More information about this toot | More toots from jonny@neuromatch.social
@elduvelle thank you so much for that link!
@jonny yeah, need it to be publicly searchable which I think is doable...
=> More information about this toot | More toots from manisha@neuromatch.social
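Editorial sketch: one way the "publicly searchable from a website" idea could be approached is to have the site query the Zotero Web API for a public group library. The snippet below only builds the request URL and does not contact the network. It assumes the collection lives in a public Zotero group (the group ID `123456` is made up), and that the API's `q` quick-search parameter with `qmode=everything` covers indexed full-text content, which is worth confirming against the current Zotero Web API documentation before relying on it.

```python
from urllib.parse import urlencode

ZOTERO_API = "https://api.zotero.org"


def fulltext_search_url(group_id, query, limit=25):
    """Build an item-search URL for a Zotero group library.

    Assumption: qmode=everything asks the quick search to look beyond
    titles and creators, including indexed full-text content.
    """
    params = {
        "q": query,
        "qmode": "everything",
        "format": "json",
        "limit": limit,
    }
    return f"{ZOTERO_API}/groups/{group_id}/items?{urlencode(params)}"


# Hypothetical group ID, for illustration only.
print(fulltext_search_url(123456, "place cells"))
```

A website backend could fetch this URL and render the returned JSON as a results list; whether full-text matches are returned depends on the attachments having been indexed and synced to the library.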