Pubsub Cache Full-Text Search

submited by
Style Pass
2021-09-27 10:00:04

goffi 3 hours ago jabber-xmpp-en SàT Libervia project libre Libervia progress SàT progress

Hello, it's time for a new progress note. The work is currently focused on ActivityPub Gateway, and progress has been done on pubsub cache search and the base component. Pubsub Cache Full-Text Search Next to the pubsub cache implementation, it was necessary to have a good way to search among items. So far, Libervia was doing pubsub search using pubsub service's capabilities, and notably the XEP-0431(Full Text Search in MAM) implementation. This is working well (it's what is currently used on this very blog when you do use the search box), but has some pitfalls: the pubsub service must implement this XEP (and as far as I know, Libervia Pubsub is the only one which does it at the moment), the search can be done in a single node at a time only, each search request imply a new XMPP request to the pubsub service, and pubsub items must be in plain text (which is currently always the case, but pubsub end-to-end encryption is planned as second part of the granted NLNet project on which I'm working). In regard to that, a local search is necessary. SQLAlchemy doesn't really have Full-Text Search (or FTS) support for SQLite out of the box, but it allows to use any SQL directly, thus I could use the really nice FTS engine available within it (FTS5). This is an extension, but in practice it is already installed most of the time (it is part of the SQLite amalgamation). Thanks to the JSON support in SQLite, it is also possible to filter search requests on parsed data. That's really useful for features like blogs where you often want to do that (e.g. filtering on tags). The cache search can be operated on all data in cache, that means that you can do search on items coming from multiple nodes and even multiple services. That opens the door to features like hashtags or blog suggestions. Last but not least, search requests can be ordered by any parsed field. In other terms it will be possible to order a blog by declared publication date — which may be important if you want to import a blog —, or events by location. To have an idea of the possibilities, you can check the documentation of the CLI search command. Base ActivityPub Component Once the preparatory steps have been done, the ActivityPub component itself could be started. In short, for people not used to XMPP, a "component" is a kind of generic plugin to server. You declare it in your server configuration, choose a JID and a "shared secret" (a password), run it with those parameters, and voilà. For the AP gateway, Libervia runs the component. There is documentation to explain how to launch it, don't worry it's simple. As I've got questions about this, here is a small schema giving an overview on how the whole thing is working: I hope that it makes the whole thing more clear, otherwise don't hesitate to ask me for clarification. As you can see, the gateway includes an HTTP server to communicate with AP software, but in many cases there will already be an HTTP server (website, XMPP web client, etc.). In this case, you'll have to redirect /.well-known/webfinger and /_ap requests to the gateway server. For the development, I'm using Prosody as reference XMPP server implementation, and Mastodon as reference ActivityPub server implementation. I've set a local Mastodon installation, and I've chosen to use Docker for that, as it makes things easy to have a reproducible environment and to save and restore a specific state. It was not as trivial as I would expect to find the right configuration to use, I've found outdated tutorials, but I could manage to run the thing relatively easily. Because we work with HTTPS, I've made a custom docker image with locale certification authority, so Mastodon could validate my gateway HTTP server certificate. I'm already doing that for docker image used for end-to-end tests of Libervia, nothing difficult. Surprisingly though, Mastodon could not resolve my instance, when HTTPie running from the same container could do it flawlessly. I've quickly realised that Mastodon was not respecting hosts declared in /etc/hosts (and added via extra_hosts in Compose file) and found a relevant bug report on Mastodon tracker. That was annoying, and I had to find a way to work around that. I've done it by running a local DNS Server, and Twisted offers a nice built-in one. Twisted DNS can easily use /etc/hosts to direct my local domains to my local IP, it's just a one liner such as twistd3 -n dns --hosts-file=/etc/hosts -r. After that the domain was resolving, but to my surprise, Mastodon was still not able to communicate with my gateway, and even more bizarre my server was receiving no request at all. After a quick round of tcpdump/wireshark, I saw that indeed nothing was sent to my server. Thanks to the Libre nature of Mastodon, I could resolve this by reading the source code, the Mastodon::HostValidationError led me to a section that made the whole picture clear: my server is on a local IP and Mastodon by default refuses to reach it (to avoid the confused deputy attack). With the ALLOWED_PRIVATE_ADDRESSES setting I could finally make Mastodon communicate with my server. The How to implement a basic ActivityPub server tutorial made by Eugen Rochko (Mastodon original developer) is a nice article to start an ActivityPub implementation, it has been useful to build the base component (despite being a bit outdated, notably regarding signature). I have to rant a bit, though, as the ActivityPub specification are not available in EPUB or PDF, making it difficult to read on an e-book reader. I could overcome that thanks to pandoc (git clone then pandoc index.html --pdf-engine=xelatex -o activitypub.pdf), it's really more comfortable to keep the reference like this. So the base component is now available but only usable by developers (and only capable of sending message to ActivityPub for now). Things will be really exiting with the next 2 steps, as bidirectional communications will be available, and the gateway will be usable for early adopters. I don't expect those steps to be really long. Oh, and to answer another question that I've had, yes you can use the same ActivityPub actor identifier as your XMPP JID. I'll explain next time how everything is accessed. That's all for today.

it's time for a new progress note. The work is currently focused on ActivityPub Gateway, and progress has been done on pubsub cache search and the base component.

Leave a Comment