[Week 4] First Contact with Reality
This week was spent fighting the messiness of reality.
Summary
- Fixed the normalizer used during search and updated tests
- Began implementing a Storage scheme
Details
As planned in the last blog post, I set about fixing the method used for normalizing the search query. But my mentor @cvwright soon saw the blog post and my code and pointed out that there was no need to search for the sub-tokens. Additionally, I realized how cumbersome it would be if the user wanted to search for "user@example.com" and the client searched for every single appearance of "com".
Hence I rewrote the normalizer method yet again and fixed the tests yet again.
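To make the difference concrete, here is a minimal sketch of the two behaviours. The names `normalize_query` and `sub_tokens` are hypothetical and are not the library's actual API. The exact normalization steps (Unicode NFKC, case folding, whitespace splitting) are my assumptions for illustration only.

```python
import re
import unicodedata


def normalize_query(query: str) -> list[str]:
    """Hypothetical normalizer: keeps each whitespace-separated token whole."""
    # Assumed steps: Unicode-normalize, case-fold, then split on whitespace only.
    text = unicodedata.normalize("NFKC", query).casefold()
    return text.split()


def sub_tokens(token: str) -> list[str]:
    """The abandoned idea: also break a token apart on punctuation."""
    return [part for part in re.split(r"[^\w]+", token) if part]


if __name__ == "__main__":
    # The whole address stays a single search term.
    print(normalize_query("Mail User@Example.com"))
    # Sub-tokenizing would add "com" as a term, matching nearly every address.
    print(sub_tokens("user@example.com"))
```

With whole tokens only, a search for "user@example.com" results in one lookup instead of one per fragment.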
Once that PR was approved and merged, I moved on to a helper class that needs to be merged before we can call the Search algorithm complete: IndexStorage.
As a part of the library's core functionality, we also need to divvy up the datastore into pieces that the client can upload to Matrix content repositories. Ideally, each level of the datastore would be stored in a single file as is. But in practice, content repositories usually have a file size limit, which throws a wrench into the plans.
The natural next step is to try storing each bucket as a separate file. Problem is, an unbelievably huge number of buckets are empty at the end of Setup. And this is intentional. So if we store bucket-wise, we'll have to store the same empty-bucket file content a billion times.
That's inefficient.
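A toy sketch of the trade-off, with made-up numbers: the size limit, the bucket size, and the all-zero representation of an empty bucket are all assumptions for illustration, not the real scheme's parameters. The helper names are hypothetical too.

```python
# Hypothetical values, chosen only to illustrate the problem.
MAX_UPLOAD_BYTES = 1_000  # assumed per-file limit of the content repository
BUCKET_SIZE = 16          # assumed fixed serialized size of one bucket
EMPTY_BUCKET = bytes(BUCKET_SIZE)  # assumed: an empty bucket is all zero bytes


def fits_in_one_file(level: list[bytes], limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Option 1: store the whole level as a single file, if it fits."""
    return sum(len(bucket) for bucket in level) <= limit


def bucketwise_upload_stats(level: list[bytes]) -> tuple[int, int]:
    """Option 2: one file per bucket. Returns (files needed, identical empty files)."""
    empty_files = sum(1 for bucket in level if bucket == EMPTY_BUCKET)
    return len(level), empty_files


if __name__ == "__main__":
    # A level with 100 buckets, of which only 3 hold data.
    level = [EMPTY_BUCKET] * 97 + [b"x" * BUCKET_SIZE] * 3
    print(fits_in_one_file(level))
    print(bucketwise_upload_stats(level))
```

In this toy level, the single file is over the limit, while the bucket-wise layout spends almost all of its uploads on identical empty files.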
The good news is that I've got an idea. I have to talk to my mentor to confirm that it doesn't break the scheme entirely. If greenlit, I think it'll be easy enough to implement.