[Week 4] First Contact with Reality

This week was spent fighting the messiness of reality.

Summary

  • Fixed the normalizer used during search and updated tests
  • Began implementing a Storage scheme

Details

As planned in the last post, I set about fixing the method used to normalize the search query. But my mentor @cvwright soon saw the blog post and my code, and pointed out that there was no need to search for the sub-tokens. I also realized how cumbersome it would be if a user searched for "user@example.com" and the client then searched for every single appearance of "com".

Hence I rewrote the normalizer method yet again and fixed the tests yet again.
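
To make the difference concrete, here is a minimal sketch of a normalizer that keeps each whitespace-separated token whole instead of breaking it into sub-tokens. The function name normalize_query and the exact rules (lowercasing, trimming surrounding punctuation) are assumptions for illustration, not the library's actual code.

```python
import string
from typing import List


def normalize_query(query: str) -> List[str]:
    """Hypothetical normalizer: one keyword per whitespace-separated token."""
    tokens = []
    for raw in query.lower().split():
        # Trim punctuation only at the edges, so "user@example.com" stays
        # a single keyword and is never split into "user", "example", "com".
        token = raw.strip(string.punctuation)
        if token:
            tokens.append(token)
    return tokens


keywords = normalize_query('Mail "User@Example.com" now!')
print(keywords)
```

With this approach the client looks up "user@example.com" exactly once and never issues a separate search for "com".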

Once that PR was approved and merged, I moved on to a helper class that needs to be merged before we can call the Search algorithm complete: IndexStorage.

As part of the library's core functionality, we also need to divvy up the datastore so that the client can upload it to Matrix content repositories. Ideally, each level of the datastore would be stored in a single file as is. In practice, though, content repositories usually have a file size limit, which throws a wrench into the plans.
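
The size check itself is simple to sketch. The snippet below assumes a level is a list of buckets, that it is serialized as JSON, and that the limit is passed in as max_upload_bytes; all three are assumptions made for illustration, and the real datastore layout may differ.

```python
import json


def level_fits(level, max_upload_bytes):
    """Return (fits, size) for one serialized level.

    Hypothetical helper: serializes the level as JSON and compares the
    byte length against the repository's upload limit.
    """
    payload = json.dumps(level).encode("utf-8")
    return len(payload) <= max_upload_bytes, len(payload)


# Toy level with two buckets, checked against a made-up 1 MiB limit.
fits, size = level_fits([["doc1", "doc2"], []], max_upload_bytes=1024 * 1024)
print(fits, size)
```

Any level for which the check fails has to be split into smaller pieces before upload.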

The natural next step is to try storing each bucket as a separate file. Problem is, an unbelievably huge number of buckets are empty at the end of Setup. And this is intentional. So if we store bucket-wise, we'll have to store the following file content a billion times:

That's inefficient.
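
A toy illustration of the problem, assuming (hypothetically) that a level is a list of buckets and that each bucket is written out as its own JSON file. The sizes and contents here are made up; only the proportion matters.

```python
import json


def bucket_files(level):
    """Serialize every bucket of one level as a separate file payload."""
    return [json.dumps(bucket).encode("utf-8") for bucket in level]


# Toy level: 1000 buckets, of which only 3 hold anything after Setup.
level = [[] for _ in range(1000)]
level[7] = ["doc1"]
level[42] = ["doc2", "doc3"]
level[999] = ["doc4"]

files = bucket_files(level)
empty_payload = json.dumps([]).encode("utf-8")
empty_count = sum(1 for payload in files if payload == empty_payload)

print(f"{empty_count} of {len(files)} files are identical empty buckets")
```

Nearly every upload in this sketch carries the same empty payload, so the per-file overhead dwarfs the data actually stored.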

The good news is that I've got an idea. I have to talk to my mentor to confirm that it doesn't break the scheme entirely. If greenlit, I think it'll be easy enough to implement.
