#lumosql Log

12:01:18 <danshearer> #startmeeting lumosql
12:01:18 <lumosql-meetbot> danshearer: Meeting started at 2022-03-18T12:01+0000
12:01:19 <lumosql-meetbot> danshearer: Current chairs: danshearer
12:01:20 <lumosql-meetbot> danshearer: Useful commands: #action #info #idea #link #topic #motion #vote #close #endmeeting
12:01:21 <lumosql-meetbot> danshearer: See also: https://hcoop-meetbot.readthedocs.io/en/stable/
12:01:31 <Labhraich> #here Claudio
12:01:33 <BKJ621> here björn
12:01:34 <danshearer> #here Dan Shearer
12:01:41 <rubdos[m]> #here Ruben
12:01:43 <gabby_bch> #here Gabby
12:01:43 <BKJ621> #here Björn
12:01:51 <rubdos[m]> here while eating lunch, but still here.
12:02:01 <danshearer> great. Welcome everyone
12:02:19 <Labhraich> I'll have to nip to the kitchen when coffee is ready, but won't take more than a few seconds
12:02:55 <danshearer> Ok so I thought maybe quickly running over the actions from last meeting, then what we have been up to.
12:03:02 <danshearer> And plans for Brussels.
12:03:08 <danshearer> checks Iowa timezone
12:03:44 <danshearer> Ok, it's 0700 in Iowa so Bayes probably won't be joining us. Something to remember for future meetings perhaps.
12:03:57 <Labhraich> I was supposed to clean up the combined database, which I've done.  Also made changes as a consequence of doing that
12:04:03 <danshearer> woo
12:04:11 <danshearer> #topic Actions from last meeting
12:04:39 <danshearer> #info Claudio cleaned up combined database, and made some fixes that became obvious during that
12:05:33 <danshearer> #info Dan has written some talk abstracts but hasn't had them approved by the team yet.
12:06:13 <danshearer> #info First week of April is Brussels holidays, but some VUB staff are around on Monday 4th April
12:07:11 <danshearer> #action danshearer1 circulate talk abstracts/pull abstract from any volunteers, then publish asap
12:07:16 <rubdos[m]> Holiday for students, so professors tend to take a bit off too, but nothing of importance
12:07:38 <danshearer> Ok so does it matter if we do public talks Monday or Wednesday?
12:07:40 <Labhraich> I'm sure we'll cope without the students :-)
12:07:53 <rubdos[m]> Fewer students is better during that time :-)
12:08:00 <rubdos[m]> You can do talks whenever you please
12:08:03 <rubdos[m]> An is only around on Monday
12:08:07 <danshearer> I mean, if you work in some technical department in Brussels it won't make any difference?
12:08:20 <rubdos[m]> Indeed, it's only a student holiday week
12:08:29 <rubdos[m]> I won't take any time off
12:08:44 <danshearer> ok I misunderstood that. Perfect timing then. As designed and perfectly executed from months ago. Amazing.
12:08:56 <rubdos[m]> Exactly.
12:09:36 <danshearer> ok so I propose Wednesday. Because we can rewrite all the code in the preceding two days.
12:09:42 <danshearer> ok?
12:09:48 <rubdos[m]> sure
12:10:14 <danshearer> #accepted Talks in Brussels will be on Wednesday 6th April
12:11:14 <danshearer> #info I submitted some large cluster runs, and we noticed some limits as a result, but there are some useful data runs too.
12:11:46 <rubdos[m]> Feel free to document said limits, I can see what I can do
12:11:51 <Labhraich> OK, so we arrive Sunday, spend 48 hours changing everything, then rewrite the talks overnight because they are no longer accurate, and give the talks on Wednesday?
12:12:08 <danshearer> gabby_bch, wanna say what you got done on graphs in the last two weeks? Because the overall result was a win as far as I can see.
12:12:12 <rubdos[m]> Isn't that how a hackathon/code sprint should work, Labhraich ?
12:12:28 <rubdos[m]> I don't recall anything different from other hackathons
12:12:54 <danshearer> Labhraich, I am predicting the future, and therefore everything I say is correct.
12:13:18 <Labhraich> As correct as weather forecast?
12:13:32 <danshearer> Not that good.
12:13:42 <Labhraich> "It's going to rain in Scotland and/or Ireland within the next 365 days"
12:13:44 <danshearer> But, we will definitely have a future. I guarantee it.
12:14:08 <BKJ621> Well that was deep
12:14:39 <Labhraich> OK, that's reassuring.  Is this future without putin?
12:14:53 <danshearer> It depends on the effort you put in
12:15:38 <Labhraich> Sorry for going off-topic, but one can't turn anywhere without hearing these things
12:15:56 <gabby_bch> I think I made them show up correctly for all sqlite versions, and I've now added a tooltip that shows up when you hover above the line to make it easier to distinguish. I also want to ask what is the meaning of 1000,1 and 1,1000 datasize?
12:16:37 <Labhraich> It's a multiply factor on some operations
12:16:59 <danshearer> It was also me going batshit bananas on trying to break things.
12:17:15 <gabby_bch> yeah, and what does 1 indicate?
12:17:38 <Labhraich> 1000,1 means that reads are performed 1000 times more than the default, and writes are unchanged. 1,1000 means writes are multiplied by 1000.  1000 (which is an abbreviation for 1000,1000) means all such operations are multiplied
12:17:56 <Labhraich> That obviously has a large impact on the total timings
12:18:05 <gabby_bch> I see , cheers :)
12:18:59 <gabby_bch> I will add the explanation above the plot
12:19:15 <danshearer> From the docs: "DATASIZE option having two numbers, a read datasize and a write datasize;"  But I notice there should probably be a little more info in there about that.
12:19:34 <Labhraich> We used to have a single multiplication number affecting both reads and writes, so when I split into a separate "read,write" I had the single number remain as an abbreviation for the old behaviour - and old benchmarks don't need to be repeated just to fix that
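Note: the DATASIZE convention Claudio describes above (a "read,write" pair, with a single number kept as an abbreviation for the same factor on both) can be sketched as follows. This is only an illustration of the rule as stated in the discussion; the function name `parse_datasize` is hypothetical and is not taken from the LumoSQL build or benchmarking tools.

```python
def parse_datasize(value: str) -> tuple[int, int]:
    """Return (read_factor, write_factor) for a DATASIZE string.

    Hypothetical helper: it mirrors the rule described in the meeting,
    not the actual LumoSQL implementation.
    """
    parts = value.split(",")
    if len(parts) == 1:
        # A single number is the old behaviour: one factor for reads and writes.
        factor = int(parts[0])
        return factor, factor
    if len(parts) == 2:
        # "read,write": separate multipliers for read and write operations.
        return int(parts[0]), int(parts[1])
    raise ValueError(f"expected 'N' or 'READ,WRITE', got {value!r}")


for spec in ("1000,1", "1,1000", "1000"):
    reads, writes = parse_datasize(spec)
    print(f"DATASIZE={spec}: reads x{reads}, writes x{writes}")
```

Keeping the single-number form as shorthand is what lets older benchmark results stay valid without being re-run.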
12:19:48 <danshearer> gabby_bch, also I found the parameter to make the plot background black, not grey.
12:20:04 <Labhraich> My eyes thank you for that
12:20:33 <danshearer> #info Gabby and Dan finalised the recipe to make R Shiny output high contrast
12:21:14 <rubdos[m]> #info Ruben committed a BibLaTeX file to the lumodoc repo, which will be incrementally updated from now on.
12:21:15 <danshearer> I needed to install an R module from non-CRAN source for that, although there is infrastructure to make that easy and it wasn't a problem
12:21:25 <Labhraich> Does it (or can it) take the option from the URL and/or from a cookie?  So that people can set preferences and be happy?
12:23:28 <danshearer> Both yes and no. There are three components: outer HTML's CSS; the CSS in the inner iframe in which the R app runs; and ggplot. The first two can be switched. And so can the third, but it needs some R code to read the HTML environment and take action accordingly. The community in #R is very helpful and friendly.
12:24:20 <danshearer> #info Dan has researched how to automatically switch contrast depending on the user selection.
12:25:13 <danshearer> #action Dan to write up the R contrast research, and get comment/fixes from author of the ggplot high contrast module.
12:25:41 <danshearer> I think there is momentum within the R community to maintain this if I just pull it together a little.
12:26:39 <danshearer> gabby_bch, I think we need a section on benchmarking later on in this meeting
12:26:46 <danshearer> ok, anything else from last meeting?
12:26:51 <danshearer> oh yes
12:27:06 <danshearer> #info Dan has booked travel to Brussels
12:27:21 <danshearer> what other definite bookings?
12:27:32 <BKJ621> #info all has booked
12:27:46 <danshearer> great
12:27:59 <BKJ621> ie Gabby, Claudio & me
12:28:21 <danshearer> #info Gabby, Claudio and Björn also booked
12:28:53 <danshearer> I'd like to do a quick topic on infrastructure
12:29:00 <BKJ621> #info Björn will initiate payout of ms-part a for tickets
12:29:26 <danshearer> thanks for that Björn.
12:29:53 <danshearer> #topic LumoSQL infrastructure
12:30:00 <danshearer> Just briefly, so everyone is aware
12:32:15 <danshearer> #info r.lumosql.org is a VM with every conceivable development option and R environment etc. It has been our essential playground. We now have some R code in the main LumoSQL Fossil tree and that is where it needs to be maintained with everything else. So at some point r.lumosql.org will become more of a production machine, with https enabled. But it won't be the master for any data.
12:33:32 <danshearer> #info Ruben has enabled a second worker on the VUB cluster, so two LumoSQL jobs can run at once. Very handy if one of those jobs is busy writing extremely large amounts of data with DATASIZE=1,1000 or something
12:34:31 <Labhraich> ... which looks like it's only 19 hours from completing
12:34:41 <rubdos[m]> If you need more workers occasionally, I'll enable them. It's not difficult to scale out, but I just need to scale them back because of other people working on the machines (which shouldn't influence your measurements)
12:35:52 <danshearer> #info Ruben has improved the script in the repo kbench/work-loop.sh, and that is what is run on the VUB cluster when we submit jobs using 'curl' according to the documentation. Ruben has been administering the cluster from our point of view, including liaising with the actual administrators. Thanks Ruben.
12:36:37 <rubdos[m]> Well, admin is on sick leave today, but usually he sits in the office next to me :-)
12:37:11 <danshearer> #info this meetbot is under active development, and already I hope today's meeting notes will be better as a result. Let me know if you think of things that would make it easier or better.
12:39:32 <danshearer> #info we do not have a LumoSQL buildfarm, but we need one. I have looked at two public buildfarms and applied for a team account at one of them. The purpose of a buildfarm is to compile our code on lots of different kinds of computer and operating system and report where it breaks. It would be good if we could recruit a volunteer who likes this sort of thing and I am sure we will at some point. But for now my goal is to get a buildfarm
12:39:32 <danshearer> happening, even just one.
12:40:00 <danshearer> #action Dan to test at least one buildfarm and discuss setting it up with the team
12:40:02 <rubdos[m]> I mean, I like that sort of thing, but I already have too much stuff on my hands :|
12:40:26 <Labhraich> I have my mini-buildfarm but won't have the time to look into another one
12:40:41 <danshearer> rubdos[m], exactly. And Claudio has a pretty decent buildfarm in his house, but a farm needs a gardener. And an electricity generator.
12:40:46 <rubdos[m]> If you find something that I can easily throw on Kubernetes, I could host some x86_64 stuff
12:40:57 <rubdos[m]> I have a farm against Gitlab.com with gitlab workers
12:41:01 <rubdos[m]> but I suspect that's out of scope
12:41:08 <Labhraich> You can also easily host other architectures, as long as you can install qemu
12:41:28 <danshearer> rubdos[m], Labhraich noted. I will see what is on offer from funded buildfarms and report back.
12:42:09 <danshearer> If other people are dedicating resources to this and offering them for free then we should consider it. I will summarise.
12:42:27 <danshearer> ok I think that's everything on infrastructure
12:42:40 <danshearer> ten seconds to disagree...  :-)
12:42:41 <Labhraich> But something with real hardware would be better - otherwise we can't tell for sure if a problem is a Lumo bug or a qemu bug (although of course, qemu is well tested - but I've seen a VM failing to boot when emulated but the same image working fine on real hardware, so...)
12:42:58 <danshearer> Labhraich, right.
12:43:07 <rubdos[m]> We'll soon have a Pi4 farm on that cluster too...
12:43:37 <danshearer> #info Buildfarms are important, because LumoSQL is intended for embedded deployment, and that means mobile hardware, which means huge diversity but also huge investment in making buildfarms available.
12:43:59 <Labhraich> I'll have a pre-test on qemu-system-aarch64 just to make sure we can move to the pi4 farm
12:44:21 <danshearer> Labhraich, excellent
12:45:04 <danshearer> ok I propose moving on to benchmarking
12:45:12 <Labhraich> Need to install a dev environment, but that's just downloads (and I do have fibre now, so that won't take any time)
12:45:49 <danshearer> has configured gradle for automated android builds before...
12:46:08 <Labhraich> ack - benchmarking it is - I'll be away for 1 minute to top up with coffee though
12:46:25 <danshearer> ok me too actually
12:46:43 <danshearer> benchmarking includes gabby_bch's work and Bayes too
12:46:48 <danshearer> gets tea
12:49:22 <Labhraich> is back with coffee
12:52:49 <danshearer> back
12:53:09 <danshearer> #topic Benchmarking
12:53:14 <danshearer> well!
12:53:36 <danshearer> Thankyou Gabby for doing the work that got Bayes interested.
12:53:58 <danshearer> But overall:
12:54:53 <danshearer> #info Benchmarking tools and database now feels ready to be used by other people. The database design and tool approach has been reviewed by several people in #R and they think it is good.
12:55:33 <danshearer> so that validates Claudio. They call us 'scientists with good data'. whatever :-)
12:55:47 <danshearer> *validates Claudio's work, I meant.
12:57:07 <danshearer> #info Bayes committed some preliminary notes here: https://lumosql.org/src/lumosql/doc/tip/analysis/contrasts/README.md
12:57:33 <danshearer> which includes me and Claudio answering questions about what the data means.
12:58:10 <Labhraich> It may be worth extracting these questions from the IRC logs and putting them somewhere in the doc/ directory...
12:58:19 <danshearer> Gabby's work needs to be extended. What ever Bayes turns up with, it will need surrounding text and graphs and layout. I notice Bayes committed R code but I haven't looked at it at all.
12:59:03 <danshearer> Labhraich, do you mean in addition to the table that you and I filled in? Because that is already in the tree in the file notes.md
12:59:32 <Labhraich> Yes, because the table says what's important in the model, but doesn't say what these keys mean - which might be useful documentation
13:00:37 <danshearer> gabby_bch, you do seem to like R. Are you ok to track what Bayes is doing? I am sure he is running R locally and it makes sense for him. Maybe we should make sure his code is running on r.lumosql.org?
13:01:00 <danshearer> Labhraich, oh I see, yes definitely
13:01:17 <gabby_bch> sure ,  I will have a look
13:01:56 <danshearer> #action Dan to extract comments from irclog to doc directory regarding the meaning of fields in the benchmarking database
13:01:57 <Labhraich> Two weeks ago I said I might be persuaded to give a talk about how we run the benchmarks (somebody else will have to explain how we make sense of the results...) and if I am going to do that I might as well document the data we collect
13:02:40 <danshearer> #action Dan to ask Bayes if (a) he thinks attending meetings might be useful and (b) if so, what time in Iowa is good for him
13:02:41 <rubdos[m]> Do we want recording of the talks?
13:02:42 <Labhraich> In addition to the questions I've already answered which don't cover all the data
13:02:48 <danshearer> rubdos[m], yes please
13:03:39 <rubdos[m]> #action Ruben to find out how to record the talks.
13:05:37 <danshearer> rubdos[m], to illustrate, I gave a talk at LCA2020 on LumoSQL, with no preparation beyond a brief time to invent slides. That has seen 4000+ views, and I think the video is in other places too. https://www.youtube.com/watch?v=ukktq_79Z6Q
13:06:17 <danshearer> So people will be interested.
13:06:36 <BKJ621> marketing is needed
13:06:56 <danshearer> (the LCA talk was in Australia, and there were really terrible bushfires, and I took the place of someone who was fighting the fires.)
13:07:18 <danshearer> In fact I think I even included a slide on the fires at the end.
13:08:08 <danshearer> BKJ621, this is also marketing for VUB, and VUB is a university which means it is partly measured by academic marketing. So yes.
13:08:46 <BKJ621> Some marketing is good
13:08:51 <danshearer> rubdos[m], I wonder if Bayes might like to present from Iowa? That would be fun.
13:09:05 <danshearer> Claudio first, and then a giant head on the screen from America.
13:09:20 <danshearer> #action Dan to ask Bayes if he would like to be a giant head on the screen in Brussels.
13:09:33 <Labhraich> danshearer: who is this weird person?    https://video.fosdem.org/2020/K.1.105/lumosql.mp4
13:09:46 <rubdos[m]> We can certainly arrange such things
13:10:28 <rubdos[m]> I'll think next week about the practicalities of microphones and such
13:10:43 <danshearer> Labhraich, you're joking. I had forgotten that. Well that demonstrates a point - I have no idea how many people watched the copy you pointed at, but I see the youtube version had 600 views. I don't think ever in my life have I counted views before :-)
13:11:01 <danshearer> rubdos[m], ok.
13:11:17 <danshearer> thinks about asking D Richard Hipp to present from remote.
13:11:26 <danshearer> (Richard Hipp is the author of SQLite)
13:11:52 <Labhraich> Make sure you schedule these people in the afternoon, so it's a reasonable time in their timezone
13:11:56 <BKJ621> Do not forget  US is on DST now. Europe goes on DST next weekend
13:13:01 <danshearer> And Australia, being as strange as its marsupial life, is also somewhat on DST. Where states that are *vertically* above each other North-South are up to *two hours different*
13:13:23 <danshearer> TheMeta-Zucc[m], thankyou for the reminder. I must move this meeting on.
13:14:05 <rubdos[m]> I've silenced the warning again until Monday 3pm cest
13:14:08 <rubdos[m]> or utc
13:14:09 <rubdos[m]> hmm
13:14:14 <danshearer> #action Dan to invite other people to present.
13:14:26 <danshearer> ok I think we are done on benchmarking.
13:14:58 <danshearer> ohhh I forgot an infrastructure thing. I will edit the notes to put it in the correct place
13:16:06 <danshearer> #info Dan has tested an ircv3 bouncer called soju, https://soju.im . This will keep a log of channels for anyone with an account, so BKJ621 and anyone else can always check in and see what is going on.
13:16:26 <BKJ621> great
13:16:50 <Labhraich> That may be useful.  Although being connected from 2 places at once (one in the UK and one in Holland) I am not likely to miss anything :-)
13:17:23 <danshearer> ircv3 is updating irc for the modern age. It isn't dead yet. And I'm the one trying to do something with Signal, ha !
13:18:23 <danshearer> scrolls back to look for the next topic
13:19:30 <danshearer> #topic Where LumoSQL is up to
13:19:43 <rubdos[m]> (Is this where encryption stuff goes in?)
13:19:57 <danshearer> no. This is:
13:20:04 <danshearer> #topic LumoSQL Encryption
13:20:06 <danshearer> go for it
13:20:12 <rubdos[m]> Ha.
13:20:15 <danshearer> steps away from the keyboard
13:20:40 <danshearer> gabby - we will get to docs after Ruben has waved his wand and Claudio taken a bow
13:20:50 <Labhraich> bows
13:21:18 <rubdos[m]> So. I have shared with some of you already a draft of the steps to be taken to have encryption in LumoSQL. Row-based encryption will be the most difficult (as expected), but might even be more difficult than I initially judged.
13:21:29 <rubdos[m]> Read: it's gonna be very interesting, also from academic POV, so that's awesome.
13:22:23 <rubdos[m]> I also claimed an intern/master student with background in mathematics to write a bit of code for me. We'll work together towards an SSS-based ABE scheme (which is not collusion resistant, but we don't care at this point), implement it, and see how it works with SQL-style roles
13:22:34 <rubdos[m]> I also concluded that SQL
13:22:38 <danshearer> Just to be clear: the difficulty is because of the ambition to have multiple levels of read/write within one row?
13:23:05 <rubdos[m]> -style privileges are pretty simple if you consider the whole ABE thingy; ABE is much more capable than what a standard PostgreSQL offers in terms of privileges.
13:23:21 <rubdos[m]> danshearer: It's mostly difficult because of indexing
13:23:26 <rubdos[m]> that's going to be the real academic challenge
13:23:38 <rubdos[m]> privileges and multiple read/write access levels is easy enough
13:23:59 <Labhraich> And the actual encryption of row data (without an index) is easy enough
13:23:59 <rubdos[m]> Someone once taught me to explain abbreviations before dumping them. SSS = Shamir secret sharing
13:24:09 <rubdos[m]> yes
13:24:21 <rubdos[m]> the ABE-based encryption is easy enough too. Just a bit of reading and fumbling
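Note: for readers unfamiliar with the primitive Ruben names, here is a minimal sketch of plain Shamir secret sharing over a prime field. It shows only the textbook threshold scheme; it is not the ABE (attribute-based encryption) construction being planned for LumoSQL, and the field modulus and function names are assumptions chosen for illustration.

```python
import secrets

# Field modulus: the Mersenne prime 2**127 - 1 (an illustrative choice).
PRIME = 2**127 - 1


def split_secret(secret: int, threshold: int, shares: int) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in the range [0, PRIME)")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        # Horner's rule, reduced modulo PRIME at every step.
        y = 0
        for c in reversed(coeffs):
            y = (y * x + c) % PRIME
        return y

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate the polynomial at x = 0 to get the secret back."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


pieces = split_secret(123456789, threshold=3, shares=5)
print(recover_secret(pieces[:3]) == 123456789)
```

Any three of the five shares reconstruct the secret, while fewer than three reveal nothing about it. Mapping SQL-style roles onto such shares is the part still to be designed.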
13:24:50 <danshearer> ok. So you are taking a solution I proposed to deliver privacy and transparency and portability and security.... and actually making it useful, because everything ultimately goes into a database. Just to be clear.
13:25:11 <rubdos[m]> If everything goes well, we'd have some ABE scheme and its interface around the meeting time. If it doesn't go well (which means time constraints at my side, and not enough programming experience by the student), it's gonna be a bit later.
13:25:42 <rubdos[m]> danshearer: Well, that was the goal, right?
13:25:50 <Labhraich> I better get on with metadata stuff then...
13:26:06 <danshearer> rubdos[m], yes, translating anything I say into something that is useful as opposed to hot air is a Very Good Thing.
13:26:32 <rubdos[m]> Well, my goal is not to implement everything against Lumo already; I want first the primitives and the cipher suite system in place, then the linking with Lumo, and then we do row-based and indexing :)
13:26:54 <rubdos[m]> danshearer: Ah, that way :-)
13:27:23 <danshearer> the entire point of portability/transparency in privacy is that a Potato-shaped Piece of Data can go from my mobile phone to your banking system.
13:27:49 <rubdos[m]> Oh, that too.
13:27:55 <rubdos[m]> The SSS-based system should be portable enough
13:28:03 <danshearer> However, your banking system will want to index that data, and if it can't get any more info than that it is a Potato-shaped glob, well, that isn't so useful.
13:28:25 <danshearer> Therefore, this is exactly what is needed, at multiple levels, and not LumoSQL-specific in that sense.
13:28:47 <rubdos[m]> So what you're saying is that a lumion should carry index-related metadata
13:29:00 <rubdos[m]> I'll store that in my head.
13:29:01 <danshearer> I am saying it should not be index-resistant.
13:29:15 <danshearer> I think that is a more useful academic property
13:29:16 <rubdos[m]> It should however not leak anything either, if it doesn't want to.
13:29:29 <rubdos[m]> Lumions are sentient already. I like it.
13:30:34 <danshearer> Therefore we have just in the last two minutes tied together another couple of bits of this work. Indexability is essential within a LumoSQL database, and a Lumion row needs to have the ability to be indexable wherever it travels.
13:30:36 <danshearer> Well done.
13:30:47 <rubdos[m]> If it wants to!
13:30:50 <danshearer> *optional ability
13:30:53 <rubdos[m]> You have to respect the Lumion's privacy.
13:30:55 <danshearer> yes
13:31:04 <rubdos[m]> Seeing it that way makes a lot of sense imo.
13:31:08 <rubdos[m]> Thanks for the insight!
13:31:09 <rubdos[m]> waves.
13:31:43 <rubdos[m]> (still sounds like OPE now that I think about it, but we'll figure something out.)
13:31:48 <Labhraich> I guess the Lumion will carry around not just the data but also the indexing of the data - the bit which it's prepared to tell
13:31:57 <danshearer> (at worst, a Lumion has the GUID which is going to be indexable, but we don't mean that.)
13:33:09 <rubdos[m]> waves away the afterthought.
13:33:33 <danshearer> Ok would you care to make some #info points about that?
13:33:39 <rubdos[m]> ack
13:33:41 <danshearer> *about all of the above
13:33:56 <Labhraich> So when we take a benchmark result as an example of Lumo, it'll contain a list of key,value pairs, and some of these pairs will be marked as index data (whatever helps searching for results - which seen in this light has nothing to do with the index inside the benchmark database, which is there to make sqlite work less)
13:34:13 <rubdos[m]> #info Ruben et al are working actively at a proof-of-concept ABE library that will plug into lumoSQL.
13:34:31 <rubdos[m]> #info First version will not be collusion resistant and will probably have other issues (performance), but it should be secure and interesting.
13:34:51 <Labhraich> "an example of Lumion".  Not of "Lumo".
13:35:36 <rubdos[m]> #info Indexing is going to be the most difficult part, and the most interesting, and will be postponed to one of the last phases.
13:35:50 <danshearer> Labhraich, yes. This is a database design overlaid on the happy coincidence that it is implemented in a database. A bit like Fossil, which implements a completely non-relational tree data design in what happens to be a relational rows and columns database.
13:36:08 <rubdos[m]> #info In context of a Lumion, that means the Lumion will probably have to carry metadata for populating the index at the receiving side.
13:36:17 <rubdos[m]> I think that's most of it?
13:36:42 <Labhraich> Seems like a good summary to me
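Note: the idea summarised above, a Lumion that carries only the index metadata it is prepared to disclose so the receiving side can populate its own index, could look roughly like this. Everything here is hypothetical: the `Lumion` class, its field names, and the example keys are invented for illustration and do not describe an existing LumoSQL format.

```python
from dataclasses import dataclass, field


@dataclass
class Lumion:
    """Hypothetical container: an opaque payload plus opt-in index metadata."""
    guid: str                      # always present, and trivially indexable
    ciphertext: bytes              # encrypted row data, opaque to the receiver
    index_metadata: dict[str, str] = field(default_factory=dict)  # disclosed pairs only


def populate_index(index: dict[tuple[str, str], set[str]], lumion: Lumion) -> None:
    """Add a Lumion to the receiver's index using only what it chose to disclose."""
    for key, value in lumion.index_metadata.items():
        index.setdefault((key, value), set()).add(lumion.guid)


# Example keys are made up; a benchmark result might disclose some pairs and hide others.
index: dict[tuple[str, str], set[str]] = {}
populate_index(index, Lumion("guid-1", b"...", {"backend": "lmdb"}))
populate_index(index, Lumion("guid-2", b"...", {"backend": "btree"}))
populate_index(index, Lumion("guid-3", b"..."))  # discloses nothing beyond its GUID

print(sorted(index[("backend", "lmdb")]))
```

A Lumion that discloses nothing is still stored but cannot be found by attribute, which matches the "if it wants to" property discussed above. Whether disclosed values can be protected without leaking ordering is the open indexing question.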
13:37:24 <danshearer> rubdos[m], and when we meet you and your colleague(s) will perhaps be presenting on this topic on Wednesday?
13:37:45 <rubdos[m]> We might, if we have something interesting to present already by then.
13:38:54 <danshearer> Will you have questions by then? Because the questions are the hardest bit, really. I mean, if we ask the correct questions then ultimately we can find cryptographers to answer them. But if we never ask the right questions...
13:39:43 <rubdos[m]> If there are questions, I will probably have them by then
13:40:16 <danshearer> If you don't know what questions Lumions answer, then we have a problem. But I know you know that. Ok, moving on.
13:41:01 <danshearer> #topic Documentation
13:41:18 <danshearer> This will be brief I suppose. Basically, Gabby has been working and I have been slack.
13:41:57 <danshearer> #action Dan to review Gabby's doc changes and proposed tree consolidation
13:42:00 <danshearer> Gabby
13:42:07 <danshearer> can you do a phonecall after this meeting?
13:42:27 <gabby_bch> yes that would be good
13:42:40 <danshearer> Can you summarise where you are up to?
13:43:51 <gabby_bch> there's still a lot to be done, I've mainly written an about page that's more comprehensive
13:44:23 <gabby_bch> and extended the benchmark discussion
13:44:28 <danshearer> we need to sort out the toolchain too
13:45:11 <danshearer> however, about now is the perfect time to really get going on it again because:
13:45:36 <danshearer> * we have a way of storing references. Ruben isn't the only one with references, there are many dozens of them in the existing docs.
13:46:10 <danshearer> * we have a much better idea of what LumoSQL is for :-)
13:46:43 <danshearer> * we have a little teeny-tiny userbase, and they need some better docs.
13:47:17 <danshearer> so that our teeny-tiny userbase can become a much bigger one, of course
13:48:08 <danshearer> * we have some infrastructure to write docs about now too. cluster benchmarking. cluster building. pulling results in and making graphs.
13:49:33 <danshearer> I frankly don't even know if it should be in its own tree. That was my original concept, and certainly we need to get it much more tidy where it is. But my original idea was that the toolchain would be so different it should be another tree. Maybe that was a good decision, but I don't know.
13:50:29 <danshearer> Perhaps it would be better if it was in one place and then tiny updates could happen
13:50:36 <danshearer> as the developers notice things.
13:50:46 <danshearer> Also, I welcome our distinguished Iowan guest.
13:51:04 <Bayes> lol
13:51:19 <Bayes> thanks, sorry I'm very late
13:51:55 <danshearer> ok Bayes we can run a couple of things past you in a minute.
13:52:09 <danshearer> is there anything more on documentation? Happy to take all comments/suggestions...
13:52:59 <Bayes> I couldn't make much progress since I pushed the last notes to the repo. That said, I'll try to interact more with gabby_bch so we can start "merging" the tools, if/when it makes sense to do so
13:53:46 <danshearer> That's great. When the notes of this meeting get published (in a few minutes) you'll see some of us have action items that might be relevant to you.
13:54:06 <danshearer> That would be great
13:54:31 <danshearer> gabby_bch was going to look at making your code run on r.lumosql.org
13:54:34 <Bayes> What I'm doing right now is I'm trying to design a model, so far a linear model with random effects and all that, to analyze the benchmarks. This model should be able to answer questions such as "how much faster is version A than B", "does the backend affect timing overall?", "how much faster is NVMe vs SSD", etc. Basically, it'd let us quantify differences
13:54:34 <Bayes> in all the configurations tried
13:55:13 <Bayes> oh my code is very much WIP so there's nothing to have on the shiny app _yet_, mostly cause I started with documenting the model rather than getting straight into coding it
13:56:38 <Bayes> 1. The docs I've read from you all make sense to me, so no comment on that
13:56:38 <Bayes> 2. At some point, maybe next week, I'll try to start gathering from you all the kind of questions you'd like to get answered with this model
13:57:16 <danshearer> ok well thankyou very much
13:58:02 <danshearer> moving on to the final catch-everything topic...
13:58:25 <danshearer> #topic All other things we forgot above
13:58:40 <gabby_bch> Bayes , are you putting your code to the same github repo as the docs?
13:59:08 <danshearer> https://lumosql.org/src/lumosql/dir?ci=tip&name=analysis/contrasts/R
14:00:29 <danshearer> Bayes, are you finding Fossil ok?
14:00:59 <Bayes> Fossil is alright yep
14:01:19 <danshearer> okidoke
14:01:30 <danshearer> well under "stuff we didn't cover earlier"
14:02:32 <Bayes> gabby_bch no, I moved from github to Fossil completely. I figured we could share the "analysis" folder (https://lumosql.org/src/lumosql/dir?name=analysis&ci=tip), maybe you and me could have many subfolders with different "analyses". In my case, I created the contrasts subfolder since mostly I'm setting up a model to run contrasts between sqlite
14:02:33 <Bayes> versions and such
14:03:37 <Bayes> maybe as an action item for me (dunno how to add this to the meeting notes), I could improve the README in those folders to make the structure more evident
14:04:07 <Labhraich> Well, we've had the impression that some versions of sqlite weren't really faster than previous versions, although the overall tendency is to have newer=faster.  So that's definitely one question which came up - compare different sqlite versions.  And same question for LMDB
14:04:12 <danshearer> #action Bayes to improve the READMEs under analysis/ to describe the structure
14:04:37 <Bayes> thanks danshearer I was scrolling up trying to find the right pattern ^_^
14:05:10 <danshearer> benchmarking is totally vital, in fact steering the whole direction in many ways. However, I think after 10k benchmark runs we would know already if LMDB was always massively faster under all circumstances. And it is not. There is a lot to unpick and the correct modelling will help us.
14:05:31 <danshearer> So we can make a reasonable guess from this...
14:05:38 <Bayes> Labhraich that's clearly some insight I'm missing here, so maybe next week I'll set up a document where everyone can chime in and add their questions or hypotheses. I'll collect them and use the model (which isn't finished yet tbh) to answer them.
14:06:22 <Labhraich> This may be useful if you need to find a command for meetbot:  https://hcoop-meetbot.readthedocs.io/en/stable/
14:06:45 <danshearer> #info it seems plausible that the SQLite key-value pair store is at least reasonably competitive.
14:06:49 <Bayes> sweet, thanks Labhraich
14:07:08 <gabby_bch> there is one more thing, I was talking to Matthew Croughan (we live together). He is sometimes in this chat and he's interested in coming to Brussels for the meetup. He knows a lot about software and will definitely be more useful than me, and he can cover his travel expenses I'm sure.
14:12:06 <danshearer> I had some discussion with Matthew, and just like anyone else if he has something specific he can contribute then that would be great and he would be welcome. Matthew expressed interest in contributing a NixOS flake, which is great, because Nix is something that relies on SQLite and has some weaknesses as a result.
14:13:57 <danshearer> The discussion with Matthew was also useful regarding docs, because he found it hard to understand the technical basis of what we were doing or why, and that is because I did not communicate clearly enough in the documentation. Did I already say I need to work together with Gabby on the docs? :-)
14:14:19 <Labhraich> I suppose at some point we'd have an installable version of the benchmarking, with a command to run rather than "make benchmark LOTS_OF_OPTIONS" but that's never looked like a priority
14:14:45 <danshearer> Very much so.
14:14:47 <Labhraich> And a configuration file which isn't just a makefile fragment (also not quite looked like a priority...)
14:17:19 <danshearer> #action Dan to discuss with matthewcroughan attending Brussels
14:17:37 <danshearer> getting back to my point above...
14:17:43 <Labhraich> Also when we do that it may change the way we decide what to run (because "looking for benchmarks in the source tree" would no longer be the thing), so for now I'm pretending we aren't interested in installing anything
14:17:48 <rubdos[m]> danshearer: keep me posted on the people count if possible.
14:18:06 <rubdos[m]> Not that we cannot host many people, but just so I size everything accordingly.
14:18:36 <rubdos[m]> I don't want FOSDEM-style crowded rooms.
14:19:09 <danshearer> #action As soon as it is confirmed that the SQLite k-v store is at minimum reasonably competitive, approach Richard Hipp about his existing approval in principle to make an API for the SQLite k-v store (which will then instantly become the world's most deployed k-v store if it is part of SQLite)
14:19:16 <Labhraich> Hmmm, don't remind me of these queues at the door extending until after the next talk :-)
14:19:38 <danshearer> rubdos[m], there are two kinds of headcount
14:19:46 <rubdos[m]> I'm aware
14:20:04 <danshearer> 1. headcount for LumoSQL team and related folk
14:20:16 <Labhraich> Are the talks on Wednesday public?
14:20:27 <danshearer> 2. headcount attending the talks, which are not yet published but will be asap after this meeting, once people approve the abstracts
14:20:37 <danshearer> Labhraich, that was my idea indeed
14:20:52 <Labhraich> Then that headcount is just going to be a best guess?
14:21:29 <danshearer> Yes, but if there are no students and we ask ruben nicely we might be able to have a lecture theatre big enough that we will almost be lost :-)
14:22:32 <danshearer> So I am guessing 5-7 people core most of the time, and Wednesday afternoon some tens of people (but if more than some tens of people actually come then VUB has a giant room so it won't matter???)
14:22:49 <Labhraich> I kind of assumed that finding a lecture theatre in a university wouldn't be a major issue, if there are no actual lectures that week
14:23:10 <rubdos[m]> danshearer: If you say we need to host >20 I'll have to hire an auditorium
14:23:14 <danshearer> And I am really hoping that the 5-7 people core do not have to be in a very small room
14:23:31 <rubdos[m]> With a bit of luck, I don't have to explain that we don't want to pay :'-)
14:23:32 <danshearer> rubdos[m], what does "hire" mean, and can VUB cover it with internal budgeting?
14:23:44 <rubdos[m]> I'll find that out if necessary
14:24:07 <BKJ621> As NLnet said: keep it frugal
14:24:17 <rubdos[m]> I have the key to two rooms that I can just open up and sit in.
14:24:19 <danshearer> ok, that sounds like the kind of university thinking I am used to. Good. Maybe if we tell them your FOSDEM-compatible neighbours are offering a room they will quickly say "no no have it for free"
14:24:27 <rubdos[m]> Both rooms can host 10 people comfortably
14:24:40 <danshearer> rubdos[m], so for 80% of the time those two rooms sound totally perfect.
14:24:54 <rubdos[m]> yes. And they host 20 people seated too.
14:24:59 <Labhraich> That sounds like the way the ScotLUG (Scottish Linux User Group) met in a university for years - without the university ever noticing :-)
14:25:15 <rubdos[m]> That's exactly the plan as long as we're <20 people, Labhraich ;-)
14:25:21 <Labhraich> Somebody was a lecturer there and just happened to have a key to a room
14:25:31 <rubdos[m]> These are the rooms we use for our research group, so I can have them.
14:26:01 <rubdos[m]> and that's with permission of my two direct supervisors.
14:26:18 <danshearer> fantastic
14:26:40 <Labhraich> Well, I think these rooms would be large enough for us - but hopefully not for the talks (if they are large enough, we haven't been generating enough interest)
14:27:13 <rubdos[m]> OK, so
14:27:24 <rubdos[m]> #action Ruben to find an auditorium for Wednesday
14:27:42 <danshearer> It also means that we can conveniently have a context switch, and also, if I am talking about Why LumoSQL Should Be Green and Yellow, then Labhraich can escape and put his headphones on in another room.
14:28:21 <rubdos[m]> Let me know if you're going >40 attendees, because then I'll have to get the big boy rooms.
14:28:34 <rubdos[m]> Different buildings and stuff like that
14:28:58 <danshearer> rubdos[m], this is where you and me and your administration need to have a chat about the advertising. Can we do that today please.
14:29:19 <rubdos[m]> me and you, certainly. Administration, not sure.
14:29:41 <rubdos[m]> For one, I have no idea who that might be, but I can ask around.
14:29:41 <danshearer> #action Dan and Ruben to discuss advertising in Brussels and what to write on the advertising in terms of buildings
14:30:35 <danshearer> Because if the bloke in Scotland invites people in the Brussels security community to "come to a random place in VUB" that will not go well.
14:31:03 <danshearer> ok are we close to done?
14:31:06 <rubdos[m]> hehe :'-)
14:31:06 <danshearer> anyone else?
14:31:29 <BKJ621> i am done thanks
14:31:30 <Labhraich> has nothing to add at this point
14:31:43 <rubdos[m]> <danshearer> "#action As soon as it is..." <- I kind-of interrupted here
14:32:19 <danshearer> ah yes
14:32:24 <danshearer> This is pretty significant.
14:33:50 <danshearer> Because LMDB is massively deployed around the world, it basically replaced BDB (as I very painfully researched and documented in the BDB Wikipedia article). LMDB has some interesting properties and has a deserved good reputation. LMDB *should* be amazing, but I suspect it is less amazing than it could be.
14:34:57 <danshearer> Some of that may be how LumoSQL is using LMDB. Someone has been in touch suggesting how LMDB can be done better, and I need to look at that carefully.
14:35:30 <Bayes> has nothing else
14:36:28 <danshearer> Nevertheless. Since I have agreement in principle from the author of SQLite that he would be willing to make the SQLite k-v store available as an official API (noting that he can't give it the super-strict compatibility guarantees that SQLite has) then that could change embedded and system software development options quite a lot.
14:36:32 <Labhraich> One issue with the way we are using LMDB is that there's a certain incompatibility in basic design between LMDB and sqlite.  What sqlite expects from a K-V store doesn't match what LMDB provides (which is what OpenLDAP expects from it)
14:36:41 <Labhraich> This is never going to show LMDB to its full advantage
14:37:08 <danshearer> I mean "options available to the alleged million or so developers worldwide who write code at this level"
14:37:30 <danshearer> Labhraich, yes quite.
14:38:10 <danshearer> On the other hand, LMDB v1.0 comes with page-level encryption built in, and that is relevant. We can switch it on (depending on APIs etc etc) and SQLite does not have that at all.
14:39:12 <danshearer> So I just wanted to alert you to the context here, and that once again we are playing around with things that are deployed at such scale that if people start using it even just a little bit that will be pretty large.
14:39:43 <danshearer> rubdos[m], this is relevant to the Rust community too. More on that outside this meeting perhaps.
14:39:47 <danshearer> ok I think that is everything.
14:39:49 <danshearer> ?
14:40:06 <rubdos[m]> stays silent
14:40:12 <danshearer> #endmeeting