u1F93A

joined 1 year ago
[–] u1F93A 1 points 1 year ago

Rust

Seems I'm a bit late to the party so here's something I quickly slapped together. The problem appears to be asking us to generate all n-ary words in the Dyck language D3 with 𝛴 = {(, ), [, ], {, }} and 𝓑 = {((, )), ([, ]), ({, })}. This grows in the order of 3ⁿ⸍²𝓒ₙ⸝₂, where 𝓒ₖ is the k-th Catalan number, since it combines the regular up/down decision problem at each lattice step with a 3-way decision problem of picking a bracket type. That quickly becomes A Lotβ„’ β€” napkin math tells me it would take ~21GB to store all possible strings of length n = 20, delimited by newlines, as UTF-8 bytes β€” which made me think that I would want to do this in constant space instead of recursively, so I quickly Googled to see what prior art there was on generating functions for generalized Dyck languages and found a nice and short paper [1] claiming to be linear in the length of the words (not the quantity of them, that's still A Lotβ„’) and using constant space. I cribbed their algorithm and reduced it to the specifics of D3 with the given 𝛴 and 𝓑, and then used the fact that it can generate the next word using only the previous one to split the word-space into three equal parts that can be generated on as many threads. I chose three because the pattern required to recognize the end of each third was already the same one being used by the paper's algorithm, (and because three seemed nice considering it's D3 and |𝓑| = 3 and such), but you could ostensibly split the space into as many chunks as you have processors without penalty (beyond the overhead of the threads themselves plus fighting for the mutex lock on stdout) if you can recognize the chunk-ending patterns fast enough. After tweaking the buffer size to match my system setup (Arch on an i7 Kaby Lake), I can get it to pipe ~2GB/s to wc -l (which is ~11s for that n = 20 example). I'm sure there's probably a bunch more that I could optimize but I'm afraid of SIMD and don't want to think too hard about the algorithm, so I'll just leave it here and let someone else tell me how I'm wrong / what I missed :)

[1]: Liebehenschel, J., 2003. Lexicographical generation of a generalized dyck language. SIAM Journal on Computing, 32(4), pp.880-903.