Improve creation of character list

- Remove sequences containing dead keys from the list - Change psv creation to account for bogus spaces
2020-09-18 23:04:56 +02:00 · 2020-09-18 23:04:56 +02:00 · 6cc15f513f
commit 6cc15f513f
parent 6d7ffb62ef
4 changed files with 2167 additions and 3006 deletions
--- a/sequences/Compose.db3
+++ b/sequences/Compose.db3
--- a/sequences/Compose.psv
+++ b/sequences/Compose.psv
--- a/sequences/Compose.usable
+++ b/sequences/Compose.usable
--- a/sequences/README.txt
+++ b/sequences/README.txt
@ -4,24 +4,21 @@ Download the X11 compose-sequences for en_US.UTF-8¹ and get rid of some things:
 - Comments (XCOMM)
 - Sequences not started my Multi_key
 - Sequences containing non-standard characters
 - Sequences containing other dead keys
 ¹) en_US.UTF-8 seems to be quite complete and also available on other locales.
 ```
 $ wget https://cgit.freedesktop.org/xorg/lib/libX11/plain/nls/en_US.UTF-8/Compose.pre
-$ grep -i '^<multi' Compose.pre | grep -v '<U....>' | grep -v 'U.....>'  > Compose.usable
+$ grep -i '^<multi' Compose.pre | grep -v '<dead' | grep -v '<U....>' | grep -v '<U.....>' > Compose.usable
 ```
 Convert to psv
 --------------
-Plus-separated-values because + is not used in the file.
+- Plus-separated-values because + is not used in the file.
-Regex to extract the fields then remove tabs and squish spaces.
+- Remove tabs and squish spaces.
-TODO: There must be a better regex for this. Also, this leaves begin-of-field spaces.
+- Regex to extract the fields into psv.
-```
+$ cat Compose.usable | tr -d "\t" | tr -s " " | sed --regexp-extended 's/(\S*)\s*: \"(.*)\"\s*(\S*)\s*#\s*(.*)/\1+\2+\3+\4/' > Compose.psv
 $ sed --regexp-extended 's/(.*): \"(.*)\"(.*)#(.*)/\1+\2+\3+\4/' Compose.usable | tr -d "\t" | tr -s " " > Compose.psv
 ```
 Load into SQLite
 ----------------
@ -40,13 +37,13 @@ $ sqlite3 Compose.db3
 How to get random, unique entries that do not repeat
 ----------------------------------------------------
-Create a table for ids we already sent.
+Create a table for characters we already sent.
 	CREATE TABLE "alreadySent" (
 		"keySequenceROWID" INTEGER,
 		"timestamp"        INTEGER
 	);
-Create a view with yet unsent rows
+Create a view with yet unsent characters
 	CREATE VIEW stillAvailable (
 		keySequenceROWID,
 		keySequence,
@ -59,7 +56,7 @@ Create a view with yet unsent rows
 		WHERE ROWID NOT IN (
 			SELECT keySequenceROWID
 			FROM alreadySent
-		)
+		);
 Add some phrases to start the toot with
 ---------------------------------------