refactor(emoji): rewrite script w/ Python and update emojis (#8069)
Closes #8069
parent
02d8cf6e9a
commit
0613232202
7 changed files with 32840 additions and 2608 deletions
|
@ -10,7 +10,7 @@ This plugin provides support for working with Unicode emoji characters in `zsh`
|
|||
|
||||
Variable | Description
|
||||
----------------- | --------------------------------
|
||||
$emoji | Maps emoji names to characters
|
||||
$emoji | Maps emoji names to characters (except flags)
|
||||
$emoji_flags | Maps country names to flag characters (using region indicators)
|
||||
$emoji_groups | Named groups of emoji. Keys are group names; values are whitespace-separated lists of character names
|
||||
|
||||
|
@ -55,10 +55,8 @@ The defined group names can be found with `echo ${(k)emoji_groups}`.
|
|||
To list all available emoji with their names, use:
|
||||
```
|
||||
$> display_emoji
|
||||
$> display_emoji fruits
|
||||
$> display_emoji animals
|
||||
$> display_emoji vehicles
|
||||
$> display_emoji faces
|
||||
$> display_emoji people
|
||||
```
|
||||
|
||||
To use emoji in a prompt:
|
||||
|
@ -73,13 +71,13 @@ PROMPT="$surfer > "
|
|||
|
||||
The emoji names and codes are sourced from Unicode Technical Report \#51, which provides information on emoji support in Unicode. It can be found at https://www.unicode.org/reports/tr51/index.html.
|
||||
|
||||
The group definitions are added by this OMZ plugin. They are not based on external definitions. (As far as I can tell. -apjanke)
|
||||
The group definitions are added by this OMZ plugin. They are not based on external definitions.
|
||||
|
||||
The values in the `$emoji*` maps are the emoji characters themselves, not escape sequences or other forms that require interpretation. They can be used in any context and do not require escape sequence support from commands like `echo` or `print`.
|
||||
|
||||
The emoji in the main `$emoji` map are standalone character sequences which can all be output on their own, without worrying about combining characters. The values may actually be multi-code-point sequences, instead of a single code point, and may include combining characters in those sequences. But they're arranged so their effects do not extend beyond that sequence.
|
||||
|
||||
The exception to this is the skin tone variation selectors. These are included in the main `$emoji` map because they can be displayed on their own, as well as used as combining characters. (If they follow a character that is not one of the emoji characters they combine with, they are displayed as color swatches.)
|
||||
The exception to this is the skin tone / hair style variation selectors. These are included in the main `$emoji` map because they can be displayed on their own, as well as used as combining characters. (If they follow a character that is not one of the emoji characters they combine with, they are displayed as color swatches.)
|
||||
|
||||
|
||||
## Experimental Features
|
||||
|
@ -90,7 +88,6 @@ Variables:
|
|||
|
||||
Variable | Description
|
||||
----------------- | --------------------------------
|
||||
$emoji2 | Auxiliary and combining characters
|
||||
$emoji_skintone | Skin tone modifiers (from Unicode 8.0)
|
||||
|
||||
|
||||
|
@ -105,31 +102,26 @@ The "variation selectors" are combining characters which change the appearance o
|
|||
The `$emoji_skintone` associative array maps skin tone IDs to the variation selector characters. To use one, output it immediately following a smiley or other human emoji.
|
||||
|
||||
```
|
||||
echo "$emoji[smiling_face_with_open_mouth]$emoji_skintone[4]"
|
||||
echo $emoji[waving_hand]$emoji_skintone[5]
|
||||
```
|
||||
|
||||
Note that `$emoji_skintone` is an associative array, and its keys are the *names* of "Fitzpatrick Skin Type" groups, not linear indexes into a normal array. The names are `1_2`, `3`, `4`, `5`, and `6`. (Types 1 and 2 are combined into a single color.) See the [Diversity section in Unicode TR 51](https://www.unicode.org/reports/tr51/index.html#Diversity) for details.
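To make the composition concrete at the code-point level, here is a minimal sketch in Python (used only for illustration; the plugin itself is pure `zsh`). The `SKINTONE` table copies the keys and code points of `$emoji_skintone`. The helper `with_skintone` is hypothetical, and it is an assumption that `$emoji[waving_hand]` holds U+1F44B.

```python
# Illustrative mirror of the plugin's $emoji_skintone associative array.
# Keys are Fitzpatrick group names, not positional indexes.
SKINTONE = {
    "1_2": "\U0001F3FB",
    "3": "\U0001F3FC",
    "4": "\U0001F3FD",
    "5": "\U0001F3FE",
    "6": "\U0001F3FF",
}

# Assumption: this is the character stored under $emoji[waving_hand].
WAVING_HAND = "\U0001F44B"


def with_skintone(base, tone):
    """Hypothetical helper: place the modifier directly after the base emoji."""
    return base + SKINTONE[tone]


combined = with_skintone(WAVING_HAND, "5")
print(combined)
# Show the two code points that make up the combined sequence.
print([f"U+{ord(ch):04X}" for ch in combined])
```

Because the keys are names, a lookup such as `SKINTONE["1"]` fails in this sketch; the combined group is only reachable as `1_2`.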
|
||||
|
||||
#### Gemoji support
|
||||
|
||||
The [gemoji project](https://github.com/github/gemoji) seems to be the de facto main source for short names and other emoji-related metadata that isn't included in the official Unicode reports. So, our list of emojis incorporates some of their aliases to make your life more convenient:
|
||||
|
||||
```
|
||||
echo $emoji[grinning_face_with_smiling_eyes]
|
||||
echo $emoji[smile]
|
||||
```
|
||||
|
||||
These two commands yield the same emoji (😄). The first name is the official one, in the Unicode reference, and the second one is the alias that was in Gemoji's database.
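As a rough sketch of how an official name and a Gemoji alias can end up pointing at the same character, the Python snippet below builds an alias map from entries shaped like Gemoji's `emoji.json` (only the `emoji` and `aliases` fields are used). The sample entry and the helper `names_for` are illustrative assumptions, not code taken from the plugin.

```python
# Sample data in the shape of Gemoji's emoji.json, trimmed to two fields.
gemoji_sample = [
    {"emoji": "\U0001F604", "aliases": ["smile"]},
]

# Official names derived from the Unicode data (a single sample entry).
official = {"\U0001F604": "grinning_face_with_smiling_eyes"}

# Map each character to its list of Gemoji aliases.
aliases_map = {entry["emoji"]: entry["aliases"] for entry in gemoji_sample}
taken = set(official.values())


def names_for(char):
    """Hypothetical helper: official name first, then aliases that do not
    collide with any official name."""
    names = [official[char]]
    for alias in aliases_map.get(char, []):
        if alias not in taken:
            names.append(alias)
    return names


for name in names_for("\U0001F604"):
    # Each name becomes one key of the zsh map, all with the same value.
    print(f"emoji[{name}]=$'\\U1F604'")
```

Skipping aliases that collide with an official name keeps an alias from silently overwriting a different emoji's key.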
|
||||
|
||||
## TODO
|
||||
|
||||
These are things that could be enhanced in future revisions of the plugin.
|
||||
|
||||
* Incorporate CLDR data for ordering and groupings
|
||||
* Short :bracket: style names (from gemoji)
|
||||
* Incorporate `gemoji` data
|
||||
* Country codes for flags
|
||||
* ZWJ combining function?
|
||||
|
||||
#### Gemoji support
|
||||
|
||||
The [gemoji project](https://github.com/github/gemoji) seems to be the de facto main source for short names and other emoji-related metadata that isn't included in the official Unicode reports. (I'm saying this just from looking at the google results for "emoji short names" and related searches. -apjanke)
|
||||
|
||||
If this plugin is updated to provide short names, CLDR sorting data, and similar stuff, it should probably be changed to use the Gemoji project, and the `update_emoji.pl` script be rewritten in Ruby so it can use the Gemoji library directly instead of parsing its data files.
|
||||
|
||||
This does *not* mean that it should use Gemoji at run time. None of the `zsh` plugin stuff should call Gemoji or Ruby code. Rather, the "build time" `update_emoji.pl` script should be rewritten to use Gemoji to generate a pure-native-`zsh` character definition file which would be checked in to the repo and can be called by OMZ users without having Gemoji installed.
|
||||
|
||||
#### ZWJ combining function
|
||||
|
||||
One of the newer features of Unicode emoji is the ability to use the "Zero-Width Joiner" character to compose multiple emoji characters into a single "emoji ligature" glyph. For example, this is [how Apple supports "family" emoji with various genders and skin tones](https://www.unicode.org/reports/tr51/index.html#ZWJ_Sequences).
|
||||
|
||||
These are a pain to write out (and probably worse to read), and it might be convenient to have a couple functions for concisely composing them, if wider support for them appears.
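A composing function of the kind suggested here could be very small. The Python sketch below is only an illustration of the idea (the plugin would need a `zsh` equivalent), and `zwj_join` is a hypothetical name. Whether the result renders as one glyph depends on the font and terminal.

```python
ZWJ = "\u200D"  # U+200D ZERO WIDTH JOINER


def zwj_join(*parts):
    """Hypothetical helper: join emoji with ZWJ so that renderers which
    support the sequence can draw a single combined glyph."""
    return ZWJ.join(parts)


MAN, WOMAN, GIRL = "\U0001F468", "\U0001F469", "\U0001F467"
family = zwj_join(MAN, WOMAN, GIRL)

print(family)
# Three emoji plus two joiners: five code points in total.
print([f"U+{ord(ch):04X}" for ch in family])
```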
|
||||
|
|
File diff suppressed because it is too large
File diff suppressed because it is too large
|
@ -15,9 +15,6 @@ _omz_emoji_plugin_dir="${0:h}"
|
|||
|
||||
local LC_ALL=en_US.UTF-8
|
||||
|
||||
typeset -gAH emoji_groups
|
||||
typeset -gAH emoji_con
|
||||
typeset -gAH emoji2
|
||||
typeset -gAH emoji_skintone
|
||||
|
||||
source "$_omz_emoji_plugin_dir/emoji-char-definitions.zsh"
|
||||
|
@ -30,7 +27,6 @@ unset _omz_emoji_plugin_dir
|
|||
# The digits 0-9 are already in the emoji table as keycap_digit_<N>, keycap_ten, etc.
|
||||
# It's unclear whether this should be in the $emoji array, because those characters are all ones
|
||||
# which can be displayed on their own.
|
||||
#emoji[combining_enclosing_keycap]="\U20E3"
|
||||
|
||||
emoji[regional_indicator_symbol_letter_d_regional_indicator_symbol_letter_e]=$'\xF0\x9F\x87\xA9\xF0\x9F\x87\xAA'
|
||||
emoji[regional_indicator_symbol_letter_g_regional_indicator_symbol_letter_b]=$'\xF0\x9F\x87\xAC\xF0\x9F\x87\xA7'
|
||||
|
@ -43,209 +39,12 @@ emoji[regional_indicator_symbol_letter_i_regional_indicator_symbol_letter_t]=$'\
|
|||
emoji[regional_indicator_symbol_letter_u_regional_indicator_symbol_letter_s]=$'\xF0\x9F\x87\xBA\xF0\x9F\x87\xB8'
|
||||
emoji[regional_indicator_symbol_letter_r_regional_indicator_symbol_letter_u]=$'\xF0\x9F\x87\xB7\xF0\x9F\x87\xBA'
|
||||
|
||||
# Nonstandard alias names
|
||||
emoji[vulcan_salute]=$'\U1F596'
|
||||
|
||||
|
||||
# Emoji combining and auxiliary characters
|
||||
|
||||
# "Variation Selectors" for controlling text vs emoji style presentation
|
||||
# These apply to the immediately preceding character
|
||||
emoji2[text_style]=$'\UFE0E'
|
||||
emoji2[emoji_style]=$'\UFE0F'
|
||||
# Joiner that indicates a single combined-form glyph (ligature) should be used
|
||||
emoji2[zero_width_joiner]=$'\U200D'
|
||||
# Skin tone modifiers
|
||||
emoji2[emoji_modifier_fitzpatrick_type_1_2]=$'\U1F3FB'
|
||||
emoji2[emoji_modifier_fitzpatrick_type_3]=$'\U1F3FC'
|
||||
emoji2[emoji_modifier_fitzpatrick_type_4]=$'\U1F3FD'
|
||||
emoji2[emoji_modifier_fitzpatrick_type_5]=$'\U1F3FE'
|
||||
emoji2[emoji_modifier_fitzpatrick_type_6]=$'\U1F3FF'
|
||||
# Various other combining characters. (Incomplete list; I selected ones that sound useful)
|
||||
emoji2[combining_enclosing_circle]=$'\U20DD'
|
||||
emoji2[combining_enclosing_square]=$'\U20DE'
|
||||
emoji2[combining_enclosing_diamond]=$'\U20DF'
|
||||
emoji2[combining_enclosing_circle_backslash]=$'\U20E0'
|
||||
emoji2[combining_enclosing_screen]=$'\U20E2'
|
||||
emoji2[combining_enclosing_keycap]=$'\U20E3'
|
||||
emoji2[combining_enclosing_upward_pointing_triangle]=$'\U20E4'
|
||||
|
||||
# Easier access to skin tone modifiers
|
||||
emoji_skintone[1_2]=$'\U1F3FB'
|
||||
emoji_skintone[3]=$'\U1F3FC'
|
||||
emoji_skintone[4]=$'\U1F3FD'
|
||||
emoji_skintone[5]=$'\U1F3FE'
|
||||
emoji_skintone[6]=$'\U1F3FF'
|
||||
|
||||
# Emoji groups
|
||||
# These are stored in a single associative array, $emoji_groups, to avoid cluttering up the global
|
||||
# namespace, and to allow adding additional group definitions at run time.
|
||||
# The keys are the group names, and the values are whitespace-separated lists of emoji character names.
|
||||
|
||||
emoji_groups[fruits]="
|
||||
tomato
|
||||
aubergine
|
||||
grapes
|
||||
melon
|
||||
watermelon
|
||||
tangerine
|
||||
banana
|
||||
pineapple
|
||||
red_apple
|
||||
green_apple
|
||||
peach
|
||||
cherries
|
||||
strawberry
|
||||
lemon
|
||||
pear
|
||||
"
|
||||
|
||||
emoji_groups[vehicles]="
|
||||
airplane
|
||||
rocket
|
||||
railway_car
|
||||
high_speed_train
|
||||
high_speed_train_with_bullet_nose
|
||||
bus
|
||||
ambulance
|
||||
fire_engine
|
||||
police_car
|
||||
taxi
|
||||
automobile
|
||||
recreational_vehicle
|
||||
delivery_truck
|
||||
ship
|
||||
speedboat
|
||||
bicycle
|
||||
helicopter
|
||||
steam_locomotive
|
||||
train
|
||||
light_rail
|
||||
tram
|
||||
oncoming_bus
|
||||
trolleybus
|
||||
minibus
|
||||
oncoming_police_car
|
||||
oncoming_taxi
|
||||
oncoming_automobile
|
||||
articulated_lorry
|
||||
tractor
|
||||
monorail
|
||||
mountain_railway
|
||||
suspension_railway
|
||||
mountain_cableway
|
||||
aerial_tramway
|
||||
rowboat
|
||||
bicyclist
|
||||
mountain_bicyclist
|
||||
sailboat
|
||||
"
|
||||
|
||||
emoji_groups[animals]="
|
||||
snail
|
||||
snake
|
||||
horse
|
||||
sheep
|
||||
monkey
|
||||
chicken
|
||||
boar
|
||||
elephant
|
||||
octopus
|
||||
spiral_shell
|
||||
bug
|
||||
ant
|
||||
honeybee
|
||||
lady_beetle
|
||||
fish
|
||||
tropical_fish
|
||||
blowfish
|
||||
turtle
|
||||
hatching_chick
|
||||
baby_chick
|
||||
front_facing_baby_chick
|
||||
bird
|
||||
penguin
|
||||
koala
|
||||
poodle
|
||||
bactrian_camel
|
||||
dolphin
|
||||
mouse_face
|
||||
cow_face
|
||||
tiger_face
|
||||
rabbit_face
|
||||
cat_face
|
||||
dragon_face
|
||||
spouting_whale
|
||||
horse_face
|
||||
monkey_face
|
||||
dog_face
|
||||
pig_face
|
||||
frog_face
|
||||
hamster_face
|
||||
wolf_face
|
||||
bear_face
|
||||
panda_face
|
||||
rat
|
||||
mouse
|
||||
ox
|
||||
water_buffalo
|
||||
cow
|
||||
tiger
|
||||
leopard
|
||||
rabbit
|
||||
cat
|
||||
dragon
|
||||
crocodile
|
||||
whale
|
||||
ram
|
||||
goat
|
||||
rooster
|
||||
dog
|
||||
pig
|
||||
dromedary_camel
|
||||
"
|
||||
|
||||
emoji_groups[faces]="
|
||||
grinning_face_with_smiling_eyes
|
||||
face_with_tears_of_joy
|
||||
smiling_face_with_open_mouth
|
||||
smiling_face_with_open_mouth_and_smiling_eyes
|
||||
smiling_face_with_open_mouth_and_cold_sweat
|
||||
smiling_face_with_open_mouth_and_tightly_closed_eyes
|
||||
winking_face
|
||||
smiling_face_with_smiling_eyes
|
||||
face_savouring_delicious_food
|
||||
relieved_face
|
||||
smiling_face_with_heart_shaped_eyes
|
||||
smirking_face
|
||||
unamused_face
|
||||
face_with_cold_sweat
|
||||
pensive_face
|
||||
confounded_face
|
||||
face_throwing_a_kiss
|
||||
kissing_face_with_closed_eyes
|
||||
face_with_stuck_out_tongue_and_winking_eye
|
||||
face_with_stuck_out_tongue_and_tightly_closed_eyes
|
||||
disappointed_face
|
||||
angry_face
|
||||
pouting_face
|
||||
crying_face
|
||||
persevering_face
|
||||
face_with_look_of_triumph
|
||||
disappointed_but_relieved_face
|
||||
fearful_face
|
||||
weary_face
|
||||
sleepy_face
|
||||
tired_face
|
||||
loudly_crying_face
|
||||
face_with_open_mouth_and_cold_sweat
|
||||
face_screaming_in_fear
|
||||
astonished_face
|
||||
flushed_face
|
||||
dizzy_face
|
||||
face_with_medical_mask
|
||||
"
|
||||
|
||||
}
|
||||
|
||||
# Prints a random emoji character
|
||||
|
@ -264,7 +63,11 @@ function random_emoji() {
|
|||
[[ $list_size -eq 0 ]] && return 1
|
||||
local random_index=$(( ( RANDOM % $list_size ) + 1 ))
|
||||
local name=${names[$random_index]}
|
||||
echo ${emoji[$name]}
|
||||
if [[ "$group" == "flags" ]]; then
|
||||
echo ${emoji_flags[$name]}
|
||||
else
|
||||
echo ${emoji[$name]}
|
||||
fi
|
||||
}
|
||||
|
||||
# Displays a listing of emoji with their names
|
||||
|
@ -281,12 +84,26 @@ function display_emoji() {
|
|||
fi
|
||||
# The extra spaces in output here are a hack for readability, since some
|
||||
# terminals treat these emoji chars as single-width.
|
||||
local counter=1
|
||||
for i in $names; do
|
||||
printf '%s ' "$emoji[$i]"
|
||||
if [[ "$group" == "flags" ]]; then
|
||||
printf '%s ' "$emoji_flags[$i]"
|
||||
else
|
||||
printf '%s ' "$emoji[$i]"
|
||||
fi
|
||||
# New line every 20 emoji, to avoid weirdnesses
|
||||
if (($counter % 20 == 0)); then
|
||||
printf "\n"
|
||||
fi
|
||||
let counter=$counter+1
|
||||
done
|
||||
print
|
||||
for i in $names; do
|
||||
echo "${emoji[$i]} = $i"
|
||||
if [[ "$group" == "flags" ]]; then
|
||||
echo "${emoji_flags[$i]} = $i"
|
||||
else
|
||||
echo "${emoji[$i]} = $i"
|
||||
fi
|
||||
done
|
||||
}
|
||||
|
||||
|
|
21538
plugins/emoji/gemoji_db.json
Normal file
File diff suppressed because it is too large
|
@ -1,113 +0,0 @@
|
|||
#!/usr/bin/perl -w
|
||||
#
|
||||
# update_emoji.pl
|
||||
#
|
||||
# This script generates the emoji.plugin.zsh emoji definitions from the Unicode
|
||||
# character data for the emoji characters.
|
||||
#
|
||||
# The data file can be found at https://unicode.org/Public/emoji/latest/emoji-data.txt
|
||||
# as referenced in Unicode TR51 (https://www.unicode.org/reports/tr51/index.html).
|
||||
#
|
||||
# This is known to work with the data file from version 1.0. It may not work with later
|
||||
# versions if the format changes. In particular, this reads line comments to get the
|
||||
# emoji character name and unicode version.
|
||||
#
|
||||
# Country names have punctuation and other non-letter characters removed from their name,
|
||||
# to avoid possible complications with having to escape the strings when using them as
|
||||
# array subscripts. The definition file seems to use some combining characters like accents
|
||||
# that get stripped during this process.
|
||||
|
||||
use strict;
|
||||
use warnings;
|
||||
use 5.010;
|
||||
use autodie;
|
||||
|
||||
use Path::Class;
|
||||
use File::Copy;
|
||||
|
||||
# Parse definitions out of the data file and convert
|
||||
sub process_emoji_data_file {
|
||||
my ( $infile, $outfilename ) = @_;
|
||||
my $file = file($infile);
|
||||
my $outfile = file($outfilename);
|
||||
my $outfilebase = $outfile->basename();
|
||||
my $tempfilename = "$outfilename.tmp";
|
||||
my $tempfile = file($tempfilename);
|
||||
my $outfh = $tempfile->openw();
|
||||
$outfh->print("
|
||||
# $outfilebase - Emoji character definitions for oh-my-zsh emoji plugin
|
||||
#
|
||||
# This file is auto-generated by update_emoji.pl. Do not edit it manually.
|
||||
#
|
||||
# This contains the definition for:
|
||||
# \$emoji - which maps character names to Unicode characters
|
||||
# \$emoji_flags - maps country names to Unicode flag characters using region indicators
|
||||
|
||||
# Main emoji
|
||||
typeset -gAH emoji
|
||||
# National flags
|
||||
typeset -gAH emoji_flags
|
||||
# Combining modifiers
|
||||
typeset -gAH emoji_mod
|
||||
|
||||
");
|
||||
|
||||
my $fh = $file->openr();
|
||||
my $line_num = 0;
|
||||
while ( my $line = $fh->getline() ) {
|
||||
$line_num++;
|
||||
$_ = $line;
|
||||
# Skip all-comment lines (from the header) and blank lines
|
||||
# (But don't strip comments on normal lines; we need to parse those for
|
||||
# the emoji names.)
|
||||
next if /^\s*#/ or /^\s*$/;
|
||||
|
||||
if (/^(\S.*?\S)\s*;\s*(\w+)\s*;\s*(\w+)\s*;\s*(\w+)\s*;\s*(\w.*?)\s*#\s*V(\S+)\s\(.*?\)\s*(\w.*\S)\s*$/) {
|
||||
my ($code, $style, $level, $modifier_status, $sources, $version, $keycap_name)
|
||||
= ($1, $2, $3, $4, $5, $6, $7);
|
||||
#print "code=$code style=$style level=$level modifier_status=$modifier_status sources=$sources version=$version name=$keycap_name\n";
|
||||
my @code_points = split /\s+/, $code;
|
||||
my @sources = split /\s+/, $sources;
|
||||
|
||||
my $flag_country = "";
|
||||
if ( $keycap_name =~ /^flag for (\S.*?)\s*$/) {
|
||||
$flag_country = $1;
|
||||
}
|
||||
|
||||
my $zsh_code = join '', map { "\\U$_" } @code_points;
|
||||
# Convert keycap names to valid associative array names that do not require any
|
||||
# quoting. Works fine for most stuff, but is clumsy for flags.
|
||||
my $omz_name = lc($keycap_name);
|
||||
$omz_name =~ s/[^A-Za-z0-9]/_/g;
|
||||
my $zsh_flag_country = $flag_country;
|
||||
$zsh_flag_country =~ s/[^\p{Letter}]/_/g;
|
||||
if ($flag_country) {
|
||||
$outfh->print("emoji_flags[$zsh_flag_country]=\$'$zsh_code'\n");
|
||||
} else {
|
||||
$outfh->print("emoji[$omz_name]=\$'$zsh_code'\n");
|
||||
}
|
||||
# Modifiers are included in both the main set and their separate map,
|
||||
# because they have a standalone representation as a color swatch.
|
||||
if ( $modifier_status eq "modifier" ) {
|
||||
$outfh->print("emoji_mod[$omz_name]=\$'$zsh_code'\n");
|
||||
}
|
||||
} else {
|
||||
die "Failed parsing line $line_num: '$_'";
|
||||
}
|
||||
}
|
||||
$fh->close();
|
||||
$outfh->print("\n");
|
||||
$outfh->close();
|
||||
|
||||
move($tempfilename, $outfilename)
|
||||
or die "Failed moving temp file to $outfilename: $!";
|
||||
}
|
||||
|
||||
my $datafile = "emoji-data.txt";
|
||||
my $zsh_def_file = "emoji-char-definitions.zsh";
|
||||
process_emoji_data_file($datafile, $zsh_def_file);
|
||||
|
||||
print "Updated definition file $zsh_def_file\n";
|
||||
|
||||
|
||||
|
213
plugins/emoji/update_emoji.py
Normal file
|
@ -0,0 +1,213 @@
|
|||
"""
|
||||
Update Emoji.py
|
||||
Refreshes OMZ emoji database based on the latest Unicode spec
|
||||
"""
|
||||
import re
|
||||
import json
|
||||
|
||||
spec = open("emoji-data.txt", "r")
|
||||
|
||||
# Regexes
|
||||
# regex_emoji will return, respectively:
|
||||
# the code points, its type (status), the actual emoji, and its official name
|
||||
regex_emoji = r"^([\w ].*?\S)\s*;\s*([\w-]+)\s*#\s*(.*?)\s(\S.*).*$"
|
||||
# regex_group returns the group or subgroup that a line opens
|
||||
regex_group = r"^#\s*(group|subgroup):\s*(.*)$"
|
||||
|
||||
headers = """
|
||||
# emoji-char-definitions.zsh - Emoji definitions for oh-my-zsh emoji plugin
|
||||
#
|
||||
# This file is auto-generated by update_emoji.py. Do not edit it manually.
|
||||
#
|
||||
# This contains the definition for:
|
||||
# $emoji - which maps character names to Unicode characters
|
||||
# $emoji_flags - maps country names to Unicode flag characters using region
|
||||
# indicators
|
||||
# $emoji_mod - maps modifier components to Unicode characters
|
||||
# $emoji_groups - a single associative array to avoid cluttering up the
|
||||
# global namespace, and to allow adding additional group
|
||||
# definitions at run time. The keys are the group names, and
|
||||
# the values are whitespace-separated lists of emoji
|
||||
# character names.
|
||||
|
||||
# Main emoji
|
||||
typeset -gAH emoji
|
||||
# National flags
|
||||
typeset -gAH emoji_flags
|
||||
# Combining modifiers
|
||||
typeset -gAH emoji_mod
|
||||
# Emoji groups
|
||||
typeset -gAH emoji_groups
|
||||
"""
|
||||
|
||||
#######
|
||||
# Adding country codes
|
||||
#######
|
||||
# This is the only part of this script that relies on an external library
|
||||
# (country_converter), and is hence commented out by default.
|
||||
# You can uncomment it to have country codes added as aliases for flag
|
||||
# emojis. (By default, when you install this extension, country codes are
|
||||
# included as aliases, but not if you re-run this script without uncommenting.)
|
||||
# Warning: country_converter is very verbose, and will print warnings all over
|
||||
# your terminal.
|
||||
|
||||
# import country_converter as coco # pylint: disable=wrong-import-position
|
||||
# cc = coco.CountryConverter()
|
||||
|
||||
# def country_iso(_all_names, _omz_name):
|
||||
# """ Using the external library country_converter,
|
||||
# this function can detect the ISO2 and ISO3 codes
|
||||
# of the country. It takes as argument the array
|
||||
# with all the names of the emoji, and returns that array."""
|
||||
# omz_no_underscore = re.sub(r'_', r' ', _omz_name)
|
||||
# iso2 = cc.convert(names=[omz_no_underscore], to='ISO2')
|
||||
# if iso2 != 'not found':
|
||||
# _all_names.append(iso2)
|
||||
# iso3 = cc.convert(names=[omz_no_underscore], to='ISO3')
|
||||
# _all_names.append(iso3)
|
||||
# return _all_names
|
||||
|
||||
|
||||
#######
|
||||
# Helper functions
|
||||
#######
|
||||
|
||||
def code_to_omz(_code_points):
|
||||
""" Returns a ZSH-compatible Unicode string from the code point(s) """
|
||||
return r'\U' + r'\U'.join(_code_points.split(' '))
|
||||
|
||||
def name_to_omz(_name, _group, _subgroup, _status):
|
||||
""" Returns a reasonable snake_case name for the emoji. """
|
||||
def snake_case(_string):
|
||||
""" Does the regex work of snake_case """
|
||||
remove_dots = re.sub(r'\.\(\)', r'', _string)
|
||||
replace_ands = re.sub(r'\&', r'and', remove_dots)
|
||||
remove_whitespace = re.sub(r'[^\#\*\w]', r'_', replace_ands)
|
||||
return re.sub(r'__', r'_', remove_whitespace)
|
||||
|
||||
shortname = ""
|
||||
split_at_colon = lambda s: s.split(": ")
|
||||
# Special treatment by group and subgroup
|
||||
# If the emoji is a flag, we strip "flag" from its name
|
||||
if _group == "Flags" and len(split_at_colon(_name)) > 1:
|
||||
shortname = snake_case(split_at_colon(_name)[1])
|
||||
else:
|
||||
shortname = snake_case(_name)
|
||||
# Special treatment by status
|
||||
# Enables us to have every emoji combination,
|
||||
# even the ones that are not officially sanctioned
|
||||
# and are implemented by, say, only one vendor
|
||||
if _status == "unqualified":
|
||||
shortname += "_unqualified"
|
||||
elif _status == "minimally-qualified":
|
||||
shortname += "_minimally"
|
||||
return shortname
|
||||
|
||||
def increment_name(_shortname):
|
||||
""" Increment the short name by 1. If you get, say,
|
||||
'woman_detective_unqualified', it returns
|
||||
'woman_detective_unqualified_1', and then
|
||||
'woman_detective_unqualified_2', etc. """
|
||||
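# Match the whole trailing number so that multi-digit suffixes increment correctly
|
||||
trailing_number = re.search(r'(\d+)$', _shortname)
|
||||
if trailing_number:
|
||||
num = int(trailing_number.group(1))
|
||||
return _shortname[:trailing_number.start()] + str(num + 1)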
|
||||
return _shortname + "_1"
|
||||
|
||||
########
|
||||
# Going through every line
|
||||
########
|
||||
|
||||
group, subgroup, short_name_buffer = "", "", ""
|
||||
emoji_database = []
|
||||
for line in spec:
|
||||
# First, test if this line opens a group or subgroup
|
||||
group_match = re.findall(regex_group, line)
|
||||
if group_match != []:
|
||||
gr_or_sub, name = group_match[0]
|
||||
if gr_or_sub == "group":
|
||||
group = name
|
||||
elif gr_or_sub == "subgroup":
|
||||
subgroup = name
|
||||
continue # Moving on...
|
||||
# Second, test if this line references one emoji
|
||||
emoji_match = re.findall(regex_emoji, line)
|
||||
if emoji_match != []:
|
||||
code_points, status, emoji, name = emoji_match[0]
|
||||
omz_codes = code_to_omz(code_points)
|
||||
omz_name = name_to_omz(name, group, subgroup, status)
|
||||
# If this emoji has the same shortname as the preceding one
|
||||
if omz_name in short_name_buffer:
|
||||
omz_name = increment_name(short_name_buffer)
|
||||
short_name_buffer = omz_name
|
||||
emoji_database.append(
|
||||
[omz_codes, status, emoji, omz_name, group, subgroup])
|
||||
spec.close()
|
||||
|
||||
########
|
||||
# Write to emoji-char-definitions.zsh
|
||||
########
|
||||
|
||||
# Aliases for emojis are retrieved through the DB of Gemoji
|
||||
# Retrieved on Aug 9 2019 from the following URL:
|
||||
# https://raw.githubusercontent.com/github/gemoji/master/db/emoji.json
|
||||
|
||||
gemoji_db = open("gemoji_db.json")
|
||||
j = json.load(gemoji_db)
|
||||
aliases_map = {entry['emoji']: entry['aliases'] for entry in j}
|
||||
all_omz_names = [emoji_data[3] for emoji_data in emoji_database]
|
||||
|
||||
# Let's begin writing to this file
|
||||
output = open("emoji-char-definitions.zsh", "w")
|
||||
output.write(headers)
|
||||
|
||||
emoji_groups = {"fruits": "\n", "vehicles": "\n", "hands": "\n",
|
||||
"people": "\n", "animals": "\n", "faces": "\n",
|
||||
"flags": "\n"}
|
||||
|
||||
# First, write every emoji down
|
||||
for _omz_codes, _status, _emoji, _omz_name, _group, _subgroup in emoji_database:
|
||||
|
||||
# One emoji can be mapped to multiple names (aliases or country codes)
|
||||
names_for_this_emoji = [_omz_name]
|
||||
|
||||
# Variable that indicates in which map the emoji will be located
|
||||
emoji_map = "emoji"
|
||||
if _status == "component":
|
||||
emoji_map = "emoji_mod"
|
||||
if _group == "Flags":
|
||||
emoji_map = "emoji_flags"
|
||||
# Adding country codes (Optional, see above)
|
||||
# names_for_this_emoji = country_iso(names_for_this_emoji, _omz_name)
|
||||
|
||||
# Check if there is an alias available in the Gemoji DB
|
||||
if _emoji in aliases_map.keys():
|
||||
for alias in aliases_map[_emoji]:
|
||||
if alias not in all_omz_names:
|
||||
names_for_this_emoji.append(alias)
|
||||
|
||||
# And now we write to the definitions file
|
||||
for one_name in names_for_this_emoji:
|
||||
output.write(f"{emoji_map}[{one_name}]=$'{_omz_codes}'\n")
|
||||
|
||||
# Storing the emoji in defined subgroups for the next step
|
||||
if _status == "fully-qualified":
|
||||
if _subgroup == "food-fruit":
|
||||
emoji_groups["fruits"] += f" {_omz_name}\n"
|
||||
elif "transport-" in _subgroup:
|
||||
emoji_groups["vehicles"] += f" {_omz_name}\n"
|
||||
elif "hand-" in _subgroup:
|
||||
emoji_groups["hands"] += f" {_omz_name}\n"
|
||||
elif "person-" in _subgroup or _subgroup == "family":
|
||||
emoji_groups["people"] += f" {_omz_name}\n"
|
||||
elif "animal-" in _subgroup:
|
||||
emoji_groups["animals"] += f" {_omz_name}\n"
|
||||
elif "face-" in _subgroup:
|
||||
emoji_groups["faces"] += f" {_omz_name}\n"
|
||||
elif _group == "Flags":
|
||||
emoji_groups["flags"] += f" {_omz_name}\n"
|
||||
|
||||
# Second, write the subgroups to the end of the file
|
||||
for name, string in emoji_groups.items():
|
||||
output.write(f'\nemoji_groups[{name}]="{string}"\n')
|
||||
output.close()
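To see what the two regexes in `update_emoji.py` extract, the following standalone sketch runs them over a few hand-written sample lines. The sample lines are an assumption about the input layout (code points, status, then a comment holding the character and its name) and were not copied from a real data file. The loop is a simplified copy of the script's main loop.

```python
import re

# The two patterns as they appear in update_emoji.py.
regex_emoji = r"^([\w ].*?\S)\s*;\s*([\w-]+)\s*#\s*(.*?)\s(\S.*).*$"
regex_group = r"^#\s*(group|subgroup):\s*(.*)$"

# Hand-written sample input (assumed layout, not real data).
sample = [
    "# group: Smileys & Emotion\n",
    "# subgroup: face-smiling\n",
    "1F600 ; fully-qualified # \U0001F600 grinning face\n",
    "# group: Flags\n",
    "# subgroup: country-flag\n",
    "1F1EB 1F1F7 ; fully-qualified # \U0001F1EB\U0001F1F7 flag: France\n",
]


def code_to_omz(code_points):
    """Turn space-separated hex code points into zsh \\U escapes."""
    return r"\U" + r"\U".join(code_points.split(" "))


group = subgroup = ""
rows = []
for line in sample:
    # A header line updates the current group or subgroup.
    group_match = re.findall(regex_group, line)
    if group_match:
        kind, name = group_match[0]
        if kind == "group":
            group = name
        else:
            subgroup = name
        continue
    # Any other matching line describes one emoji.
    emoji_match = re.findall(regex_emoji, line)
    if emoji_match:
        code_points, status, char, name = emoji_match[0]
        rows.append((code_to_omz(code_points), status, char, name,
                     group, subgroup))

for row in rows:
    print(row)
```

Each emoji row inherits the most recent group and subgroup headers, which is what lets the script sort emoji into `$emoji_flags` and into the named `$emoji_groups` entries later on.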
|