PHP, Elasticsearch, Symfony, Emoji
:-) :D :-( ^_^ ;-) <3
In 1998, a Japanese mobile carrier introduced colorful icons in messages to stand out from the competition.
176 pictograms of 12x12 pixels
🍢 🍤 🍥 🏯 👘 🎌 🗻 🎏
😑 😬 💩 🦄 🐙 🚙 🌦 🎸
Emoji are just like text. It is your operating system that renders them as images.
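To illustrate that an emoji is ordinary text, here is a small Python sketch that prints the code point, the Unicode name, and the UTF-8 bytes of the pizza emoji. The helper name `describe` is only an illustration.

```python
import unicodedata

# The pizza emoji, written as an escape so the source stays ASCII-safe.
pizza = "\U0001F355"

def describe(char):
    """Return the code point, Unicode name, and UTF-8 bytes of one character."""
    return {
        "code_point": f"U+{ord(char):04X}",
        "name": unicodedata.name(char),
        "utf8": char.encode("utf-8").hex(" "),
    }

print(describe(pizza))
```

The emoji is a single character stored as four bytes; the picture comes from the font used to render it.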
PUT meal/_doc/1
{
"title": "pizza"
}
GET meal/_search
{
"query": {
"match": {
"title": "🍕"
}
}
}
No results
Excerpt from the Lucene tests:
/** simple emoji */
public void testEmoji() throws Exception {
BaseTokenStreamTestCase.assertAnalyzesTo(
a,
"💩 💩💩",
new String[] {"💩", "💩", "💩"},
new String[] {"<EMOJI>", "<EMOJI>", "<EMOJI>"}
);
}
The commit that changed everything: "support Unicode UTS#51 v11.0 Emoji tokenization."
GET _analyze
{
"text": ["🍕"],
"analyzer": "standard"
}
Token 🍕
GET _analyze
{
"text": ["🍕"],
"tokenizer": "standard",
"filter": [
{
"type": "synonym",
"synonyms": [
"🍕 => 🍕, cheese, pizza, slice"
]
}
]
}
Tokens 🍕 cheese pizza slice
Tokens 🍕 cheese pizza slice in the index
🐻 => 🐻, bear, bear face, face
🐨 => 🐨, bear, koala
🐼 => 🐼, face, panda, panda face
🐾 => 🐾, feet, paw, paw prints, print
🦃 => 🦃, bird, turkey
🥵 => 🥵, feverish, heat stroke, hot, hot face, red-faced, sweating
🥶 => 🥶, blue-faced, cold, cold face, freezing, frostbite, icicles
🐣 => 🐣, baby, bird, chick, hatching, hatching chick
🐤 => 🐤, baby, baby chick, bird, chick
🐥 => 🐥, baby, bird, chick, front-facing baby chick
🤞 => 🤞, cross, crossed fingers, finger, hand, luck
🤟 => 🤟, hand, ILY, love-you gesture
🌾 => 🌾, ear, grain, rice, sheaf of rice
👱♀ => 👱♀, blond-haired woman, blonde, hair, woman, woman: blond hair
🧓 => 🧓, adult, gender-neutral, old, older person, unspecified gender
👴 => 👴, adult, man, old
🧝♀ => 🧝♀, magical, woman elf
🧞 => 🧞, djinn, genie
...
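Each rule maps an emoji to itself plus its keywords. A minimal Python sketch of how such lines could be generated from annotation data; the function name `to_synonym_line` and the sample `annotations` dictionary are hypothetical, not the actual generator.

```python
def to_synonym_line(emoji, keywords):
    """Format one explicit-mapping rule: the emoji maps to itself plus its keywords."""
    # Commas separate synonyms in this format, so drop any found inside a keyword.
    cleaned = [k.replace(",", "").strip() for k in keywords]
    return emoji + " => " + ", ".join([emoji] + cleaned)

# Hypothetical sample data (pizza and bear face), not a real annotation file.
annotations = {
    "\U0001F355": ["cheese", "pizza", "slice"],
    "\U0001F43B": ["bear", "bear face", "face"],
}

lines = [to_synonym_line(e, kws) for e, kws in annotations.items()]
print("\n".join(lines))
```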
GET _analyze
{
"text": ["🏳🌈"],
"tokenizer": "standard",
"filter": [
{
"type": "synonym",
"synonyms": [
"🏳🌈 => 🏳🌈, pride, rainbow, rainbow flag"
]
}
]
}
Tokens 🏳🌈 pride rainbow flag
Elasticsearch handles the join correctly
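The rainbow flag is a joined sequence: a white flag and a rainbow glued together by a zero-width joiner (U+200D). A short Python sketch that builds it and lists its code points; the helper name `code_points` is an illustration only.

```python
WHITE_FLAG = "\U0001F3F3"
RAINBOW = "\U0001F308"
ZWJ = "\u200D"  # ZERO WIDTH JOINER

# Joined form without a variation selector.
rainbow_flag = WHITE_FLAG + ZWJ + RAINBOW

def code_points(text):
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in text]

print(code_points(rainbow_flag))
```

What looks like one pictogram is three code points, so the tokenizer has to keep them together to produce a single token.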
GET _analyze
{
"text": ["🇪🇺"],
"tokenizer": "standard",
"filter": [
{
"type": "synonym",
"synonyms": [
"🇪🇺 => 🇪🇺, flag: European Union"
]
}
]
}
Tokens 🇪🇺 flag European Union
Elasticsearch reads the combinations
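Country and region flags are a combination of two "regional indicator" letters. A minimal Python sketch, with a hypothetical helper `flag`, that builds a flag from a two-letter region code:

```python
def flag(region_code):
    """Build a flag emoji from a two-letter region code such as "EU"."""
    base = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
    return "".join(chr(base + ord(c) - ord("A")) for c in region_code.upper())

eu = flag("EU")
print(eu, [f"U+{ord(c):04X}" for c in eu])
```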
GET _analyze
{
"text": ["🏴"],
"tokenizer": "standard",
"filter": [
{
"type": "synonym",
"synonyms": [
"🏴 => 🏴, flag, Scotland",
"🏴 => 🏴, flag, Bretagne, BZH"
]
}
]
}
Tokens 🏴 flag Scotland
Very good support
GET _analyze
{
"text": ["🤘🏽"],
"tokenizer": "standard",
"filter": [
{
"type": "synonym",
"synonyms": [
"🤘🏽 => 🤘🏽, horns, medium skin tone"
]
}
]
}
Tokens 🤘🏽 horns medium skin tone
Also supported, but the synonyms need refining
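A skin-toned emoji is a base emoji followed by a modifier code point (U+1F3FB to U+1F3FF). One possible way to refine the synonyms is to separate the two parts; the sketch below assumes a hypothetical helper named `split_skin_tone`.

```python
# The five skin tone modifiers, U+1F3FB through U+1F3FF.
SKIN_TONE_MODIFIERS = {chr(cp) for cp in range(0x1F3FB, 0x1F400)}

def split_skin_tone(emoji):
    """Return (base, modifiers) for an emoji that may carry a skin tone."""
    base = "".join(c for c in emoji if c not in SKIN_TONE_MODIFIERS)
    modifiers = [c for c in emoji if c in SKIN_TONE_MODIFIERS]
    return base, modifiers

# Sign of the horns + medium skin tone.
horns_medium = "\U0001F918\U0001F3FD"
base, modifiers = split_skin_tone(horns_medium)
print(base, modifiers)
```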
GET _analyze
{
"text": ["🏳️🌈" , "❤️"],
"tokenizer": "standard",
"filter": [
{
"type": "synonym",
"synonyms": [
"🏳🌈 => 🏳🌈, pride, rainbow, rainbow flag",
"❤ => ❤, heart, red heart"
]
}
]
}
Tokens 🏳️🌈 ❤️
Our synonyms don't match!
We strip the variation selectors BEFORE the synonym filter
{
"type": "pattern_replace",
"pattern": "\\uFE0E|\\uFE0F",
"replacement": ""
}
Tokens 🏳️🌈 ❤️ pride rainbow heart...
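The same regular expression can be tried outside Elasticsearch. A minimal Python sketch, with a hypothetical helper name, showing that removing U+FE0E and U+FE0F turns the "emoji style" heart back into the bare heart used in the synonym rules:

```python
import re

# Text-style (U+FE0E) and emoji-style (U+FE0F) variation selectors.
VARIATION_SELECTORS = re.compile("[\uFE0E\uFE0F]")

def strip_variation_selectors(token):
    """Remove variation selectors so the token matches the synonym rules."""
    return VARIATION_SELECTORS.sub("", token)

heart_emoji_style = "\u2764\uFE0F"  # heart + emoji presentation selector
print(len(heart_emoji_style), len(strip_variation_selectors(heart_emoji_style)))
```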
GET _analyze
{
"text": ["🤌🏽", "🥷🏿"]
}
Tokens 🤌 🏽 🥷 🏿 are split apart!
But overall it works very well.
PUT /tweets
{
"settings": {
"analysis": {
"filter": {
"english_emoji": {
"type": "synonym",
"synonyms_path": "analysis/cldr-emoji-annotation-synonyms-en.txt"
},
"emoji_variation_selector_filter": {
"type": "pattern_replace",
"pattern": "\\uFE0E|\\uFE0F",
"replacement": ""
}
},
"analyzer": {
"english_with_emoji": {
"tokenizer": "standard",
"filter": [
"lowercase",
"emoji_variation_selector_filter",
"english_emoji"
]
}
}
}
}
}
GET tweets/_analyze
{
"analyzer": "english_with_emoji",
"text": "🍕 🍍"
}
🍕 cheese pizza slice 🍍 fruit pineapple
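The order of the filters matters: lowercase, then variation selector removal, then synonyms. The Python sketch below only models that order. It is not Elasticsearch: a whitespace split stands in for the standard tokenizer, and `SYNONYMS` is a hypothetical two-entry stand-in for the synonyms file.

```python
import re

# Hypothetical stand-in for the synonyms file: pizza and pineapple only.
SYNONYMS = {
    "\U0001F355": ["cheese", "pizza", "slice"],
    "\U0001F34D": ["fruit", "pineapple"],
}

def analyze(text):
    """Model the analyzer chain: lowercase, strip selectors, expand synonyms."""
    tokens = []
    for token in text.split():  # crude stand-in for the standard tokenizer
        token = token.lower()
        token = re.sub("[\uFE0E\uFE0F]", "", token)
        tokens.append(token)
        tokens.extend(SYNONYMS.get(token, []))
    return tokens

print(analyze("\U0001F355 \U0001F34D"))
```

If the selector removal ran after the synonym lookup, a pizza followed by U+FE0F would not be found in `SYNONYMS` and would get no keywords.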
Thank you for your attention 🤘
@damienalexandre
https://github.com/jolicode/emoji-search/