About Me

I am an expert in data science, machine learning, natural language processing, and software development.
My skills include data analysis, training ML and DL models, designing and writing efficient code, writing papers, and team leadership. I also teach extensively and enjoy public speaking.
My scientific adviser is Konstantin Vorontsov.

Scopus ID: 57022229800
WoS ResearcherID: R-9415-2016

Education

  1. Lomonosov Moscow State University / Russian Academy of Sciences

    Moscow, 2017 — 2020

    CMC Faculty, MMF Department, graduated from postgraduate studies early.
    Federal Research Center «Informatics and Management», defended a PhD thesis in Computer Science on the topic «Effective Implementation of Topic Modeling Algorithms with Additive Regularization» (in Russian) PhD Page

  2. Lomonosov Moscow State University

    Moscow, 2015 — 2017

    CMC Faculty, MMF Department, Master of Applied Mathematics and Computer Science, graduated with honors

  3. Lomonosov Moscow State University

    Moscow, 2011 — 2015

    CMC Faculty, MMF Department, Bachelor of Applied Mathematics and Computer Science

  4. Maykop Gymnasium No. 22

    Maykop, 2009 — 2011

    Physics and mathematics class, graduated with a silver medal

  5. Republican Natural-Mathematical School at Adyghe State University

    Maykop, 2007 — 2011

    Specialty «Physics», graduated with honors

  6. Maykop Programming School «Turbo»

    Maykop, 2008 — 2010

    Specialty «System programming», graduated with honors

Resume

Full-time Jobs

  1. Samokat.Tech, Search Tech Lead

    February 2024 — Present
  2. SberDevices, Lead Data Scientist

    May 2023 — February 2024
    1. Research and experiments on LLM training, fine-tuning and evaluation in the GigaChat project (Python / NLP / Research)
    2. Evaluation of long-range models and analysis of datasets, with reviews published on habr.com (Python / NLP / Research)
  3. Just AI, NLP R&D Team Lead

    October 2021 — April 2023
    1. NLP core of cloud text processing engine (Team leading / Project management / NLP / Python)
    2. Project on training and distilling models for text vectorization and classification (NLP / Python)
    3. Project on training models for controllable text paraphrasing (NLP / Research / Python)
    4. Extensive research on LLM development with the publication of a review on habr.com (NLP / Research)
  4. TalkMart42, Data Scientist / Co-founder

    April 2020 — October 2022
    1. Prototype of voice control system for self-service checkout for retail company (Python / NLP)
    2. English neural QA-system for specific domain for advertising agency (Python / NLP)
    3. Advertisements classification system for advertising and media research company (Python / NLP)
    4. Voice games for Yandex Alice and Sber Salut (Python)
    5. Voice and visual business apps in Sber Salut (TypeScript / Python)
  5. Alfa Bank, Senior Data Scientist

    August 2021 — October 2021
    1. ML and DL models for chat-bot intent detection (Python / NLP)
    2. Project management of several NLP tasks with reporting to senior officials (Team leading / Project management)
  6. Digital Decisions (Aithea), NLP Team Lead / Senior Data Scientist

    April 2019 — April 2020
    1. Spell checker for government organization (C++ / Python / NLP)
    2. English NER in application onboarding for B2C startup (Project management)
    3. Search and recommender system for English scientific articles (Project management / Team leading)
    4. Trend detection in Twitter and YouTube data for Instinct BBDO (Python / NLP / Team leading)
    5. Collecting NLP-data for Huawei voice assistant (Python / NLP / Team leading)
    6. Ticket-ordering chatbot for B2C company (NLP / Team leading / Project management)
  7. Yandex Zen, Middle ML Engineer

    September 2017 — April 2019
    1. System for evergreen content detection within Zen recsys (Java / Python / SQL / ML)
    2. System for one-day/yellow/black content classification within Zen recsys (Java / Python / NLP)
    3. Zen recsys channel catalog ranking system (Java / SQL)
    4. Algorithms for fresh content usage strategies in Zen recsys (Java / SQL)
  8. Yandex Search, Junior Software Engineer

    April 2016 — September 2017
    1. Yandex Search antirobot development (C++)
  9. Computing Centre of Russian Academy of Sciences, Software Engineer

    July 2014 — April 2016
    1. BigARTM - open-source library for topic modeling (C++ / Python / NLP)
    2. Ethnic content exploration in social networks data (Python / NLP)

Freelance Projects

  1. As Data Scientist

    1. Clustering and sentiment analysis of bank employee reviews (Python / NLP)
    2. Retail data analysis and parsing for scientific nutritional project (Python / NLP)
  2. As ML Developer

    1. Development of recommender system for online food store (Python)
    2. Development of prompter system for bank call center (Python / NLP)

Teaching

Academic Courses

  1. Moscow Institute of Physics and Technology

    Moscow, 2019 — Present

    Department of Intelligent Systems

    Lecturer of semester Python course (in Russian)
  2. Lomonosov Moscow State University

    Moscow, 2017 — Present

    CMC Faculty, MMF Department

    Lecturer of semester NLP course, earlier — assistant (in Russian)
  3. Moscow Institute of Physics and Technology

    Moscow, 2018 — 2022

    Department of Intelligent Systems

    Lecturer of semester NLP course (in Russian)
  4. Online Education Company Netology

    Online, 2022

    Lecturer of DL course (in Russian)

  5. Higher School of Economics

    Online, 2020

    Faculty of Computer Science

    Lecturer of online NLP course (in Russian)
  6. Yandex School of Data Analysis

    Moscow, 2016 — 2019

    Assistant in annual ML course (in Russian)

  7. Higher School of Economics

    Moscow, 2017 — 2018

    Faculty of Computer Science

    Assistant in semester NLP course (in Russian)

One-time lessons

  1. SberDevices

    Moscow, 18.04.2024

    External seminar on LLM training (extended version, in Russian) Slides

  2. Skoltech

    Moscow, 29.02.2024

    External seminar on LLM training (in Russian) Slides

  3. SberDevices

    Moscow, 08.12.2023

    Internal seminar on positional encoding in Transformer models (in Russian) Slides Video

  4. SberDevices

    Moscow, 15.09.2023

    Internal seminar on multimodal Transformer models (in Russian) Slides Video

  5. SberDevices

    Moscow, 07.07.2023

    Internal seminar on long context processing in Transformer models (in Russian) Slides

  6. SberDevices

    Moscow, 16.06.2023

    Internal seminar on diffusion models in text generation (in Russian) Slides

  7. Online Education Company Netology

    Online, 22.02.2022

    Discussion of Data Science issues with students (in Russian)

  8. Just AI

    Online, 21.02.2022

    Internal seminar on introduction to NLP (in Russian) Slides

  9. Higher School of Economics / Alliance for Artificial Intelligence

    Online, 09.02.2022

    Seminar within an educational course for university teachers on industry requirements for NLP specialists (in Russian) Slides

  10. Online Education Company SkillFactory

    Moscow, 16.04.2021

    Interview about industrial cases in NLP (in Russian)

  11. Moscow Institute of Physics and Technology

    Moscow, many times since 2015

    Lecture on parallel, distributed and online algorithms for topic models training (in Russian)

    Slides
  12. Sirius School

    Adler, 14.11.2018 — 17.11.2018

    Four-lecture ML mini-course (in Russian)

    1st Lecture Slides 3rd Lecture Slides
  13. Higher School of Economics

    Moscow, 25.09.2017

    Lecture and seminar within School of Linguistics on topic modeling and BigARTM library (in Russian)

  14. Education Company NewProLab

    Moscow, 01.06.2017

    Lecture within BigData course on topic modeling in BigARTM library (in Russian)

  15. Lomonosov Moscow State University

    Moscow, 17.03.2017

    Lecture on modeling and regularization strategies in BigARTM library (in Russian) Slides

  16. Higher School of Economics

    Moscow, 17.03.2017

    Lecture on topic modeling (in Russian) Slides Video

  17. Lomonosov Moscow State University

    Moscow, 30.11.2016

    Seminar on topic modeling experiments methodology using BigARTM library (in Russian) Slides

  18. Lomonosov Moscow State University

    Moscow, 16.10.2015

    Seminar on topic modeling with the BigARTM library Python API (in Russian) Slides

  19. Moscow Institute of Physics and Technology

    Moscow, 25.09.2015

    Lecture on topic modeling (in Russian) Article

  20. Yandex School of Data Analysis

    Moscow, 30.09.2014

    Scientific seminar on topic modeling with the BigARTM library (in Russian) Slides

Activities

Public Events

  1. YNDX Family Meetup

    Online, 02.04.2024

    Report (in Russian)
    Topic: How to train large language models (extended version) Slides

  2. FPMI ML Meetup

    Moscow, 30.01.2024

    Report from SberDevices (in Russian)
    Topic: How to train large language models Slides

  3. HighLoad++ Conference

    Moscow, 27.11.2023

    Working at a booth as an expert from SberDevices (in Russian)
    Topic: General LLM topics, GigaChat presentation and discussion

  4. OpenTalks.AI Conference

    Yerevan, 07.03.2023

    Report from Just AI (in Russian)
    Topic: On the way to industrial NLP-platform: transformers, microservices, architecture Slides

  5. Conversations Conference

    Moscow, 02.12.2022

    Report from Just AI (in Russian)
    Topic: On the way to industrial NLP-platform: transformers, microservices, architecture Related Article

  6. Hottcast Podcast

    Moscow, 31.03.2021

    Discussion from TalkMart42 (in Russian)
    Topic: Billboard in New Zealand and Questions About the Future for the Neural Network at safertomorrow.online

  7. DataStart Conference

    Online, 23.04.2020

    Report from Digital Decisions, together with Roxana Bushkova from Instinct, BBDO (in Russian)
    Topic: User Content Analysis in the Task of Trends Detection for Situational Marketing

  8. DataStart Conference

    Online, 23.04.2020

    Report from Digital Decisions, together with Irina Piontkovskaya from Huawei (in Russian)
    Topic: Preparing an Industrial Model for a Voice Assistant at Minimal Cost

  9. FRUCT International Conference

    Online, 23.04.2020

    Report (in English)
    Topic: Learning Topic Models with Arbitrary Loss Slides

  10. OpenTalks.AI Conference

    Moscow, 20.02.2020

    Report from Digital Decisions, together with Irina Piontkovskaya from Huawei (in Russian)
    Topic: Preparing an Industrial Model for a Voice Assistant at Minimal Cost

  11. Data Culture Hack Hackathon

    Moscow, 30.11.2019

    Tutorial (in Russian)
    Topic: Extracting Discussed Topics from a Text Corpus Slides

  12. ICML International Conference

    Stockholm, 12.07.2018

    Working at a booth as an expert from Yandex (in English)
    Topic: Presentation and discussion of a set of Yandex products

  13. Student Conference Lomonosov-2016

    Moscow, 14.04.2016

    Report, section «Computational Mathematics and Cybernetics», subsection «Machine Learning» (in Russian)
    Topic: Additive Regularization of Topic Models in the Problem of Ethnosocial Discourse Analysis Slides

  14. AIST International Conference

    Yekaterinburg, 08.04.2016

    Report (in Russian)
    Topic: Parallel Non-blocking Deterministic Algorithm for Online Topic Modeling Slides

  15. All-Russian Engineering Competition

    Moscow, 21.10.2015

    Report (in Russian)
    Topic: Open-Source Software for Topic Modeling of Large Text Collections Slides

  16. Student Conference Lomonosov-2015

    Moscow, 16.04.2015

    Report, section «Computational Mathematics and Cybernetics», subsection «Programming» (in Russian)
    Topic: Implementing Multimodal Regularized Topic Models in the Open-Source Library BigARTM Slides

Competitions and Awards

  1. Yandex Alice Urban Skills Online Hackathon

    Moscow, 2021

    3rd place winner in the TalkMart42 team (together with Roman Ishchenko and Sergey Chernov)

  2. «VirusHack» Hackathon

    Moscow, 2020

    Winner of the «Megapolis-Moscow» track as an NLP engineer in the Buckwheat42 team (together with Roman Ishchenko and Sergey Chernov)

  3. Ilya Segalovich Award in Computer Science of Yandex

    Moscow, 2019

    Finalist

  4. AIST International Conference

    Yekaterinburg, 2016

    Best Paper (co-author) and Best Report Awards in NLP section

  5. All-Russian Engineering Competition

    Moscow, 2015

    Winner of the Competition of Individual Research Projects in the section «Informatics and Computer Science»

  6. All-Russian Television Humanitarian Olympics «Умники и умницы»

    Moscow, 2011

    Finalist

  7. Republican Competition of Internet Resources

    Maykop, 2010

    Absolute winner

Publications

Web Articles

Research Papers

  1. Evgeny Orlov, Murat Apishev. Paraphrasers and Classifiers: Controllable Text Generation for Text Style Transfer // Analysis of Images, Social Networks and Texts. AIST 2023

  2. M. Apishev. Effective Implementations of Topic Modeling Algorithms // Programming and Computer Software, Vol. 47, No. 7, pp. 483–492, 2021

  3. E. Artemova, M. Apishev, V. Sarkisyan, S. Aksenov, D. Kirjanov, O. Serikov. Teaching a Massive Open Online Course on Natural Language Processing // Proceedings of the Fifth Workshop on Teaching NLP @ NAACL, 2021, pp. 13-27

  4. M. Apishev, K. Vorontsov. Learning Topic Models With Arbitrary Loss // Proceedings of the 26th Conference of FRUCT Association, 2020, pp. 30-37

  5. Апишев М. А. Эффективные реализации алгоритмов тематического моделирования // Труды ИСП РАН, том 32, вып. 1, 2020 г., стр. 137–152

  6. Жариков И. Н., Апишев М. А., Воронцов К. В. Гиперграфовые многомодальные вероятностные тематические модели транзакционных данных // Интеллектуализация обработки информации (ИОИ-2018): Тезисы докл. — Москва: Торус Пресс, 2018. С.148–149

  7. Denis Kochedykov, Murat Apishev, Lev Golitsyn, Konstantin Vorontsov. Fast and Modular Regularized Topic Modelling // Proceedings of the 21st conference of FRUCT association, 2017, pp. 182-193

  8. Апишев М. А., Кольцов С. Н., Кольцова О. Ю., Николенко С. И., Воронцов К. В. Аддитивная регуляризация тематических моделей для поиска этничного дискурса в социальных медиа // Интеллектуализация обработки информации (ИОИ-2016): Тезисы докл. — Москва: Торус Пресс, 2016. С.170–171

  9. Apishev M., Koltcov S., Koltsova O., Nikolenko S., Vorontsov K. Additive Regularization for Topic Modeling in Sociological Studies of User-Generated Texts // Advances in Computational Intelligence, 15th Mexican International Conference on Artificial Intelligence, MICAI 2016, Cancún, Quintana Roo, Mexico, October 23 to 29, 2016. Proceedings, Part I. Lecture Notes in Artificial Intelligence, Volume 10061, pp. 166–181

  10. Apishev M., Koltcov S., Koltsova O., Nikolenko S., Vorontsov K. Mining Ethnic Content Online with Additively Regularized Topic Models // Computación y Sistemas, Vol. 20, No. 3, 2016, pp. 387–403

  11. Апишев М. А. Аддитивная регуляризация тематических моделей в задаче анализа этносоциального дискурса // Сборник тезисов XXIII Международной научной конференции студентов, аспирантов и молодых учёных «Ломоносов-2016», секция «Вычислительная математика и кибернетика» - Москва: МАКС Пресс, 2016, с. 117–119

  12. Oleksandr Frei and Murat Apishev. Parallel Non-blocking Deterministic Algorithm for Online Topic Modeling // Analysis of Images, Social Networks and Texts. AIST 2016. Communications in Computer and Information Science, vol 661. Springer, pp. 132–144

  13. K. Vorontsov, O. Frei, M. Apishev, P. Romov, M. Suvorova, A. Yanina. Non-Bayesian Additive Regularization for Multimodal Topic Modeling of Large Collections // Topic Models: Post-Processing and Applications, CIKM 2015 Workshop, October 19, 2015, Melbourne, Australia. ACM, New York, NY, USA. pp. 29–37

  14. Воронцов К. В., Фрей А. И., Ромов П. А., Янина А. О., Суворова М. А., Апишев М. А. BigARTM: библиотека с открытым кодом для тематического моделирования больших текстовых коллекций // Аналитика и управление данными в областях с интенсивным использованием данных. XVII Международная конференция DAMDID/RCDL’2015, Обнинск, 13-16 октября 2015. — С.28–36

  15. Апишев М. А. Реализация мультимодальных регуляризованных тематических моделей в библиотеке с открытым кодом BigARTM // Сборник тезисов XXII Международной научной конференции студентов, аспирантов и молодых учёных «Ломоносов-2015», секция «Вычислительная математика и кибернетика» - Москва: МАКС Пресс, 2015, с. 91–92

  16. Vorontsov K. V., Frei O. I., Apishev M. A., Romov P. A., Dudarenko M. A. BigARTM: Open Source Library for Regularized Multimodal Topic Modeling of Large Collections // AIST’2015, Analysis of Images, Social networks and Texts. Springer International Publishing Switzerland, 2015. Communications in Computer and Information Science (CCIS), pp. 370–384

  17. Воронцов К. В., Фрей А. И., Апишев М. А., Дойков Н. В., Суворова М. А. Регуляризация тематических моделей в библиотеке с открытым кодом BigARTM // Математические методы распознавания образов: 17-ая Всеросс. конф.: Докл. М.: Торус, 2015. С. 222–223

  18. Воронцов К. В., Потапенко А. А., Фрей А. И., Апишев М. А., Дойков Н. В., Шапулин А. В., Чиркова Н. А. Многокритериальные и многомодальные вероятностные тематические модели коллекций текстовых документов // Интеллектуализация обработки информации (ИОИ-2014): Тезисы докл. — Москва: Торус Пресс, 2014. С. 198–199