Machine Learning and Large Language Models for Harnessing Unstructured Data using Stata/Python

A 3-day intensive in-person school

This intensive three-day workshop brings together two core pillars of contemporary data science research: Large Language Models (LLMs) for unstructured text analysis and feature engineering, and Machine Learning (ML) and AI modeling in Stata and Python for predictive analytics. Participants will learn to design complete research pipelines: starting from messy raw text, structuring it into analyzable features, and applying supervised and unsupervised ML models to generate predictions, improve data insights, and produce reliable classifications.

US$540.00
3 Days
London
Stata

Overview

This intensive three-day workshop brings together two core pillars of contemporary data science research: Large Language Models (LLMs) for unstructured text analysis and feature engineering, and Machine Learning (ML) and AI modeling in Stata and Python for predictive analytics.

Participants will gain the ability to design complete research pipelines, starting from messy raw text, structuring it into analyzable features, and applying supervised and unsupervised ML models to generate predictions, improve data insights, and produce reliable classifications. The course blends theory, methodological insights, and hands-on coding in Stata and Python, including work with LLMs. Case studies from the social sciences, economics, health, and the humanities illustrate practical relevance.

The course is ideal for PhD students, early-career researchers, data scientists, and applied professionals interested in text-driven machine learning applications.

Key Skills Acquired

By the end of the course, participants will understand:

  • How to clean, preprocess, and structure raw text data, turning it into feature sets suitable for machine learning.

  • How to train, validate, and evaluate machine learning models on text-derived features, balancing predictive accuracy and interpretability.

  • How to integrate advanced neural models into text analytics workflows, critically evaluate applications across domains, and build responsible, reproducible pipelines.

Learning Outcomes

  • Master the entire pipeline from raw text to a validated ML model

  • Gain hands-on experience with Python libraries for LLMs, NLP, and ML

  • Understand the trade-offs between traditional ML and neural AI models

  • Be equipped to apply these techniques in their own domains

  • Develop critical awareness of ethical challenges, data privacy, and responsible AI practices


Course Structure

The workshop is organised across three progressive days:

  • Day 1 introduces the foundations of text analytics with LLMs, covering text preprocessing, natural language processing (NLP) techniques, and feature extraction from raw data

  • Day 2 focuses on machine learning fundamentals for text-based research, including model construction, validation techniques, classical algorithms, and dimensionality reduction

  • Day 3 explores advanced LLMs and AI integration through neural networks and transformer-based models, followed by domain-specific applications, ethical considerations, and a hands-on capstone project

Across all sessions, participants work directly in Stata and Python to develop reproducible text-analytic workflows and gain practical algorithmic experience.

Agenda

Day 1

Morning Session: Understanding Text as Data
Afternoon Session: Preprocessing and Feature Extraction

Day 2

Morning Session: Model Building and Validation
Afternoon Session: Core ML Algorithms with Text Features

Day 3

Morning Session: Deep Learning and Neural Approaches to Text
Afternoon Session: Applications, Ethics, and Capstone Project

Prerequisites

No prior knowledge of LLMs or Python required

Basic knowledge of regression and statistics

Working knowledge of Stata

Course Timetable

Subject to minor changes

Day | Morning Session | Afternoon Session (including tutorial)
Day One | 10am-12pm | 1pm-3pm
Day Two | 9am-11am | 12pm-2pm

All times are London time.
