Michael Rossetti is a data scientist, software developer, and machine learning researcher. He has worked as a polling data analyst for a winning US Presidential campaign, a data analytics director for a Silicon Valley startup, and a technology consultant for the US Government. He teaches courses in data science, computer science, and software development, and conducts research in applied machine learning.
Machine learning researcher with extensive experience in industry and academia. Proficient in supervised methods including regression and classification, and unsupervised methods including dimensionality reduction and clustering. Familiar with reinforcement learning, deep learning, and neural networks, including convolutional and recurrent networks. Experienced in model training, validation, and optimization, including hyperparameter tuning through techniques like grid search with cross-validation. Specializes in natural language processing (NLP), content recommendation systems, and development of novel applications for large language models (LLMs), including retrieval augmented generation (RAG).
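The grid-search-with-cross-validation step mentioned above can be sketched in plain Python. The dataset, the k-NN stand-in model, and the hyperparameter grid are all toy values for illustration, not from any actual project:

```python
import statistics

# Toy 1-D dataset (illustrative): feature value -> binary label.
X = [0.1, 0.2, 0.25, 0.35, 0.4, 0.6, 0.7, 0.8, 0.85, 0.9]
y = [0,   0,   0,    0,    0,   1,   1,   1,   1,    1]

def knn_predict(train_x, train_y, query, k):
    """Predict by majority vote among the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if votes * 2 > k else 0

def cv_accuracy(k, n_folds=5):
    """Mean held-out accuracy of k-NN across n_folds folds."""
    scores = []
    for fold in range(n_folds):
        test_idx = set(range(fold, len(X), n_folds))
        tr_x = [x for i, x in enumerate(X) if i not in test_idx]
        tr_y = [t for i, t in enumerate(y) if i not in test_idx]
        hits = [knn_predict(tr_x, tr_y, X[i], k) == y[i] for i in test_idx]
        scores.append(sum(hits) / len(hits))
    return statistics.mean(scores)

# Grid search: score each candidate hyperparameter by cross-validation,
# then keep the best-scoring one.
grid = [1, 3, 5]
best_k = max(grid, key=cv_accuracy)
```

In practice this is what scikit-learn's `GridSearchCV` automates; the sketch just makes the evaluate-each-candidate-on-held-out-folds loop explicit.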
Created AI tools in Python to provide automated and impartial assessments of US Presidential debates. Performed retrieval augmented generation using a large language model (Claude from Anthropic). Leveraged AI to identify memorable quotes, evaluate moderator fairness, determine which candidate won the debate, and estimate how likely certain demographic groups are to vote for each candidate.
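The retrieval half of a RAG pipeline like this one can be sketched without the model call. The transcript chunks and the term-overlap scorer below are illustrative stand-ins (a production system would use embedding similarity and an actual LLM call, e.g. to Claude):

```python
# Minimal sketch of the retrieval step in retrieval augmented generation.
# The real pipeline sends the assembled prompt to an LLM; here the model
# call is left out so the retrieval logic stands on its own.

def tokenize(text):
    return [w.strip(".,?!\"'").lower() for w in text.split()]

def score(query, chunk):
    """Simple term-overlap relevance score between a query and a chunk."""
    return len(set(tokenize(query)) & set(tokenize(chunk)))

def retrieve(query, chunks, top_n=2):
    """Return the top_n most relevant transcript chunks for the query."""
    return sorted(chunks, key=lambda ch: score(query, ch), reverse=True)[:top_n]

def build_prompt(query, chunks):
    context = "\n---\n".join(retrieve(query, chunks))
    return f"Use only the excerpts below to answer.\n\n{context}\n\nQuestion: {query}"

# Hypothetical transcript chunks and a question about moderator fairness:
transcript = [
    "The moderator asked both candidates an equal number of questions.",
    "Candidate A discussed the economy and inflation at length.",
    "Candidate B was interrupted twice during the foreign policy segment.",
]
prompt = build_prompt("Was the moderator fair to both candidates?", transcript)
# `prompt` would then be sent to the LLM.
```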
Created AI tools in Python to automate the grading of student homework documents. Performed retrieval augmented generation using large language models (ChatGPT from OpenAI, and LLaMA from Meta). Used prompt engineering to improve the agent’s grading performance. Validated the proof of concept.
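The prompt-engineering side of a grading agent largely comes down to iterating on a prompt template. The rubric and wording below are hypothetical, shown only to illustrate the shape of such a template:

```python
# Sketch of rubric-driven prompt construction (hypothetical rubric and
# wording). The resulting prompt, plus the student's document, goes to the
# LLM for grading; refining this template is the prompt-engineering loop.

RUBRIC = {
    "correctness": "Does the submission produce the required outputs?",
    "style": "Is the code readable and idiomatic?",
    "documentation": "Are functions and decisions explained?",
}

def build_grading_prompt(assignment, submission_text):
    criteria = "\n".join(f"- {name}: {desc}" for name, desc in RUBRIC.items())
    return (
        f"You are a strict but fair grader for the assignment: {assignment}.\n"
        f"Grade the submission on each criterion from 0 to 10:\n{criteria}\n\n"
        f"Submission:\n{submission_text}\n\n"
        "Respond with one line per criterion: '<criterion>: <score> - <reason>'."
    )

prompt = build_grading_prompt("Homework 3", "def add(a, b): return a + b")
```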
Classified users in social networks based on the content of their posts. Trained classification models (Logistic Regression, Random Forest, and XGBoost) to predict whether or not a given user is an automated "bot". Achieved a 95% F1 score and a 98% ROC-AUC score on test data. Compared results across different text embedding methods (TF-IDF, Word2Vec, and embedding models from OpenAI).
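The TF-IDF embedding step can be sketched in plain Python. The example posts are made up, and a 1-nearest-neighbor rule stands in for the actual classifiers (Logistic Regression, Random Forest, XGBoost):

```python
import math
from collections import Counter

# Hypothetical labeled posts for illustration only.
docs = [
    ("buy followers now cheap promo link",         "bot"),
    ("free promo click link buy now",              "bot"),
    ("had a great time hiking with friends today", "human"),
    ("my dog learned a new trick this weekend",    "human"),
]

def tfidf(corpus):
    """Return one sparse TF-IDF weight dict per document."""
    n = len(corpus)
    df = Counter(w for doc in corpus for w in set(doc.split()))
    vectors = []
    for doc in corpus:
        words = doc.split()
        tf = Counter(words)
        vectors.append({w: (c / len(words)) * math.log(n / df[w])
                        for w, c in tf.items()})
    return vectors

def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

vectors = tfidf([text for text, _ in docs])

def classify(post):
    """Label a new post by its most similar training post (1-NN on cosine)."""
    vec = tfidf([text for text, _ in docs] + [post])[-1]
    best = max(range(len(docs)), key=lambda i: cosine(vec, vectors[i]))
    return docs[best][1]
```

In practice scikit-learn's `TfidfVectorizer` handles the vectorization; the point here is that each post becomes a weighted bag-of-words vector before any model sees it.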
Assessed the similarity of hashtags in a given Twitter discussion, based on hashtag co-occurrence in user profiles, to identify and monitor disinformation-related content on social networks. Applied dimensionality reduction methods (PCA, t-SNE, and UMAP) and clustering methods (HDBSCAN) to identify groups of related hashtags, including obscure hashtags associated with disinformation campaigns.
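The co-occurrence representation behind this can be sketched directly: each hashtag is described by the other hashtags it appears alongside in profiles. The profiles below are hypothetical, and the sketch stops at pairwise similarity (the real project fed such vectors into PCA/t-SNE/UMAP and HDBSCAN):

```python
import math
from collections import defaultdict

# Hypothetical user profiles, each a set of hashtags.
profiles = [
    {"#news", "#politics", "#truth"},
    {"#news", "#politics"},
    {"#truth", "#wakeup"},
    {"#cats", "#pets"},
]

def cooccurrence_vectors(profile_sets):
    """Map each hashtag to counts of the hashtags it co-occurs with."""
    vectors = defaultdict(lambda: defaultdict(int))
    for tags in profile_sets:
        for a in tags:
            for b in tags:
                if a != b:
                    vectors[a][b] += 1
    return vectors

def cosine(a, b):
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

vecs = cooccurrence_vectors(profiles)
# "#news" and "#politics" share co-occurrence context, so their vectors are
# similar; "#cats" shares none with "#news".
```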
Assessed the similarity of artists and songs, for music recommendation purposes. Wrote Python code to download audio files from YouTube and extract audio features such as tempo. Applied dimensionality reduction methods (PCA, t-SNE, and UMAP) to identify related artists and songs.
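The dimensionality-reduction step can be sketched with plain PCA (t-SNE and UMAP need third-party libraries). Rows are songs, columns are audio features such as tempo; the feature values are made up for illustration:

```python
import numpy as np

# Hypothetical audio features: tempo (BPM), loudness (dB), energy.
features = np.array([
    [120.0,  -5.0, 0.80],  # song A
    [122.0,  -5.5, 0.82],  # song B: close to A, should project nearby
    [ 70.0, -20.0, 0.20],  # song C: slow and quiet, far from A and B
])

def pca(X, n_components=2):
    """Project X onto its top principal components."""
    centered = X - X.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return centered @ top

embedding = pca(features)
# Distances in the reduced space reflect similarity: songs A and B land
# close together, song C far away.
```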
Analyzed the role of automated accounts called "bots" in spreading disinformation across social networks. Developed Python scripts to extract hundreds of millions of data points from the Twitter API. Architected Google BigQuery databases and ETL pipelines to store large-scale data. Wrote SQL queries and Python scripts to perform data analysis and conduct statistical tests. Trained, evaluated, and deployed natural language processing models to classify a user's political sentiments based on their social media posts. Achieved 88% accuracy on test data using benchmark models (Logistic Regression and Naive Bayes), and 96% accuracy using BERT.
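The Naive Bayes benchmark can be sketched in plain Python (BERT and the data pipeline are out of scope here). The training posts and labels below are hypothetical; the real models were trained on labeled social media data:

```python
import math
from collections import Counter

# Hypothetical labeled posts for a two-class political sentiment task.
train = [
    ("lower taxes and strong borders",  "right"),
    ("protect the second amendment",    "right"),
    ("healthcare is a human right",     "left"),
    ("climate action and equality now", "left"),
]

def fit(data):
    """Per-class word counts and class priors for multinomial Naive Bayes."""
    counts = {label: Counter() for _, label in data}
    priors = Counter(label for _, label in data)
    for text, label in data:
        counts[label].update(text.split())
    return counts, priors

def predict(text, counts, priors):
    vocab = {w for c in counts.values() for w in c}
    best, best_lp = None, -math.inf
    for label, prior in priors.items():
        total = sum(counts[label].values())
        lp = math.log(prior / sum(priors.values()))
        for w in text.split():
            # Laplace smoothing so unseen words don't zero out a class.
            lp += math.log((counts[label][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

counts, priors = fit(train)
```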