Organizations

@codeforamerica @codeforutah @sfbrigade @data-creative @challenger-research @slco-2016 @prof-rossetti @rossetti-gov @gwuniversity

s2t2/README.md

Michael J Rossetti

Bio

Michael Rossetti is a data scientist, software developer, and machine learning researcher. He has worked as a polling data analyst for a winning US Presidential campaign, a data analytics director for a Silicon Valley startup, and a technology consultant for the US Government. He teaches courses in data science, computer science, and software development, and conducts research in applied machine learning.

Machine Learning Research Experience

Machine learning researcher with extensive experience in industry and academia. Proficient in supervised methods including regression and classification, and unsupervised methods including dimensionality reduction and clustering. Familiar with reinforcement learning, deep learning, and neural networks, including convolutional and recurrent networks. Experienced in model training, validation, and optimization, including hyperparameter tuning through techniques like grid search with cross-validation. Specializes in natural language processing (NLP), content recommendation systems, and development of novel applications for large language models (LLMs), including retrieval augmented generation (RAG).

Machine Learning Research Projects

"Political Debate Assessment Tool / AI Voter Survey" (2024)

Created AI tools in Python to provide automated and impartial assessments of US Presidential debates. Performed retrieval augmented generation using a large language model (Claude from Anthropic). Leveraged AI to identify memorable quotes, assess moderator fairness, assess which candidate won the debate, and assess how likely certain demographic groups are to vote for each candidate.

"Homework Grading AI Agent" (2023)

Created AI tools in Python to automate the grading of student homework documents. Performed retrieval augmented generation using large language models (ChatGPT from OpenAI, and LLaMA from Meta). Used prompt engineering to improve the agent’s grading performance. Validated the proof of concept.

"Text Embeddings for User Classification in Social Networks" (2023)

Classified users in social networks based on the content of their posts. Trained classification models (Logistic Regression, Random Forest, and XGBoost) to predict whether a given user is an automated "bot". Achieved 95% F1 score and 98% ROC-AUC score on test data. Compared results using different text embedding methods (TF-IDF, Word2Vec, and models from OpenAI).
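As a brief illustration of the F1 metric reported above, the following sketch computes it from scratch for a binary "bot" label. The labels and predictions are made-up toy values, not project data, and the helper name `f1_score` is chosen here for illustration only.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical values): 1 = bot, 0 = human
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
score = f1_score(y_true, y_pred)
print(f"F1: {score:.2f}")
```

F1 balances precision and recall, which makes it more informative than accuracy when one class (such as bots) is much rarer than the other.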

"Hashtag Similarity Mapping" (2022)

Assessed the similarity of hashtags in a given Twitter discussion, based on hashtag co-occurrence in user profiles, to identify and monitor disinformation-related content on social networks. Applied dimensionality reduction methods (PCA, t-SNE, and UMAP) and a clustering method (HDBSCAN) to identify groups of related hashtags, including obscure hashtags associated with disinformation campaigns.
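A minimal sketch of the co-occurrence idea, assuming each user profile has already been reduced to a list of hashtags. It uses Jaccard similarity as a simple stand-in measure; it is not the project's actual pipeline, which applied dimensionality reduction and clustering. The function names and sample profiles are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(profiles):
    """Count how many profiles contain each hashtag and each hashtag pair."""
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in profiles:
        unique = sorted(set(tags))  # ignore repeats within one profile
        tag_counts.update(unique)
        pair_counts.update(combinations(unique, 2))  # pairs in sorted order
    return tag_counts, pair_counts


def jaccard_similarity(tag_a, tag_b, tag_counts, pair_counts):
    """Profiles containing both tags divided by profiles containing either."""
    if tag_a == tag_b:
        return 1.0 if tag_counts[tag_a] else 0.0
    a, b = sorted((tag_a, tag_b))
    both = pair_counts[(a, b)]
    either = tag_counts[a] + tag_counts[b] - both
    return both / either if either else 0.0


# Hypothetical sample data: one list of hashtags per user profile
profiles = [
    ["#alpha", "#beta"],
    ["#alpha", "#beta", "#gamma"],
    ["#alpha"],
    ["#gamma", "#delta"],
]
tag_counts, pair_counts = cooccurrence_counts(profiles)
print(jaccard_similarity("#alpha", "#beta", tag_counts, pair_counts))
```

Pairwise scores like these can be arranged into a hashtag-by-hashtag matrix, which is the kind of input that dimensionality reduction and clustering methods then operate on.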

"Artist Similarity Mapping / Music Information Retrieval" (2022)

Assessed the similarity of artists and songs for music recommendation purposes. Wrote Python code to download audio files from YouTube and extract audio features such as tempo. Applied dimensionality reduction methods (PCA, t-SNE, and UMAP) to identify related artists and songs.

"Bots, Disinformation, and the First Trump Impeachment" (2020 - 2023)

Analyzed the role of automated accounts called "bots" in spreading disinformation across social networks. Developed Python scripts to extract hundreds of millions of data points from the Twitter API. Architected Google BigQuery databases and ETL pipelines to store large-scale data. Wrote SQL queries and Python scripts to perform data analysis and conduct statistical tests. Trained, evaluated, and deployed natural language processing models to classify a user’s political sentiments based on their social media posts. Achieved 88% accuracy on test data using benchmark models (Logistic Regression and Naive Bayes), and 96% accuracy using BERT.

Pinned

  1. prof-rossetti/intro-to-python Public

    An Introduction to Programming in Python

    Jupyter Notebook 97 246

  2. langchain-ta Public

    Jupyter Notebook 1 2

  3. tic-tac-toe-py Public

    a basic adversarial game

    Python 32 201

  4. openai-embeddings-2023 Public

    Classifying users on social media, using text embeddings from OpenAI and others

    HTML 4

  5. impeachment-web-react Public

    JavaScript 1 3

  6. tweet-analysis-2023 Public

    Python 1