I'm Gordon Tsai.

I'm a California-based Machine Learning Engineer, currently working at TikTok Pte. Ltd. I apply data analytics and statistics skills, with proficiency in data programming and statistical tools such as SQL and Python, to turn complicated data into clear insights and draw conclusions from them.


About me

I am a data scientist and start-up founder with a passion for machine learning and real-world applications of technology in society. With both a bachelor's and a master's degree, I have a strong background in machine learning and data science.

Location: Bay Area, CA
Age: 25
Nationality: Hong Kong, China
Interests: Motorcycles, Muay Thai, Banjos
Study: Data Science
Employment: TikTok Pte. Ltd.

Education

M.S.&E., Data Science

Stanford University•Jun 2024

GPA: 3.84 / 4.0

B.A., Data Science

University of California, Berkeley•May 2022

GPA: 3.78 / 4.0

Work

Machine Learning Engineer

TikTok Pte. Ltd.•Aug 2024 – Present

ML/NLP Data Scientist Intern

GEICO•Jun 2023 – Sep 2023

1. Tasked with enhancing GEICO's chatbot efficiency to reduce response time and improve customer service by addressing existing system limitations.
2. Developed advanced NLP algorithms using Transformer and LSTM models, and fine-tuned GPT-2, to accurately understand and categorize customer intent.
3. Containerized the chatbot using Docker, implemented CI/CD pipelines with GitHub, and deployed solutions to cloud providers (e.g., AWS), ensuring consistent and scalable deployments.

Data Scientist

Berkeley Data Analytics Group LLC•Aug 2021 – Aug 2022

1. Cleaned, transformed, and merged 100,000+ rows of data, and implemented a regression model and feature engineering from scratch for fleet management optimization of electric trucks.
2. Used principal component analysis (PCA) to reduce the feature space, applied multiple linear regression to forecast truck battery levels, and used logistic regression to predict the probability of brake failure.
3. Optimized fleet management using battery-level forecasts and warning-level predictions to reduce the operation.

Data Science Researcher

Data Analytics Research Lab | University of California, Berkeley•July 2021 – Dec 2021

1. Started with item-based collaborative filtering for movie recommendation, using KNN with cosine similarity for nearest-neighbor search to mitigate the “curse of dimensionality”.
2. Improved the movie recommender by using Alternating Least Squares (ALS) matrix factorization to overcome popularity bias and the item cold-start problem.
3. Tuned model hyperparameters with an ML cross-validation toolbox and monitored data processing performance in Python.
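The item-based nearest-neighbor step described above can be illustrated with a minimal sketch. It uses only the Python standard library; the toy ratings and the function names `cosine_similarity` and `nearest_items` are hypothetical and are not taken from the original project.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an item with no ratings is similar to nothing
    return dot / (norm_a * norm_b)


def nearest_items(ratings, target, k=2):
    """Return the k items whose rating vectors are closest to `target`.

    `ratings` maps an item name to a list of per-user ratings (0 = unrated).
    """
    scored = [
        (cosine_similarity(ratings[target], vector), item)
        for item, vector in ratings.items()
        if item != target
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [item for _, item in scored[:k]]


# Hypothetical toy data: four movies rated by four users.
ratings = {
    "Movie A": [5, 4, 0, 1],
    "Movie B": [4, 5, 0, 2],
    "Movie C": [0, 1, 5, 4],
    "Movie D": [1, 0, 4, 5],
}

print(nearest_items(ratings, "Movie A", k=2))
```

A real recommender would typically work on a sparse user-item matrix and use an approximate or indexed neighbor search rather than this brute-force loop.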

Data Scientist Intern, Database Intelligence Department

Hong Kong Trade Development Council•Jun 2019 – Aug 2019


Skills

Spoken languages

English

Cantonese

Mandarin

Programming

Python

Java

Machine Learning

Large Language Models

Decision Trees

RNN, CNN & LSTM

Statistical Analysis

Statistical Testing

Time Series

Game Theory

Projects

Causal Inference: Effect of Family Income on Educational Level in Brazil

1. Estimated OLS models, reporting confidence intervals and heteroscedasticity-robust standard errors for the treatment variable and all pre-treatment variables, along with the p-value for the treatment variable.
2. Applied Double Lasso and Elastic Net, reporting confidence intervals and standard errors; for Double Lasso, also reported which variables had non-zero coefficients.
3. Found that higher dropout rates could be correlated with lower-income areas, where financial education would make a bigger difference.

Momentum Trading Strategy with LSTM

1. Built a Fama-French three-factor model using firm size, book-to-market value, and market premium to predict S&P 500 stock returns.
2. Backtested strategy performance by constructing a monthly rebalanced long-only portfolio.
3. Tested the traditional momentum model as an additive factor on the Fama-French model.
4. Used the CAPM to construct an alternative momentum factor that isolates the beta effect and captures idiosyncratic stock performance.
5. Reran the backtest: the Sharpe ratio increased from 1.04 to 1.28, with significantly lower volatility and drawdown.
6. Applied a Long Short-Term Memory (LSTM) model to predict the relative performance of each sector based on historical pricing data.
7. Further improved the strategy by taking the sector allocation effect into consideration.
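For readers unfamiliar with the backtest metrics mentioned above, the following is a minimal sketch of how an annualized Sharpe ratio and a maximum drawdown could be computed from monthly returns. The function names, the zero default risk-free rate, and the sample returns are assumptions for illustration; this is not the project's actual backtesting code.

```python
import math
import statistics


def annualized_sharpe(monthly_returns, monthly_risk_free=0.0):
    """Annualized Sharpe ratio from monthly returns (sample standard deviation)."""
    excess = [r - monthly_risk_free for r in monthly_returns]
    mean_excess = statistics.mean(excess)
    stdev_excess = statistics.stdev(excess)
    # Scale a monthly ratio to a yearly one by sqrt(12).
    return (mean_excess / stdev_excess) * math.sqrt(12)


def max_drawdown(monthly_returns):
    """Largest peak-to-trough loss of the compounded portfolio, as a negative fraction."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in monthly_returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = min(worst, wealth / peak - 1.0)
    return worst


# Hypothetical monthly returns of a long-only portfolio.
sample_returns = [0.02, -0.01, 0.03, 0.01, -0.02, 0.04]
print(annualized_sharpe(sample_returns))
print(max_drawdown(sample_returns))
```

The sqrt(12) scaling assumes monthly returns are roughly independent, which is a common convention rather than a guarantee.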

Credit Card Approval Prediction from Historical Data with Machine Learning

1. Cleaned imbalanced data and performed feature selection on origination and recurring factors, including delinquency, FICO score, and DTI ratio, to determine the main causes of default.
2. Constructed a variety of machine learning models, such as Logistic Regression, Decision Tree, Random Forest, AdaBoost, SVM, and Neural Networks, to forecast loan default using Python (scikit-learn).
3. Applied cross-validation to ensure model robustness and limit the risk of over-fitting, and compared model performance based on the ROC curve and AUC. The model achieved 89% accuracy in the out-of-sample test.
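As a small illustration of the AUC comparison mentioned above, the sketch below computes the area under the ROC curve directly from labels and predicted scores, using the pairwise-ranking definition. The project itself used scikit-learn; this standard-library version, the function name `roc_auc`, and the sample data are assumptions made only to show what the metric measures.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical default labels (1 = default) and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
```

Because the data here is imbalanced by nature, AUC is usually a more informative comparison than raw accuracy.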

Causal Inference in Biden Elections

1. Conducted an observational causal study to estimate the causal relationship between the number of unique donations a presidential candidate received and the proportion of the vote they received in the primary.
2. Constructed causal directed acyclic graphs (DAGs) to represent the causal relationship between the treatment and the outcome, and used Pearl's back-door criterion to determine the minimal set of confounding factors.
3. Performed OLS regression adjustment, inverse probability weighting, and propensity score matching to tease out the effect of confounding factors, and estimated the causal impact of the number of donations on the proportion of the vote.
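The inverse probability weighting step can be sketched as follows. This is a simplified illustration that assumes a binary treatment (for example, donations above or below some threshold) and propensity scores that have already been estimated; the function name `ipw_ate` and the numbers are hypothetical and do not come from the project.

```python
def ipw_ate(treated, outcomes, propensity):
    """Inverse-probability-weighted estimate of the average treatment effect.

    treated:    1 if the unit received the treatment, else 0
    outcomes:   observed outcome for each unit
    propensity: estimated P(treated = 1 | confounders), strictly between 0 and 1
    """
    n = len(outcomes)
    # Weight treated outcomes by 1/p and control outcomes by 1/(1 - p).
    treated_term = sum(
        t * y / p for t, y, p in zip(treated, outcomes, propensity)
    )
    control_term = sum(
        (1 - t) * y / (1 - p) for t, y, p in zip(treated, outcomes, propensity)
    )
    return (treated_term - control_term) / n


# Hypothetical data: four units, each equally likely to be treated.
treated = [1, 0, 1, 0]
outcomes = [3.0, 1.0, 5.0, 1.0]
propensity = [0.5, 0.5, 0.5, 0.5]
print(ipw_ate(treated, outcomes, propensity))
```

With a constant propensity of 0.5 the estimate reduces to the difference in group means; the weighting only matters when treatment probability varies with the confounders.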

Get in touch.

Reach out to me for any inquiries or job opportunities.

Email: tsaiwaiyamgordon1126@gmail.com
Location: Bay Area, CA
GitHub: Gordon-Tsai