Building a scalable pipeline for processing and analyzing network flow data, with a focus on anomaly detection and bot activity.
- Ingest and sample large network datasets with Polars
- Transform raw flow logs into feature-rich tabular format
- Develop modular ETL pipeline for local or streamed flow data
- Integrate anomaly detection and classification models (e.g. Isolation Forest, LOF, Random Forest, LGBM)
- Evaluate under real-world class imbalance
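To illustrate why the last step matters, here is a minimal sketch in plain Python of how accuracy can mislead when bot flows are rare. The labels, the 1% bot rate, and the detector's hit and false-alarm counts are all made up for illustration; they are not results from this project, and the helper names are hypothetical.

```python
# Sketch: accuracy vs. precision/recall on an imbalanced flow dataset.
# All numbers below are illustrative assumptions, not project results.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = bot, 0 = benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    """Fraction of flows labelled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the bot class; 0.0 when undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# 1000 hypothetical flows, 10 of them bots (1% positive class).
y_true = [1] * 10 + [0] * 990

# Baseline that never flags anything.
always_benign = [0] * 1000

# Hypothetical detector: catches 8 of 10 bots, raises 12 false alarms.
detector = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 978

for name, y_pred in [("always_benign", always_benign), ("detector", detector)]:
    p, r, f1 = precision_recall_f1(y_true, y_pred)
    print(f"{name}: accuracy={accuracy(y_true, y_pred):.3f} "
          f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

In this toy setup the do-nothing baseline scores higher accuracy (0.990) than the detector (0.986) while finding no bots at all, which is why precision, recall and F1 on the bot class are the more useful numbers here.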
And that's Ritchie Vink, creator of Polars, with my graffiti:
- All of my projects are available at https://github.com/anopsy
- How to reach me: madkowalczuk@gmail.com
- Fun fact: I paint graffiti portraits
Selected Projects
┣━━ Data Science Content Intern at NannyML:
┃   ┣━━ Post-Deployment Data Science blogs
┃   ┃   ┣━━ Data Quality and Covariate Shift
┃   ┃   ┗━━ Models aren't Forever
┃   ┣━━ contributed to the Research team on Anomaly Detection by evaluating multiple detection algorithms and generating synthetic datasets
┃   ┗━━ contributed to docs
┃
┣━━ PyData and PyLadiesCon speaker and volunteer at:
┃   ┣━━ PyData Amsterdam 2024 Talk: Alice in Open Source Land
┃   ┣━━ PyLadiesCon 2024 Talk
┃   ┗━━ PyData Open Source Sprint
┃
┣━━ Contributed to OSS at:
┃   ┣━━ 🧱scikit-lego
┃   ┃   ┣━━ contributed to docs
┃   ┃   ┗━━ made ColumnSelector dataframe-agnostic using Narwhals
┃   ┣━━ 🐳🦄narwhals
┃   ┃   ┣━━ worked on pyarrow/dask backend implementation
┃   ┃   ┗━━ contributed to docs and tests
┃   ┗━━ embetter
┃       ┣━━ deprecated a method
┃       ┗━━ added pre-commit hooks
┃
┣━━ Juniors_vs_ChatGPT
┃   - Did ChatGPT replace Juniors and Interns?
┃   ┣━━ data cleaning
┃   ┣━━ data wrangling
┃   ┣━━ data analysis
┃   ┣━━ modeling
┃   ┗━━ python🐍/API/polars🐻‍❄️/hvplot
┃
┣━━ Compensation Prediction
┃   - How much do Engineers earn?
┃   ┣━━ data modeling
┃   ┣━━ model evaluation
┃   ┣━━ containerization using docker
┃   ┣━━ building streamlit app
┃   ┗━━ python🐍/scikit-learn/streamlit/docker📦
┃
┣━━ MaskMap: Decoding the Hidden Spectrum
┃   - Prototype of a diagnosis support tool that uses NLP to identify symptoms of Autistic Masking
┃   ┣━━ data scraping
┃   ┣━━ data cleaning
┃   ┣━━ modeling
┃   ┣━━ deploying
┃   ┗━━ python🐍/pandas🐼/FastAPI
┃
┣━━ Equity in Healthcare: Women in Data Science Datathon 2024
┃   - WiDS Datathon project predicting a timely diagnosis of Metastatic Cancer
┃   ┣━━ data cleaning
┃   ┣━━ data wrangling
┃   ┣━━ data analysis
┃   ┣━━ modeling
┃   ┗━━ python🐍/pandas🐼/ensemble🌳/keras🧠
┃
┣━━ Relative Search Volumes Analysis
┃   - Search Volumes for Autism vs Autism Spectrum Disorder around the world
┃   ┣━━ data scraping
┃   ┣━━ data cleaning
┃   ┣━━ modeling (WIP)
┃   ┗━━ python🐍/pandas🐼
┃
┣━━ Steelplate Defect Visual EDA
┃   - Colorful joyplots for Visual EDA
┃   ┣━━ data visualization
┃   ┣━━ ensemble
┃   ┗━━ python🐍/pandas🐼/xgb🌳/seaborn🎨
┃
┣━━ hossenfelder - 🦺WIP
┃   - Data analysis and prediction of views on Sabine Hossenfelder's YouTube channel
┃   ┣━━ data scraping
┃   ┣━━ data cleaning
┃   ┣━━ modeling (WIP)
┃   ┗━━ python🐍/pandas🐼
┃
┗━━ MyFalaClassifier - 🦺WIP
    - Detector of surfable waves
    ┣━━ live-stream scraping
    ┣━━ image processing
    ┣━━ transfer learning
    ┣━━ deploying
    ┗━━ python🐍/keras🧠


