Organizations

@narwhals-dev

anopsy/README.md

Hi 👋, I'm Magdalena Kowalczuk

--- Interested in understanding and improving digital systems. Open source minded.

🚧 I’m currently working on

🛰️🤖 Flow-Based Bot Detection Pipeline

Building a scalable pipeline for processing and analyzing network flow data, with a focus on anomaly detection and bot activity.


🔧 Key Components

  • Ingest and sample large network datasets with Polars
  • Transform raw flow logs into feature-rich tabular format
  • Develop modular ETL pipeline for local or streamed flow data
  • Integrate anomaly detection and classification models
    (e.g. Isolation Forest, LOF, Random Forest, LGBM)
  • Evaluate under real-world class imbalance
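
The last component, evaluating under class imbalance, can be illustrated with a small sketch using only the Python standard library. The labels and detector output below are hypothetical illustration data, not results from this pipeline, and `Metrics` and `evaluate` are names invented for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Metrics:
    """Compute accuracy plus precision/recall for the positive (bot) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when nothing is flagged or nothing is positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return Metrics(accuracy, precision, recall)


# Hypothetical labels: 990 benign flows (0) followed by 10 bot flows (1).
y_true = [0] * 990 + [1] * 10

# A trivial baseline that never flags anything.
always_benign = [0] * 1000

# A hypothetical detector: 12 false alarms, then 8 of the 10 bots caught.
detector = [1] * 12 + [0] * 978 + [1] * 8 + [0] * 2

baseline_metrics = evaluate(y_true, always_benign)
detector_metrics = evaluate(y_true, detector)
print(baseline_metrics)
print(detector_metrics)
```

On this toy data the never-flag baseline reaches higher accuracy (0.99 vs 0.986) while catching no bots at all, whereas the detector reaches recall 0.8 at precision 0.4. That gap is why precision and recall on the minority class are more informative than accuracy for this kind of data.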

I’m currently learning **Polars**

and that's Ritchie Vink, creator of Polars, with my graffiti:

Ritchie Vink

🎨 Selected Projects
┣━━ Data Science Content Intern at NannyML:
┃   ┣━━ 📈Post-Deployment Data Science blogs
┃   ┃   ┣━━ 📉Data Quality and Covariate Shift
┃   ┃   ┗━━ 🌀Models aren't Forever
┃   ┣━━ contributed to the Research team on Anomaly Detection by evaluating multiple detection algorithms and generating synthetic datasets  
┃   ┗━━ contributed to docs  
┃ 
┣━━ PyData and PyLadies Con speaker and volunteer at:
┃   ┣━━ 💽PyData Amsterdam 2024 Talk: Alice in Open Source Land
┃   ┣━━ 🤖PyLadiesCon 2024 Talk
┃   ┗━━ 🏃PyData Open Source Sprint
┃ 
┣━━ Contributed to OSS at:
┃   ┣━━ 🧱scikit-lego
┃   ┃   ┣━━ contributed to docs  
┃   ┃   ┗━━ made ColumnSelector dataframe agnostic using Narwhals 
┃   ┣━━ 🐳🦄narwhals
┃   ┃   ┣━━ worked on pyarrow/dask backend implementation
┃   ┃   ┗━━ contributed to docs and tests
┃   ┗━━ 💡embetter
┃       ┣━━ deprecated a method  
┃       ┗━━ added pre-commit hooks  
┃ 
┣━━ Juniors_vs_ChatGPT 
┃   - Did ChatGPT replace Juniors and Interns?
┃   ┣━━ data cleaning
┃   ┣━━ data wrangling
┃   ┣━━ data analysis
┃   ┣━━ modeling
┃   ┗━━ python🐍/API/polars🐻‍❄️/hvplot📊
┃ 
┣━━ Compensation Prediction 
┃   - How much do Engineers earn? 
┃   ┣━━ data modeling
┃   ┣━━ model evaluation
┃   ┣━━ containerization using docker
┃   ┣━━ building streamlit app
┃   ┗━━ python🐍/scikit-learn/streamlit📈/docker📦
┃  
┣━━ MaskMap: Decoding the Hidden Spectrum  
┃   - Prototype of a diagnosis support tool using the power of NLP to identify symptoms of Autistic Masking
┃   ┣━━ data scraping
┃   ┣━━ data cleaning
┃   ┣━━ modeling
┃   ┣━━ deploying
┃   ┗━━ python🐍/pandas🐼/FastAPI
┃  
┣━━ Equity in Healthcare: Women in Data Science Datathon 2024 
┃   - WIDS Datathon Project predicting a timely diagnosis of Metastatic Cancer
┃   ┣━━ data cleaning
┃   ┣━━ data wrangling
┃   ┣━━ data analysis
┃   ┣━━ modeling
┃   ┗━━ python🐍/pandas🐼/ensemble🌳/keras🧠
┃  
┣━━ Relative Search Volumes Analysis  
┃   - Search Volumes for Autism vs Autism Spectrum Disorder around the world
┃   ┣━━ data scraping
┃   ┣━━ data cleaning
┃   ┣━━ modeling WIP
┃   ┗━━ python🐍/pandas🐼
┃  
┣━━ Steelplate Defect Visual EDA  
┃   - Colorful joyplots for Visual EDA
┃   ┣━━ data visualization
┃   ┣━━ ensemble
┃   ┗━━ python🐍/pandas🐼/xgb🌳/seaborn🎨
┃  
┣━━ hossenfelder - 🦺WIP  
┃   - Data Analysis and Prediction of views on Sabine Hossenfelder YT channel
┃   ┣━━ data scraping
┃   ┣━━ data cleaning
┃   ┣━━ modeling WIP
┃   ┗━━ python🐍/pandas🐼
┃  
┗━━ MyFalaClassifier - 🦺WIP  
    - Detector of surfable waves
    ┣━━ live-stream scraping
    ┣━━ image processing
    ┣━━ transfer learning
    ┣━━ deploying
    ┗━━ python🐍/keras🧠

Languages and Tools:

pandas polars scikit_learn python seaborn bash git postgresql tensorflow go gcp

Connect with me:

anopsy madkowalczuk anopsy anopsy_amsterdam @anopsy28

Pinned

  1. Juniors_vs_ChatGPT Public

    Inspired by personal curiosity and a 2023 Hackathon challenge (won in the ‘Most Polished’ category). This project investigates the impact of large language models like ChatGPT on entry-level roles …

    Jupyter Notebook 2

  2. Compensation-prediction Public

    An integrated data modeling and model experimentation project, packaged as a Streamlit app for predicting estimated compensation in engineering jobs

    Jupyter Notebook 2

  3. MaskMap Public

    Prototype of a diagnosis support tool using the power of NLP to identify symptoms of Autistic Masking (AM) and help medical staff and patients differentiate between anxiety, depression, and the lon…

    Jupyter Notebook

  4. Equity_in_Healthcare Public

    Predicting a timely diagnosis in metastatic cancer patients. Data cleaning, feature engineering, and hyperparameter tuning of a classification model ensemble

    Jupyter Notebook 1

  5. koaning/scikit-lego Public

    Extra blocks for scikit-learn pipelines.

    Python 1.4k 124

  6. narwhals-dev/narwhals Public

    Lightweight and extensible compatibility layer between dataframe libraries!

    Python 1.5k 179