As a data scientist, you will be responsible for keeping our current data science pipelines running smoothly and at their best performance over time, as well as for developing new machine learning pipelines and algorithms by:
- Creating datasets from our huge data lake of products and social media data, selecting the most relevant items for your use case, cleaning the datasets and ensuring data quality
- Designing machine learning models according to the specifications of the use case and considering the state of the art, performance and KPIs defined by our business teams
- Deploying the models in production, aiming for the best performance given the huge amount of information our systems process daily
Your Skills
In order to be successful, you will need to bring your skills and knowledge around these technologies:
- 3+ years of experience as a production-level Python developer, as Python is the base of our tech stack at many levels
- Deep learning frameworks: Keras or PyTorch
- Machine learning and Python data libraries like scikit-learn, pandas or NumPy
- Experience with SQL databases: preferably PostgreSQL
- Linux shell command line
- Version Control in a collaborative environment with Git
- Strong communication skills (written and oral) in English and Spanish
Bonus Points
You will learn many of the other tech pieces we use on the job. Of course, it will be easier if you are already familiar with any of them:
- Django or any other Python ORM
- Image processing libraries like OpenCV or Pillow
- NLP libraries such as spaCy or NLTK
- RabbitMQ, Celery
- System monitoring with InfluxDB, Grafana
The Team
The Data Science team is responsible for enriching the data that our crawlers collect at massive scale from fashion-related websites. For that, you will have to use our machine learning models. The data pipeline starts with hundreds of spiders (running in Python with Scrapyd) that continuously crawl websites. It continues with a series of data quality and data enrichment processes (Python, Celery, RabbitMQ, SQL, Keras, OpenCV) and ends by delivering clean, validated and normalized product information for our customers to consume.
One of the most important pieces in this pipeline is enriching the data with machine learning models. This adds information such as the categories (clothing, footwear, beauty…), genders, attributes, colors, etc. of the fashion items. The database already contains more than 500 million products (growing daily), and we process 1-2M new products every week.
The Company
Our motto is "We love data". And we love technology that deals with data because it enables us to do incredible things... things that are valuable for our customers and that can sustain a business.
StyleSage was established 8 years ago, and our offices are in New York and Madrid. Madrid is the home of our core technical team, while NY hosts the business team. It's an open, diverse and inclusive team of very skilled and talented individuals who are happy to collaborate, share knowledge and enjoy building great software together. We are looking forward to welcoming additional members to this team.
What We Offer
- First and foremost: permanent contract and competitive salary.
- Teams are made of people, not resources.
- Open, diverse and inclusive environment.
- A challenging and fun project to work on and grow with, using the latest technologies and best practices and evolving at light speed, all in a friendly, relaxed and positive environment.
- Fixed yearly training budget to spend on English classes, courses, books, or conferences.
- Your brand new laptop with the OS of your choice (we recommend macOS or any flavor of Linux).
- A team of colleagues that will share a lot of knowledge with you (we have weekly in-depth internal talks).
- Fully remote position. Our office in Madrid is in a co-working space quite close to the Avenida de América metro station, and it is always available if you want to meet your colleagues in person and enjoy the fruit/coffee/tea we stock there.