Centric Pricing™ (formerly StyleSage) is an AI-driven competitive assortment benchmarking and market trend insights solution for fashion, beauty and home goods brands and retailers.
We are a key innovation partner for iconic and emerging brands across the world.
Our platform analyzes information from more than 1,000 retailers, processing data from more than 600,000 brands and tracking millions of products!
The Data Science team is responsible for enriching the data that our crawlers collect at scale from fashion-related websites, using our own machine learning models. Our models add information to existing products, such as categories (clothing, footwear, beauty…), gender, attributes, colors and bounding boxes. The database already contains more than 500 million products (and is growing daily), and we process 1-2 million new products every week.
To do that, you will use the latest and best open-source technologies out there. We code in Python (and we love it; you may even want to join us at PyCon Spain!), using Keras as our main deep learning framework (although we are starting to use PyTorch for certain projects), along with other machine learning and computer vision libraries such as scikit-learn and OpenCV.
On the engineering side, we use Django as our main framework for accessing the data. We are a cloud-native company, so our code runs on AWS. Our massive amount of data lives in PostgreSQL databases, and we keep an eye on all of it using observability tools like Grafana, InfluxDB and Telegraf.
If you don't know much about some of these technologies, don't worry: our engineers will be happy to support you on your journey to becoming an expert in them.
Your Job
As a data scientist, you will be responsible for ensuring that our current data science pipelines keep running smoothly with the best performance, as well as for developing new machine learning pipelines and algorithms by:
- Creating datasets from our huge data lake of products and social media data, selecting the most relevant items for your use case and ensuring data quality.
- Training, deploying, productionizing and operating machine learning models and pipelines hands-on at scale, covering both batch and real-time use cases.
- Contributing to expanding and improving the infrastructure that supports every stage of the machine learning model lifecycle, including feature engineering, the feature store, model training, testing, monitoring and deployment in production.
- Proactively identifying and implementing internal process improvements, such as automating manual work, optimizing data delivery and redesigning infrastructure for greater scalability.
- Staying up to date with the latest industry trends and technologies to keep our ML capabilities competitive and cutting-edge.
- Onboarding and enabling Data Scientists with different levels of engineering expertise.
Your Skills
- 3+ years of experience working as a software engineer
- Bachelor’s degree in Computer Science, Engineering or a related field
- 3+ years of experience as a production-level Python developer, including deep learning frameworks: TensorFlow, Keras or PyTorch
- Machine learning and Python data libraries such as scikit-learn, pandas or NumPy
- Experience with computer vision
- Experience with SQL databases, preferably PostgreSQL
- Django ORM
- Linux shell command line
- Version control with Git in a collaborative environment
- Strong communication skills (written and oral) in English
Bonus Points
Additionally, it would be nice if you are familiar with:
- Image processing libraries like OpenCV or Pillow
- MLOps frameworks like MLflow
- NLP libraries such as spaCy or NLTK
- Asynchronous task processing with RabbitMQ and Celery
- System monitoring with InfluxDB and Grafana
- Working knowledge of containers (Docker)
- Experience working with cloud-based infrastructure (AWS, Azure…)
Centric Software provides equal employment opportunities to all qualified applicants without regard to race, sex, sexual orientation, gender identity, national origin, color, age, religion, protected veteran or disability status or genetic information.