Earlier this year I released RecipeSnap, a simple web app creating a digital interface for cookbooks and recipe cards to make them easier to use in the kitchen.
The initial version utilized a PyTorch model to finetune LayoutLMv3. While this worked well, finetuning the model introduced quite a bit of overhead for an app that has small volume. The model had to be created, maintained, hosted, monitored and improved. All of this was a bit much for a small side project.
In November 2023 OpenAI released the preview of their GPT4-V model that allows you to incorporate vision capabilities into your request. After some initial testing the gpt-4-vision-preview
model performed really well extracting information from images of recipes. In this update I've rewritten the app to replace the PyTorch model with GPT-4V
to reduce the maintenance overhead of this small app. So far I'm happy with these results. I was able to save costs by turning off my model server. I also don't need to invest a ton of time into retraining the model to improve the performance. In this small use case GPT-4V
performs well enough.
OpenAI also introduced JSON mode to force the model to generate strings that can be easily parsed into valid JSON objects. Since all of the responses are being parsed to JSON anyways to fill the UI enabling this has simplified the process of extracting the information from the model. For an added layer of validation I also defined Pydantic models to validate the JSON responses. This has made the app more robust and easier to maintain.
On the usability front some of the UI components were shuffled. I usually reference my laptop or iPad when I cook. Previously the app displayed the ingredients in a component above the instructions. While cooking I was doing a lot of scrolling. Hands get dirty while cooking so I wanted to minimize the amount of buttons that needed to be clicked. To fix this the ingredients and instructions are displayed side-by-side in separate columns. This makes it easier to cross reference.
Finally, I added some basic logging. I know the app doesn't get a lot of traffic, but I've been curious if anyone else out there is using the app. Having some basic logging in place will help me understand how the app is being used and if there are any issues. The logging is very basic. Streamlit doesn't have a mechanism to reliably track users, so there is not user specific info. Instead I'm generating a random session ID at start-up and tracking actions for the session. It's crude, but it's a starting point.
If you haven't go try RecipeSnap and let me know what you think! There are a few ideas I'm considering for next steps, but I'd love to hear what you think about the kinds of features you'd like to see!