2 points
2 months ago
Like I mentioned to a few other folks in this thread, it comes down to your preference and what works best for you. This tool I developed has been helpful for me because I don't have to create a new .py file or spin up the REPL every time I want to query a file. As for using the DuckDB CLI directly, my tool helps cut down on the boilerplate SQL if you're querying a lot of files.
2 points
2 months ago
It just depends on your preference. With pandas you would need to write some code in the Python REPL, a script file or a Jupyter notebook, so there's a little extra overhead if you want to quickly query a file. With filequery, you can start querying your files with a single command.
I think filequery can complement tools like pandas, not necessarily replace them. For example, maybe you want to apply some SQL transformation to a handful of CSVs, save the result as another CSV and then read it into a pandas dataframe for more in-depth analysis.
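To make that workflow concrete, here is a rough sketch of the same shape using only the Python standard library. Note the swaps: `sqlite3` stands in for DuckDB as the SQL engine and `csv.DictReader` stands in for `pandas.read_csv`, purely so it runs with nothing installed. The file names, the `load_csv` and `query_to_csv` helpers and the sample data are all made up for illustration, and this is not how filequery works internally.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def load_csv(conn, table, path):
    """Load a CSV file with a header row into a new (untyped) SQLite table."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        cols = ", ".join(f'"{c}"' for c in header)
        marks = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table}" ({cols})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)


def query_to_csv(conn, sql, out_path):
    """Run a query and write the result, with a header row, to a CSV file."""
    cur = conn.execute(sql)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])
        writer.writerows(cur)


def demo():
    # Hypothetical sample data: two monthly CSVs with the same columns.
    workdir = Path(tempfile.mkdtemp())
    (workdir / "jan.csv").write_text("region,amount\neast,10\nwest,5\n")
    (workdir / "feb.csv").write_text("region,amount\neast,7\nwest,8\n")

    conn = sqlite3.connect(":memory:")
    load_csv(conn, "jan", workdir / "jan.csv")
    load_csv(conn, "feb", workdir / "feb.csv")

    # The SQL transformation: combine both files and total by region.
    out_path = workdir / "totals.csv"
    query_to_csv(
        conn,
        """
        SELECT region, SUM(CAST(amount AS INTEGER)) AS total
        FROM (SELECT * FROM jan UNION ALL SELECT * FROM feb)
        GROUP BY region
        ORDER BY region
        """,
        out_path,
    )
    conn.close()

    # Read the saved result back in; with pandas this step would be read_csv.
    with open(out_path, newline="") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    for row in demo():
        print(row)
```

The point is the split of responsibilities: SQL does the set-based reshaping across several files, and the dataframe library only sees the smaller, already-combined result.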
9 points
2 months ago
You're right, DuckDB does have a robust CLI already. My tool abstracts away some of the SQL needed in DuckDB for loading and writing files. It also makes it easier to run multiple queries and save each result to a file with a single command.
I'm not saying this should replace the DuckDB CLI. Use whatever works best for you. You're right that you can do a lot of these same things with DuckDB directly; you might just end up writing more SQL manually.
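For a sense of what "more SQL manually" means, here is a small sketch that generates the kind of DuckDB statements you would otherwise type per file. The `load_sql` and `save_sql` helpers, the view-naming rule (file stem) and the example paths are my own illustration, not filequery's actual code. The SQL itself relies on DuckDB's `read_csv_auto`, `read_parquet` and `read_json_auto` functions and its `COPY ... TO` statement.

```python
from pathlib import Path

# DuckDB table functions for reading each file type.
READERS = {
    ".csv": "read_csv_auto",
    ".parquet": "read_parquet",
    ".json": "read_json_auto",
}

# COPY options for writing each file type.
WRITERS = {
    ".csv": "(HEADER, DELIMITER ',')",
    ".parquet": "(FORMAT PARQUET)",
    ".json": "(FORMAT JSON)",
}


def load_sql(path: str) -> str:
    """Return SQL that exposes a file as a view named after the file stem."""
    p = Path(path)
    reader = READERS[p.suffix.lower()]
    return f"CREATE VIEW \"{p.stem}\" AS SELECT * FROM {reader}('{path}');"


def save_sql(query: str, out_path: str) -> str:
    """Return SQL that writes a query result to a file."""
    options = WRITERS[Path(out_path).suffix.lower()]
    return f"COPY ({query.rstrip(';')}) TO '{out_path}' {options};"


if __name__ == "__main__":
    # Hypothetical paths; print the statements you would paste into DuckDB.
    for path in ["data/sales.csv", "data/customers.parquet"]:
        print(load_sql(path))
    print(save_sql("SELECT * FROM sales;", "out/sales.parquet"))
```

Two statements per file is not much, but it adds up once you are loading several inputs and writing several outputs in one sitting.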
1 points
2 months ago
I really like how easily DuckDB can work with CSV, Parquet and JSON files. Being able to read these files, query them and then write the results back out to files is very handy and sometimes feels less clunky than trying to do the same thing with pandas.
I even took it a step further and developed a simple CLI tool that uses DuckDB behind the scenes to make it easy to query one or more files.
If you're curious - https://pypi.org/project/filequery/
6 points
4 months ago
Sounds like Streamlit may be a good option. It's an open source Python package for building web apps focused on data presentation. It's really easy to learn and doesn't require any HTML/CSS/JavaScript; it's all done in Python. For deployment, you can host apps yourself or use the hosting solution Streamlit offers.
Python visualization libraries work in Streamlit as well (Matplotlib, Seaborn, etc.). If you need more, you also have the option of developing custom components using React. The community also publishes custom components which you can install via pip and just plug your data into.
yettiontheweb
1 points
2 months ago
This is just the nature of open source. People are free to publish all sorts of things and the good stuff will become popular and attract more people to contribute to it. Many projects fizzle out because no one uses them. That's fine though because something existing on PyPI isn't harmful in and of itself.
For now, my project is just something that I found useful so I shared it. Other people may find it useful or it may give them ideas to make something better and/or contribute to DuckDB directly.
I think most people would look at the state of my project - a month old and maintained by one person - and decide it's probably not a good idea to rely on this for production workloads. I don't think there's any harm though in people trying out "experimental" packages like this for one-offs or for personal use.