ANN: Tabular Asa (dataframes for Racket) : Racket

submitted by

Style Pass

2021-08-09 13:00:10

Tabular Asa is an efficient, immutable, column-oriented dataframe implementation for Racket. It supports b-tree indexes (and scanning), generic sorting, joining (inner and outer), grouping, and aggregating. It can also read and write CSV and JSON (columns, records, and lines).
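For readers unfamiliar with the terms, here is a rough Python sketch of what "column-oriented" and "immutable" can mean for a dataframe, with a simple sort and group-and-aggregate on top. It is not Tabular Asa's API or implementation: the `Table` class and every method name in it are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from types import MappingProxyType


class Table:
    """Toy column-oriented, immutable table (hypothetical, for illustration only)."""

    def __init__(self, columns):
        # Each column is stored as its own tuple, so values cannot be changed in place.
        cols = {name: tuple(values) for name, values in columns.items()}
        lengths = {len(v) for v in cols.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self._columns = MappingProxyType(cols)  # read-only view of the column map
        self._length = lengths.pop() if lengths else 0

    def __len__(self):
        return self._length

    def column(self, name):
        return self._columns[name]

    def with_column(self, name, values):
        # "Modifying" builds a new table and leaves the original untouched.
        new_cols = dict(self._columns)
        new_cols[name] = tuple(values)
        return Table(new_cols)

    def sort_by(self, name):
        # Compute one row permutation from the key column, then apply it to every column.
        order = sorted(range(len(self)), key=lambda i: self._columns[name][i])
        return Table({n: [col[i] for i in order] for n, col in self._columns.items()})

    def group_aggregate(self, key, value, fn):
        # Bucket the value column by the key column, then reduce each bucket with fn.
        groups = defaultdict(list)
        for k, v in zip(self._columns[key], self._columns[value]):
            groups[k].append(v)
        return {k: fn(vs) for k, vs in groups.items()}


t = Table({"city": ["Oslo", "Rome", "Oslo"], "sales": [3, 5, 4]})
totals = t.group_aggregate("city", "sales", sum)   # total sales per city
by_sales = t.sort_by("sales")                      # new table; t is unchanged
```

Storing each column contiguously is what makes scans and aggregations over a single column cheap, and immutability means derived tables can be created without worrying about callers mutating shared data.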

I plan on adding some more features in the near future, but it's at a good, stable place and I thought others in the community might find it useful.

This kind of thing has to be very performant to be useful, even for data sets that are only a few GB in size. Is it pure Racket? Have you benchmarked it against pandas?

Yes, this is written 100% in Racket. There are several reasons for that, one of which is the next step in this endeavor: using it as an instructional template for people interested in learning how to implement dataframes (read: a series of blog posts). For many programmers - especially young ones - libraries like Pandas and Spark are black magic. Revealing what's behind the curtain and why certain decisions are made (algorithms, cache usage, better parallelization, etc.) is very beneficial to them down the road.

I use Pandas, R, and Spark every day at work; I agree that performance is critical for packages like this. Being pure Racket, I wouldn't expect it to compete in the performance department. But - for Racket - it's not bad. On average it's roughly 3-6x slower than Pandas currently (some ops are 6x slower, others are only 2x). There are plenty of improvements still to be made when it comes to parallelizing many of the operations.
