The End of the CSV Nightmare: A Story of Working Efficiently

The engineer simply couldn’t take any more.

Throwing up her hands and pushing back from her IKEA desk with a clearly audible sigh of frustration, she stalked off to the coffee machine for the third (or was it fourth?) refill of her beloved Chewbacca mug.

“Hey, everything all right?” I asked.

“Column BG of this CSV file that I’m working on has this mysterious date value,” the engineer replied, with a worried look on her face. “The column header says ‘SysDate_42’ and the values are sometimes empty. I have no idea what it means!”

Although the above scene is fictional, it’s (unfortunately) a bit too close to home for Ellevation and many other companies working in the educational technology space. In other versions of this story, perhaps it’s tab-delimited instead of CSV, or “Battlestar Galactica” instead of “Star Wars,” but the main theme is all too common: devoting hours and hours to interpreting, mapping and transforming educational data in all formats.

At Ellevation, we live this tale every day. The success of our district partners and their EL students relies on our ability to effectively and correctly aggregate many disparate datasets that enable educators to quickly access important information. Our data challenges are both horizontal and vertical: we need to support the different use cases and business rules of over 300 partners in 30+ states, and we also require a comprehensive collection of student data including EL statuses, designations and milestones, assessment scores, extended demographic data, and much more.

Marry these challenges and we find ourselves trying to manage a complex matrix of data based on many different Student Information Systems and enough CSV files to choke an Imperial Star Destroyer. We’ve made significant investments to build a suite of internal data manipulation tools and frameworks that help with the transformation of these heterogeneous files. And these investments have paid off – we have learned how to do amazing work for our district partners. But scaling these resources to support so many different formats, and devoting the time to interpret and glean the business rules from unstructured data files, remains challenging.

Fortunately, we see a light on the horizon: the Ed-Fi Data Standard. The Ed-Fi Data Standard, aligned with the Common Education Data Standards (CEDS), also includes many resources for software vendors and educational institutions that will one day help to end the CSV nightmare. A few of the highlights:

The Ed-Fi Data Standard is continuously maintained and improved with input from the Ed-Fi community. Today marks a major milestone with the release of the next generation, the Ed-Fi Data Standard Version 2.0. We are very excited about the impact that these new features will have on our products, which ultimately improve student achievement in the classroom.
Growing adoption of the Ed-Fi Data Standard means we can focus more time on building great tools and services that leverage this important data instead of wading through a flood of delimited data files. I know the same holds true for our peers in the Ed-Tech community, whether they build an LMS, IEP, assessment or online gradebook product. Districts will continue to select “best-of-breed” online tools to complement their SIS, and getting these systems to “all talk to each other” requires adoption of a standard data format.

It’s especially encouraging to see increasing support among SIS platforms that export data using the Ed-Fi Data Standard, saving districts from having to generate yet another specialized CSV export based on a custom query.
We strongly recommend that any district vetting a SIS platform make compliance with Ed-Fi technology a requirement, including support for exporting comprehensive educational data in this format (again, more than just student name, grade level and school). Standards-based XML, provided by Ed-Fi technology, is simply better suited to correctly defining the complex relationships in educational data. For more information on vetting an education technology vendor, take a look at these 10 questions.
Chances are your SIS vendor is already planning to implement some early version of Ed-Fi technology, as many state education agencies, such as Texas’ TEA and their state-wide data system, begin to accept or even require compliance reporting data uploads to be in the Ed-Fi Data Standard format.

At Ellevation, we try to keep our evolving system architecture directionally correct, making use of as many effective and supported standards as possible. Realizing we can’t overhaul everything at once, we’ve started to carve out specific areas of our platform and refactor them to work with the Ed-Fi Data Standard. We’ve started a pilot program with a number of our partner districts in which we transition their existing CSV/tab-delimited/Excel-based automated uploads of student and staff data to use the Ed-Fi Data Standard instead.

So far, the results are promising: our partners have fewer custom CSV exports to implement, and instead of taking on the time-consuming task of mapping and transforming these files, we’ve modified our tools to parse several interchanges from the Ed-Fi Data Standard. This re-use enables us to focus on what’s most important, the content, rather than on the problems of an unstructured format, such as chasing down the exact meaning of the values in Column BG.
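To make the contrast concrete, here is a minimal Python sketch rather than a description of our actual tooling. The CSV content, the XML element names (such as ProgramEntryDate) and the helper functions are all hypothetical and heavily simplified; a real Ed-Fi interchange uses namespaces and a much richer schema. The point is only that a named element carries its own meaning, and a missing value is simply absent instead of being a mysterious blank cell.

```python
import csv
import io
import xml.etree.ElementTree as ET

# A custom CSV export: the header gives no hint about what the date means.
CSV_EXPORT = """StudentId,LastName,SysDate_42
1001,Organa,2014-08-25
1002,Solo,
"""

# Hypothetical, simplified XML loosely modeled on a student interchange.
# Element names are illustrative assumptions, NOT the exact Ed-Fi schema.
XML_EXPORT = """<InterchangeStudent>
  <Student>
    <StudentUniqueId>1001</StudentUniqueId>
    <Name><LastSurname>Organa</LastSurname></Name>
    <ProgramEntryDate>2014-08-25</ProgramEntryDate>
  </Student>
  <Student>
    <StudentUniqueId>1002</StudentUniqueId>
    <Name><LastSurname>Solo</LastSurname></Name>
  </Student>
</InterchangeStudent>
"""


def read_csv_rows(text):
    """Parse the CSV export into one dict per row, keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))


def read_students(text):
    """Parse the simplified XML into one dict per Student element."""
    root = ET.fromstring(text)
    students = []
    for node in root.iter("Student"):
        students.append({
            "student_id": node.findtext("StudentUniqueId"),
            "last_name": node.findtext("Name/LastSurname"),
            # findtext returns None when the element is absent.
            "program_entry_date": node.findtext("ProgramEntryDate"),
        })
    return students


if __name__ == "__main__":
    for row in read_csv_rows(CSV_EXPORT):
        print("CSV:", row["StudentId"], repr(row["SysDate_42"]))
    for student in read_students(XML_EXPORT):
        print("XML:", student["student_id"], student["program_entry_date"])
```

With the CSV, someone still has to ask the district what SysDate_42 means and whether an empty string is intentional. With the structured version, the same parsing code can be reused for every district that exports in the shared format.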

Moreover, we continue to aggressively adopt Ed-Fi technology internally. Our standardized test schema mirrors assessments defined in the Ed-Fi Data Standard. And when it came time to refactor our student schedule and teacher/staff modules, we based our data model directly on the Ed-Fi Data Standard. This will greatly streamline our data integration efforts in the months to come.

To be clear: we’re still in the early stages of adoption. Continuing to maintain momentum across SIS vendors and educational technology vendors is critical. In the short term, it can be difficult to allocate resources to enable support for Ed-Fi technology instead of developing a shiny new feature. Providing data that adheres to the strict Ed-Fi Data Standard can be a challenge, especially for proprietary relational database models. Over time, these investments will allow vendors to get back to developing new features and focus less on custom installations.

For the standard to be most effective, both the data publishers and the subscribers have to be on board: your usual chicken-and-egg scenario. However, at Ellevation we continue to be quite optimistic about the potential of Ed-Fi technology and eagerly await the day when our coffee machine conversations revolve around Star Wars rather than CSV files.

Eric Wong, VP of Engineering, Ellevation
Eric Wong has over 15 years of experience in SaaS, data integration, enterprise and web application development. As Vice President of Engineering at Ellevation, he is responsible for the company’s technical and engineering operations. Prior to joining Ellevation, Eric was a software architect at NaviNet, the country’s largest real-time healthcare communications network. He has also held technical leadership and software engineering roles at several organizations, including EF Education, FairMarket (acquired by eBay) and Monster.com. Eric holds a Bachelor of Science degree in mechanical engineering from Cornell University.