Installation

xlsxtream is a module for reading tabulated data from .xlsx (and, in general, OOXML) workbooks using node.js streams API.

Installation

npm install xlsxtream

Usage

const xlsx = require ('xlsxtream')

const wb = await xlsx.open ('/path/to/book.xlsx')

const ws = wb.sheetByName.Sheet1
// const ws = workbook.sheets [0] // WARNING: physical order

// just measuring 
const lastRow = await ws.getLastRow ()
if (lastRow.index > 10000) throw Error ('Size limit exceeded')

// extracting text, easy
const stringArrays = await ws.getObjectStream ()
for await (const [A, B, C] of stringArrays) {
  console.log (`A=${A}, B=${B}, C=${C}`)
  console.log (` (read ${worksheet.position} of ${worksheet.size} bytes)`)
}

// advanced processing
const rows = await ws.getRowStream ()
for await (const row of rows) {    // see https://github.com/do-/xlsxtream/wiki/Row
//console.log (`Logical #${row.index}, physical #${row.num}`)
  for (const cell of row.cells) {  // see https://github.com/do-/xlsxtream/wiki/Cell
    if (cell === null) {
      // empty cell, missing from XML
    }
    else {
      // process cell.index, cell.valueOf () etc.
    }
  }
}

Rationale

Though exceljs is the standard de facto for working with .xlsx files in the node.js ecosystem, it appears to be a bit too resource greedy in some applications. Specifically, when it takes to perform a simple data extraction (like converting to CSV) from a huge workbook (i. e. millions of rows).

The original exceljs API is synchronous and presumes the complete data tree to be loaded in memory, which is totally inappropriate for such tasks. Since some point in time, the Streaming I/O is implemented, but, as of writing this line, it still operates on a vast amount of data that may not be used at all.

The present module, xlsxtream takes the opposite approach: it uses the streams API from the ground up, focuses on keeping the memory footprint as little as possible and letting the developer avoid every single operation not required by the task to perform. To this end, xlsxtream:

reads individual files from ZIP archives with node-stream-zip;
transforms them into filtered streams of XML nodes with xml-toolkit;
provides lazy iterators to access just the necessary cells' data.

Name		Name	Last commit message	Last commit date
Latest commit History 49 Commits
.github/workflows		.github/workflows
__data__		__data__
__tests__		__tests__
badges		badges
lib		lib
.gitignore		.gitignore
README.md		README.md
index.js		index.js
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Installation

Usage

Rationale

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

do-/xlsxtream

Folders and files

Latest commit

History

Repository files navigation

Installation

Usage

Rationale

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages