Use a database as a backend for JabRef library management

**Is your suggestion for improvement related to a problem? Please describe.**

Currently, JabRef struggles with libraries that have over 1000 entries (<https://github.com/JabRef/jabref/issues/10209>).

Short reason and solution: JabRef keeps all information of an open library in RAM, so it needs a mechanism for managing large amounts of data. This is a perfect use case for databases!

Longer issue description: look at how JabRef manages libraries and entries:

1. Load `.bib` file.
2. Convert `.bib` file into `BibDatabase` (with `BibDatabaseContext`) and `BibEntry`. Those are Java objects that are stored in RAM.
3. Manipulate library with those objects.
4. Save those objects into a `.bib` file.

So, JabRef's original philosophy is to be **a file editor**. However, with this approach the whole library has to fit into the JVM heap, which is limited, so a giant library simply runs out of memory.
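The four steps above can be sketched in a few lines. This is only an illustration of the "file editor" lifecycle, written in Python for brevity (JabRef itself is Java). `Entry`, `load`, and `save` are hypothetical names, not JabRef's API, and the parser handles only a tiny subset of BibTeX (flat `{...}` values, no `@string`, no nested braces).

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical stand-in for a BibEntry: one object per entry, held in RAM."""
    entry_type: str
    citation_key: str
    fields: dict = field(default_factory=dict)


# Simplified patterns (assumption): each entry ends with a line containing only "}".
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*?)\n\}", re.DOTALL)
FIELD_RE = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")


def load(text: str) -> list:
    """Steps 1 and 2: read the whole file and turn it into in-memory objects."""
    entries = []
    for entry_type, key, body in ENTRY_RE.findall(text):
        fields = {name.lower(): value for name, value in FIELD_RE.findall(body)}
        entries.append(Entry(entry_type.lower(), key, fields))
    return entries


def save(entries: list) -> str:
    """Step 4: serialize every object back into .bib text."""
    chunks = []
    for entry in entries:
        lines = [f"  {name} = {{{value}}}," for name, value in entry.fields.items()]
        header = "@" + entry.entry_type + "{" + entry.citation_key + ","
        chunks.append(header + "\n" + "\n".join(lines) + "\n}")
    return "\n\n".join(chunks) + "\n"
```

Step 3 is then plain object manipulation, for example `library[0].fields["year"] = "1984"`. The point of the sketch is that `load` returns the complete library as a list of objects, so memory use grows with the number of entries.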

**Describe the solution you'd like**

JabRef should have a mechanism for managing a lot of data and use it for storing and manipulating libraries.

This is the purpose of databases! A DBMS will also cache data: a typical DBMS stores data in pages. Some pages are kept in RAM, the others stay on disk. This is a perfect solution for giant libraries, as you are no longer limited by RAM, but by the space on your HDD/SSD!
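As a rough illustration of this idea, the sketch below stores entries in an on-disk database and scans them in batches, so only a bounded number of rows is held by the application at any time. It uses Python's standard-library `sqlite3` module purely as a stand-in for an embedded DBMS. The table layout and the function names `build_library` and `count_matching` are assumptions made for this example, not a proposed design.

```python
import sqlite3


def build_library(path: str, n_entries: int) -> None:
    """Write n_entries rows into an on-disk database file."""
    con = sqlite3.connect(path)
    with con:  # commits the transaction on success
        con.execute(
            "CREATE TABLE entry (id INTEGER PRIMARY KEY, citation_key TEXT, title TEXT)"
        )
        con.executemany(
            "INSERT INTO entry (citation_key, title) VALUES (?, ?)",
            ((f"key{i}", f"Title number {i}") for i in range(n_entries)),
        )
    con.close()


def count_matching(path: str, needle: str, batch_size: int = 500) -> int:
    """Scan all titles batch by batch instead of loading the library at once."""
    con = sqlite3.connect(path)
    # A negative cache_size is a limit in KiB: cap SQLite's page cache at about 2 MiB.
    con.execute("PRAGMA cache_size = -2000")
    cur = con.execute("SELECT title FROM entry")
    hits = 0
    while True:
        batch = cur.fetchmany(batch_size)  # at most batch_size rows in memory
        if not batch:
            break
        hits += sum(1 for (title,) in batch if needle in title)
    con.close()
    return hits
```

The DBMS decides which pages stay in its cache; the application only sees the rows of the current batch. With Postgres the same pattern would go through a database driver and a server-side cursor instead.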

Moreover, a DBMS lets you query data quickly and expressively. Here is one place where SQL can be used: <https://github.com/JabRef/jabref/issues/10209#issuecomment-2376534212>. Search functionality is also a perfect use case for databases.
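To make the querying argument concrete, here is a minimal sketch of how entries could be stored and searched with SQL. The two-table layout (one row per entry, one row per field) is only an assumption for illustration; designing the real schema is part of the project. SQLite via Python's standard library is used so the example runs without extra dependencies. The names `open_library`, `add_entry`, and `keys_where` are hypothetical.

```python
import sqlite3

# Hypothetical schema: fields are stored as (entry, name, value) rows because
# BibTeX/BibLaTeX entries may carry arbitrary fields.
SCHEMA = """
CREATE TABLE entry (
    id           INTEGER PRIMARY KEY,
    entry_type   TEXT NOT NULL,
    citation_key TEXT NOT NULL UNIQUE
);
CREATE TABLE field (
    entry_id INTEGER NOT NULL REFERENCES entry(id),
    name     TEXT NOT NULL,
    value    TEXT NOT NULL,
    PRIMARY KEY (entry_id, name)
);
CREATE INDEX field_name_value ON field (name, value);
"""


def open_library() -> sqlite3.Connection:
    """Create an empty in-memory library with the sketch schema."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    return con


def add_entry(con, entry_type: str, citation_key: str, fields: dict) -> None:
    """Insert one entry and its fields in a single transaction."""
    with con:
        cur = con.execute(
            "INSERT INTO entry (entry_type, citation_key) VALUES (?, ?)",
            (entry_type, citation_key),
        )
        con.executemany(
            "INSERT INTO field (entry_id, name, value) VALUES (?, ?, ?)",
            [(cur.lastrowid, name, value) for name, value in fields.items()],
        )


def keys_where(con, name: str, pattern: str) -> list:
    """Return citation keys of entries whose field `name` matches a LIKE pattern."""
    rows = con.execute(
        "SELECT e.citation_key FROM entry e "
        "JOIN field f ON f.entry_id = e.id "
        "WHERE f.name = ? AND f.value LIKE ? "
        "ORDER BY e.citation_key",
        (name, pattern),
    )
    return [key for (key,) in rows]
```

A call such as `keys_where(con, "author", "%Knuth%")` lets the database do the filtering and return only the matching keys, instead of walking over every entry object in application code.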

**Additional context**

This is planned as a GSoC project. Beware, while this project is quite important for JabRef, it might turn out to be very complex.

We aim for a relational DBMS like SQLite, DuckDB, or Postgres. In particular, we want the database to be embedded.

In fact, we **want Postgres to be our backend**, as Postgres has powerful capabilities for search. It can actually be used as an embedded database; check out this library: <https://github.com/zonkyio/embedded-postgres>.

Here are some materials for this project:
- Postgres: <https://www.postgresql.org/>.
- Other databases you might consider (though, Postgres is preferable):
    - DuckDB: <https://duckdb.org/>. It seems promising too: it can be called via JNI and thus could save a separate database process, see <https://github.com/duckdb/duckdb-java/blob/main/src/jni/duckdb_java.cpp#L25>.
    - SQLite: <https://www.sqlite.org/>.
    - H2: <https://www.h2database.com/html/main.html>.
    - HSQLDB: <https://hsqldb.org/>.
- BibTeX and BibLaTeX (you can use this information to design the schema of the DB):
    - Internals of BibTeX: <https://polish-mirror.evolution-host.com/ctan/biblio/bibtex/base/btxdoc.pdf>.
    - Internals of BibLaTeX: <https://mirrors.ibiblio.org/CTAN/macros/latex/contrib/biblatex/doc/biblatex.pdf>.
- How Zotero internally stores data: <https://github.com/zotero/zotero/blob/main/resource/schema/userdata.sql>.
- Use Postgres as an embedded database: <https://github.com/zonkyio/embedded-postgres>.
- Take a look at JabRef's code:
    - Search functionality: <https://github.com/JabRef/jabref/blob/main/src/main/java/org/jabref/model/search/PostgreConstants.java#L6>. (It already uses embedded Postgres).
    - Shared database: <https://github.com/JabRef/jabref/blob/main/src/main/java/org/jabref/logic/shared/PostgreSQLProcessor.java> (schemas, etc.)

Use a database as a backend for JabRef library management #12708
