Description
Is your suggestion for improvement related to a problem? Please describe.
Currently, JabRef struggles with libraries that have over 1000 entries (#10209).
Short reason and solution: JabRef stores all information in RAM. JabRef needs a mechanism to manage lots of data. This is a perfect use case for databases!
Longer issue description: look at how JabRef manages libraries and entries:
- Load `.bib` file.
- Convert `.bib` file into `BibDatabase` (with `BibDatabaseContext`) and `BibEntry`. Those are Java objects that are stored in RAM.
- Manipulate library with those objects.
- Save those objects into a `.bib` file.
So, JabRef's original philosophy is to be a file editor. However, when you have a giant library, you just don't have enough JVM heap. It is limited.
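To make the memory problem concrete, here is a minimal sketch of the "everything lives in objects" workflow described above. `SimpleEntry` and `SimpleLibrary` are hypothetical, heavily simplified stand-ins invented for this illustration; they are not JabRef's real `BibEntry` or `BibDatabase` classes, and parsing is left out entirely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; NOT JabRef's real classes.
public class InMemoryLibrarySketch {

    /** One entry: type, citation key, and all fields, held on the JVM heap. */
    static final class SimpleEntry {
        final String type;
        final String citationKey;
        final Map<String, String> fields = new LinkedHashMap<>();

        SimpleEntry(String type, String citationKey) {
            this.type = type;
            this.citationKey = citationKey;
        }

        /** Serializes this entry back to BibTeX-like text. */
        String toBibtex() {
            StringBuilder sb = new StringBuilder("@" + type + "{" + citationKey + ",\n");
            fields.forEach((name, value) ->
                    sb.append("  ").append(name).append(" = {").append(value).append("},\n"));
            return sb.append("}\n").toString();
        }
    }

    /** The whole library is a list of entry objects, so heap use grows with library size. */
    static final class SimpleLibrary {
        final List<SimpleEntry> entries = new ArrayList<>();

        String toBibtex() {
            StringBuilder sb = new StringBuilder();
            for (SimpleEntry entry : entries) {
                sb.append(entry.toBibtex()).append("\n");
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        SimpleLibrary library = new SimpleLibrary();

        // "Load" and "convert": every entry becomes an object in RAM.
        SimpleEntry entry = new SimpleEntry("article", "knuth1984");
        entry.fields.put("author", "Donald E. Knuth");
        entry.fields.put("title", "Literate Programming");
        library.entries.add(entry);

        // "Manipulate": edits happen on the in-memory objects.
        entry.fields.put("year", "1984");

        // "Save": the whole object graph is written back out as text.
        System.out.print(library.toBibtex());
    }
}
```

With this design, every entry and every field string stays on the heap for as long as the library is open, which is why very large libraries run into the JVM heap limit.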
Describe the solution you'd like
JabRef should have a mechanism for managing a lot of data and use it for storing and manipulating libraries.
This is the purpose of databases! A DBMS will also cache data: a typical DBMS stores data in pages. Some pages are kept in RAM, some are offloaded to disk. This is a perfect solution for giant libraries, as you are no longer limited by RAM, but by the space on your HDD/SSD!
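As a rough illustration of the paging idea (not how any particular DBMS implements its cache), the sketch below keeps a bounded number of pages in memory and evicts the least recently used one. `LruPageCache` is a hypothetical name, and LRU is just one possible eviction policy chosen for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy illustration of a bounded in-memory page cache; names are hypothetical.
public class PageCacheSketch {

    /** Holds at most `capacity` pages in RAM, keyed by page number. */
    static final class LruPageCache extends LinkedHashMap<Integer, byte[]> {
        private final int capacity;

        LruPageCache(int capacity) {
            // accessOrder = true: iteration order runs from least to most recently used.
            super(16, 0.75f, true);
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<Integer, byte[]> eldest) {
            // A real DBMS would write a modified page to disk before dropping it;
            // this sketch simply discards the least recently used page.
            return size() > capacity;
        }
    }

    public static void main(String[] args) {
        LruPageCache cache = new LruPageCache(2);
        cache.put(1, new byte[4096]);
        cache.put(2, new byte[4096]);
        cache.get(1);                 // page 1 is now the most recently used
        cache.put(3, new byte[4096]); // evicts page 2, the least recently used
        System.out.println(cache.keySet());
    }
}
```

The point is only that memory use is bounded by the cache capacity rather than by the size of the library.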
Moreover, a DBMS lets you run fast and powerful queries over the data. Here is one place where SQL can be used: #10209 (comment). Search functionality is also a perfect case for databases.
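To give a feel for what querying could look like, here is a sketch using plain JDBC. The two-table layout (`entry` plus `field`), all column names, and the helper methods are assumptions made up for this example; they are not JabRef's existing schema and not a proposal for the final design. Running it requires a JDBC driver for the chosen database on the classpath, and the `Connection` is expected to be supplied by the caller.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical schema and helper names; this is not JabRef's actual database layout.
public class LibrarySchemaSketch {

    // One row per entry.
    static final String CREATE_ENTRY =
            "CREATE TABLE entry ("
            + " id INTEGER PRIMARY KEY,"
            + " entry_type TEXT NOT NULL,"
            + " citation_key TEXT)";

    // One row per (entry, field) pair, so arbitrary BibTeX fields fit without schema changes.
    static final String CREATE_FIELD =
            "CREATE TABLE field ("
            + " entry_id INTEGER NOT NULL REFERENCES entry(id),"
            + " field_name TEXT NOT NULL,"
            + " field_value TEXT,"
            + " PRIMARY KEY (entry_id, field_name))";

    // Finds citation keys of entries whose given field matches a LIKE pattern.
    static final String FIND_BY_FIELD =
            "SELECT e.citation_key FROM entry e"
            + " JOIN field f ON f.entry_id = e.id"
            + " WHERE f.field_name = ? AND f.field_value LIKE ?";

    /** Creates both tables on the given connection. */
    static void createSchema(Connection connection) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            statement.execute(CREATE_ENTRY);
            statement.execute(CREATE_FIELD);
        }
    }

    /** Returns citation keys of entries where `fieldName` matches `pattern`, e.g. "%Knuth%". */
    static List<String> findCitationKeys(Connection connection, String fieldName, String pattern)
            throws SQLException {
        List<String> keys = new ArrayList<>();
        try (PreparedStatement query = connection.prepareStatement(FIND_BY_FIELD)) {
            query.setString(1, fieldName);
            query.setString(2, pattern);
            try (ResultSet rows = query.executeQuery()) {
                while (rows.next()) {
                    keys.add(rows.getString("citation_key"));
                }
            }
        }
        return keys;
    }
}
```

For example, `findCitationKeys(connection, "author", "%Knuth%")` would let the database do the filtering, so only the matching keys need to be materialized as Java objects.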
Additional context
This is planned as a GSoC project. Beware, while this project is quite important for JabRef, it might turn out to be very complex.
We aim for a relational DBMS such as SQLite, DuckDB, or Postgres. In particular, we want the database to be embedded.
In fact, we want Postgres to be our backend, as Postgres has powerful capabilities for search. It can actually be used as an embedded database; check out this library: https://github.com/zonkyio/embedded-postgres.
Here are some materials for this project:
- Postgres: https://www.postgresql.org/.
- Other databases you might consider (though Postgres is preferable):
  - DuckDB: https://duckdb.org/ -- seems promising too. Can do JNI and thus could save a process: https://github.com/duckdb/duckdb-java/blob/main/src/jni/duckdb_java.cpp#L25
  - SQLite: https://www.sqlite.org/.
  - H2: https://www.h2database.com/html/main.html.
  - HSQLDB: https://hsqldb.org/.
- BibTeX and BibLaTeX (you can use this information to design the schema of the DB):
  - Internals of BibTeX: https://polish-mirror.evolution-host.com/ctan/biblio/bibtex/base/btxdoc.pdf.
  - Internals of BibLaTeX: https://mirrors.ibiblio.org/CTAN/macros/latex/contrib/biblatex/doc/biblatex.pdf.
- How Zotero internally stores data: https://github.com/zotero/zotero/blob/main/resource/schema/userdata.sql.
- Use Postgres as an embedded database: https://github.com/zonkyio/embedded-postgres.
- Take a look at JabRef's code:
  - Search functionality: https://github.com/JabRef/jabref/blob/main/src/main/java/org/jabref/model/search/PostgreConstants.java#L6. (It already uses embedded Postgres.)
  - Shared database: https://github.com/JabRef/jabref/blob/main/src/main/java/org/jabref/logic/shared/PostgreSQLProcessor.java (schemas, etc.)