This repository was archived by the owner on Jan 3, 2018. It is now read-only.

Quick feedback on Git lesson on 'open science' #712

@cboettig

Hi SWC,

I was just reading through the Open Science lesson under git and thought I would make a few notes to myself about things that could be improved. I can try to get around to a pull request on this, so this is somewhat of a to-do list of things I might tackle in it. Other suggestions, background, or push-back are (of course) welcome! They will help guide me before I make too many foolish edits.

This lesson appears to be about open source licensing and hosting, not about "Open Science" per se. (With the exception of the opening vignette that compares two workflows, highlighting all kinds of issues that will not actually be covered in this lesson.) Perhaps the title could be revised to reflect the focus on software licensing and hosting issues (which fits most naturally under the "git" section anyway). Consider:

  • changing the title
  • focusing the introductory vignette to highlight the value of open source licenses

Licensing

This section jumps in a bit too quickly for me. I'd suggest first

  • defining copyright,
  • stating that copyright exists independent of any license or claim about it,
  • and then making it clear that open source licensing is a process of waiving certain rights. This section must also first address the issue of who holds the copyright. For instance, in an academic setting, faculty and students typically own their copyrights, while staff researchers do not. These terms are set by the individual's contract, and they may limit which licenses the individual can use (for instance, many universities have restrictions on GPL v3 but not v2). The point is not to get into the weeds, but merely to signpost key issues.
  • The discussion of Creative Commons licenses is too detailed. (In my experience, students who see tables like the one shown try to memorize them, and may lose sight of the more important context.) The section does not mention that Creative Commons licenses are not appropriate for code, which is the clear focus of the discussion. I would remove this table and emphasize that CC licenses should be considered for publications and other creative works (blog posts), not for data or code.
  • If it's desirable to discuss CC licenses, the discussion should reflect the Budapest definition of Open Access, highlighting the fact that, of the CC licenses, only CC-BY meets that strict definition of open access publishing, though some publishers would like to define it otherwise.
  • Likewise, it is generally held that CC licenses are not appropriate for data. CC0, the declaration recommended for data by the Panton Principles and enforced by Dryad, is not mentioned at all. Instead the section mentions PD, which obscures the fact that Creative Commons finds it not so simple to make a multi-national public domain declaration and has created a specific tool for that, with a lengthy text (though it is not technically a license but a 'declaration').

Hosting

It's not entirely clear whether this section is talking about distributing code or about more general work. Since we're not covering data repositories, preprint servers, etc. in this section, I think it should be made clearer that the focus is on software hosting. I think this section could be much more concise and potentially more prescriptive: "While researchers frequently distribute code and software by hosting it on their own websites (either on a university or private server), hosting on a dedicated code repository has several advantages." It could then briefly mention link rot as the main issue, but also the versioning and issue/bug tracking tools available on the repositories you list (see, e.g., the JORS code repo requirements).

Conclusions

  • "People who incorporate GPL'd software into theirs must make theirs open": I feel this could easily lead a new user to think that if they use GPL software, they have to share their code openly online. The clause only affects them if they want to redistribute their code (e.g. as a binary).
  • I would edit the comment about the Creative Commons family of licenses to simply say: it does not apply to code (or data).
  • "Projects can be hosted on university servers, on personal domains, or on public forges" -> "projects hosted on public repositories are more likely to be still available in the future"
  • I would add a conclusion that: "open licenses are a way of choosing what rights you waive" and "one should always be clear who owns the copyright in the first place"
