Welcome to the Greene Laboratory

Onboarding Information

Mission Statement

We view our core purpose as the development of methodological advances and integrative systems that make analysis of big data, particularly gene expression data, as routine in wet-bench biology labs as PCR. To accomplish this, we will write good code, perform solid and reproducible analyses, and disseminate our results widely through approachable publications and webservers. We recognize that trust, both in the process and in our results, is of primary importance to the biologists that use our methods and webservers. Therefore, we strive to make our source code as open and accessible as possible. When we submit papers, we expect that the analytical code behind those papers will be something that we can be proud of. To these ends, we will provide reviewers and the scientific community with all source code required to generate figures in the paper that result from computational analyses.

Expectations

Your role: We expect that you will take primary responsibility for the success of your research project and career development. As a member of the lab, you are expected to participate fully in the team. When disagreements about methodological approaches arise, you recognize that these should be resolved through a solid and reproducible analysis of available data. In general, lab members are expected to be present from 9:30AM to 4:00PM on weekdays to facilitate discussion within the group. If you aren’t sure — ask.

Casey’s role: Casey’s goal is to facilitate your success as well as that of your project. Within your project, Casey will serve as a sounding board for ideas, will help you plan your project, and will help to devise experiments to test your hypotheses. To facilitate your success, Casey will help you to plan your training, to devise a career plan that can take you to where you want to go, to advise you on your project-risk portfolio, and to provide guidance on other elements of career and project development as needed.

Deadlines: Our lab has worked hard to develop a reputation for high-quality science that is well presented. We all benefit from this reputation, but we must also work to maintain it. In order to maintain the quality of our lab’s output, we’ve established deadlines for various outputs. Each of these applies to sharing a complete version that the author deems ready for submission in the Greene Lab slack’s #general channel.

The specific deadlines for various types of outputs are:

  • Manuscripts must be shared two weeks before any deadlines.
  • Posters must be shared one week before the deadline for printing.
  • Scientific talks based on a submitted abstract must be shared one week before the presentation.
  • Meeting abstracts must be shared one week before the deadline for submission.

Lab members are given a two-day period to provide feedback on the document. We expect that authors will then revise the document to incorporate feedback provided within the initial two-day period. Authors are encouraged but not required to address feedback received after the initial two days, as it may not always be practicable.

This does not eliminate the need for all coauthors to approve a document. Coauthors are not required to provide their feedback within the two-day window. Coauthors can hold submission of a document until they approve; however, according to ICMJE guidelines extreme holds may result in a change from authorship to acknowledgement.

In the case that all feedback received within the two-day period has been addressed and all coauthors approve, the submission can proceed.

Failure to abide by these guidelines will result in missing the opportunity in question.
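As a rough illustration, the lead times listed above can be turned into share-by dates. This is a minimal sketch rather than a lab tool: the names `LEAD_TIMES` and `share_by` are hypothetical, and the calculation counts calendar days.

```python
"""Compute the latest date to share a draft (illustrative helper)."""
from datetime import date, timedelta

# Lead times from the list above; the dictionary keys are illustrative.
LEAD_TIMES = {
    "manuscript": timedelta(weeks=2),
    "poster": timedelta(weeks=1),
    "talk": timedelta(weeks=1),
    "abstract": timedelta(weeks=1),
}


def share_by(output_type, deadline):
    """Return the latest date to share a complete draft in #general.

    `deadline` is the submission, printing, or presentation date.
    """
    return deadline - LEAD_TIMES[output_type]


if __name__ == "__main__":
    print(share_by("manuscript", date(2024, 3, 15)))
```

The two-day feedback window starts once the draft is shared, so sharing earlier than the computed date leaves more time to revise.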

Code of Conduct: All members of the lab, along with visitors, are expected to agree with this code of conduct. We will enforce this code. We expect cooperation from all members to help ensure a safe environment for everybody. The lab is dedicated to providing a harassment-free experience for everyone, regardless of gender, gender identity and expression, age, sexual orientation, disability, physical appearance, body size, race, or religion (or lack thereof). We do not tolerate harassment of lab members in any form. Sexual language and imagery are generally not appropriate for any lab venue, including lab meetings, presentations, or discussions. However, do note that we work on biological matters so work-related discussions of e.g. animal reproduction are appropriate. Harassment includes offensive verbal comments related to gender, gender identity and expression, age, sexual orientation, disability, physical appearance, body size, race, religion, sexual images in public spaces, deliberate intimidation, stalking, following, harassing photography or recording, sustained disruption of talks or other events, inappropriate physical contact, and unwelcome sexual attention. Members asked to stop any harassing behavior are expected to comply immediately.

If you are being harassed, notice that someone else is being harassed, or have any other concerns, please contact Casey Greene immediately. If Casey is the cause of your concern, Dr. Deborah Hogan (Deborah.A.Hogan@dartmouth.edu) is a good informal point of contact; she does not work for Casey and has agreed to mediate. For official concerns, please see the University of Pennsylvania ombuds office. The code of conduct section is licensed under a Creative Commons Attribution 3.0 Unported License. Original source and credit: http://2012.jsconf.us/#/about & The Ada Initiative. Please help by translating or improving: http://github.com/leftlogic/confcodeofconduct.com.

We expect members to follow these guidelines at any lab-related event.

Authorship: Our lab follows the Perelman School of Medicine Authorship Policy. These guidelines are derived from ICMJE’s Uniform Requirements for Manuscripts Submitted to Biomedical Journals.

Ethics: We expect lab members to be honest in scientific communications both within and outside the lab. We expect that lab members will design experiments in a manner that minimizes both bias and self deception. We expect that lab members will keep agreements, be careful, and share their code and results openly with the scientific community. We expect that credit will be given where credit is due, including in scientific writing. Plagiarism is not tolerated. While a full enumeration of ethical considerations is outside of the scope of this document, Penn provides a handbook that we recommend. In addition, please don’t hesitate to raise any questions or concerns that you have at any point with Casey.

Communication

General

Slack: We use slack for rapid communication within the lab. If you’d send an e-mail to someone within the lab, try a slack message instead. This helps to keep communications in one place, and Casey commits to respond to slacks (not necessarily immediately, but the same guarantee is not made for e-mail).

Bonus.ly: We recognize that people regularly go above and beyond lab expectations. We wanted a way to recognize each other when this happens. We now use bonus.ly. This allows lab members to send a quick virtual thank you note and/or pat on the back. If someone’s paper gets accepted or someone helps you out with a programming question, congratulate or thank them. Slack includes /give syntax that you should explore (or /give someone a point for helping you). When one member accumulates enough bonus.ly points, they take the lab out to lunch (Casey pays).

Social Media: Lab members are encouraged to communicate through public social media, and if you choose to do so then you are expected to follow our code of conduct.

Projects: By the nature of our research, lab members will often have the opportunity to participate in projects managed via private or publicly accessible source code repositories. In these cases, lab members are expected to: follow the code of conduct; expect that private repositories will be world accessible; and communicate via the project-specific medium (e.g. if Rene reported an issue on a project on bitbucket, it would not be appropriate for Casey to reply “I’ll drop by your desk and show you how to solve that.”).

IP/Openness: This is handled in accordance with the instructions from our research sponsors and university guidance. Lab members must follow the Penn Participation Agreement and the agreements with our sponsors. These often allow, encourage, or require openness. If you have concerns at any point, set up a meeting with Casey to discuss these concerns.

Shared Calendar: There is a shared google calendar for members of the Greene lab. This has the time and location of group meeting, and is considered the most up to date information about individual availability. If you will be out of town for work or vacation, note this in the calendar.

Accounts: Lab members are expected to have accounts for the following and be members of the specified (organizations) if applicable:

  • BitBucket (GreeneLab)
  • GitHub (greenelab)
  • Google Calendar (Shared Calendar)
  • Slack (GreeneLab)
  • Bonus.ly (GreeneLab)
  • Dropbox (permanent members)

Meetings

Scrum: The scrum is a 10 minute or less meeting that is held every day. It is currently scheduled for 10:00 AM. The goal of the scrum is to communicate recent progress and objectives. The scrum is held both in person and via a google hangouts link posted to the Slack #general channel. Those who work partial schedules (part-time employees, undergraduate students) are only expected to scrum on days that they work. In each scrum, every lab member provides a short summary of:

  1. What specific item(s) he/she accomplished yesterday.
  2. What specific item(s) he/she plans to accomplish today.
  3. Who, if anyone, is blocking him/her?
  4. Who, if anyone, is he/she blocking?

Lab Meeting: Lab meeting is held weekly at a location at Penn and also via the google hangouts link used for scrum. Scheduling is managed via a google spreadsheet. See the #general slack channel’s pinned items link. Lab meeting consists of three components described below.

  • Journal Club
  • Braintrust
  • Applied Imagination

Journal Club: We have a 15 minute journal club to start each lab meeting. For journal club, prepare a presentation of 4 papers. All except for one should have been published since your last journal club presentation. The content you discuss - specifically your summary of the papers - should be the product of thoughtful analysis. The presentation itself should be simple. During the discussion, please share why you picked the paper, its implication for your research, and any potential implications that touch on other research that is ongoing in the lab. For each paper, the presentation should consist of:

  1. A title slide
  2. An overview slide (usually a flow-chart of some sort from the paper, could also be an initial result that sets context).
  3. The results figure that convinced you to pick this paper.

Braintrust: This is an opportunity to share anything that you wish to talk about with the group. This could be a confounding result, an interesting result, an analysis that isn’t working, a demo of a cool new technology etc. This is your chance to have the group focus on and help you solve a challenge that you’re facing or to share something interesting that you’ve discovered with the group. Scheduling is voluntary, but each member of the lab is expected to share at least once every three months.

Applied Imagination: One hour per month, lab meetings will be dedicated to big ideas, brainstorming, extended discussion outside the scope of weekly lab meeting, and other team endeavors. Topics can be big questions like “How do we get rid of dark pools of gene expression data?” or the time can be used to discuss new methods and how they fit in with the lab mission (e.g., adversarial networks). Individual lab members are expected to do some brief preparation before the meeting (e.g., read provided papers/materials, come with a few ideas on the topic). The monthly meeting itself consists of group brainstorming and/or discussion and wraps up with a list of action items for follow up.

Individual Meetings: We schedule weekly individual meetings. Once you join the lab, contact Casey to set up a time. These are set up for a term to accommodate class schedules. We don’t reschedule these meetings by default if one of the parties (Casey or you) is out of town, so if you want to meet in a week when travel conflicts with the usual time, contact Casey to reschedule. The goal of the weekly meeting is to:

  1. Discuss challenges.
  2. Plan strategy (project related, personal career, etc).

Source Code, Data, and Reproducibility

Pride: We expect lab members to sign their code. To quote from The Pragmatic Programmer, “Craftsmen of an earlier age were proud to sign their work. You should be, too... People should see your name on a piece of code and expect it to be solid, well written, tested, and documented.” While some code will be proof-of-concept code, it should be of a form that inspires confidence.

Language: We write code for our analyses in Python or R, which allows everyone in the lab to know two languages and understand analytical code. Code for visualization can be Python, R, or javascript. Webserver interface code uses javascript.

Licensing: We expect code that we produce to be licensed under a 3-clause BSD license. Unless a funding agency requires something different, we’ll use this. If you have questions or concerns about licensing, feel free to raise them in Slack.

Version Control Services: We have Greenelab accounts on both bitbucket and github. We expect that lab members will maintain their code in repositories under these team accounts. We do not want lab members to commit directly to these though. Instead commits happen as described below. We will only publish using code that is held in a Greenelab repository that has gone through the review process described below.

Creating a Greenelab Repository:

  1. Create a repository under the team account.
  2. Immediately fork this repository into one that your user account owns.
  3. Make commits to your own repository, and move code back to the Greenelab repository as described below.

Getting Code into Greenelab Repositories: Code moves from user repositories to Greenelab repositories through a process of code review. Code review is handled through pull requests. The process is described briefly below. Feel free to ask for guidance if you are uncomfortable with the process. We will revoke write access for failing to adhere to these rules.

  1. Make changes to your code and commit them in your own repository first.
  2. Create a pull request into the repository owned by Greenelab.
  3. Name potential reviewers for your pull request.
  4. Once at least one lab member has approved your pull request, you or a reviewer may merge your pull request. The only exception to this policy is this repository (“onboarding”) where, in addition to the above rules, Casey must also approve the pull request.

Composition of Pull Requests: Each pull request may contain one or more changesets. In keeping with good source control practice, each changeset or commit should contain all changes necessary for a particular fix or update. In addition, each pull request should relate to no more than one functional area in the code base you are updating. Keeping the pull request focused to one area makes it easier for your reviewers to provide thoughtful feedback.

Reviewing Pull Requests: We expect that all lab members will participate in review of pull requests. If you get named by the submitter, it’s courteous to review the request. We have created a checklist to facilitate review. As a reviewer, you are responsible for making sure that all checklist guidelines are followed.

Projects that didn’t work: We expect that repositories will contain failures (e.g. proof-of-concepts that didn’t work). This is ideal. Being able to find them will make sure we don’t make the same failure twice.

Non-Code Versioning: Non-code documents should be kept in a place that maintains version history (e.g. dropbox for word documents). We maintain a dropbox for business account for these purposes.

Data Management: For publicly available data, scripts used to download and process these data should be preserved, as should the versions of items used in processing (e.g. probe to gene mappings). These items should be version controlled. Where possible, intermediate files of reasonable size can be stored to facilitate re-use, but the process to regenerate these files from publicly available data should be preserved. When we generate data, they should be stored in a location where they are replicated and uploaded to the relevant database as soon as possible (e.g. GEO for gene expression, SRA for sequencing).
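One way to preserve the provenance described above is to have the download script write a small manifest that can be version controlled alongside it. The following is a minimal sketch under stated assumptions: the function names, the manifest layout, and the idea of recording a SHA-256 checksum per file are illustrative choices, not an established lab convention.

```python
"""Record where downloaded data came from and what was received.

Intended to be imported by a download script (hypothetical layout).
"""
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(downloads):
    """Map each file name to its source URL and checksum.

    `downloads` maps a local path to the URL it was fetched from.
    """
    manifest = {}
    for local_path, url in sorted(downloads.items()):
        manifest[Path(local_path).name] = {
            "source_url": url,
            "sha256": sha256_of(local_path),
        }
    return manifest


def write_manifest(downloads, manifest_path):
    """Write the manifest as sorted, indented JSON for clean diffs."""
    with open(manifest_path, "w") as handle:
        json.dump(build_manifest(downloads), handle, indent=2,
                  sort_keys=True)
        handle.write("\n")
```

Committing the manifest with the download script lets a later run confirm that the regenerated files match the ones originally analyzed.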

Reproducibility: We expect all lab members to maintain code that performs reproducible analyses. This can be in the form of makefiles, shell scripts, or other automation approaches that allow analyses to be automatically performed. We expect that these scripts, including those to generate figures in papers generated as a consequence of such analyses, will be included in source control repositories (see “Getting Code into Greenelab Repositories”) and made publicly available before or concurrent with the submission of a preprint (if submitted) or manuscript. Combined with the review guidelines, this means that all code must have been reviewed for these documents to be submitted.
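As one example of the "other automation approaches" mentioned above, a single driver script can run every step of an analysis in order. This is a minimal sketch: the step commands are placeholders that only print a message, and in a real repository they would be the paths of your own download, processing, and figure scripts.

```python
"""Run every analysis step in order (illustrative driver script).

Run from the command line: python run_analysis.py
"""
import subprocess
import sys

# Placeholder steps; replace with the scripts in your own repository.
STEPS = [
    [sys.executable, "-c", "print('download data')"],
    [sys.executable, "-c", "print('process data')"],
    [sys.executable, "-c", "print('make figures')"],
]


def run_steps(steps, runner=subprocess.run):
    """Run each command in order, stopping at the first failure.

    With the default runner, `check=True` raises CalledProcessError
    when a step exits with a non-zero status.
    """
    for command in steps:
        print("Running:", " ".join(command))
        runner(command, check=True)


if __name__ == "__main__":
    run_steps(STEPS)
```

A makefile gives the same guarantee and also skips steps whose outputs are up to date; the point in either case is that one command regenerates the results.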

How to Modify this Document

This is a living document. The repository is at Bitbucket. To make changes, fork, edit the files you wish, and create a pull request. The pull request process is handled as described in the Getting Code into Greenelab Repositories section of “Source Code, Data, and Reproducibility.”

Additional Resources

Development Tutorials

The Tribe and ADAGE web servers make use of the following software tools and frameworks. We have made note of documentation and tutorials found to be helpful. Please submit a pull request if you have additional resources that should be listed here!

Code Review Checklist

Pride: We expect lab members to sign their code. The code is signed.

Licensing: A LICENSE file is in the root of the repository.

Using Other Code: Code taken from elsewhere is properly acknowledged and compatible with the license.

Style Guide: Python code follows PEP 8. R code follows Google’s R Style Guide. JavaScript code follows Google’s JavaScript Style Guide. HTML and CSS follow Google’s HTML/CSS Style Guide. We expect that each person runs a linter (if you’re not sure – ask!) as part of their development environment.

Variable and Function Names: Variable names are descriptive and interpretable to someone looking at this code for the first time (e.g. not “a”, “b”, “x”, etc.).

File Commenting: Each file has a comment at the top to broadly describe its function and how it is expected to be used (e.g. imported, run from command line, both).

Function Comments: Each function has a docstring which reports the computation that it intends to implement, its arguments, and its return value(s).

In-line Commenting: At least 2 spaces are placed between in-line comments (#) and source code.
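The commenting items above can be seen together in one short file. This is an illustrative sketch only: the module, the function `mean_expression`, and the docstring layout are hypothetical, and any consistent docstring format that reports the computation, arguments, and return value would satisfy the checklist.

```python
"""Summarize per-gene expression values.

Usage: import this module and call `mean_expression`; it is not meant
to be run from the command line. (Module and names are illustrative.)

Author: Your Name
"""


def mean_expression(values_by_gene):
    """Compute the mean expression value for each gene.

    Arguments:
    values_by_gene -- dict mapping a gene identifier to a non-empty
                      list of numeric expression values

    Returns:
    dict mapping each gene identifier to the mean of its values
    """
    means = {}
    for gene, values in values_by_gene.items():
        means[gene] = sum(values) / len(values)  # two spaces before '#'
    return means
```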

Imports: All trivial imports are at the top of the file.

Column Length: Lines are 80 characters or fewer. This applies to all text under revision control with the exception of data files that must adhere to a particular file format that does not allow for line “folding” where necessary. This rule is already covered well in PEP 8 but called out here to clarify that we apply it to more than Python code. One reason for this is to aid in readability of diff output when performing code reviews.

Repositories may choose to specify a line limit up to 100 characters instead. If they choose to do so they must specify it within the README of the repository.
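In practice a linter reports overlong lines, but the rule itself is simple enough to sketch. The helper below is hypothetical and only illustrates the check, with the limit as a parameter so a repository that documents a 100-character limit in its README can pass that instead.

```python
"""Report lines longer than a repository's column limit (illustrative)."""

DEFAULT_LIMIT = 80  # the checklist default; a README may raise it to 100


def overlong_lines(text, limit=DEFAULT_LIMIT):
    """Return (line_number, length) pairs for lines longer than `limit`.

    Line numbers start at 1, and the trailing newline is not counted.
    """
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]
```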

Whitespace: There is no unnecessary whitespace.

Code with constants: Any constants are specified at the beginning of the file.

Code that uses a random seed [special case of constants]: Code that uses a random seed is reproducible. This means that the seed can be set and a default value is specified.
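The two items above, constants at the top of the file and a seed that can be set but has a default, might look like this in a small script. It is a sketch under assumptions: the script name, the gene list, and the `--seed` flag are illustrative.

```python
"""Draw a reproducible random sample of genes (illustrative script).

Run from the command line, e.g.: python sample_genes.py --seed 7
"""
import argparse
import random

# Constants are specified at the beginning of the file.
DEFAULT_SEED = 42
SAMPLE_SIZE = 3
GENES = ["geneA", "geneB", "geneC", "geneD", "geneE"]


def sample_genes(seed=DEFAULT_SEED, size=SAMPLE_SIZE):
    """Return `size` genes drawn without replacement using `seed`."""
    rng = random.Random(seed)  # local generator; global state untouched
    return rng.sample(GENES, size)


def parse_args(argv=None):
    """Parse command-line arguments; the seed has a default value."""
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--seed", type=int, default=DEFAULT_SEED)
    return parser.parse_args(argv)


if __name__ == "__main__":
    print(sample_genes(seed=parse_args().seed))
```

Using a local `random.Random` instance rather than the module-level functions keeps the result independent of any other code that draws random numbers in the same process.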

API error handling: APIs should catch and handle anticipated errors (e.g. key doesn’t exist, type mismatch in lookup) by identifying the source of the error (e.g. lookup failed with PK=XYZ) to the caller with as much precision as possible.
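A lookup that follows this guideline could be sketched as below. The exception class `RecordLookupError` and the function `get_record` are hypothetical names chosen for illustration, and the assumption that primary keys are integers is made only so the type-mismatch case can be shown.

```python
"""Look up records by primary key with precise errors (illustrative)."""


class RecordLookupError(LookupError):
    """Raised when a record cannot be retrieved; names what failed."""


def get_record(records, primary_key):
    """Return the record stored under `primary_key`.

    Raises RecordLookupError naming the key on a type mismatch or a
    missing key, so the caller can see exactly which lookup failed.
    """
    if not isinstance(primary_key, int):
        raise RecordLookupError(
            "lookup failed: PK must be an int, got "
            f"{type(primary_key).__name__} ({primary_key!r})"
        )
    try:
        return records[primary_key]
    except KeyError:
        raise RecordLookupError(
            f"lookup failed with PK={primary_key}: no such record"
        ) from None
```

Only the anticipated failures are caught here; unexpected exceptions still propagate unchanged so that they are not mislabeled as lookup problems.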

Deployment Checklist

  • If this deployment fixes a bug, a unit test has been written to check for regressions.
  • Unit tests have passed.
  • Documentation has been built and is ready for ReadTheDocs or an external provider.
  • All steps for deployment are written down or, ideally, fully automated.
  • All source files are under version control as described in “Source Code, Data, and Reproducibility.”
  • Deployment does not assume that code will be deployed from a default or master code branch. Ideally, the automated deployment scripts will accept a branch or specific revision as a parameter.
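The last item above, taking the branch or revision as a parameter, might be sketched as follows. The function names are hypothetical and the script only builds the git commands rather than running them; the revision is a required argument so that no default branch is assumed.

```python
"""Build the git commands to deploy a chosen revision (illustrative)."""
import argparse


def parse_args(argv=None):
    """Parse arguments; the revision is required, with no default."""
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "revision",
        help="branch, tag, or commit hash to deploy",
    )
    return parser.parse_args(argv)


def checkout_commands(revision):
    """Return the git commands that fetch and check out `revision`."""
    return [
        ["git", "fetch", "--all", "--tags"],
        ["git", "checkout", revision],
    ]
```

A deployment script would call `checkout_commands(parse_args().revision)` and pass each command to `subprocess.run` with `check=True` before running the remaining deployment steps.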

Infrastructure Guide

Compute Resources

Lab members are provided with a desktop. For rotation and undergraduate students, these machines may be shared. For full time lab members, the operating system should be reinstalled when the member joins. Large-scale computation and web hosting are performed via commercial cloud providers. Please discuss your project’s computing needs with Casey to best optimize resource usage.

Accounts

We have accounts with a number of key services. When a lab member joins, accounts should be created in:

In addition, lab members should have accounts with bitbucket and github. These accounts should get added to our:

Purchasing

There are two different purchasing procedures depending on whether or not the vendor is an approved university vendor. For approved vendors (check the Penn Marketplace), you need to fill out a pharmacology requisition form, and send it to phorders@mail.med.upenn.edu (Casey should be CCed on the email).

For other suppliers (such as Amazon.com) where a credit card purchase is required, you will need to:

  1. Fill out a paper copy of the p-card order form.
  2. Get Casey to sign the Your Signature area of the form, then hand it to Camie Minieri or Roz Rucker.
  3. Instead of listing every item on the form, you can write “See attached below” and the total sum only in the form. If you do so, please also send the details of your order (such as Amazon URLs) to Roz (rucker@upenn.edu), who will compile a detailed order form.

Reimbursement

Reimbursement is done using the Concur Business Travel & Expense Management Software Solution:

  • First, log in to Concur Expense using your PennKey. Under Profile Settings > Personal Information, fill in required information. Set the default Travel Approver to Jason Molli.
  • Ask Carmela (Camie) Minieri to make you eligible for expense reports on Concur, which takes at least a day for processing. You should then be able to submit expenses for reimbursement.

There is a mobile Concur app, which is useful for taking pictures of receipts. To set up the app, log in to Concur Expense online and go to Profile Settings > Mobile Registration. You should see your username. Then click Create a mobile PIN, which will allow you to specify a password that you can use to log in from the mobile app.

Tips for Newcomers to Penn and Philadelphia

1. Penn payroll system:

Penn offers two options for getting paid:

  • Direct deposit
  • ADP Aline Card

Most lab members use direct deposit to their US bank account. You can configure direct deposit here.

However, Penn can take over a month to activate direct deposit after you provide your bank account information. If your direct deposit isn’t active by the time payments are scheduled, you will be paid via an ADP Aline Card. Check U@Penn My Pay to see whether you were paid to your bank account or ADP.

The ADP Aline Card is a sort of enhanced debit card. For your first ADP payment, the physical card is mailed to your postal address on record. You can call ADP Account Services at 877-237-4321 to inquire whether your card is in transit.

If you receive an Aline Card, you can activate it and create an account online. From the website, you can then transfer the full debit card balance to a bank account. You can also withdraw cash from an automated teller.

Special considerations for Postdocs

Postdocs at Penn have unique tax considerations. Specifically, some postdocs may not have Federal taxes withheld from their paycheck even though they are responsible for these taxes. See the Biomedical Postdoctoral Council Tax Issues Page for more information.

2. Find a place to live:

  • Penn off-Campus Housing Search: https://offcampushousing.upenn.edu/
  • PadMapper: https://www.padmapper.com/ A smartphone app and website that combines listings from all of the major real estate databases and shows them on a map. Allows you to zoom in on specific neighborhoods, and refine results by date available, #bedrooms, price, etc.
  • Facebook Group: https://www.facebook.com/groups/453686588142698/ Often more useful than craigslist for newcomers who are looking for roommates as well as a place to live. (Not subsidized housing; mostly graduate students and young professionals looking to fill extra rooms or sublet entire apartments).

4. Public transportation:

  • Septa Bus Routes 30, 40, and 42 all have stops near Smilow.
  • LUCY Buses also stop near Smilow and are free to those with a Penn ID (details).
  • The Market-Frankford Line, 34th & Market St. station is ~0.6 mi walk to/from Smilow.
  • Paying for Septa: A ride costs $2.25 in cash or 1 token (1 token = $1.80, sold in packs of 2 or more). Monthly and weekly TransPasses and TrailPasses are available for purchase or for loading onto a Septa Key. The Penn Bookstore sells tokens and passes. Token vending machines can be found on campus in the basement of Houston Hall (on Spruce St, across from HUP) and the food court at 34th and Walnut St (next to Starbucks).
  • Penn Commuter Programs: Buy tokens, TransPasses, and TrailPasses at a discount.