Welcome to the lab! Here is some information, along with resources and tips, to help you get started.

Tag it in the Git repository.
Write release notes on GitHub.
Create a develop branch. After the first release, the master branch will always contain the latest release; develop will contain non-released commits.

See examples in /bin.

Lab cluster

The lab owns a compute cluster, intended primarily for reproducible performance measurements in a controlled environment. The cluster is currently administered by Valérie and Tristan.

How to get access

To get access, post your SSH public key and desired username in the #cluster Slack channel.

The login node is rs-loy-slashbin.concordia.ca, a.k.a. ct01. SSH access is available from the Concordia wireless and wired networks.
The example configuration below lets you ssh into the cluster through a proxy on the Concordia network. You can copy it into your ~/.ssh/config file.

# ~/.ssh/config
# Comments are kept on their own lines: older OpenSSH versions do not
# accept a trailing "# ..." after a value.

Host slashbin
    Hostname rs-loy-slashbin.concordia.ca
    # TODO Change to match yours
    User CLUSTER_USERNAME
    ProxyCommand ssh -q -W %h:%p encs
    # TODO Change to match yours
    IdentityFile ~/.ssh/slashbin

Host encs
    Hostname login.encs.concordia.ca
    DynamicForward 10101
    # TODO Change to match yours
    User ENCS_USERNAME
    # TODO Change to match yours
    IdentityFile ~/.ssh/encs

Compute nodes are accessed through the sbatch and salloc commands. Read the man pages if you have never used them.
Compute nodes have no internet access: only the login node can access hosts outside of the cluster.

Where to put data

Your home directory, located under /home, is mounted on the compute nodes. It has limited capacity and should primarily be used to store config files, programs or small data files.
Each compute node has 6 local disks of 450 GB each, mounted as /disk[0-5]. You can use them as you wish during your SLURM allocation, but data may be cleaned up once your allocation expires.
A shared (Lustre) file system of higher capacity is being configured.
No backup is or will be configured. Disk failure resulting in data loss may happen at any time. Make sure your important files are saved elsewhere.
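
The points above can be combined in a batch job that works on a local disk and copies its results to the home directory before the allocation ends. The following is only a minimal sketch under stated assumptions: the job name, the time limit, the /disk0/USER/job-ID layout and the ~/results directory are illustrative choices, not lab conventions, and it assumes you are allowed to create directories under /disk0. sbatch accepts scripts in any language with a shebang line, so the sketch is written in Python.

```python
#!/usr/bin/env python3
#SBATCH --job-name=scratch-demo
#SBATCH --nodes=1
#SBATCH --time=01:00:00
#SBATCH --output=scratch-demo-%j.out
"""Hypothetical Slurm batch script: work on a local disk, keep results in /home."""
import os
import shutil
from pathlib import Path


def scratch_dir(disk, job_id):
    """Return a per-user, per-job working directory on a local disk."""
    user = os.environ.get("USER", "unknown")
    return Path(disk) / user / f"job-{job_id}"


def run_job(disk, results_dir, job_id):
    """Write output on the local disk, then copy it somewhere persistent."""
    work = scratch_dir(disk, job_id)
    work.mkdir(parents=True, exist_ok=True)

    # Placeholder for the real computation.
    output = work / "result.txt"
    output.write_text(f"job {job_id} finished\n")

    # Local disks may be cleaned up once the allocation expires, so copy
    # what you need before the job ends.
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    saved = Path(shutil.copy2(output, results_dir))
    shutil.rmtree(work)  # leave the local disk clean for the next user
    return saved


# Only run the job when started by Slurm, which sets SLURM_JOB_ID.
if __name__ == "__main__" and "SLURM_JOB_ID" in os.environ:
    saved = run_job("/disk0", Path.home() / "results", os.environ["SLURM_JOB_ID"])
    print(f"results saved to {saved}")
```

If the file is saved as job.py (a hypothetical name), it would be submitted from the login node with `sbatch job.py`. Since the home directory has limited capacity and is not backed up, large or important results should also be saved elsewhere.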

DON'Ts

DON'T ssh directly from the login node to the compute nodes. Always use SLURM to make a reservation.
DON'T make unreasonable reservations before discussing them on #cluster. A reservation is unreasonable if it lasts longer than a week or requests more than one entire node.
DON'T run compute-intensive jobs on the login node.

Gallery

Overview (back)
From top to bottom:

2 network switches: these will allow us to dedicate a network to an experiment while still allowing other users to use the cluster.
8 compute nodes: each with 6 SSDs, 32 cores and 256GB of RAM.
1 control node: login node with external network access.
1 control node: Lustre metadata server.
4 storage nodes: each with 12 HDDs and 2 SSDs.

Compute nodes (front): [image: compute-back]

Control and storage nodes (front): [image: storage-back]

Compute Canada

Having a Compute Canada (CCDB) account gives you access to storage and computing resources on Compute Canada, in particular to our compute and storage allocation on beluga.computecanada.ca. Compute Canada is our primary platform for data processing.

To create a Compute Canada account:

Register here
Review and accept the Compute Canada Acceptable Use Policy (AUP)
Enter your user information. Use Tristan's CCRI: bwf-484-02
Submit your application
After 2-3 days, confirm the Group Member's Application
You can find more details here
Once your Compute Canada account is active (see procedure above), you can request a cloud account with this form. You may be contacted by email and asked who your PI is. In that case, reply to the email, indicate that Tristan (CCRI: bwf-484-02) is your PI, and ask for access to his allocation. Keep Tristan in cc of this email.

How to submit a PySpark job?

Documentation is available here

Printing

For all work-related printing needs, there's a printer available in EV 8.401.
You can connect to this printer over USB (in the lab) or through its web UI, by entering the hostname or IP address below in a web browser on a computer with a wired Concordia connection:

Hostname: pr-tidal.encs.concordia.ca
IP address: 132.205.98.160

Scientific methodology

Writing

Adopt a writing schedule as soon as possible and stick to it. Suggestion to start: 4 hours per week.
Suggestions on what to write:

Write your own summary any time you read a paper.
Write a few paragraphs on your current work or ideas.
Outline your next paper or thesis.

Create detailed outlines of important documents (papers, theses) as early as possible.

Tools

Use LaTeX by default. Use Google Docs when heavy collaboration is expected (e.g., brainstorming document). Here is a LaTeX template for a Concordia Master's thesis.
Create a Git repository for papers and theses, containing:

The LaTeX/BibTeX source.
Any script (matplotlib strongly recommended) and data required to reproduce figures. You might lose a few hours cleaning up your scripts, but it will save you days when you need to update your manuscript.
See example here.
Push the Git repository to GitHub and encourage collaborators to contribute through forks and pull requests (see Code section).

General recommendations

Create figures in a vector format (pdf, svg, ps) rather than a bitmap format (png, jpeg).
Create a single script to generate all the figures in the paper. This script should take no parameters. In this situation, it is OK to hard-code file paths relative to the root of the Git repo. Don't use absolute paths: they will work only on your computer.
Don't include figures in the Git repo, as it would rapidly make it bulky. Instead, write clear instructions on how to generate them.
Don't include (too much) binary data in the Git repo. If your scripts require binary data, put it on Zenodo and use Zenodo's permanent link in your scripts. Don't use your personal web/ftp server, Dropbox or Google Drive.
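
The recommendations above could translate into a script such as the one below. This is a sketch rather than a lab template: the data/ and figures/ directories, the runtime.csv file and the axis labels are hypothetical names chosen for illustration. The script takes no parameters, uses paths relative to the root of the Git repo, and writes PDF (vector) output into a directory that would be listed in .gitignore.

```python
"""Sketch of a parameter-free script that regenerates every figure of a paper."""
import csv
from pathlib import Path

# Paths are relative to the root of the Git repository (hypothetical layout).
DATA_DIR = Path("data")
FIGURES_DIR = Path("figures")  # generated output, meant to be git-ignored

# (input CSV, output file, x label, y label): hypothetical example entry.
FIGURES = [
    ("runtime.csv", "runtime.pdf", "Number of cores", "Run time (s)"),
]


def read_series(csv_path):
    """Read a two-column CSV file (x, y) that starts with a header line."""
    xs, ys = [], []
    with Path(csv_path).open(newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header line
        for x, y in reader:
            xs.append(float(x))
            ys.append(float(y))
    return xs, ys


def plot_series(csv_path, out_path, xlabel, ylabel):
    """Plot one data series and save it in the format given by the extension."""
    # Imported here so that the helpers above work without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # no display needed, e.g. on a cluster node
    import matplotlib.pyplot as plt

    xs, ys = read_series(csv_path)
    fig, ax = plt.subplots()
    ax.plot(xs, ys, marker="o")
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path)  # a .pdf extension produces vector output
    plt.close(fig)


def make_all_figures():
    """Regenerate every figure listed in FIGURES and return the output paths."""
    outputs = []
    for csv_name, out_name, xlabel, ylabel in FIGURES:
        out_path = FIGURES_DIR / out_name
        plot_series(DATA_DIR / csv_name, out_path, xlabel, ylabel)
        outputs.append(out_path)
    return outputs


if __name__ == "__main__":
    if DATA_DIR.is_dir():
        for path in make_all_figures():
            print(f"wrote {path}")
    else:
        print(f"{DATA_DIR}/ not found: run this script from the root of the repo")
```

Listing the figures in a single table keeps the script free of parameters: adding a figure to the paper means adding one entry to FIGURES.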

Useful books and references about writing:

How to Write a Lot, Paul J. Silvia.
The Elements of Style, William Strunk Jr. and E. B. White.
Concordia Writing Assistance

Pre-prints

All papers under review must be submitted as pre-prints to arXiv or bioRxiv, unless otherwise mentioned. A pre-print is a version of a paper that is posted to a repository and is accessible to readers before its publication in a peer-reviewed journal or conference. Well-known pre-print repositories include arXiv.org (for computer science, engineering and many other scientific fields) and bioRxiv (for biology). Pre-prints are important because they are:

Free for both readers and authors.
Accessible to everyone while the paper is under review at a journal, which usually takes several months.
Immediately citable.
Safely archived and date-stamped.

To get familiar with the procedure for submitting a paper to arXiv, you might find this YouTube video useful. Note that submitting a paper as a PDFLaTeX wrapper built with pdfpages is not acceptable: after a long wait for the permanent identifier, the submission will end up with Incomplete status. Instead, create an archive containing your TeX source and all the files needed to generate the PDF of your paper, and upload this archive to arXiv.

When you submit a paper, make sure to link the GitHub repository for the project if relevant.

Example of linking an arXiv paper to a GitHub repository.

After receiving the permanent arXiv identifier (e.g., 1809.10139) by email, please update the lab website (Pre-prints/submitted papers section under the publications tab) with the arXiv number.

Experimentation

Most of your papers will be based on experiments conducted with software you developed. Be meticulous and patient: it takes time to get a good experimental setup. Make this quote by David Donoho et al. (2009) your own:

the scientific method's central motivation is the ubiquity of error - the awareness that mistakes and self-delusion can creep in absolutely anywhere and that the scientist's effort is primarily expended in recognizing and rooting out error.

In other words, think of all possible causes that might corrupt your results: background tasks running on computers, software bugs, data corruption, etc.

Presentation tips

General tips to prepare slides for a presentation:

Prepare a slide-by-slide outline of the presentation before doing the slides.
Prepare 1 slide per minute, including title and transition slides.
Use citations whenever relevant, in the format [author et al., year], not [1]. Don't show a slide containing a list of references: it is useless.
Add figures wherever you can: they are usually much clearer than text.
Make sure that all figures have a caption.
Bullet points shouldn't span multiple lines.
Don't use more than 2 levels of bullet points.
Don't use more than 3 level-1 bullet points per slide.
Don't use more than 3 level-2 bullet points per level-1 bullet point.
Start every bullet point with a capital letter.
Make sure your slides have numbers.
On your first slide, add the date, affiliation, logo, venue, etc.
If you are presenting a paper (reading club) add title, authors, year, and publication venue of the paper on the first slide.

General tips to prepare a poster for a presentation:

A free printing service is available to all faculty, staff, and students in the Gina Cody School of Engineering and Computer Science.
Verify the conference instructions for poster size. If not mentioned, the most common poster size is 48" x 36" (width x height).
Keep it simple and easy to read, i.e., stick to bullet points.
Include the authors under the title.

Lab culture

Core values

The lab is committed to the following values:

High quality is preferable to high quantity.
Technical quality is a requirement for scientific quality.
Openness leads to better content.

The lab culture we aim for promotes frequent informal interactions, personal freedom, academic integrity, gender equality, cultural diversity and... having fun doing research!

Communication and interactions

Never hesitate to ask anyone a question.
Register on Slack (in the future we might use Mattermost instead).
Share information with others in the lab. This includes ideas, code snippets, technical tips, etc. Your co-workers are not your competitors: you are on the same side.
Communicate regularly with Tristan, on Slack, by email or by requesting a meeting whenever required. Don't let any issue block your work or bother you for too long without talking about it.
Attend hackathons, in particular those organized by BrainHack in Montreal. Use hackathons to demonstrate your project, collect feedback on it, and stay up-to-date on technology.

Code of conduct

This section is largely copied from Whitaker's lab Code of Conduct.

Harassment by and/or of members of our community in any form will not be tolerated. Harassment includes offensive verbal comments related to gender, sexual orientation, disability, physical appearance, body size, race, religion, sexual images in public spaces, deliberate intimidation, stalking, following, harassing photography or recording, sustained disruption of discussions, inappropriate physical contact, and unwelcome sexual attention.
Work hours: The hours that members of the lab choose to work are up to them. We are each welcome to send work-related emails, pull requests or Slack messages over the weekend or late at night, but no lab members are required to reply to them outside of their typical work hours. Lab members are welcome to work flexibly for any reason. Ideally, all lab members will have at least a few hours each week to overlap with Tristan in order to stay in touch, but it is the policy of the lab that every member is already self-motivated and doesn't need to work a traditional 9 to 5 day in order to meet their goals.
If you experience any challenges of any kind related to those topics, please contact Tristan. All communication will be treated as confidential.

Academic integrity

Sharing data, code and text through Git repositories hosted on GitHub is a good way to protect us against scientific misconduct.
Reusing text or code from others' work is fine (even encouraged) as long as the source is properly credited. Failing to cite the source is plagiarism.
Data fabrication or falsification is evil. Don't even think about it. If your data looks strange, don't delete or omit it. Repeat the experiment and try to understand what is going on: you will learn more. If your graph is missing a point or two and the submission deadline is coming too soon, let the graph be incomplete. You will feel better and it will improve the paper. There is no such thing as a good or a bad result: there are just results.