I’m here as a science-fraud and lurker, and I do not know what a PI is in this context.
After the Great Science Cockup of '89, all scientists are overseen by a noir-style private investigator. The whiskey and cigar costs are high, but they have solved ~8 murders per year since.
https://en.wikipedia.org/wiki/Principal_investigator
In many countries, the term principal investigator (PI) refers to the holder of an independent grant and the lead researcher for the grant project, usually in the sciences, such as a laboratory study or a clinical trial. The phrase is also often used as a synonym for “head of the laboratory” or “research group leader”. While the expression is common in the sciences, it is used widely for the person or persons who make final decisions and supervise funding and expenditures on a given research project.
PI = principal investigator, or the leader of the lab. Also known as the professor. The closest comparison to regular jobs would be that the PI is the manager. They typically no longer do actual lab work and instead fill a role that is entirely managerial, so they’re often removed from, and therefore oblivious to, the goings-on of the lab. It’s somewhat common for lab members to have a concern that the PI dismisses because they’re unaware of its severity, or for the PI to have a concern that lab members have already addressed.
The PIs I deal with on research vessels almost always get their hands dirty, both in the lab and on deck, so managing is more of an added responsibility than a shift to desk-bound life.
Maybe not true for all areas.
this is literally my life
months of ‘the inputs are shit, this will be a problem’ with either no reply or being brushed off
months later after the problem becomes a dumpster fire ‘we need this data yesterday and it looks real bad’
powerful
No data is better than bad data.
Nah, bad data is valuable to science because you learned what you did had a confounding variable.
Maybe not exactly what it was, but investigating and identifying it is literally the scientific method.
Even early psych experiments like the Stanford Prison Experiment produced completely useless data, but understanding why it was useless led to modern rules about controlling experiments for all the ways Zimbardo fucked up.
But when talking about funding and/or employment…
A good scientist can defend their position with any data. The complete absence of any data would be when you’re fucked.
It’s not even just in science: metrics are a thing almost everywhere, and that’s just statistical analysis done by people who have never heard those two words put together. It’s trivial to game the metrics to make things look better, but it’s better still to explain why the problem is the metrics themselves.
If you’re less than ethical you could do that even if you’ve not been doing your job.
You’re talking about data that doesn’t back the initial hypothesis. That isn’t bad data in this context, and you’re correct that it is still valuable for reforming hypotheses and re-running the experiment.
Bad data in this context is referring to data quality - things like inconsistent collection, inadequate/missing data, free text vs controlled input, etc. In those cases the data can become almost useless (and this is usually known by the people working on a project but not necessarily by their management). This causes pressure to turn shit into gold when that just isn’t possible.
Imagine that your boss wants you to predict what the temperature will be next Tuesday. To do this, your company has provided you with the temperature from every Tuesday for the past 12 years.

If that weren’t bad enough, at first they recorded the date in DDMMYY format, but 10 years ago they switched to MMDDYY. However, some records were still collected in the legacy DDMMYY format due to lack of training in the temperature collection department, and there is no way to distinguish the correct date. Also, one employee who was close to retirement only recorded the temperature as “Hot” or “Cold”, because that is how he was trained when he was first hired 50 years ago and he never bothered to learn the new system.

Now, you can probably build a model that tracks weekly temperature over time and approximates next Tuesday’s temperature based on something like seasonality, the historical average, and the most recent Tuesday. But you’ll know it’s not the best estimate, you’ll know there is way better data out there, and you could probably make a simpler, more accurate estimate just by averaging the temperatures from Saturday, Sunday, and Monday.
That’s bad data.
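To see how unrecoverable those mixed-format dates really are, you can try parsing each six-digit stamp both ways. A stamp is only safe when exactly one interpretation is a valid calendar date. This is a minimal sketch, not anyone's actual pipeline; the function name and sample stamps are invented for illustration:

```python
from datetime import datetime

def possible_dates(stamp: str):
    """Parse a 6-digit stamp as both DDMMYY and MMDDYY.

    Returns the distinct valid interpretations. A record is
    unambiguous only when exactly one date comes back.
    """
    candidates = set()
    for fmt in ("%d%m%y", "%m%d%y"):
        try:
            candidates.add(datetime.strptime(stamp, fmt).date())
        except ValueError:
            # e.g. "13" is a valid day but not a valid month,
            # so one of the two parses can fail
            pass
    return sorted(candidates)

# "130599": only DDMMYY parses (month 13 is invalid) -> recoverable
print(possible_dates("130599"))
# "050312": both DDMMYY and MMDDYY parse -> the true date is lost
print(possible_dates("050312"))
```

Running this over the whole archive tells you what fraction of records are genuinely ambiguous, which is the honest number to put in front of management before anyone asks for predictions.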
This guy datas.
Sorry, I maintain that processing data that is full of (known) systematic problems or data that is known to be insufficiently sensitive to detect the goal is a drain on limited resources.
You’re exactly right.
because you learned
here’s the thing …
I actually published two papers for my PhD in no time using garbage data collected 3 years before, lol.