This presentation considers data gathered by psychiatrists located in the United Kingdom. The data consists of 1885 observations of individuals from all over the world, each of whom answered questions about their lifetime drug use and other psychiatric measures.

Variables in the original data set included age, gender, country of residence, education, scores on various mental health assessments, a score for an impulsivity assessment, and a rating (0-6) for each drug category (0 being never used, and 6 being used yesterday).

Some limitations of this data set stem from the fact that the data was collected by three psychiatrists. For that reason, I did not attempt to find relationships between scores on mental health assessments and the corresponding drug use levels. I believe that the people who answered the questions were somewhat more likely to have experienced mental health struggles than the average person, so correlating illegal drug use with mental health struggles in this data set could be misleading.

Null Hypothesis: There is no difference in recent illegal drug use between people who have never used cannabis and people who have ever used cannabis.

The first step in preparing this presentation was to clean the data. The data I needed was categorical and had to be converted to binary values. First, I converted each value, which was a combination of letters and one number, into the number it contained. I then created a new feature that replaced the cannabis rating (0 being never used, 6 being used yesterday) with either 0 or 1 (0 being has never used, 1 being has used at some point). Finally, I created a feature indicating whether a person had used illegal drugs within the last month.
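The cleaning steps described above could look roughly like the following. This is only a sketch, not the code used for the presentation: it assumes the raw labels look like `CL3`, that a rating of 4 or higher means "used within the last month", and that the example drug columns are the ones counted as illegal. The function names and the sample respondent are hypothetical.

```python
import re

# Assumption: a rating of 4 or higher means "used within the last month".
RECENT_THRESHOLD = 4


def rating_to_int(label: str) -> int:
    """Extract the single digit from a label such as 'CL3' (assumed format)."""
    match = re.search(r"\d", label)
    if match is None:
        raise ValueError(f"no digit found in {label!r}")
    return int(match.group())


def ever_used(rating: int) -> int:
    """Collapse a 0-6 rating into 0 (never used) or 1 (used at some point)."""
    return 1 if rating > 0 else 0


def used_recently(ratings) -> int:
    """Return 1 if any of the given ratings meets the 'last month' threshold."""
    return 1 if any(r >= RECENT_THRESHOLD for r in ratings) else 0


# Hypothetical respondent; the column names and values are made up.
row = {"Cannabis": "CL5", "Heroin": "CL0", "Crack": "CL2"}
ratings = {drug: rating_to_int(label) for drug, label in row.items()}

pot_drugs = ever_used(ratings["Cannabis"])
# Assumption: only these two columns count as illegal drugs in this example.
danger = used_recently([ratings["Heroin"], ratings["Crack"]])
```

In a real analysis the same three functions would be applied to every row of the data set, for example through a data frame library.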

The statistical method used to test this hypothesis was the chi-square test. Other data presented was obtained by calculating the Pearson correlation between a select few variables.

The resulting p-value of the chi-square test was 7.645188100768056e-41.
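For readers who want to see how a chi-square test of independence arrives at a p-value, here is a minimal sketch for a 2x2 table (cannabis ever used or not, by recent illegal drug use or not). The counts are made up for illustration and are not the counts from this data set. The function is a plain Pearson chi-square without continuity correction, so it may not reproduce the figure above exactly if a library routine with a correction was used.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    No continuity correction is applied. Returns (statistic, p_value).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected

    # A 2x2 table has 1 degree of freedom, and for 1 degree of freedom
    # the chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Made-up counts: rows are never/ever used cannabis,
# columns are no recent / recent illegal drug use.
table = [[90, 10], [60, 40]]
stat, p = chi_square_2x2(table)
print(f"chi-square = {stat:.2f}, p = {p:.3g}")
```

A p-value below the chosen significance level would lead to rejecting the null hypothesis of no difference between the two groups.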

This chart shows the distribution of cannabis use in the sample:

cannabis_histogram.png

A couple of things I should mention before proceeding: the “Meth” category refers to methadone, a substance used to help treat heroin addiction. Unfortunately, there was no data for methamphetamine use available to analyze here. “Pot_drugs” is the category that indicates ever having used cannabis. “Danger” indicates having used illegal drugs within the last month. “VSA” stands for volatile substance abuse/consumption.

This chart shows the association between the use of each other drug and the use of heroin:

final_heroin.png

This chart shows the association between the use of each other drug and the use of crack cocaine:

final_crack.png

Shown below is a heatmap that color-codes the correlation, expressed as a percentage, between variables from the original data set. I’ve included the scores from the mental health assessments here in order to note if anything catches the eye.

Here are the scores for the mental health assessments and what they represent:

Nscore is neuroticism (a high Nscore would indicate that a person is very neurotic), Escore is extraversion, Oscore is openness to experience, Ascore is agreeableness, Cscore is conscientiousness, Impulsive is impulsiveness, and SS is sensation seeking (the Kaggle web page said ‘sensation seeing’, but I believe it was supposed to say ‘seeking’).

final_heatmap.png

We can infer from the heatmap that cannabis use is only weakly correlated with both crack cocaine and heroin use in this sample, at 23% for each. The strongest correlation between cannabis and another drug was with mushrooms, at 58%. Of all the data in the heatmap, the strongest correlation was between LSD and mushrooms, at 67%.
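As a reminder of what these percentages measure, here is a small sketch of the Pearson correlation coefficient computed by hand. It assumes the heatmap percentages are the coefficient multiplied by 100. The two short lists are made-up binary indicators, not values from the data set, and `pearson` is a hypothetical helper name.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")

    return cov / math.sqrt(var_x * var_y)


# Made-up indicators (1 = has used, 0 = has not) for six people.
lsd = [0, 0, 1, 1, 0, 1]
mushrooms = [0, 0, 1, 0, 0, 1]

r = pearson(lsd, mushrooms)
print(f"correlation: {100 * r:.0f}%")
```

A coefficient near 0 means little linear association, while values near 1 or -1 mean a strong positive or negative one. Correlation on its own says nothing about which variable, if either, influences the other.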

At the .05 significance level, based on the information in this data set, we can reject the null hypothesis that there is no difference in recent illegal drug use between those who have ever used cannabis and those who have not. We conclude that there is a statistically significant difference in recent illegal drug use between the two groups.

The features added to create this presentation allowed for easy grouping of people who had or had not ever used cannabis, as well as illegal drugs. The limitations faced in this research were related to mental health. While I was able to test the hypothesis with a statistically sound procedure, one can never be certain what role mental health does or does not play in ANY decision making, so I think it is safer in this situation to leave mental health unexamined as a factor leading to cannabis and illegal drug use.

New questions that have arisen for me, especially if I had access to a larger and completely random sample, concern how age, gender, mental health, and education correlate with variables like illegal drug use, nicotine use, cannabis use, and alcohol use.