3rr4tt1c

joined 2 months ago
3rr4tt1c 2 points 2 weeks ago

If this is for a job interview, I’d err on the side of verbosity. Break it all into distinct, easy to read steps: load, process, output, logging, exception handling, comments, etc.

I turned it in days ago so there's nothing I can do about it. But I'll keep that in mind for the future.

3rr4tt1c 2 points 2 weeks ago (1 child)

Thanks, I set inplace=False like you suggested. I thought setting it to True meant that it modified the original dataframe. Why does it work with False?
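For anyone reading along, here is a minimal sketch of the difference, assuming a toy dataframe (the values are made up for illustration). With `inplace=True`, `dropna` does modify the dataframe it is called on, but it returns `None`, so assigning the result back to a name throws the dataframe away. With the default `inplace=False`, it returns a new dataframe that is safe to assign.

```python
import pandas as pd

# Toy data, made up for illustration only
df = pd.DataFrame({
    "x-coordinate": [0, None, 2],
    "Character": ["a", "b", "c"],
})

# inplace=True: the dataframe is modified, but the return value is None
modified = df.copy()
result_inplace = modified.dropna(subset=["x-coordinate"], inplace=True)
print(result_inplace)   # the return value of the in-place call
print(len(modified))    # rows left after the in-place drop

# inplace=False (the default): the original is untouched, a new dataframe is returned
result_new = df.dropna(subset=["x-coordinate"])
print(len(df), len(result_new))
```

So `df = df.dropna(..., inplace=True)` rebinds `df` to `None`, while `df = df.dropna(...)` keeps a usable dataframe.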

3rr4tt1c 3 points 3 weeks ago (1 child)

I'm trying to do the job specs, but the net is just so wide. But yeah, it looks like personal projects are the way to go.

3rr4tt1c 2 points 3 weeks ago

Currently working on the responsive web development course. I like the guitar sounds. 😅

 
  • I was applying for a job, and then I had to answer a question about web scraping, which I'm not familiar with. I answered all the other questions with no issue, so I decided I might as well put in the effort to learn the basics and see if I can do it in a day.
  • Yes, it was *somewhat* easier than I expected, but I still had to watch like 4 YouTube videos and read a bunch of Reddit and Stack Overflow posts.
  • I got the code working, but I decided to run it again to double-check. It stopped working. Not sure why.
  • Testing is also annoying because the "web page" is a Google Doc and constantly reloads or something. It takes forever to get proper results from my print statements.
  • I attached an image with the question. I haven't heard back from them, and I've seen other people post what I think might be this exact question online, so hopefully I'm not doing anything illegal.
  • At this point, I just want to solve it. Here's the code:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np

def createDataframe(url): #Make the data easier to handle
    #Get the page's html data using BeautifulSoup
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')

    #Extract the table's headers and column structure
    table_headers = soup.find('tr', class_='c8')
    table_headers_titles = table_headers.find_all('td')
    headers = [header.text for header in table_headers_titles]

    #Extract the table's row data
    rows = soup.find_all('tr', class_='c4')
    row_data_outer = [row.find_all('td') for row in rows]
    row_data = [[cell.text.strip() for cell in row] for row in row_data_outer]

    #Create a dataframe using the extracted data
    df = pd.DataFrame(row_data, columns=headers)
    return df

def printMessage(dataframe): #Print the message gotten from the organised data
    #Drop rows that have missing coordinates
    dataframe = dataframe.dropna(subset=['x-coordinate', 'y-coordinate'], inplace=True)

    #Convert the coordinate columns to integers so they can be used
    dataframe['x-coordinate'] = dataframe['x-coordinate'].astype(int)
    dataframe['y-coordinate'] = dataframe['y-coordinate'].astype(int)

    #Determine how large the grid to be printed is
    max_x = int(dataframe['x-coordinate'].max())
    max_y = int(dataframe['y-coordinate'].max())

    #Create an empty grid
    grid = np.full((max_y + 1, max_x + 1), " ")

    #Fill the grid with the characters using coordinates as the indices
    for _, row in dataframe.iterrows():
        x = row['x-coordinate']
        y = row['y-coordinate']
        char = row['Character']
        grid[y][x] = char
    for row in grid:
        print("".join(row))

test = 'https://docs.google.com/document/d/e/2PACX-1vQGUck9HIFCyezsrBSnmENk5ieJuYwpt7YHYEzeNJkIb9OSDdx-ov2nRNReKQyey-cwJOoEKUhLmN9z/pub'
printMessage(createDataframe(test))

My most recent error:

C:\Users\User\PycharmProjects\dataAnnotationCodingQuestion\.venv\Scripts\python.exe C:\Users\User\PycharmProjects\dataAnnotationCodingQuestion\.venv\app.py 
Traceback (most recent call last):
  File "C:\Users\User\PycharmProjects\dataAnnotationCodingQuestion\.venv\app.py", line 50, in <module>
    printMessage(createDataframe(test))
  File "C:\Users\User\PycharmProjects\dataAnnotationCodingQuestion\.venv\app.py", line 30, in printMessage
    dataframe['x-coordinate'] = dataframe['x-coordinate'].astype(int)
                                ~~~~~~~~~^^^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not subscriptable

Process finished with exit code 1
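
A hedged reading of the traceback: `dataframe` is `None` by the time the `astype(int)` line runs, which matches `dropna(..., inplace=True)` returning `None` and that result being assigned back to `dataframe`. Below is a sketch of the rendering step with that one change, run on made-up sample rows instead of the live Google Doc. The helper name `render_grid` and the sample data are assumptions for illustration, not part of the original code.

```python
import numpy as np
import pandas as pd

def render_grid(dataframe):
    """Return the grid as a list of strings, one per row (hypothetical helper)."""
    # Without inplace=True, dropna returns a new dataframe that can be reassigned
    dataframe = dataframe.dropna(subset=['x-coordinate', 'y-coordinate']).copy()

    # Scraped cells arrive as text, so convert the coordinates to integers
    dataframe['x-coordinate'] = dataframe['x-coordinate'].astype(int)
    dataframe['y-coordinate'] = dataframe['y-coordinate'].astype(int)

    # Size the grid from the largest coordinates and fill it with spaces
    max_x = int(dataframe['x-coordinate'].max())
    max_y = int(dataframe['y-coordinate'].max())
    grid = np.full((max_y + 1, max_x + 1), " ")

    # Place each character at its (row=y, column=x) position
    for x, y, char in zip(dataframe['x-coordinate'],
                          dataframe['y-coordinate'],
                          dataframe['Character']):
        grid[y, x] = char
    return ["".join(row) for row in grid]

# Made-up rows standing in for the scraped table; the last one lacks an x value
sample = pd.DataFrame({
    'x-coordinate': ['0', '1', '2', None],
    'Character': ['A', 'B', 'C', 'D'],
    'y-coordinate': ['0', '0', '1', '1'],
})

for line in render_grid(sample):
    print(line)
```

Returning the lines instead of printing inside the function makes the result easy to check without hitting the page on every run.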

 

I know this seems like a very obvious question, but I mean with regard to job searches. Even internships seem to require a variety of skills these days. I'm interested in web development and have recently started considering data analysis. Should I work on tutorials and personal projects for a single skill or framework at a time? Or make small projects across a wide variety of things so I can put those skills on my resume?

3rr4tt1c 1 point 1 month ago (1 child)

I honestly never realised the terminology was that important (so I never put much effort into remembering any of it). Yes, I meant logging. I'm having trouble understanding the rest of what you mean though.

20
submitted 1 month ago* (last edited 1 month ago) by 3rr4tt1c to c/no_stupid_questions
 

Like if I'm using print statements to test my code, is it okay to leave stuff like that in there when "publishing" the app/program?

Edit: So I meant logging. Not "tests". Using console.log to see if the code is flowing properly. I'll study up on debugging. Also, based on what I managed to grasp from your very helpful comments, it is not okay to do, but the severity of how much of an issue it is depends on the context? Either that or it's completely avoidable in the first place if I just use "automated testing" or "loggers".
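
A minimal sketch of the "loggers" idea, written in Python rather than JavaScript and using the standard library `logging` module. The function `parse_price` and the logger name `demo_app` are hypothetical, made up for illustration. The point is that debug messages can stay in the code and be silenced by changing the log level, instead of being deleted before release.

```python
import logging

# Hypothetical logger name, chosen only for this example
logger = logging.getLogger("demo_app")

def parse_price(text):
    """Turn a string such as ' $4.50 ' into a float."""
    # Debug messages replace throwaway print statements
    logger.debug("parse_price received %r", text)
    value = float(text.strip().lstrip("$"))
    logger.debug("parse_price parsed %s", value)
    return value

if __name__ == "__main__":
    # DEBUG shows the messages while developing; WARNING would hide them
    logging.basicConfig(level=logging.DEBUG)
    print(parse_price(" $4.50 "))
```

Switching `level=logging.DEBUG` to `level=logging.WARNING` hides the debug lines without touching the function itself.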

3rr4tt1c 2 points 1 month ago* (last edited 1 month ago) (1 child)

Sounds like they're still very relevant and very important. Python isn't a language I've used a lot, but I'm still surprised I'd never heard about docstrings till this tutorial. Thanks for the info.

24
submitted 1 month ago by 3rr4tt1c to c/python
 

Was going through a Python tutorial, but it seems kinda dated. Wanted to know if people regularly use docstrings in the workforce. Or are they somewhat irrelevant because you can convey most of that info via comments and context? Do jobs make you use them?
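
For context, a docstring is a string literal placed as the first statement of a function, class, or module. Unlike a comment, it is normally kept at runtime and can be read through `help()` or the `__doc__` attribute. A small sketch with a made-up function, `celsius_to_fahrenheit`, chosen only for illustration:

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    # A comment like this one is not stored on the function object;
    # the docstring above is available as celsius_to_fahrenheit.__doc__
    return celsius * 9 / 5 + 32

# Tools such as help() and many editors read the docstring from here
print(celsius_to_fahrenheit.__doc__)
```

That runtime availability is the main thing a plain comment cannot give you.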