Welcome to the OStack Knowledge Sharing Community for programmers and developers: Open, Learn, and Share
You are welcome to ask questions or share your answers with others


Recent questions tagged pyspark

0 votes
943 views
1 answer
    My question is triggered by the use case of calculating the differences between consecutive rows in a spark ... this can cause serious performance degradation.
asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)
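A minimal sketch of the usual lag-over-window approach to consecutive-row differences; the column names and ordering key are assumptions, not taken from the question:

    from pyspark.sql import SparkSession, Window, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, 10.0), (2, 12.5), (3, 11.0)], ["id", "value"])

    # A window without partitionBy pulls every row into one partition, which is
    # the performance degradation the excerpt warns about; partition by a key when possible.
    w = Window.orderBy("id")
    diffs = df.withColumn("diff", F.col("value") - F.lag("value").over(w))
    diffs.show()
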
0 votes
1.3k views
1 answer
    I've seen various people suggesting that Dataframe.explode is a useful way to do this, but it results in more ... want these new columns to be named as well.
asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)
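A hedged sketch of one way to get several named columns without explode (which multiplies rows): split the source column and alias each piece. The delimiter and column names here are illustrative only:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("a_1",), ("b_2",)], ["raw"])

    parts = F.split(F.col("raw"), "_")
    named = df.select(
        parts.getItem(0).alias("letter"),            # each piece becomes its own named column
        parts.getItem(1).cast("int").alias("number"),
    )
    named.show()
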
0 votes
855 views
1 answer
    I started getting the following error anytime I try to collect my RDDs. It happened after I installed Java 10.1 So of ... 'new' is not defined >>> sc.stop()
asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
821 views
1 answer
    I'm trying to use Spark dataframes instead of RDDs since they appear to be more high-level than RDDs and tend to ... and I should just go back to using RDDs.
asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
1.1k views
1 answer
    Consider the following DataFrame: #+------+---+ #|letter|rpt| #+------+---+ #| X| 3| ... a way to replicate this behavior using the spark DataFrame functions?
asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)
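A minimal sketch of replicating each row rpt times with built-in DataFrame functions (array_repeat and explode need Spark 2.4+); the column names follow the sample frame in the excerpt:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("X", 3), ("Y", 1)], ["letter", "rpt"])

    # array_repeat builds an array of `rpt` copies; explode turns it back into rows
    replicated = df.withColumn("letter", F.explode(F.expr("array_repeat(letter, rpt)")))
    replicated.show()
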
0 votes
1.3k views
1 answer
    Is there an equivalent of Pandas Melt Function in Apache Spark in PySpark or at least in Scala? I was ... Spark for the entire dataset. Thanks in advance.
asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)
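A hedged sketch of a melt-like reshape using the SQL stack() function; the id and value column names are placeholders, not the asker's schema:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, 10, 20), (2, 30, 40)], ["id", "a", "b"])

    # stack(n, label1, col1, label2, col2, ...) emits one row per label/value pair
    melted = df.select("id", F.expr("stack(2, 'a', a, 'b', b) as (variable, value)"))
    melted.show()
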
0 votes
810 views
1 answer
    When I login to my edge node and run the below command, my application is submitted successfully and completes ... -to-run-spark-submit-on-remote-server-though-shell-action...
asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
963 views
1 answer
    My Scenario I have a spark data frame in a AWS glue job with 4 million records I need to write it as a ... questions/65832736/writing-large-spark-data-frame-as-parquet-to-s3-bucket...
asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)
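A rough sketch (not the asker's Glue job) of spreading a large write across many tasks before writing parquet to S3; the bucket path and partition count are illustrative:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.range(4_000_000)  # stand-in for the 4 million-record frame

    # More partitions mean more, smaller parquet files written in parallel
    (df.repartition(200)
       .write.mode("overwrite")
       .parquet("s3://my-bucket/output/"))  # hypothetical bucket and prefix
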
0 votes
1.5k views
1 answer
    I am trying to increase the heartbeat interval parameter in pyspark configuration but keep getting this error. Is there any ... -must-be-no-less-than-the-value-of-spark-execu...
asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)
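A hedged sketch of the usual fix for that error: spark.executor.heartbeatInterval must stay well below spark.network.timeout, so raise the two together. The values are illustrative:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .config("spark.executor.heartbeatInterval", "60s")
        .config("spark.network.timeout", "600s")  # keep this comfortably larger than the heartbeat
        .getOrCreate()
    )
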
0 votes
956 views
1 answer
    I'm trying to capture the string representation generated by the show() function as suggested here ... dataframe-show-string-representation-fails-with-showstringinteger-boolean-boo...
asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)
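A hedged sketch of the trick referenced there, which calls the JVM DataFrame's showString through the private _jdf handle (the three-argument form matches Spark 2.3+ and is not a stable API):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "name"])

    text = df._jdf.showString(20, 20, False)  # numRows, truncate width, vertical
    print(text)
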
0 votes
944 views
1 answer
    Closed. This question needs details or clarity. It is not currently accepting answers. question from:https://stackoverflow.com/questions/65841356/how-to-pair-rows-with-the-same-id...
asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
866 views
1 answer
    I am running a spark standalone cluster. My os is centos7 on master as well as on worker. Have set ... https://stackoverflow.com/questions/65842650/spark-worker-just-cannot-connect...
asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
885 views
1 answer
    I need to encode parquet files which are produced by my pyspark script, so that the encoding is ... .com/questions/65844890/spark-parquet-compression-and-encoding-schemes...
asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)
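A hedged sketch of the knobs exposed from PySpark: compression is a write option, while Parquet encoding details such as dictionary encoding go through the Hadoop configuration. The path and settings are illustrative:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # Parquet writer property; dictionary encoding is on by default but can be pinned explicitly
    spark.sparkContext._jsc.hadoopConfiguration().set("parquet.enable.dictionary", "true")

    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "name"])
    df.write.option("compression", "snappy").mode("overwrite").parquet("/tmp/encoded_out")
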
0 votes
970 views
1 answer
    I am new to coding and would like to know where "0" holding the database name in {0} is supposed to be in ... -in-a-from-clause-in-sql-query-like-in-python-string-formating...
asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)
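A minimal sketch of what the {0} placeholder does: Python's str.format substitutes the database name into the query string before it reaches spark.sql. The database and table names are made up:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    db_name = "sales_db"  # hypothetical database
    query = "SELECT * FROM {0}.transactions".format(db_name)  # {0} is replaced by sales_db
    result = spark.sql(query)
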
0 votes
757 views
1 answer
    I have a dataset that is around 190GB that was partitioned into 1000 partitions. my EMR cluster allows a ... /65866586/optimizing-spark-resources-to-avoid-memory-and-space-usage...
asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)
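A rough, hedged sketch of the settings usually tuned for a job like this; the numbers are illustrative, not sized for the asker's 190GB dataset or EMR cluster, and executor memory/cores are normally fixed at submit time rather than in code:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .config("spark.executor.memory", "16g")
        .config("spark.executor.cores", "4")
        .config("spark.sql.shuffle.partitions", "1000")  # roughly match the 1000 input partitions
        .getOrCreate()
    )
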
0 votes
851 views
1 answer
    I am trying to split a column of total count into different ranges of columns using pyspark. I am ... stackoverflow.com/questions/65867294/spark-count-records-into-specified-ranges...
asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)
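A hedged sketch of bucketing a count column into ranges with when/otherwise; the boundaries and column names are assumptions:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(5,), (42,), (130,)], ["total_count"])

    bucketed = df.withColumn(
        "range",
        F.when(F.col("total_count") < 10, "0-9")
         .when(F.col("total_count") < 100, "10-99")
         .otherwise("100+"),
    )
    bucketed.groupBy("range").count().show()
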
0 votes
920 views
1 answer
    I have a bit of a question around PySpark. After aggregating, I have really skewed data (some ... //stackoverflow.com/questions/65869200/repartitioning-skewed-dataframes-in-spark...
asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)
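A hedged sketch of one common remedy, salting the skewed key so a single hot value spreads across several partitions; the key name and salt range are illustrative:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("hot", 1)] * 5 + [("rare", 2)], ["key", "value"])

    salted = df.withColumn("salt", (F.rand() * 16).cast("int"))
    balanced = salted.repartition("key", "salt")  # each hot key now lands in up to 16 partitions
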
0 votes
817 views
1 answer
    My setup is simple, centos master, centos worker. In master spark-env.sh export STANDALONE_SPARK_MASTER_HOST= ... -initially-connecting-and-then-disconnecting-trying-to-reconnect...
asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
830 views
1 answer
    When I run the following command: spark-submit --name "My app" --master "local[*]" --py-files main ... questions/65873182/why-driver-memory-is-not-in-my-spark-context-configuration...
asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
904 views
1 answer
    I get the following failed error for some of my tasks when running my job. But the job finishes successfully on ... .com/questions/65889696/spark-exit-status-134-what-does-it-mean...
asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
955 views
1 answer
    I am new to snowflake. I'm writing a spark df to snowflake, using this code. var = dict(sfUrl=" ... ://stackoverflow.com/questions/65901227/from-spark-to-snowflake-data-types...
asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
801 views
1 answer
    Closed. This question needs to be more focused. It is not currently accepting answers. question from:https:// ... -it-appropriate-to-use-a-udf-vs-using-spark-functionality...
asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
1.1k views
1 answer
    Im using pyspark and I have a large data source that I want to repartition specifying the files size per partition ... /65912908/how-to-specify-file-size-using-repartition-in-spark...
asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)
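A hedged sketch: Spark cannot target a byte size directly, but the maxRecordsPerFile write option caps rows per output file, which approximates a size once the average row size is known. The numbers and path are illustrative:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.range(10_000_000)

    (df.write
       .option("maxRecordsPerFile", 1_000_000)  # tune from average row size to hit a target file size
       .mode("overwrite")
       .parquet("/tmp/sized_output"))
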
0 votes
783 views
1 answer
    I have a Dataset below like: +----------------------------------+------------ ... ://stackoverflow.com/questions/65915468/how-to-perform-group-by-and-aggregate-operation-on-spark...
asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)
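A minimal group-by and aggregate sketch; the grouping and value column names are assumptions, not the asker's schema:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("a", 1.0), ("a", 3.0), ("b", 2.0)], ["group_col", "amount"])

    agg = df.groupBy("group_col").agg(
        F.sum("amount").alias("total"),
        F.count("*").alias("n_rows"),
    )
    agg.show()
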
0 votes
1.4k views
1 answer
    When trying to run the following code: val1_index = df_playlists['pid'].isin(val1_playlist[0]) I received this ... /questions/65915669/why-cant-pandass-isin-work-with-numpy-int64...
asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)
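A hedged guess at the usual cause: Series.isin expects an iterable, and val1_playlist[0] is a single numpy.int64 scalar, so wrapping it in a list avoids the error. The sample data is made up:

    import pandas as pd

    df_playlists = pd.DataFrame({"pid": [1, 2, 3]})
    val1_playlist = [2]  # stand-in for the real playlist values

    val1_index = df_playlists["pid"].isin([val1_playlist[0]])  # list-wrap the scalar
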
0 votes
917 views
1 answer
    According to this question - --files option in pyspark not working the sc.addFiles option should work for accessing files ... way-to-access-file-contents-in-both-the-driver-and-ex...
asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)
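A minimal sketch of the addFile/SparkFiles pattern the linked question discusses, where the same file is readable on the driver and inside tasks; the file path is hypothetical:

    from pyspark import SparkFiles
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext
    sc.addFile("/tmp/lookup.txt")  # hypothetical local file distributed to every node

    def read_on_executor(_):
        # SparkFiles.get resolves the node-local copy wherever the task runs
        with open(SparkFiles.get("lookup.txt")) as f:
            return f.read()

    print(sc.parallelize([1]).map(read_on_executor).first())
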
0 votes
833 views
1 answer
    I have a list of data frames, on each location of a list, I have one dataframe I need to ... stackoverflow.com/questions/65923884/make-single-dataframe-from-list-of-dataframes...
asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)
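A minimal sketch of folding a list of DataFrames into one with unionByName; the sample frames are placeholders and assume matching schemas:

    from functools import reduce
    from pyspark.sql import SparkSession, DataFrame

    spark = SparkSession.builder.getOrCreate()
    dfs = [
        spark.createDataFrame([(1, "a")], ["id", "name"]),
        spark.createDataFrame([(2, "b")], ["id", "name"]),
    ]

    combined = reduce(DataFrame.unionByName, dfs)  # pairwise union across the whole list
    combined.show()
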
0 votes
965 views
1 answer
    I am trying to run spark-shell command locally and I am getting below error java.net.BindException: ... stackoverflow.com/questions/65928852/spark-shell-command-failing-on-local...
asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)
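A hedged guess at the common fix for that BindException, shown here as the PySpark equivalent: bind the driver to localhost explicitly (or set SPARK_LOCAL_IP in the environment). The addresses are the usual loopback values:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .config("spark.driver.bindAddress", "127.0.0.1")
        .config("spark.driver.host", "localhost")
        .getOrCreate()
    )
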

...